Hi Chandni - we are currently dealing with some environment woes due to HDFS issues and, amusingly enough, I can no longer reproduce this problem. I suspect it may have been a symptom of deeper cluster issues. If I am able to reproduce it consistently again, I'll let you know, and now that I know how to provide complete stack logs, I'll provide those as well.
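For reference, if the overflow does come back, a small helper along these lines could make the stack logs easier to read by showing which frames repeat. This is only a sketch and not something from the gist or from Apex: the class name RecursionSummary and its methods are hypothetical, and recurse() is just a stand-in for whatever call is actually looping. It relies only on the standard Throwable.getStackTrace() API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Apex): summarizes which frames repeat in a
// stack trace, which is usually the quickest hint at runaway recursion.
public class RecursionSummary {

    /** Counts how often each "Class.method" pair appears in the stack trace. */
    public static Map<String, Integer> frameCounts(Throwable t) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (StackTraceElement frame : t.getStackTrace()) {
            String key = frame.getClassName() + "." + frame.getMethodName();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the most frequent frame, or null if the trace is empty. */
    public static String hottestFrame(Throwable t) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : frameCounts(t).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Stand-in for the real culprit: recursion with no base case.
    public static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    public static void main(String[] args) {
        try {
            recurse(0);
        } catch (StackOverflowError e) {
            // The JVM keeps only a bounded number of frames in the trace,
            // so this summarizes the innermost part of the recursion.
            System.out.println("Most frequent frame: " + hottestFrame(e));
        }
    }
}
```

In a real operator the same two helper methods could be called from wherever the error is caught and the result written to the container log. That part depends on the application and is not shown here.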
Sent with Good (www.good.com)
________________________________
From: Chandni Singh <chan...@datatorrent.com>
Sent: Monday, March 21, 2016 7:29:27 PM
To: dev@apex.incubator.apache.org
Subject: Re: Stack overflow errors when launching job

Hi Ilya,

Are you available at 2 pm tomorrow for webex?

Chandni

On Mon, Mar 21, 2016 at 2:53 PM, Chandni Singh <chan...@datatorrent.com> wrote:

> Ilya,
>
> I have launched the application on our Yarn cluster and I don't see this
> happening.
>
> Chandni
>
> On Sun, Mar 20, 2016 at 9:43 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
>
>> Sure thing. If you guys have time tomorrow I can hop on a WebEx.
>>
>> Sent with Good (www.good.com)
>> ________________________________
>> From: Amol Kekre <a...@datatorrent.com>
>> Sent: Sunday, March 20, 2016 12:54:22 PM
>> To: dev@apex.incubator.apache.org
>> Subject: Re: Stack overflow errors when launching job
>>
>> Can we get on a webex to take a look?
>>
>> thks
>> Amol
>>
>> On Sat, Mar 19, 2016 at 7:27 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
>>
>> > I don't think I have any time really to connect to the container. The
>> > application launches and crashes almost immediately. Total runtime is 50
>> > seconds.
>> >
>> > Sent with Good (www.good.com)
>> > ________________________________
>> > From: Munagala Ramanath <r...@datatorrent.com>
>> > Sent: Saturday, March 19, 2016 5:39:11 PM
>> > To: dev@apex.incubator.apache.org
>> > Subject: Re: Stack overflow errors when launching job
>> >
>> > There is some info here, near the end of the page:
>> >
>> > http://docs.datatorrent.com/troubleshooting/
>> >
>> > under the heading "How do I get a heap dump when a container gets an
>> > OutOfMemoryError ?"
>> >
>> > However since you're blowing the stack, you may need to manually run jmap
>> > on the running container
>> > which may be difficult if it doesn't stay up for very long. There is a way
>> > to dump the heap programmatically
>> > as described, for instance, here:
>> >
>> > https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java
>> >
>> > Ram
>> >
>> > On Sat, Mar 19, 2016 at 2:07 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
>> >
>> > > How would we go about getting a heap dump?
>> > >
>> > > Sent with Good (www.good.com)
>> > > ________________________________
>> > > From: Yogi Devendra <yogideven...@apache.org>
>> > > Sent: Saturday, March 19, 2016 12:19:26 AM
>> > > To: dev@apex.incubator.apache.org
>> > > Subject: Re: Stack overflow errors when launching job
>> > >
>> > > Stack trace in the gist shows some symptoms of infinite recursion.
>> > > But, I could not figure out exact cause for it.
>> > >
>> > > Can you please check your heap dump to see if there are any cycles in the
>> > > object hierarchy?
>> > >
>> > > ~ Yogi
>> > >
>> > > On 19 March 2016 at 00:36, Ashwin Chandra Putta <ashwinchand...@gmail.com> wrote:
>> > >
>> > > > In the example you posted, do you have any locality constraint applied?
>> > > >
>> > > > From what I see, you have two operators - hdfs input operator and hdfs
>> > > > output operator. Each of them have 40 partitions each and you don't have
>> > > > any other constraints on them. And the partitioner implementation you are
>> > > > using is com.datatorrent.common.partitioner.StatelessPartitioner
>> > > >
>> > > > Please confirm.
>> > > >
>> > > > Regards,
>> > > > Ashwin.
>> > > >
>> > > > On Thu, Mar 17, 2016 at 5:00 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
>> > > >
>> > > > > I’ve updated the gist with a more complete example, and updated the
>> > > > > associated JIRA that I’ve created.
>> > > > > https://issues.apache.org/jira/browse/APEXCORE-392
>> > > > >
>> > > > > On 3/17/16, 4:33 AM, "Tushar Gosavi" <tus...@datatorrent.com> wrote:
>> > > > >
>> > > > > >Hi,
>> > > > > >
>> > > > > >I created a sample application with operators from the given link. just a
>> > > > > >simple input and output and created 32 partitions of each. Could not
>> > > > > >reproduce the
>> > > > > >stack overflow issue. Do you have a small sample application which could
>> > > > > >reproduce this issue?
>> > > > > >
>> > > > > >  @Override
>> > > > > >  public void populateDAG(DAG dag, Configuration configuration)
>> > > > > >  {
>> > > > > >    NewlineFileInputOperator in = dag.addOperator("Input", new NewlineFileInputOperator());
>> > > > > >    in.setDirectory("/user/tushar/data");
>> > > > > >    in.setPartitionCount(32);
>> > > > > >
>> > > > > >    HdfsFileOutputOperator out = dag.addOperator("Output", new HdfsFileOutputOperator());
>> > > > > >    out.setFilePath("/user/tushar/outdata");
>> > > > > >
>> > > > > >    dag.getMeta(out).getAttributes().put(Context.OperatorContext.PARTITIONER,
>> > > > > >        new StatelessPartitioner<HdfsFileOutputOperator>(32));
>> > > > > >
>> > > > > >    dag.addStream("s1", in.output, out.input);
>> > > > > >  }
>> > > > > >
>> > > > > >-Tushar.
>> > > > > >
>> > > > > >On Thu, Mar 17, 2016 at 12:30 AM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
>> > > > > >
>> > > > > >> Hi guys – I’m running into a very frustrating issue where certain DAG
>> > > > > >> configurations cause the following error log (attached). When this happens,
>> > > > > >> my application even fails to launch. This does not seem to be a YARN issue
>> > > > > >> since this occurs even with a relatively small number of partitions/memory.
>> > > > > >>
>> > > > > >> I’ve attached the input and output operators in question:
>> > > > > >> https://gist.github.com/ilganeli/7f770374113b40ffa18a
>> > > > > >>
>> > > > > >> I can get this to occur predictably by
>> > > > > >>
>> > > > > >> 1. Increasing the partition count on my input operator (reads from
>> > > > > >>    HDFS) - values above 20 cause this error
>> > > > > >> 2. Increasing the partition count on my output operator (writes to
>> > > > > >>    HDFS) - values above 20 cause this error
>> > > > > >> 3. Setting stream locality from the default to either thread local, node
>> > > > > >>    local, or container_local on the output operator
>> > > > > >>
>> > > > > >> This behavior is very frustrating as it’s preventing me from partitioning
>> > > > > >> my HDFS I/O appropriately, thus allowing me to scale to higher throughputs.
>> > > > > >>
>> > > > > >> Do you have any thoughts on what’s going wrong? I would love your feedback.
>> > > > > >> ________________________________________________________
>> > > > > >>
>> > > > > >> The information contained in this e-mail is confidential and/or
>> > > > > >> proprietary to Capital One and/or its affiliates and may only be used
>> > > > > >> solely in performance of work or services for Capital One.
>> > > > > >> The information
>> > > > > >> transmitted herewith is intended only for use by the individual or entity
>> > > > > >> to which it is addressed. If the reader of this message is not the intended
>> > > > > >> recipient, you are hereby notified that any review, retransmission,
>> > > > > >> dissemination, distribution, copying or other use of, or taking of any
>> > > > > >> action in reliance upon this information is strictly prohibited. If you
>> > > > > >> have received this communication in error, please contact the sender and
>> > > > > >> delete the material from your computer.
>> > > > > >>
>> > > >
>> > > > --
>> > > >
>> > > > Regards,
>> > > > Ashwin.