Please add "-XX:MaxJavaStackTraceDepth=-1" to the JVM options and regenerate the stack trace. Note that the argument is a negative 1, which forces unlimited stack trace depth.
For example:

    <property>
      <name>dt.attr.CONTAINER_JVM_OPTIONS</name>
      <value>-XX:MaxJavaStackTraceDepth=-1</value>
    </property>

Ram

On Mon, Mar 21, 2016 at 11:06 AM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> Ram - that is the complete log. I have nothing else available, either
> through YARN or through the DT UI.

On 3/21/16, 10:33 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:

> The call chain is not complete; it ends abruptly with:
>
>     at java.util.ArrayList.writeObject(ArrayList.java:742)
>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
>
> We need to see the point of origin.
>
> Ram

On Mon, Mar 21, 2016 at 10:02 AM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> I uploaded the complete stack trace to the gist in the issue:
> https://gist.github.com/ilganeli/7f770374113b40ffa18a

On 3/21/16, 9:38 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:

> Ilya, could you upload a full stack trace of the failure so we can see
> where the call chain originated?
>
> Ram

On Mon, Mar 21, 2016 at 9:21 AM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> Chandni - my application fails when launching in YARN, not in local mode.
> There is no custom partitioning - the code in the example is complete for
> both the input and output classes.
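The truncated trace quoted above, with java.util.ArrayList.writeObject feeding back into ObjectOutputStream.writeSerialData, is the classic signature of default Java serialization recursing through a very deep (or cyclic) object graph, one stack frame group per level. A minimal standalone sketch of that failure mode (the Node class and overflowsOnSerialize helper are illustrative only, not taken from the actual DAG code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeepGraphDemo {
    // Each node references the next; default writeObject recurses once per
    // node, so a deep enough chain exhausts the thread stack.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        Node next;
    }

    /** Builds a chain of the given depth and reports whether serializing it blows the stack. */
    static boolean overflowsOnSerialize(int depth) throws IOException {
        Node head = new Node();
        Node cur = head;
        for (int i = 1; i < depth; i++) {
            cur.next = new Node();
            cur = cur.next;
        }
        ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            oos.writeObject(head);
            return false;
        } catch (StackOverflowError e) {
            // Without -XX:MaxJavaStackTraceDepth=-1 the recorded trace of this
            // error is capped, which is why the point of origin was missing
            // from the posted log.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("depth 10 overflows: " + overflowsOnSerialize(10));
        System.out.println("depth 1000000 overflows: " + overflowsOnSerialize(1_000_000));
    }
}
```

The overflow hits after only a few thousand levels of recursion, long before memory is a problem, which is consistent with an app master that dies within seconds of launch.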
> Sent with Good (www.good.com)

________________________________
From: Chandni Singh <chan...@datatorrent.com>
Sent: Monday, March 21, 2016 3:45:46 AM
To: dev@apex.incubator.apache.org
Subject: Re: Stack overflow errors when launching job

> debug.zip
> <https://drive.google.com/a/datatorrent.com/file/d/0BxX8sOLG8CxHLXFjUjBxM0hIZDg/view?usp=drive_web>
>
> Hi Ilya,
>
> Attached is the debug application with 20 partitions of the input and
> output operators. I changed the default locality. This application doesn't
> fail in local mode.
>
> I am using the StatelessPartitioner for both input and output.
> The test configuration is in ApplicationTest and the cluster configuration
> is in my-app-conf1.xml.
>
> Have you added custom partitioning? That may be causing the stack overflow
> in the app master.
>
> Can you modify this application so that the ApplicationTest throws this
> stack overflow?
>
> - Chandni

On Sun, Mar 20, 2016 at 11:30 AM, Chandni Singh <chan...@datatorrent.com> wrote:

> Hi Ilya,
>
> As Ram mentioned, we don't know the beginning of the stack trace from
> which this is triggered. We can add JVM options in the configuration file
> so that the app master is deployed with those configurations.
>
> Anyway, I will look into creating this application (with 20 partitions)
> and running it in local mode to find out where the problem is.
>
> Will get back to you today or tomorrow.
>
> Chandni

On Sun, Mar 20, 2016 at 9:54 AM, Amol Kekre <a...@datatorrent.com> wrote:

> Can we get on a webex to take a look?
> thks
> Amol

On Sat, Mar 19, 2016 at 7:27 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> I don't think I have any time really to connect to the container. The
> application launches and crashes almost immediately. Total runtime is 50
> seconds.
>
> Sent with Good (www.good.com)

________________________________
From: Munagala Ramanath <r...@datatorrent.com>
Sent: Saturday, March 19, 2016 5:39:11 PM
To: dev@apex.incubator.apache.org
Subject: Re: Stack overflow errors when launching job

> There is some info here, near the end of the page:
>
> http://docs.datatorrent.com/troubleshooting/
>
> under the heading "How do I get a heap dump when a container gets an
> OutOfMemoryError?"
>
> However, since you're blowing the stack, you may need to manually run jmap
> on the running container, which may be difficult if it doesn't stay up for
> very long. There is a way to dump the heap programmatically, as described,
> for instance, here:
>
> https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java
>
> Ram

On Sat, Mar 19, 2016 at 2:07 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> How would we go about getting a heap dump?
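The programmatic route Ram links to boils down to HotSpot's diagnostic MXBean. A sketch under the assumption of a HotSpot JVM (HeapDumper is an illustrative name, not an Apex or DataTorrent class; recent JDKs require the target path to end in ".hprof" and not already exist):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    private static final String HOTSPOT_BEAN = "com.sun.management:type=HotSpotDiagnostic";

    /**
     * Writes an .hprof snapshot of the current JVM's heap to outputFile.
     * live=true dumps only objects still reachable from GC roots.
     */
    public static void dumpHeap(String outputFile, boolean live) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(), HOTSPOT_BEAN,
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, live);
    }

    public static void main(String[] args) throws Exception {
        // Unique name so a leftover file from a previous run doesn't block the dump.
        java.io.File out = new java.io.File(System.getProperty("java.io.tmpdir"),
                "container-" + System.nanoTime() + ".hprof");
        dumpHeap(out.getAbsolutePath(), true);
        System.out.println("heap dump written: " + out.length() + " bytes");
    }
}
```

A call like this could be placed in a catch block for StackOverflowError so the container snapshots itself before exiting, sidestepping the problem of attaching jmap to a process that only lives for seconds.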
> Sent with Good (www.good.com)

________________________________
From: Yogi Devendra <yogideven...@apache.org>
Sent: Saturday, March 19, 2016 12:19:26 AM
To: dev@apex.incubator.apache.org
Subject: Re: Stack overflow errors when launching job

> The stack trace in the gist shows some symptoms of infinite recursion,
> but I could not figure out the exact cause.
>
> Can you please check your heap dump to see if there are any cycles in the
> object hierarchy?
>
> ~ Yogi

On 19 March 2016 at 00:36, Ashwin Chandra Putta <ashwinchand...@gmail.com> wrote:

> In the example you posted, do you have any locality constraint applied?
>
> From what I see, you have two operators - an HDFS input operator and an
> HDFS output operator. Each of them has 40 partitions, and you don't have
> any other constraints on them. The partitioner implementation you are
> using is com.datatorrent.common.partitioner.StatelessPartitioner.
>
> Please confirm.
>
> Regards,
> Ashwin.

On Thu, Mar 17, 2016 at 5:00 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> I've updated the gist with a more complete example, and updated the
> associated JIRA that I've created.
> https://issues.apache.org/jira/browse/APEXCORE-392

On 3/17/16, 4:33 AM, "Tushar Gosavi" <tus...@datatorrent.com> wrote:

> Hi,
>
> I created a sample application with operators from the given link - just a
> simple input and output - and created 32 partitions of each. Could not
> reproduce the stack overflow issue. Do you have a small sample application
> which could reproduce this issue?
>
>     @Override
>     public void populateDAG(DAG dag, Configuration configuration)
>     {
>       NewlineFileInputOperator in = dag.addOperator("Input", new NewlineFileInputOperator());
>       in.setDirectory("/user/tushar/data");
>       in.setPartitionCount(32);
>
>       HdfsFileOutputOperator out = dag.addOperator("Output", new HdfsFileOutputOperator());
>       out.setFilePath("/user/tushar/outdata");
>
>       dag.getMeta(out).getAttributes().put(Context.OperatorContext.PARTITIONER,
>           new StatelessPartitioner<HdfsFileOutputOperator>(32));
>
>       dag.addStream("s1", in.output, out.input);
>     }
>
> -Tushar.
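As an alternative to eyeballing a heap dump for Yogi's suggested cycle check, the object graph of an operator can be walked reflectively before launch, tracking the current path in an identity set. A rough first-pass diagnostic sketch (CycleChecker and Node are hypothetical helpers, not Apex classes; it treats JDK classes and arrays as leaves for brevity, so it will miss cycles routed through collections):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class CycleChecker {
    /** True if the reference graph reachable from root contains a cycle. */
    public static boolean hasCycle(Object root) throws IllegalAccessException {
        return walk(root, Collections.newSetFromMap(new IdentityHashMap<>()));
    }

    private static boolean walk(Object obj, Set<Object> onPath) throws IllegalAccessException {
        // Bootstrap-loaded (JDK) classes and arrays are treated as leaves.
        if (obj == null || obj.getClass().getClassLoader() == null || obj.getClass().isArray()) {
            return false;
        }
        if (!onPath.add(obj)) {
            return true; // revisited an object on the current path: a cycle
        }
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.getType().isPrimitive() || Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                f.setAccessible(true);
                if (walk(f.get(obj), onPath)) {
                    return true;
                }
            }
        }
        onPath.remove(obj); // backtrack so shared (diamond) references don't count as cycles
        return false;
    }

    // Tiny self-referential type for demonstration.
    static class Node { Node next; }
}
```

Running hasCycle on the input and output operator instances in a unit test (e.g. inside ApplicationTest) would tell you quickly whether a self-referential field could be driving the recursive serialization.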
On Thu, Mar 17, 2016 at 12:30 AM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> Hi guys - I'm running into a very frustrating issue where certain DAG
> configurations cause the following error log (attached). When this
> happens, my application even fails to launch. This does not seem to be a
> YARN issue, since this occurs even with a relatively small number of
> partitions/memory.
>
> I've attached the input and output operators in question:
> https://gist.github.com/ilganeli/7f770374113b40ffa18a
>
> I can get this to occur predictably by:
>
> 1. Increasing the partition count on my input operator (reads from
>    HDFS) - values above 20 cause this error
> 2. Increasing the partition count on my output operator (writes to
>    HDFS) - values above 20 cause this error
> 3. Setting the stream locality from the default to either thread local,
>    node local, or container_local on the output operator
>
> This behavior is very frustrating, as it's preventing me from partitioning
> my HDFS I/O appropriately and thus from scaling to higher throughputs.
>
> Do you have any thoughts on what's going wrong?
> I would love your feedback.
>
> ________________________________________________________
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.