Yeah, we also didn't find anything related to this online. Are you aware of any memory leaks in the worker in Spark 1.6.2 that might be causing this? Do you know of any documentation that explains all the tasks a worker performs? Maybe we can get some clues from there.
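
In the meantime, we are thinking of capturing a heap dump the next time this happens, along the lines you suggested. Below is a minimal sketch of what we'd add to spark-env.sh on each worker. The flags are standard HotSpot options; the /var/log/spark paths are just examples on our VMs, and we're assuming SPARK_DAEMON_MEMORY / SPARK_DAEMON_JAVA_OPTS apply to the standalone worker daemon itself (the "max heap: 1024 MB" in our error matches the 1g daemon default):

# spark-env.sh on each worker (standalone mode)

# Give the Worker daemon itself more headroom than the 1g default.
export SPARK_DAEMON_MEMORY=2g

# Standard HotSpot flags: write a heap dump when the daemon OOMs, and
# log GC activity so we can see the "GC overhead" building up.
# /var/log/spark is an example path on our machines.
export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/spark/worker-heap.hprof \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/spark/worker-gc.log"

We'd then open the .hprof in something like Eclipse MAT to see what is holding on to the heap. Does that look reasonable to you?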
Regards,
Behroz

On Fri, Mar 24, 2017 at 2:21 PM, Yong Zhang <java8...@hotmail.com> wrote:

> I have never experienced a worker OOM and have rarely seen it discussed
> online, so my guess is that you will have to generate a heap dump file
> and analyze it.
>
> Yong
>
> ------------------------------
> *From:* Behroz Sikander <behro...@gmail.com>
> *Sent:* Friday, March 24, 2017 9:15 AM
> *To:* Yong Zhang
> *Cc:* user@spark.apache.org
> *Subject:* Re: [Worker Crashing] OutOfMemoryError: GC overhead limit
> exceeded
>
> Thank you for the response.
>
> Yes, I am sure, because the driver was working fine. Only 2 workers went
> down with OOM.
>
> Regards,
> Behroz
>
> On Fri, Mar 24, 2017 at 2:12 PM, Yong Zhang <java8...@hotmail.com> wrote:
>
>> I am not 100% sure, but normally an OOM in "dispatcher-event-loop" means
>> the driver OOMed. Are you sure your workers OOMed?
>>
>> Yong
>>
>> ------------------------------
>> *From:* bsikander <behro...@gmail.com>
>> *Sent:* Friday, March 24, 2017 5:48 AM
>> *To:* user@spark.apache.org
>> *Subject:* [Worker Crashing] OutOfMemoryError: GC overhead limit
>> exceeded
>>
>> Spark version: 1.6.2
>> Hadoop: 2.6.0
>>
>> Cluster:
>> All VMs are deployed on AWS.
>> 1 Master (t2.large)
>> 1 Secondary Master (t2.large)
>> 5 Workers (m4.xlarge)
>> Zookeeper (t2.large)
>>
>> Recently, 2 of our workers went down with an out-of-memory exception:
>> java.lang.OutOfMemoryError: GC overhead limit exceeded (max heap: 1024 MB)
>>
>> Both of these worker processes were hung. We restarted them to bring
>> them back to a normal state.
>>
>> Here is the complete exception:
>> https://gist.github.com/bsikander/84f1a0f3cc831c7a120225a71e435d91
>>
>> Master's spark-defaults.conf file:
>> https://gist.github.com/bsikander/4027136f6a6c91eabad576495c4d797d
>>
>> Master's spark-env.sh:
>> https://gist.github.com/bsikander/42f76d7a8e4079098d8a2df3cdee8ee0
>>
>> Slave's spark-defaults.conf file:
>> https://gist.github.com/bsikander/54264349b49e6227c6912eb14d344b8c
>>
>> So, what could be the reason for our workers crashing with an
>> OutOfMemoryError, and how can we avoid it in the future?
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Worker-Crashing-OutOfMemoryError-GC-overhead-limit-execeeded-tp28535.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>