What version of Spark are you using? Have you set any shuffle configs?
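If nothing is set, the timeout and shuffle-retry knobs are the usual first
step. A sketch of a spark-submit invocation, not a drop-in fix: the values
are only illustrative, the app jar is a placeholder, and the time-suffix
syntax assumes Spark 1.4 or later:

  # Illustrative values only: raise the RPC/heartbeat timeout, retry
  # shuffle fetches instead of failing the stage, and give YARN some
  # off-heap headroom so it stops killing executors.
  spark-submit \
    --conf spark.network.timeout=600s \
    --conf spark.executor.heartbeatInterval=60s \
    --conf spark.shuffle.io.maxRetries=10 \
    --conf spark.shuffle.io.retryWait=30s \
    --conf spark.yarn.executor.memoryOverhead=2048 \
    --class com.example.YourJob your-job.jar

If executors are dying and taking their shuffle files with them, enabling
spark.shuffle.service.enabled together with the YARN external shuffle
service also lets surviving executors keep fetching those blocks.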
On Wed, Aug 19, 2015 at 11:46 AM, unk1102 <umesh.ka...@gmail.com> wrote:
> I have one Spark job which seems to run fine, but after an hour or so
> executors start getting lost because of a timeout, with an error like the
> following:
>
> cluster.YarnScheduler : Removing an executor 14 650000 timeout exceeds
> 600000 seconds
>
> and because of the above error a couple of chained errors start to come,
> like FetchFailedException, Rpc client disassociated, Connection reset by
> peer, IOException, etc.
>
> Please see the following UI page. I have noticed that when shuffle
> read/write increases to more than 10 GB, executors start getting lost
> because of the timeout. How do I clear this stacked 10 GB in the shuffle
> read/write section? I don't cache anything, so why is Spark not clearing
> that memory? Please guide.
>
> IMG_20150819_231418358.jpg
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24345/IMG_20150819_231418358.jpg>
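Also, since the subject line mentions skewed data: if a few hot keys account
for most of that 10 GB shuffle, raising timeouts only postpones the failure,
because one reducer still receives most of the data. A common workaround is
to salt the hot keys so the shuffle spreads across partitions. A rough Scala
sketch, runnable in spark-shell; the sample `pairs` RDD and the bucket count
are made-up stand-ins for your job's data:

  import scala.util.Random

  // Stand-in for your real (key, count) RDD.
  val pairs = sc.parallelize(Seq(("hot", 1L), ("hot", 1L), ("cold", 1L)))

  // Tune to the observed skew; more buckets = flatter shuffle.
  val buckets = 32

  // Spread each key over `buckets` sub-keys, combine per sub-key, then
  // drop the salt and combine again: two cheap shuffles instead of one
  // that funnels a hot key into a single reducer.
  val totals = pairs
    .map { case (k, v) => ((k, Random.nextInt(buckets)), v) }
    .reduceByKey(_ + _)
    .map { case ((k, _), v) => (k, v) }
    .reduceByKey(_ + _)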