What version of Spark are you using?  Have you set any shuffle configs?
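If you haven't set anything yet, the usual suspects for executor loss under
heavy shuffle are the network timeout, the executor heartbeat interval, and
the YARN memory overhead. As a rough sketch (the values below are only
illustrative starting points, not tested recommendations for your workload,
and the time-string syntax assumes a reasonably recent 1.x release):

    // illustrative starting points only -- tune for your cluster and data skew
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.network.timeout", "800s")               // raise the default so slow shuffle fetches are not treated as dead executors
      .set("spark.executor.heartbeatInterval", "60s")     // keep this well below spark.network.timeout
      .set("spark.yarn.executor.memoryOverhead", "2048")  // in MB; shuffle buffers live off-heap, so give YARN some headroom
      .set("spark.shuffle.io.maxRetries", "10")           // retry failed fetches instead of failing the stage
      .set("spark.shuffle.io.retryWait", "30s")

The same settings can also be passed with --conf on spark-submit. Whether
they help depends a lot on how skewed the data is, which is why the version
and your current config matter.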

On Wed, Aug 19, 2015 at 11:46 AM, unk1102 <umesh.ka...@gmail.com> wrote:

> I have one Spark job which seems to run fine, but after an hour or so
> executors start getting lost because of timeouts, with errors like the
> following:
>
> cluster.yarnScheduler : Removing an executor 14 650000 timeout exceeds
> 600000 seconds
>
> Because of the above error, a series of chained errors starts to appear:
> FetchFailedException, Rpc client disassociated, Connection reset by peer,
> IOException, etc.
>
> Please see the attached UI screenshot. I have noticed that once shuffle
> read/write grows beyond 10 GB, executors start getting lost because of
> timeouts. How do I clear this accumulated 10 GB in the shuffle read/write
> section? I don't cache anything, so why is Spark not releasing that
> memory? Please guide.
>
> IMG_20150819_231418358.jpg
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24345/IMG_20150819_231418358.jpg>