Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
Not sure that we are OK on one thing: Yarn limitations are for the sum of all nodes, while you only specify the memory for a single node through Spark. By the way, the memory displayed in the
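For illustration, a minimal Scala sketch of the per-executor vs. cluster-wide arithmetic this reply alludes to. The executor count (10) and the sizes below are assumptions for the example, not values taken from the thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("memory-sizing-sketch")
      .master("yarn")
      .config("spark.executor.instances", "10")             // assumption: 10 executors
      .config("spark.executor.memory", "32g")               // per-executor heap
      .config("spark.yarn.executor.memoryOverhead", "8192") // per-executor off-heap overhead, in MB
      .getOrCreate()

    // What YARN is asked for across the cluster is 10 * (32 GB + 8 GB) = 400 GB,
    // which must fit within the combined yarn.nodemanager.resource.memory-mb of all nodes,
    // even though each Spark setting above is specified per executor.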

Re: How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
Check the executor page of the Spark UI to see whether your storage level is the limiting factor. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reach 100 TB.

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
Check the executor page of the Spark UI to see whether your storage level is the limiting factor. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reach 100 TB.

Re: How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
From: yohann jardin <yohannjar...@hotmail.com> Sent: Thursday, July 27, 2017 11:15:39 PM To: jeff saremi; user@spark.apache.org Subject: Re: How to configure spark on Yarn cluster Check the executor page of the Spark UI to see whether your storage level is the limiting factor. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reach 100 TB.

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
Check the executor page of the Spark UI to see whether your storage level is the limiting factor. Also, instead of starting with 100 TB of data, sample it, make it work, and grow it little by little until you reach 100 TB. This will validate the workflow and let you see how much data is shuffled, etc.
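A minimal Scala sketch of the sampling approach suggested above. The input path, Parquet format, and 1% fraction are assumptions for illustration only:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sampling-sketch").getOrCreate()

    // Assumption: Parquet input at a hypothetical path; adjust to the real dataset.
    val full = spark.read.parquet("hdfs:///data/events")

    // Start with roughly 1% of the data, then raise the fraction step by step
    // once the job runs cleanly and the Spark UI shows the shuffle/storage behaviour.
    val sample = full.sample(withReplacement = false, fraction = 0.01, seed = 42)
    sample.write.parquet("hdfs:///data/events_sample")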

How to configure spark on Yarn cluster

2017-07-28 Thread jeff saremi
I have the simplest job, which I'm running against 100 TB of data. The job keeps failing with ExecutorLostFailure on containers killed by Yarn for exceeding memory limits. I have varied executor-memory from 32 GB to 96 GB and spark.yarn.executor.memoryOverhead from 8192 to 36000, and similar
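For reference, a hedged sketch of how these two settings are typically passed together; the values are just the upper end of the range the poster mentions, not a recommendation:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    val conf = new SparkConf()
      .set("spark.executor.memory", "96g")                 // executor heap
      .set("spark.yarn.executor.memoryOverhead", "36000")  // off-heap overhead, in MB
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // YARN kills a container when heap + overhead together exceed what the container
    // was granted, so both values must also fit under yarn.scheduler.maximum-allocation-mb.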