Re: Inferring Data driven Spark parameters

2018-07-04 Thread Mich Talebzadeh
Hi Aakash, For clarification, are you running this in YARN client mode or standalone? How much total YARN memory is available? From my experience, for a bigger cluster I found the following incremental settings useful (CDH 5.9, YARN client), so you can scale yours: [1] - 576GB --num-executors 24
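One way to read Mich's data point above (576 GB of YARN memory driving --num-executors 24) is as a fixed memory-per-executor ratio. A minimal sketch of that reading; the ~24 GB-per-executor ratio is inferred from that single data point, not a rule stated in the thread:

    # Hedged sketch: scale executor count with total YARN memory.
    # The 24 GB-per-executor ratio is an assumption inferred from the
    # 576 GB -> 24 executors data point above.
    def scale_num_executors(total_yarn_memory_gb, gb_per_executor=24):
        return max(1, total_yarn_memory_gb // gb_per_executor)

    print(scale_num_executors(576))  # -> 24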

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Prem Sure
Can you share the API that your jobs use: just core RDDs, or SQL, or DStreams, etc.? Refer to the recommendations at https://spark.apache.org/docs/2.3.0/configuration.html for detailed configurations. Thanks, Prem On Wed, Jul 4, 2018 at 12:34 PM, Aakash Basu wrote: > I do not want to change
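For reference, a short sketch of setting a few of those documented properties programmatically. The property names are from the configuration page linked above; the values are placeholders, not recommendations:

    from pyspark.sql import SparkSession

    # Illustrative only: a handful of the knobs documented on the
    # configuration page, set at session-build time.
    spark = (SparkSession.builder
             .appName("configured-job")
             .config("spark.executor.memory", "13g")          # placeholder value
             .config("spark.executor.cores", "5")             # placeholder value
             .config("spark.sql.shuffle.partitions", "200")   # data-dependent knob
             .getOrCreate())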

Re: Inferring Data driven Spark parameters

2018-07-04 Thread Aakash Basu
I do not want to change executor/driver cores/memory on the fly in a single Spark job; all I want is to make them cluster-specific. So, I want a formula with which, depending on the driver and executor details, I can find out the values for them before submitting those details in
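A minimal sketch of such a formula, assuming the common rule of thumb rather than anything stated in the thread: reserve one core and 1 GB per node for the OS, cap executor cores at ~5 for HDFS throughput, leave one executor slot for the YARN ApplicationMaster, and hold back ~7% of memory for YARN overhead:

    def infer_spark_params(num_workers, cores_per_node, mem_per_node_gb,
                           cores_per_executor=5):
        # Reserve 1 core and 1 GB per node for the OS and node daemons.
        usable_cores = cores_per_node - 1
        usable_mem_gb = mem_per_node_gb - 1
        # Executors that fit per node at the chosen core count.
        executors_per_node = max(1, usable_cores // cores_per_executor)
        # Leave one executor slot cluster-wide for the YARN ApplicationMaster.
        num_executors = max(1, num_workers * executors_per_node - 1)
        # Split node memory across its executors, minus ~7% YARN overhead.
        executor_memory_gb = int(usable_mem_gb / executors_per_node * 0.93)
        return {
            "--num-executors": num_executors,
            "--executor-cores": cores_per_executor,
            "--executor-memory": f"{executor_memory_gb}G",
        }

    # Example input: the 4-worker cluster from the original post below.
    print(infer_spark_params(num_workers=4, cores_per_node=8, mem_per_node_gb=16))
    # {'--num-executors': 3, '--executor-cores': 5, '--executor-memory': '13G'}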

Re: Inferring Data driven Spark parameters

2018-07-03 Thread Vadim Semenov
You can't change the executor/driver cores/memory on the fly once you've already started a SparkContext. On Tue, Jul 3, 2018 at 4:30 AM Aakash Basu wrote: > > We aren't using Oozie or similar, moreover, the end to end job shall be > exactly the same, but the data will be extremely different
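A quick illustration of Vadim's point: these settings only take effect if supplied before the context is created (a sketch; the configuration values are arbitrary):

    from pyspark import SparkConf, SparkContext

    # Executor/driver cores and memory must be fixed *before* the context
    # exists; once the SparkContext has started they cannot be changed for
    # the running application.
    conf = (SparkConf()
            .setAppName("sized-per-cluster")
            .set("spark.executor.instances", "4")
            .set("spark.executor.cores", "5")
            .set("spark.executor.memory", "13g"))
    sc = SparkContext(conf=conf)

    # Re-setting these keys now would not resize the running executors;
    # sc.getConf() still reports the values the application launched with.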

Re: Inferring Data driven Spark parameters

2018-07-03 Thread Aakash Basu
We aren't using Oozie or similar; moreover, the end-to-end job will be exactly the same, but the data will be extremely different (number of continuous and categorical columns, vertical size, horizontal size, etc.). Hence, if there were a calculation of the parameters to arrive at a
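To make "extremely different data" concrete, one data-dependent knob that is often derived from input size is spark.sql.shuffle.partitions. A hedged sketch; the ~128 MB-per-partition target and the helper itself are assumptions, not from the thread:

    # Size shuffle partitions so each targets roughly 128 MB of data;
    # never go below Spark's default of 200.
    def shuffle_partitions_for(input_size_gb, target_mb=128):
        return max(200, int(input_size_gb * 1024 / target_mb))

    print(shuffle_partitions_for(50))  # ~50 GB of input -> 400 partitions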

Re: Inferring Data driven Spark parameters

2018-07-03 Thread Jörn Franke
Don’t do this within your job. Create different jobs for the different types of workloads and orchestrate them using Oozie or similar. > On 3. Jul 2018, at 09:34, Aakash Basu wrote: > > Hi, > > Cluster - 5 node (1 Driver and 4 workers) > Driver Config: 16 cores, 32 GB RAM > Worker Config: 8 cores, 16 GB

Inferring Data driven Spark parameters

2018-07-03 Thread Aakash Basu
Hi, Cluster - 5 nodes (1 driver and 4 workers) Driver config: 16 cores, 32 GB RAM Worker config: 8 cores, 16 GB RAM I'm using the parameters below, of which I know the first chunk is cluster-dependent and the second chunk is data/code-dependent. --num-executors 4 --executor-cores 5
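Applying the rule-of-thumb heuristic sketched earlier in the thread to this cluster (4 workers, 8 cores / 16 GB each) gives a cluster-dependent chunk like the one below. The executor-memory and driver-memory figures are derived assumptions, not values from the post:

    # Hedged sketch: assemble the cluster-dependent chunk of spark-submit.
    cluster_chunk = {
        "--num-executors": 4,        # as in the original post
        "--executor-cores": 5,       # as in the original post
        "--executor-memory": "13g",  # (16 GB - 1 GB OS) * ~0.93 overhead, assumed
        "--driver-memory": "26g",    # headroom on the 32 GB driver node, assumed
    }
    print("spark-submit " + " ".join(f"{k} {v}" for k, v in cluster_chunk.items()))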