Hi Aakash,
For clarification, are you running this in YARN client mode or standalone?
How much total YARN memory is available?
From my experience with a bigger cluster, I found the following incremental
settings useful (CDH 5.9, YARN client), so you can scale yours:
[1] - 576GB
--num-executors 24
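As a sketch, settings like these go on the spark-submit command line. Only
--num-executors 24 comes from the thread above; the other values here are
illustrative assumptions, not Prem's actual settings:

```shell
# Hypothetical spark-submit invocation for a YARN client-mode job.
# Tune the memory/core values to your own cluster; only
# --num-executors 24 is taken from the settings quoted above.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 24 \
  --executor-cores 5 \
  --executor-memory 20g \
  your_job.py
```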
Can you share which API your jobs use: just core RDDs, or SQL, or DStreams,
etc.?
Refer to the recommendations at
https://spark.apache.org/docs/2.3.0/configuration.html for detailed
configuration options.
Thanks,
Prem
On Wed, Jul 4, 2018 at 12:34 PM, Aakash Basu
wrote:
> I do not want to change executor/driver cores/memory on the fly in a
> single Spark job; all I want is to make them cluster-specific. So, I want
> to have a formula with which, depending on the driver and executor
> specifications, I can find out the values for them before submitting
> those details in
You can't change the executor/driver cores/memory on the fly once
you've already started a SparkContext.
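Since the parameters are fixed once the SparkContext starts, the
cluster-specific "formula" asked for above has to run before spark-submit.
One common sizing heuristic (not from this thread; the constants below are
assumptions, i.e. rules of thumb often quoted for Spark-on-YARN tuning) can
be sketched in Python:

```python
import math

def size_executors(num_workers, cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, os_reserved_cores=1,
                   os_reserved_mem_gb=1, overhead_fraction=0.07):
    """Heuristic spark-submit sizing from cluster specs.

    The constants (5 cores per executor, 1 core + 1 GB reserved per node
    for OS/Hadoop daemons, ~7% memory set aside for YARN overhead) are
    common rules of thumb, not hard rules.
    """
    usable_cores = cores_per_node - os_reserved_cores
    executors_per_node = usable_cores // cores_per_executor
    # Reserve one executor slot cluster-wide for the YARN ApplicationMaster.
    num_executors = max(1, num_workers * executors_per_node - 1)
    usable_mem = mem_per_node_gb - os_reserved_mem_gb
    mem_per_executor = usable_mem / max(1, executors_per_node)
    # spark.executor.memory excludes the off-heap overhead YARN adds on top.
    executor_memory_gb = math.floor(mem_per_executor * (1 - overhead_fraction))
    return {
        "--num-executors": num_executors,
        "--executor-cores": cores_per_executor,
        "--executor-memory": f"{executor_memory_gb}g",
    }

# Aakash's cluster below: 4 workers with 8 cores and 16 GB RAM each.
print(size_executors(4, 8, 16))
```

The output of such a script can then be spliced into the spark-submit
command line per cluster, without touching the job code itself.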
On Tue, Jul 3, 2018 at 4:30 AM Aakash Basu wrote:
> We aren't using Oozie or similar; moreover, the end-to-end job shall be
> exactly the same, but the data will be extremely different (number of
> continuous and categorical columns, vertical size, horizontal size,
> etc.); hence, if there would have been a calculation of the parameters
> to arrive at a
Don't do this within your job. Create different jobs for the different job
types and orchestrate them using Oozie or similar.
> On 3. Jul 2018, at 09:34, Aakash Basu wrote:
>
> Hi,
>
> Cluster - 5 nodes (1 driver and 4 workers)
> Driver Config: 16 cores, 32 GB RAM
> Worker Config: 8 cores, 16 GB RAM
>
> I'm using the below parameters, of which I know the first chunk is
> cluster-dependent and the second chunk is data/code-dependent:
>
> --num-executors 4
> --executor-cores 5