That worked great, thanks Andrew.

On Tue, Aug 18, 2015 at 1:39 PM, Andrew Or <and...@databricks.com> wrote:

> Hi Axel,
>
> You can try setting `spark.deploy.spreadOut` to false (through your
> conf/spark-defaults.conf file). This essentially tries to schedule as many
> cores on one worker as possible before spilling over to other workers.
> Note that you *must* restart the cluster through the sbin scripts.
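>
> For example, a minimal sketch of the relevant line in
> conf/spark-defaults.conf (restarting with sbin/stop-all.sh and
> sbin/start-all.sh is one way to pick it up):
>
>   spark.deploy.spreadOut  false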
>
> For more information see:
> http://spark.apache.org/docs/latest/spark-standalone.html.
>
> Feel free to let me know whether it works,
> -Andrew
>
>
> 2015-08-18 4:49 GMT-07:00 Igor Berman <igor.ber...@gmail.com>:
>
>> By default, standalone mode creates one executor on every worker machine
>> per application. The overall number of cores is configured with
>> --total-executor-cores, so if you specify --total-executor-cores=1 there
>> will be only one core on a single executor and you'll get what you want.
>>
>> On the other hand, if your application needs all the cores of your
>> cluster and only some specific job should run on a single executor, there
>> are a few ways to achieve this, e.g. coalesce(1) or
>> dummyRddWithOnePartitionOnly.foreachPartition.
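>>
>> For example (a sketch; the master URL spark://master:7077 and app.jar are
>> placeholders for your own values):
>>
>>   spark-submit --master spark://master:7077 --total-executor-cores 1 app.jar
>>
>> and, from within the application (Scala, assuming some existing RDD named
>> rdd), to run a piece of work as a single task on a single executor:
>>
>>   rdd.coalesce(1).foreachPartition { iter =>
>>     // coalesce(1) merges everything into one partition, so this
>>     // closure runs as a single task on a single executor
>>     iter.foreach(println)
>>   }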
>>
>>
>> On 18 August 2015 at 01:36, Axel Dahl <a...@whisperstream.com> wrote:
>>
>>> I have a 4-node cluster and have been playing around with the
>>> num-executors, executor-memory, and executor-cores parameters.
>>>
>>> I set the following:
>>> --executor-memory=10G
>>> --num-executors=1
>>> --executor-cores=8
>>>
>>> But when I run the job, I see that each worker is running one executor
>>> with 2 cores and 2.5G of memory.
>>>
>>> What I'd like to do instead is have Spark just allocate the job to a
>>> single worker node.
>>>
>>> Is that possible in standalone mode, or do I need a job/resource
>>> scheduler like YARN to do that?
>>>
>>> Thanks in advance,
>>>
>>> -Axel
>>>
>>>
>>>
>>
>
