That worked great, thanks Andrew.
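For anyone who finds this thread later, the change that worked was a single line (a sketch; adjust the path to wherever your standalone master's conf lives, and restart the cluster through the sbin scripts as Andrew notes below):

    # conf/spark-defaults.conf on the standalone master
    # (restart the cluster via the sbin scripts afterwards)
    spark.deploy.spreadOut  false

With spreadOut off, the master packs cores onto as few workers as possible instead of spreading them round-robin across all four nodes.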
On Tue, Aug 18, 2015 at 1:39 PM, Andrew Or <and...@databricks.com> wrote:

> Hi Axel,
>
> You can try setting `spark.deploy.spreadOut` to false (through your
> conf/spark-defaults.conf file). What this does is essentially try to
> schedule as many cores on one worker as possible before spilling over to
> other workers. Note that you *must* restart the cluster through the sbin
> scripts.
>
> For more information see:
> http://spark.apache.org/docs/latest/spark-standalone.html
>
> Feel free to let me know whether it works,
> -Andrew
>
>
> 2015-08-18 4:49 GMT-07:00 Igor Berman <igor.ber...@gmail.com>:
>
>> By default, standalone mode creates one executor on every worker machine
>> per application. The overall number of cores is configured with
>> --total-executor-cores, so if you specify --total-executor-cores=1 there
>> will be only one core on some executor, and you'll get what you want.
>>
>> On the other hand, if your application needs all the cores of your
>> cluster and only some specific job should run on a single executor,
>> there are a few methods to achieve this, e.g. coalesce(1) or
>> dummyRddWithOnePartitionOnly.foreachPartition.
>>
>>
>> On 18 August 2015 at 01:36, Axel Dahl <a...@whisperstream.com> wrote:
>>
>>> I have a 4-node cluster and have been playing around with the
>>> num-executors, executor-memory, and executor-cores parameters.
>>>
>>> I set the following:
>>> --executor-memory=10G
>>> --num-executors=1
>>> --executor-cores=8
>>>
>>> But when I run the job, I see that each worker is running one executor
>>> with 2 cores and 2.5G of memory.
>>>
>>> What I'd like instead is for Spark to allocate the whole job to a
>>> single worker node. Is that possible in standalone mode, or do I need
>>> a job/resource scheduler like YARN to do that?
>>>
>>> Thanks in advance,
>>>
>>> -Axel
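In case it helps anyone searching the archives: here is a minimal Scala sketch of the coalesce(1) pattern Igor describes above. The input path, the app name, and the per-row work are hypothetical placeholders, not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    object SingleExecutorStage {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("single-executor-stage"))

        // Earlier stages still run across every core in the cluster.
        val data = sc.textFile("hdfs:///path/to/input")  // hypothetical input

        // coalesce(1) funnels everything into one partition, so this
        // final stage runs as a single task on a single executor.
        data.coalesce(1).foreachPartition { rows =>
          rows.foreach(println)  // placeholder for the single-executor work
        }

        sc.stop()
      }
    }

Unlike the spreadOut setting, this keeps the rest of the application fully parallel and only serializes the one stage that needs it.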