Hi Axel,

You can try setting `spark.deploy.spreadOut` to false (through your conf/spark-defaults.conf file). With spreadOut disabled, the standalone master packs as many cores as possible onto one worker before spilling over to other workers. Note that you *must* restart the cluster through the sbin scripts for the change to take effect.
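For example (a minimal sketch; this assumes the default conf/ layout and that you restart with the bundled sbin scripts):

    # conf/spark-defaults.conf (on the node running the standalone master)
    spark.deploy.spreadOut   false

    # restart the standalone cluster so the master picks up the change
    sbin/stop-all.sh
    sbin/start-all.sh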
For more information see: http://spark.apache.org/docs/latest/spark-standalone.html.

Feel free to let me know whether it works,
-Andrew

2015-08-18 4:49 GMT-07:00 Igor Berman <igor.ber...@gmail.com>:

> By default, standalone mode creates 1 executor on every worker machine per
> application. The overall number of cores is configured with
> --total-executor-cores, so in general if you specify --total-executor-cores=1
> there will be only 1 core on some executor and you'll get what you want.
>
> On the other hand, if your application needs all the cores of your cluster
> and only some specific job should run on a single executor, there are a few
> methods to achieve this, e.g. coalesce(1) or
> dummyRddWithOnePartitionOnly.foreachPartition.
>
>
> On 18 August 2015 at 01:36, Axel Dahl <a...@whisperstream.com> wrote:
>
>> I have a 4 node cluster and have been playing around with the
>> num-executors, executor-memory and executor-cores parameters.
>>
>> I set the following:
>> --executor-memory=10G
>> --num-executors=1
>> --executor-cores=8
>>
>> But when I run the job, I see that each worker is running one executor
>> which has 2 cores and 2.5G memory.
>>
>> What I'd like to do instead is have Spark allocate the job to a
>> single worker node.
>>
>> Is that possible in standalone mode, or do I need a job/resource scheduler
>> like YARN to do that?
>>
>> Thanks in advance,
>>
>> -Axel
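A minimal Scala sketch of the coalesce(1) / foreachPartition approach Igor mentions above; the input path and the word-count logic are purely illustrative, not from the original job:

    val data = sc.textFile("hdfs:///path/to/input")          // illustrative input

    // normal, fully parallel processing across the whole cluster
    val counts = data.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // collapse to one partition so this final step runs as a single task
    // on a single executor
    counts.coalesce(1).foreachPartition { iter =>
      iter.foreach(println)
    }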