The maximum number of cores per executor can be controlled with spark.executor.cores, and the number of worker instances launched on a single machine can be set with the environment variable SPARK_WORKER_INSTANCES.
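For example (a sketch, not from the thread: the app name and master URL are illustrative, and 12 = 6 machines x 2 cores matches the question quoted below), you could cap the application at 2 cores per executor and 12 cores in total:

    import org.apache.spark.SparkConf

    // Illustrative standalone-mode settings for "2 cores on each of 6 machines"
    val conf = new SparkConf()
      .setAppName("MyStreamingApp")        // hypothetical application name
      .setMaster("spark://master:7077")    // illustrative master URL
      .set("spark.executor.cores", "2")    // at most 2 cores per executor
      .set("spark.cores.max", "12")        // 6 machines x 2 cores in total

With the standalone master's default spread-out scheduling, this should end up as roughly one 2-core executor per machine.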
However, to ensure that all available cores are used, you will have to take care of how the stream is partitioned. Copy-pasting the relevant help text from the Spark documentation:

*The number of tasks per receiver per batch will be approximately (batch interval / block interval). For example, a block interval of 200 ms will create 10 tasks per 2 second batch. If the number of tasks is too low (that is, less than the number of cores per machine), then it will be inefficient as all available cores will not be used to process the data. To increase the number of tasks for a given batch interval, reduce the block interval. However, the recommended minimum value of block interval is about 50 ms, below which the task launching overheads may be a problem. An alternative to receiving data with multiple input streams / receivers is to explicitly repartition the input data stream (using inputStream.repartition(<number of partitions>)). This distributes the received batches of data across the specified number of machines in the cluster before further processing.*

A minimal code sketch of both knobs is included at the end of this message.

Hemant Bhanawat
https://www.linkedin.com/in/hemant-bhanawat-92a3811
www.snappydata.io

On Sun, Feb 21, 2016 at 11:01 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote:

> Hi,
>
> I'm running a Spark Streaming application on a Spark cluster that spans
> 6 machines/workers. I'm using Spark cluster standalone mode. Each machine
> has 8 cores. Is there any way to specify that I want to run my application
> on all 6 machines and just use 2 cores on each machine?
>
> Thanks
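P.S. Here is the minimal sketch of the two knobs described above, i.e. the block interval and an explicit repartition. The socket source, host/port, and the partition count of 12 (6 machines x 2 cores) are illustrative assumptions, not from the docs:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("PartitioningSketch")               // hypothetical name
      // 200 ms blocks with 2 s batches => ~10 tasks per receiver per batch;
      // reduce this value to get more tasks per batch
      .set("spark.streaming.blockInterval", "200ms")

    val ssc = new StreamingContext(conf, Seconds(2))

    // Illustrative source; any receiver-based input stream behaves the same.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Alternative to multiple receivers: spread each received batch across
    // 12 partitions (6 machines x 2 cores) before further processing.
    val words = lines.repartition(12).flatMap(_.split(" "))
    words.map(w => (w, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()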