Hi Mohammed,

Thanks for your reply. I agree with you; however, a single application can use multiple executors as well, so I am still not clear on which option is best. Let me give an example to make this a little more concrete.
Let's say I am only running a single application. Let's assume again that I have 192GB of memory and 24 cores on each node. Which of the following two options is best, and why?

1. Running 6 workers with 32GB each and 1 executor per worker (i.e. set SPARK_WORKER_INSTANCES=6 and leave spark.executor.cores at its default, which in standalone mode assigns all of a worker's available cores to a single executor).

2. Running 1 worker with 192GB of memory and 6 executors per worker (i.e. SPARK_WORKER_INSTANCES=1, spark.executor.cores=4 and spark.executor.memory=32GB, so that 6 executors of 4 cores each fill the 24 cores).

(A minimal configuration sketch of both options, plus a note on the worker daemon's own heap, appears at the very bottom of this message, below the quoted thread.)

One more question: I understand that workers and executors are different processes. How many resources does the worker process itself actually use, and how do I set them? As far as I understand, the worker does not need many resources, since it only spawns executors. Is that correct?

Thanks,
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini

On Mon, May 2, 2016 at 7:47 PM, Mohammed Guller <moham...@glassbeam.com> wrote:

> The workers and executors run as separate JVM processes in the standalone
> mode.
>
> The use of multiple workers on a single machine depends on how you will be
> using the cluster. If you run multiple Spark applications simultaneously,
> each application gets its own executors. So, for example, if you allocate
> 8GB to each application, you can run 192/8 = 24 Spark applications
> simultaneously (assuming you also have a large number of cores). Each
> executor has only an 8GB heap, so GC should not be an issue. Alternatively,
> if you know that you will have only a few applications running
> simultaneously on that cluster, running multiple workers on each machine
> will allow you to avoid the GC issues associated with allocating a large
> heap to a single JVM process. This option allows you to run multiple
> executors for an application on a single machine, and each executor can be
> configured with the optimal amount of memory.
>
> Mohammed
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
> *From:* Simone Franzini [mailto:captainfr...@gmail.com]
> *Sent:* Monday, May 2, 2016 9:27 AM
> *To:* user
> *Subject:* Fwd: Spark standalone workers, executors and JVMs
>
> I am still a little bit confused about workers, executors and JVMs in
> standalone mode.
>
> Are worker processes and executors independent JVMs, or do executors run
> within the worker JVM?
>
> I have some memory-rich nodes (192GB) and I would like to avoid deploying
> massive JVMs due to well-known performance issues (GC and such).
>
> As of Spark 1.4 it is possible to either deploy multiple workers
> (SPARK_WORKER_INSTANCES + SPARK_WORKER_CORES) or multiple executors per
> worker (--executor-cores). Which option is preferable and why?
>
> Thanks,
> Simone Franzini, PhD
> http://www.linkedin.com/in/simonefranzini
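To make the two options above concrete, here is a minimal sketch of how I would express them, assuming Spark 1.4+ standalone mode and the 192GB/24-core nodes from my example (the 32g/4-core split is just the arithmetic of the example, not a recommendation):

    # Option 1: six workers per node, one executor per worker
    # conf/spark-env.sh on each node
    SPARK_WORKER_INSTANCES=6
    SPARK_WORKER_MEMORY=32g   # per worker; 6 x 32g = 192g
    SPARK_WORKER_CORES=4      # per worker; 6 x 4 = 24 cores
    # spark.executor.cores left at its default, so each executor
    # grabs all 4 cores of its worker:
    spark-submit --executor-memory 32g ...

    # Option 2: one worker per node, six executors per worker
    # conf/spark-env.sh on each node
    SPARK_WORKER_INSTANCES=1
    SPARK_WORKER_MEMORY=192g
    SPARK_WORKER_CORES=24
    # the worker carves out 6 executors of 4 cores / 32g each:
    spark-submit --executor-cores 4 --executor-memory 32g ...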
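On my own follow-up question about the worker's footprint: as far as I can tell, the worker daemon's own JVM heap is set separately from the memory it hands out to executors, via SPARK_DAEMON_MEMORY in spark-env.sh, and it defaults to a small value, which matches the intuition that the worker only spawns and monitors executors. A sketch, under that assumption:

    # conf/spark-env.sh
    # Heap for the master/worker daemon JVMs themselves (default 1g).
    # This is separate from SPARK_WORKER_MEMORY, which is the total
    # amount the worker is allowed to allocate to executors.
    SPARK_DAEMON_MEMORY=1g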