Hi,

I am testing running Spark applications in parallel.

I have a two-node Spark cluster and am currently running two worker processes
on each node in YARN client mode. The master has 12 cores and 24GB of RAM; the
worker node has 2 cores and 4GB of RAM (an old 32-bit host). The OS on both is
RHES5, 64-bit on the master and 32-bit on the worker.

I submit identical jobs to the master via spark-shell, one from the master and
the other from the worker node.
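
For reference, each session is started the same way on both hosts, roughly as
follows (a sketch of the invocation; the exact command line may differ
slightly):

spark-shell --master yarn   ## started once on the master (216) and once on the worker (217)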

However, whichever job is submitted first runs, while the other one just sits
waiting. I know this has nothing to do with Hadoop itself, as I can run a
Spark job and an identical Hive job in parallel.

I have allocated the following resources in spark-env.sh

export SPARK_EXECUTOR_CORES=3    ## Number of cores for the workers (Default: 1)
export SPARK_EXECUTOR_MEMORY=3G  ## Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
export SPARK_DRIVER_MEMORY=1G    ## Memory for Master (e.g. 1000M, 2G) (Default: 512MB)
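
As I understand it, the equivalent per-application settings are the standard
spark.executor.* / spark.driver.* properties, which would mirror the above
along these lines (a sketch of spark-defaults.conf entries; values
illustrative):

spark.executor.cores    3
spark.executor.memory   3g
spark.driver.memory     1g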

The problem is that the first job goes ahead and grabs all 12 cores despite
the above settings. Is this behaviour expected in YARN client mode, or is
there something wrong here?
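
For what it is worth, my understanding is that each job's footprint can also
be capped explicitly at submission time with the standard YARN flags,
something along these lines (values illustrative only):

spark-shell --master yarn \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 2G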

The Spark GUI shows the jobs. The worker on the master runs on the address
ending in 216, whereas the one on the worker node runs on 217, as shown below.



Thanks



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
