To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to force parallel processing of RDD using multiple thread
Check the number of partitions in your input. It may be much less than the
available parallelism of your small cluster. For example, input that lives in
just 1 partition can only be processed by one task at a time, no matter how
many cores are available.
Sent: Friday, January 16, 2015 9:44 AM
To: Wang, Ningjun (LNG-NPV)
Cc: Sean Owen; user@spark.apache.org
Subject: Re: How to force parallel processing of RDD using multiple thread
Spark will use the number of cores available in the cluster. If your cluster is
1 node with 4 cores, Spark will execute up to 4 tasks in parallel.
I have a standalone Spark cluster with only one node with 4 CPU cores. How can
I force Spark to do parallel processing of my RDD using multiple threads? For
example, I can do the following:
spark-submit --master local[4]
However, I really want to use the cluster as follows:
spark-submit --master
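For a standalone cluster, the full command might look like this (the master host, port, and application jar are placeholders, not from the thread):

```shell
# Standalone mode: cap the total cores the application may use across the cluster
spark-submit \
  --master spark://your-master-host:7077 \
  --total-executor-cores 4 \
  your-app.jar
```
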
Subject: Re: How to force parallel processing of RDD using multiple thread
What is your cluster manager? For example, on YARN you would specify
--executor-cores. Read:
http://spark.apache.org/docs/latest/running-on-yarn.html
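On YARN, the equivalent invocation might look like this (the executor counts and the application jar are assumptions for illustration):

```shell
# YARN mode: 2 executors with 2 cores each gives up to 4 concurrent tasks
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-cores 2 \
  your-app.jar
```
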
On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV)
ningjun.w