RE: How to force parallel processing of RDD using multiple thread

2015-01-16 Thread Wang, Ningjun (LNG-NPV)
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to force parallel processing of RDD using multiple thread

Check the number of partitions in your input. It may be much less than the available parallelism of your small cluster. For example, input that lives in just 1 partition
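
To illustrate the point about partitions, here is a minimal sketch in plain Python (not the Spark API) of why the partition count caps parallelism: one task runs per partition, so a 4-thread pool cannot keep more than one thread busy when the input is a single partition. The helpers `split_into_partitions` and `process_partitions` are hypothetical names made up for this example. In PySpark itself, the corresponding checks would likely be `rdd.getNumPartitions()` and `rdd.repartition(n)`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Hypothetical helper: deal records round-robin into num_partitions chunks."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def process_partitions(partitions, cores):
    """Run one task per partition on a pool of `cores` worker threads.

    Returns the per-partition sums and the maximum number of tasks that
    can be in flight at once, which is min(partitions, cores).
    """
    with ThreadPoolExecutor(max_workers=cores) as pool:
        results = list(pool.map(sum, partitions))
    return results, min(len(partitions), cores)


records = list(range(100))

# Input in a single partition: only 1 of the 4 workers ever gets a task.
_, busy_single = process_partitions(split_into_partitions(records, 1), cores=4)

# Same input split into 4 partitions: all 4 workers can be used.
_, busy_split = process_partitions(split_into_partitions(records, 4), cores=4)

print(busy_single, busy_split)  # prints: 1 4
```

This only models the scheduling limit; it does not reproduce how Spark actually distributes work, and Python threads do not give CPU parallelism for pure-Python code.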

RE: How to force parallel processing of RDD using multiple thread

2015-01-16 Thread Wang, Ningjun (LNG-NPV)
: Friday, January 16, 2015 9:44 AM
To: Wang, Ningjun (LNG-NPV)
Cc: Sean Owen; user@spark.apache.org
Subject: Re: How to force parallel processing of RDD using multiple thread

Spark will use the number of cores available in the cluster. If your cluster is 1 node with 4 cores, Spark will execute up

How to force parallel processing of RDD using multiple thread

2015-01-15 Thread Wang, Ningjun (LNG-NPV)
I have a standalone Spark cluster with only one node, which has 4 CPU cores. How can I force Spark to process my RDD in parallel using multiple threads? For example, I can do the following:

    spark-submit --master local[4]

However, I really want to use the cluster as follows:

    spark-submit --master

Re: How to force parallel processing of RDD using multiple thread

2015-01-15 Thread Sean Owen
Subject: Re: How to force parallel processing of RDD using multiple thread

What is your cluster manager? For example, on YARN you would specify --executor-cores. Read: http://spark.apache.org/docs/latest/running-on-yarn.html

On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV) ningjun.w