Check the number of partitions in your input. It may be much lower than
the available parallelism of your small cluster. Each partition becomes
one task, so input that lives in just 1 partition will spawn just 1 task.

Beyond that, parallelism just happens. You can see the parallelism of
each operation (the number of tasks in each stage) in the Spark UI.
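
As a rough sketch of that check (Scala; the object name PartitionCheck and
the input path argument are placeholders I made up, not anything from your
program), you could print the partition count next to the default
parallelism, and repartition only if the input is too narrow:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver: reports how many partitions (and hence tasks)
// the input has, and widens it if there are fewer than the default
// parallelism.
object PartitionCheck {
  def main(args: Array[String]): Unit = {
    // The master URL is assumed to be supplied by spark-submit --master.
    val sc = new SparkContext(new SparkConf().setAppName("PartitionCheck"))

    // args(0) is an assumed input path; substitute your own source.
    val rdd = sc.textFile(args(0))

    val inputPartitions = rdd.partitions.length
    val target = sc.defaultParallelism
    println(s"input partitions:    $inputPartitions")
    println(s"default parallelism: $target")

    // One partition means one task. Repartitioning costs a shuffle, so
    // only do it when the input has fewer partitions than the target.
    val widened =
      if (inputPartitions < target) rdd.repartition(target) else rdd
    println(s"partitions in use:   ${widened.partitions.length}")

    // Run an action so the stage and its tasks show up in the Spark UI.
    println(s"line count:          ${widened.count()}")

    sc.stop()
  }
}
```

On a standalone master, sc.defaultParallelism should, as far as I know,
reflect the total cores granted to the application unless
spark.default.parallelism is set explicitly. Since repartition() shuffles
the data, it is only worth doing when the downstream work is heavy enough
to benefit.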

On Thu, Jan 15, 2015 at 10:53 PM, Wang, Ningjun (LNG-NPV)
<ningjun.w...@lexisnexis.com> wrote:
> Spark Standalone cluster.
>
> My program is running very slowly; I suspect it is not processing the
> RDD in parallel. How can I force it to run in parallel? Is there any way
> to check whether it is processed in parallel?
>
> Regards,
>
> Ningjun Wang
> Consulting Software Engineer
> LexisNexis
> 121 Chanlon Road
> New Providence, NJ 07974-1541
>
>
> -----Original Message-----
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Thursday, January 15, 2015 4:29 PM
> To: Wang, Ningjun (LNG-NPV)
> Cc: user@spark.apache.org
> Subject: Re: How to force parallel processing of RDD using multiple thread
>
> What is your cluster manager? For example on YARN you would specify 
> --executor-cores. Read:
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
> On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV) 
> <ningjun.w...@lexisnexis.com> wrote:
>> I have a standalone spark cluster with only one node with 4 CPU cores.
>> How can I force spark to do parallel processing of my RDD using
>> multiple threads? For example I can do the following
>>
>>
>>
>> spark-submit --master local[4]
>>
>>
>>
>> However, I really want to use the cluster as follows:
>>
>>
>>
>> spark-submit --master spark://10.125.21.15:7070
>>
>>
>>
>> In that case, how can I make sure the RDD is processed with multiple
>> threads/cores?
>>
>>
>>
>> Thanks
>>
>> Ningjun
>>
>>

