Check the number of partitions in your input. It may be much lower than the available parallelism of your small cluster. For example, input that lives in just 1 partition will spawn just 1 task, no matter how many cores are free.
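To make that concrete, here is a minimal sketch you could run in the Spark shell (the input path is a placeholder, and the target partition count of 4 matches the 4-core node described later in this thread):

```scala
// Placeholder input path -- substitute your own data.
val rdd = sc.textFile("hdfs:///data/input")

// The partition count is the upper bound on parallel tasks for this stage.
println(rdd.partitions.size)

// If it's 1 (or otherwise too low), repartition so all cores get work.
val repartitioned = rdd.repartition(4)
println(repartitioned.partitions.size)
```

Note that `repartition` triggers a shuffle; for simply checking why a job runs serially, inspecting `rdd.partitions.size` is usually enough.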
Beyond that, parallelism just happens. You can see the parallelism of each operation in the Spark UI.

On Thu, Jan 15, 2015 at 10:53 PM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
> Spark Standalone cluster.
>
> My program is running very slow, and I suspect it is not doing parallel
> processing of the RDD. How can I force it to run in parallel? Is there any way to
> check whether it is processed in parallel?
>
> Regards,
>
> Ningjun Wang
> Consulting Software Engineer
> LexisNexis
> 121 Chanlon Road
> New Providence, NJ 07974-1541
>
> -----Original Message-----
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Thursday, January 15, 2015 4:29 PM
> To: Wang, Ningjun (LNG-NPV)
> Cc: user@spark.apache.org
> Subject: Re: How to force parallel processing of RDD using multiple thread
>
> What is your cluster manager? For example, on YARN you would specify
> --executor-cores. Read:
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
> On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV)
> <ningjun.w...@lexisnexis.com> wrote:
>> I have a standalone Spark cluster with only one node with 4 CPU cores.
>> How can I force Spark to do parallel processing of my RDD using
>> multiple threads? For example, I can do the following:
>>
>> spark-submit --master local[4]
>>
>> However, I really want to use the cluster as follows:
>>
>> spark-submit --master spark://10.125.21.15:7070
>>
>> In that case, how can I make sure the RDD is processed with multiple
>> threads/cores?
>>
>> Thanks
>> Ningjun
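For the standalone case in the question above, core usage can also be controlled at submit time. A sketch (the master URL is the one from the original question, and the jar name is a placeholder):

```shell
# Submit to the standalone cluster from the thread.
# On a standalone master, --total-executor-cores caps the cores the
# application may use across the cluster; --executor-cores sets the
# number of cores per executor.
spark-submit \
  --master spark://10.125.21.15:7070 \
  --total-executor-cores 4 \
  --executor-cores 4 \
  your-app.jar
```

Even with 4 cores granted, an RDD with a single partition will still run as a single task, which is why checking the partition count comes first.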