If increasing executors really isn't enough, then you can consider using
mapPartitions to process whole partitions at a time. Within that you can
multi thread your processing of the elements in the partition. (And you
should probably use more like one worker per machine then.)

The question is how to parallelize. If you can tolerate the input and
output being in memory, then you can make the Iterator into a parallel
collection and trivially map it in parallel locally.  Otherwise you can
look at Future.traverse to iterate it in parallel instead but I have not
tried it.
On Sep 28, 2014 4:44 AM, "myasuka" <myas...@live.com> wrote:

> Hi, everyone
>     I come across with a problem about increasing the concurency. In a
> program, after shuffle write, each node should fetch 16 pair matrices to do
> matrix multiplication. such as:
>
> *
> import breeze.linalg.{DenseMatrix => BDM}
>
> pairs.map(t => {
>         val b1 = t._2._1.asInstanceOf[BDM[Double]]
>         val b2 = t._2._2.asInstanceOf[BDM[Double]]
>
>         val c = (b1 * b2).asInstanceOf[BDM[Double]]
>
>         (new BlockID(t._1.row, t._1.column), c)
>       })*
>
>     Each node has 16 cores. However, no matter I set 16 tasks or more on
> each node, the concurrency cannot be higher than 60%, which means not every
> core on the node is computing. Then I check the running log on the WebUI,
> according to the amount of shuffle read and write in every task, I see some
> task do once matrix multiplication, some do twice while some do none.
>
>     Thus, I think of using java multi thread to increase the concurrency. I
> wrote a program in scala which calls java multi thread without Spark on a
> single node, by watch the 'top' monitor, I find this program can use CPU up
> to 1500% ( means nearly every core are computing). But I have no idea how
> to
> use Java multi thread in RDD transformation.
>
>     Is there any one can provide some example code to use Java multi thread
> in RDD transformation, or give any idea to increase the concurrency ?
>
> Thanks for all
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-multi-thread-in-RDD-map-function-tp15286.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

Reply via email to