Thank you for your reply. Actually, we have already used this parameter. Our cluster is a standalone cluster with 16 nodes, and every node has 16 cores. We have 256 matrix pairs along with 256 tasks:

- When we set --total-executor-cores to 64, each node launches 4 tasks simultaneously, each task performs one matrix multiplication, and CPU usage is nearly 25%.
- When we set --total-executor-cores to 128, each node launches 8 tasks simultaneously, but not every task performs a matrix multiplication; CPU usage is nearly 35%.
- When we set --total-executor-cores to 256, each node launches 16 tasks simultaneously, but still not every task performs a matrix multiplication (some perform none, some perform two); CPU usage is nearly 50%.
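One thing worth checking, given the uneven task-to-pair mapping above (this is only a conjecture, not something established in this thread): with the default HashPartitioner, two BlockIDs can hash into the same partition, so one task multiplies two pairs while another gets none. Below is a minimal sketch of pinning each pair to its own task, assuming the keys form a 16 x 16 block grid with row and column in 0..15; the BlockPartitioner is hypothetical, while pairs and BlockID come from the code quoted further down:

    import org.apache.spark.Partitioner

    // Hypothetical sketch: give every BlockID its own partition so each of
    // the 256 tasks multiplies exactly one pair. Assumes the blocks form a
    // 16 x 16 grid (row and column in 0..15); adjust to the real shape.
    class BlockPartitioner(rows: Int, cols: Int) extends Partitioner {
      override def numPartitions: Int = rows * cols
      override def getPartition(key: Any): Int = {
        val id = key.asInstanceOf[BlockID]
        id.row * cols + id.column
      }
    }

    // pairs is the (BlockID, (matrix, matrix)) RDD from the quoted code
    val balanced = pairs.partitionBy(new BlockPartitioner(16, 16))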
If we can increase the concurrency and thereby the CPU usage, the running time should drop accordingly. Hoping for any solution! (A sketch of the multi-threading idea from my original question follows after the quoted thread below.)

Qin Wei wrote:
> Among the options of spark-submit there are two that may be helpful to your problem: "--total-executor-cores NUM" (standalone and mesos only) and "--executor-cores" (yarn only).
>
> qinwei
>
> From: myasuka
> Date: 2014-09-28 11:44
> To: user
> Subject: How to use multi thread in RDD map function?
>
> Hi, everyone
>
> I have come across a problem with increasing concurrency. In a program, after the shuffle write, each node should fetch 16 matrix pairs to do matrix multiplications, such as:
>
>     import breeze.linalg.{DenseMatrix => BDM}
>
>     pairs.map(t => {
>       val b1 = t._2._1.asInstanceOf[BDM[Double]]
>       val b2 = t._2._2.asInstanceOf[BDM[Double]]
>       val c = (b1 * b2).asInstanceOf[BDM[Double]]
>       (new BlockID(t._1.row, t._1.column), c)
>     })
>
> Each node has 16 cores. However, no matter whether I set 16 tasks or more per node, CPU utilization never rises above 60%, which means not every core on the node is computing. When I check the running log on the WebUI, judging by the amount of shuffle read and write in each task, I see that some tasks perform one matrix multiplication, some perform two, and some perform none.
>
> So I thought of using Java multi-threading to increase the concurrency. I wrote a Scala program that calls Java threads without Spark on a single node; watching the 'top' monitor, I found it can drive CPU usage up to 1500% (meaning nearly every core is computing). But I have no idea how to use Java multi-threading inside an RDD transformation.
>
> Can anyone provide some example code for using Java multi-threading in an RDD transformation, or give any other idea to increase the concurrency?
>
> Thanks to all
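For concreteness, here is a rough sketch of the multi-threading idea from the original question, expressed with mapPartitions and a fixed thread pool. It is an illustration only, untested on this cluster; the pool size of 16 assumes one thread per core and one task per node (e.g. --total-executor-cores 16), otherwise the pools would oversubscribe the cores:

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration
    import breeze.linalg.{DenseMatrix => BDM}

    // Sketch: multiply every pair of a partition concurrently on a
    // fixed-size pool, one thread per physical core.
    val products = pairs.mapPartitions { iter =>
      val pool = Executors.newFixedThreadPool(16)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
      val futures = iter.map { t =>
        Future {
          val b1 = t._2._1.asInstanceOf[BDM[Double]]
          val b2 = t._2._2.asInstanceOf[BDM[Double]]
          (new BlockID(t._1.row, t._1.column), (b1 * b2).asInstanceOf[BDM[Double]])
        }
      }.toList                      // materialize so every future starts now
      val results = futures.map(f => Await.result(f, Duration.Inf))
      pool.shutdown()
      results.iterator
    }

With this shape, a few Spark tasks each own a whole node and fan out across its cores themselves; whether that beats simply fixing the partitioning is something only a measurement on the cluster can tell.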