My aim in setting the task number is to increase query speed. I have also found that "mapPartitionsWithIndex at Operator.scala:333<http://192.168.1.101:4040/stages/stage?id=17>" is taking much of the time. So my other question is: how can I tune mapPartitionsWithIndex<http://192.168.1.101:4040/stages/stage?id=17> to bring that time down?
2014-05-22 18:09 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:

> I have added SPARK_JAVA_OPTS+="-Dspark.default.parallelism=40" in
> shark-env.sh, but I find there are only 10 tasks on the cluster and 2
> tasks on each machine.
>
> 2014-05-22 18:07 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:
>
>> I have added SPARK_JAVA_OPTS+="-Dspark.default.parallelism=40" in
>> shark-env.sh.
>>
>> 2014-05-22 17:50 GMT+08:00 qingyang li <liqingyang1...@gmail.com>:
>>
>>> I am using Tachyon as the storage system and Shark to query a big
>>> table. I have 5 machines as a Spark cluster, with 4 cores on each
>>> machine.
>>> My questions are:
>>> 1. How do I set the number of tasks on each core?
>>> 2. Where can I see how many partitions an RDD has?
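For question 2, a minimal sketch of how to inspect and change an RDD's partition count from the Spark/Shark shell (this assumes a running SparkContext `sc`; the input path and the value 40 from the thread are only placeholders):

```scala
// Sketch, assuming `sc: SparkContext` is available (e.g. in the Spark shell).
// spark.default.parallelism only sets the *default* for shuffle outputs;
// an RDD read from storage gets its partition count from the input splits,
// which is why you may still see only 10 tasks.

val rdd = sc.textFile("hdfs://...")    // hypothetical input path

// Number of partitions = number of tasks launched for this RDD's stage.
println(rdd.partitions.size)

// Force a specific partition count (triggers a shuffle); 40 matches
// the spark.default.parallelism value attempted in shark-env.sh.
val repartitioned = rdd.repartition(40)
println(repartitioned.partitions.size)
```

The partition counts are also visible per stage in the web UI at http://192.168.1.101:4040/stages, under the "Tasks" column.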