Hi Eric - please direct this to the user@ list. This list is for development of Spark itself.
On Sun, Apr 26, 2015 at 1:12 AM, eric wong <win19...@gmail.com> wrote:
>
> Hi developers,
>
> I have sent this to the user mailing list but got no response.
>
> When running an experimental KMeans job, the cached RDD is the original
> points data.
>
> I see poor locality in the task details on the Web UI. Almost half of the
> tasks read their input over the network instead of from memory. A task
> with network input takes almost as long as a task with Hadoop (disk)
> input, and about twice as long as a task with memory input, e.g.:
>
> Task(Memory): 16s
> Task(Network): 9s
> Task(Hadoop): 9s
>
> I also see in the executor logs that fetching a 30MB RDD block from a
> remote node takes about 5 seconds, like below:
>
> 15/03/31 04:08:52 INFO CoarseGrainedExecutorBackend: Got assigned task 58
> 15/03/31 04:08:52 INFO Executor: Running task 15.0 in stage 1.0 (TID 58)
> 15/03/31 04:08:52 INFO HadoopRDD: Input split:
> hdfs://master:8000/kmeans/data-Kmeans-5.3g:2013265920+134217728
> 15/03/31 04:08:52 INFO BlockManager: Found block rdd_3_15 locally
> 15/03/31 04:08:58 INFO Executor: Finished task 15.0 in stage 1.0 (TID 58).
> 1920 bytes result sent to driver
> 15/03/31 04:08:58 INFO CoarseGrainedExecutorBackend: Got assigned task 60
> -----------------Task60
> 15/03/31 04:08:58 INFO Executor: Running task 17.0 in stage 1.0 (TID 60)
> 15/03/31 04:08:58 INFO HadoopRDD: Input split:
> hdfs://master:8000/kmeans/data-Kmeans-5.3g:2281701376+134217728
> 15/03/31 04:09:02 INFO BlockManager: Found block rdd_3_17 remotely
> 15/03/31 04:09:12 INFO Executor: Finished task 17.0 in stage 1.0 (TID 60).
> 1920 bytes result sent to driver
>
> So:
>
> 1) Does this mean I should cache the RDD with MEMORY_AND_DISK instead of
> memory only?
>
> 2) And should I expand network capacity, or tune the scheduling-locality
> parameters? I set spark.locality.wait up to 15000, but it did not seem to
> increase the memory-input percentage.
>
> Any suggestions would be appreciated.
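The two changes asked about in (1) and (2) could be sketched as follows in PySpark. This is a configuration sketch, not a tested fix: it assumes a running Spark 1.1 installation, the HDFS path is the one from the submit command, the app name is made up, and `15000` is the millisecond value the poster already tried (Spark 1.1 reads `spark.locality.wait` in milliseconds).

```python
# Sketch only: the two mitigations asked about in (1) and (2).
# Requires a Spark installation; values mirror those in the mail.
from pyspark import SparkConf, SparkContext, StorageLevel

conf = (SparkConf()
        .setAppName("KMeansLocality")                # hypothetical app name
        .set("spark.locality.wait", "15000"))        # ms; value the poster tried
sc = SparkContext(conf=conf)

points = sc.textFile("hdfs://master:8000/kmeans/data-Kmeans-7g")
# (1) MEMORY_AND_DISK spills partitions that do not fit in memory to local
# disk instead of dropping them, so on reuse they are re-read from local
# disk rather than recomputed or fetched from a remote node's cache.
points.persist(StorageLevel.MEMORY_AND_DISK)
```

Note that with 1g executors on 4 workers, a multi-gigabyte input cannot be fully cached in memory anyway, which is consistent with many partitions ending up non-local.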
>
> ------ Env info -----------
> Cluster: 4 workers, each with 1 core and 2G executor memory
> Spark version: 1.1.0
> Network: 30MB/s
>
> ----- Submit shell -------
> bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans
> --master spark://master:7077 --executor-memory 1g
> lib/spark-examples-1.1.0-hadoop2.3.0.jar
> hdfs://master:8000/kmeans/data-Kmeans-7g 8 1
>
> Thanks very much, and please forgive my poor English.
>
> --
> Wang Haihua
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
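A quick sanity check on the quoted numbers (assuming the stated 30MB/s is the usable per-node bandwidth, which the mail does not say precisely): a 30MB block at 30MB/s should take about 1 second to transfer, so the roughly 5 seconds the poster reads from the executor log suggests the remote fetch is not purely bandwidth-bound; scheduling delay, serialization, or network contention may dominate.

```python
# Back-of-envelope check of the remote-fetch time quoted in the mail.
# Assumption: "Network: 30MB/s" means ~30 MB/s of usable bandwidth.
block_mb = 30.0          # size of the remotely fetched RDD block
bandwidth_mb_s = 30.0    # stated network speed
expected_s = block_mb / bandwidth_mb_s
observed_s = 5.0         # fetch time the poster reads from the log

print(expected_s)               # 1.0
print(observed_s / expected_s)  # 5.0x slower than raw transfer would predict
```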