My detailed test process:

1. During initialization, create 100 string RDDs, cache them, and distribute them across the Spark workers:

    for (int i = 1; i <= numOfRDDs; i++) {
        JavaRDD<String> rddData = sc.parallelize(Arrays.asList(Integer.toString(i))).coalesce(1);
        rddData.cache().count();
        simpleRDDs.put(Integer.toString(i), rddData);
    }

2. In JMeter, configure 100 threads, each looping 100 times; each thread sends a GET request using its thread number as the RDDId:
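The JMeter load in step 2 can be sketched in plain Java as well: 100 threads each looping 100 times against a lookup. This is only a stand-in to show the load shape; the map `get` here is a placeholder for the real HTTP GET, and all names (`LoadSketch`, `simpleRDDs` as a plain map) are illustrative, not the actual test harness.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSketch {
    // Plain-map stand-in for the RDD dictionary; not the real Spark-backed map.
    static final ConcurrentHashMap<String, String> simpleRDDs = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        for (int i = 1; i <= 100; i++) simpleRDDs.put(Integer.toString(i), "value-" + i);

        int threads = 100, loops = 100;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        long start = System.nanoTime();
        for (int t = 1; t <= threads; t++) {
            final String rddId = Integer.toString(t);
            pool.submit(() -> {
                for (int l = 0; l < loops; l++) {
                    simpleRDDs.get(rddId);       // placeholder for the HTTP GET call
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        double secs = (System.nanoTime() - start) / 1e9;
        // Total requests = threads * loops = 10000
        System.out.println(completed.get() + " requests in " + secs + " s");
    }
}
```

Measured this way, throughput = completed / elapsed seconds, which is the number JMeter reports.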
3. This function simply returns the RDD's string. Note that the simpleRDDs map is populated at startup with the 100 RDDs:

    public static String simpleRDDTest(String keyOfRDD) {
        JavaRDD<String> rddData = simpleRDDs.get(keyOfRDD);
        return rddData.first();
    }

4. I tested three cases with different numbers of workers, running each several times to get a stable throughput. In all three cases the throughput varied between 85 and 95 requests/sec; there was no significant difference between the worker counts.

5. I think this result means that even when there is no computation, throughput is limited by Spark job initialization and dispatch, and adding more workers cannot improve it. Can anyone explain this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-spark-take-so-much-time-for-simple-task-without-calculation-tp27628p27656.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
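A rough back-of-the-envelope check of the hypothesis in step 5: since each RDD was coalesced to one partition, every rddData.first() launches a single-task Spark job, and if the driver incurs a roughly fixed per-job scheduling cost, throughput is capped at 1000 / overheadMs jobs per second no matter how many workers exist. The 10.5-11.8 ms overhead figures below are assumed values chosen to match the observed 85-95/sec range, not measurements from the post.

```java
// Throughput ceiling under a fixed per-job driver overhead:
// ceiling(ms) = 1000 / ms jobs per second, independent of worker count.
public class ThroughputCeiling {
    static double ceiling(double perJobOverheadMs) {
        return 1000.0 / perJobOverheadMs;
    }

    public static void main(String[] args) {
        // Assumed overheads bracketing the observed 85-95 req/sec range.
        System.out.println("10.5 ms/job -> " + ceiling(10.5) + " jobs/sec"); // ~95
        System.out.println("11.8 ms/job -> " + ceiling(11.8) + " jobs/sec"); // ~85
    }
}
```

If this model holds, the fix is to reduce per-request job launches (or serve cached values without a Spark job), rather than to add workers.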