My detailed test process:
1.       During initialization, the service creates 100 single-string RDDs, caches them, and distributes them across the Spark workers:
        for (int i = 1; i <= numOfRDDs; i++) {
            JavaRDD<String> rddData =
                    sc.parallelize(Arrays.asList(Integer.toString(i))).coalesce(1);
            rddData.cache().count(); // count() forces the cache to materialize
            simpleRDDs.put(Integer.toString(i), rddData);
        }
2.       In JMeter, configure 100 threads, each looping 100 times; each thread sends an HTTP GET using its thread number as the RDDId.

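For readers without JMeter at hand, the load pattern in step 2 can be sketched as a plain-Java thread pool. This is only an illustration of the request pattern: the actual HTTP call is omitted (the endpoint path is whatever the service exposes, not something stated in this thread), and the counter stands in for a completed request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadSketch {
    // Fires `threads` workers, each performing `loops` requests;
    // returns the total number of requests issued.
    static int runLoad(int threads, int loops) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 1; t <= threads; t++) {
            final String rddId = Integer.toString(t); // each thread uses its number as the RDDId
            pool.submit(() -> {
                for (int i = 0; i < loops; i++) {
                    // A real client would issue something like:
                    //   GET /simpleRDDTest?keyOfRDD=<rddId>   (hypothetical endpoint)
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 threads x 100 loops, as in the JMeter plan.
        System.out.println(runLoad(100, 100)); // prints 10000
    }
}
```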
3.       This function simply returns the RDD's string; note that the map simpleRDDs was populated up front with all 100 RDDs:
        public static String simpleRDDTest(String keyOfRDD) {
            JavaRDD<String> rddData = simpleRDDs.get(keyOfRDD);
            return rddData.first(); // first() still launches a Spark job, even on a cached RDD
        }
 
4.       I tested three cases with different numbers of workers. I ran each case several times to get a stable throughput. In all three cases, throughput varied between 85 and 95 requests/sec; there was no significant difference across worker counts.
5.       I think this result means that even with no computation at all, throughput is capped by Spark's job initialization and dispatch overhead, and adding more workers cannot improve it. Can anyone explain this?
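A back-of-the-envelope check is consistent with that reading: since every request in step 3 triggers one Spark job (first() launches a job even on a cached RDD), and the driver pays a roughly fixed scheduling/dispatch cost per job, throughput is capped at 1000/overhead_ms regardless of how many workers exist. The ~11 ms figure below is an assumption chosen to match the observed 85-95 req/sec, not a measured value:

```java
public class DispatchCeiling {
    // Throughput ceiling (requests/sec) when a fixed per-job
    // driver-side cost dominates each request.
    static double ceiling(double perJobOverheadMs) {
        return 1000.0 / perJobOverheadMs;
    }

    public static void main(String[] args) {
        // Assumed ~11 ms of job setup + dispatch per request.
        // Workers sit idle; the cap comes from the driver, so adding
        // workers does not move it.
        System.out.println(ceiling(11.0)); // ~90 req/sec
    }
}
```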




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-spark-take-so-much-time-for-simple-task-without-calculation-tp27628p27656.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
