I installed Spark in standalone mode and run a Spark cluster (one master, one worker) on a Windows 2008 server with 16 cores and 24 GB of memory.
I have done a simple test: create a String RDD and simply return its first element. I use JMeter to measure throughput, but the highest I get is around 35 requests/sec. I thought Spark was powerful at distributed computation, so why is the throughput so limited in such a simple scenario, which involves only task dispatch and no real calculation?

1. In JMeter I tested with both 10 and 100 threads; the difference is small, around 2-3 requests/sec.
2. I tested both caching and not caching the RDD; there is little difference.
3. During the test, CPU and memory usage stay low.

Below is my test code:

@RestController
public class SimpleTest {
    @RequestMapping(value = "/SimpleTest", method = RequestMethod.GET)
    @ResponseBody
    public String testProcessTransaction() {
        return SparkShardTest.simpleRDDTest();
    }
}

final static Map<String, JavaRDD<String>> simpleRDDs = initSimpleRDDs();

public static Map<String, JavaRDD<String>> initSimpleRDDs() {
    Map<String, JavaRDD<String>> result = new ConcurrentHashMap<String, JavaRDD<String>>();
    JavaRDD<String> rddData = JavaSC.parallelize(data);
    rddData.cache().count();  // this cache improves throughput by 1-2/sec
    result.put("MyRDD", rddData);
    return result;
}

public static String simpleRDDTest() {
    JavaRDD<String> rddData = simpleRDDs.get("MyRDD");
    return rddData.first();
}

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-spark-take-so-much-time-for-simple-task-without-calculation-tp27628.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
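One way to separate Spark's per-job overhead from JMeter's own measurement is a plain timing loop around the action. The sketch below is self-contained, so it uses a stand-in Supplier as the workload (an assumption; in the real test it would be () -> rddData.first() against the cached RDD):

```java
import java.util.function.Supplier;

public class LatencyProbe {
    // Average latency in milliseconds of an action over n sequential calls.
    static double averageLatencyMs(Supplier<String> action, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            action.get();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / n;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with () -> rddData.first()
        // to measure Spark's job-scheduling overhead per action.
        Supplier<String> action = () -> "MyRDD-element";
        double ms = averageLatencyMs(action, 1000);
        System.out.println("avg latency per call: " + ms + " ms");
    }
}
```

A single caller's throughput is roughly 1000 / avg-latency-ms requests/sec, so ~35 requests/sec corresponds to roughly 28 ms per rddData.first() call; comparing that number against the JMeter figure shows whether the bottleneck is Spark itself or the HTTP layer.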