Hi, I'm learning about Apache Spark Streaming and running some tests. I have a modified version of the NetworkWordCount example that performs a reduceByKeyAndWindow with a 10-second window sliding every 5 seconds.
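For reference, the windowed part of my app looks roughly like this (a minimal sketch; the host, port, and 1-second batch interval are my own choices, not anything prescribed):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object WindowedNetworkWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[4]").setAppName("WindowedNetworkWordCount")
        // 1-second batch interval (assumed; this makes count/1 below read as records/second)
        val ssc = new StreamingContext(conf, Seconds(1))

        // Read lines from a socket source (host/port are placeholders)
        val lines = ssc.socketTextStream("localhost", 9999)
        val words = lines.flatMap(_.split(" ")).map(word => (word, 1))

        // Count words over a 10-second window, recomputed every 5 seconds
        val windowedCounts = words.reduceByKeyAndWindow(
          (a: Int, b: Int) => a + b, Seconds(10), Seconds(5))
        windowedCounts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }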
I'm also measuring the rate of records/second like this:

    words.foreachRDD { rdd =>
      val count = rdd.count()
      // batch interval is 1 second, so count / 1 = records/second
      println("Current rate: " + (count / 1) + " records/second")
    }

These are the average results:

- On my computer with 4 cores and 8 GB, running "local[4]": Current rate: 130,000
- Running locally with my computer as both master and worker: Current rate: 25,000
- Running on Azure cloud computing with 4 cores and 7 GB: Current rate: 10,000

I read the Spark Streaming paper (http://www.eecs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf), and its performance evaluation reports about 250,000 records/second for a similar application.

To send data over the socket I'm using an application similar to this one:
http://apache-spark-user-list.1001560.n3.nabble.com/streaming-code-to-simulate-a-network-socket-data-source-td3431.html#a13814

Can anyone suggest something to improve this rate? (I already increased the executor memory and didn't get better results.)

Thanks!
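P.S. In case it helps, my data generator is roughly this (a sketch, not the exact code from the linked thread; the port and the sample line are placeholders):

    import java.io.PrintWriter
    import java.net.ServerSocket

    object SocketDataGenerator {
      def main(args: Array[String]): Unit = {
        val server = new ServerSocket(9999)
        println("Waiting for Spark Streaming to connect on port 9999...")
        val socket = server.accept()
        val out = new PrintWriter(socket.getOutputStream)
        val line = "the quick brown fox jumps over the lazy dog"
        var sent = 0L
        while (true) {
          out.println(line)
          sent += 1
          // Flushing every line would throttle the generator, so flush in batches
          if (sent % 10000 == 0) out.flush()
        }
      }
    }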