Hi, I'm learning about Apache Spark Streaming and running some tests. I have a modified version of the NetworkWordCount example that performs a reduceByKeyAndWindow with a 10-second window sliding every 5 seconds.
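For reference, the windowed part of my app looks roughly like this (a minimal sketch; the host, port, and 1-second batch interval are my own choices, not anything prescribed):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object WindowedNetworkWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[4]").setAppName("WindowedNetworkWordCount")
        // 1-second batch interval (assumed; this makes count/1 below read as records/second)
        val ssc = new StreamingContext(conf, Seconds(1))

        // Read lines from a socket source (host/port are placeholders)
        val lines = ssc.socketTextStream("localhost", 9999)
        val words = lines.flatMap(_.split(" ")).map(word => (word, 1))

        // Count words over a 10-second window, recomputed every 5 seconds
        val windowedCounts = words.reduceByKeyAndWindow(
          (a: Int, b: Int) => a + b, Seconds(10), Seconds(5))
        windowedCounts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }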
I'm also measuring the rate of records/second like this:

    words.foreachRDD { rdd =>
      val count = rdd.count()
      // batch interval is 1 second, so count / 1 = records/second
      println("Current rate: " + (count / 1) + " records/second")
    }

These are the average results:

- On my computer with 4 cores and 8 GB, running "local[4]": Current rate: 130,000
- Running locally with my computer as both master and worker: Current rate: 25,000
- Running on Azure cloud computing with 4 cores and 7 GB: Current rate: 10,000

I read the Spark Streaming paper (http://www.eecs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf), and its performance evaluation reports about 250,000 records/second for a similar application.

To send data over the socket I'm using an application similar to this one:
http://apache-spark-user-list.1001560.n3.nabble.com/streaming-code-to-simulate-a-network-socket-data-source-td3431.html#a13814

Can anyone suggest something to improve this rate? (I already increased the executor memory and didn't get better results.)

Thanks!
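P.S. In case it helps, my data generator is roughly this (a sketch, not the exact code from the linked thread; the port and the sample line are placeholders):

    import java.io.PrintWriter
    import java.net.ServerSocket

    object SocketDataGenerator {
      def main(args: Array[String]): Unit = {
        val server = new ServerSocket(9999)
        println("Waiting for Spark Streaming to connect on port 9999...")
        val socket = server.accept()
        val out = new PrintWriter(socket.getOutputStream)
        val line = "the quick brown fox jumps over the lazy dog"
        var sent = 0L
        while (true) {
          out.println(line)
          sent += 1
          // Flushing every line would throttle the generator, so flush in batches
          if (sent % 10000 == 0) out.flush()
        }
      }
    }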