I'm learning about Apache Spark Streaming and I'm doing some tests.

I have a modified version of the NetworkWordCount app that performs a
/reduceByKeyAndWindow/ with a window of 10 seconds and a slide interval of 5 seconds.

I'm also using this function to measure the rate in records/second:
words.foreachRDD(rdd => {
  // count() is the number of records in this batch
  val count = rdd.count()
  // dividing by 1 assumes a 1-second batch interval
  println("Current rate: " + (count / 1) + " records/second")
})
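
Note that the `count/1` above only yields records/second when each batch covers exactly one second. As a minimal sketch (the batch interval is not stated here, and `RateDemo`, `recordsPerSecond` and the sample counts are hypothetical names and values used only for illustration), the per-batch count can be normalized by the batch interval like this:

```scala
object RateDemo {
  /** Records per second for one batch: the record count divided by the batch length in seconds. */
  def recordsPerSecond(recordCount: Long, batchIntervalMs: Long): Double = {
    require(batchIntervalMs > 0, "batch interval must be positive")
    recordCount.toDouble * 1000.0 / batchIntervalMs
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical counts, chosen only to show the arithmetic.
    println(recordsPerSecond(50000L, 1000L)) // 50 000 records in a 1-second batch
    println(recordsPerSecond(50000L, 5000L)) // the same count spread over a 5-second batch
  }
}
```

Inside `foreachRDD`, the same helper could be called with `rdd.count()` and whatever batch interval the StreamingContext was created with, so the printed figure stays comparable if that interval changes.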

On my computer with 4 cores and 8 GB of RAM (running with /"local[4]"/), I get this
average result:
Current rate: 130 000

Running locally with my computer as both /master and worker/, I get this:
Current rate: 25 000

And running on an Azure cloud instance with 4 cores and 7 GB, the result is:
Current rate: 10 000

I read the Spark Streaming paper,
and its performance evaluation of a similar application reported 250 000.

To send data to the socket, I'm using an application similar to this:

Can anyone suggest how to improve these rates?
/(I increased the executor memory, but the results did not improve.)/


Sent from the Apache Spark User List mailing list archive at Nabble.com.
