Hi Gwen,

I have changed the Java code in JavaKafkaWordCount to use reduceByKeyAndWindow in Spark.
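For context, reduceByKeyAndWindow with a sum reducer effectively adds up the per-batch counts of all batches inside the sliding window, instead of counting each batch in isolation. Below is a minimal plain-Java sketch of that semantics (the batch contents and the three-batch window, e.g. a 30 s window over 10 s batches, are made-up illustrations; this is not the Spark API itself):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowedCountSketch {
    // Count the words of one micro-batch (one "RDD" worth of data).
    static Map<String, Integer> countBatch(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // Sum the per-batch counts currently inside the window; this is what
    // reduceByKeyAndWindow with (a, b) -> a + b computes over the window.
    static Map<String, Integer> windowedCount(Deque<Map<String, Integer>> window) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> batch : window)
            batch.forEach((w, c) -> total.merge(w, c, Integer::sum));
        return total;
    }

    public static void main(String[] args) {
        int windowBatches = 3; // e.g. 30 s window with a 10 s batch interval
        Deque<Map<String, Integer>> window = new ArrayDeque<>();
        List<List<String>> batches = List.of(
                List.of("kafka", "spark", "kafka"),
                List.of("spark", "spark"),
                List.of("kafka"),
                List.of("spark"));
        for (List<String> batch : batches) {
            window.addLast(countBatch(batch));
            if (window.size() > windowBatches) window.removeFirst(); // slide
            System.out.println(windowedCount(window));
        }
    }
}
```

So with a window, the reported counts cover the last few batches rather than just the latest one, but they still do not grow without bound: old batches fall out of the window as new ones arrive.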
----- Original Message -----
From: "Gwen Shapira" <gshap...@cloudera.com>
Sent: 03/11/2014 21:08
To: "us...@kafka.apache.org" <us...@kafka.apache.org>
Cc: "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
Subject: Re: Spark Kafka Performance

Not sure about the throughput, but regarding "I mean that the words counted in spark should grow up": the Spark word-count example doesn't accumulate. It gets an RDD every n seconds and counts the words in that RDD, so we don't expect the count to go up.

On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:
> Hi Guys,
>
> Could anyone explain to me how Kafka works with Spark? I am using
> JavaKafkaWordCount.java as a test, and the command line is:
>
> ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount
> spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3
>
> As a producer I am using this command:
>
> rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt
> -l 100 -n 10
>
> rdkafka_cachesender is a program I developed which sends the contents of
> output.txt to Kafka, where -l is the length of each send (upper bound)
> and -n is the number of lines to send in a row.
> Below is the throughput calculated by the program:
>
> File is 2235755 bytes
> throughput (b/s) = 699751388
> throughput (b/s) = 723542382
> throughput (b/s) = 662989745
> throughput (b/s) = 505028200
> throughput (b/s) = 471263416
> throughput (b/s) = 446837266
> throughput (b/s) = 409856716
> throughput (b/s) = 373994467
> throughput (b/s) = 366343097
> throughput (b/s) = 373240017
> throughput (b/s) = 386139016
> throughput (b/s) = 373802209
> throughput (b/s) = 369308515
> throughput (b/s) = 366935820
> throughput (b/s) = 365175388
> throughput (b/s) = 362175419
> throughput (b/s) = 358356633
> throughput (b/s) = 357219124
> throughput (b/s) = 352174125
> throughput (b/s) = 348313093
> throughput (b/s) = 355099099
> throughput (b/s) = 348069777
> throughput (b/s) = 348478302
> throughput (b/s) = 340404276
> throughput (b/s) = 339876031
> throughput (b/s) = 339175102
> throughput (b/s) = 327555252
> throughput (b/s) = 324272374
> throughput (b/s) = 322479222
> throughput (b/s) = 319544906
> throughput (b/s) = 317201853
> throughput (b/s) = 317351399
> throughput (b/s) = 315027978
> throughput (b/s) = 313831014
> throughput (b/s) = 310050384
> throughput (b/s) = 307654601
> throughput (b/s) = 305707061
> throughput (b/s) = 307961102
> throughput (b/s) = 296898200
> throughput (b/s) = 296409904
> throughput (b/s) = 294609332
> throughput (b/s) = 293397843
> throughput (b/s) = 293194876
> throughput (b/s) = 291724886
> throughput (b/s) = 290031314
> throughput (b/s) = 289747022
> throughput (b/s) = 289299632
>
> The throughput drops after a few seconds and does not hold at the
> initial values:
>
> throughput (b/s) = 699751388
> throughput (b/s) = 723542382
> throughput (b/s) = 662989745
>
> Another question is about Spark: after I started the Spark command,
> Spark keeps repeating the words counted every 15 seconds, but my program
> continues to send words to Kafka, so I think the words counted in Spark
> should grow.
> I have attached the log from Spark.
>
> My setup is:
>
> ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + ZooKeeper) ->
> ComputerC (Spark)
>
> If I haven't explained this well, please send me a reply.
>
> Thanks Guys
>
> --
> Privacy notice: http://www.unibs.it/node/8155
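To make Gwen's point concrete: JavaKafkaWordCount counts each micro-batch RDD on its own, so a producer sending at a steady rate yields steady, not growing, counts; the numbers would only grow if state were carried across batches (e.g. with Spark's updateStateByKey). Here is a plain-Java sketch of that difference (the batch contents are invented for illustration; these are not Spark API calls):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerBatchVsRunningCount {
    // Each micro-batch is counted independently; nothing carries over.
    static Map<String, Integer> countBatch(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> runningTotal = new HashMap<>();
        // Suppose the producer sends the same words every batch interval.
        List<String> batch = List.of("kafka", "spark", "kafka");
        for (int i = 1; i <= 3; i++) {
            // What the stock example reports: the same count every interval.
            Map<String, Integer> perBatch = countBatch(batch);
            // What a stateful (accumulating) count would report instead.
            perBatch.forEach((w, c) -> runningTotal.merge(w, c, Integer::sum));
            System.out.println("batch " + i
                    + "  per-batch: " + perBatch
                    + "  cumulative: " + runningTotal);
        }
    }
}
```

The per-batch column stays flat across intervals while the explicitly maintained running total grows, which matches the behavior described in the thread: repeated identical counts are expected from the unmodified example.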