Hi Gwen,
I have changed the Java code JavaKafkaWordCount to use reduceByKeyAndWindow in
Spark.
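For context, the change amounts to summing per-word counts over a sliding window of recent batches instead of over a single batch. A rough plain-Java sketch of the semantics that reduceByKeyAndWindow with a sum reducer gives (the window is expressed in batches rather than Durations, there is no Spark dependency, and all names and batch contents here are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowedCount {
    // Sums per-word counts over the last `window` batches, which is
    // roughly what reduceByKeyAndWindow with (a, b) -> a + b yields.
    static Map<String, Integer> countOverWindow(List<List<String>> batches, int window) {
        Map<String, Integer> counts = new HashMap<>();
        int from = Math.max(0, batches.size() - window);
        for (List<String> batch : batches.subList(from, batches.size())) {
            for (String w : batch) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("kafka", "spark"),
                List.of("kafka"),
                List.of("spark", "spark"));

        // Window of 2 batches: only the last two batches are counted.
        System.out.println(countOverWindow(batches, 2).get("spark")); // 2
        System.out.println(countOverWindow(batches, 2).get("kafka")); // 1
    }
}
```

In the real Spark Streaming Java API the corresponding call is along the lines of pairs.reduceByKeyAndWindow((a, b) -> a + b, windowDuration, slideDuration) on a JavaPairDStream; the counts then cover the whole window, not just the latest batch.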
----- Original Message -----
From: Gwen Shapira gshap...@cloudera.com
Sent: 03/11/2014 21:08
To: us...@kafka.apache.org
Cc: u...@spark.incubator.apache.org
Subject: Re: Spark Kafka Performance
Not sure about the throughput, but:
Regarding "I mean that the words counted in Spark should grow up": the Spark
word-count example doesn't accumulate. It gets an RDD every n seconds and
counts the words in that RDD, so we don't expect the count to go up.
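This per-batch behaviour can be illustrated without Spark at all: each micro-batch is counted from scratch, so totals never carry over between batches. A minimal plain-Java sketch (no Spark dependency; the batch contents are made up):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerBatchCount {
    // Counts words in a single batch, just as the streaming word-count
    // example does: a fresh map per batch, no state kept between batches.
    static Map<String, Integer> countBatch(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two consecutive micro-batches containing the same word.
        List<String> batch1 = List.of("kafka", "spark", "kafka");
        List<String> batch2 = List.of("kafka");

        // Each batch starts from zero: "kafka" counts 2, then 1 --
        // it does not grow to 3 across batches.
        System.out.println(countBatch(batch1).get("kafka")); // 2
        System.out.println(countBatch(batch2).get("kafka")); // 1
    }
}
```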
On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it
wrote:
Hi Guys,
Could anyone explain to me how Kafka works with Spark? I am using
JavaKafkaWordCount.java as a test, and the command line is:
./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount
spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3
and as a producer I am using this command:
rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt
-l 100 -n 10
rdkafka_cachesender is a program I developed that sends the contents of
output.txt to Kafka, where -l is the length of each send (upper bound) and
-n is the number of lines to send in a row. Below is the throughput
calculated by the program:
File is 2235755 bytes
throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745
throughput (b/s) = 505028200
throughput (b/s) = 471263416
throughput (b/s) = 446837266
throughput (b/s) = 409856716
throughput (b/s) = 373994467
throughput (b/s) = 366343097
throughput (b/s) = 373240017
throughput (b/s) = 386139016
throughput (b/s) = 373802209
throughput (b/s) = 369308515
throughput (b/s) = 366935820
throughput (b/s) = 365175388
throughput (b/s) = 362175419
throughput (b/s) = 358356633
throughput (b/s) = 357219124
throughput (b/s) = 352174125
throughput (b/s) = 348313093
throughput (b/s) = 355099099
throughput (b/s) = 348069777
throughput (b/s) = 348478302
throughput (b/s) = 340404276
throughput (b/s) = 339876031
throughput (b/s) = 339175102
throughput (b/s) = 327555252
throughput (b/s) = 324272374
throughput (b/s) = 322479222
throughput (b/s) = 319544906
throughput (b/s) = 317201853
throughput (b/s) = 317351399
throughput (b/s) = 315027978
throughput (b/s) = 313831014
throughput (b/s) = 310050384
throughput (b/s) = 307654601
throughput (b/s) = 305707061
throughput (b/s) = 307961102
throughput (b/s) = 296898200
throughput (b/s) = 296409904
throughput (b/s) = 294609332
throughput (b/s) = 293397843
throughput (b/s) = 293194876
throughput (b/s) = 291724886
throughput (b/s) = 290031314
throughput (b/s) = 289747022
throughput (b/s) = 289299632
The throughput goes down after a few seconds and does not sustain the
initial values:
throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745
Another question is about Spark: after I started the Spark command, within
about 15 seconds Spark began repeating the same word counts, but my program
kept sending words to Kafka, so I mean that the words counted in Spark
should grow. I have attached the log from Spark.
My setup is:
ComputerA (rdkafka_cachesender) - ComputerB (Kafka brokers + Zookeeper) -
ComputerC (Spark)
If I have not explained this clearly, please send a reply to me.
Thanks Guys
--
Informativa sulla Privacy: http://www.unibs.it/node/8155