Re: Spark Streaming with Kafka

2015-01-18 Thread Eduardo Alfaia
I have the same issue.

- Original Message -
From: Rasika Pohankar rasikapohan...@gmail.com
Sent: 18/01/2015 18:48
To: user@spark.apache.org
Subject: Spark Streaming with Kafka

I am using Spark Streaming to process data received through Kafka. The Spark 
version is 1.2.0. I have written the code in Java and am compiling it with 
sbt. The program runs, receives data from Kafka, and processes it. But after 
some time it suddenly stops receiving data (so far it has always run for 
about an hour while receiving data from Kafka, and then stopped receiving). 
The program itself continues to run; it only stops receiving data. After a 
while it sometimes starts receiving again and sometimes doesn't, so I stop 
the program and start it again.
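
For reference, the setup is essentially along the lines of the standard 
receiver-based example; a minimal sketch, not my actual code (the class 
name, ZooKeeper address, and topic name below are placeholders):

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaStreamSketch {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("KafkaStreamSketch");
    // 2-second batches; the streaming context drives the job
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

    // One receiver thread for the topic; host and topic are placeholders
    Map<String, Integer> topics = new HashMap<String, Integer>();
    topics.put("test", 1);
    JavaPairReceiverInputDStream<String, String> messages =
        KafkaUtils.createStream(jssc, "zkhost:2181", "test-consumer-group", topics);

    messages.print();
    jssc.start();
    jssc.awaitTermination();
  }
}

(Note: each receiver occupies one core, so the application needs more cores 
than receivers or no batches get processed.)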

Earlier I was using Spark 1.0.0 and upgraded to check whether the problem was 
specific to that version, but it still happens after upgrading.


Is this a known issue? Could someone please help?


Thanking you.


Kestrel and Spark Streaming

2014-11-18 Thread Eduardo Alfaia
Hi guys,
Has anyone already tried using Kestrel with Spark Streaming?

Thanks


Re: Spark Kafka Performance

2014-11-04 Thread Eduardo Alfaia
Hi Gwen,
I have changed the Java code JavaKafkaWordCount to use reduceByKeyAndWindow 
in Spark.
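
Roughly, the change looks like this (a sketch, not my exact code; pairs is 
assumed to be the per-batch (word, 1) JavaPairDStream from the example, and 
the window and slide durations are arbitrary choices):

import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// Count words over a sliding 60-second window that advances every 2 seconds
JavaPairDStream<String, Integer> windowedCounts = pairs.reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer a, Integer b) { return a + b; }
    },
    new Duration(60000),   // window length
    new Duration(2000));   // slide interval
windowedCounts.print();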

- Original Message -
From: Gwen Shapira gshap...@cloudera.com
Sent: 03/11/2014 21:08
To: us...@kafka.apache.org
Cc: u...@spark.incubator.apache.org
Subject: Re: Spark Kafka Performance

Not sure about the throughput, but regarding "I mean that the words counted 
in spark should grow up": the Spark word-count example doesn't accumulate. 
It gets an RDD every n seconds and counts the words in that RDD, so we don't 
expect the count to go up.
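
If you want a count that grows for the life of the stream, something like 
updateStateByKey can keep a running total. A sketch (assuming pairs is the 
per-batch (word, 1) JavaPairDStream and checkpointing is enabled on the 
streaming context):

import java.util.List;
import com.google.common.base.Optional;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// Keep a running total per word across batches. Requires checkpointing,
// e.g. jssc.checkpoint("/tmp/spark-checkpoint");
JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(
    new Function2<List<Integer>, Optional<Integer>, Optional<Integer>>() {
      public Optional<Integer> call(List<Integer> newValues, Optional<Integer> state) {
        Integer sum = state.or(0);          // previous total, 0 if none
        for (Integer v : newValues) sum += v;
        return Optional.of(sum);            // new total becomes the state
      }
    });
runningCounts.print();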



On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote:

 Hi Guys,
 Could anyone explain to me how Kafka works with Spark? I am using
 JavaKafkaWordCount.java as a test, and the command line is:

 ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount
 spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3

 and as a producer I am using this command:

 rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt
 -l 100 -n 10


 rdkafka_cachesender is a program I developed which sends the content of
 output.txt to Kafka, where -l is the length of each send (upper bound) and
 -n is the number of lines to send in a row. Below is the throughput
 calculated by the program:

 File is 2235755 bytes
 throughput (b/s) = 699751388
 throughput (b/s) = 723542382
 throughput (b/s) = 662989745
 throughput (b/s) = 505028200
 throughput (b/s) = 471263416
 throughput (b/s) = 446837266
 throughput (b/s) = 409856716
 throughput (b/s) = 373994467
 throughput (b/s) = 366343097
 throughput (b/s) = 373240017
 throughput (b/s) = 386139016
 throughput (b/s) = 373802209
 throughput (b/s) = 369308515
 throughput (b/s) = 366935820
 throughput (b/s) = 365175388
 throughput (b/s) = 362175419
 throughput (b/s) = 358356633
 throughput (b/s) = 357219124
 throughput (b/s) = 352174125
 throughput (b/s) = 348313093
 throughput (b/s) = 355099099
 throughput (b/s) = 348069777
 throughput (b/s) = 348478302
 throughput (b/s) = 340404276
 throughput (b/s) = 339876031
 throughput (b/s) = 339175102
 throughput (b/s) = 327555252
 throughput (b/s) = 324272374
 throughput (b/s) = 322479222
 throughput (b/s) = 319544906
 throughput (b/s) = 317201853
 throughput (b/s) = 317351399
 throughput (b/s) = 315027978
 throughput (b/s) = 313831014
 throughput (b/s) = 310050384
 throughput (b/s) = 307654601
 throughput (b/s) = 305707061
 throughput (b/s) = 307961102
 throughput (b/s) = 296898200
 throughput (b/s) = 296409904
 throughput (b/s) = 294609332
 throughput (b/s) = 293397843
 throughput (b/s) = 293194876
 throughput (b/s) = 291724886
 throughput (b/s) = 290031314
 throughput (b/s) = 289747022
 throughput (b/s) = 289299632

 The throughput goes down after a few seconds and does not stay at the
 initial values:

 throughput (b/s) = 699751388
 throughput (b/s) = 723542382
 throughput (b/s) = 662989745

 Another question is about Spark: about 15 seconds after I start the Spark
 command, Spark keeps repeating the words counted, but my program continues
 to send words to Kafka, so I would expect the words counted in Spark to
 grow. I have attached the log from Spark.

 My setup is:

 ComputerA (rdkafka_cachesender) - ComputerB (Kafka Brokers + Zookeeper) -
 ComputerC (Spark)

 If I haven't explained this well, please reply to me.

 Thanks Guys

