Re: Publishing a transformed DStream to Kafka

2014-11-30 Thread Josh J
Is there a way to do this that preserves exactly once semantics for the
write to Kafka?

On Tue, Sep 2, 2014 at 12:30 PM, Tim Smith secs...@gmail.com wrote:

 I'd be interested in finding the answer too. Right now, I do:

 val kafkaOutMsgs = kafkInMessages.map(x => myFunc(x._2, someParam))
 kafkaOutMsgs.foreachRDD((rdd, time) => { rdd.foreach(rec => {
 writer.output(rec) }) } ) // where writer.output is a method that takes a
 string and writer is an instance of a producer class.





 On Tue, Sep 2, 2014 at 10:12 AM, Massimiliano Tomassi 
 max.toma...@gmail.com wrote:

 Hello all,
 after having applied several transformations to a DStream I'd like to
 publish all the elements in all the resulting RDDs to Kafka. What would be
 the best way to do that? Just using DStream.foreach and then RDD.foreach?
 Is there any other built-in utility for this use case?

 Thanks a lot,
 Max

 --
 
 Massimiliano Tomassi
 
 e-mail: max.toma...@gmail.com
 





Re: Publishing a transformed DStream to Kafka

2014-11-30 Thread francois . garillot
How about writing to a buffer? Then you would flush the buffer to Kafka if and
only if the output operation reports successful completion. In the event of a
worker failure, that flush would not happen.
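That buffer-and-flush idea might look roughly like the sketch below. This is only an illustration, not code from this thread: `dstream`, `KafkaWriter`, `brokers`, and the `send`/`flush`/`close` methods are all assumed names standing in for whatever producer wrapper you actually use.

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch: stage each partition's records in a local buffer, and only
// publish to Kafka once the whole partition has been read successfully.
// A worker failure before the flush point writes nothing to Kafka.
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val buffer = ArrayBuffer.empty[String]
    records.foreach(buffer += _)        // consume the partition first

    val writer = new KafkaWriter(brokers) // hypothetical producer wrapper
    try {
      buffer.foreach(writer.send)
      writer.flush()
    } finally {
      writer.close()
    }
  }
}
```

Note this gives at-most-once staging per partition but still not strict exactly-once: a failure after some records are flushed but before the batch completes can still replay the partition and produce duplicates, so downstream consumers may need idempotent handling.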



—
FG

On Sun, Nov 30, 2014 at 2:28 PM, Josh J joshjd...@gmail.com wrote:

 Is there a way to do this that preserves exactly once semantics for the
 write to Kafka?
 On Tue, Sep 2, 2014 at 12:30 PM, Tim Smith secs...@gmail.com wrote:
 I'd be interested in finding the answer too. Right now, I do:

 val kafkaOutMsgs = kafkInMessages.map(x => myFunc(x._2, someParam))
 kafkaOutMsgs.foreachRDD((rdd, time) => { rdd.foreach(rec => {
 writer.output(rec) }) } ) // where writer.output is a method that takes a
 string and writer is an instance of a producer class.





 On Tue, Sep 2, 2014 at 10:12 AM, Massimiliano Tomassi 
 max.toma...@gmail.com wrote:

 Hello all,
 after having applied several transformations to a DStream I'd like to
 publish all the elements in all the resulting RDDs to Kafka. What would be
 the best way to do that? Just using DStream.foreach and then RDD.foreach?
 Is there any other built-in utility for this use case?

 Thanks a lot,
 Max

 --
 
 Massimiliano Tomassi
 
 e-mail: max.toma...@gmail.com
 




Re: Publishing a transformed DStream to Kafka

2014-09-02 Thread Tim Smith
I'd be interested in finding the answer too. Right now, I do:

val kafkaOutMsgs = kafkInMessages.map(x => myFunc(x._2, someParam))
kafkaOutMsgs.foreachRDD((rdd, time) => { rdd.foreach(rec => {
writer.output(rec) }) } ) // where writer.output is a method that takes a
string and writer is an instance of a producer class.
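One caveat with the snippet above: `writer` is created on the driver, so it must be serializable to be shipped inside the `foreach` closure, and most Kafka producer objects are not. A common workaround (again a sketch, not code from this thread; `Producer` and `config` are assumed names) is to create one producer per partition on the executors:

```scala
// Sketch: construct the producer inside foreachPartition so it lives
// entirely on the executor and never needs to be serialized, and so one
// producer instance is reused for all records of the partition rather
// than one per record.
kafkaOutMsgs.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val writer = new Producer(config)   // hypothetical producer class
    try {
      records.foreach(rec => writer.output(rec))
    } finally {
      writer.close()
    }
  }
}
```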





On Tue, Sep 2, 2014 at 10:12 AM, Massimiliano Tomassi max.toma...@gmail.com
 wrote:

 Hello all,
 after having applied several transformations to a DStream I'd like to
 publish all the elements in all the resulting RDDs to Kafka. What would be
 the best way to do that? Just using DStream.foreach and then RDD.foreach?
 Is there any other built-in utility for this use case?

 Thanks a lot,
 Max

 --
 
 Massimiliano Tomassi
 
 e-mail: max.toma...@gmail.com