For an alternative take on a similar idea, see
https://github.com/koeninger/spark-1/tree/kafkaRdd/external/kafka/src/main/scala/org/apache/spark/rdd/kafka
An advantage of the approach I'm taking is that the lower and upper offsets
of the RDD are known in advance, so it's deterministic.
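For example, each partition of the RDD is pinned to a fixed (topic, partition, fromOffset, untilOffset) range, so recomputing a failed task rereads exactly the same messages. In the spark-streaming-kafka createRDD/OffsetRange API (which, to my knowledge, is what this work later became), a batch read over such ranges looks roughly like the sketch below; the broker address, topic name, and offsets are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}
import kafka.serializer.StringDecoder

val sc = new SparkContext(new SparkConf().setAppName("kafka-batch-read"))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

// one OffsetRange per (topic, partition); the bounds are fixed up front,
// so every (re)computation of a partition reads the same slice of the topic
val offsetRanges = Array(
  OffsetRange("events", partition = 0, fromOffset = 0L, untilOffset = 1000L),
  OffsetRange("events", partition = 1, fromOffset = 0L, untilOffset = 1000L)
)

// returns an RDD[(String, String)] of (key, message) pairs
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)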
thanks! i will take a look at your code. didn't realize there was already
something out there.
good point about upper offsets, i will add that feature to our version as
well if you don't mind.
i was thinking about making it deterministic across task failures
transparently (even if no upper offsets are provided).
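for example (just a sketch to illustrate the idea, not what our library does today; it uses the newer kafka consumer api's endOffsets, and the broker address and topic name are placeholders), the upper offsets could be resolved once on the driver when the rdd is defined and frozen from then on, so a retried task always reads the same bounded slice:

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

val props = new Properties()
props.put("bootstrap.servers", "broker1:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")

val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
val partitions = consumer.partitionsFor("events").asScala
  .map(info => new TopicPartition(info.topic, info.partition))

// endOffsets returns the next offset to be written per partition; freezing these
// as the upper bound makes every (re)computation of the rdd read the same data
val untilOffsets: Map[TopicPartition, Long] =
  consumer.endOffsets(partitions.asJava).asScala
    .map { case (tp, offset) => tp -> offset.longValue }
    .toMap
consumer.close()

// untilOffsets can then be turned into fixed offset ranges for the batch read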
hello all,
we at tresata wrote a library to provide batch integration between
spark and kafka (distributed write of rdd to kafka, distributed read of rdd
from kafka). our main use cases are (in lambda architecture jargon):
* periodic appends to the immutable master dataset on hdfs from kafka