For direct stream questions: https://github.com/koeninger/kafka-exactly-once
Yes, it is used in production.

For general Spark Streaming questions:
http://spark.apache.org/docs/latest/streaming-programming-guide.html

On Mon, Aug 10, 2015 at 7:51 AM, Mohit Durgapal <durgapalmo...@gmail.com> wrote:

> Hi All,
>
> I just wanted to know how does directAPI for spark streaming compare with
> earlier receivers based API. Has anyone used directAPI based approach on
> production or is it still being used for pocs?
>
> Also, since I'm new to spark, could anyone share a starting point from
> where I could find a working code for both of the above APIs?
>
> Also, in my use case I want to analyse a data stream(comma separated
> string) & aggregate over certain fields based on their types. Ideally I
> would like to push that aggregated data to a column family based
> datastore(like HBase, we are using it currently). But my first I'd like to
> find out how to aggregate that data and how does streaming work, whether It
> polls & fetches data in batches or does it continuously listen to the kafka
> queue for any new message. And how can I configure my application for
> either cases. I hope my questions make sense.
>
>
> Regards
> Mohit
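
On the batching and aggregation part of the question: Spark Streaming processes data in micro-batches, one per batch interval, so the aggregation runs once per batch rather than once per message. The sketch below is plain Python with no Spark or Kafka dependency; it only illustrates the per-batch logic. The record layout (event_type,user_id,amount), the function names parse_record and aggregate_batch, and the sample data are all assumptions made for illustration. In a real job the parsing step would typically be a map over the stream and the summing step a reduceByKey.

```python
from collections import defaultdict


def parse_record(line):
    """Split one comma-separated record into a (key, value) pair.

    Assumed (hypothetical) layout: event_type,user_id,amount
    """
    event_type, _user_id, amount = line.strip().split(",")
    return event_type, float(amount)


def aggregate_batch(lines):
    """Sum the amount per event_type for the records of one micro-batch."""
    totals = defaultdict(float)
    for line in lines:
        key, value = parse_record(line)
        totals[key] += value
    return dict(totals)


# Two simulated micro-batches, standing in for what each batch
# interval would deliver from the Kafka topic.
batches = [
    ["click,u1,1.0", "view,u2,2.5", "click,u3,3.0"],
    ["view,u1,0.5"],
]

# Each batch is aggregated independently of the others.
results = [aggregate_batch(batch) for batch in batches]
for i, totals in enumerate(results):
    print(i, totals)
```

Each batch yields its own totals; keeping running totals across batches would need extra state, which this sketch does not attempt.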