Re: spark-kafka directAPI vs receivers based API

Cody Koeninger Mon, 10 Aug 2015 06:15:59 -0700

For direct stream questions:

https://github.com/koeninger/kafka-exactly-once


Yes, it is used in production.


For general Spark Streaming questions:

http://spark.apache.org/docs/latest/streaming-programming-guide.html
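
On the "batches or continuous listening" part: as far as I understand it, both APIs hand data to your code in micro-batches, one per batch interval. The receiver-based API has a receiver consuming from Kafka continuously and buffering into Spark, while the direct API asks Kafka for the latest offsets at each batch and reads that offset range. Either way, the aggregation runs per batch. Below is a minimal plain-Python sketch of that per-batch aggregation, not Spark code and not taken from either link above. The record layout (event_type,country,amount), the function names, and batching by message count instead of by time interval are all assumptions made for illustration.

```python
from collections import defaultdict


def parse_record(line):
    """Split one comma-separated record into a (key, value) pair.

    Assumed (hypothetical) layout: event_type,country,amount
    """
    event_type, country, amount = line.strip().split(",")
    return (event_type, country), float(amount)


def aggregate_batch(lines):
    """Sum the amount per (event_type, country) key within one batch."""
    totals = defaultdict(float)
    for line in lines:
        key, value = parse_record(line)
        totals[key] += value
    return dict(totals)


def micro_batches(messages, batch_size):
    """Yield consecutive fixed-size slices of the message list.

    Spark Streaming cuts batches by time interval; a count-based cut is
    used here only to keep the sketch deterministic.
    """
    for start in range(0, len(messages), batch_size):
        yield messages[start:start + batch_size]


# Hypothetical sample messages standing in for a Kafka topic.
messages = [
    "click,IN,1.0",
    "click,IN,2.0",
    "view,US,5.0",
    "click,US,1.5",
    "view,US,2.5",
]

# One aggregated dict per micro-batch.
results = [aggregate_batch(batch) for batch in micro_batches(messages, 3)]
```

In a real job the parse step would roughly correspond to a map over the stream and the summing to reduceByKey on the resulting pairs, with each batch's result then written out to a store such as HBase.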


On Mon, Aug 10, 2015 at 7:51 AM, Mohit Durgapal <durgapalmo...@gmail.com>
wrote:

> Hi All,
>
> I wanted to know how the direct API for Spark Streaming compares with the
> earlier receiver-based API. Has anyone used the direct API approach in
> production, or is it still used only for proofs of concept?
>
> Also, since I'm new to Spark, could anyone share a starting point where
> I could find working code for both of the above APIs?
>
> Also, in my use case I want to analyse a data stream (comma-separated
> strings) and aggregate over certain fields based on their types. Ideally I
> would like to push the aggregated data to a column-family-based
> datastore (like HBase, which we currently use). But first I'd like to
> find out how to aggregate that data and how streaming works: whether it
> polls and fetches data in batches, or continuously listens to the Kafka
> queue for new messages, and how I can configure my application for
> either case. I hope my questions make sense.
>
>
> Regards
> Mohit
>
