Hi All,

I'd like to know how the direct API for Spark Streaming compares with the
earlier receiver-based API. Has anyone used the direct API approach in
production, or is it still mostly being used for POCs?

Also, since I'm new to Spark, could anyone point me to a starting point
where I could find working code for both of the above APIs?

Also, in my use case I want to analyse a data stream (comma-separated
strings) and aggregate over certain fields based on their types. Ideally I
would like to push the aggregated data to a column-family-based datastore
(like HBase, which we are currently using). But first I'd like to find out
how to aggregate that data and how streaming works: does it poll and fetch
data in batches, or does it continuously listen to the Kafka queue for new
messages? And how can I configure my application for either case? I hope
my questions make sense.


Regards
Mohit
