Hi all, I wanted to know how the direct API for Spark Streaming compares with the earlier receiver-based API. Has anyone used the direct API approach in production, or is it still only being used for POCs?
Also, since I'm new to Spark, could anyone point me to working code for both of the above APIs?

In my use case I want to analyse a data stream (comma-separated strings) and aggregate over certain fields based on their types. Ideally I would like to push the aggregated data to a column-family-based datastore (such as HBase, which we are currently using). But first I'd like to find out how to aggregate that data and how streaming works: does it poll and fetch data in batches, or does it continuously listen to the Kafka queue for new messages? And how can I configure my application for either case?

I hope my questions make sense.

Regards,
Mohit