Re: Create DStream consisting of HDFS and (then) Kafka data

2015-01-07 Thread Tobias Pfeiffer
Hi,

On Thu, Jan 8, 2015 at 2:19 PM, rekt...@voodoowarez.com wrote:
> dstream processing bulk HDFS data- is something I don't feel is super well socialized yet, fingers crossed that base gets built up a little more.

Just out of interest (and hoping not to hijack my own thread), why are you

Create DStream consisting of HDFS and (then) Kafka data

2015-01-07 Thread Tobias Pfeiffer
Hi, I have a setup where data from an external stream is piped into Kafka and also written to HDFS periodically for long-term storage. Now I am trying to build an application that will first process the HDFS files and then switch to Kafka, continuing with the first item that was not yet in HDFS.
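A minimal sketch of the hand-off described above, in plain Python rather than Spark, to make the "replay the archive, then continue with the first item not yet archived" logic concrete. It assumes every record carries a monotonically increasing offset shared by the HDFS copy and the Kafka copy; that assumption, the `Record` type, and the `replay_then_follow` function are all hypothetical and are not taken from the thread or from any Spark or Kafka API.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    offset: int    # assumed: position shared by the archived and live copies
    payload: str


def replay_then_follow(archived: Iterable[Record],
                       live: Iterable[Record]) -> Iterator[Record]:
    """Yield all archived records, then only the live records that come after them."""
    last_offset = -1
    # Phase 1: replay the archive and remember the highest offset seen.
    for rec in archived:
        last_offset = max(last_offset, rec.offset)
        yield rec
    # Phase 2: follow the live source, skipping anything already replayed.
    for rec in live:
        if rec.offset > last_offset:
            yield rec


# Example: the live source overlaps the archive on offsets 1 and 2.
archived = [Record(0, "a"), Record(1, "b"), Record(2, "c")]
live = [Record(1, "b"), Record(2, "c"), Record(3, "d"), Record(4, "e")]
merged = list(replay_then_follow(archived, live))
```

In a real streaming job the same idea would likely mean finding the last archived offset first and starting the Kafka consumer just past it, instead of filtering on the fly; the sketch only illustrates the boundary condition.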

Re: Create DStream consisting of HDFS and (then) Kafka data

2015-01-07 Thread rektide
I've started one or two emails to ask more broadly: what are good practices for doing DStream computations in a non-realtime fashion? I'd love to have a good intro article to pass around to people, and some resources for others chasing this problem. Back when I was working with Storm, managing

Re: Create DStream consisting of HDFS and (then) Kafka data

2015-01-07 Thread rektide
On Thu, Jan 08, 2015 at 02:33:30PM +0900, Tobias Pfeiffer wrote:
> Hi,
>
> On Thu, Jan 8, 2015 at 2:19 PM, rekt...@voodoowarez.com wrote:
> > dstream processing bulk HDFS data- is something I don't feel is super well socialized yet, fingers crossed that base gets built up a little more.