Re: Spark Streaming with long batch / window duration

2017-07-28 Thread emceemouli
Thanks. If i not use Window and choose to use Streaming the data on to HDFS, could you suggest how to only store 1 week worth of data. Should i create a cron job to delete HDFS files older than a week. PLease let me know if you have any other suggestions -- View this message in context:

Re: Spark Streaming with long batch / window duration

2014-07-21 Thread aaronjosephs
So I think I may end up using hourglass (https://engineering.linkedin.com/datafu/datafus-hourglass-incremental-data-processing-hadoop) a hadoop framework for incremental data processing, it would be very cool if spark (not streaming ) could support something like this -- View this message in

Re: Spark Streaming with long batch / window duration

2014-07-18 Thread Tathagata Das
If you want to process data that spans across weeks, then it best to use a dedicated data store (file system, sql / nosql database, etc.) that is designed for long term data storage and retrieval. Spark Streaming is not designed as a long term data store. Also it does not seem like you need low

Re: Spark Streaming with long batch / window duration

2014-07-18 Thread aaronjosephs
Unfortunately for reasons I won't go into my options for what I can use are limited, it was more of a curiosity to see if spark could handle a use case like this since the functionality I wanted fit perfectly into the reduceByKeyAndWindow frame of thinking. Anyway thanks for answering. -- View