Thanks. If I don't use a window and instead stream the data onto HDFS,
could you suggest how to store only one week's worth of data? Should I create
a cron job to delete HDFS files older than a week? Please let me know if you
have any other suggestions.
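
For reference, the cron-job idea could look roughly like the sketch below. It
assumes the streaming job writes one directory per day named dt=YYYY-MM-DD
(a hypothetical layout, not something stated in this thread) and that the
caller already has the list of directory paths. The function names are made
up for illustration; the only external command used is "hdfs dfs -rm".

```python
from datetime import date, datetime, timedelta
import subprocess

RETENTION_DAYS = 7  # keep one week of data


def expired_partitions(paths, today, retention_days=RETENTION_DAYS):
    """Return the paths whose last component is dt=YYYY-MM-DD and whose
    date is strictly older than `today - retention_days`."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for path in paths:
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if not name.startswith("dt="):
            continue  # not a daily partition (assumed naming), leave it alone
        try:
            day = datetime.strptime(name[3:], "%Y-%m-%d").date()
        except ValueError:
            continue  # unparseable date, leave it alone
        if day < cutoff:
            expired.append(path)
    return expired


def delete_from_hdfs(paths):
    """Delete each path recursively with the HDFS command-line client."""
    for path in paths:
        subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", path],
                       check=True)


if __name__ == "__main__":
    # Hypothetical paths; in a real job they would come from an HDFS listing.
    candidates = ["/data/events/dt=2015-03-10", "/data/events/dt=2015-03-19"]
    print(expired_partitions(candidates, date.today()))
```

A daily cron entry would call expired_partitions and pass the result to
delete_from_hdfs. Anything that does not match the assumed naming is skipped
rather than deleted.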
--
View this message in context:
So I think I may end up using Hourglass
(https://engineering.linkedin.com/datafu/datafus-hourglass-incremental-data-processing-hadoop),
a Hadoop framework for incremental data processing. It would be very cool if
Spark (not Streaming) could support something like this.
--
If you want to process data that spans weeks, then it is best to use a
dedicated data store (file system, sql / nosql database, etc.) that is
designed for long term data storage and retrieval. Spark Streaming is not
designed as a long term data store. Also it does not seem like you need low
Unfortunately, for reasons I won't go into, my options for what I can use are
limited. It was more of a curiosity to see if Spark could handle a use case
like this, since the functionality I wanted fit perfectly into the
reduceByKeyAndWindow frame of thinking. Anyway, thanks for answering.
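
To make the "reduceByKeyAndWindow frame of thinking" concrete, here is a rough
plain-Python sketch (not Spark code) of a per-key sliding sum that is updated
incrementally: each new batch is added in and the batch that falls out of the
window is subtracted, which is the idea behind the inverse-function form of
reduceByKeyAndWindow. The class name and the daily batches are hypothetical,
and counts are assumed to be positive integers.

```python
from collections import Counter, deque


class SlidingKeyedSum:
    """Per-key sum over the last `window_len` batches, updated incrementally."""

    def __init__(self, window_len):
        self.window_len = window_len
        self.batches = deque()   # batches currently inside the window
        self.totals = Counter()  # running per-key totals for the window

    def push(self, batch):
        """Add one batch (a dict of key -> count) and return the window totals."""
        batch = dict(batch)
        self.batches.append(batch)
        for key, value in batch.items():
            self.totals[key] += value          # reduce the new batch in
        if len(self.batches) > self.window_len:
            old = self.batches.popleft()
            for key, value in old.items():
                self.totals[key] -= value      # inverse-reduce the old batch out
                if self.totals[key] == 0:
                    del self.totals[key]       # drop keys that left the window
        return dict(self.totals)


if __name__ == "__main__":
    window = SlidingKeyedSum(window_len=7)     # e.g. seven daily batches
    print(window.push({"user_a": 3, "user_b": 1}))
```

With one batch per day and window_len=7, the totals always cover the most
recent week without recomputing over all seven days on every update.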
--