Hi TD, regarding the performance of updateStateByKey, do you have a
JIRA for it so we can watch it? Thank you!
From: Tathagata Das [mailto:t...@databricks.com]
Sent: Wednesday, April 15, 2015 8:09 AM
To: Krzysztof Zarzycki
Cc: user
Subject: Re: Is it
Can you clarify what you want to do after querying? Is the batch
not completed until the querying and subsequent processing have completed?
On Tue, Apr 14, 2015 at 10:36 PM, Krzysztof Zarzycki k.zarzy...@gmail.com
wrote:
Thank you Tathagata, very helpful answer.
Though, I would like
Fundamentally, stream processing systems are designed for processing
streams of data, not for storing large volumes of data for a long period of
time. So if you have to maintain that much state for months, then it's best
to use another system that is designed for long-term storage (like
Cassandra).
Thank you Tathagata, very helpful answer.
Though, I would like to highlight that recent stream processing systems are
trying to help users implement use cases that hold such large state (like 2
months of data). I would mention here Samza state management