This does sound like a good use case for mapWithState. Note that Spark 2.2 adds a similar [flat]MapGroupsWithState operation to Structured Streaming. Stay tuned for a blog post on that!
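To make the per-message bookkeeping from the question quoted below concrete, here is a minimal sketch in plain Python with no Spark dependency. It only illustrates the logic that a per-key state update function (the kind you would hand to mapWithState or [flat]MapGroupsWithState) might carry. The names Progress, update, and count_completed are hypothetical, the three required states are an assumption taken from the example in the question, and "ordered" is interpreted here as visiting s1, s2, s3 as a subsequence of the message's events.

```python
from dataclasses import dataclass, field

# Assumed for illustration: the states a message must visit to be counted.
REQUIRED_STATES = ("s1", "s2", "s3")


@dataclass
class Progress:
    """Per-message state, keyed by message id in the state store."""
    seen: set = field(default_factory=set)  # states visited, any order
    next_index: int = 0                     # next state expected, in order
    done_unordered: bool = False            # already counted (any order)?


def update(progress, state, required=REQUIRED_STATES):
    """Apply one (message, state) event to the message's progress.

    Returns (unordered_increment, ordered_increment), each 0 or 1.
    Each message contributes at most once to each counter.
    """
    unordered_inc = 0
    ordered_inc = 0

    # Any-order variant: count once when every required state has been seen.
    if state in required:
        progress.seen.add(state)
        if not progress.done_unordered and len(progress.seen) == len(required):
            progress.done_unordered = True
            unordered_inc = 1

    # Ordered variant: advance only when the expected next state arrives.
    if progress.next_index < len(required) and state == required[progress.next_index]:
        progress.next_index += 1
        if progress.next_index == len(required):
            ordered_inc = 1

    return unordered_inc, ordered_inc


def count_completed(events, required=REQUIRED_STATES):
    """Fold a stream of (message_id, state) events into the two counters.

    The dict stands in for the keyed state that the streaming engine
    would otherwise maintain across batches.
    """
    store = {}
    unordered_total = 0
    ordered_total = 0
    for message_id, state in events:
        progress = store.setdefault(message_id, Progress())
        u, o = update(progress, state, required)
        unordered_total += u
        ordered_total += o
    return unordered_total, ordered_total


events = [
    ("m1", "s1"), ("m2", "s2"), ("m1", "s2"),
    ("m2", "s1"), ("m1", "s3"), ("m2", "s3"),
]
unordered, ordered = count_completed(events)
print(unordered, ordered)
```

In this sample, m1 visits the states in order while m2 visits all of them out of order, so the two counters differ. In a real job the per-message state would also need a timeout or other cleanup policy, since the transitions can take an indefinite amount of time.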
On Thu, Jun 29, 2017 at 6:11 PM, kant kodali <kanth...@gmail.com> wrote:

> Is mapWithState an answer for this?
> https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html
>
> On Thu, Jun 29, 2017 at 11:55 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi All,
>>
>> Here is a problem, and I am wondering if Spark Streaming is the right tool
>> for it.
>>
>> I have a stream of messages m1, m2, m3, ... and each of those messages can
>> be in state s1, s2, s3, ..., sn (you can imagine the number of states is
>> about 100). I want to compute some metrics over messages that visit all
>> the states from s1 to sn, but these state transitions can take an
>> indefinite amount of time. A simple example would be to count all messages
>> that have visited states s1, s2, and s3. In other words, the transition
>> function should know that, say, message m1 has visited states s1 and s2
>> but not s3 yet, and once message m1 visits s3 it should increment the
>> counter by 1.
>>
>> If it makes anything easier, I can say a message has to visit s1 before
>> visiting s2, and s2 before visiting s3, and so on, but I would like to
>> know how to do this both with and without ordering.
>>
>> Thanks!