Re: [Spark Streaming] How to join two messages in Spark Streaming (probably the messages are in different RDDs)?

Tathagata Das Tue, 06 Dec 2016 22:17:15 -0800

This sounds like something you can solve with a stateful operator. Check out
mapWithState. If both messages can be keyed with a common key, then you
can define a keyed state. The state will have a field for the first
message. When you see the first message for a key, fill that field with its
timestamp, etc. Then, when the second message with the same key arrives, Spark
Streaming will ensure that it calls your state update function with the old
state (i.e. with the first message filled in), and you can take the time
difference.
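
Here is a rough sketch of what that could look like in Scala. The Message
case class, its field names, and the PairMessages object are made up for
illustration; only State, StateSpec and mapWithState come from Spark
Streaming. It assumes exactly two messages per key, that the second arrives
after the first, and that timestamps are in milliseconds.

```scala
import org.apache.spark.streaming.{State, StateSpec}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical message shape: a shared key plus an event timestamp (ms).
case class Message(key: String, timestampMs: Long)

object PairMessages {

  // Pure pairing logic, kept separate from Spark so it is easy to test.
  // Input: the stored first timestamp (if any) and the incoming timestamp.
  // Output: (state to keep, time difference to emit).
  def pair(stored: Option[Long], incomingTs: Long): (Option[Long], Option[Long]) =
    stored match {
      // First message for this key: remember its timestamp, emit nothing.
      case None => (Some(incomingTs), None)
      // Second message: emit the difference and clear the state.
      case Some(firstTs) => (None, Some(incomingTs - firstTs))
    }

  // State update function in the shape mapWithState expects:
  // (key, new value if any, state handle) => emitted record.
  val trackPair: (String, Option[Message], State[Long]) => Option[(String, Long)] =
    (key, msg, state) =>
      msg.flatMap { m =>
        val (next, elapsed) = pair(state.getOption(), m.timestampMs)
        next match {
          case Some(ts) => state.update(ts)
          case None     => if (state.exists()) state.remove()
        }
        elapsed.map(diff => (key, diff))
      }

  // Key the stream, run the stateful operator, and keep only completed pairs.
  def elapsedPerKey(messages: DStream[Message]): DStream[(String, Long)] =
    messages
      .map(m => (m.key, m))
      .mapWithState(StateSpec.function(trackPair))
      .flatMap(_.toList)
}
```

Note that mapWithState needs checkpointing enabled on the StreamingContext
(ssc.checkpoint(...)). If the second message may never arrive, you would
probably also want to set a timeout on the StateSpec so that stale keys get
dropped; the sketch above does not do that.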


Check out my blog post:
https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html

On Tue, Dec 6, 2016 at 5:50 PM, sancheng <sanchuanch...@gmail.com> wrote:

> any valuable feedback is appreciated!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-do-join-two-messages-in-spark-streaming-Probabaly-messasges-are-in-differnet--tp28161p28163.html
