The result of updateStateByKey is a DStream that emits the entire state every
batch, as an ordinary RDD. There is nothing special about it.
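
To make "emits the entire state every batch" concrete, here is a rough
plain-Python model of the semantics. It is not Spark code: the function
update_state_by_key, the helper track_presence and the sample events are all
made up for illustration. It only assumes what the thread already relies on:
the update function sees the new values for a key plus the previous state,
and returning None drops the key.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Dict[str, str]
UpdateFunc = Callable[[List[str], Optional[str]], Optional[str]]


def update_state_by_key(state: State,
                        batch: Iterable[Tuple[str, str]],
                        update_func: UpdateFunc) -> State:
    """Model one batch: return the complete new state, not just the changes."""
    # Group this batch's values by key.
    grouped: Dict[str, List[str]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state: State = {}
    # Every key is visited: keys with old state but no new values get [].
    for key in set(state) | set(grouped):
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:  # None "pops" the key from the state
            new_state[key] = updated
    return new_state


def track_presence(new_events: List[str], previous: Optional[str]) -> Optional[str]:
    """Hypothetical update function: keep members that are inside a geofence."""
    latest = new_events[-1] if new_events else previous
    return None if latest == "exit" else latest


after_first = update_state_by_key(
    {}, [("alice", "enter"), ("bob", "enter")], track_presence)
after_second = update_state_by_key(
    after_first, [("bob", "exit")], track_presence)

print(after_first)   # alice and bob are both tracked
print(after_second)  # only alice remains; bob was popped by returning None
```

Note that alice is still in the second snapshot even though she sent nothing
in that batch, which is what makes the output usable as a lookup table.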

It is easy to join or cogroup it with another RDD if you have the correct keys
in both. You could load that other RDD when the job starts and/or have it
updated with updateStateByKey as well, based on streaming updates from
Cassandra.
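
As a sketch of what the join buys you, here is the same idea modelled with
plain Python dicts instead of RDDs. The names join_by_key, state_snapshot,
client_info and daily_limit are invented for the example, and a dict has
unique keys, whereas a pair-RDD join would emit one row per matching pair. In
a real job I would expect this to be a DStream transform that calls join on
each batch RDD, but treat that as an assumption to check against your Spark
version.

```python
from typing import Any, Dict, Tuple


def join_by_key(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Inner join of two key -> value mappings on their shared keys."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


# Hypothetical state snapshot for one batch: member -> messages sent so far.
state_snapshot = {"alice": 2, "bob": 5}

# Hypothetical reference data loaded at job start (e.g. from Cassandra).
client_info = {
    "alice": {"daily_limit": 3},
    "bob": {"daily_limit": 5},
    "carol": {"daily_limit": 1},  # no state yet, so dropped by the inner join
}

joined = join_by_key(state_snapshot, client_info)

# Example logic on the joined rows: may this member receive another message?
can_send = {member: sent < info["daily_limit"]
            for member, (sent, info) in joined.items()}

print(can_send)  # alice -> True, bob -> False
```

The important part is that both sides are keyed the same way (here by
member); everything after the join is ordinary per-record logic.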

> On 22 Oct 2015, at 12:54, Arttii <a.topch...@reply.de> wrote:
> 
> Hi,
> 
> So I am working on a use case where clients are walking in and out of
> geofences and sending messages based on that.
> I currently have some in-memory broadcast vars to do certain lookups for
> client and geofence info; some of this is also coming from Cassandra.
> My current quandary is that I need to support the case where a user comes in
> and out of a geofence, and also track how many messages have already been
> sent and do some logic based on that.
> 
> My stream is basically a bunch of JSON objects:
> {
> member:""
> beacon
> state:"exit","enter"
> }
> 
> 
> This information is invalidated at certain intervals, say messages once a
> day and geofences every few minutes. At first I wondered whether broadcast
> vars would be good for this, but the data gets updated a lot, so I do not
> think I can periodically rebroadcast it from the driver.
> 
> So I was thinking this might be a perfect case for updateStateByKey, as I
> can track what is going on, and also track the time inside the values and
> return None to "pop" things.
> 
> Currently I cannot wrap my head around how to use this stream in
> conjunction with other info that is coming in as DStreams or RDDs. All the
> examples for updateStateByKey basically do something to a stream, call
> updateStateByKey, and then foreach over the result and persist it in a
> store. I don't think writing to and reading from Cassandra on every batch
> to get this info is a good idea, because I might get stale info.
> 
> Is this a valid case, or am I missing the point and use case of this
> function?
> 
> Thanks,
> Artyom
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-StreamingStatefull-information-tp25160.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 