Hi all,

As I understand from the docs and talks, streaming state is held in memory as an RDD (optionally checkpointed to disk). SPARK-2629 hints that this in-memory structure is not indexed efficiently; is that right?
I am wondering how performance would hold up if the streaming state does not fit in memory (say, 100 GB of state with 10 GB of total RAM) and I make random updates to different keys via updateStateByKey. Would adding SSDs help? I picture performance degrading in a way akin to Linux buffer cache or InnoDB buffer pool thrashing. If someone can demystify this, that would be awesome.

Thanks,
Vinoth
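
P.S. To make the access pattern concrete, here is a toy pure-Python model (no Spark involved) of how I understand a single updateStateByKey batch. It assumes what the programming guide describes: the update function is applied to every existing key, whether or not the batch has new data for it. The names update_state_by_key and update_count are my own illustration, not Spark APIs, and this is only a sketch of the semantics, not of Spark's implementation.

```python
from collections import defaultdict


def update_count(new_values, old_state):
    # Running sum per key. Returning None would drop the key from the state.
    return (old_state or 0) + sum(new_values)


def update_state_by_key(state, batch, update_fn):
    """Toy model of one micro-batch (hypothetical helper, not a Spark API)."""
    # Group the batch's values by key.
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    calls = 0
    # Every key in the old state is visited, even with no new data,
    # along with any keys that appear for the first time in this batch.
    for key in set(state) | set(grouped):
        calls += 1
        result = update_fn(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state, calls


# 1000 existing keys, but the batch touches only two of them plus one new key.
state = {f"key{i}": 0 for i in range(1000)}
batch = [("key3", 1), ("key3", 1), ("key977", 5), ("brand_new", 2)]

state, calls = update_state_by_key(state, batch, update_count)
print(calls)
```

If that model is right, each batch walks the whole state no matter how few keys are updated, which is why I suspect a state much larger than RAM would be read back from disk on every batch.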