Hi all,

As I understand it from the docs and talks, streaming state is held in memory
as an RDD (optionally checkpointed to disk). SPARK-2629 seems to hint that
this in-memory structure is not indexed efficiently. Is that the case?

I am wondering how performance would hold up if the streaming state does not
fit in memory (say, 100GB of state against 10GB of total RAM) and I do random
updates to different keys via updateStateByKey. Would adding SSDs help?

I picture some kind of performance degradation, akin to buffer cache
thrashing in Linux or InnoDB. If someone can demystify this, that would be
awesome.
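To make the worry concrete, here is a toy model of the thrashing I have in
mind. It is plain Python, not Spark, and it assumes (my assumption, not a
description of Spark internals) that state is paged in and out like an LRU
buffer cache and that each update hits a uniformly random key. The function
name lru_hit_rate is made up for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_rate(num_keys, cache_slots, num_updates, seed=0):
    """Fraction of random key updates served from an LRU cache (toy model)."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys in recency order, oldest first
    hits = 0
    for _ in range(num_updates):
        key = rng.randrange(num_keys)  # uniformly random key, like random updates
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True  # "page in" the state for this key
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / num_updates


# 10x more state than memory (like 100GB vs 10GB): roughly 1 in 10 updates hits
print(lru_hit_rate(100_000, 10_000, 200_000))
```

Under this model, once the cache is warm the hit rate settles near
cache_slots / num_keys, so about 90% of updates would have to go to disk.
That is the case where I would expect SSDs to matter most.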

Thanks
Vinoth
