Re: How to handle spark state which is growing too big even with timeout set.

2021-02-14 Thread Jungtaek Lim
For now, if the state doesn't fit in executor memory, you may want to consider a third-party RocksDB state store implementation (either an open source one, or a commercial one if you use Databricks or Qubole). - https://github.com/chermenin/spark-states -
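
As a rough sketch of what switching the state store looks like: Spark picks the state store implementation from the spark.sql.streaming.stateStore.providerClass setting, so a third-party provider is normally enabled by adding its jar to the application and pointing that setting at its provider class. The class name below is what I understand the spark-states project to use; treat it as an assumption and check it against that project's README for the version you deploy.

```properties
# spark-defaults.conf (or pass the same key via --conf on spark-submit)
# Assumption: provider class name as published by the spark-states project; verify before use.
spark.sql.streaming.stateStore.providerClass  ru.chermenin.spark.sql.execution.streaming.state.RocksDbStateStoreProvider
```

The provider should be chosen before the query first starts: an existing checkpoint was written by the previous provider and is generally not readable by a different one.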

How to handle spark state which is growing too big even with timeout set.

2021-02-11 Thread Kuttaiah Robin
Hello, I have a use case where I need to read non-correlated events from a Kafka topic, correlate them, and push them forward to another topic. I use Spark Structured Streaming with FlatMapGroupsWithStateFunction along with GroupStateTimeout.ProcessingTimeTimeout(). After each timeout, I do some