For now, if the state doesn't fit in executor memory, you may want to
consider a third-party RocksDB state store implementation: either an
open-source one, or a commercial one if you use Databricks or Qubole.
- https://github.com/chermenin/spark-states
-
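Plugging in an alternative state store comes down to setting Spark's
`spark.sql.streaming.stateStore.providerClass` option. The sketch below
assumes the spark-states jar is already on the driver and executor
classpath. The provider class name is taken from that project's README;
treat it as an assumption and check it against the version you use.

```java
import org.apache.spark.sql.SparkSession;

public class RocksDbStateStoreExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("correlate-events")
            // Replace the default HDFSBackedStateStoreProvider, which keeps
            // state in executor memory, with a RocksDB-backed provider that
            // keeps state on local disk.
            // Assumption: class name as published by chermenin/spark-states.
            .config("spark.sql.streaming.stateStore.providerClass",
                    "ru.chermenin.spark.sql.execution.streaming.state.RocksDbStateStoreProvider")
            .getOrCreate();

        // Define the Kafka source and the flatMapGroupsWithState query as usual;
        // no changes to the stateful function itself are needed.
        spark.stop();
    }
}
```

The provider is chosen when the query starts. Set it before the first run
against a given checkpoint location, because existing state written by
another provider is not converted automatically.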
Hello,
I have a use case where I need to read uncorrelated events from a Kafka
topic, correlate them, and push the results to another topic.
I use Spark Structured Streaming with FlatMapGroupsWithStateFunction along
with GroupStateTimeout.ProcessingTimeTimeout(). After each timeout, I do
some