Jungtaek Lim created SPARK-37224:
------------------------------------

             Summary: Optimize write path on RocksDB state store provider
                 Key: SPARK-37224
                 URL: https://issues.apache.org/jira/browse/SPARK-37224
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Jungtaek Lim


We figured out that RocksDB class does additional lookup on the key for write 
operations (put/delete) to track the number of rows. This is required to 
fulfill the metric of the state store, but after benchmarking it turns out 
performance hit is significant.

We can't find a good alternative to retain the number of rows without 
additional lookup, so we are proposing a new config to flag tracking the number 
of rows.
 * *config name: spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows*

 * *default value: true* (since we already serve the number and we want to 
avoid breaking change)

*We will give "0" for the number of keys in the state store metric when the 
config is turned off.* The ideal value seems to be a negative one, but 
currently SQL metric doesn't allow negative value and there seems to be some 
technical/historical issue not to.

*We will also handle the case the config is flipped during restart* - this will 
enable the way end users enjoy the benefit but also not lose the chance to know 
the number of state rows. End users can turn off the flag to maximize the 
performance, and turn on the flag (restart required) when they want to see the 
actual number of keys (for observability/debugging/etc).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to