Jungtaek Lim created SPARK-37224: ------------------------------------ Summary: Optimize write path on RocksDB state store provider Key: SPARK-37224 URL: https://issues.apache.org/jira/browse/SPARK-37224 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Jungtaek Lim
We figured out that RocksDB class does additional lookup on the key for write operations (put/delete) to track the number of rows. This is required to fulfill the metric of the state store, but after benchmarking it turns out performance hit is significant. We can't find a good alternative to retain the number of rows without additional lookup, so we are proposing a new config to flag tracking the number of rows. * *config name: spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows* * *default value: true* (since we already serve the number and we want to avoid breaking change) *We will give "0" for the number of keys in the state store metric when the config is turned off.* The ideal value seems to be a negative one, but currently SQL metric doesn't allow negative value and there seems to be some technical/historical issue not to. *We will also handle the case the config is flipped during restart* - this will enable the way end users enjoy the benefit but also not lose the chance to know the number of state rows. End users can turn off the flag to maximize the performance, and turn on the flag (restart required) when they want to see the actual number of keys (for observability/debugging/etc). -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org