[ https://issues.apache.org/jira/browse/SPARK-37224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446142#comment-17446142 ]
Apache Spark commented on SPARK-37224:
--------------------------------------

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/34652

> Optimize write path on RocksDB state store provider
> ---------------------------------------------------
>
>                 Key: SPARK-37224
>                 URL: https://issues.apache.org/jira/browse/SPARK-37224
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.3.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.3.0
>
>
> We figured out that the RocksDB class does an additional lookup on the key
> for write operations (put/delete) to track the number of rows. This is
> required to fulfill the state store's metric, but benchmarking shows the
> performance hit is significant.
>
> We can't find a good alternative that retains the number of rows without the
> additional lookup, so we are proposing a new config to toggle tracking of
> the number of rows.
>  * *config name:
> spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows*
>  * *default value: true* (since we already serve the number and we want to
> avoid a breaking change)
>
> *When the config is turned off, we will report "0" for the number of keys in
> the state store metric.* The ideal value would be a negative one, but the
> SQL metric currently doesn't allow negative values, and there appear to be
> technical/historical reasons not to.
>
> *We will also handle the case where the config is flipped during restart* -
> this lets end users get the performance benefit without losing the ability
> to see the number of state rows. End users can turn off the flag to maximize
> performance, and turn it on (restart required) when they want to see the
> actual number of keys (for observability/debugging/etc.).
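To illustrate the tradeoff the issue describes - tracking the total number of rows forces an extra key lookup on every put/delete, while disabling it (via `spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows=false`) skips the lookups and reports 0 for the metric - here is a minimal standalone Python sketch. This is not Spark's actual RocksDB state store code; the class and counter names are hypothetical, and a plain dict stands in for RocksDB.

```python
# Hypothetical sketch (NOT Spark's implementation) of why tracking the
# total row count requires an extra lookup on each write operation.

class SketchStateStore:
    def __init__(self, track_total_number_of_rows=True):
        self._db = {}                       # stands in for RocksDB
        self._track = track_total_number_of_rows
        self._num_keys = 0
        self.lookups = 0                    # counts "get" calls, for illustration

    def _get(self, key):
        self.lookups += 1
        return self._db.get(key)

    def put(self, key, value):
        if self._track:
            # Extra lookup: increment the count only if the key is new.
            if self._get(key) is None:
                self._num_keys += 1
        self._db[key] = value

    def delete(self, key):
        if self._track:
            # Extra lookup: decrement only if the key actually existed.
            if self._get(key) is not None:
                self._num_keys -= 1
        self._db.pop(key, None)

    def num_keys(self):
        # Mirrors the proposal: report 0 when tracking is disabled.
        return self._num_keys if self._track else 0


tracked = SketchStateStore(track_total_number_of_rows=True)
untracked = SketchStateStore(track_total_number_of_rows=False)
for store in (tracked, untracked):
    store.put("a", 1)
    store.put("a", 2)   # overwrite: must not double-count "a"
    store.put("b", 3)
    store.delete("a")

print(tracked.num_keys(), tracked.lookups)      # 1 4
print(untracked.num_keys(), untracked.lookups)  # 0 0
```

With tracking on, four writes cost four extra lookups; with it off, the writes touch the store once each, at the price of the metric always reading 0 - which is the observability-vs-performance choice the config exposes.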
--
This message was sent by Atlassian Jira
(v8.20.1#820001)