[ https://issues.apache.org/jira/browse/SPARK-37224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37224:
------------------------------------

    Assignee:     (was: Apache Spark)

> Optimize write path on RocksDB state store provider
> ---------------------------------------------------
>
>                 Key: SPARK-37224
>                 URL: https://issues.apache.org/jira/browse/SPARK-37224
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.3.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> We figured out that the RocksDB class does an additional lookup on the key for 
> write operations (put/delete) in order to track the number of rows. This is 
> required to fulfill the state store metric, but benchmarking shows the 
> performance hit is significant.
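> As a rough illustration (a hypothetical class, not Spark's actual code), tracking 
> the row count turns every blind write into a read-then-write:
> {code:scala}
> import scala.collection.mutable
>
> // Illustrative sketch only: a plain in-memory map stands in for RocksDB.
> // Keeping numKeys accurate forces an existence check on every put/delete,
> // because a blind write does not report whether the key was already there.
> class TrackingStore {
>   private val db = mutable.HashMap.empty[String, Array[Byte]]
>   private var numKeys: Long = 0L
>
>   def put(key: String, value: Array[Byte]): Unit = {
>     val existed = db.contains(key)   // extra lookup, needed only for the metric
>     db.put(key, value)
>     if (!existed) numKeys += 1
>   }
>
>   def delete(key: String): Unit = {
>     val existed = db.contains(key)   // extra lookup; a plain delete gives no such signal
>     db.remove(key)
>     if (existed) numKeys -= 1
>   }
>
>   def metricNumKeys: Long = numKeys
> }
> {code}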
> We couldn't find a good alternative that retains the number of rows without the 
> additional lookup, so we propose a new config to toggle tracking of the number 
> of rows:
>  * *config name: 
> spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows*
>  * *default value: true* (since we already report the number and we want to 
> avoid a breaking change)
> *We will report "0" for the number of keys in the state store metric when the 
> config is turned off.* The ideal value would be a negative one, but SQL metrics 
> currently don't allow negative values, and there appear to be 
> technical/historical reasons for that restriction.
> *We will also handle the case where the config is flipped across a restart* - 
> this lets end users enjoy the performance benefit without losing the ability to 
> learn the number of state rows. End users can turn off the flag to maximize 
> performance, and turn it back on (restart required) when they want to see the 
> actual number of keys (for observability/debugging/etc.).
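> For reference, a minimal sketch of how an end user could flip the flag when 
> starting a query (the providerClass setting is the existing way to select the 
> RocksDB provider; the last config is the one proposed in this ticket):
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> // Sketch: disable row-count tracking to maximize write performance.
> // Flipping trackTotalNumberOfRows requires restarting the streaming query;
> // set it back to "true" when the actual number of keys is needed again.
> val spark = SparkSession.builder()
>   .appName("rocksdb-state-store-example")
>   .config("spark.sql.streaming.stateStore.providerClass",
>     "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
>   .config("spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows", "false")
>   .getOrCreate()
> {code}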



