This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b14fc2d  [SPARK-37224][SS][FOLLOWUP] Clarify the guide doc and fix the method doc
b14fc2d is described below

commit b14fc2d018dc948dc48579c88748d8af34d549e2
Author: Jungtaek Lim <kabhwan.opensou...@gmail.com>
AuthorDate: Fri Nov 19 11:00:35 2021 +0900

    [SPARK-37224][SS][FOLLOWUP] Clarify the guide doc and fix the method doc
    
    ### What changes were proposed in this pull request?
    
    This PR is a follow-up of #34502 to address post-review comments.
    
    This PR rewords the explanation of performance tuning for the RocksDB state store to make it less confusing, and also fixes the method docs so that they are in sync with the code changes.
    
    ### Why are the changes needed?
    
    1. The explanation of performance tuning for the RocksDB state store was unclear in a couple of spots.
    2. We changed the method signature, but the change was not reflected in the method doc.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, end users will be less confused by the explanation of performance tuning for the RocksDB state store.
    
    ### How was this patch tested?
    
    N/A
    
    Closes #34652 from HeartSaVioR/SPARK-37224-follow-up-postreview.
    
    Authored-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
    Signed-off-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
---
 docs/structured-streaming-programming-guide.md                       | 5 +++--
 .../org/apache/spark/sql/execution/streaming/state/RocksDB.scala     | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 9547d46..a53adde 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1965,9 +1965,10 @@ Here are the configs regarding to RocksDB instance of the state store provider:
 
 ##### Performance-aspect considerations
 
-1. For write-heavy workloads, you may want to disable the track of total number of rows.
+1. You may want to disable the track of total number of rows to aim the better performance on RocksDB state store.
+
+Tracking the number of rows brings additional lookup on write operations - you're encouraged to try turning off the config on tuning RocksDB state store, especially the values of metrics for state operator are big - `numRowsUpdated`, `numRowsRemoved`.
 
-Tracking the number of rows brings additional lookup on write operations - for heavy-write workloads you're encouraged to turn off the config.
 You can change the config during restarting the query, which enables you to change the trade-off decision on "observability vs performance".
 If the config is disabled, the number of rows in state (`numTotalStateRows`) will be reported as 0.
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
index cb31945..ea25342 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
@@ -153,7 +153,7 @@ class RocksDB(
   }
 
   /**
-   * Put the given value for the given key and return the last written value.
+   * Put the given value for the given key.
    * @note This update is not committed to disk until commit() is called.
    */
   def put(key: Array[Byte], value: Array[Byte]): Unit = {
@@ -167,7 +167,7 @@ class RocksDB(
   }
 
   /**
-   * Remove the key if present, and return the previous value if it was present (null otherwise).
+   * Remove the key if present.
    * @note This update is not committed to disk until commit() is called.
    */
   def remove(key: Array[Byte]): Unit = {
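
Note on the trade-off described in the guide change above: tracking the total number of rows costs an extra lookup on each write, and `numTotalStateRows` is reported as 0 when tracking is off. The following is a minimal Scala sketch of that trade-off only. `ToyStateStore`, its `lookups` counter, and the constructor flag are hypothetical names made up for illustration; this is not Spark's `RocksDB` class and does not reflect how Spark implements the tracking. In Spark itself the behavior is controlled by a state store config, which the guide's RocksDB section lists (believed to be `spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows`; please confirm against the guide for your Spark version).

```scala
import scala.collection.mutable

// Hypothetical toy store (NOT Spark's RocksDB class); it only models the
// "observability vs performance" trade-off described in the guide.
final class ToyStateStore(trackTotalNumberOfRows: Boolean) {
  private val data = mutable.HashMap.empty[String, String]
  private var rows = 0L
  // Counts the extra reads performed solely to keep the row count accurate.
  var lookups = 0L

  def put(key: String, value: String): Unit = {
    if (trackTotalNumberOfRows) {
      lookups += 1                       // extra lookup: is this key new?
      if (!data.contains(key)) rows += 1
    }
    data.update(key, value)
  }

  def remove(key: String): Unit = {
    if (trackTotalNumberOfRows) {
      lookups += 1                       // extra lookup: does this key exist?
      if (data.contains(key)) rows -= 1
    }
    data.remove(key)
  }

  // Mirrors the documented behavior: reported as 0 when tracking is disabled.
  def numTotalStateRows: Long = if (trackTotalNumberOfRows) rows else 0L
}

object ToyStateStoreDemo {
  def main(args: Array[String]): Unit = {
    val tracked = new ToyStateStore(trackTotalNumberOfRows = true)
    val untracked = new ToyStateStore(trackTotalNumberOfRows = false)
    for (store <- Seq(tracked, untracked)) {
      store.put("a", "1")
      store.put("a", "2") // overwrite: row count must not change
      store.put("b", "3")
      store.remove("b")
    }
    println(s"tracked:   rows=${tracked.numTotalStateRows}, lookups=${tracked.lookups}")
    println(s"untracked: rows=${untracked.numTotalStateRows}, lookups=${untracked.lookups}")
  }
}
```

In this toy model the tracked store pays one lookup per `put` or `remove`, which is why the guide suggests trying the config when `numRowsUpdated` and `numRowsRemoved` are large.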
