[GitHub] [spark] siying commented on a diff in pull request #41042: [SPARK-43364][SS] Add docs for RocksDB state store memory management

via GitHub Thu, 04 May 2023 11:59:58 -0700


siying commented on code in PR #41042:
URL: https://github.com/apache/spark/pull/41042#discussion_r1185399331



##########
docs/structured-streaming-programming-guide.md:
##########
@@ -2360,8 +2360,35 @@ Here are the configs regarding to RocksDB instance of 
the state store provider:
     <td>The maximum number of MemTables in RocksDB, both active and immutable. 
Value of -1 means that RocksDB internal default values will be used</td>
     <td>-1</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage</td>
+    <td>Whether total memory usage for RocksDB state store instances on a 
single node is bounded.</td>
+    <td>false</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB</td>
+    <td>Total memory limit in MB for RocksDB state store instances on a single 
node.</td>
+    <td>500</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.writeBufferCacheRatio</td>
+    <td>Total memory to be occupied by write buffers as a fraction of memory 
allocated across all RocksDB instances on a single node.</td>

Review Comment:
   Perhaps state explicitly that the base of this fraction is 
`maxMemoryUsageMB`, although that may already be obvious to some users.
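
To illustrate why naming the base matters, here is a minimal sketch of the arithmetic, assuming both ratios are taken as fractions of `maxMemoryUsageMB`. The function name and the returned keys are hypothetical and are not part of Spark; the defaults are the ones shown in the diff above.

```python
def rocksdb_memory_budget_mb(max_memory_usage_mb=500,
                             write_buffer_cache_ratio=0.5,
                             high_priority_pool_ratio=0.1):
    """Hypothetical helper: turn the ratios into MB budgets.

    Assumption: each ratio is a fraction of max_memory_usage_mb, which is
    the base suggested in the review comment above.
    """
    for name, ratio in (("writeBufferCacheRatio", write_buffer_cache_ratio),
                        ("highPriorityPoolRatio", high_priority_pool_ratio)):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {ratio}")
    return {
        "write_buffers_mb": max_memory_usage_mb * write_buffer_cache_ratio,
        "high_priority_pool_mb": max_memory_usage_mb * high_priority_pool_ratio,
    }


budget = rocksdb_memory_budget_mb()
```

Under that assumption, the defaults in this diff would give roughly 250 MB for write buffers and roughly 50 MB for the high-priority pool out of the 500 MB limit.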



##########
docs/structured-streaming-programming-guide.md:
##########
@@ -2360,8 +2360,35 @@ Here are the configs regarding to RocksDB instance of 
the state store provider:
     <td>The maximum number of MemTables in RocksDB, both active and immutable. 
Value of -1 means that RocksDB internal default values will be used</td>
     <td>-1</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage</td>
+    <td>Whether total memory usage for RocksDB state store instances on a 
single node is bounded.</td>
+    <td>false</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB</td>
+    <td>Total memory limit in MB for RocksDB state store instances on a single 
node.</td>
+    <td>500</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.writeBufferCacheRatio</td>
+    <td>Total memory to be occupied by write buffers as a fraction of memory 
allocated across all RocksDB instances on a single node.</td>
+    <td>0.5</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.highPriorityPoolRatio</td>
+    <td>Total memory to be occupied by filter and index blocks as a fraction 
of memory allocated across all RocksDB instances on a single node.</td>
+    <td>0.1</td>

Review Comment:
   I didn't realize the default is 0.1. Unless we have a dedicated study that 
supports another value, 0.5 is a good value to start with.



##########
docs/structured-streaming-programming-guide.md:
##########
@@ -2360,8 +2360,35 @@ Here are the configs regarding to RocksDB instance of 
the state store provider:
     <td>The maximum number of MemTables in RocksDB, both active and immutable. 
Value of -1 means that RocksDB internal default values will be used</td>
     <td>-1</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage</td>
+    <td>Whether total memory usage for RocksDB state store instances on a 
single node is bounded.</td>
+    <td>false</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB</td>
+    <td>Total memory limit in MB for RocksDB state store instances on a single 
node.</td>
+    <td>500</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.writeBufferCacheRatio</td>
+    <td>Total memory to be occupied by write buffers as a fraction of memory 
allocated across all RocksDB instances on a single node.</td>
+    <td>0.5</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.highPriorityPoolRatio</td>
+    <td>Total memory to be occupied by filter and index blocks as a fraction 
of memory allocated across all RocksDB instances on a single node.</td>

Review Comment:
   It is more than filter and index blocks. This is the size of the 
high-priority pool, which is used for mid-point insertion. Blocks are first 
inserted into the low-priority pool and are promoted to the high-priority pool 
on their second access. Filter and index blocks can go directly to the 
high-priority pool.
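
The behaviour described above can be sketched with a toy two-pool LRU cache. This is only an illustration of mid-point insertion, not RocksDB's implementation: the class name `MidpointLRU` is hypothetical, capacities are counted in entries rather than bytes, and the demotion of high-priority overflow into the low-priority pool is an assumption of the sketch.

```python
from collections import OrderedDict


class MidpointLRU:
    """Toy two-pool LRU cache; each OrderedDict keeps its oldest entry first."""

    def __init__(self, capacity, high_pri_ratio):
        self.capacity = capacity
        self.high_capacity = int(capacity * high_pri_ratio)
        self.low = OrderedDict()
        self.high = OrderedDict()

    def _rebalance(self):
        # Overflow from the high-priority pool is demoted to the low pool.
        while len(self.high) > self.high_capacity:
            key, value = self.high.popitem(last=False)
            self.low[key] = value
        # Evict the oldest low-priority entries while over total capacity.
        while len(self.low) + len(self.high) > self.capacity and self.low:
            self.low.popitem(last=False)

    def insert(self, key, value, high_priority=False):
        # Drop any stale copy, then place the entry in the requested pool.
        self.low.pop(key, None)
        self.high.pop(key, None)
        (self.high if high_priority else self.low)[key] = value
        self._rebalance()

    def lookup(self, key):
        if key in self.high:
            self.high.move_to_end(key)
            return self.high[key]
        if key in self.low:
            # A repeated access promotes the block to the high-priority pool.
            value = self.low.pop(key)
            self.high[key] = value
            self._rebalance()
            return value
        return None


cache = MidpointLRU(capacity=4, high_pri_ratio=0.5)
for block in ("a", "b", "c"):
    cache.insert(block, block.upper())       # data blocks start in the low pool
cache.lookup("a")                            # second access promotes "a"
cache.insert("index", "IDX", high_priority=True)  # goes straight to the high pool
cache.insert("d", "D")                       # cache is full, so "b" is evicted
```

In this run, "a" and "index" end up in the high-priority pool, while "b", the oldest block that was never accessed again, is the one evicted.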



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


