siying commented on code in PR #41042:
URL: https://github.com/apache/spark/pull/41042#discussion_r1185399331
##########
docs/structured-streaming-programming-guide.md:
##########
@@ -2360,8 +2360,35 @@ Here are the configs regarding to RocksDB instance of the state store provider:
     <td>The maximum number of MemTables in RocksDB, both active and immutable. Value of -1 means that RocksDB internal default values will be used</td>
     <td>-1</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage</td>
+    <td>Whether total memory usage for RocksDB state store instances on a single node is bounded.</td>
+    <td>false</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB</td>
+    <td>Total memory limit in MB for RocksDB state store instances on a single node.</td>
+    <td>500</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.writeBufferCacheRatio</td>
+    <td>Total memory to be occupied by write buffers as a fraction of memory allocated across all RocksDB instances on a single node.</td>

Review Comment:
   Perhaps state explicitly that the base of this fraction is `maxMemoryUsageMB`, although that may be obvious to some users.

##########
docs/structured-streaming-programming-guide.md:
##########
@@ -2360,8 +2360,35 @@ Here are the configs regarding to RocksDB instance of the state store provider:
     <td>The maximum number of MemTables in RocksDB, both active and immutable. Value of -1 means that RocksDB internal default values will be used</td>
     <td>-1</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage</td>
+    <td>Whether total memory usage for RocksDB state store instances on a single node is bounded.</td>
+    <td>false</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB</td>
+    <td>Total memory limit in MB for RocksDB state store instances on a single node.</td>
+    <td>500</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.writeBufferCacheRatio</td>
+    <td>Total memory to be occupied by write buffers as a fraction of memory allocated across all RocksDB instances on a single node.</td>
+    <td>0.5</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.highPriorityPoolRatio</td>
+    <td>Total memory to be occupied by filter and index blocks as a fraction of memory allocated across all RocksDB instances on a single node.</td>
+    <td>0.1</td>

Review Comment:
   I didn't realize the default is 0.1. Unless we have a dedicated study supporting it, 0.5 is a good value to start with.

##########
docs/structured-streaming-programming-guide.md:
##########
@@ -2360,8 +2360,35 @@ Here are the configs regarding to RocksDB instance of the state store provider:
     <td>The maximum number of MemTables in RocksDB, both active and immutable. Value of -1 means that RocksDB internal default values will be used</td>
     <td>-1</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.boundedMemoryUsage</td>
+    <td>Whether total memory usage for RocksDB state store instances on a single node is bounded.</td>
+    <td>false</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.maxMemoryUsageMB</td>
+    <td>Total memory limit in MB for RocksDB state store instances on a single node.</td>
+    <td>500</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.writeBufferCacheRatio</td>
+    <td>Total memory to be occupied by write buffers as a fraction of memory allocated across all RocksDB instances on a single node.</td>
+    <td>0.5</td>
+  </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.highPriorityPoolRatio</td>
+    <td>Total memory to be occupied by filter and index blocks as a fraction of memory allocated across all RocksDB instances on a single node.</td>

Review Comment:
   It is more than filter and index blocks. It's the size of the high-priority pool, which is used for mid-point insertion. Blocks are first inserted into the low-priority pool and promoted to the high-priority pool on their second access. Filter and index blocks can go directly to the high-priority pool.
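To make the point about the base of the ratios concrete, here is a minimal sketch of the arithmetic, assuming (as the first comment suggests) that both `writeBufferCacheRatio` and `highPriorityPoolRatio` are fractions of `maxMemoryUsageMB`. The function name `bounded_memory_budget` is hypothetical and is not part of Spark or RocksDB; the default arguments are simply the defaults shown in the diff.

```python
MIB = 1024 * 1024


def bounded_memory_budget(max_memory_usage_mb=500,
                          write_buffer_cache_ratio=0.5,
                          high_priority_pool_ratio=0.1):
    """Split a per-node memory limit using the two ratios from the diff.

    Assumption: both ratios use maxMemoryUsageMB as their base. This is
    an illustration of the arithmetic only, not Spark's implementation.
    """
    ratios = {
        "writeBufferCacheRatio": write_buffer_cache_ratio,
        "highPriorityPoolRatio": high_priority_pool_ratio,
    }
    for name, ratio in ratios.items():
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {ratio}")

    total_bytes = max_memory_usage_mb * MIB
    return {
        "total_bytes": total_bytes,
        # Share of the limit reserved for write buffers (MemTables).
        "write_buffer_bytes": int(total_bytes * write_buffer_cache_ratio),
        # Share of the limit used as the high-priority pool, which holds
        # promoted blocks as well as filter and index blocks.
        "high_priority_pool_bytes": int(total_bytes * high_priority_pool_ratio),
    }


if __name__ == "__main__":
    for key, value in bounded_memory_budget().items():
        print(f"{key}: {value / MIB:.1f} MiB")
```

Under this reading, raising `highPriorityPoolRatio` from 0.1 to 0.5, as the second comment proposes, would enlarge the high-priority pool without changing the overall limit.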