[
https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196076#comment-17196076
]
Stephen O'Donnell commented on HDDS-3630:
-----------------------------------------
Have a look at HDDS-4246 - it seems there is only one 8MB cache shared by all
RocksDBs related to datanode containers.
Looking at the rocksDB manual, one key memory user is the "write buffer size"
https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#write-buffer-size
{quote}
It represents the amount of data to build up in memory (backed by an unsorted
log on disk) before converting to a sorted on-disk file. The default is 64 MB.
You need to budget for 2 x your worst case memory use. If you don't have enough
memory for this, you should reduce this value. Otherwise, it is not recommended
to change this option.
{quote}
It seems to me that this default of 64MB is set up for high write throughput,
which is probably the usual use case for RocksDB. However, for datanode
containers I doubt RocksDB is really stressed, especially for closed
containers. What if we:
1. Reduced this value significantly - eg to 1MB?
2. Reduced it significantly for only closed containers?
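A minimal sketch of option 1 using the rocksdbjni bindings (the 1MB figure and
the DB path are just illustrative; this assumes the native RocksDB library is
on the classpath):

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class SmallWriteBufferSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options options = new Options().setCreateIfMissing(true)) {
      // Shrink the memtable from the 64MB default to 1MB. A closed
      // container sees almost no writes, so a small buffer should be
      // enough there; an open container may want a larger value.
      options.setWriteBufferSize(1L * 1024 * 1024);
      try (RocksDB db = RocksDB.open(options, "/tmp/container-db")) {
        db.put("key".getBytes(), "value".getBytes());
      }
    }
  }
}
```

Option 2 would be the same call, but choosing the buffer size based on the
container state when the DB is opened.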
There are also some other interesting Rocks DB options. You can configure a
"Write Buffer Manager" and give it a target size for all RocksDB instances /
column families related to write buffers, and then all open instances will
share this. You can also make it be part of the LRU cache:
https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager
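Roughly, the shared Write Buffer Manager would look like this in the Java API
(cache and buffer sizes are illustrative, not tuned values, and this assumes a
rocksdbjni version that exposes WriteBufferManager):

```java
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.WriteBufferManager;

public class SharedWriteBufferSketch {
  public static void main(String[] args) {
    RocksDB.loadLibrary();
    // One LRU cache shared by every RocksDB instance on the datanode.
    Cache sharedCache = new LRUCache(256L * 1024 * 1024);
    // Charge all memtable memory against the shared cache, with a 64MB
    // budget across every instance / column family.
    WriteBufferManager writeBufferManager =
        new WriteBufferManager(64L * 1024 * 1024, sharedCache);
    Options options = new Options()
        .setCreateIfMissing(true)
        .setWriteBufferManager(writeBufferManager);
    // Pass the same manager and cache to every RocksDB.open() call so
    // all container DBs draw from the one budget.
  }
}
```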
And you can have the Index and Filter blocks cached in the LRU cache too via
the option - cache_index_and_filter_blocks.
Therefore, if we created a large shared LRU cache, used a shared Write Buffer
Manager that stores the memtables inside that LRU cache, and also cached the
Index and Filter blocks there, perhaps we could constrain the RocksDB memory
within reasonable bounds.
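Putting the three pieces together, a sketch of options that route block cache,
memtables, and index/filter blocks through one shared budget (sizes are
illustrative; setBlockCache(Cache) assumes a recent rocksdbjni):

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.WriteBufferManager;

public class BoundedMemorySketch {
  public static void main(String[] args) {
    RocksDB.loadLibrary();
    // Single memory budget for all RocksDB instances on the datanode.
    Cache sharedCache = new LRUCache(512L * 1024 * 1024);
    // Memtables are charged against the shared cache.
    WriteBufferManager writeBufferManager =
        new WriteBufferManager(128L * 1024 * 1024, sharedCache);
    // Index and Filter blocks also live in the shared cache rather than
    // being pinned per-instance outside it.
    BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
        .setBlockCache(sharedCache)
        .setCacheIndexAndFilterBlocks(true);
    Options options = new Options()
        .setCreateIfMissing(true)
        .setWriteBufferManager(writeBufferManager)
        .setTableFormatConfig(tableConfig);
    // Reusing these options for every container DB should keep total
    // RocksDB memory near the single cache's 512MB capacity.
  }
}
```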
It would be good to experiment with some of these options before jumping into a
major refactor to use a single RocksDB per disk or other major changes.
> Merge rocksdb in datanode
> -------------------------
>
> Key: HDDS-3630
> URL: https://issues.apache.org/jira/browse/HDDS-3630
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Attachments: Merge RocksDB in Datanode-v1.pdf, Merge RocksDB in
> Datanode-v2.pdf
>
>
> Currently, there is one rocksdb per container. One container has 5GB
> capacity, so 10TB of data needs more than 2000 rocksdb instances on one
> datanode. It is difficult to limit the memory of 2000 rocksdb instances, so
> maybe we should limit the number of rocksdb instances per disk.
> The design of the improvement is in the following link, but it is still a draft.
> TODO:
> 1. compatibility with current logic i.e. one rocksdb for each container
> 2. measure the memory usage before and after improvement
> 3. effect on efficiency of read and write.
> https://docs.google.com/document/d/18Ybg-NjyU602c-MYXaJHP6yrg-dVMZKGyoK5C_pp1mM/edit#
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]