[
https://issues.apache.org/jira/browse/HDDS-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-12903:
-------------------------------
Description:
Currently, when OM lists keys, it creates a RocksDB iterator that loads both
the RocksDB key and value.
For a key with many blocks (for example, an MPU key with a few hundred parts),
this can take a considerable amount of OM heap memory. These blocks are loaded
into OM memory even when we use listKeysLight, which removes the key block
info.
One possible way to handle this is to split the keyspace and blockspace into
two column families. The keyspace CF will store only the basic key
info (similar to BasicOmKeyInfo), while the blockspace CF stores the blocks
associated with the key. Therefore, during list, we will load only the
BasicOmKeyInfo, which will result in lower memory overhead.
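
A minimal sketch of the split described above, using plain Python dicts as
stand-ins for the two column families. This is an illustration only, not Ozone
or RocksDB code: SplitKeyStore, BasicKeyInfo, BlockLocation, and their fields
are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BasicKeyInfo:
    """Hypothetical lightweight record held in the keyspace CF."""
    name: str
    data_size: int
    modification_time: int


@dataclass(frozen=True)
class BlockLocation:
    """Hypothetical per-block record held in the blockspace CF."""
    container_id: int
    local_id: int
    length: int


class SplitKeyStore:
    """Toy model: one dict per column family (assumption, not a real DB)."""

    def __init__(self) -> None:
        self.keyspace: Dict[str, BasicKeyInfo] = {}
        self.blockspace: Dict[str, List[BlockLocation]] = {}

    def put(self, info: BasicKeyInfo, blocks: List[BlockLocation]) -> None:
        # The same key name indexes both column families.
        self.keyspace[info.name] = info
        self.blockspace[info.name] = list(blocks)

    def list_keys(self, prefix: str, limit: int) -> List[BasicKeyInfo]:
        # Listing scans only the keyspace CF; block lists are never read.
        names = sorted(n for n in self.keyspace if n.startswith(prefix))
        return [self.keyspace[n] for n in names[:limit]]
```

In this model, listing a prefix that contains a key with hundreds of blocks
returns only the small BasicKeyInfo records, whatever the size of the
corresponding blockspace entries.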
The downside is that we now have two CFs to update or query for key get,
creation, update, and deletion. For get, we can use RocksDB multiget to read
from both the keyspace and blockspace CFs.
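
The two-CF read and write paths could look roughly like the sketch below. It
models the database with in-memory dicts; ToyDB, multi_get, write_batch,
put_key, get_key, and delete_key are hypothetical names for this example and
are not the RocksDB or Ozone API. The assumption is that both CFs are written
in one batch so that a key never exists in only one of them.

```python
from typing import Dict, List, Optional, Tuple

KEYSPACE = "keyspace"
BLOCKSPACE = "blockspace"


class ToyDB:
    """In-memory stand-in for a database with two column families."""

    def __init__(self) -> None:
        self.cfs: Dict[str, Dict[str, bytes]] = {KEYSPACE: {}, BLOCKSPACE: {}}

    def multi_get(self, requests: List[Tuple[str, str]]) -> List[Optional[bytes]]:
        # One call answers lookups against several column families.
        return [self.cfs[cf].get(key) for cf, key in requests]

    def write_batch(self, ops: List[Tuple[str, str, Optional[bytes]]]) -> None:
        # Validate every op first so a bad op leaves both CFs untouched.
        for cf, _, _ in ops:
            if cf not in self.cfs:
                raise KeyError(f"unknown column family: {cf}")
        for cf, key, value in ops:
            if value is None:
                self.cfs[cf].pop(key, None)  # None marks a delete
            else:
                self.cfs[cf][key] = value


def put_key(db: ToyDB, name: str, basic_info: bytes, blocks: bytes) -> None:
    # Creation and update touch both CFs in a single batch.
    db.write_batch([(KEYSPACE, name, basic_info), (BLOCKSPACE, name, blocks)])


def get_key(db: ToyDB, name: str) -> Optional[Tuple[bytes, bytes]]:
    # A full get needs both halves; fetch them with one multi-get.
    info, blocks = db.multi_get([(KEYSPACE, name), (BLOCKSPACE, name)])
    if info is None:
        return None
    return (info, blocks if blocks is not None else b"")


def delete_key(db: ToyDB, name: str) -> None:
    # Deletion removes the entry from both CFs in a single batch.
    db.write_batch([(KEYSPACE, name, None), (BLOCKSPACE, name, None)])
```

A list operation would bypass get_key entirely and iterate only the keyspace
CF, which is where the memory saving comes from.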
This is inspired by the Namespace and Block layers of the Tectonic filesystem
(https://www.usenix.org/system/files/fast21-pan.pdf).
was:
Currently, when OM lists keys, it creates a RocksDB iterator that loads both
the RocksDB key and value.
For a key with many blocks (for example, an MPU key with a few hundred parts),
this can take a considerable amount of OM heap memory. These blocks are loaded
into OM memory even when we use listKeysLight, which removes the key block
info.
One possible way to handle this is to split the keyspace and blockspace into
two column families. The keyspace CF will store only the basic key
info (similar to BasicOmKeyInfo), while the blockspace CF stores the blocks
associated with the key. Therefore, during list, we will load only the
BasicOmKeyInfo, which will result in lower memory overhead.
The downside is that we now have two CFs to update or query for key get,
creation, update, and deletion. For get, we can use RocksDB multiget to read
from both the keyspace and blockspace CFs.
This is inspired by the Namespace and Block layers of the Tectonic filesystem.
> Separate OM namespace and blockspace into separate Column Families
> ------------------------------------------------------------------
>
> Key: HDDS-12903
> URL: https://issues.apache.org/jira/browse/HDDS-12903
> Project: Apache Ozone
> Issue Type: Wish
> Components: Ozone Manager
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major