[ https://issues.apache.org/jira/browse/HDDS-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze updated HDDS-8128:
-----------------------------
    Component/s: db
                     (was: OM)
    Description:
In a multipart upload test, the key "testKey" had 1000 parts of 8 KB each. The same key was uploaded 10 times sequentially (i.e. each upload overwrote the previous one) in a newly formatted cluster. The replication was 3, so the total raw size of the key is ~ 24 MB. After the test completed, the OM RocksDB used ~ 7.5 GB.

In this JIRA, we add a cache to RDBBatchOperation for deduplication. Within a batch, the put-ops and delete-ops of the same key can be safely deduplicated. Only the last op has to be applied to the db. All the previous ops can be discarded.

  was:In a multipart upload test, the key "testKey" had 1000 parts of 8 KB each. The same key was uploaded 10 times sequentially (i.e. each upload overwrote the previous one) in a newly formatted cluster. The replication was 3, so the total raw size of the key is ~ 24 MB. After the test completed, the OM RocksDB used ~ 7.5 GB.

        Summary: Deduplicate the ops in RDBBatchOperation  (was: OM rocksdb uses a lot of space)

In this JIRA, we will focus on RDBBatchOperation deduplication, where RDBBatchOperation is a utility class used everywhere, including OM, SCM, DN, etc. Filed HDDS-8238 for further work specific to OM.

> Deduplicate the ops in RDBBatchOperation
> ----------------------------------------
>
>                 Key: HDDS-8128
>                 URL: https://issues.apache.org/jira/browse/HDDS-8128
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: db
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Blocker
>              Labels: pull-request-available
>
> In a multipart upload test, the key "testKey" had 1000 parts of 8 KB each.
> The same key was uploaded 10 times sequentially (i.e. each upload overwrote
> the previous one) in a newly formatted cluster. The replication was 3, so the
> total raw size of the key is ~ 24 MB. After the test completed, the OM
> RocksDB used ~ 7.5 GB.
>
> In this JIRA, we add a cache to RDBBatchOperation for deduplication. Within
> a batch, the put-ops and delete-ops of the same key can be safely
> deduplicated. Only the last op has to be applied to the db. All the
> previous ops can be discarded.
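
The deduplication idea described in the issue can be illustrated with a minimal Java sketch. This is not the actual RDBBatchOperation code: the class name DedupBatchSketch, its methods, and the use of a plain Map as a stand-in for the db are all assumptions made for illustration, and the sketch ignores column families. It keeps only the most recent put or delete per key and applies just those surviving ops on commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of per-batch op deduplication.
 * NOT the real RDBBatchOperation; names and structure are assumptions.
 */
final class DedupBatchSketch {
  /** Last pending op per key; a null value marks a delete. */
  private final Map<String, byte[]> lastOpByKey = new LinkedHashMap<>();
  /** Number of earlier ops that were superseded within this batch. */
  private int discarded;

  void put(String key, byte[] value) {
    record(key, value.clone());
  }

  void delete(String key) {
    record(key, null);
  }

  private void record(String key, byte[] valueOrNull) {
    // An earlier op on the same key is superseded and never reaches the db.
    if (lastOpByKey.containsKey(key)) {
      discarded++;
    }
    lastOpByKey.put(key, valueOrNull);
  }

  int pendingOps() {
    return lastOpByKey.size();
  }

  int discardedOps() {
    return discarded;
  }

  /**
   * Applies the surviving ops to a stand-in store (a Map here, an
   * assumption replacing the real db) and returns how many were applied.
   */
  int commit(Map<String, byte[]> store) {
    int applied = 0;
    for (Map.Entry<String, byte[]> e : lastOpByKey.entrySet()) {
      if (e.getValue() == null) {
        store.remove(e.getKey());
      } else {
        store.put(e.getKey(), e.getValue());
      }
      applied++;
    }
    lastOpByKey.clear();
    return applied;
  }
}
```

With this sketch, ten puts of "testKey" in one batch leave a single pending op, so only one write would reach the store instead of ten. Whether the real cache is keyed this way is not stated in the issue.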