[ 
https://issues.apache.org/jira/browse/HDDS-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDDS-8128:
-----------------------------
    Component/s: db
                     (was: OM)
    Description: 
In a multipart upload test, the key "testKey" had 1000 parts of 8 KB each.  
The same key was uploaded 10 times sequentially (i.e. each upload overwrote the 
previous one) in a newly formatted cluster.  The replication factor was 3, so 
the total raw size of the key is ~24 MB (1000 x 8 KB x 3).  After the test 
completed, the OM RocksDB used ~7.5 GB.

In this JIRA, we add a cache to RDBBatchOperation for deduplication.  Within a 
batch, the put-ops and delete-ops of the same key can be safely deduplicated.  
Only the last op has to be applied to the db.  All the previous ops can be 
discarded.
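
The deduplication idea above can be sketched roughly as follows.  This is a 
minimal illustration and not the actual RDBBatchOperation code: the class 
name DedupBatch, its methods, the String keys/values and the in-memory Map 
standing in for the db are all hypothetical.  It also assumes the batch only 
contains plain puts and deletes that are applied atomically.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch of per-batch deduplication; not the Ozone code. */
final class DedupBatch {
  // Latest op per key: a present value is a put, an empty Optional is a delete.
  private final Map<String, Optional<String>> ops = new LinkedHashMap<>();
  private int discarded;

  void put(String key, String value) {
    record(key, Optional.of(value));
  }

  void delete(String key) {
    record(key, Optional.empty());
  }

  private void record(String key, Optional<String> op) {
    // Map.put returns the previous op for this key, which is now obsolete.
    if (ops.put(key, op) != null) {
      discarded++;
    }
  }

  int size() {
    return ops.size();
  }

  int discardedCount() {
    return discarded;
  }

  /** Applies only the surviving ops to the stand-in store, then clears them. */
  void commit(Map<String, String> db) {
    for (Map.Entry<String, Optional<String>> e : ops.entrySet()) {
      if (e.getValue().isPresent()) {
        db.put(e.getKey(), e.getValue().get());
      } else {
        db.remove(e.getKey());
      }
    }
    ops.clear();
  }

  public static void main(String[] args) {
    Map<String, String> db = new TreeMap<>();   // stand-in for the real db
    DedupBatch batch = new DedupBatch();
    batch.put("testKey/part1", "v1");
    batch.put("testKey/part1", "v2");           // supersedes v1
    batch.delete("testKey/part2");
    batch.put("testKey/part2", "v3");           // supersedes the delete
    System.out.println("ops kept: " + batch.size()
        + ", discarded: " + batch.discardedCount());
    batch.commit(db);
    System.out.println(db);
  }
}
```

In the demo, four ops are recorded but only the last op per key is applied at 
commit.  A real implementation would presumably key the cache on the column 
family plus the key bytes rather than on a String.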

  was:In a multipart upload test, the key "testKey" had 1000-parts with 8KB 
each.  The same key was uploaded 10 times sequentially (i.e. it overwrote the 
previous upload) in a newly formatted cluster.  The replication was 3, so the 
total raw size of the key is ~ 24 MB.  After the test has completed, OM rocks 
db uses ~ 7.5 GB.

        Summary: Deduplicate the ops in RDBBatchOperation  (was: OM rocksdb 
uses a lot of space)

In this JIRA, we will focus on RDBBatchOperation deduplication.  
RDBBatchOperation is a utility class used by many components, including OM, 
SCM and DN.

Filed HDDS-8238 for further work specific to OM.

> Deduplicate the ops in RDBBatchOperation
> ----------------------------------------
>
>                 Key: HDDS-8128
>                 URL: https://issues.apache.org/jira/browse/HDDS-8128
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: db
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Blocker
>              Labels: pull-request-available
>
> In a multipart upload test, the key "testKey" had 1000 parts of 8 KB each.  
> The same key was uploaded 10 times sequentially (i.e. each upload overwrote 
> the previous one) in a newly formatted cluster.  The replication factor was 
> 3, so the total raw size of the key is ~24 MB (1000 x 8 KB x 3).  After the 
> test completed, the OM RocksDB used ~7.5 GB.
> In this JIRA, we add a cache to RDBBatchOperation for deduplication.  Within 
> a batch, the put-ops and delete-ops of the same key can be safely 
> deduplicated.  Only the last op has to be applied to the db.  All the 
> previous ops can be discarded.


