[ 
https://issues.apache.org/jira/browse/HDDS-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-3658:
-----------------------------
    Summary: Stop to persist container related pipeline info of each key into 
OM DB to reduce DB size  (was: Stop persist container related pipeline info of 
each key into OM DB to reduce DB size)

> Stop to persist container related pipeline info of each key into OM DB to 
> reduce DB size
> ----------------------------------------------------------------------------------------
>
>                 Key: HDDS-3658
>                 URL: https://issues.apache.org/jira/browse/HDDS-3658
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>
> Investigation results for serialized key size, using RATIS with three replicas.  
> The following examples are quoted from the output of the "ozone sh key info" 
> command, which doesn't show the related pipeline information for each key 
> location element. 
> 1.  empty key, serialized size 113 bytes
> hadoop/bucket/user/root/terasort/10G-input-7/_SUCCESS
> {
>   "volumeName" : "hadoop",
>   "bucketName" : "bucket",
>   "name" : "user/root/terasort/10G-input-7/_SUCCESS",
>   "dataSize" : 0,
>   "creationTime" : "2019-11-21T13:53:11.330Z",
>   "modificationTime" : "2019-11-21T13:53:11.361Z",
>   "replicationType" : "RATIS",
>   "replicationFactor" : 3,
>   "ozoneKeyLocations" : [ ],
>   "metadata" : { },
>   "fileEncryptionInfo" : null
> }
> 2.  key with one chunk of data, serialized size 661 bytes
> hadoop/bucket/user/root/terasort/10G-input-6/part-m-00037
> {
>   "volumeName" : "hadoop",
>   "bucketName" : "bucket",
>   "name" : "user/root/terasort/10G-input-6/part-m-00037",
>   "dataSize" : 223696200,
>   "creationTime" : "2019-11-18T07:47:58.254Z",
>   "modificationTime" : "2019-11-18T07:53:52.066Z",
>   "replicationType" : "RATIS",
>   "replicationFactor" : 3,
>   "ozoneKeyLocations" : [ {
>     "containerID" : 7,
>     "localID" : 103157811003588713,
>     "length" : 223696200,
>     "offset" : 0
>   } ],
>   "metadata" : { },
>   "fileEncryptionInfo" : null
> }
> 3.  key with two chunks of data, serialized size 1205 bytes
> ozone sh key info hadoop/bucket/user/root/terasort/10G-input-7/part-m-00027
> {
>   "volumeName" : "hadoop",
>   "bucketName" : "bucket",
>   "name" : "user/root/terasort/10G-input-7/part-m-00027",
>   "dataSize" : 223696200,
>   "creationTime" : "2019-11-21T13:47:07.653Z",
>   "modificationTime" : "2019-11-21T13:53:07.964Z",
>   "replicationType" : "RATIS",
>   "replicationFactor" : 3,
>   "ozoneKeyLocations" : [ {
>     "containerID" : 221,
>     "localID" : 103176210196201501,
>     "length" : 134217728,
>     "offset" : 0
>   }, {
>     "containerID" : 222,
>     "localID" : 103176231767375926,
>     "length" : 89478472,
>     "offset" : 0
>   } ],
>   "metadata" : { },
>   "fileEncryptionInfo" : null
> }
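
The three measurements above can be compared to see how much each key location adds to the serialized size. This is a minimal sketch using only the byte counts quoted in the description; the variable and function names are illustrative, not taken from the Ozone code base.

```python
# Serialized sizes (bytes) quoted in the description,
# keyed by the number of key locations in the key.
serialized_size = {0: 113, 1: 661, 2: 1205}


def per_location_cost(sizes):
    """Return the size increase for each additional key location."""
    counts = sorted(sizes)
    return [sizes[b] - sizes[a] for a, b in zip(counts, counts[1:])]


costs = per_location_cost(serialized_size)
print(costs)  # [548, 544]
```

Each location adds roughly 545 bytes, while the four integer fields shown per location (containerID, localID, length, offset) would need only a few tens of bytes. Attributing the remainder to the pipeline information that the command output omits is an inference from these numbers, not a separate measurement.
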
> When a client reads a key, a "refreshPipeline" option controls whether 
> to fetch the up-to-date container location info from SCM. 
> Currently, this option is always set to true, which makes the container 
> location info saved in OM DB useless. 
> Another motivation: when using Nanda's tool for the OM performance test 
> with 1 billion keys, each key with 1 replica and metadata for 2 chunks, 
> the total RocksDB directory size is 65.5GB.  One of our customer 
> clusters has a requirement to store 10 billion objects.  In this case, the DB 
> size is approximately (65.5GB * 10 / 2 * 3) ~ 1TB. 
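
The estimate in the paragraph above can be reproduced step by step. This is a sketch under assumptions: the description gives only the factors 10, 2, and 3, so reading them as "ten times the keys", "one chunk per key instead of two", and "three replicas instead of one" is an interpretation, and the function and parameter names are illustrative.

```python
def estimate_db_size_gb(measured_gb, key_scale, chunk_scale, replica_scale):
    """Scale a measured DB size linearly by key, chunk, and replica counts."""
    return measured_gb * key_scale * chunk_scale * replica_scale


estimate_gb = estimate_db_size_gb(
    measured_gb=65.5,   # measured: 1 billion keys, 1 replica, 2 chunks per key
    key_scale=10,       # 10 billion keys instead of 1 billion
    chunk_scale=1 / 2,  # assumption: one chunk per key instead of two
    replica_scale=3,    # assumption: three replicas instead of one
)
print(estimate_gb)  # 982.5
```

The result, 982.5GB, matches the "~ 1TB" figure. The linear scaling is itself an approximation, since part of each key's serialized size does not depend on the number of chunks or replicas.
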
> The goal of this task is to discard the container location info when 
> persisting a key to OM DB, to save DB space.
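
The idea of dropping pipeline details before a key is written can be illustrated with a small sketch. Everything here is hypothetical: the "pipeline" field and its contents are invented for illustration (the description does not show the stored layout), and JSON stands in for whatever serialization OM actually uses.

```python
import json


def strip_pipeline(key_info):
    """Return a copy of key_info whose locations carry no pipeline details."""
    stripped = dict(key_info)
    stripped["ozoneKeyLocations"] = [
        {k: v for k, v in loc.items() if k != "pipeline"}
        for loc in key_info["ozoneKeyLocations"]
    ]
    return stripped


key_info = {
    "volumeName": "hadoop",
    "bucketName": "bucket",
    "name": "user/root/terasort/10G-input-6/part-m-00037",
    "ozoneKeyLocations": [{
        "containerID": 7,
        "localID": 103157811003588713,
        "length": 223696200,
        "offset": 0,
        # Hypothetical pipeline record; the real layout is not shown in the issue.
        "pipeline": {"id": "pipeline-1", "nodes": ["dn1", "dn2", "dn3"]},
    }],
}

full_size = len(json.dumps(key_info))
slim_size = len(json.dumps(strip_pipeline(key_info)))
print(full_size, slim_size)  # the stripped record is smaller
```

The block identifiers (containerID, localID, length, offset) are kept, so a reader can still ask SCM for the current pipeline of each container at read time, which is what happens anyway when "refreshPipeline" is always true.
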



