[ https://issues.apache.org/jira/browse/HDDS-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113494#comment-17113494 ]
Bharat Viswanadham edited comment on HDDS-3354 at 5/21/20, 7:59 PM:
--------------------------------------------------------------------
{quote}That's very interesting. Do you have more findings about the root cause? How many buckets did you have? It's very surprising to have GC pauses for a few thousand buckets, especially as they are not frequently updated. Do we need to adjust something on the cache size?{quote}
Initially, the test was started with the default heap size settings. Once OM was started with an 8 GB heap, full GC pauses were no longer seen. As mentioned, we identified a few areas for improvement, such as HDDS-3615 and HDDS-3623.

The bucket count is not on the order of thousands; it is 14 million buckets. The command used in the test is {{$ bin/ozone freon ombg -n=1000000}}.

Even during the cache design, we expected that a single OM cluster would have on the order of 1000 volumes, that each volume would have on the order of 1000 buckets, and that the cluster would hold billions of keys. Since volume/bucket existence checks are done for every request, we decided to keep the volume/bucket cache in memory the whole time, as a full cache. We can revisit this decision if we want, but that is not related to this Jira. (This came out as part of testing, when creating a million buckets to test this Jira.)

Snippet from the cache design doc attached to HDDS-505:
{noformat}
Memory Usage:
As discussed above, for the Volume and Bucket Table we store the full table
information in memory. This helps validate requests very quickly, since for
every request Ozone Manager receives, the mandatory check is whether the
volume/bucket exists or not.

On a typical Ozone cluster, volumes can number in the thousands (considering
volume creation an admin-level operation in a system where each
team/organization gets a volume for their usage), and for each volume we can
expect 1000 to 10000 buckets. These numbers are used just for calculation
purposes.

Let's assume each VolumeInfo and BucketInfo structure consumes 1KB in memory.
Then:
  Volume cache memory usage can be 1000 * 1KB = 1 MB.
  Bucket cache memory usage can be 1000 * 1000 * 1KB = 1 GB.

We can make the Volume and Bucket Table caches partial if the number of
buckets and volumes in the system is very high. This can be given as an
option to the end user. For now we assume that the entire list of volumes
and buckets can be safely cached in memory.
{noformat}
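To make the full-cache idea concrete, below is a minimal sketch in Java of what the design doc describes: keep every volume and bucket name in memory so the per-request existence check is a map lookup rather than a RocksDB read. All names here ({{FullTableCache}}, {{volumeExists}}, {{bucketExists}}) are hypothetical illustrations, not the actual OM table cache API; the real implementation in Ozone differs.

{code:java}
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal sketch of the "full cache" idea from the design doc: every
 * volume and bucket name stays in memory for the lifetime of the
 * process, so the mandatory per-request existence check is an in-memory
 * lookup instead of a RocksDB read. Hypothetical names, not Ozone's API.
 */
public class FullTableCache {

  // Keys mirror OM's naming convention: "/volume" and "/volume/bucket".
  private final ConcurrentHashMap<String, Boolean> volumes =
      new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, Boolean> buckets =
      new ConcurrentHashMap<>();

  public void addVolume(String volume) {
    volumes.put("/" + volume, Boolean.TRUE);
  }

  public void addBucket(String volume, String bucket) {
    buckets.put("/" + volume + "/" + bucket, Boolean.TRUE);
  }

  public boolean volumeExists(String volume) {
    return volumes.containsKey("/" + volume);
  }

  public boolean bucketExists(String volume, String bucket) {
    return buckets.containsKey("/" + volume + "/" + bucket);
  }

  public static void main(String[] args) {
    FullTableCache cache = new FullTableCache();
    cache.addVolume("vol1");
    cache.addBucket("vol1", "bucket1");

    // Per-request checks are O(1) in-memory lookups.
    System.out.println(cache.volumeExists("vol1"));            // true
    System.out.println(cache.bucketExists("vol1", "bucket1")); // true

    // Back-of-the-envelope sizing from the design doc
    // (1 KB assumed per VolumeInfo/BucketInfo entry):
    long volumeKB = 1_000L;           // 1000 volumes * 1KB
    long bucketKB = 1_000L * 1_000;   // 1000 volumes * 1000 buckets * 1KB
    System.out.printf("Volume cache ~%d MB%n", volumeKB / 1_000);
    System.out.printf("Bucket cache ~%d GB%n", bucketKB / 1_000_000);
  }
}
{code}

By the same 1 KB-per-entry estimate, the 14 million buckets created in this test would put the full cache on the order of 14 GB, which would be consistent with full GC pauses under a default heap (the actual per-entry footprint in practice may be much smaller).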
> OM HA replay optimization
> -------------------------
>
>                 Key: HDDS-3354
>                 URL: https://issues.apache.org/jira/browse/HDDS-3354
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>         Attachments: OM HA Replay.pdf, Screen Shot 2020-05-20 at 1.28.48 PM.png
>
> This Jira is to improve the OM HA replay scenario.
> Attached is the design document, which discusses the proposal and the issue in detail.