[ https://issues.apache.org/jira/browse/HDDS-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113494#comment-17113494 ]
Bharat Viswanadham edited comment on HDDS-3354 at 5/21/20, 7:59 PM:
--------------------------------------------------------------------
{quote}That's very interesting. Do you have more findings about the root cause? How many buckets did you have? It's very surprising to have GC pauses for a few thousand buckets, especially as they are not frequently updated. Do we need to adjust something on the cache size?{quote}
Initially, the test was started with the default heap size settings. Once OM was started with an 8 GB heap, full GC pauses were no longer seen. As mentioned, we identified a few areas for improvement, such as HDDS-3615 and HDDS-3623.

The bucket count is not on the order of thousands; it is 14 million buckets. The command used in the test is {{$ bin/ozone freon ombg -n=1000000}}.

Even during the cache design, we expected that a single OM cluster would have on the order of 1000 volumes, that each volume would have on the order of 1000 buckets, and that the cluster would hold billions of keys. Since volume/bucket existence checks are done for every request, we decided to keep the volume/bucket cache in memory the whole time, as a full cache. We can revisit this decision if we want, but that is not related to this Jira. (This came out as part of testing, when creating a million buckets to test this Jira.)

Snippet from the cache design doc attached to HDDS-505:
{noformat}
Memory Usage:
As discussed above, for the Volume and Bucket Table we store the full table
information in memory. This helps validate requests very quickly, since for
every request Ozone Manager receives, the mandatory check is whether the
volume/bucket exists or not.

On a typical Ozone cluster, volumes can number in the thousands (considering
volume creation an admin-level operation in a system where each
team/organization gets a volume for their usage), and for each volume we can
expect 1000 to 10000 buckets. These numbers are used just for calculation
purposes.

Let's assume each VolumeInfo and BucketInfo structure consumes 1KB in memory.
Then:
  Volume cache memory usage can be 1000 * 1KB = 1 MB.
  Bucket cache memory usage can be 1000 * 1000 * 1KB = 1 GB.

We can make the Volume and Bucket Table caches partial if the number of
buckets and volumes in the system is very high. This can be given as an
option to the end user. For now we assume that the entire list of volumes
and buckets can be safely cached in memory.
{noformat}
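To make the full-cache idea concrete, below is a minimal sketch in Java of what the design doc describes: keep every volume and bucket name in memory so the per-request existence check is a map lookup rather than a RocksDB read. All names here ({{FullTableCache}}, {{volumeExists}}, {{bucketExists}}) are hypothetical illustrations, not the actual OM table cache API; the real implementation in Ozone differs.

{code:java}
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal sketch of the "full cache" idea from the design doc: every
 * volume and bucket name stays in memory for the lifetime of the
 * process, so the mandatory per-request existence check is an in-memory
 * lookup instead of a RocksDB read. Hypothetical names, not Ozone's API.
 */
public class FullTableCache {

  // Keys mirror OM's naming convention: "/volume" and "/volume/bucket".
  private final ConcurrentHashMap<String, Boolean> volumes =
      new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, Boolean> buckets =
      new ConcurrentHashMap<>();

  public void addVolume(String volume) {
    volumes.put("/" + volume, Boolean.TRUE);
  }

  public void addBucket(String volume, String bucket) {
    buckets.put("/" + volume + "/" + bucket, Boolean.TRUE);
  }

  public boolean volumeExists(String volume) {
    return volumes.containsKey("/" + volume);
  }

  public boolean bucketExists(String volume, String bucket) {
    return buckets.containsKey("/" + volume + "/" + bucket);
  }

  public static void main(String[] args) {
    FullTableCache cache = new FullTableCache();
    cache.addVolume("vol1");
    cache.addBucket("vol1", "bucket1");

    // Per-request checks are O(1) in-memory lookups.
    System.out.println(cache.volumeExists("vol1"));            // true
    System.out.println(cache.bucketExists("vol1", "bucket1")); // true

    // Back-of-the-envelope sizing from the design doc
    // (1 KB assumed per VolumeInfo/BucketInfo entry):
    long volumeKB = 1_000L;           // 1000 volumes * 1KB
    long bucketKB = 1_000L * 1_000;   // 1000 volumes * 1000 buckets * 1KB
    System.out.printf("Volume cache ~%d MB%n", volumeKB / 1_000);
    System.out.printf("Bucket cache ~%d GB%n", bucketKB / 1_000_000);
  }
}
{code}

By the same 1 KB-per-entry estimate, the 14 million buckets created in this test would put the full cache on the order of 14 GB, which would be consistent with full GC pauses under a default heap (the actual per-entry footprint in practice may be much smaller).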
> OM HA replay optimization
> -------------------------
>
>                 Key: HDDS-3354
>                 URL: https://issues.apache.org/jira/browse/HDDS-3354
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>         Attachments: OM HA Replay.pdf, Screen Shot 2020-05-20 at 1.28.48 PM.png
>
> This Jira is to improve the OM HA replay scenario.
> Attached is the design document, which discusses the proposal and the issue in detail.