[ https://issues.apache.org/jira/browse/HDDS-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDDS-5679:
---------------------------------
    Labels: pull-request-available  (was: )

> Use more defensive sizeRequired for replication manager for container replication.
> -----------------------------------------------------------------------------------
>
>                 Key: HDDS-5679
>                 URL: https://issues.apache.org/jira/browse/HDDS-5679
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>
> We hit a bug when replicating a container whose actual size (about 2GB) is
> smaller than the configured container size (5GB):
> {code:java}
> 2021-08-25 19:12:31,945 [ContainerReplicationThread-4] ERROR org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Container 73446 replication was unsuccessful.
> org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=2580881408 B) is less than the container size (=5368709120 B).
>     at org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:77)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.populateContainerPathFields(KeyValueHandler.java:290)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.importContainer(KeyValueHandler.java:907)
>     at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.importContainer(ContainerController.java:139)
>     at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:90)
>     at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:135)
>     at org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
>     at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:139)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2021-08-25 19:12:31,946 [ContainerReplicationThread-4] ERROR org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: Container 73446 can't be downloaded from any of the datanodes.
> {code}
> ReplicationManager places the container replica on a datanode that it
> considers to have enough space. However, when the datanode creates the
> container replica, it checks whether at least 5GB (the container size) is
> available. So even though there is enough space for a 2GB container, the
> import fails with an out-of-space exception.
> In this case, RM should not schedule the replica on this datanode.
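
The mismatch described in this issue can be illustrated with a minimal sketch. It is not the actual Ozone code or the patch attached to this issue: the class `SizeRequiredSketch` and the methods `sizeRequired` and `datanodeAccepts` are hypothetical names, and using the larger of the replica's used bytes and the configured container size is only one assumed reading of a "more defensive sizeRequired". The byte values are taken from the log above, with "about 2GB" approximated as exactly 2 GiB.

```java
// Hypothetical illustration only; none of these names exist in Apache Ozone.
final class SizeRequiredSketch {

  /**
   * Assumed "defensive" requirement: ask the target datanode for at least a
   * full configured container, even if the replica currently holds less data.
   */
  static long sizeRequired(long replicaUsedBytes, long configuredContainerSize) {
    return Math.max(replicaUsedBytes, configuredContainerSize);
  }

  /**
   * Mirrors the datanode-side check reported in the log: the volume with the
   * most available space must be able to hold a full-size container.
   */
  static boolean datanodeAccepts(long mostAvailableVolumeBytes,
                                 long configuredContainerSize) {
    return mostAvailableVolumeBytes >= configuredContainerSize;
  }

  public static void main(String[] args) {
    long containerSize = 5368709120L;            // "container size" in the log
    long mostAvailable = 2580881408L;            // "most available space" in the log
    long replicaUsed = 2L * 1024 * 1024 * 1024;  // assumption: "about 2GB"

    // Placement based only on the replica's current size picks this datanode.
    boolean sizeBasedPlacement = mostAvailable >= replicaUsed;
    // Placement based on the defensive requirement skips it.
    boolean defensivePlacement =
        mostAvailable >= sizeRequired(replicaUsed, containerSize);

    System.out.println("size-based placement allows node: " + sizeBasedPlacement);
    System.out.println("defensive placement allows node:  " + defensivePlacement);
    System.out.println("datanode accepts import:          "
        + datanodeAccepts(mostAvailable, containerSize));
  }
}
```

With these inputs the size-based check allows the node while the datanode-side check rejects the import, which matches the failure in the log. Under the defensive requirement the placement decision and the datanode check agree, so the replica would not be scheduled to this datanode.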