[ https://issues.apache.org/jira/browse/HDDS-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng updated HDDS-4343:
----------------------------
    Description: 
{code:java}
      // If there are unhealthy replicas, then we should remove them even if it
      // makes the container violate the placement policy, as excess unhealthy
      // containers are not really useful. It will be corrected later as a
      // mis-replicated container will be seen as under-replicated.
      for (ContainerReplica r : unhealthyReplicas) {
        if (excess > 0) {
          sendDeleteCommand(container, r.getDatanodeDetails(), true);
          excess -= 1;
        }
        break;
      }
      // After removing all unhealthy replicas, if the container is still over
      // replicated then we need to check if it is already mis-replicated.
      // If it is, we do no harm by removing excess replicas. However, if it is
      // not mis-replicated, then we can only remove replicas if they don't
      // make the container become mis-replicated.
{code}
It seems the comments intend to keep removing unhealthy replicas until {{excess}} reaches 0, yet the unconditional {{break}} at the end of the loop body exits after the first iteration, so at most one unhealthy replica is ever deleted. I guess it should be:
{code:java}
      for (ContainerReplica r : unhealthyReplicas) {
        if (excess > 0) {
          sendDeleteCommand(container, r.getDatanodeDetails(), true);
          excess -= 1;
        } else {
          break;
        }
      }
{code}
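A minimal, self-contained sketch of the difference (hypothetical replica names and a counter standing in for {{sendDeleteCommand}}; not the actual ReplicationManager code):

```java
import java.util.List;

public class BreakDemo {
    // Buggy shape: the unconditional break always fires on the first pass,
    // so at most one replica is ever "deleted", whatever excess is.
    static int deletedBuggy(List<String> unhealthy, int excess) {
        int deleted = 0;
        for (String r : unhealthy) {
            if (excess > 0) {
                deleted++;       // stands in for sendDeleteCommand(...)
                excess -= 1;
            }
            break;               // exits the loop after the first iteration
        }
        return deleted;
    }

    // Suggested fix: break only once excess is exhausted, so the loop
    // removes unhealthy replicas until excess reaches 0.
    static int deletedFixed(List<String> unhealthy, int excess) {
        int deleted = 0;
        for (String r : unhealthy) {
            if (excess > 0) {
                deleted++;
                excess -= 1;
            } else {
                break;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("dn1", "dn2", "dn3");
        System.out.println(deletedBuggy(replicas, 2)); // 1
        System.out.println(deletedFixed(replicas, 2)); // 2
    }
}
```

With three unhealthy replicas and an excess of 2, the buggy version deletes only one replica while the fixed version deletes two.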

  was:
{code:java}

20/08/28 03:21:53 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28868 $Proxy17.submitRequest over nodeId=om3,nodeAddress=vc1330.halxg.cloudera.com:9862
20/08/28 03:21:53 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28870 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:53 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28869 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28871 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28872 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28866 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28867 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28874 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 WARN retry.RetryInvocationHandler: A failover has occurred since the start of call #28875 $Proxy17.submitRequest over nodeId=om1,nodeAddress=vc1325.halxg.cloudera.com:9862
20/08/28 03:21:54 ERROR freon.BaseFreonGenerator: Error on executing task 14424
KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /vol1/bucket1/akjkdz4hoj/14424/104766512182520809entry is not found in the OpenKey table
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:593)
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.commitKey(OzoneManagerProtocolClientSideTranslatorPB.java:650)
        at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.commitKey(BlockOutputStreamEntryPool.java:306)
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:514)
        at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
        at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.lambda$createKey$0(OzoneClientKeyGenerator.java:118)
        at com.codahale.metrics.Timer.time(Timer.java:101)
        at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.createKey(OzoneClientKeyGenerator.java:113)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:178)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:167)
        at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:150)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}


> CLONE - OM client request fails with "failed to commit as key is not found in 
> OpenKey table"
> --------------------------------------------------------------------------------------------
>
>                 Key: HDDS-4343
>                 URL: https://issues.apache.org/jira/browse/HDDS-4343
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Blocker
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
