[jira] [Comment Edited] (HDDS-2459) Refactor ReplicationManager to consider maintenance states

Stephen O'Donnell (Jira) Thu, 14 Nov 2019 09:26:45 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974366#comment-16974366
 ]


Stephen O'Donnell edited comment on HDDS-2459 at 11/14/19 5:25 PM:
-------------------------------------------------------------------

In the decommission design doc, we had an algorithm to determine the number of 
replicas that need to be created or destroyed so a container can be perfectly 
replicated. The algorithm was:

{code}
/**
 * Calculate the number of the missing replicas.
 * 
 * @return the number of the missing replicas. If it's less than zero, the
 *         container is over replicated.
 */
int getReplicationCount(int expectedCount, int healthy, 
   int maintenance, int inFlight) {

   //for over replication, count only with the healthy replicas
   if (expectedCount < healthy) {
      return expectedCount - healthy;
   }
   
   int replicaCount = expectedCount - (healthy + maintenance + inFlight);

   if (replicaCount == 0 && healthy < 1) {
      replicaCount ++;
   }
   
   //over replication is already handled
   return Math.max(0, replicaCount);
}
{code}

The code from the design doc needs a minor correction to handle inflight 
deletes on over replication, so it would look like this:

{code}
  public int additionalReplicaNeeded2() {

    if (repFactor < healthyCount) {
      return repFactor - healthyCount + inFlightDel;
    }

    int delta = repFactor
        - (healthyCount + maintenanceCount + inFlightAdd - inFlightDel);

    if (delta == 0 && healthyCount < minHealthyForMaintenance) {
      delta += minHealthyForMaintenance - healthyCount;
    }
    return Math.max(0, delta);
  }
{code}
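
As a rough illustration of why the inflight delete term matters in the over replicated branch, the sketch below compares the design doc formula with the corrected one for a container with four healthy replicas and one delete already in flight. The class name and the static, parameterised form are assumptions made purely for illustration; they are not taken from ReplicationManager.

```java
final class OverReplicationSketch {

  // Over replicated branch as written in the design doc:
  // inflight deletes are not taken into account.
  static int designDocExcess(int repFactor, int healthyCount) {
    return repFactor - healthyCount;
  }

  // Corrected branch: each inflight delete offsets one excess replica.
  static int correctedExcess(int repFactor, int healthyCount,
      int inFlightDel) {
    return repFactor - healthyCount + inFlightDel;
  }

  public static void main(String[] args) {
    // Replication factor 3, four healthy replicas, one delete in flight.
    System.out.println(designDocExcess(3, 4));    // prints -1
    System.out.println(correctedExcess(3, 4, 1)); // prints 0
  }
}
```

With the design doc formula the caller would be told to remove one more replica even though a delete is already pending; with the correction the result is zero until that delete completes or fails.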

I also came up with the logic below, which is very similar although a little 
more verbose. The only difference between the two is that, in the case of 3 
in_service replicas and one or more inflight deletes, the version above returns 
1 new replica needed, while the version below returns zero. The reasoning is 
that we should let the delete either complete or fail, and then deal with any 
over or under replication once the inflight operations have cleared.

{code}
  /**
   * Calculates the delta of replicas which need to be created or removed
   * to ensure the container is correctly replicated.
   *
   * Decisions around over-replication are made only on healthy replicas,
   * ignoring any in maintenance and also any inflight adds. InFlight adds are
   * ignored, as they may not complete, so if we have:
   *
   *     H, H, H, IN_FLIGHT_ADD
   *
   * And then schedule a delete, we could end up under-replicated (add fails,
   * delete completes). It is better to let the inflight operations complete
   * and then deal with any further over or under replication.
   *
   * For maintenance replicas, assuming replication factor 3, and minHealthy
   * 2, it is possible for all 3 hosts to be put into maintenance, leaving the
   * following (H = healthy, M = maintenance):
   *
   *     H, H, M, M, M
   *
   * Even though we are tracking 5 replicas, this is not over replicated as we
   * ignore the maintenance copies. Later, the replicas could look like:
   *
   *     H, H, H, H, M
   *
   * At this stage, the container is over replicated by 1, so one replica can be
   * removed.
   *
   * For containers which have exactly replicationFactor healthy replicas, we
   * ignore any inflight adds or deletes, as they may fail. Instead, wait for
   * them to complete and then deal with any excess or deficit.
   *
   * For under replicated containers we do consider inflight adds and deletes
   * to avoid scheduling more adds than needed. There is additional logic
   * around containers with maintenance replicas to ensure that
   * minHealthyForMaintenance replicas are maintained.
   *
   * @return Delta of replicas needed. Negative indicates over replication and
   *         replicas should be removed. Positive indicates under replication
   *         and zero indicates the container has replicationFactor healthy
   *         replicas.
   */
  public int additionalReplicaNeeded() {
    int delta = repFactor - healthyCount;

    if (delta < 0) {
      // Over replicated, so may need to remove a replica. Do not consider
      // inFlightAdds, as they may fail, but do consider inFlightDel which
      // will reduce the over-replication if it completes.
      return delta + inFlightDel;
    } else if (delta > 0) {
      // May be under-replicated, depending on maintenance. When a container is
      // under-replicated, we must consider inflight add and delete when
      // calculating the new replicas needed.
      delta = Math.max(0, delta - maintenanceCount);
      // Check we have enough healthy replicas
      int neededHealthy =
          Math.max(0, minHealthyForMaintenance - healthyCount);
      delta = Math.max(neededHealthy, delta);
      return delta - inFlightAdd + inFlightDel;
    } else { // delta == 0
      // We have exactly the number of healthy replicas needed, but there may
      // be inflight add or delete. Ignore them until they complete or fail
      // and then deal with the excess or deficit.
      return delta;
    }
  }
{code}
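
To make the difference between the two variants concrete, here is a minimal sketch that runs both on the scenarios discussed in this comment. The two methods mirror the logic quoted above, but the class name, the method names and the choice to pass the counts as parameters are assumptions for illustration only.

```java
final class ReplicaDeltaComparison {

  // Variant derived from the design doc, with the inflight delete fix.
  static int designDocVariant(int repFactor, int healthyCount,
      int maintenanceCount, int inFlightAdd, int inFlightDel,
      int minHealthyForMaintenance) {
    if (repFactor < healthyCount) {
      return repFactor - healthyCount + inFlightDel;
    }
    int delta = repFactor
        - (healthyCount + maintenanceCount + inFlightAdd - inFlightDel);
    if (delta == 0 && healthyCount < minHealthyForMaintenance) {
      delta += minHealthyForMaintenance - healthyCount;
    }
    return Math.max(0, delta);
  }

  // More verbose variant, which ignores inflight operations when the
  // healthy count already equals the replication factor.
  static int verboseVariant(int repFactor, int healthyCount,
      int maintenanceCount, int inFlightAdd, int inFlightDel,
      int minHealthyForMaintenance) {
    int delta = repFactor - healthyCount;
    if (delta < 0) {
      return delta + inFlightDel;
    } else if (delta > 0) {
      delta = Math.max(0, delta - maintenanceCount);
      int neededHealthy =
          Math.max(0, minHealthyForMaintenance - healthyCount);
      delta = Math.max(neededHealthy, delta);
      return delta - inFlightAdd + inFlightDel;
    }
    return 0;
  }

  public static void main(String[] args) {
    // Replication factor 3, minHealthyForMaintenance 2 throughout.
    // H, H, H with one inflight delete: the variants disagree.
    System.out.println(designDocVariant(3, 3, 0, 0, 1, 2)); // prints 1
    System.out.println(verboseVariant(3, 3, 0, 0, 1, 2));   // prints 0
    // H, H, M, M, M: not over replicated in either variant.
    System.out.println(designDocVariant(3, 2, 3, 0, 0, 2)); // prints 0
    System.out.println(verboseVariant(3, 2, 3, 0, 0, 2));   // prints 0
    // H, H, H, H, M: over replicated by one in both variants.
    System.out.println(designDocVariant(3, 4, 1, 0, 0, 2)); // prints -1
    System.out.println(verboseVariant(3, 4, 1, 0, 0, 2));   // prints -1
  }
}
```

Only the first scenario separates the two: the design doc variant schedules a new replica in anticipation of the delete completing, while the verbose variant waits for the inflight delete to clear.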

The following logic describes the conditions the replicas of a container 
must meet to be considered sufficiently replicated. Note that inflight adds 
are ignored and inflight deletes are counted until they complete:

{code}
  /**
   * Return true if the container is sufficiently replicated. Decommissioning
   * and Decommissioned containers are ignored in this check, assuming they will
   * eventually be removed from the cluster.
   * This check ignores inflight additions, as those replicas have not yet been
   * created and the create could fail for some reason.
   * The check does consider inflight deletes as there may be 3 healthy replicas
   * now, but once the delete completes it will reduce to 2.
   * We also assume a replica in Maintenance state cannot be removed, so the
   * pending delete would affect only the healthy replica count.
   *
   * @return True if the container is sufficiently replicated and False
   *         otherwise.
   */
  public boolean isSufficientlyReplicated() {
    return (healthyCount + maintenanceCount - inFlightDel) >= repFactor
        && healthyCount - inFlightDel >= minHealthyForMaintenance;
  }
{code}
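
The check above can be exercised on a few replica layouts to see where it flips. This is only a sketch: the class name and the parameterised static method are hypothetical, and the scenarios assume replication factor 3 and minHealthyForMaintenance 2, as in the examples earlier in this comment.

```java
final class SufficientReplicationSketch {

  // Same condition as isSufficientlyReplicated(), with the counts passed in.
  // Inflight adds are deliberately not a parameter, as the check ignores them.
  static boolean isSufficientlyReplicated(int repFactor, int healthyCount,
      int maintenanceCount, int inFlightDel, int minHealthyForMaintenance) {
    return (healthyCount + maintenanceCount - inFlightDel) >= repFactor
        && healthyCount - inFlightDel >= minHealthyForMaintenance;
  }

  public static void main(String[] args) {
    // H, H, H with nothing in flight.
    System.out.println(isSufficientlyReplicated(3, 3, 0, 0, 2)); // true
    // H, H, H with one inflight delete: treated as if it had completed.
    System.out.println(isSufficientlyReplicated(3, 3, 0, 1, 2)); // false
    // H, H, M: maintenance copy counts towards the replication factor.
    System.out.println(isSufficientlyReplicated(3, 2, 1, 0, 2)); // true
    // H, M, M: too few healthy replicas for maintenance.
    System.out.println(isSufficientlyReplicated(3, 1, 2, 0, 2)); // false
  }
}
```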


was (Author: sodonnell):
In the decommission design doc, we had an algorithm to determine the number of 
replicas that need to be created or destroyed so a container can be perfectly 
replicated. The algorithm was:

{code}
/**
 * Calculate the number of the missing replicas.
 * 
 * @return the number of the missing replicas. If it's less than zero, the
 *         container is over replicated.
 */
int getReplicationCount(int expectedCount, int healthy, 
   int maintenance, int inFlight) {

   //for over replication, count only with the healthy replicas
   if (expectedCount < healthy) {
      return expectedCount - healthy;
   }
   
   int replicaCount = expectedCount - (healthy + maintenance + inFlight);

   if (replicaCount == 0 && healthy < 1) {
      replicaCount ++;
   }
   
   //over replication is already handled
   return Math.max(0, replicaCount);
}
{code}

Reflecting on this for some time, I think it is a little too simplistic and 
would propose the following instead. One key difference in the logic below is 
that maintenance replicas are not considered when calculating over replication. 
This is because a maintenance copy cannot be removed (the node is offline) and 
there is a not insignificant chance the node will fail to come back online, 
resulting in all its replicas getting lost.

{code}
  /**
   * Calculates the delta of replicas which need to be created or removed
   * to ensure the container is correctly replicated.
   *
   * Decisions around over-replication are made only on healthy replicas,
   * ignoring any in maintenance and also any inflight adds. InFlight adds are
   * ignored, as they may not complete, so if we have:
   *
   *     H, H, H, IN_FLIGHT_ADD
   *
   * And then schedule a delete, we could end up under-replicated (add fails,
   * delete completes). It is better to let the inflight operations complete
   * and then deal with any further over or under replication.
   *
   * For maintenance replicas, assuming replication factor 3, and minHealthy
   * 2, it is possible for all 3 hosts to be put into maintenance, leaving the
   * following (H = healthy, M = maintenance):
   *
   *     H, H, M, M, M
   *
   * Even though we are tracking 5 replicas, this is not over replicated as we
   * ignore the maintenance copies. Later, the replicas could look like:
   *
   *     H, H, H, H, M
   *
   * At this stage, the container is over replicated by 1, so one replica can be
   * removed.
   *
   * For containers which have exactly replicationFactor healthy replicas, we
   * ignore any inflight adds or deletes, as they may fail. Instead, wait for
   * them to complete and then deal with any excess or deficit.
   *
   * For under replicated containers we do consider inflight adds and deletes
   * to avoid scheduling more adds than needed. There is additional logic
   * around containers with maintenance replicas to ensure that
   * minHealthyForMaintenance replicas are maintained.
   *
   * @return Delta of replicas needed. Negative indicates over replication and
   *         replicas should be removed. Positive indicates under replication
   *         and zero indicates the container has replicationFactor healthy
   *         replicas.
   */
  public int additionalReplicaNeeded() {
    int blockDelta = 0;
    int delta = repFactor - healthyCount;

    if (delta < 0) {
      // Over replicated, so may need to remove a block. Do not consider
      // inFlightAdds, as they may fail, but do consider inFlightDel which
      // will reduce the over-replication if it completes.
      blockDelta = delta  + inFlightDel;
    } else if (delta > 0) {
      // May be under-replicated, depending on maintenance. When a container is
      // under-replicated, we must consider inflight add and delete when
      // calculating the new containers needed.
      if (maintenanceCount != 0) {
        // Remove maintenance copies from delta to see if it is really
        // under-replicated.
        delta = Math.max(0, delta - maintenanceCount);
        // Check we have enough healthy replicas
        int neededHealthy =
            Math.max(0, minHealthyForMaintenance - healthyCount);
        delta = Math.max(neededHealthy, delta);
      }
      blockDelta = delta - inFlightAdd + inFlightDel;
    } else { // delta == 0
      // We have exactly the number of healthy replicas needed, but there may
      // be inflight add or delete. Ignore them until they complete or fail
      // and then deal with the excess or deficit.
      blockDelta = delta;
    }
    return blockDelta;
  }
{code}

The following logic describes the conditions the replicas of a container 
must meet to be considered sufficiently replicated. Note that inflight adds 
are ignored and inflight deletes are counted until they complete:

{code}
  /**
   * Return true if the container is sufficiently replicated. Decommissioning
   * and Decommissioned containers are ignored in this check, assuming they will
   * eventually be removed from the cluster.
   * This check ignores inflight additions, as those replicas have not yet been
   * created and the create could fail for some reason.
   * The check does consider inflight deletes as there may be 3 healthy replicas
   * now, but once the delete completes it will reduce to 2.
   * We also assume a replica in Maintenance state cannot be removed, so the
   * pending delete would affect only the healthy replica count.
   *
   * @return True if the container is sufficiently replicated and False
   *         otherwise.
   */
  public boolean isSufficientlyReplicated() {
    return (healthyCount + maintenanceCount - inFlightDel) >= repFactor
        && healthyCount - inFlightDel >= minHealthyForMaintenance;
  }
{code}

> Refactor ReplicationManager to consider maintenance states
> ----------------------------------------------------------
>
>                 Key: HDDS-2459
>                 URL: https://issues.apache.org/jira/browse/HDDS-2459
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> In its current form the replication manager does not consider decommission or 
> maintenance states when checking if replicas are sufficiently replicated. 
> With the introduction of maintenance states, it needs to consider 
> decommission and maintenance states when deciding if blocks are over or under 
> replicated.
> It also needs to provide an API to allow the decommission manager to check if 
> blocks are over or under replicated, so the decommission manager can decide 
> if a node has completed decommission and maintenance or not.


