siddhantsangwan commented on PR #6967:
URL: https://github.com/apache/ozone/pull/6967#issuecomment-2249695064

   My intention in including quasi-closed replicas when deciding the 
container's state is to try to prevent complete data loss, even if some data 
has already been lost. This is the direction we decided on when refactoring 
the replication manager last year.
   
   > In order for a container to have moved to the DELETING state in the first 
place, it must have first been in the CLOSED state.
   
   Agreed.
   
   > If all CLOSED replicas were all lost, then that is a data loss scenario 
caused by too many faults in the system.
   
   There's data loss, yes, but shouldn't we try to salvage any data remaining 
in the quasi-closed replicas?
   
   > and we should not try to save them from remaining partial/lagging replicas 
in states OPEN, CLOSING, QUASI-CLOSED.
   
   Not sure I understand why we shouldn't save lagging quasi-closed replicas 
here - are you expecting the data race in 
[HDDS-8129](https://issues.apache.org/jira/browse/HDDS-8129) to have completely 
corrupted these replicas?
   
   > Even though this patch is only trying to handle QUASI-CLOSED replicas 
right now, I'm grouping QUASI-CLOSED, OPEN, and CLOSING replicas together 
because:
   > 
   > If the SCM state is DELETING then we know these replicas states are 
lagging behind CLOSED replicas that may or may not still exist.
   > [They do not process block 
deletes](https://github.com/apache/ozone/blob/91c409d501c842d29d5355287af16d32db23d310/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/BlockDeletingService.java#L204).
 Therefore the block deleting service will leave their isEmpty flag set to false.
   > This flag may be set to true on 
[restart](https://github.com/apache/ozone/blob/930d3f0552b86d6605625cabe5e8961652d14fa2/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java#L351)
 regardless of container state, but I think that is a separate issue we should 
fix, such that only CLOSED containers can have isEmpty = true.
   > There is extra complexity in trying to save these stale replicas. For 
example, currently the patch does not correctly handle the case where we see 
CLOSED replicas that are empty and non-CLOSED replicas which are not empty. The 
empty CLOSED replica confirms the non-empty non-closed one is stale and the 
container can be deleted. Whether or not the patch does this depends on the 
order of the container reports received and how those interleave with 
replication manager runs.
   
   I'm not very clear on whether block deletion can directly cause a container 
state change. As far as I know, a container should transition to DELETING 
(correctly) only when it has a positive number of replicas and _all_ its 
replicas known to SCM are closed and empty:
   
   ```java
     private boolean isContainerEmptyAndClosed(final ContainerInfo container,
         final Set<ContainerReplica> replicas) {
       return container.getState() == HddsProtos.LifeCycleState.CLOSED &&
           !replicas.isEmpty() &&
           replicas.stream().allMatch(
               r -> r.getState() == ContainerReplicaProto.State.CLOSED &&
                   r.isEmpty());
     }
   ```
   
   That means that if any non-closed replicas exist at all, the container is in 
the DELETING state incorrectly, UNLESS all the CLOSED replicas were empty.
   
   Now consider our situation where a container moved to the deleting state 
incorrectly in a previous version because of keyCount being 0 when the 
container wasn't actually empty. Suppose it has a quasi-closed replica that was 
offline, and all the closed replicas have keyCount = 0 (but not actually empty).
   
   On upgrading to the latest version with our fix, suppose the CLOSED 
replicas are somehow lost and the quasi-closed replica comes back. At this 
point, the container is incorrectly in the DELETING state, because its closed 
replicas, which are now lost, weren't actually empty. So we should transition 
it back to CLOSED when the quasi-closed replica is reported. If we don't, we 
risk losing that replica as well.
   
   > For example, currently the patch does not correctly handle the case where 
we see CLOSED replicas that are empty and non-CLOSED replicas which are not empty. 
The empty CLOSED replica confirms the non-empty non-closed one is stale and the 
container can be deleted.
   
   If the container has a non-closed non-empty replica, it shouldn't have moved 
to the DELETING state in the first place according to the code I quoted above. 
If the non-closed replica was somehow offline and that's why the check in the 
code passed, and we now transition the container back to CLOSED from DELETING 
on the basis of that non-closed replica, then we have an orphan container. I 
think that's an acceptable tradeoff.
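   To make the decision I'm proposing concrete, here's a rough sketch. This is 
not the actual ReplicationManager API: the `Replica` record, `ReplicaState` 
enum, and `shouldRevertToClosed` helper are simplified stand-ins I made up for 
illustration. The idea is to revert DELETING to CLOSED only when a non-empty 
quasi-closed replica survives and no empty CLOSED replica confirms that block 
deletion actually completed:
   
   ```java
   import java.util.Set;
   
   // Hypothetical sketch, not Ozone's real ReplicationManager code.
   public class DeletingRecoverySketch {
   
     enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }
   
     // Minimal stand-in for ContainerReplica: just the state and isEmpty flag.
     record Replica(ReplicaState state, boolean empty) {}
   
     // Revert DELETING -> CLOSED only when a non-empty QUASI_CLOSED replica
     // survives and no empty CLOSED replica confirms the deletion completed.
     static boolean shouldRevertToClosed(Set<Replica> replicas) {
       boolean nonEmptyQuasiClosed = replicas.stream().anyMatch(
           r -> r.state() == ReplicaState.QUASI_CLOSED && !r.empty());
       boolean emptyClosed = replicas.stream().anyMatch(
           r -> r.state() == ReplicaState.CLOSED && r.empty());
       return nonEmptyQuasiClosed && !emptyClosed;
     }
   
     public static void main(String[] args) {
       // Case 1: CLOSED replicas lost, one non-empty QUASI_CLOSED survivor.
       if (!shouldRevertToClosed(
           Set.of(new Replica(ReplicaState.QUASI_CLOSED, false)))) {
         throw new AssertionError("expected revert to CLOSED");
       }
       // Case 2: an empty CLOSED replica confirms the stale replica can go.
       if (shouldRevertToClosed(Set.of(
           new Replica(ReplicaState.CLOSED, true),
           new Replica(ReplicaState.QUASI_CLOSED, false)))) {
         throw new AssertionError("should stay DELETING");
       }
       System.out.println("ok");
     }
   }
   ```
   
   In this sketch, case 2 is the orphan-container tradeoff from above: the 
empty CLOSED replica wins and the stale non-empty replica is allowed to go, 
which I think is acceptable.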


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

