[ 
https://issues.apache.org/jira/browse/HDDS-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764147#comment-17764147
 ] 

Stephen O'Donnell commented on HDDS-7728:
-----------------------------------------

OK - so that means if a container is somehow missing, then its related block 
delete transactions will be stuck in the SCM queue for as long as it stays 
missing. If the container returns later, then it will have its blocks deleted. 
If we somehow end up with a lot of missing containers, which would suggest a 
very serious problem on the cluster and a lot of data loss, then these deletes 
are going to be stuck forever.

In some respects this is similar to the under-replicated case. If the container 
is under-replicated and has only 2 replicas rather than 3, there is a chance 
the delete gets processed on one of the 2 remaining replicas, but before it 
gets processed on the other, that other replica gets copied to create a 3rd 
replica. So then we end up with:

R1 - blocks deleted.
R2 - blocks not deleted.
R3 - blocks not deleted.

Or you have under-replication due to a stopped node and you could end up with:

R1 - blocks deleted.
R2 - blocks deleted.
R3 - blocks deleted.
R4 - blocks not deleted as it was offline and now back.

Here the container is now over-replicated and RM could delete any replica to 
fix that, so we could end up keeping the replica that still has the extra 
blocks.

I am not sure if these are the only scenarios where orphan blocks can get left 
in containers. The question is whether this is worth fixing.

We could introduce something like a blockDeleteTransactionID into the 
containers, and increment it each time a block is removed. Then RM could check 
for replicas with lagging sequences, but this could end up complicated with 
many edge cases.
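
A minimal sketch of what such a check might look like, purely for 
illustration. The class, the field name blockDeleteTransactionId and the 
method laggingReplicas are all hypothetical and do not come from the Ozone code 
base. It also assumes the counters on different replicas are directly 
comparable, which is exactly where the edge cases mentioned above would come in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Ozone. */
final class LaggingReplicaCheck {

    /** A container replica reduced to the two fields this check needs. */
    static final class Replica {
        final String datanode;
        // Assumed per-replica counter, bumped each time a block is removed.
        final long blockDeleteTransactionId;

        Replica(String datanode, long blockDeleteTransactionId) {
            this.datanode = datanode;
            this.blockDeleteTransactionId = blockDeleteTransactionId;
        }
    }

    /** Returns the datanodes whose counter is behind the highest one seen. */
    static List<String> laggingReplicas(List<Replica> replicas) {
        long max = Long.MIN_VALUE;
        for (Replica r : replicas) {
            max = Math.max(max, r.blockDeleteTransactionId);
        }
        List<String> lagging = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.blockDeleteTransactionId < max) {
                lagging.add(r.datanode);
            }
        }
        return lagging;
    }

    public static void main(String[] args) {
        // Mirrors the R1..R4 example: R4 was offline and missed the deletes.
        List<Replica> replicas = Arrays.asList(
            new Replica("R1", 5), new Replica("R2", 5),
            new Replica("R3", 5), new Replica("R4", 3));
        System.out.println(laggingReplicas(replicas));
    }
}
```

With something like this, RM could prefer a lagging replica when it has to 
remove an excess one, rather than picking any replica.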

There has also been discussion about needing a process driven by Recon that can 
detect orphan blocks on the DNs and arrange for them to be deleted, so perhaps 
this all could be handled by that.

Back to the missing containers - the question is whether we want these 
transactions to be stuck in SCM forever. But there is also a missing piece in 
Ozone about what we do with missing containers. Ideally, if containers are 
missing we need a tool to find the affected keys, flag / remove them from OM, 
then remove the containers from SCM (otherwise we have missing alerts forever), 
and that flow could remove these delete block transactions from the SCM DB too.

The more I think about this, I feel that orphan block cleanup is something we 
need to let a new process deal with. However, if the block deleting service was 
able to check if a container is under-replicated when it sends the deletes, 
then perhaps it could delay sending the deletes or requeue and resend them 
later. That sounds fairly simple to implement. What happens on the DN if it 
gets a delete block for a block it has already deleted? Will it just ignore it?
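
The delay / requeue idea in the paragraph above could be sketched roughly as 
follows. This is only an illustration under stated assumptions: 
DeferringDeleteSender, DeleteTx and the isUnderReplicated predicate are 
hypothetical names, not existing Ozone classes or APIs, and the sketch does not 
address how the DN handles a repeated delete.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongPredicate;

/** Hypothetical sketch; none of these names exist in Ozone. */
final class DeferringDeleteSender {

    /** A queued block delete transaction for one container. */
    static final class DeleteTx {
        final long txId;
        final long containerId;

        DeleteTx(long txId, long containerId) {
            this.txId = txId;
            this.containerId = containerId;
        }
    }

    private final Deque<DeleteTx> pending = new ArrayDeque<>();

    void enqueue(DeleteTx tx) {
        pending.addLast(tx);
    }

    int pendingCount() {
        return pending.size();
    }

    /**
     * One pass over the queue. Transactions whose container is currently
     * under-replicated go back on the queue for a later pass; the rest are
     * returned so the caller can send them to the datanodes.
     */
    List<DeleteTx> drainSendable(LongPredicate isUnderReplicated) {
        List<DeleteTx> sendable = new ArrayList<>();
        int toInspect = pending.size(); // look at each queued item exactly once
        for (int i = 0; i < toInspect; i++) {
            DeleteTx tx = pending.pollFirst();
            if (isUnderReplicated.test(tx.containerId)) {
                pending.addLast(tx); // defer: retry once replication recovers
            } else {
                sendable.add(tx);
            }
        }
        return sendable;
    }
}
```

The predicate is passed in per pass so that a container which has recovered 
since the last run is picked up automatically on the next one.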

For missing containers, leave things as they are, and if we come up with a 
missing container cleanup system - then we can purge any related transactions 
as part of that, or mark the containers in SCM in some way, so that 
transactions are dropped in a similar way to deleted containers?

> Block should be safely deleted from the containers if they are instructed 
> from OM and containers are in missing state.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-7728
>                 URL: https://issues.apache.org/jira/browse/HDDS-7728
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 1.3.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Ashish Kumar
>            Priority: Major
>
> Currently, when OM instructs SCM to delete blocks and the containers are in a 
> missing state, the deletion may not be processed properly. This Jira is to 
> track this requirement and implement safe deletion of blocks whatever state 
> their containers are in. Otherwise containers would never get cleaned up even 
> though all blocks in those files are deleted. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
