[
https://issues.apache.org/jira/browse/HDDS-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764147#comment-17764147
]
Stephen O'Donnell commented on HDDS-7728:
-----------------------------------------
OK - so that means if a container is somehow missing, then its related block
delete transactions will be stuck forever in the SCM queue. If the container
returns later, then it will have its blocks deleted. If we somehow end up with
a lot of missing containers, which would suggest a very serious problem on the
cluster and a lot of data loss, then these deletes are going to be stuck
forever.
In some respects this is similar to the under-replicated case. If the container
is under replicated and has only 2 replicas rather than 3, there is a chance
the delete gets processed on one of the 2 remaining replicas, but before it
gets processed on the other, it gets copied to a 3rd replica. So then we end up
with:
R1 - blocks deleted.
R2 - blocks not deleted.
R3 - blocks not deleted.
Or you have under-replication due to a stopped node and you could end up with:
R1 - blocks deleted.
R2 - blocks deleted.
R3 - blocks deleted.
R4 - blocks not deleted as it was offline and now back.
Here RM could delete any replica, so we could end up with extra blocks in one
replica.
I am not sure if this is the only scenario where orphan blocks can get left in
containers. The question is whether this is worth fixing.
We could introduce something like a blockDeleteTransactionID into the
containers, and increment it each time a block is removed. Then RM could check
for replicas with lagging sequences, but this could end up complicated with
many edge cases.
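To make the idea concrete, here is a minimal sketch of how RM could use such a counter. All names here (ReplicaReport, blockDeleteTransactionID, findLaggingReplicas) are hypothetical stand-ins, not actual Ozone classes: each replica would report the highest block-delete transaction it has applied, and RM would flag replicas whose sequence lags the maximum seen:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: each replica reports the highest block-delete
// transaction it has applied; RM flags replicas that lag behind the max.
public class LaggingReplicaCheck {

    // Simplified stand-in for a replica report; not an actual Ozone class.
    record ReplicaReport(String datanode, long blockDeleteTransactionID) {}

    // Returns the datanodes whose applied delete sequence lags the maximum.
    static List<String> findLaggingReplicas(List<ReplicaReport> replicas) {
        long max = replicas.stream()
                .mapToLong(ReplicaReport::blockDeleteTransactionID)
                .max().orElse(0L);
        List<String> lagging = new ArrayList<>();
        for (ReplicaReport r : replicas) {
            if (r.blockDeleteTransactionID() < max) {
                lagging.add(r.datanode());
            }
        }
        return lagging;
    }
}
```

Even in this toy form, the edge cases hinted at above show up quickly: a replica can lag legitimately while deletes are still in flight, so RM would need some grace period before treating a lagging sequence as a real divergence.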
There has also been discussion about needing a process driven by Recon that can
detect orphan blocks on the DNs and arrange for them to be deleted, so perhaps
this all could be handled by that.
Back to the missing containers - the question is whether we want these
transactions to be stuck in SCM forever. But there is also a missing piece in
Ozone: what do we do with missing containers? Ideally, if containers are
missing we need a tool to find the affected keys, flag / remove them from OM,
then remove the containers from SCM (otherwise we have missing alerts forever)
and that flow could remove these delete block transactions from the SCM DB too.
The more I think about this, the more I feel that orphan block cleanup is
something we need to let a new process deal with. However, if the block
deleting service was able to check whether a container is under-replicated
when it sends the deletes, then perhaps it could delay sending the deletes, or
requeue and resend them later. That sounds fairly simple to implement. What
happens on the DN if it gets a delete for a block it has already deleted? Will
it just ignore it?
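The "requeue and resend later" idea could look something like the sketch below. This is not the actual block deleting service: DeleteTxn, ReplicationCheck and isFullyReplicated are invented names, and the real SCM code paths differ. It just illustrates deferring transactions whose container is currently under-replicated:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: before sending a delete-block transaction, check the
// container's replication health; requeue transactions whose container is
// under-replicated so that every replica eventually sees the delete.
public class DeferredDeleteSender {

    // Stand-in for a pending delete-block transaction.
    record DeleteTxn(long txnId, long containerId) {}

    // Stand-in for a replication-health lookup (e.g. asking RM/SCM state).
    interface ReplicationCheck {
        boolean isFullyReplicated(long containerId);
    }

    private final Deque<DeleteTxn> queue = new ArrayDeque<>();
    private final ReplicationCheck check;

    DeferredDeleteSender(ReplicationCheck check) {
        this.check = check;
    }

    void submit(DeleteTxn txn) {
        queue.addLast(txn);
    }

    // Drains the queue once: "sends" healthy transactions and requeues the
    // rest, to be retried on a later pass of the service.
    int drainOnce() {
        int sent = 0;
        for (int i = queue.size(); i > 0; i--) {
            DeleteTxn txn = queue.removeFirst();
            if (check.isFullyReplicated(txn.containerId())) {
                sent++;                  // here the real service would send to DNs
            } else {
                queue.addLast(txn);      // defer until replication heals
            }
        }
        return sent;
    }

    int pending() {
        return queue.size();
    }
}
```

Note this sketch only defers; whether the DN-side handling of a repeated delete (the question above) is safely idempotent is exactly what would need to be confirmed before relying on resends.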
For missing containers, leave things as they are, and if we come up with a
missing container cleanup system, then we can purge any related transactions
as part of that, or mark the containers in SCM in some way so that
transactions are dropped in a similar way to deleted containers?
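As a rough illustration of that last option (all names hypothetical, not actual SCM code): if missing containers eventually got a terminal state in SCM, similar to deleted containers today, the pending delete-block transactions for them could be purged in a single filtering pass:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: once a container is marked with a terminal state in
// SCM (as deleted containers are today), its pending delete-block
// transactions can be dropped instead of sitting in the queue forever.
public class MissingContainerPurge {

    // MISSING_RESOLVED is an invented state for "missing container handled".
    enum ContainerState { OPEN, CLOSED, DELETED, MISSING_RESOLVED }

    // Stand-in for a pending delete-block transaction.
    record DeleteTxn(long txnId, long containerId) {}

    // Keeps only transactions whose container is not in a terminal state.
    static List<DeleteTxn> purgeTerminal(
            List<DeleteTxn> pending, Map<Long, ContainerState> states) {
        Set<ContainerState> terminal =
                Set.of(ContainerState.DELETED, ContainerState.MISSING_RESOLVED);
        return pending.stream()
                .filter(t -> !terminal.contains(
                        states.getOrDefault(t.containerId(), ContainerState.OPEN)))
                .collect(Collectors.toList());
    }
}
```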
> Blocks should be safely deleted from containers when deletion is instructed
> by OM and the containers are in missing state.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-7728
> URL: https://issues.apache.org/jira/browse/HDDS-7728
> Project: Apache Ozone
> Issue Type: Improvement
> Components: SCM
> Affects Versions: 1.3.0
> Reporter: Uma Maheswara Rao G
> Assignee: Ashish Kumar
> Priority: Major
>
> Currently when OM instructs to delete the blocks and if containers are in
> missing state, deletion may not be processed properly. This Jira is to track
> this requirement and implement safe deletion of blocks whatever state they
> are in. Otherwise containers would never get cleaned up even though all
> blocks in those files are deleted.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)