xichen01 commented on PR #7548:
URL: https://github.com/apache/ozone/pull/7548#issuecomment-2574556366

   > Recon already has a job to check for missing containers. So IMO, checking 
container state to find missing blocks may not be required. But if a container 
is not in a healthy state (Quasi-closed, deleted, ...), where there is a chance 
of missing blocks, that can be reported as additional information by 
querying SCM.
   
   ---
   
   > > The output of the command will be all the missing keys. If we skip the 
container state check, we may need to get this information from Recon.
   > > Also, we have encountered missing keys whose container seems to have never 
existed in the cluster; there is no record of it in SCM or Recon. Can Recon 
detect this kind of scenario?
   > 
   > Yes, Recon already has the capability to identify missing containers and 
report them. It monitors all keys and verifies whether any container is missing 
for the keys.
   > 
   > But at the block level, if a block is deleted from the physical disk, there 
is no direct mechanism to identify this until the data is read via Recon. But,
   > 
   > There is a DN container scan task that verifies whether container 
metadata and disk are in a consistent state. If not, it marks the container 
unhealthy so that replication can re-replicate it. (I remember this is disabled 
by default; this needs rechecking.)
   > 
   > cc: @errose28
   
   Thanks for your information.
   Recon can handle keys affected by container exceptions, but if we don't list 
those keys in the output, the result will cover only part of the keys with 
missing blocks, which may cause ambiguity. To report the keys with missing 
blocks completely, I think the container state check is necessary.
   And if we want to check the block on the Datanode, the container state check 
is hard to bypass.
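   
   To illustrate the container state check discussed above, here is a minimal 
sketch. It is not the code in this PR and does not use the real Ozone, SCM, or 
Recon APIs: the class name, the simplified `State` enum, the set of states 
treated as healthy, and the plain maps standing in for OM key locations and SCM 
container records are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from Ozone.
class ContainerStateCheckSketch {

  // Simplified stand-in for the container lifecycle states kept by SCM.
  enum State { OPEN, CLOSED, QUASI_CLOSED, DELETED }

  // Assumption of this sketch: blocks are expected to be readable in these states.
  static final Set<State> HEALTHY = EnumSet.of(State.OPEN, State.CLOSED);

  /**
   * @param keyToContainers key name -> IDs of the containers holding its blocks
   * @param containerStates container ID -> state; a missing entry means that
   *                        no record of the container exists
   * @return key name -> reason, for every key with a suspect container
   */
  static Map<String, String> findSuspectKeys(
      Map<String, List<Long>> keyToContainers,
      Map<Long, State> containerStates) {
    Map<String, String> suspects = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> entry : keyToContainers.entrySet()) {
      for (long containerId : entry.getValue()) {
        State state = containerStates.get(containerId);
        if (state == null) {
          // The container was never recorded, or its record is gone.
          suspects.put(entry.getKey(),
              "container " + containerId + " has no record in SCM");
          break;
        }
        if (!HEALTHY.contains(state)) {
          suspects.put(entry.getKey(),
              "container " + containerId + " is " + state);
          break;
        }
      }
    }
    return suspects;
  }

  public static void main(String[] args) {
    Map<String, List<Long>> keys = new LinkedHashMap<>();
    keys.put("vol/bucket/k1", Arrays.asList(1L));
    keys.put("vol/bucket/k2", Arrays.asList(1L, 2L));
    keys.put("vol/bucket/k3", Arrays.asList(99L));

    Map<Long, State> states = new HashMap<>();
    states.put(1L, State.CLOSED);
    states.put(2L, State.QUASI_CLOSED);

    // k2 and k3 are reported; k1 is not, because container 1 is CLOSED.
    System.out.println(findSuspectKeys(keys, states));
  }
}
```

   A check like this only flags keys whose container is absent or unhealthy. 
It says nothing about a block that is missing inside a healthy container, which 
would still require checking the block on the Datanode.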
   
   > There is a DN container scan task that verifies whether container 
metadata and disk are in a consistent state.
   
   This relies on the block having been placed correctly in the container and 
not having been wrongly deleted by the DN (i.e., the block of a key that should 
not have been deleted went through the normal deletion process). Neither is 
guaranteed for a cluster that has been upgraded many times and has run for a 
long time.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

