xichen01 commented on PR #7548: URL: https://github.com/apache/ozone/pull/7548#issuecomment-2574556366
> Recon already has a job that checks for missing containers. So IMO, checking the container state to check for blocks may not be required. But if a container is not in a healthy state (quasi-closed, deleted, ...), where blocks may be missing, that can be reported as additional information by querying SCM.
> ---
> The output of the command will be all the missing keys; if we skip the container state check, we may need to get this information from Recon.
>
> And we have encountered some containers of missing keys that seem never to have existed in the cluster: there is no record of them in SCM or Recon. Can Recon detect this kind of scenario?
>
> Yes, Recon already has the capability to identify missing containers and report them. It monitors all keys and verifies whether any container is missing for them.
>
> But at the block level, if a block is deleted on the physical disk, there is no direct mechanism in Recon to identify this until the data is read. But,
>
> There is a DN container scan task that verifies whether container metadata and disk are in a consistent state; otherwise it marks the container unhealthy so that replication can re-replicate it. (I remember this is disabled by default; need to recheck this.)
>
> cc: @errose28

Thanks for your information. Recon can handle "Container exception" keys, but if we don't list those keys in the output, our result will report only part of the "Block missing" keys, which may cause ambiguity. So, to report the "Block missing" keys completely, I think the container state check is necessary. And if we want to check blocks on the Datanode, the container state check is hard to bypass.

> There is a DN container scan task that verifies whether container metadata and disk are in a consistent state.
This relies on the block being correctly placed in the container, and on the block not having been wrongly deleted by the DN (i.e., a block of a key that should not have been deleted by the normal deletion process). Neither is guaranteed for a cluster that has been upgraded many times and has run for a long time.
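The completeness argument can be made concrete with a minimal Python sketch over a hypothetical in-memory model. This is not Ozone's API or the tool in this PR. `BlockRef`, `Verdict`, `check_block`, `find_missing_keys`, `scanner_sees_problem` and `HEALTHY_STATES` are illustrative names only, and treating just OPEN and CLOSED as "healthy" is an assumption. The sketch classifies each block of a key by its SCM container state first, and only then looks for the block on the datanodes. It also models a DN-side scanner that only compares container metadata with what is on disk.

```python
from dataclasses import dataclass
from enum import Enum

# Assumption for this sketch: container states treated as "healthy".
HEALTHY_STATES = {"OPEN", "CLOSED"}


class Verdict(Enum):
    OK = "ok"
    CONTAINER_NOT_IN_SCM = "container has no record in SCM"
    CONTAINER_UNHEALTHY = "container is not in a healthy state"
    BLOCK_MISSING_ON_DN = "block not found on any datanode replica"


@dataclass(frozen=True)
class BlockRef:
    container_id: int
    local_id: int


def check_block(block, scm_states, dn_blocks):
    """Classify one block of a key (hypothetical model, not Ozone's API).

    scm_states: container_id -> state string as reported by SCM.
    dn_blocks:  container_id -> set of local block ids found on datanode replicas.
    """
    state = scm_states.get(block.container_id)
    if state is None:
        # The "container seems to have never existed" case: no SCM record at all.
        return Verdict.CONTAINER_NOT_IN_SCM
    if state not in HEALTHY_STATES:
        # e.g. QUASI_CLOSED or DELETED: blocks may be lost, so report explicitly.
        return Verdict.CONTAINER_UNHEALTHY
    if block.local_id not in dn_blocks.get(block.container_id, set()):
        # Container looks fine in SCM, but the block itself is gone on the DNs.
        return Verdict.BLOCK_MISSING_ON_DN
    return Verdict.OK


def find_missing_keys(keys, scm_states, dn_blocks):
    """Return {key: [non-OK verdicts]} for every key with at least one bad block."""
    report = {}
    for key, blocks in keys.items():
        verdicts = [check_block(b, scm_states, dn_blocks) for b in blocks]
        bad = [v for v in verdicts if v is not Verdict.OK]
        if bad:
            report[key] = bad
    return report


def scanner_sees_problem(metadata_blocks, on_disk_blocks):
    """Model of a DN-side scanner that only compares container metadata with disk."""
    return metadata_blocks != on_disk_blocks
```

In this toy model, dropping the two container checks and deferring them to Recon would leave keys whose container is quasi-closed or unknown to SCM out of the report. Consider a block that was wrongly removed from both the container metadata and the disk. `scanner_sees_problem` stays quiet, while `find_missing_keys` still reports the key, because it starts from the key's block list.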
