xichen01 commented on PR #7548:
URL: https://github.com/apache/ozone/pull/7548#issuecomment-2660839546

   > I don't follow this. GetBlock only needs a block token generated by the OM 
or datanode. Calling SCM's getContainer is not required, nor is it done on the 
read or reconstruct paths where this API is currently used. I am also not sure 
what is meant by a normal container here. Does this mean one where replicas are 
not missing or deleted? In that case reading the block metadata would fail as 
expected.
   
   - Generally, users use this command to check whether the key is readable and 
if not, what is the reason. Possible reasons include the Container does not 
exist, or the Container is DELETED, or the Block file does not exist. So 
`getContainer` is necessary as a step in checking the readability of a key.
   - A simple process can be summarized as `list -> getContainer -> 
getContainerReplicas -> getBlock/verifyBlocks`
   - For clusters with security turned on, the process may be different, and 
`listKey` will not return a Block Token, so we may need to add a `getKeyInfo` 
step to get the Token.
   - If we want to check the readability of a Key, we need to call 
`getContainerReplicas` so that we can check each Block replicas.
   - A "normal Container" is a Container that exists and is in DELETED or 
DELETING state, or a Container that can provide services.
   
   > This sounds like a good addition. I think the new API should be called 
something like verifyBlocks instead of headBlock though. Head requests usually 
retrieve metadata in the same way that getBlock already does, but the 
description here returns results about metadata verification, not the actual 
metadata.
   
   Yes, I think `verifyBlocks` should be a better name.
   
   > Got it. I don't think it's clear in the doc whether the steps are applied 
to a single key or a whole section of the namespace. For the proposed ozone 
debug replicas verify command we would like it to operate on a specified 
portion of the namespace. This allows the command to parallelize listing 
commands across buckets and run verifications for a list batch concurrently 
while the next listing batch is being fetched.
   
   Support for checking Volume / Bucket / Key I will improve the documentation.
   
   > It sounds like the block verification API is looking to batch requests for 
multiple blocks in one request to the DN. In this case the per-key mapping 
should probably be datanode to block, not container to block, since blocks for 
the key may be under the same datanode but in different containers.
   
   It's actually a two-level mapping ContainerID -> LocalID -> Key. This allows 
you to query from BlockID(ContainerID + LocalID) to Key.
   
   > I don't see how this would work because listKeysLight does not include 
block information. That is what makes it fast. However we need both the block 
locations in the cluster and the block tokens to read them to run the 
verification checks. ListKeysResponse returns KeyInfo objects which if you 
drill down have the block token present. The BasicKeyInfo in 
ListKeysLightResponse has no informtation that can be used to locate or read 
the blocks.
   
   We need to modify listKeysLight to add at least the BlockID information.
   
   > Since the list keys API is paginated, the bottleneck on list keys would 
likely be alleviated by starting the block checks for one listing batch while 
concurrently fetching the next listing batch as mentioned above.
   
   For clusters that don't have security turned on, perhaps getting the key 
list directly from the DB would be the best performance.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to