[ https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007247#comment-16007247 ]
Virajith Jalaparti commented on HDFS-11639: ------------------------------------------- Thanks for working on this [~ehiggs]. I think the general direction of the patch (using {{BlockProvider#resolve()}} in {{ProvidedBlocksBuilder}} to get the {{BlockAlias}}, passing it along as a set of bytes to the client, and resolving it only on the Datanode) looks good. A few comments: * In {{ProvidedBlocksBuilder#newLocatedBlock}}, the {{fileRegion}} should be resolved only if the block has {{PROVIDED}} locations (i.e., {{hasProvidedLocations}} is true). When {{dfs.namenode.provided.enabled}} is set to true, all {{LocatedBlock}} are created in this method, and for non-provided blocks, a resolution of {{BlockAlias}} is needed. * {{PBHelperClient#convertLocatedBlockProto()}} and {{PBHelperClient#convertLocatedBlock()}} should be modified to decode/encode the {{BlockAlias}} bytes. * How about decoding the {{blockAlias}} bytes in {{DataXceiver#readBlock}} using a new {{DataTransferProtoUtil#blockAliasFromProto(bytes[] blockAlias}} method instead of using the {{BlockAlias#builder()}}? The former will be in-line with the way the protobufs are decoded in {{DataXceiver}}. Further, if in the future a different {{BlockAlias}} is used, the current implementation of using the {{FileRegion#Builder}} in {{DataXceiver#readBlock}} will be hard to extend (will end up being try {{FileRegion#Builder}}, if null try {{BlockAliasXX#Builder}} and so on). * Similar to passing on BlockAlias from {{DataXceiver#readBlock}} to {{BlockSender}}, it should be passed along from {{DataXceiver#readBlock}} to {{BlockReceiver}}. However, we would not need it till we have writes implemented. * {{DataStreamer#blockAlias}} will never be non-null. I think it should be initialized in {{DFSOutputStream}}. * Current implementation of {{TextReader#resolve}} can be expensive as it has to scan through the whole list of blocks. I don't have a very good solution for this as one alternative would be to maintain an in-memory map to store this information. I think it would be good to have both implementations, although this is not a blocker. * In {{DataTransfer#run()}}, the {{blockAlias}} should be null unless it is for a provided block. I think this will entail adding BlockAlias to {{transferBlock()}} and also to {{BlockCommand}} for the {{DatanodeProtocol.DNA_TRANSFER}} command. However, this will only be relevant for writing provided blocks (and in particular recovery). * I don't think {{TestDataTransferProtocol#testDataTransferProtocolWithBlockAlias}} is currently correct/tests something useful. Shouldn't block alias be relevant for only provided blocks? Looking over this patch, one thing that occurred to me is if it makes sense to unify {{FileRegionProvider}} with {{BlockProvider}}? They both have very close functionality. I like the use of {{BlockProvider#resolve()}}. If we unify {{FileRegionProvider}} with {{BlockProvider}}, then {{resolve}} can return {{null}} if the block map is accessible from the Datanodes also. If it is accessible only from the Namenode, then a non-null value can be propagated to the Datanode. One of the motivations for adding the {{BlockAlias}} to the client protocol was to have the blocks map only on the Namenode. In this scenario, the {{ReplicaMap}} in {{FsDatasetImpl}} of will not have any replicas apriori. Thus, one way to ensure that the {{FsDatasetImpl}} interface continues to function as today is to create a {{FinalizedProvidedReplica}} in {{FsDatasetImpl#getBlockInputStream}} when {{BlockAlias}} is not null. > [READ] Encode the BlockAlias in the client protocol > --------------------------------------------------- > > Key: HDFS-11639 > URL: https://issues.apache.org/jira/browse/HDFS-11639 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs > Reporter: Ewan Higgs > Assignee: Virajith Jalaparti > Attachments: HDFS-11639-HDFS-9806.001.patch, > HDFS-11639-HDFS-9806.002.patch > > > As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which > encodes information about where the data comes from. i.e. URI, offset, > length, and nonce value. This data should be encoded in the protocol > ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is > available using the PROVIDED storage type. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org