[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007247#comment-16007247
 ] 

Virajith Jalaparti commented on HDFS-11639:
-------------------------------------------

Thanks for working on this [~ehiggs]. I think the general direction of the 
patch (using {{BlockProvider#resolve()}} in {{ProvidedBlocksBuilder}} to get 
the {{BlockAlias}}, passing it along as a set of bytes to the client, and 
resolving it only on the Datanode) looks good. 

A few comments:

* In {{ProvidedBlocksBuilder#newLocatedBlock}}, the {{fileRegion}} should be 
resolved only if the block has {{PROVIDED}} locations (i.e., 
{{hasProvidedLocations}} is true). When {{dfs.namenode.provided.enabled}} is 
set to true, all {{LocatedBlock}} are created in this method, and for 
non-provided blocks, a resolution of {{BlockAlias}} is needed.

* {{PBHelperClient#convertLocatedBlockProto()}} and 
{{PBHelperClient#convertLocatedBlock()}} should be modified to decode/encode 
the {{BlockAlias}} bytes.

* How about decoding the {{blockAlias}} bytes in {{DataXceiver#readBlock}} 
using a new {{DataTransferProtoUtil#blockAliasFromProto(bytes[] blockAlias}} 
method instead of using the {{BlockAlias#builder()}}? The former will be 
in-line with the way the protobufs are decoded in {{DataXceiver}}. Further, if 
in the future a different {{BlockAlias}} is used, the current implementation of 
using the {{FileRegion#Builder}} in {{DataXceiver#readBlock}}  will be hard to 
extend (will end up being try {{FileRegion#Builder}}, if null try 
{{BlockAliasXX#Builder}} and so on). 

* Similar to passing on BlockAlias from {{DataXceiver#readBlock}} to 
{{BlockSender}}, it should be passed along from {{DataXceiver#readBlock}}  to 
{{BlockReceiver}}. However, we would not need it till we have writes 
implemented.

* {{DataStreamer#blockAlias}} will never be non-null. I think it should be 
initialized in {{DFSOutputStream}}.

* Current implementation of {{TextReader#resolve}} can be expensive as it has 
to scan through the whole list of blocks. I don't have a very good solution for 
this as one alternative would be to maintain an in-memory map to store this 
information. I think it would be good to have both implementations, although 
this is not a blocker.

* In {{DataTransfer#run()}}, the {{blockAlias}} should be null unless it is for 
a provided block. I think this will entail adding BlockAlias to 
{{transferBlock()}} and also to {{BlockCommand}} for the 
{{DatanodeProtocol.DNA_TRANSFER}} command. However, this will only be relevant 
for writing provided blocks (and in particular recovery).

* I don't think 
{{TestDataTransferProtocol#testDataTransferProtocolWithBlockAlias}} is 
currently correct/tests something useful. Shouldn't block alias be relevant for 
only provided blocks?

Looking over this patch, one thing that occurred to me is if it makes sense to 
unify {{FileRegionProvider}} with {{BlockProvider}}? They both have very close 
functionality.

I like the use of {{BlockProvider#resolve()}}. If we unify 
{{FileRegionProvider}} with {{BlockProvider}}, then {{resolve}} can return 
{{null}} if the block map is accessible from the Datanodes also. If it is 
accessible only from the Namenode, then a non-null value can be propagated to 
the Datanode.

One of the motivations for adding the {{BlockAlias}} to the client protocol was 
to have the blocks map only on the Namenode. In this scenario, the 
{{ReplicaMap}} in {{FsDatasetImpl}} of will not have any replicas apriori. 
Thus, one way to ensure that the {{FsDatasetImpl}} interface continues to 
function as today is to create a {{FinalizedProvidedReplica}} in 
{{FsDatasetImpl#getBlockInputStream}} when {{BlockAlias}} is not null.



> [READ] Encode the BlockAlias in the client protocol
> ---------------------------------------------------
>
>                 Key: HDFS-11639
>                 URL: https://issues.apache.org/jira/browse/HDFS-11639
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Ewan Higgs
>            Assignee: Virajith Jalaparti
>         Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to