[ 
https://issues.apache.org/jira/browse/HDFS-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559872#comment-14559872
 ] 

Zhe Zhang commented on HDFS-8453:
---------------------------------

Thanks Walter for the review. Sorry about the confusion. 

This was my initial thought:
{quote}
LocatedBlock#offset should indicate the "offset of the first byte of the block 
in the file". In a striped block group, we should properly assign this offset 
for internal blocks, so each internal block can be identified from a given 
offset.
My current plan is to keep using bg.getStartOffset() + idxInBlockGroup * 
cellSize as the start offset for data blocks. For parity blocks, use -1 * 
(bg.getStartOffset() + idxInBlockGroup * cellSize).
{quote}
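
To make the formula in that initial plan concrete, here is a minimal sketch. The class name, the 6-data-block layout, and the 64 KB cell size are assumptions made for illustration; none of them come from the patch or from HDFS itself.

```java
// Hypothetical helper, not part of HDFS: models the start-offset formula
// from the quoted plan.
class InternalBlockOffsetSketch {
    static final int NUM_DATA_BLOCKS = 6;     // assumed: indices 0..5 are data blocks
    static final long CELL_SIZE = 64 * 1024;  // assumed cell size in bytes

    // Data blocks get bgStartOffset + idx * cellSize; parity blocks get the
    // same value negated, as described in the plan.
    static long startOffset(long bgStartOffset, int idxInBlockGroup) {
        long base = bgStartOffset + idxInBlockGroup * CELL_SIZE;
        return idxInBlockGroup < NUM_DATA_BLOCKS ? base : -1 * base;
    }

    public static void main(String[] args) {
        long bgStartOffset = 0L;
        // Assumed: 6 data blocks followed by 3 parity blocks.
        for (int idx = 0; idx < 9; idx++) {
            System.out.println(idx + " -> " + startOffset(bgStartOffset, idx));
        }
    }
}
```

Under these assumptions, internal block 2 of a group starting at offset 0 gets 131072, and the first parity block (index 6) gets -393216.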

This was the approach in the patch:
{quote}
Actually it's not possible to assign meaningful start offset values for all 
internal blocks, especially parity ones. Consider a block group with 1 byte of 
data. No matter how we set the start offsets for parity blocks (negative 
values, etc.), they will overlap with the next block group in the file.
So this patch takes another approach: refactor DFSInputStream to add a new 
refreshLocatedBlock method, which is called whenever the located block needs to 
be refreshed, in place of the existing getBlockAt call. The refresh method can 
then be extended in DFSStripedInputStream with index handling.
{quote}
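
A toy model of the 1-byte case may help. Everything below is hypothetical (the class, the TreeMap-based lookup, the 64 KB cell size, parity starting at index 6) and is not the HDFS implementation. It only illustrates why an offset derived for a parity block can resolve to the next block group, and how a lookup keyed on the group's own start offset plus a separately carried index avoids that.

```java
import java.util.TreeMap;

// Toy model only: these types are hypothetical and are not the HDFS classes.
class OffsetOverlapSketch {
    static final long CELL_SIZE = 64 * 1024;  // assumed cell size in bytes
    static final int FIRST_PARITY_INDEX = 6;  // assumed: 6 data blocks come first

    // Maps each block group's start offset in the file to a label.
    static final TreeMap<Long, String> GROUPS = new TreeMap<>();
    static {
        GROUPS.put(0L, "bg0"); // holds a single byte: file range [0, 1)
        GROUPS.put(1L, "bg1"); // the next group starts at file offset 1
    }

    // Range-based lookup: the group with the greatest start offset <= offset.
    // Only valid for non-negative offsets in this toy model.
    static String lookupByOffset(long offset) {
        return GROUPS.floorEntry(offset).getValue();
    }

    // Offset an internal block would get under the positive formula.
    static long derivedOffset(long groupStart, int idxInBlockGroup) {
        return groupStart + idxInBlockGroup * CELL_SIZE;
    }

    // Index-aware refresh: look up by the group's start offset and keep the
    // internal block index as separate information.
    static String refresh(long groupStart, int idxInBlockGroup) {
        return lookupByOffset(groupStart) + "[" + idxInBlockGroup + "]";
    }

    public static void main(String[] args) {
        long parityOffset = derivedOffset(0L, FIRST_PARITY_INDEX);
        System.out.println(lookupByOffset(parityOffset));     // resolves to bg1
        System.out.println(refresh(0L, FIRST_PARITY_INDEX));  // stays in bg0
    }
}
```

In this model the derived parity offset for bg0 lands inside bg1's range, so the offset-only lookup returns the wrong group, while the index-aware refresh returns bg0 with index 6.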

If it's still confusing, please ignore all comments and review the patch itself 
:)

> Erasure coding: properly assign start offset for internal blocks in a block 
> group
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-8453
>                 URL: https://issues.apache.org/jira/browse/HDFS-8453
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8453-HDFS-7285.00.patch
>
>
> {code}
>   void actualGetFromOneDataNode(final DNAddrPair datanode,
>     ...
>       LocatedBlock block = getBlockAt(blockStartOffset);
>     ...
>       fetchBlockAt(block.getStartOffset());
> {code}
> The {{blockStartOffset}} here comes from the internal block. For parity 
> blocks, the offset will overlap with the next block group, and we may end up 
> fetching the wrong block. So we have to assign a meaningful start offset for 
> internal blocks in a block group, especially for parity blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
