[ 
https://issues.apache.org/jira/browse/HDFS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737604#action_12737604
 ] 

Hairong Kuang commented on HDFS-512:
------------------------------------

> Block is an existing base class and it is there for a very long time. We 
> cannot simply view it in one way yesterday and view it in another way today.
I agree that Block is a long-standing base class, but I disagree that we 
cannot change it. Making the generation stamp part of its key was introduced 
by the append project. The more I think about the new append design, the more 
I feel that this was a design flaw: it has caused many problems because we 
cannot handle multiple generation stamps. That is why I want to make the 
proposed change.

As I said in the description of this jira, this change is based on the 
following facts: (1) On each datanode only one replica of a block exists. 
Therefore there is only one generation of a block. (2) The NameNode has only 
one entry for a block in its blocks map. 

So in all the maps that you mentioned, there should be only one entry per 
block id. In both the NN and the DN, there is only one blockInfo or 
replicaInfo entry per block. I do not think changing the key of Block should 
cause a problem for any of these data structures in dfs. If two generations 
of a block appear in those data structures, that is an error. Whether or not 
the key contains the generation stamp, dfs should handle this error.
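
To illustrate the point about maps, here is a minimal sketch in which 
equality and hashing use the block id only. The class BlockKeySketch, its 
nested Block, and the helpers lookup and generationMatches are hypothetical 
stand-ins written for this comment; they are not the real HDFS classes and 
not the contents of blockKey.patch.

```java
import java.util.HashMap;
import java.util.Map;

public class BlockKeySketch {

    /** Hypothetical stand-in for Block: the key is the block id only. */
    static final class Block {
        final long blockId;
        final long generationStamp;

        Block(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Block)) {
                return false;
            }
            // The generation stamp is deliberately left out of the comparison.
            return blockId == ((Block) o).blockId;
        }

        @Override
        public int hashCode() {
            // Must agree with equals(): hash the block id only.
            return Long.hashCode(blockId);
        }
    }

    /** Finds the stored entry when only the block id is known. */
    static Block lookup(Map<Block, Block> blocksMap, long blockId) {
        // The generation stamp of the probe is irrelevant to the lookup.
        return blocksMap.get(new Block(blockId, 0L));
    }

    /**
     * Returns true if the reported block has the same generation stamp as
     * the stored entry. A mismatch is an error that the caller must handle
     * explicitly, since the map itself no longer distinguishes generations.
     */
    static boolean generationMatches(Map<Block, Block> blocksMap, Block reported) {
        Block stored = blocksMap.get(reported);
        return stored != null && stored.generationStamp == reported.generationStamp;
    }

    public static void main(String[] args) {
        Map<Block, Block> blocksMap = new HashMap<>();
        Block stored = new Block(42L, 1001L);
        blocksMap.put(stored, stored);

        // Lookup by id alone finds the single entry for the block.
        Block found = lookup(blocksMap, 42L);
        System.out.println("generation stamp of stored entry: "
            + found.generationStamp);

        // A report with a different generation maps to the same entry,
        // and the mismatch is detected by an explicit check.
        Block reported = new Block(42L, 1000L);
        System.out.println("generation matches: "
            + generationMatches(blocksMap, reported));
    }
}
```

With this kind of key, a second generation of the same block can never 
silently become a second map entry; it collides with the existing entry, and 
the code that owns the map decides how to treat the mismatch.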

I agree this may break some external applications. We could document this 
change in the release notes. 

> Set block id as the key to Block
> --------------------------------
>
>                 Key: HDFS-512
>                 URL: https://issues.apache.org/jira/browse/HDFS-512
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: blockKey.patch
>
>
> Currently the key to Block is block id + generation stamp. I would propose to 
> change it to be only block id. This is based on the following properties of 
> the dfs cluster:
> 1. On each datanode only one replica of block exists. Therefore there is only 
> one generation of a block.
> 2. NameNode has only one entry for a block in its blocks map.
> With this change, search for a block/replica's meta information is easier 
> since most of the time we know a block's id but may not know its generation 
> stamp.
