[ 
https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370761#comment-14370761
 ] 

Andrew Wang commented on HDFS-7960:
-----------------------------------

Nice find and fix Colin. This change looks good to me overall, looking forward 
to the unit test. Just a few comments:

* EMPTY_STORAGE_INFO_LIST is a little dangerous since it can still be modified, 
maybe use ImmutableLists instead? Alternatively we can just allocate the empty 
LinkedList, it'll be young gen and cheap.
* Could use some comments about the lifecycle of updatedByCurrentFbr on the NN 
in a comment (on the zombie reaper methods?), also on the DN side too about 
sentAllStorages. The comment in the proto file is reasonable, but that is not 
the first place people will look.
* When we're doing a rolling upgrade, I think updatedByCurrentFbr will be off. 
The NN will never get a sentAllStorages signal from an old DN, so it'll keep 
marking all of its storages as updatedByCurrentFbr. The DN upgrades, and the 
first FBR with sentAllStorages set won't handle zombies correctly since 
everything is marked updatedByCurrentFbr. This probably deserves a comment at 
least. We could also provide a "first" flag to go along with the "last", so the 
NN resets updatedByCurrentFbr on seeing "first" and reaps upon seeing "last".

A few nits:

{code}
        List<DatanodeStorageInfo> zombies =  node.removeZombieStorages();
{code}

double space after equals here

{code}
      BlockReportRequestProto request)
                throws ServiceException {
{code}

unnecessary whitespace change?

> The full block report should prune zombie storages even if they're not empty
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-7960
>                 URL: https://issues.apache.org/jira/browse/HDFS-7960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-7960.002.patch
>
>
> The full block report should prune zombie storages even if they're not empty. 
>  We have seen cases in production where zombie storages have not been pruned 
> subsequent to HDFS-7575.  This could arise any time the NameNode thinks there 
> is a block in some old storage which is actually not there.  In this case, 
> the block will not show up in the "new" storage (once old is renamed to new) 
> and the old storage will linger forever as a zombie, even with the HDFS-7596 
> fix applied.  This also happens with datanode hotplug, when a drive is 
> removed.  In this case, an entire storage (volume) goes away but the blocks 
> do not show up in another storage on the same datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to