[ https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372404#comment-14372404 ]

Colin Patrick McCabe commented on HDFS-7960:
--------------------------------------------

bq. there's a TODO: FIXME, we aren't passing in the BlockReportContext.

Yeah, mea culpa.

bq. processReport doesn't need that last parameter anymore either I think, 
since the information is in the BR context.

The last parameter is needed because we want to eliminate zombie storages only 
after all storages have been processed, and a single call to 
{{NameNodeRpcServer#blockReport}} can handle multiple storages.
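
To make the ordering concrete, here is a toy sketch of the idea (plain Java, 
not HDFS code and not the actual patch; the names are invented): the prune 
step can only run once the last storage of the RPC has been processed.

{code:java}
import java.util.*;

/** Toy model only (not HDFS code): a block report RPC that carries several
 *  storages must defer zombie pruning until the last storage is processed. */
public class BlockReportSketch {
  static void processReport(Map<String, List<Long>> nnStorages,
      String storageId, List<Long> blocks,
      Set<String> reportedSoFar, boolean lastStorageInRpc) {
    nnStorages.put(storageId, blocks);   // apply this storage's report
    reportedSoFar.add(storageId);
    if (lastStorageInRpc) {
      // Only now is it safe to drop storages the DN did not report at all,
      // even if the NameNode still thinks they hold blocks (zombies).
      nnStorages.keySet().retainAll(reportedSoFar);
    }
  }

  public static void main(String[] args) {
    Map<String, List<Long>> nnView = new TreeMap<>();
    nnView.put("DS-old", Arrays.asList(1L, 2L));      // stale zombie storage
    Map<String, List<Long>> report = new LinkedHashMap<>();
    report.put("DS-1", Arrays.asList(1L, 2L));
    report.put("DS-2", Arrays.asList(3L));

    Set<String> seen = new HashSet<>();
    int i = 0;
    for (Map.Entry<String, List<Long>> e : report.entrySet()) {
      processReport(nnView, e.getKey(), e.getValue(), seen,
          ++i == report.size());
    }
    System.out.println(nnView.keySet());  // DS-old is pruned: [DS-1, DS-2]
  }
}
{code}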

bq. Is there a need for BR ids to be monotonic increasing? Else using a random 
number seems better. I see you do a fixup by checking with the previous ID, but 
with random this shouldn't be necessary

I like the idea of monotonically increasing BR IDs for two reasons: it makes it 
easier to see in the logs which block report came after which, and it 
effectively removes the (admittedly very, very small) chance of a collision 
between two subsequent BR IDs.  The monotonic timer in Linux (or other OSes) 
only gets reset when a node reboots, so even restarting the DN process will not 
normally reset the ID.
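
Roughly, the idea is the following (minimal sketch, not the actual patch code, 
and the names are made up; it assumes {{System.nanoTime()}} is backed by the 
OS monotonic clock, as it is on Linux):

{code:java}
/** Illustrative sketch only: monotonically increasing block report IDs. */
class BlockReportIdSketch {
  private long prevBlockReportId = 0;

  /** nanoTime() follows the OS monotonic clock (time since boot on Linux),
   *  so restarting the DN process does not normally reset the sequence; the
   *  fixup below only matters if two reads land on the same tick. */
  synchronized long nextBlockReportId() {
    long id = System.nanoTime();
    if (id <= prevBlockReportId) {
      id = prevBlockReportId + 1;   // fix up against the previous ID
    }
    prevBlockReportId = id;
    return id;
  }
}
{code}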

bq. If you wanted to add comments about all this, BlockReportContext's class 
javadoc would be a good choice.

Good idea, I added some comments there.

bq. space after assert

fixed

> The full block report should prune zombie storages even if they're not empty
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-7960
>                 URL: https://issues.apache.org/jira/browse/HDFS-7960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-7960.002.patch, HDFS-7960.003.patch, 
> HDFS-7960.004.patch
>
>
> The full block report should prune zombie storages even if they're not empty. 
>  We have seen cases in production where zombie storages have not been pruned 
> subsequent to HDFS-7575.  This could arise any time the NameNode thinks there 
> is a block in some old storage which is actually not there.  In this case, 
> the block will not show up in the "new" storage (once old is renamed to new) 
> and the old storage will linger forever as a zombie, even with the HDFS-7596 
> fix applied.  This also happens with datanode hotplug, when a drive is 
> removed.  In this case, an entire storage (volume) goes away but the blocks 
> do not show up in another storage on the same datanode.


