[ https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372404#comment-14372404 ]
Colin Patrick McCabe commented on HDFS-7960:
--------------------------------------------

bq. there's a TODO: FIXME, we aren't passing in the BlockReportContext.

Yeah, mea culpa.

bq. processReport doesn't need that last parameter anymore either I think, since the information is in the BR context.

The last parameter is needed because we want to eliminate zombie storages only after all storages have been processed, and a single call to {{NameNodeRpcServer#blockReport}} can handle multiple storages.

bq. Is there a need for BR ids to be monotonic increasing? Else using a random number seems better. I see you do a fixup by checking with the previous ID, but with random this shouldn't be necessary.

I like the idea of monotonically increasing BR IDs for two reasons: it makes it easier to see in the logs which block report came after which, and it effectively removes the (admittedly very, very small) chance of a collision between two subsequent BR IDs. The monotonic timer in Linux (or another OS) is reset only when a node reboots, so even restarting the DN process will not normally reset the ID.

bq. If you wanted to add comments about all this, BlockReportContext's class javadoc would be a good choice.

Good idea, I added some comments there.

bq. space after assert

Fixed.

> The full block report should prune zombie storages even if they're not empty
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-7960
>                 URL: https://issues.apache.org/jira/browse/HDFS-7960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-7960.002.patch, HDFS-7960.003.patch,
> HDFS-7960.004.patch
>
>
> The full block report should prune zombie storages even if they're not empty.
> We have seen cases in production where zombie storages have not been pruned
> subsequent to HDFS-7575.  This could arise any time the NameNode thinks there
> is a block in some old storage which is actually not there.  In this case,
> the block will not show up in the "new" storage (once old is renamed to new)
> and the old storage will linger forever as a zombie, even with the HDFS-7596
> fix applied.  This also happens with datanode hotplug, when a drive is
> removed.  In this case, an entire storage (volume) goes away but the blocks
> do not show up in another storage on the same datanode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
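The two design points in the comment (block report IDs that only move forward, with a fixup against the previous ID, and zombie pruning deferred until the last storage of a multi-storage report) can be illustrated with a minimal Java sketch. It is not the HDFS-7960 patch: the class BlockReportSketch and its methods nextReportId and processStorage are hypothetical, storages are plain String IDs, and it assumes System.nanoTime() is backed by the OS monotonic clock, as it usually is on Linux. A storage counts as a zombie if it was not stamped with the current report's ID by the time the last storage in the RPC is processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the HDFS-7960 patch: names and types are illustrative.
class BlockReportSketch {
    // Last ID handed out; starts below any possible System.nanoTime() value.
    private long prevReportId = Long.MIN_VALUE;

    // Last block report ID seen for each storage (storage IDs are plain strings here).
    private final Map<String, Long> lastReportIdByStorage = new HashMap<>();

    /** Returns a block report ID strictly greater than the previous one. */
    synchronized long nextReportId() {
        long id = System.nanoTime(); // assumed to be backed by the OS monotonic clock
        // Fixup: two calls can see the same clock value, so never reuse or go below prevReportId.
        if (id <= prevReportId) {
            id = prevReportId + 1;
        }
        prevReportId = id;
        return id;
    }

    /**
     * Records that storageId appeared in report reportId. Zombies are pruned only
     * when isLastStorage is true, i.e. after every storage in the RPC was processed.
     * Returns the storage IDs pruned as zombies (always empty unless isLastStorage).
     */
    synchronized List<String> processStorage(String storageId, long reportId, boolean isLastStorage) {
        lastReportIdByStorage.put(storageId, reportId);
        List<String> pruned = new ArrayList<>();
        if (isLastStorage) {
            // A storage not stamped with this report's ID was absent from the report: a zombie.
            lastReportIdByStorage.entrySet().removeIf(e -> {
                boolean zombie = e.getValue() != reportId;
                if (zombie) {
                    pruned.add(e.getKey());
                }
                return zombie;
            });
        }
        return pruned;
    }

    public static void main(String[] args) {
        BlockReportSketch nn = new BlockReportSketch();
        long first = nn.nextReportId();
        nn.processStorage("DS-1", first, false);
        nn.processStorage("DS-2", first, false);
        nn.processStorage("DS-3", first, true); // nothing pruned

        long second = nn.nextReportId(); // DS-3's volume was hot-removed
        System.out.println(nn.processStorage("DS-1", second, false)); // []
        System.out.println(nn.processStorage("DS-2", second, true));  // [DS-3]
    }
}
```

Deferring the prune to the last storage is what makes this safe: when DS-1 is processed in the second report, DS-2 has not yet been stamped with the new ID, so pruning at that point would wrongly drop a live storage.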