[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319015#comment-14319015
 ] 

Colin Patrick McCabe commented on HDFS-7648:
--------------------------------------------

Hi [~szetszwo], since you suggested splitting this JIRA into two, I had assumed 
that you wanted to have the discussion about "automatic fixing" on the second 
JIRA.  However, if you want to have it now, I'll share my thoughts.

As I stated earlier, I don't think we should do automatic fixing.  We simply 
don't know *why* the DataNode got into a state where the directory layout is 
wrong.  This is similar to "what happens if there is no VERSION file?"  We 
don't try to automatically fix this.  If there is no VERSION file, then it's 
very likely that there is a serious misconfiguration and/or filesystem bug, and 
our attempts to fix it would only make things worse.

The same logic applies here.  If there are blocks in the wrong location, why is 
that happening?  It could be because there is a serious bug in the software.  
In that case, deleting the blocks, as you have suggested, would only lead to 
data loss.  It could be because the sysadmin manually edited a {{VERSION}} file 
for an old (pre-HDFS-6482) datanode directory to look like it was 
post-HDFS-6482, bypassing the upgrade process.  In this case, deleting *all* 
the data is still the wrong thing to do; the sysadmin should instead see logs 
saying that this configuration is wrong.  Finally, blocks could be in the 
wrong place because there is a serious disk drive or local FS error.  In this 
case, deletion will still do no good, because the device is in a seriously 
unusable state.

I'd also like to note that we've spent quite a lot of time discussing 
theoretical failures that may or may not ever happen.  Who knows whether we 
will ever actually find blocks in the wrong place?  You are asking for 
automatic handling of something that, to our knowledge, has never even happened 
once.  That seems like putting the cart before the horse.
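
To make the "report, don't repair" position concrete, here is a minimal sketch 
of a check that only collects warnings about misplaced blocks and never moves 
or deletes anything.  It is not Hadoop code.  The class {{LayoutCheck}}, its 
method names, and the input map are hypothetical, and the bit-shift mapping 
from block ID to subdirectory is an assumption used for illustration; the real 
mapping is whatever HDFS-6482 defines.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: it reports misplaced blocks
// and deliberately never moves or deletes them.
final class LayoutCheck {

    // ASSUMPTION: a two-level layout derived from bits of the block ID.
    // The real datanode mapping may differ.
    static Path expectedDir(Path finalizedRoot, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return finalizedRoot.resolve("subdir" + d1).resolve("subdir" + d2);
    }

    // blockDirs maps each block ID to the directory where it was found.
    // Returns one warning message per block that is not where it should be.
    static List<String> findMisplaced(Path finalizedRoot, Map<Long, Path> blockDirs) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<Long, Path> e : blockDirs.entrySet()) {
            Path expected = expectedDir(finalizedRoot, e.getKey());
            if (!expected.equals(e.getValue())) {
                warnings.add("Block " + e.getKey() + " found in " + e.getValue()
                        + " but expected in " + expected);
            }
        }
        return warnings;
    }
}
```

A caller such as the DirectoryScanner or block report generation could log 
the returned messages and leave the decision about what to do to the 
sysadmin.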

> Verify the datanode directory layout
> ------------------------------------
>
>                 Key: HDFS-7648
>                 URL: https://issues.apache.org/jira/browse/HDFS-7648
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Rakesh R
>         Attachments: HDFS-7648.patch, HDFS-7648.patch
>
>
> HDFS-6482 changed datanode layout to use block ID to determine the directory 
> to store the block.  We should have some mechanism to verify it.  Either 
> DirectoryScanner or block report generation could do the check.



