[ https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated HDFS-10360:
-----------------------------------
    Status: Patch Available  (was: Open)

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-10360
>                 URL: https://issues.apache.org/jira/browse/HDFS-10360
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-10360.001.patch
>
>
> Under certain circumstances, if the current/VERSION file of a storage
> directory is missing, the DataNode may format the storage directory even
> though _block files are not missing_.
>
> This is very easy to reproduce. Launch an HDFS cluster and create some
> files. Delete current/VERSION, and restart the DataNode.
>
> After the restart, the DataNode formats the directory and removes all
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/dfs/dn/in_use.lock acquired by nodename 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data/dfs/dn is not formatted for BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Block pool storage directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted for BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
>
> The bug is: the DataNode assumes that if none of {{current/VERSION}},
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing
> important to HDFS, and decides to format it. However, block files may still
> exist, and in my opinion, we should do everything possible to retain the
> block files.
>
> I have two suggestions:
> # Check whether the {{current/}} directory is empty. If it is not, throw an
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of
> assuming the directory is not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory,
> rename or move the {{current/}} directory. Also, log whatever is being
> renamed/moved.
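
The two suggestions can be sketched in plain Java as below. This is only an illustration of the proposed checks, not the attached patch and not Hadoop's Storage class: StorageDirCheck, looksUnformatted, hasEntries, safeToFormat, checkBeforeFormat, setAsideCurrent and the "current.<suffix>" naming are all hypothetical, and a plain IOException stands in for InconsistentFSStateException.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative sketch only; every name in this class is hypothetical. */
final class StorageDirCheck {
    // Markers named in the description: if none of them exists,
    // the storage directory is currently treated as unformatted.
    private static final String[] MARKERS = {
        "current/VERSION", "previous", "previous.tmp",
        "removed.tmp", "finalized.tmp", "lastcheckpoint.tmp"
    };

    /** True when none of the marker files or directories exists under root. */
    static boolean looksUnformatted(Path root) {
        for (String marker : MARKERS) {
            if (Files.exists(root.resolve(marker))) {
                return false;
            }
        }
        return true;
    }

    /** True when dir is a directory containing at least one entry. */
    static boolean hasEntries(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return entries.iterator().hasNext();
        }
    }

    /** Suggestion 1 as a predicate: format only if current/ is absent or empty. */
    static boolean safeToFormat(Path root) throws IOException {
        return looksUnformatted(root) && !hasEntries(root.resolve("current"));
    }

    /** Suggestion 1 as a guard: fail instead of assuming "not formatted". */
    static void checkBeforeFormat(Path root) throws IOException {
        if (looksUnformatted(root) && hasEntries(root.resolve("current"))) {
            // Stand-in for InconsistentFSStateException.
            throw new IOException("current/VERSION is missing but " + root
                + "/current is not empty; refusing to format");
        }
    }

    /** Suggestion 2: move current/ aside before formatting, and log the move. */
    static Path setAsideCurrent(Path root, String suffix) throws IOException {
        Path current = root.resolve("current");
        Path target = root.resolve("current." + suffix);
        System.out.println("Moving " + current + " to " + target);
        return Files.move(current, target);
    }
}
```

With either approach, a directory whose current/VERSION is missing but whose current/ still holds block pool data is never silently cleared: the first approach refuses to proceed, and the second keeps the data under a new name so it can be recovered by hand.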