[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-17 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-10360:
-
Labels: dataloss datanode  (was: )

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: dataloss, datanode
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-10360.001.patch, HDFS-10360.002.patch, 
> HDFS-10360.003.patch, HDFS-10360.004.patch, HDFS-10360.004.patch, 
> HDFS-10360.005.patch, HDFS-10360.007.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
> However, block files may still exist, and in my opinion, we should do 
> everything possible to retain the block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-17 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-10360:
-
  Resolution: Fixed
   Fix Version/s: 3.0.0-alpha1
  2.9.0
Target Version/s: 3.0.0-alpha1
  Status: Resolved  (was: Patch Available)

+1. Thanks [~jojochuang]. Good work.


> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-10360.001.patch, HDFS-10360.002.patch, 
> HDFS-10360.003.patch, HDFS-10360.004.patch, HDFS-10360.004.patch, 
> HDFS-10360.005.patch, HDFS-10360.007.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
> However, block files may still exist, and in my opinion, we should do 
> everything possible to retain the block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-17 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10360:
---
Attachment: HDFS-10360.007.patch

Many thanks to [~eddyxu] for the comments. Here's the new patch that includes a 
test. This test validates that DN sees the failed volume in its JMX message, 
and NN also sees the same failure.

Please review again! Thanks again!

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10360.001.patch, HDFS-10360.002.patch, 
> HDFS-10360.003.patch, HDFS-10360.004.patch, HDFS-10360.004.patch, 
> HDFS-10360.005.patch, HDFS-10360.007.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
> However, block files may still exist, and in my opinion, we should do 
> everything possible to retain the block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-16 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10360:
---
Attachment: HDFS-10360.005.patch

Rev05: fixed checkstyle warning, findbug warning. Also, use Java 
{{DirectoryStream}} to identify an empty directory without listing the content 
of the directory.

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10360.001.patch, HDFS-10360.002.patch, 
> HDFS-10360.003.patch, HDFS-10360.004.patch, HDFS-10360.004.patch, 
> HDFS-10360.005.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
> However, block files may still exist, and in my opinion, we should do 
> everything possible to retain the block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-13 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10360:
---
Attachment: HDFS-10360.004.patch
HDFS-10360.004.patch

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10360.001.patch, HDFS-10360.002.patch, 
> HDFS-10360.003.patch, HDFS-10360.004.patch, HDFS-10360.004.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
> However, block files may still exist, and in my opinion, we should do 
> everything possible to retain the block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-09 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10360:
---
Attachment: HDFS-10360.003.patch

Forgot to rebase my patch... submitted a rebased patch.

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10360.001.patch, HDFS-10360.002.patch, 
> HDFS-10360.003.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
> However, block files may still exist, and in my opinion, we should do 
> everything possible to retain the block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-09 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10360:
---
Attachment: HDFS-10360.002.patch

Rev02: add log to record what files are being deleted during a DataNode 
reformat operation. There's no need to add extra JMX reporting code, as long as 
an IOException thrown adding the volume, it is added into JMX by the existing 
code.

The test failures seems unrelated. I don't have these test failures in my tree.

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10360.001.patch, HDFS-10360.002.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
> However, block files may still exist, and in my opinion, we should do 
> everything possible to retain the block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-03 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10360:
---
Description: 
Under certain circumstances, if the current/VERSION of a storage directory is 
missing, DataNode may format the storage directory even though _block files are 
not missing_.

This is very easy to reproduce. Simply launch a HDFS cluster and create some 
files. Delete current/VERSION, and restart the data node.

After the restart, the data node will format the directory and remove all 
existing block files:

{noformat}
2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock 
on /data/dfs/dn/in_use.lock acquired by nodename 
5...@weichiu-dn-2.vpc.cloudera.com
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Storage directory /data/dfs/dn is not formatted for 
BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Formatting ...
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Locking is disabled for 
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Block pool storage directory 
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
for BP-787466439-172
.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Formatting ...
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
{noformat}

The bug is: DataNode assumes that if none of {{current/VERSION}}, 
{{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
{{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
important to HDFS and decides to format it. 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
However, block files may still exist, and in my opinion, we should do 
everything possible to retain the block files.

I have two suggestions:
# check if {{current/}} directory is empty. If not, throw an 
InconsistentFSStateException in {{Storage#analyzeStorage}} instead of asumming 
its not formatted. Or,
# In {{Storage#clearDirectory}}, before it formats the storage directory, 
rename or move {{current/}} directory. Also, log whatever is being 
renamed/moved.

  was:
Under certain circumstances, if the current/VERSION of a storage directory is 
missing, DataNode may format the storage directory even though _block files are 
not missing_.

This is very easy to reproduce. Simply launch a HDFS cluster and create some 
files. Delete current/VERSION, and restart the data node.

After the restart, the data node will format the directory and remove all 
existing block files:

{noformat}
2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock 
on /data/dfs/dn/in_use.lock acquired by nodename 
5...@weichiu-dn-2.vpc.cloudera.com
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Storage directory /data/dfs/dn is not formatted for 
BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Formatting ...
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Locking is disabled for 
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Block pool storage directory 
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
for BP-787466439-172
.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Formatting ...
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
{noformat}

The bug is: DataNode assumes that if none of {{current/VERSION}}, 
{{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
{{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
important to HDFS and decides to format it. However, block files may still 
exist, and in my opinion, we should do everything possible to retain the block 
files.

I have two suggestions:
# check if {{current/}} directory is empty. If not, throw an 

[jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

2016-05-03 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10360:
---
Summary: DataNode may format directory and lose blocks if current/VERSION 
is missing  (was: DataNode may format directory and lose blocks if If 
current/VERSION is missing)

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---
>
> Key: HDFS-10360
> URL: https://issues.apache.org/jira/browse/HDFS-10360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10360.001.patch
>
>
> Under certain circumstances, if the current/VERSION of a storage directory is 
> missing, DataNode may format the storage directory even though _block files 
> are not missing_.
> This is very easy to reproduce. Simply launch a HDFS cluster and create some 
> files. Delete current/VERSION, and restart the data node.
> After the restart, the data node will format the directory and remove all 
> existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /data/dfs/dn/in_use.lock acquired by nodename 
> 5...@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /data/dfs/dn is not formatted for 
> BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled for 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Block pool storage directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted 
> for BP-787466439-172
> .26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory 
> /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: DataNode assumes that if none of {{current/VERSION}}, 
> {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and 
> {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing 
> important to HDFS and decides to format it. However, block files may still 
> exist, and in my opinion, we should do everything possible to retain the 
> block files.
> I have two suggestions:
> # check if {{current/}} directory is empty. If not, throw an 
> InconsistentFSStateException in {{Storage#analyzeStorage}} instead of 
> asumming its not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, 
> rename or move {{current/}} directory. Also, log whatever is being 
> renamed/moved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org