[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-30 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242912#comment-13242912
 ] 

Eli Collins commented on HDFS-3044:
---

I've committed it to branch-1

> fsck move should be non-destructive by default
> --
>
> Key: HDFS-3044
> URL: https://issues.apache.org/jira/browse/HDFS-3044
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Eli Collins
>Assignee: Colin Patrick McCabe
> Fix For: 1.1.0, 2.0.0
>
> Attachments: HDFS-3044-b1.002.patch, HDFS-3044-b1.004.patch, 
> HDFS-3044.002.patch, HDFS-3044.003.patch
>
>
> The fsck move behavior in the code, as originally articulated in HADOOP-101, 
> is:
> {quote}Current failure modes for DFS involve blocks that are completely 
> missing. The only way to "fix" them would be to recover chains of blocks and 
> put them into lost+found{quote}
> A directory is created with the file's name, the blocks that are still 
> accessible are written out as individual files in this directory, and then 
> the original file is removed. 
> I suspect the rationale for this behavior was that you can't use files that 
> are missing block locations, and copying the blocks out as files at least 
> makes part of the file accessible. However, this behavior can also result in 
> permanent data loss. For example:
> - Some datanodes fail to come up and check in on cluster startup (e.g. due to 
> a hardware issue); files whose blocks have all their replicas on this set of 
> datanodes are marked corrupt
> - The admin runs fsck move, which deletes the "corrupt" files and saves 
> whatever blocks were still available
> - The hardware issues are resolved, and the datanodes start and rejoin the 
> cluster; the NN tells them to delete their blocks for the "corrupt" files, 
> since those files were deleted
> I think we should:
> - Make fsck move non-destructive by default (i.e. just do a move into 
> lost+found)
> - Make the destructive behavior opt-in (e.g. a "--destructive" flag, so 
> admins think about what they're doing)
> - Provide better sanity checks and warnings, e.g. if fsck is run while not 
> all the slaves have checked in (when using dfs.hosts), it should print a 
> warning indicating this, which an admin should have to override before doing 
> anything destructive
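The proposed semantics can be sketched with plain local files standing in for HDFS state. This is only an illustration of the "copy recoverable blocks, keep the original" idea, not the actual NamenodeFsck code; every path and file name below is invented for the demo.

```shell
# Sketch of non-destructive "fsck move" semantics using a local temp dir
# as a stand-in for HDFS. All names here are illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/lost+found"
printf 'block1' > "$demo/corruptfile"   # stand-in for a file with one recoverable block

# Non-destructive move (proposed default): copy recoverable blocks into a
# per-file directory under lost+found, but KEEP the original file so nothing
# is lost if the missing replicas later come back.
mkdir -p "$demo/lost+found/corruptfile"
cp "$demo/corruptfile" "$demo/lost+found/corruptfile/block0"

# Destructive variant (opt-in only, behind an explicit flag): same copy,
# then delete the original. Commented out here on purpose.
# rm "$demo/corruptfile"

[ -f "$demo/corruptfile" ] && echo "original preserved"
```

The key difference from the pre-HDFS-3044 behavior is the absence of the final delete: if the datanodes holding the missing replicas rejoin, the file becomes whole again instead of being gone for good.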

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-30 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242909#comment-13242909
 ] 

Eli Collins commented on HDFS-3044:
---

+1, updated patch looks great.





[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-29 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241854#comment-13241854
 ] 

Eli Collins commented on HDFS-3044:
---

Oops, I ran the test as a sanity check and it looks like we need another rev.





[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236605#comment-13236605
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Mapreduce-trunk #1028 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1028/])
HDFS-3044. fsck move should be non-destructive by default. Contributed by 
Colin Patrick McCabe (Revision 1304063)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304063
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java






[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236577#comment-13236577
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Mapreduce-0.23-Build #234 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/234/])
HDFS-3044. svn merge -c 1304063 from trunk (Revision 1304067)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304067
Files : 
* /hadoop/common/branches/branch-0.23
* /hadoop/common/branches/branch-0.23/hadoop-common-project
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job
* /hadoop/common/branches/branch-0.23/hadoop-project
* /hadoop/common/branches/branch-0.23/hadoop-project/src/site
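The "svn merge -c 1304063 from trunk" commit message above records a standard single-revision cherry-pick onto branch-0.23. A rough sketch of that workflow follows; the repository URL is an assumption based on the paths in the build report, and the commands are illustrative rather than a transcript of what was actually run.

```shell
# Illustrative sketch of cherry-picking one trunk revision onto branch-0.23.
# The ASF repo URL below is assumed from the paths in the build report.
REV=1304063
TRUNK_URL="https://svn.apache.org/repos/asf/hadoop/common/trunk"

# Typical sequence (run inside a branch-0.23 working copy):
#   svn merge -c "$REV" "$TRUNK_URL" .
#   svn commit -m "HDFS-3044. svn merge -c $REV from trunk"
MERGE_CMD="svn merge -c $REV $TRUNK_URL ."
echo "$MERGE_CMD"
```

Note that the merge commit gets its own revision number (1304067 here), which is why the branch build reports cite a different revision than the trunk builds.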



[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236570#comment-13236570
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Hdfs-0.23-Build #206 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/206/])
HDFS-3044. svn merge -c 1304063 from trunk (Revision 1304067)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304067
Files : 
* (identical file list to the branch-0.23 merge report above)



[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236563#comment-13236563
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Hdfs-trunk #993 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/993/])
HDFS-3044. fsck move should be non-destructive by default. Contributed by 
Colin Patrick McCabe (Revision 1304063)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304063
Files : 
* (identical file list to the trunk commit report above)






[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236215#comment-13236215
 ] 

Todd Lipcon commented on HDFS-3044:
---

This probably needs a release note, and perhaps the incompatible-change field. 
It seems like an acceptable incompatibility, but it's worth calling out in the changes.

For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236094#comment-13236094
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Mapreduce-0.23-Commit #723 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/723/])
HDFS-3044. svn merge -c 1304063 from trunk (Revision 1304067)

 Result = ABORTED
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304067
Files : 
* (identical file list to the branch-0.23 merge report above)



[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236082#comment-13236082
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1925 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1925/])
HDFS-3044. fsck move should be non-destructive by default. Contributed by 
Colin Patrick McCabe (Revision 1304063)

 Result = ABORTED
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304063
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java


> fsck move should be non-destructive by default
> --
>
> Key: HDFS-3044
> URL: https://issues.apache.org/jira/browse/HDFS-3044
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Eli Collins
>Assignee: Colin Patrick McCabe
> Fix For: 0.23.3
>
> Attachments: HDFS-3044.002.patch, HDFS-3044.003.patch
>
>
> The fsck move behavior in the code and originally articulated in HADOOP-101 
> is:
> {quote}Current failure modes for DFS involve blocks that are completely 
> missing. The only way to "fix" them would be to recover chains of blocks and 
> put them into lost+found{quote}
> A directory is created with the file name, the blocks that are accessible are 
> created as individual files in this directory, and then the original file is 
> removed.
> I suspect the rationale for this behavior was that you can't use files that 
> are missing locations, and copying the blocks out as files at least makes 
> part of the files accessible. However, this behavior can also result in 
> permanent data loss. E.g.:
> - Some datanodes don't come up (e.g. due to HW issues) and check in on 
> cluster startup; files whose blocks have all their replicas on this set of 
> datanodes are marked corrupt.
> - The admin runs fsck move, which deletes the "corrupt" files and saves 
> whatever blocks were available.
> - The HW issues with the datanodes are resolved; they are started and join 
> the cluster. The NN tells them to delete their blocks for the corrupt files, 
> since the files were deleted.
> I think we should:
> - Make fsck move non-destructive by default (e.g. just do a move into 
> lost+found)
> - Make the destructive behavior optional (e.g. "--destructive", so admins 
> think about what they're doing)
> - Provide better sanity checks and warnings: e.g. if you're running fsck and 
> not all the slaves have checked in (if using dfs.hosts), fsck should print a 
> warning indicating this, which an admin should have to override before doing 
> something destructive
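The semantics being proposed (copy salvageable blocks into lost+found, and delete the source only when the admin explicitly asks for destructive behavior) can be sketched roughly as follows. This is a toy model against the local filesystem, not the real NamenodeFsck code; the `FsckMoveSketch` class, the `salvage()` helper, and its `destructive` flag are hypothetical names for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Toy model of the proposed fsck -move semantics, using the local
// filesystem as a stand-in for HDFS. All names are hypothetical.
public class FsckMoveSketch {

    // Copy the readable data of a "corrupt" file into lost+found.
    // Only when destructive is set do we also delete the original,
    // which corresponds to the old (dangerous) default behavior.
    static void salvage(Path corruptFile, Path lostFound, boolean destructive)
            throws IOException {
        Files.createDirectories(lostFound);
        Path target = lostFound.resolve(corruptFile.getFileName());
        Files.copy(corruptFile, target, StandardCopyOption.REPLACE_EXISTING);
        if (destructive) {
            // Old behavior: the source is deleted, so if the datanodes
            // holding the "missing" replicas later rejoin the cluster,
            // the NN tells them to discard those blocks permanently.
            Files.delete(corruptFile);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fsck-demo");
        Path file = Files.write(dir.resolve("data.bin"), new byte[]{1, 2, 3});

        salvage(file, dir.resolve("lost+found"), false); // non-destructive default
        System.out.println(Files.exists(file));          // prints true
    }
}
```

With `destructive` set, the source disappears, which is exactly the data-loss window described in the bullets above.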

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236052#comment-13236052
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Common-0.23-Commit #715 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/715/])
HDFS-3044. svn merge -c 1304063 from trunk (Revision 1304067)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304067
Files : 
* /hadoop/common/branches/branch-0.23
* /hadoop/common/branches/branch-0.23/hadoop-common-project
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job
* /hadoop/common/branches/branch-0.23/hadoop-project
* /hadoop/common/branches/branch-0.23/hadoop-project/src/site



[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236051#comment-13236051
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Hdfs-0.23-Commit #706 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/706/])
HDFS-3044. svn merge -c 1304063 from trunk (Revision 1304067)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304067
Files : 
* (same file list as the Hadoop-Common-0.23-Commit #715 message above)



[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236046#comment-13236046
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Common-trunk-Commit #1916 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1916/])
HDFS-3044. fsck move should be non-destructive by default. Contributed by 
Colin Patrick McCabe (Revision 1304063)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304063
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java







[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236044#comment-13236044
 ] 

Hudson commented on HDFS-3044:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1990 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1990/])
HDFS-3044. fsck move should be non-destructive by default. Contributed by 
Colin Patrick McCabe (Revision 1304063)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304063
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java







[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235950#comment-13235950
 ] 

Hadoop QA commented on HDFS-3044:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519115/HDFS-3044.003.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2070//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2070//console

This message is automatically generated.






[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235938#comment-13235938
 ] 

Hadoop QA commented on HDFS-3044:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519115/HDFS-3044.003.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2069//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2069//console

This message is automatically generated.






[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-22 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235841#comment-13235841
 ] 

Eli Collins commented on HDFS-3044:
---

+1 pending jenkins (I'll kick it)






[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-20 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233591#comment-13233591
 ] 

Eli Collins commented on HDFS-3044:
---

- Still needs a test of the new behavior of fsck move, i.e. that it's not 
destructive (a test that covers move without delete, and asserts that the 
source files are still there)
- Nit: if we're going to name the flag "doMove" (vs. e.g. salvageCorruptFiles), 
please add a comment by the declaration noting that "doMove" doesn't actually 
do a move anymore (since it no longer deletes, it's a copy now)

Otherwise looks great!
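In shape, the missing test would look something like the sketch below. It runs against the local filesystem instead of a MiniDFSCluster so that it is self-contained here; `runFsckMove()` is a hypothetical stand-in for invoking `hdfs fsck <path> -move`, not the real test harness.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Shape of the requested test: run a non-destructive fsck move, then
// assert the source files survive. Local-filesystem sketch only; the
// real test would drive NamenodeFsck through a MiniDFSCluster.
public class TestFsckMoveSketch {

    // Non-destructive move: copy salvageable data, never delete the source.
    static void runFsckMove(List<Path> corruptFiles, Path lostFound) throws Exception {
        Files.createDirectories(lostFound);
        for (Path p : corruptFiles) {
            Files.copy(p, lostFound.resolve(p.getFileName()));
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("test-fsck-move");
        List<Path> files = List.of(
                Files.write(root.resolve("a"), new byte[]{1}),
                Files.write(root.resolve("b"), new byte[]{2}));

        runFsckMove(files, root.resolve("lost+found"));

        // The key assertion: move without delete leaves the sources intact.
        for (Path p : files) {
            if (!Files.exists(p)) {
                throw new AssertionError(p + " was deleted by fsck move");
            }
        }
        System.out.println("source files intact");
    }
}
```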


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default

2012-03-09 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226646#comment-13226646
 ] 

Eli Collins commented on HDFS-3044:
---

- The new boolean destructive is unused
- FsckOperation is kind of overkill; probably simpler to have two bools since 
these are independent operations:
-- salvageCorruptFiles, whether to copy whatever blocks are left to lost+found
-- deleteCorruptFiles, whether to delete corrupt files
- Let's rename lostFoundMove to something like copyBlocksToLostFound to reflect 
what this method actually does; ditto update the warning since we didn't really 
copy the file (perhaps "copied accessible blocks for file X")
- Let's rename testFsckMove to testFsckMoveAndDelete and add a testFsckMove 
that tests that fsck move is not destructive
- Per the last bullet in the description, it would be good to at least add a 
log at INFO level indicating the # of datanodes that have checked in, so an 
admin can see if the number looks off (and doesn't do a destructive operation 
before waiting for DNs to check in)
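The refactor suggested above could look roughly like this (class, field, and method names are illustrative, not the actual NamenodeFsck code): two independent booleans instead of an enum, a copy method named for what it does, and an INFO log of datanode check-ins before anything destructive runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the two-boolean fsck refactor sketched above. */
class FsckSketch {
    final boolean salvageCorruptFiles; // copy remaining blocks to lost+found
    final boolean deleteCorruptFiles;  // delete corrupt files (destructive)

    final Map<String, List<String>> files;  // path -> accessible block ids
    final Map<String, List<String>> lostFound = new HashMap<>();
    final List<String> log = new ArrayList<>();

    FsckSketch(Map<String, List<String>> files,
               boolean salvage, boolean delete) {
        this.files = files;
        this.salvageCorruptFiles = salvage;
        this.deleteCorruptFiles = delete;
    }

    /** Copies the accessible blocks of a corrupt file into lost+found. */
    void copyBlocksToLostFound(String path) {
        lostFound.put("/lost+found" + path, new ArrayList<>(files.get(path)));
        log.add("Copied accessible blocks for file " + path);
    }

    void handleCorruptFile(String path, int liveDns, int expectedDns) {
        if (salvageCorruptFiles) {
            copyBlocksToLostFound(path);
        }
        if (deleteCorruptFiles) {
            // Surface the check-in count before doing anything destructive,
            // so a short count is visible in the log.
            log.add("INFO: " + liveDns + " of " + expectedDns
                    + " datanodes have checked in");
            files.remove(path);
        }
    }
}
```

Because the two flags are independent, salvage-only (the proposed default), delete-only, and salvage-then-delete all fall out of the same code path.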


> fsck move should be non-destructive by default
> --
>
> Key: HDFS-3044
> URL: https://issues.apache.org/jira/browse/HDFS-3044
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Eli Collins
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-3044.001.patch
>
>
