[jira] [Updated] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingjie Lai updated HDFS-2303:
------------------------------
    Attachment: HDFS-2303-5-trunk.patch

Eli, thanks for your review. I updated the patch and attached it here.

jsvc needs to be recompilable
-----------------------------
                Key: HDFS-2303
                URL: https://issues.apache.org/jira/browse/HDFS-2303
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: build, scripts
   Affects Versions: 0.23.0, 0.24.0
           Reporter: Roman Shaposhnik
           Assignee: Roman Shaposhnik
            Fix For: 0.24.0, 0.23.2
        Attachments: HDFS-2303-2.patch.txt, HDFS-2303-3-trunk.patch, HDFS-2303-4-trunk.patch, HDFS-2303-5-trunk.patch, HDFS-2303.patch.txt

It would be nice to recompile jsvc as part of the native profile. This has a number of benefits, including the ability to re-generate all binary artifacts. Most of all, however, it will provide a way to generate jsvc on Linux distributions that don't have a matching libc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3071) haadmin failover command does not provide enough detail for when target NN is not ready to be active
[ https://issues.apache.org/jira/browse/HDFS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HDFS-3071:
---------------------------------
    Assignee: Todd Lipcon

haadmin failover command does not provide enough detail for when target NN is not ready to be active
----------------------------------------------------------------------------------------------------
                Key: HDFS-3071
                URL: https://issues.apache.org/jira/browse/HDFS-3071
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: ha
   Affects Versions: 0.24.0
           Reporter: Philip Zeyliger
           Assignee: Todd Lipcon

When running the failover command, you can get an error message like the following:
{quote}
$ hdfs --config $(pwd) haadmin -failover namenode2 namenode1
Failover failed: xxx.yyy/1.2.3.4:8020 is not ready to become active
{quote}
Unfortunately, the error message doesn't describe why that node isn't ready to be active. In my case, the target namenode's logs don't indicate anything either. It turned out that the issue was "Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.", but ideally the user would be told that at the time of the failover.
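The improvement asked for here is essentially to carry the underlying health-check detail through to the operator-facing error. A minimal sketch of that idea (the class and method names below are illustrative, not the actual haadmin code):

```java
/** Sketch of surfacing the underlying cause in a failover error message.
 *  FailoverError and notReadyMessage are hypothetical names. */
public class FailoverError {
    /** Build a "not ready" message that includes the reason, when known. */
    public static String notReadyMessage(String target, String reason) {
        String msg = target + " is not ready to become active";
        // Append the health-check detail (e.g. the safe-mode message)
        // so the operator sees *why* at failover time, not just that it failed.
        return (reason == null || reason.isEmpty()) ? msg : msg + ": " + reason;
    }
}
```

With this shape, the example in the report would print "xxx.yyy/1.2.3.4:8020 is not ready to become active: Safe mode is ON. ..." instead of leaving the operator to dig through (in this case silent) NN logs.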
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226052#comment-13226052 ]

Hudson commented on HDFS-2976:
------------------------------
Integrated in Hadoop-Hdfs-trunk #979 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/979/])
HDFS-2976 removed the unused imports that were missed in previous commit. (Revision 1298508)
HDFS-2976 corrected the previous wrong commit for this issue. (Revision 1298507)
HDFS-2976. Remove unnecessary method (tokenRefetchNeeded) in DFSClient. (Contributed by Uma Maheswara Rao G) (Revision 1298495)

Result = SUCCESS

umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298508
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298507
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298495
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

Remove unnecessary method (tokenRefetchNeeded) in DFSClient
-----------------------------------------------------------
                Key: HDFS-2976
                URL: https://issues.apache.org/jira/browse/HDFS-2976
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: hdfs client
   Affects Versions: 0.24.0
           Reporter: Uma Maheswara Rao G
           Assignee: Uma Maheswara Rao G
           Priority: Trivial
            Fix For: 0.24.0
        Attachments: HDFS-2976.patch
[jira] [Updated] (HDFS-2966) TestNameNodeMetrics tests can fail under load
[ https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HDFS-2966:
---------------------------------
      Resolution: Fixed
   Fix Version/s: (was: 0.23.2)
          Status: Resolved (was: Patch Available)

Fixed in trunk. Not patched on 0.23.x, as the test is out of sync with other changes and it's not that important.

TestNameNodeMetrics tests can fail under load
---------------------------------------------
                Key: HDFS-2966
                URL: https://issues.apache.org/jira/browse/HDFS-2966
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: test
   Affects Versions: 0.24.0
        Environment: OS/X running IntelliJ IDEA, Firefox, and WinXP in a VirtualBox.
           Reporter: Steve Loughran
           Assignee: Steve Loughran
           Priority: Minor
            Fix For: 0.24.0
        Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch

I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of running the HDFS tests on a desktop without enough memory for all the programs trying to run. Things got swapped out and the tests failed as the DN heartbeats didn't come in on time.

The tests both rely on {{waitForDeletion()}} to block the tests until the delete operation has completed, but all it does is sleep for the same number of seconds as there are datanodes. This is too brittle - it may work on a lightly-loaded system, but not on a system under heavy load where replication takes longer than expected.

Immediate fix: double or triple the sleep time? Better fix: have the thread block until all the DN heartbeats have finished.
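The "better fix" described above - blocking until the condition is actually met rather than sleeping for a fixed, load-sensitive interval - can be sketched as a generic poll-with-timeout helper. The `WaitFor` class and `Condition` interface below are illustrative names, not the actual test code:

```java
/** Illustrative poll-until-true helper: re-checks a condition at a short
 *  interval instead of sleeping a fixed duration that breaks under load. */
public class WaitFor {
    public interface Condition {
        boolean isMet();
    }

    /** Returns true if the condition became true within timeoutMs. */
    public static boolean waitFor(Condition c, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (c.isMet()) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return c.isMet();
            }
        }
        return c.isMet();   // one last check at the deadline
    }
}
```

A test would then wait on the real signal (e.g. all DN heartbeats processed, or the pending-delete count reaching zero) with a generous timeout: a fast machine passes quickly, a loaded one simply polls longer instead of failing.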
[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load
[ https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226061#comment-13226061 ]

Hudson commented on HDFS-2966:
------------------------------
Integrated in Hadoop-Hdfs-trunk-Commit #1931 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1931/])
HDFS-2966 (Revision 1298820)

Result = SUCCESS

stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java

TestNameNodeMetrics tests can fail under load
---------------------------------------------
                Key: HDFS-2966
                URL: https://issues.apache.org/jira/browse/HDFS-2966
[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load
[ https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226062#comment-13226062 ]

Hudson commented on HDFS-2966:
------------------------------
Integrated in Hadoop-Common-trunk-Commit #1856 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1856/])
HDFS-2966 (Revision 1298820)

Result = SUCCESS

stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java

TestNameNodeMetrics tests can fail under load
---------------------------------------------
                Key: HDFS-2966
                URL: https://issues.apache.org/jira/browse/HDFS-2966
[jira] [Commented] (HDFS-2492) BlockManager cross-rack replication checks only work for ScriptBasedMapping
[ https://issues.apache.org/jira/browse/HDFS-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226063#comment-13226063 ]

Steve Loughran commented on HDFS-2492:
--------------------------------------
+1 for this; it's needed to complete the roll-out of the (still optional) topology base class; all tests are working.

BlockManager cross-rack replication checks only work for ScriptBasedMapping
---------------------------------------------------------------------------
                Key: HDFS-2492
                URL: https://issues.apache.org/jira/browse/HDFS-2492
            Project: Hadoop HDFS
         Issue Type: Bug
   Affects Versions: 0.23.0, 0.24.0
           Reporter: Steve Loughran
           Assignee: Steve Loughran
           Priority: Minor
            Fix For: 0.24.0, 0.23.3
        Attachments: HDFS-2492-blockmanager.patch, HDFS-2492-blockmanager.patch, HDFS-2492-blockmanager.patch, HDFS-2492-blockmanager.patch, HDFS-2492-blockmanager.patch, HDFS-2492-blockmanager.patch

The BlockManager cross-rack replication checks only work if script files are used for topology mapping, not if alternate plugins provide the topology information. This is because the BlockManager sets its rack-checking flag only if the script filename key is present:
{code}
shouldCheckForEnoughRacks = conf.get(DFSConfigKeys.NET_TOPOLOGY_SCRIPT_FILE_NAME_KEY) != null;
{code}
yet this filename key is only used if the topology mapper defined by
{code}
DFSConfigKeys.NET_TOPOLOGY_NODE_SWITCH_MAPPING_IMPL_KEY
{code}
is an instance of {{ScriptBasedMapping}}. If any other mapper is used, the system may be multi-rack, but the BlockManager will not be aware of this fact unless the filename key is set to something non-null.
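The core of the bug is that the flag is derived from a configuration key rather than from the topology itself. A mapping-independent check could instead ask the topology how many racks it actually knows about. This is a simplified sketch of that idea, with a stand-in `Topology` class rather than Hadoop's real NetworkTopology:

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of a mapping-independent multi-rack check: rather than keying off
 *  the script-file config entry, consult the topology's rack count, which is
 *  correct no matter which DNSToSwitchMapping plugin populated it. */
public class RackCheck {
    /** Minimal stand-in for the real NetworkTopology, reduced to rack counting. */
    public static class Topology {
        private final Set<String> racks = new HashSet<>();
        public void addNodeOnRack(String rack) { racks.add(rack); }
        public int getNumOfRacks() { return racks.size(); }
    }

    /** Check cross-rack replication whenever more than one rack exists. */
    public static boolean shouldCheckForEnoughRacks(Topology topology) {
        return topology.getNumOfRacks() > 1;
    }
}
```

With this shape, a cluster whose racks come from a custom plugin is treated the same as one using ScriptBasedMapping, since the decision no longer depends on how the rack information was supplied.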
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226078#comment-13226078 ]

Hudson commented on HDFS-2976:
------------------------------
Integrated in Hadoop-Mapreduce-trunk #1014 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1014/])
HDFS-2976 removed the unused imports that were missed in previous commit. (Revision 1298508)
HDFS-2976 corrected the previous wrong commit for this issue. (Revision 1298507)
HDFS-2976. Remove unnecessary method (tokenRefetchNeeded) in DFSClient. (Contributed by Uma Maheswara Rao G) (Revision 1298495)

Result = SUCCESS

umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298508
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298507
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298495
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

Remove unnecessary method (tokenRefetchNeeded) in DFSClient
-----------------------------------------------------------
                Key: HDFS-2976
                URL: https://issues.apache.org/jira/browse/HDFS-2976
[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load
[ https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226082#comment-13226082 ]

Hudson commented on HDFS-2966:
------------------------------
Integrated in Hadoop-Mapreduce-trunk-Commit #1865 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1865/])
HDFS-2966 (Revision 1298820)

Result = ABORTED

stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java

TestNameNodeMetrics tests can fail under load
---------------------------------------------
                Key: HDFS-2966
                URL: https://issues.apache.org/jira/browse/HDFS-2966
[jira] [Commented] (HDFS-3063) NameNode should validate all coming file path
[ https://issues.apache.org/jira/browse/HDFS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226121#comment-13226121 ]

Daryn Sharp commented on HDFS-3063:
-----------------------------------
I was posing it as a question. I'm not an RPC expert, but quickly running through the call code, it does indeed look impossible to hook in without artificially coupling the RPC layer to the namenode protocol. If so, definitely scratch that idea. An RPC domain expert might provide guidance.

My only other suggestion would be to consider calling a method that encapsulates the if/throw. If we want to change the exception type or message, it's one location to change instead of N. Since we use generic IOExceptions everywhere, the client often has to parse the error string, which makes enforced consistency especially important.

It may make sense to always perform the check as the first statement of the methods. The validity of the path is unrelated to safemode, so should I really have to wait for the NN to be operational before knowing my paths are invalid?

NameNode should validate all coming file path
---------------------------------------------
                Key: HDFS-3063
                URL: https://issues.apache.org/jira/browse/HDFS-3063
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: name-node
   Affects Versions: 0.20.205.0
           Reporter: Denny Ye
           Priority: Minor
             Labels: namenode
        Attachments: HDFS-3063.patch

NameNode provides RPC service not only for the DFS client but also for user-defined programs. A common case we keep meeting is a user passing a file path prefixed with the HDFS protocol (hdfs://{namenode}:{port}/{folder}/{file}). NameNode cannot map node metadata with this path and always throws an NPE. In the user client, we only see the NullPointerException, with no other hint as to which step it occurred at. NameNode should validate all incoming file paths against the regular format.

One exception I met:
{code}
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:334)
	at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:329)
{code}
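The validation being asked for amounts to rejecting malformed or URI-prefixed paths up front, with a clear message, before any INode lookup can dereference a null. A minimal sketch of such a check - `PathCheck` and `isValidName` are hypothetical helper names, not the actual NameNode code:

```java
/** Illustrative early path validation: reject scheme-prefixed or otherwise
 *  malformed paths with a clear error instead of a downstream NPE. */
public class PathCheck {
    /** NameNode-side paths must be absolute, with no scheme prefix such as
     *  hdfs://host:port/ and no empty path components. */
    public static boolean isValidName(String src) {
        return src != null && src.startsWith("/") && !src.contains("//");
    }

    /** Throw a descriptive error for invalid paths, as the first statement
     *  of each RPC method (independent of safemode, per the comment above). */
    public static void validate(String src) {
        if (!isValidName(src)) {
            throw new IllegalArgumentException("Invalid path name: " + src
                + " (expected an absolute path without a scheme prefix)");
        }
    }
}
```

Since "hdfs://namenode:8020/folder/file" does not start with "/", it fails the check immediately, and the client receives the path in the error text rather than a bare NullPointerException from INode.getPathComponents.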
[jira] [Commented] (HDFS-1623) High Availability Framework for HDFS NN
[ https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226273#comment-13226273 ]

Eli Collins commented on HDFS-1623:
-----------------------------------
+1 branch-23 patch looks good to me

High Availability Framework for HDFS NN
---------------------------------------
                Key: HDFS-1623
                URL: https://issues.apache.org/jira/browse/HDFS-1623
            Project: Hadoop HDFS
         Issue Type: New Feature
           Reporter: Sanjay Radia
            Fix For: 0.24.0
        Attachments: HA-tests.pdf, HDFS-1623.rel23.patch, HDFS-1623.trunk.patch, HDFS-High-Availability.pdf, NameNode HA_v2.pdf, NameNode HA_v2_1.pdf, Namenode HA Framework.pdf, dfsio-results.tsv, ha-testplan.pdf, ha-testplan.tex
[jira] [Updated] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2303:
------------------------------
      Fix Version/s: (was: 0.23.2)
                     (was: 0.24.0)
   Target Version/s: 0.23.3
             Status: Patch Available (was: Open)

jsvc needs to be recompilable
-----------------------------
                Key: HDFS-2303
                URL: https://issues.apache.org/jira/browse/HDFS-2303
[jira] [Updated] (HDFS-3066) cap space usage of default log4j rolling policy (hdfs specific changes)
[ https://issues.apache.org/jira/browse/HDFS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3066:
------------------------------
   Target Version/s: 0.23.3

cap space usage of default log4j rolling policy (hdfs specific changes)
-----------------------------------------------------------------------
                Key: HDFS-3066
                URL: https://issues.apache.org/jira/browse/HDFS-3066
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: scripts
           Reporter: Patrick Hunt
           Assignee: Patrick Hunt
        Attachments: HDFS-3066.patch

see HADOOP-8149 for background on this.
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-3004:
---------------------------------------
    Attachment: (was: HDFS-3004.007.patch)

Implement Recovery Mode
-----------------------
                Key: HDFS-3004
                URL: https://issues.apache.org/jira/browse/HDFS-3004
            Project: Hadoop HDFS
         Issue Type: New Feature
         Components: tools
           Reporter: Colin Patrick McCabe
           Assignee: Colin Patrick McCabe
        Attachments: HDFS-3004.008.patch, HDFS-3004__namenode_recovery_tool.txt

When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime.

Recovery mode is initiated by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down.

Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts.

I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is.
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-3004:
---------------------------------------
    Attachment: HDFS-3004.008.patch

regenerate patch with diff --no-prefix (d'oh!)

Implement Recovery Mode
-----------------------
                Key: HDFS-3004
                URL: https://issues.apache.org/jira/browse/HDFS-3004
[jira] [Created] (HDFS-3073) NetworkTopology::getLeaf should check for invalid topologies
NetworkTopology::getLeaf should check for invalid topologies
------------------------------------------------------------
                Key: HDFS-3073
                URL: https://issues.apache.org/jira/browse/HDFS-3073
            Project: Hadoop HDFS
         Issue Type: Bug
   Affects Versions: 1.0.0
           Reporter: Colin Patrick McCabe
           Assignee: Colin Patrick McCabe

Currently, NetworkTopology::getLeaf doesn't do much validation on the NetworkTopology object itself. As a result, we sometimes get a ClassCastException when the topology is invalid. We should have a less confusing exception message for this case.
[jira] [Commented] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226314#comment-13226314 ]

Hadoop QA commented on HDFS-2303:
---------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12517689/HDFS-2303-5-trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1977//console

This message is automatically generated.

jsvc needs to be recompilable
-----------------------------
                Key: HDFS-2303
                URL: https://issues.apache.org/jira/browse/HDFS-2303
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226378#comment-13226378 ]

Eli Collins commented on HDFS-3004:
-----------------------------------
Your comments above make sense, thanks for the explanation. Comments on latest patch:
- HDFS-2709 (hash 110b6d0) introduced EditLogInputException and used to have places where it was caught explicitly; those places now just catch IOE, so given that we no longer throw it either, you can remove the class entirely
- In logTruncateMessage we should log something like "stopping edit log load at position X" instead of saying we're truncating it, because we're not actually truncating the log (from the user's perspective)
- Isn't "always select the first choice" effectively "always skip"? Better to call it that, as users might think it means "use the previously selected option for all future choices" (eg if I chose skip, then chose try to fix, then "always choose 1st", I might not have meant to always skip)
- The conditional on answer is probably more readable as a switch; it wasn't clear that the else clause was always "a" and therefore why we call recovery.setAlwaysChooseFirst()
- What is the "TODO: attempt to resynchronize stream here" for?
- Should use s.equals(answer) instead of answer == s etc, since if for some reason RecoveryContext doesn't return the exact object it was passed in the future, this would break
- RC#ask should log as info instead of error for the prompt and the automatically-choosing log
- RC#ask javadoc needs to be updated to match the method. Also, "his choice" - "their choice" =P
- RecoveryContext could use a high-level javadoc with a sentence or two, since the name is pretty generic and the use is very specific
- Can s/LOG.error/LOG.fatal/ in NN.java for the recovery-failed case
- NN#printUsage has two IMPORT lines
- ++i still used in a couple of files
- brackets on their own line still need fixing, eg "} else if {"
- Why does TestRecoverTruncatedEditLog make the same dir 21 times? Maybe you meant to append i to the path? The test should corrupt an operation that mutates the namespace (vs the last op, which I believe is an op to finalize the log segment) so you can test that that edit is not present when you reload (eg corrupt the edit to mkdir /foo, then assert /foo does not exist in the namespace)

Implement Recovery Mode
-----------------------
                Key: HDFS-3004
                URL: https://issues.apache.org/jira/browse/HDFS-3004
[jira] [Updated] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2303: -- Attachment: HDFS-2303-5-modcommon-trunk.patch Same patch modulo the commented line in hadoop-env.sh in hadoop-common so jenkins will run.
[jira] [Commented] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226395#comment-13226395 ] Eli Collins commented on HDFS-2303: --- +1 Latest patch (HDFS-2303-5-trunk.patch) looks great. Thanks Mingjie!
[jira] [Commented] (HDFS-3066) cap space usage of default log4j rolling policy (hdfs specific changes)
[ https://issues.apache.org/jira/browse/HDFS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226399#comment-13226399 ] Eli Collins commented on HDFS-3066: --- +1 pending jenkins cap space usage of default log4j rolling policy (hdfs specific changes) --- Key: HDFS-3066 URL: https://issues.apache.org/jira/browse/HDFS-3066 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Patrick Hunt Assignee: Patrick Hunt Attachments: HDFS-3066.patch see HADOOP-8149 for background on this.
[jira] [Commented] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226402#comment-13226402 ] Hadoop QA commented on HDFS-2303: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517770/HDFS-2303-5-modcommon-trunk.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1978//console
This message is automatically generated.
[jira] [Updated] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2303: -- Attachment: HDFS-2303-5-modcommon-trunk.patch Right patch this time.
[jira] [Commented] (HDFS-3073) NetworkTopology::getLeaf should check for invalid topologies
[ https://issues.apache.org/jira/browse/HDFS-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226445#comment-13226445 ] Aaron T. Myers commented on HDFS-3073: -- Seems like this JIRA should perhaps be moved to Common? NetworkTopology::getLeaf should check for invalid topologies Key: HDFS-3073 URL: https://issues.apache.org/jira/browse/HDFS-3073 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Currently, NetworkTopology::getLeaf doesn't do much validation on the NetworkTopology object itself. This results in us getting a ClassCastException sometimes when the topology is invalid. We should have a less confusing exception message for this case.
[jira] [Commented] (HDFS-3066) cap space usage of default log4j rolling policy (hdfs specific changes)
[ https://issues.apache.org/jira/browse/HDFS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226479#comment-13226479 ] Hadoop QA commented on HDFS-3066: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517634/HDFS-3066.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1979//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1979//console
This message is automatically generated.
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226490#comment-13226490 ] Colin Patrick McCabe commented on HDFS-3004: bq. Isn't "always select the first choice" effectively "always skip"? Better to call it that as users might think it means use the previously selected option for all future choices (eg if I chose skip then chose try to fix then always choose 1st I might not have meant to always skip).
The first choice isn't always skip; sometimes it's truncate. Agree with the rest of the points.
[jira] [Updated] (HDFS-3073) NetworkTopology::getLeaf should check for invalid topologies
[ https://issues.apache.org/jira/browse/HDFS-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3073: --- Attachment: HDFS-3073.002.patch
* refactor getLeaf a bit
* the exception getLeaf() throws for an invalid Node now includes the offending Node as a string, and a helpful error message.
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226530#comment-13226530 ] Colin Patrick McCabe commented on HDFS-3004: {quote}cannot switch on a value of type String for source level below 1.7{quote} Nice idea, but it looks like it's going to have to be if statements.
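The two comments above (use s.equals(answer) rather than answer == s, and no switch-on-String before Java 7) can be illustrated together. This is a minimal sketch, not code from the HDFS-3004 patch; the option strings and method names here are invented for illustration:

```java
// Sketch: dispatching on a prompt answer without Java 7's switch-on-String.
// equals() is used instead of ==, since == compares object identity and an
// answer read from input will generally not be the same object as a literal.
public class PromptDispatch {
    static String dispatch(String answer) {
        // Constant-first equals() also avoids an NPE when answer is null.
        if ("skip".equals(answer)) {
            return "skipping this edit";
        } else if ("truncate".equals(answer)) {
            return "truncating the edit log here";
        } else {
            // Any other input: treat as "always choose the first option".
            return "always choosing the first option";
        }
    }
}
```

With a source level of 1.7 or later this chain could become a switch on the String, but for older source levels the if/else form above is the available option.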
[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes
[ https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3070: -- Target Version/s: 0.23.3 hdfs balancer doesn't balance blocks between datanodes -- Key: HDFS-3070 URL: https://issues.apache.org/jira/browse/HDFS-3070 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 0.24.0 Reporter: Stephen Chu Attachments: unbalanced_nodes.png, unbalanced_nodes_inservice.png
I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, both have over 3% disk usage. Attached is a screenshot of the Live Nodes web UI. On styx01, I run the _hdfs balancer_ command with threshold 1% and don't see the blocks being balanced across all 4 datanodes (all blocks on styx01 and styx02 stay put). HA is currently enabled.
[schu@styx01 ~]$ hdfs haadmin -getServiceState nn1
active
[schu@styx01 ~]$ hdfs balancer -threshold 1
12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0
12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []
12/03/08 10:10:32 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
Balancing took 95.0 milliseconds
[schu@styx01 ~]$
I believe with a threshold of 1% the balancer should trigger blocks being moved across DataNodes, right? I am curious about the namenodes = [] from the above output.
[schu@styx01 ~]$ hadoop version
Hadoop 0.24.0-SNAPSHOT
Subversion git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common -r f6a577d697bbcd04ffbc568167c97b79479ff319
Compiled by schu on Thu Mar 8 15:32:50 PST 2012
From source with checksum ec971a6e7316f7fbf471b617905856b8
From http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html: The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if, for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also, for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.
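The balance criterion quoted from the Balancer docs above can be written as a short check. This is a sketch of the documented condition only, not the Balancer's actual implementation; the method and parameter names are illustrative:

```java
// Sketch of the documented balance criterion: a datanode is balanced when
// |nodeUtilization - clusterUtilization| <= threshold, with utilizations
// and the threshold all expressed as percentages.
public class BalanceCheck {
    static boolean isBalanced(long nodeUsed, long nodeCapacity,
                              long clusterUsed, long clusterCapacity,
                              double thresholdPercent) {
        double nodeUtil = 100.0 * nodeUsed / nodeCapacity;       // node %
        double clusterUtil = 100.0 * clusterUsed / clusterCapacity; // cluster %
        return Math.abs(nodeUtil - clusterUtil) <= thresholdPercent;
    }
}
```

So with the 1% threshold used in the report, a node at 35% utilization in a cluster at 30% would be out of balance, while the default 10% threshold would consider it balanced.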
[jira] [Updated] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2303: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2303: -- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226583#comment-13226583 ] Eli Collins commented on HDFS-3004: --- HDFS-3004.008.patch has a bunch of other stuff in it, probably not the patch you intended.
[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes
[ https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226590#comment-13226590 ] Tsz Wo (Nicholas), SZE commented on HDFS-3070: -- {quote}12/03/08 10:10:32 INFO balancer.Balancer: namenodes = []{quote} The namenode list is empty. You have to set dfs.namenode.servicerpc-address.
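A sketch of the kind of hdfs-site.xml entry being referred to. The host and port below are placeholders, and in an HA or federated setup the per-nameservice form of the key (with nameservice and namenode-id suffixes) would be needed instead:

```xml
<!-- Placeholder values; adjust the host and port for your cluster. -->
<property>
  <name>dfs.namenode.servicerpc-address</name>
  <value>namenode.example.com:8022</value>
</property>
```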
[jira] [Commented] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes
[ https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226599#comment-13226599 ] Eli Collins commented on HDFS-3070: --- Stephen, what are dfs.namenode.rpc-address and servicerpc-address set to in the configs? I suspect at least the first is set, so it might be a bug in the method the balancer uses to determine the namenodes (eg it doesn't work for a federated or HA conf).
[jira] [Commented] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226624#comment-13226624 ] Eli Collins commented on HDFS-3050: --- Hey Colin, agree that #4 is a good approach; we can punt on #2 for now. Overall your patch looks great, minor stuff:
- The PermissionStatus, DelegationKey, Block, and DelegationTokenIdentifier diffs are just unused imports
- addSaxString and OfflineEditsViewer#go could each use a small javadoc
- Brackets go on the same line as clauses (you can update your editor to do this: start with the Java conventions and update to no tabs and two-space indent)
Thanks, Eli
refactor OEV to share more code with the NameNode - Key: HDFS-3050 URL: https://issues.apache.org/jira/browse/HDFS-3050 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3050.004.patch Currently, OEV (the offline edits viewer) re-implements all of the opcode parsing logic found in the NameNode. This duplicated code creates a maintenance burden for us. OEV should be refactored to simply use the normal EditLog parsing code, rather than rolling its own.
[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default
[ https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226646#comment-13226646 ] Eli Collins commented on HDFS-3044: --- - The new boolean destructive is unused - FsckOperation is kind of overkill, probably simpler to have two bools since these are independent operations: -- salvageCorruptFiles, whehter to copy whatever blocks are left to lost+found -- deleteCorruptFiles, whether to delete corrupt files - Let's rename lostFoundMove to something like copyBlocksToLostFound to reflect what this method actually does, ditto update the warning since we didn't really copy the file (perhaps coppied accessible blocks for file X) - Let's rename testFsckMove to testFsckMoveAndDelete and add a testFsckMove that tests that fsck move is not destructive - Per the last bullet in the description would be good to at least add a log at INFO level indicating the # of datanodes that have checked in so an admin can see if the number looks off (and doesn't do a destructive operation before waiting for DNs to check in) fsck move should be non-destructive by default -- Key: HDFS-3044 URL: https://issues.apache.org/jira/browse/HDFS-3044 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Eli Collins Assignee: Colin Patrick McCabe Attachments: HDFS-3044.001.patch The fsck move behavior in the code and originally articulated in HADOOP-101 is: {quote}Current failure modes for DFS involve blocks that are completely missing. The only way to fix them would be to recover chains of blocks and put them into lost+found{quote} A directory is created with the file name, the blocks that are accessible are created as individual files in this directory, then the original file is removed. I suspect the rationale for this behavior was that you can't use files that are missing locations, and copying the block as files at least makes part of the files accessible. 
However, this behavior can also result in permanent data loss. E.g.:
- Some datanodes don't come up and check in on cluster startup (e.g. due to HW issues); files whose blocks have all their replicas on this set of datanodes are marked corrupt
- Admin does fsck move, which deletes the corrupt files and saves whatever blocks were available
- The HW issues with the datanodes are resolved; they are started and join the cluster. The NN tells them to delete their blocks for the corrupt files, since the files were deleted.

I think we should:
- Make fsck move non-destructive by default (e.g. just do a move into lost+found)
- Make the destructive behavior optional (e.g. --destructive, so admins think about what they're doing)
- Provide better sanity checks and warnings, e.g. if you're running fsck and not all the slaves have checked in (if using dfs.hosts), then fsck should print a warning indicating this, which an admin should have to override if they want to do something destructive

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
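Eli's two-boolean suggestion from the review comment above can be sketched as follows. This is a minimal illustrative model, not the actual NamenodeFsck code; only the flag names (salvageCorruptFiles, deleteCorruptFiles) and the non-destructive default come from the discussion, everything else is hypothetical:

```java
// Minimal illustrative model of the two independent fsck options, not the
// actual NamenodeFsck implementation.
class FsckMoveSketch {
    private final boolean salvageCorruptFiles; // copy remaining blocks to lost+found
    private final boolean deleteCorruptFiles;  // destructive; off by default

    FsckMoveSketch(boolean salvageCorruptFiles, boolean deleteCorruptFiles) {
        this.salvageCorruptFiles = salvageCorruptFiles;
        this.deleteCorruptFiles = deleteCorruptFiles;
    }

    /** Human-readable plan for one corrupt file; the two options compose freely. */
    String planFor(String path) {
        StringBuilder sb = new StringBuilder();
        if (salvageCorruptFiles) {
            sb.append("copy accessible blocks of ").append(path)
              .append(" to /lost+found; ");
        }
        if (deleteCorruptFiles) {
            sb.append("delete ").append(path);
        } else {
            sb.append("keep ").append(path);   // the safe, default behavior
        }
        return sb.toString();
    }
}
```

With two independent booleans, "salvage and keep" becomes the default, and deletion requires an explicit opt-in, matching the "make the destructive behavior optional" bullet above.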
[jira] [Commented] (HDFS-3045) fsck move should bail on a file if it can't create a block file
[ https://issues.apache.org/jira/browse/HDFS-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226655#comment-13226655 ] Eli Collins commented on HDFS-3045:
---
Looks good. This doesn't unwind the directory creation in lost+found, but on 2nd thought I think that's better (might as well salvage what blocks we can). Nits:
- Better if the IOE references the file it failed to create, e.g.
{code}
throw new IOException(errmsg + ": could not create " + target + "/" + chain);
{code}
- Not your change, but let's add brackets to the if where the new IOE is thrown

fsck move should bail on a file if it can't create a block file
---
Key: HDFS-3045
URL: https://issues.apache.org/jira/browse/HDFS-3045
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
Attachments: HDFS-3045.001.patch

NamenodeFsck#lostFoundMove, when it fails to create a file for a block, continues on to the next block (there's a comment "perhaps we should bail out here..." but it doesn't). It should instead fail the move for that particular file (unwind the directory creation and not delete the original file). Otherwise a transient failure speaking to the NN means this block is lost forever.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
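The bail-out behavior this issue asks for might look roughly like the sketch below. The method name copyBlocksToLostFound follows the rename suggested in HDFS-3044, and the thrown message follows Eli's nit; the BlockSink callback and everything else is a hypothetical stand-in for the real NamenodeFsck internals:

```java
import java.io.IOException;

// Sketch of the bail-out behavior this issue asks for; not the real
// NamenodeFsck code. BlockSink and the method shapes are hypothetical.
class LostFoundSketch {
    interface BlockSink {
        // Hypothetical callback: create the file for one salvaged block.
        boolean createBlockFile(String target, int chain) throws IOException;
    }

    /**
     * Copies the accessible blocks of one file into lost+found. Instead of
     * silently continuing past a failed block, it throws, so the caller can
     * unwind and, crucially, skip deleting the original file.
     */
    static int copyBlocksToLostFound(String target, int numBlocks, BlockSink sink)
            throws IOException {
        int copied = 0;
        for (int chain = 0; chain < numBlocks; chain++) {
            if (!sink.createBlockFile(target, chain)) {
                // Bail out: a transient failure must not become permanent loss.
                throw new IOException(
                    "fsck: could not create " + target + "/" + chain);
            }
            copied++;
        }
        return copied;
    }
}
```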
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: HDFS-3004.009.patch diff against correct change Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.009.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
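The interactive behavior described above ('-f' at startup takes the first option everywhere; typing 'a' at a prompt does the same for all later prompts) can be modeled as follows. Class and method names are hypothetical; this is not the actual recovery tool:

```java
import java.util.List;

// Illustrative model of the Recovery Mode prompting described above.
class RecoveryPromptSketch {
    private boolean alwaysFirst;   // set by '-f' at startup, or 'a' at a prompt

    RecoveryPromptSketch(boolean forceFirstOption) {
        this.alwaysFirst = forceFirstOption;
    }

    /**
     * Resolves one inconsistency. 'answer' is what the operator typed
     * ("1"-based option number, or "a"); returns the chosen option's index.
     */
    int resolve(List<String> options, String answer) {
        if (alwaysFirst) {
            return 0;                  // non-interactive: always take option 1
        }
        if ("a".equals(answer)) {
            alwaysFirst = true;        // take option 1 from now on
            return 0;
        }
        int choice = Integer.parseInt(answer) - 1;
        if (choice < 0 || choice >= options.size()) {
            throw new IllegalArgumentException("no such option: " + answer);
        }
        return choice;
    }
}
```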
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: (was: HDFS-3004.008.patch) Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.009.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226701#comment-13226701 ] Suresh Srinivas commented on HDFS-3056: --- +1 for the patch. Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out to a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
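The refactoring under review, separating the logging-only methods out of FSDatasetInterface so DataBlockScanner depends on a narrow interface, can be sketched as below. The name RollingLogs matches the interface named in this issue's commit message; the method shapes here are illustrative guesses, not the committed API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the refactoring: DataBlockScanner would depend on this narrow
// logging interface instead of all of FSDatasetInterface.
interface RollingLogsSketch {
    void append(String record);      // write one scan-log record
    void roll();                     // start a new log segment
    List<String> currentSegment();   // records in the active segment
}

// Trivial in-memory implementation, useful for tests or simulated datasets.
class InMemoryRollingLogs implements RollingLogsSketch {
    private List<String> current = new ArrayList<>();

    @Override public void append(String record) { current.add(record); }
    @Override public void roll() { current = new ArrayList<>(); }
    @Override public List<String> currentSegment() { return current; }
}
```

The payoff of the narrow interface is that a simulated dataset (like SimulatedFSDataset in the commit's file list) only has to implement the logging surface the scanner actually uses.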
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: HDFS-3004.010.patch fix diff again, sigh Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: (was: HDFS-3004.009.patch) Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226713#comment-13226713 ] Hudson commented on HDFS-3056: -- Integrated in Hadoop-Hdfs-trunk-Commit #1938 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1938/]) HDFS-3056: add the new file for the previous commit. (Revision 1299144) HDFS-3056. Add a new interface RollingLogs for DataBlockScanner logging. (Revision 1299139) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299144 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/RollingLogs.java szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299139 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo 
(Nicholas), SZE Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out to a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3056: - Resolution: Fixed Fix Version/s: 0.23.3 0.24.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Suresh. I have committed this to trunk and 0.23. Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out to a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226719#comment-13226719 ] Hudson commented on HDFS-3056: -- Integrated in Hadoop-Common-trunk-Commit #1863 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1863/]) HDFS-3056: add the new file for the previous commit. (Revision 1299144) HDFS-3056. Add a new interface RollingLogs for DataBlockScanner logging. (Revision 1299139) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299144 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/RollingLogs.java szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299139 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo 
(Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out to a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3050: --- Attachment: HDFS-3050.006.patch address Eli's suggestions refactor OEV to share more code with the NameNode - Key: HDFS-3050 URL: https://issues.apache.org/jira/browse/HDFS-3050 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3050.006.patch Currently, OEV (the offline edits viewer) re-implements all of the opcode parsing logic found in the NameNode. This duplicated code creates a maintenance burden for us. OEV should be refactored to simply use the normal EditLog parsing code, rather than rolling its own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
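The direction described, OEV delegating to the NameNode's edit-log parsing rather than rolling its own, might look roughly like this. All names below are hypothetical stand-ins, not the patch's actual classes:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of the refactoring direction: a single edit-log reading loop that
// both the NameNode and the offline edits viewer (OEV) share, with OEV
// supplying only a rendering callback.
class SharedEditLogReaderSketch {
    static final class Op {                 // stand-in for a parsed edit op
        final String name;
        final long txid;
        Op(String name, long txid) { this.name = name; this.txid = txid; }
    }

    // The one parsing loop to maintain: NameNode applies ops, OEV renders them.
    static void readOps(List<Op> log, Consumer<Op> visitor) {
        for (Op op : log) {
            visitor.accept(op);
        }
    }

    // OEV reduces to a formatter over the shared reader's output.
    static String render(List<Op> log) {
        StringBuilder sb = new StringBuilder();
        readOps(log, op -> sb.append(op.txid).append(' ')
                             .append(op.name).append('\n'));
        return sb.toString();
    }
}
```

With one parsing loop, a new opcode only has to be taught to the shared reader, and the viewer picks it up for free.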
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3050: --- Attachment: (was: HDFS-3050.004.patch) refactor OEV to share more code with the NameNode - Key: HDFS-3050 URL: https://issues.apache.org/jira/browse/HDFS-3050 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3050.006.patch Currently, OEV (the offline edits viewer) re-implements all of the opcode parsing logic found in the NameNode. This duplicated code creates a maintenance burden for us. OEV should be refactored to simply use the normal EditLog parsing code, rather than rolling its own. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226721#comment-13226721 ] Hudson commented on HDFS-3056: -- Integrated in Hadoop-Hdfs-0.23-Commit #660 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/660/]) Merge r1299139 and r1299144 from trunk for HDFS-3056. (Revision 1299146) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299146 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/RollingLogs.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop 
HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out to a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226722#comment-13226722 ] Hudson commented on HDFS-3056: -- Integrated in Hadoop-Common-0.23-Commit #669 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/669/]) Merge r1299139 and r1299144 from trunk for HDFS-3056. (Revision 1299146) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299146 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/RollingLogs.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop 
HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out to a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3067) NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
[ https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226724#comment-13226724 ] Aaron T. Myers commented on HDFS-3067:
--
Looks pretty good to me, Hank. Just a few small nits. +1 once these are addressed.
# A few lines are over 80 chars.
# Indent 4 spaces on lines that go over 80 chars, instead of 2.
# Rather than use the sawException boolean, add an explicit call to fail() after the dis.read(), and call GenericTestUtils.assertExceptionContains(...) in the catch clause.
# Put some white space around = and in the for loop.

NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
---
Key: HDFS-3067
URL: https://issues.apache.org/jira/browse/HDFS-3067
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.24.0
Reporter: Henry Robinson
Assignee: Henry Robinson
Attachments: HDFS-3607.patch

With a singly-replicated block that's corrupted, issuing a read against it twice in succession (e.g. if ChecksumException is caught by the client) gives a NullPointerException.
Here's the body of a test that reproduces the problem:
{code}
final short REPL_FACTOR = 1;
final long FILE_LENGTH = 512L;
cluster.waitActive();
FileSystem fs = cluster.getFileSystem();
Path path = new Path("/corrupted");
DFSTestUtil.createFile(fs, path, FILE_LENGTH, REPL_FACTOR, 12345L);
DFSTestUtil.waitReplication(fs, path, REPL_FACTOR);
ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, path);
int blockFilesCorrupted = cluster.corruptBlockOnDataNodes(block);
assertEquals("All replicas not corrupted", REPL_FACTOR, blockFilesCorrupted);
InetSocketAddress nnAddr = new InetSocketAddress("localhost", cluster.getNameNodePort());
DFSClient client = new DFSClient(nnAddr, conf);
DFSInputStream dis = client.open(path.toString());
byte[] arr = new byte[(int)FILE_LENGTH];
boolean sawException = false;
try {
  dis.read(arr, 0, (int)FILE_LENGTH);
} catch (ChecksumException ex) {
  sawException = true;
}
assertTrue(sawException);
sawException = false;
try {
  dis.read(arr, 0, (int)FILE_LENGTH); // <-- NPE thrown here
} catch (ChecksumException ex) {
  sawException = true;
}
{code}
The stack:
{code}
java.lang.NullPointerException
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:492)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:545)
[snip test stack]
{code}
and the problem is that currentNode is null. It's left at null after the first read, which fails, and then is never refreshed because the condition in read that protects blockSeekTo is only triggered if the current position is outside the block's range. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
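The NPE described above comes from currentNode staying null after a failed read, while the re-seek condition only looks at the position. A toy model of the kind of guard that avoids it (names mirror DFSInputStream, but this is illustrative, not the actual client code or patch):

```java
// Toy model of the bug described above: after a read fails with a
// ChecksumException, currentNode is left null, and a repeated read at the
// same position dereferences it. The guard below also re-runs datanode
// selection when currentNode is null. Not the actual DFSInputStream code.
class ReadRetrySketch {
    String currentNode;            // chosen datanode; null after a failed read
    long pos = 0;
    final long blockEnd = 512;

    String blockSeekTo(long target) {  // stand-in for datanode selection
        return "datanode-1";
    }

    int readBuffer() {
        // The original condition was effectively "pos > blockEnd" only, so a
        // second read inside the same block skipped blockSeekTo and NPE'd.
        if (pos > blockEnd || currentNode == null) {
            currentNode = blockSeekTo(pos);
        }
        return currentNode.length();   // pretend to read via currentNode
    }
}
```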
[jira] [Commented] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226730#comment-13226730 ] Hudson commented on HDFS-3056: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1872 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1872/]) HDFS-3056: add the new file for the previous commit. (Revision 1299144) HDFS-3056. Add a new interface RollingLogs for DataBlockScanner logging. (Revision 1299139) Result = ABORTED szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299144 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/RollingLogs.java szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1299139 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz 
Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out to a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3056) Add an interface for DataBlockScanner logging
[ https://issues.apache.org/jira/browse/HDFS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226731#comment-13226731 ] Hudson commented on HDFS-3056: -- Integrated in Hadoop-Mapreduce-0.23-Commit #677 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/677/]) Merge r1299139 and r1299144 from trunk for HDFS-3056. (Revision 1299146) Result = ABORTED szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1299146 Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataBlockScanner.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/RollingLogs.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeBlockScanner.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
Add an interface for DataBlockScanner logging - Key: HDFS-3056 URL: https://issues.apache.org/jira/browse/HDFS-3056 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3056_20120306.patch, h3056_20120307.patch, h3056_20120307b.patch Some methods in the FSDatasetInterface are used only for logging in DataBlockScanner. These methods should be separated out into a new interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
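The refactoring described above can be sketched as follows. This is a hedged, simplified illustration: `RollingLogsSketch`, `InMemoryRollingLogs`, and `RollingLogsDemo` are invented stand-ins, not the actual classes committed in RollingLogs.java.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// The logging-only operations are pulled out of the wide dataset
// interface into a small, narrow interface of their own.
interface RollingLogsSketch {
    void append(String line);   // record one log line
    Iterator<String> lines();   // replay previously recorded lines
}

// One possible implementation; a simulated dataset (as in
// SimulatedFSDataset) could supply its own.
class InMemoryRollingLogs implements RollingLogsSketch {
    private final List<String> recorded = new ArrayList<>();
    public void append(String line) { recorded.add(line); }
    public Iterator<String> lines() { return recorded.iterator(); }
}

public class RollingLogsDemo {
    // A scanner-style caller now depends only on the narrow logging
    // interface, not on the full dataset implementation.
    static void recordScan(RollingLogsSketch logs, long blockId) {
        logs.append("scanned blk_" + blockId);
    }

    public static void main(String[] args) {
        RollingLogsSketch logs = new InMemoryRollingLogs();
        recordScan(logs, 7L);
        System.out.println(logs.lines().next()); // prints "scanned blk_7"
    }
}
```

The design benefit is the one the issue names: test doubles can implement the small interface without touching the rest of FSDatasetInterface.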
[jira] [Updated] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-1512: -- Attachment: HDFS-1512.patch Amin, I just re-based your patch on trunk. Let's trigger Jenkins. BlockSender calls deprecated method getReplica -- Key: HDFS-1512 URL: https://issues.apache.org/jira/browse/HDFS-1512 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Eli Collins Assignee: Amin Bandeali Labels: newbie Attachments: HDFS-1512.patch, HDFS-1512.patch HDFS-680 deprecated FSDatasetInterface#getReplica; however, it is still used by BlockSender, which still maintains a Replica member.
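The pattern at issue can be sketched as a toy example. All names here (`ReplicaSketch`, `DatasetSketch`, the 512-byte length) are simplified stand-ins for illustration, not the real Hadoop signatures; the point is only the shape of the migration away from a deprecated wide accessor.

```java
// Stand-in for the Replica type the deprecated accessor leaks.
class ReplicaSketch {
    private final long visibleLength;
    ReplicaSketch(long visibleLength) { this.visibleLength = visibleLength; }
    long getVisibleLength() { return visibleLength; }
}

interface DatasetSketch {
    /** @deprecated exposes internal state; prefer the narrow accessor below. */
    @Deprecated
    ReplicaSketch getReplica(long blockId);

    // The replacement: callers ask for just the value they need instead of
    // holding a Replica member the way BlockSender currently does.
    long getReplicaVisibleLength(long blockId);
}

public class DeprecationDemo implements DatasetSketch {
    public ReplicaSketch getReplica(long blockId) { return new ReplicaSketch(512L); }
    public long getReplicaVisibleLength(long blockId) {
        return getReplica(blockId).getVisibleLength(); // internal use remains
    }

    public static void main(String[] args) {
        System.out.println(new DeprecationDemo().getReplicaVisibleLength(1L)); // prints 512
    }
}
```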
[jira] [Updated] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-1512: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-1512: -- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226745#comment-13226745 ] Amin Bandeali commented on HDFS-1512: - How do I trigger?
[jira] [Commented] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226744#comment-13226744 ] Uma Maheswara Rao G commented on HDFS-1512: --- Comments for the patch: the idea of HDFS-2862 is that DataNode classes should not invoke APIs directly on FSDataset. Your patch casts directly to FSDataset and calls its APIs, which breaks the HDFS-2862 contract. So we may need to add the method to FSDatasetInterface instead. @Nicholas, since you are the author of HDFS-2862, could you please comment on this point?
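The contract point above can be shown with a minimal sketch, assuming invented stand-in names (`FSDatasetIfaceSketch`, `lengthOf`, the concrete classes are all hypothetical, not the actual Hadoop types): calling through the interface keeps alternative implementations such as a simulated dataset working, while a downcast to the concrete class fails for them.

```java
interface FSDatasetIfaceSketch {
    long lengthOf(String blockId); // hypothetical method promoted to the interface
}

class RealDataset implements FSDatasetIfaceSketch {
    public long lengthOf(String blockId) { return 1024L; }
}

// A test double in the spirit of SimulatedFSDataset.
class SimulatedDataset implements FSDatasetIfaceSketch {
    public long lengthOf(String blockId) { return 0L; }
}

public class ContractDemo {
    // Anti-pattern: assumes the concrete class; throws for any other implementation.
    static long viaDowncast(FSDatasetIfaceSketch data, String blockId) {
        return ((RealDataset) data).lengthOf(blockId); // ClassCastException for fakes
    }

    // Preferred: the method lives on the interface, so every implementation works.
    static long viaInterface(FSDatasetIfaceSketch data, String blockId) {
        return data.lengthOf(blockId);
    }

    public static void main(String[] args) {
        System.out.println(viaInterface(new SimulatedDataset(), "blk_1")); // prints 0
        try {
            viaDowncast(new SimulatedDataset(), "blk_1");
        } catch (ClassCastException e) {
            System.out.println("downcast failed as expected");
        }
    }
}
```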
[jira] [Commented] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226746#comment-13226746 ] Uma Maheswara Rao G commented on HDFS-1512: --- @Amin, I have already resubmitted your patch; Jenkins will pick it up automatically. Also, could you please avoid pasting unnecessary content in comments, e.g. the email DISCLAIMER? :-) Thanks, Uma
[jira] [Commented] (HDFS-3063) NameNode should validate all coming file path
[ https://issues.apache.org/jira/browse/HDFS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226748#comment-13226748 ] Denny Ye commented on HDFS-3063: Thank you, Daryn. We have the same concerns about the maintainability of NameNode. It would be better to encapsulate the validation for each interface method at the NameNode in a common method. Another aspect of this issue is that similar validation should apply to all incoming methods. NameNode should validate all coming file path - Key: HDFS-3063 URL: https://issues.apache.org/jira/browse/HDFS-3063 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.205.0 Reporter: Denny Ye Priority: Minor Labels: namenode Attachments: HDFS-3063.patch NameNode provides an RPC service not only for the DFS client but also for user-defined programs. A common case we often encounter is that a user passes a file path prefixed with the HDFS protocol (hdfs://{namenode}:{port}/{folder}/{file}). NameNode cannot map node metadata to this path and always throws an NPE. In the user client, we only see the NullPointerException, with no indication of the step at which it occurs. NameNode should therefore validate that every incoming file path has the expected format. One exception I met:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:334)
at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:329)
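The kind of up-front check the issue asks for could look like the sketch below. This is illustrative only: `isValidNameNodePath` is a hypothetical helper, not an actual NameNode method, and it simply rejects scheme-qualified or relative paths before any inode resolution is attempted.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathValidationSketch {
    /** Reject scheme-qualified paths (e.g. hdfs://host:port/...) and
     *  non-absolute paths; the NameNode expects a bare absolute path. */
    static boolean isValidNameNodePath(String path) {
        if (path == null || path.isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(path);
            // A scheme or authority means the client forgot to strip hdfs://...
            if (uri.getScheme() != null || uri.getAuthority() != null) {
                return false;
            }
        } catch (URISyntaxException e) {
            return false; // malformed input is rejected rather than passed on
        }
        return path.startsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(isValidNameNodePath("/user/denny/data.txt"));               // prints true
        System.out.println(isValidNameNodePath("hdfs://nn:8020/user/denny/data.txt")); // prints false
    }
}
```

Rejecting such paths with a descriptive error at the RPC boundary would replace the bare NullPointerException from INode.getPathComponents with a message the client can act on.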
[jira] [Commented] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226778#comment-13226778 ] Hadoop QA commented on HDFS-1512: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517835/HDFS-1512.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes org.apache.hadoop.hdfs.TestSmallBlock org.apache.hadoop.hdfs.TestDFSStartupVersions org.apache.hadoop.hdfs.TestDFSShellGenericOptions org.apache.hadoop.hdfs.TestModTime org.apache.hadoop.hdfs.TestPread
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1983//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1983//console This message is automatically generated.
[jira] [Commented] (HDFS-1512) BlockSender calls deprecated method getReplica
[ https://issues.apache.org/jira/browse/HDFS-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226787#comment-13226787 ] Tsz Wo (Nicholas), SZE commented on HDFS-1512: -- @Nicholas, since you are the Author for HDFS-2862, could you please comment on this point?
- Please don't cast it to FSDataset; change the interface if necessary.
- The use of getReplica(..) in BlockSender cannot easily be removed. It is not a public API, so we could have removed it directly. This simple patch won't work, as indicated by the unit test results; I think it needs a bigger change to the code.