[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-3597:
--
    Component/s: name-node

SNN can fail to start on upgrade

Key: HDFS-3597
URL: https://issues.apache.org/jira/browse/HDFS-3597
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
Labels: upgrade
Fix For: 0.23.3, 2.0.2-alpha
Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597-4.txt, hdfs-3597.txt

When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up:

{code}
2012-06-16 09:52:33,812 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = BP-1792677198-172.29.121.67-1339813967723.
Expecting respectively: -19; 64415959; 0; ; .
    at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
    at java.lang.Thread.run(Thread.java:662)
{code}

The error check we're hitting came from HDFS-1073, and it's intended to verify that we're connecting to the correct NN. But the check is too strict: it treats a different metadata version the same as a different clusterID. I believe the check in {{doCheckpoint}} simply needs to explicitly check for and handle the upgrade case.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
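The distinction the description asks for could be sketched roughly as below. This is an illustration only, not the attached patch and not Hadoop's CheckpointSignature API: the class, enum, and method names are hypothetical, and the rule that an empty clusterId means "compare namespaceID only" is an assumption drawn from the field values in the log above. The point is that a layout version or cTime mismatch on the same cluster is reported as an upgrade rather than as a connection to the wrong NN.

```java
// Illustrative sketch only: all names here are hypothetical,
// not Hadoop's CheckpointSignature or SecondaryNameNode API.
final class CheckpointCheckSketch {

    /** The five storage fields printed in the "Inconsistent checkpoint fields" message. */
    static final class StorageFields {
        final int layoutVersion;
        final int namespaceId;
        final long cTime;
        final String clusterId;   // empty in the log above for the LV -19 storage
        final String blockpoolId; // empty in the log above for the LV -19 storage

        StorageFields(int layoutVersion, int namespaceId, long cTime,
                      String clusterId, String blockpoolId) {
            this.layoutVersion = layoutVersion;
            this.namespaceId = namespaceId;
            this.cTime = cTime;
            this.clusterId = clusterId;
            this.blockpoolId = blockpoolId;
        }
    }

    enum Outcome { MATCH, NEEDS_UPGRADE, WRONG_CLUSTER }

    /** Compares what the NN reports against what the 2NN has stored locally. */
    static Outcome compare(StorageFields nn, StorageFields local) {
        // Identity check: namespaceID always, plus clusterId/blockpoolId
        // when the local storage actually records them (assumption).
        boolean sameCluster = nn.namespaceId == local.namespaceId;
        if (!local.clusterId.isEmpty()) {
            sameCluster = sameCluster
                    && nn.clusterId.equals(local.clusterId)
                    && nn.blockpoolId.equals(local.blockpoolId);
        }
        if (!sameCluster) {
            return Outcome.WRONG_CLUSTER;
        }
        // Version check is separate: a mismatch here means the local
        // storage is stale, not that we reached the wrong NN.
        boolean sameVersion = nn.layoutVersion == local.layoutVersion
                && nn.cTime == local.cTime;
        return sameVersion ? Outcome.MATCH : Outcome.NEEDS_UPGRADE;
    }
}
```

With the values from the log (NN at LV -40, local storage at LV -19, both with namespaceID 64415959), this sketch reports NEEDS_UPGRADE, whereas a different namespaceID reports WRONG_CLUSTER.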
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-3597:
--
    Fix Version/s: 0.23.3

I've committed to 23.
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3597:
--
    Resolution: Fixed
    Fix Version/s: 2.2.0-alpha 3.0.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated HDFS-3597:
    Attachment: hdfs-3597-4.txt

Attaching hdfs-3597-4.txt, addressing review feedback.
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-3597:
-
    Target Version/s: 2.0.1-alpha
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated HDFS-3597:
    Attachment: hdfs-3597-3.txt

Address review feedback and adjust test to more accurately test the upgrade scenario.
# we now corrupt all 2NN directories
# we now test upgrade from -39 which fixes some unexplained test failures
# clean up the test
# drop the datanodes and use mkdir instead of writefile for quicker test startup.
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated HDFS-3597:
    Attachment: hdfs-3597-2.txt

Attaching new version of patch that addresses review comments. Please check the {{doCheckpoint}} logic specifically, I'm happy with this refactoring but am open to better suggestions. Running a full set of tests locally to verify no breakage.
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated HDFS-3597:
    Attachment: hdfs-3597.txt

Attaching proposed fix, including positive and negative test cases showing that the check functions as expected.