[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-10-01 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-3597:
--

Component/s: name-node

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
  Labels: upgrade
 Fix For: 0.23.3, 2.0.2-alpha

 Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597-4.txt, 
 hdfs-3597.txt


 When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up:
 {code}
 2012-06-16 09:52:33,812 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
 java.io.IOException: Inconsistent checkpoint fields.
 LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = BP-1792677198-172.29.121.67-1339813967723.
 Expecting respectively: -19; 64415959; 0; ; .
     at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120)
     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454)
     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
     at java.lang.Thread.run(Thread.java:662)
 {code}
 The error check we're hitting came from HDFS-1073, and it's intended to 
 verify that we're connecting to the correct NN.  But the check is too strict: 
 it treats a different metadata version (LV) the same as a different 
 clusterID.
 I believe the check in {{doCheckpoint}} simply needs to explicitly check for 
 and handle the upgrade case.
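
 For illustration only, here is a minimal sketch of the distinction described 
 above. It is not the attached patch and does not use the real Hadoop classes: 
 {{CheckpointCompatSketch}}, {{StorageFields}}, {{Verdict}}, {{compare}} and 
 {{validate}} are hypothetical names, and the rule that an empty clusterId and 
 blockpoolId mean "pre-federation 2NN storage" is an assumption read off the 
 "Expecting respectively" line in the log. The idea is to decide first whether 
 the NN is the same cluster, and only then whether the version differs.

```java
import java.io.IOException;

// Hypothetical sketch; none of these types exist in Hadoop under these names.
final class CheckpointCompatSketch {

    /** Stand-in for the five fields printed in the error message. */
    static final class StorageFields {
        final int layoutVersion;
        final int namespaceId;
        final long cTime;
        final String clusterId;
        final String blockpoolId;

        StorageFields(int layoutVersion, int namespaceId, long cTime,
                      String clusterId, String blockpoolId) {
            this.layoutVersion = layoutVersion;
            this.namespaceId = namespaceId;
            this.cTime = cTime;
            this.clusterId = clusterId;
            this.blockpoolId = blockpoolId;
        }
    }

    enum Verdict { SAME, UPGRADED, DIFFERENT_CLUSTER }

    /** Compares what the NN reports against what the 2NN has on disk. */
    static Verdict compare(StorageFields nn, StorageFields local) {
        // Assumption: storage written before federation has no cluster or
        // block pool ID, so only the namespace ID can be compared.
        boolean localPreFederation =
            local.clusterId.isEmpty() && local.blockpoolId.isEmpty();
        boolean sameCluster = nn.namespaceId == local.namespaceId
            && (localPreFederation
                || (nn.clusterId.equals(local.clusterId)
                    && nn.blockpoolId.equals(local.blockpoolId)));
        if (!sameCluster) {
            return Verdict.DIFFERENT_CLUSTER;
        }
        // Same cluster: a version or cTime difference means an upgrade
        // happened, which is not an error in itself.
        boolean sameVersion = nn.layoutVersion == local.layoutVersion
            && nn.cTime == local.cTime;
        return sameVersion ? Verdict.SAME : Verdict.UPGRADED;
    }

    /** Rejects only a genuine cluster mismatch. */
    static void validate(StorageFields nn, StorageFields local)
            throws IOException {
        if (compare(nn, local) == Verdict.DIFFERENT_CLUSTER) {
            throw new IOException("Inconsistent checkpoint fields.");
        }
    }

    public static void main(String[] args) throws IOException {
        // Values taken from the log message in this issue.
        StorageFields nn = new StorageFields(-40, 64415959, 1339813974990L,
            "CID-07a82b97-8d04-4fdd-b3a1-f40650163245",
            "BP-1792677198-172.29.121.67-1339813967723");
        StorageFields local = new StorageFields(-19, 64415959, 0L, "", "");
        validate(nn, local);                    // does not throw
        System.out.println(compare(nn, local)); // prints UPGRADED
    }
}
```

 In this sketch a caller that sees {{UPGRADED}} would presumably discard its 
 local checkpoint state and fetch a fresh image from the NN; what the real 
 2NN should do in that case is for the patch review to settle.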

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-08-14 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3597:
--

Fix Version/s: 0.23.3

I've committed this to 0.23.

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Fix For: 0.23.3, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597-4.txt, 
 hdfs-3597.txt







[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3597:
--

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Fix For: 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597-4.txt, 
 hdfs-3597.txt







[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-18 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3597:


Attachment: hdfs-3597-4.txt

Attaching hdfs-3597-4.txt, addressing review feedback.

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597-4.txt, 
 hdfs-3597.txt







[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-09 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3597:
-

Target Version/s: 2.0.1-alpha
  Status: Patch Available  (was: Open)

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597.txt







[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-06 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3597:


Attachment: hdfs-3597-3.txt

Addressed review feedback and adjusted the test to exercise the upgrade 
scenario more accurately:
# we now corrupt all 2NN directories
# we now test upgrade from -39, which fixes some unexplained test failures
# cleaned up the test
# dropped the datanodes and switched from writefile to mkdir for quicker test startup

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597.txt







[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-05 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3597:


Attachment: hdfs-3597-2.txt

Attaching a new version of the patch that addresses review comments.  Please 
check the {{doCheckpoint}} logic specifically; I'm happy with this refactoring 
but am open to better suggestions.

Running a full set of tests locally to verify no breakage.

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Attachments: hdfs-3597-2.txt, hdfs-3597.txt







[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-03 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3597:


Attachment: hdfs-3597.txt

Attaching proposed fix, including positive and negative test cases showing that 
the check functions as expected.

 SNN can fail to start on upgrade
 

 Key: HDFS-3597
 URL: https://issues.apache.org/jira/browse/HDFS-3597
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Attachments: hdfs-3597.txt


