[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845289#comment-13845289
 ] 

Hudson commented on HDFS-5283:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5283 to section branch-2.3.0 (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshhold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Critical
 Fix For: 2.3.0

 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch


 This was observed in one of our environments:
 1. An MR job was running; it had created some temporary files and was 
 writing to them.
 2. A snapshot was taken.
 3. The job was killed and the temporary files were deleted.
 4. The NameNode was restarted.
 5. After the restart, the NameNode stayed in safemode, waiting for blocks.
 Analysis
 -
 1. The snapshot included the temporary files that were open at the time, 
 and the original files were deleted later.
 2. The under-construction (UC) block count was taken from leases only; UC 
 blocks that exist only inside snapshots were not considered.
 3. So the safemode threshold count was too high and the NN did not come out 
 of safemode.
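
The arithmetic behind the analysis above can be sketched as follows. This is a simplified model, not the NameNode code: the class, the method names, and the block counts are hypothetical, and it assumes that blocks still under construction are never reported as complete by the DataNodes.

```java
public class SafeModeThresholdSketch {
    // Hypothetical helper: number of blocks that must be reported
    // before safemode can be left. UC blocks are excluded from the total.
    public static long blockThreshold(long totalBlocks, long excludedUcBlocks, double thresholdPct) {
        long completeBlocks = totalBlocks - excludedUcBlocks;
        return (long) (completeBlocks * thresholdPct);
    }

    // Hypothetical helper: safemode can be left once enough blocks are reported.
    public static boolean canLeaveSafeMode(long reportedBlocks, long threshold) {
        return reportedBlocks >= threshold;
    }

    public static void main(String[] args) {
        long totalBlocks = 100;      // every block the NameNode knows about (illustrative)
        long ucViaLeases = 0;        // no leases remain: the temporary files were deleted
        long ucOnlyInSnapshots = 5;  // UC blocks that survive only inside the snapshot
        long reported = totalBlocks - ucOnlyInSnapshots; // assumed never reported as complete
        double pct = 1.0;            // illustrative threshold percentage

        // Counting UC blocks from leases only leaves the threshold too high.
        long leasesOnly = blockThreshold(totalBlocks, ucViaLeases, pct);
        // Also excluding snapshot-only UC blocks makes the threshold reachable.
        long withSnapshots = blockThreshold(totalBlocks, ucViaLeases + ucOnlyInSnapshots, pct);

        System.out.println("leases only:    " + canLeaveSafeMode(reported, leasesOnly));
        System.out.println("with snapshots: " + canLeaveSafeMode(reported, withSnapshots));
    }
}
```

With these illustrative numbers the first check can never succeed, which matches the reported symptom of the NN waiting for blocks indefinitely.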



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845385#comment-13845385
 ] 

Hudson commented on HDFS-5283:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5283 to section branch-2.3.0 (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845440#comment-13845440
 ] 

Hudson commented on HDFS-5283:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move HDFS-5283 to section branch-2.3.0 (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-12-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845024#comment-13845024
 ] 

Hudson commented on HDFS-5283:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4862 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4862/])
Move HDFS-5283 to section branch-2.3.0 (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797778#comment-13797778
 ] 

Hudson commented on HDFS-5283:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #365 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/365/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be 
counted in safemode threshhold.  Contributed by Vinay (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java


 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshhold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Fix For: 2.3.0

 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797878#comment-13797878
 ] 

Hudson commented on HDFS-5283:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1555 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1555/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be 
counted in safemode threshhold.  Contributed by Vinay (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797939#comment-13797939
 ] 

Hudson commented on HDFS-5283:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1581 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1581/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be 
counted in safemode threshhold.  Contributed by Vinay (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796661#comment-13796661
 ] 

Hadoop QA commented on HDFS-5283:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608680/HDFS-5283.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
  org.apache.hadoop.hdfs.TestDecommission

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5208//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5208//console

This message is automatically generated.

 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshhold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796863#comment-13796863
 ] 

Hadoop QA commented on HDFS-5283:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608707/HDFS-5283.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5210//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5210//console

This message is automatically generated.

 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshhold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797078#comment-13797078
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5283:
--

+1 patch looks good.

 Since isInSnapshot() is being called holding the writeLock, hasReadlock() 
 returning false ...

It is a bug.  Let's fix it separately.  I will file a JIRA.
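
One plausible reading of the quoted problem, sketched with the JDK's ReentrantReadWriteLock: a check based only on the read hold count returns false for a thread that holds the write lock. The class and method names below are hypothetical, and whether FSNamesystem.hasReadLock() is implemented this way is an assumption, not something this thread confirms.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockAssertSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Naive check (hypothetical): true only if this thread holds the read lock itself.
    public boolean hasReadLockNaive() {
        return lock.getReadHoldCount() > 0;
    }

    // Lenient check (hypothetical): a thread holding the write lock may also read.
    public boolean hasReadOrWriteLock() {
        return lock.getReadHoldCount() > 0 || lock.isWriteLockedByCurrentThread();
    }

    public void writeLock() {
        lock.writeLock().lock();
    }

    public void writeUnlock() {
        lock.writeLock().unlock();
    }

    public static void main(String[] args) {
        LockAssertSketch s = new LockAssertSketch();
        s.writeLock();
        try {
            // The caller holds only the write lock, as in the quoted scenario.
            System.out.println("naive:   " + s.hasReadLockNaive());
            System.out.println("lenient: " + s.hasReadOrWriteLock());
        } finally {
            s.writeUnlock();
        }
    }
}
```

Under this assumption, an "assert hasReadLock()" inside a method called with the write lock held would trip unless the check also accepts the write lock.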



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797164#comment-13797164
 ] 

Hudson commented on HDFS-5283:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4612 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4612/])
Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532860)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
HDFS-5283. Under construction blocks only inside snapshots should not be 
counted in safemode threshhold.  Contributed by Vinay (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532857)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797590#comment-13797590
 ] 

Vinay commented on HDFS-5283:
-

Thanks Nicholas for the reviews and commit.
Thanks Jing for reviews.



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794971#comment-13794971
 ] 

Hadoop QA commented on HDFS-5283:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608435/HDFS-5283.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5191//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5191//console

This message is automatically generated.

 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshhold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795500#comment-13795500
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5283:
--

- In FSNamesystem.isInSnapshot(..), we can safely assume the blockUC is 
non-null and put blockUC.getBlockCollection() in a local variable in the very 
beginning.  Also the assert should be hasReadLock() instead of hasWriteLock() 
since the method does not write anything.
{code}
//FSNamesystem
  public boolean isInSnapshot(BlockInfoUnderConstruction blockUC) {
    assert hasReadLock();
    final BlockCollection bc = blockUC.getBlockCollection();
    if (bc == null || !(bc instanceof INodeFileUnderConstruction)) {
      return false;
    }

    final INodeFileUnderConstruction inodeUC = (INodeFileUnderConstruction) bc;
    ...
  }
{code}

- For DFSOutputStream.abort(), it is better to add DFSTestUtil.abortStream(..) 
than change it to public.  Sorry that I did not see this previously.




[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792996#comment-13792996
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5283:
--

Vinay, thanks for working on this.  Some comments:

The new method added to Namesystem should
# take a BlockInfoUnderConstruction,
# be named isInSnapshot, and
# not throw IOException.

i.e.
{code}
//Namesystem.java
public boolean isInSnapshot(BlockInfoUnderConstruction block);
{code}
The implementation in FSNamesystem should try-catch the 
UnresolvedLinkException and log it as an error, since the full path obtained 
from a file should not contain an unresolved link.

Second question: why add DFSTestUtil.abortStream(..)?  It does not look very 
useful.



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-08 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789688#comment-13789688
 ] 

Jing Zhao commented on HDFS-5283:
-

The patch looks good to me. The only concern is that the extra check in our 
current solution may affect NN startup performance. [~szetszwo], 
could you also take a look at the patch?



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787940#comment-13787940
 ] 

Vinay commented on HDFS-5283:
-

One more question: do we need to update the safemode blockSafe count on every 
block report from a DN? The current patch does so only for the first report.



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788023#comment-13788023
 ] 

Hadoop QA commented on HDFS-5283:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607132/HDFS-5283.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5121//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5121//console

This message is automatically generated.



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786021#comment-13786021
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5283:
--

 ... My only doubt is why file delete and directory delete behave 
 differently. Was not changing inodes recursively intentional, or is it an 
 issue? 

It is intentional.  Otherwise, the running time of recordModification(..) 
becomes O(subtree size).  For a non-WithSnapshot INode, the state (in current 
state or in some snapshot state) is determined by its parent.
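
The trade-off can be illustrated with a toy tree. This is only a sketch under stated assumptions: ToyINode, its deletedFromCurrent flag, and its methods are hypothetical and do not correspond to the HDFS inode classes. Deleting a directory touches one node, and a descendant's state is derived by walking up to its ancestors instead of being recorded on every node of the subtree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy inode tree; names do not match the HDFS classes.
final class ToyINode {
    final String name;
    final ToyINode parent;
    final List<ToyINode> children = new ArrayList<>();
    boolean deletedFromCurrent;  // set only on the node that was deleted

    ToyINode(String name, ToyINode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    /** Constant-time delete: marks this node only and never walks the subtree. */
    void delete() {
        deletedFromCurrent = true;
    }

    /** A node is in the current tree only if neither it nor any ancestor was deleted. */
    boolean inCurrentTree() {
        for (ToyINode n = this; n != null; n = n.parent) {
            if (n.deletedFromCurrent) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyINode root = new ToyINode("", null);
        ToyINode dir = new ToyINode("tmp", root);
        ToyINode file = new ToyINode("part-0", dir);
        dir.delete();
        System.out.println("file flagged directly: " + file.deletedFromCurrent);
        System.out.println("file in current tree: " + file.inCurrentTree());
    }
}
```

The cost moves from the delete (which would otherwise be proportional to the subtree size) to the lookup, which is proportional to the depth of the node.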



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786051#comment-13786051
 ] 

Vinay commented on HDFS-5283:
-

bq. It is intentional. Otherwise, the running time of recordModification(..) 
becomes O(subtree size). For a non-WithSnapshot INode, the state (in current 
state or in some snapshot state) is determined by its parent.
Thanks for the explanation Nicholas. 



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786427#comment-13786427
 ] 

Jing Zhao commented on HDFS-5283:
-

bq. instead used ((BlockInfoUnderConstruction) 
storedBlock).getNumExpectedLocations(), so test was passing. 

This should also work actually. I failed to get the actual behavior when 
generating the new patch. I think your fix there should be better. 



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786429#comment-13786429
 ] 

Jing Zhao commented on HDFS-5283:
-

bq.  get the actual behavior 

I mean,  get the actual result of the getNumExpectedLocations call.



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-03 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785809#comment-13785809
 ] 

Jing Zhao commented on HDFS-5283:
-

bq. We need some reference that tells us the BlockCollection resides 
inside the snapshot if we cannot find it outside. I think that in the directory 
delete case with a snapshot, changing the inode types recursively is necessary. 
This keeps the behaviour of both cases (file deletion and directory deletion) 
consistent.

So in our current solution, for each BlockCollection (which is an INodeUC) in 
the blocksMap, we first check if it's in the current fsdir tree. Here our claim 
is: if the inode is not in the current tree (i.e., we cannot identify the 
node's absolute full path, or the node with the absolute full path in the 
current fsdir tree is actually not the node stored in the blocksMap), this 
inode should be a file existing only in a snapshot, whether or not this node is 
an instance of INodeUCWithSnapshot. If this claim stands, converting an 
INodeUC to an INodeUCWithSnapshot during deletion will be unnecessary.
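
The claim above can be sketched with stand-in types. This is an illustration only, assuming a flat path-to-inode map in place of the real fsdir tree; ToyNamespace, Inode, and their methods are hypothetical names and not the FSNamesystem code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the fsdir tree and its inodes; not the FSNamesystem code.
final class ToyNamespace {
    static final class Inode {
        final String fullPath;  // not absolute if an ancestor was detached from the tree

        Inode(String fullPath) {
            this.fullPath = fullPath;
        }
    }

    private final Map<String, Inode> currentTree = new HashMap<>();

    void put(Inode inode) {
        currentTree.put(inode.fullPath, inode);
    }

    void remove(String path) {
        currentTree.remove(path);
    }

    /** Follows the claim: not resolvable to the same inode means snapshot only. */
    boolean isInSnapshot(Inode bc) {
        if (!bc.fullPath.startsWith("/")) {
            return true;  // no absolute path, so it cannot be in the current tree
        }
        Inode current = currentTree.get(bc.fullPath);
        return current != bc;  // path is missing, or now belongs to a different inode
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();
        Inode open = new Inode("/tmp/job/part-0");
        ns.put(open);
        System.out.println("before delete: " + ns.isInSnapshot(open));
        ns.remove("/tmp/job/part-0");
        System.out.println("after delete: " + ns.isInSnapshot(open));
    }
}
```

The identity comparison matters: a new file created at the same path after the delete must not make the old inode look live again.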

bq. storedBlock.addNode(node);

Without this, the new test will fail when the number of DNs is set to 1. 
Currently, when the NN receives the first block report from a DN, for each 
block in the report it checks the total number of available replicas, and if 
that number is EQUAL to the minimum required number of replicas, it increases 
the blockSafe value by 1 in the safemodeInfo. Thus, when we call 
namesystem.incrementSafeBlockCount(numOfReplicas), numOfReplicas must be 
the current number of available replicas. Otherwise we will miss the EQUAL case 
and fail to increase the blockSafe number. 
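
The EQUAL case can be sketched as below. This is a simplified model under stated assumptions, not the BlockManager or SafeModeInfo code: SafeBlockCounter and its members are hypothetical names, and only the equality check described above is modelled.

```java
// Hypothetical model of the safe-block accounting; not the BlockManager code.
final class SafeBlockCounter {
    private final int minReplication;
    private long blockSafe;

    SafeBlockCounter(int minReplication) {
        this.minReplication = minReplication;
    }

    /** Counts a block once, at the moment its live replicas reach the minimum. */
    void incrementSafeBlockCount(int liveReplicas) {
        if (liveReplicas == minReplication) {
            blockSafe++;
        }
    }

    long blockSafe() {
        return blockSafe;
    }

    public static void main(String[] args) {
        SafeBlockCounter counter = new SafeBlockCounter(1);
        counter.incrementSafeBlockCount(0);  // stale count: reporting DN not yet added
        System.out.println("safe blocks with stale count: " + counter.blockSafe());
        counter.incrementSafeBlockCount(1);  // current count: reporting DN included
        System.out.println("safe blocks with current count: " + counter.blockSafe());
    }
}
```

With a single DN and a minimum replication of 1, a replica count taken before the reporting node is added is 0, so the block would never be counted as safe.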



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-03 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785892#comment-13785892
 ] 

Vinay commented on HDFS-5283:
-

bq. So in our current solution, for each BlockCollection (which is an INodeUC) 
in the blocksMap, we first check if it's in the current fsdir tree. Here our 
claim is: if the inode is not in the current tree (i.e., we cannot identify the 
node's absolute full path, or the node with the absolute full path in the 
current fsdir tree is actually not the node stored in the blocksMap), this 
inode should be a file existing only in a snapshot, whether or not this node is 
an instance of INodeUCWithSnapshot. If this claim stands, converting an 
INodeUC to an INodeUCWithSnapshot during deletion will be unnecessary.
I agree that the current solution mentioned here will work for this issue. My 
only doubt is why file delete and directory delete behave differently. Was not 
changing inodes recursively intentional, or is it an issue? 

bq. Without this, the new test will fail when the number of DNs is set to 1.
I got it. I was confused because my earlier patch did not use 
{{countLiveNodes(storedBlock);}} and instead used {{((BlockInfoUnderConstruction) 
storedBlock).getNumExpectedLocations()}}, so the test was passing. 
But would it not be better to use {{((BlockInfoUnderConstruction) 
storedBlock).getNumExpectedLocations()}} instead of 
{{storedBlock.addNode(node)}} and {{countLiveNodes(storedBlock);}}, as in my 
patch? Do you see any problems with that?



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-02 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784826#comment-13784826
 ] 

Vinay commented on HDFS-5283:
-

Thanks Jing for the update on the patch.

bq. For dir-deletion scenario, instead of changing the current snapshot code 
(i.e., to convert all the INodeFileUC under the deleted dir to 
INodeFileUCWithSnapshot),
We need some reference that tells us the BlockCollection resides 
inside the snapshot if we cannot find it outside. I think that in the directory 
delete case with a snapshot, changing the inode types recursively is necessary. 
This keeps the behaviour of both cases (file deletion and directory deletion) 
consistent.
Any thoughts?


I have a few small questions on the patch:
1. Is {{storedBlock.addNode(node);}} required? This is done only for the 
finalized blocks.
2. {noformat}   /*
 * 1. if bc is an instance of INodeFileUnderConstructionWithSnapshot, and
 * bc is not in the current fsdirectory tree, bc must represent a snapshot
 * file. 
 * 2. if fullName is not an absolute path, bc cannot be existent in the 
 * current fsdirectory tree. 
 * 3. if bc is not the current node associated with fullName, bc must be a
 * snapshot inode.
 */
return true;{noformat}
With the current code, returning true for all three cases without explicitly 
checking them holds good. But after any future change in this area of the code 
we will still get true as the return value. Do you think we need to check 
something here?



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782995#comment-13782995
 ] 

Hadoop QA commented on HDFS-5283:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606091/HDFS-5283.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5073//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5073//console

This message is automatically generated.



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783061#comment-13783061
 ] 

Hadoop QA commented on HDFS-5283:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606098/HDFS-5283.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5074//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5074//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5074//console

This message is automatically generated.



[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-01 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783465#comment-13783465
 ] 

Jing Zhao commented on HDFS-5283:
-

Looks like my patch will fail some tests. Will update the patch later.

 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshhold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch







[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783605#comment-13783605
 ] 

Hadoop QA commented on HDFS-5283:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12606236/HDFS-5283.000.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5082//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5082//console

This message is automatically generated.





