[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845193#comment-13845193
 ] 

Hudson commented on HDFS-5580:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4863 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4863/])
HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang 
via junping_du) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550074)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


 Infinite loop in Balancer.waitForMoveCompletion
 ---

 Key: HDFS-5580
 URL: https://issues.apache.org/jira/browse/HDFS-5580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, 
 HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log


 In a recent 
 [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/]
 in HDFS-5574, TestBalancerWithNodeGroup timed out; this is also mentioned in 
 HDFS-4376 
 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402].
 It looks like the bug was introduced by HDFS-3495.
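
For context while reading the fix, here is a minimal, hedged sketch of the failure shape (illustrative names only, not the actual Balancer source): a poll loop like waitForMoveCompletion() spins forever if some target's completion flag is never set.

{code}
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class BalancerWaitSketch {
  // Illustrative only: waits until every scheduled move reports done.
  // If one flag is never set (e.g. a failed move that is never reported),
  // this loop never exits, which is the infinite loop described here.
  static void waitForMoveCompletion(List<AtomicBoolean> targetsDone)
      throws InterruptedException {
    for (;;) {
      boolean allDone = true;
      for (AtomicBoolean done : targetsDone) {
        allDone &= done.get();
      }
      if (allDone) {
        return;
      }
      Thread.sleep(1000L); // poll again
    }
  }
}
{code}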



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion

2013-12-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845204#comment-13845204
 ] 

Junping Du commented on HDFS-5580:
--

+1. I have committed this to trunk and branch-2. Thanks Binglin!

 Infinite loop in Balancer.waitForMoveCompletion
 ---

 Key: HDFS-5580
 URL: https://issues.apache.org/jira/browse/HDFS-5580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.4.0

 Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, 
 HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log


 In a recent 
 [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/]
 in HDFS-5574, TestBalancerWithNodeGroup timed out; this is also mentioned in 
 HDFS-4376 
 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402].
 It looks like the bug was introduced by HDFS-3495.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion

2013-12-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-5580:
-

  Resolution: Fixed
   Fix Version/s: 2.4.0
Target Version/s: 2.4.0
  Status: Resolved  (was: Patch Available)

 Infinite loop in Balancer.waitForMoveCompletion
 ---

 Key: HDFS-5580
 URL: https://issues.apache.org/jira/browse/HDFS-5580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.4.0

 Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, 
 HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log


 In a recent 
 [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/]
 in HDFS-5574, TestBalancerWithNodeGroup timed out; this is also mentioned in 
 HDFS-4376 
 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402].
 It looks like the bug was introduced by HDFS-3495.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure

2013-12-11 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845218#comment-13845218
 ] 

Liang Xie commented on HDFS-4273:
-

{code}
-> seekToNewSource() add currentNode to deadNodes, wish to get a different datanode
   -> blockSeekTo()
      -> chooseDataNode()
         -> block missing, clear deadNodes and pick the currentNode again
seekToNewSource() return false
{code}

I checked the codebase; it shows:
{code}
private synchronized boolean seekToBlockSource(long targetPos)
    throws IOException {
  currentNode = blockSeekTo(targetPos);
  return true;
}
{code}
It cannot return false, so it seems the original description is stale?

 Problem in DFSInputStream read retry logic may cause early failure
 --

 Key: HDFS-4273
 URL: https://issues.apache.org/jira/browse/HDFS-4273
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor
 Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, 
 HDFS-4273.v4.patch, HDFS-4273.v5.patch, TestDFSInputStream.java


 Assume the following call logic:
 {noformat}
 readWithStrategy()
   -> blockSeekTo()
   -> readBuffer()
      -> reader.doRead()
      -> seekToNewSource() add currentNode to deadNodes, wish to get a different datanode
         -> blockSeekTo()
            -> chooseDataNode()
               -> block missing, clear deadNodes and pick the currentNode again
      seekToNewSource() return false
   readBuffer() re-throw the exception, quit loop
 readWithStrategy() got the exception, and may fail the read call before 
 trying MaxBlockAcquireFailures times.
 {noformat}
 Some issues with this logic:
 1. The seekToNewSource() logic is broken because it may clear deadNodes in 
 the middle.
 2. The variable int retries=2 in readWithStrategy seems to conflict with 
 MaxBlockAcquireFailures; should it be removed?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-4874) create with OVERWRITE deletes existing file without checking the lease: feature or a bug.

2013-12-11 Thread amol khatri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845244#comment-13845244
 ] 

amol khatri commented on HDFS-4874:
---

What will happen if two clients try to create a file on the same path?

 create with OVERWRITE deletes existing file without checking the lease: 
 feature or a bug.
 -

 Key: HDFS-4874
 URL: https://issues.apache.org/jira/browse/HDFS-4874
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Konstantin Shvachko

 create with the OVERWRITE flag will remove a file under construction even if 
 the issuing client does not hold a lease on the file.
 It could be a bug, or a feature that applications rely upon.
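
On the question above: a hedged sketch of the two-client race against the public FileSystem API (the path is illustrative, and the failure noted for client 1 is the behavior this issue describes, assuming the NN skips the lease check as reported):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OverwriteRace {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path p = new Path("/tmp/overwrite-race"); // illustrative path

    // Client 1 creates the file and keeps it open (holds the lease).
    FileSystem client1 = FileSystem.newInstance(conf);
    FSDataOutputStream out1 = client1.create(p, true /* overwrite */);
    out1.write(1);

    // Client 2 creates the same path with overwrite. Per this issue, the
    // NN deletes the under-construction file without checking client 2's
    // lease, so client 1's stream is silently orphaned.
    FileSystem client2 = FileSystem.newInstance(conf);
    client2.create(p, true).close();

    out1.close(); // expected to fail: the file client 1 was writing is gone
  }
}
{code}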



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5646) Exceptions during HDFS failover

2013-12-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HDFS-5646.
--

Resolution: Fixed

Hi, I'm afraid you are going to have to take this up with Cloudera - if there 
is a problem in the Hadoop codebase then they can escalate it over here.

Closing as invalid per policy:
[http://wiki.apache.org/hadoop/InvalidJiraIssues]

 Exceptions during HDFS failover
 ---

 Key: HDFS-5646
 URL: https://issues.apache.org/jira/browse/HDFS-5646
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Reporter: Nikhil Mulley

 Hi, in our HDFS HA setup I see the following exceptions when I try to fail 
 back. I have an auto failover mechanism enabled. Although the failback 
 operation succeeds, the exceptions and the return status of 255 tend to worry 
 me (because I cannot script this if I needed to). Please let me know if this 
 is anything that is known and easily resolvable. 
 I am using Cloudera Hadoop 4.4.0, if that helps. Please let me know if I need 
 to open this ticket with the CDH Jira instead. 
 Thanks. 
 sudo -u hdfs hdfs haadmin -failover nn2 nn1 
 Operation failed: Unable to become active. Service became unhealthy while 
 trying to failover.
 at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:652)
 at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:58)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:591)
 at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:588)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
 at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:588)
 at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
 at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
 at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1351)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845300#comment-13845300
 ] 

Hudson commented on HDFS-5074:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move entry for HDFS-5074 to correct section. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550027)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-5074. Allow starting up from an fsimage checkpoint in the middle of a 
segment. Contributed by Todd Lipcon. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550016)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/QJournalProtocol.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolServerSideTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LogsPurgeable.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/QJournalProtocol.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniQJMHACluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestNNWithQJM.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithQJM.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java


 Allow starting up from an fsimage checkpoint in the middle of a segment
 ---

 Key: HDFS-5074
 URL: 

[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845289#comment-13845289
 ] 

Hudson commented on HDFS-5283:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5283 to section branch-2.3.0 (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshhold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Critical
 Fix For: 2.3.0

 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch


 This was observed in one of our environments:
 1. An MR job was running which had created some temporary files and was 
 writing to them.
 2. A snapshot was taken.
 3. The job was killed and the temporary files were deleted.
 4. The Namenode was restarted.
 5. After the restart the Namenode was in safemode waiting for blocks.
 Analysis
 -
 1. The snapshot taken also includes the temporary files which were open; the 
 original files were later deleted.
 2. The under-construction blocks count was taken from leases; the UC blocks 
 that exist only inside snapshots were not considered.
 3. So the safemode threshold count was too high and the NN did not come out 
 of safemode.
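
To make step 3 of the analysis concrete, a hedged sketch of the startup-safemode condition (field names are illustrative; the real check lives in the NameNode's safemode bookkeeping): if blockTotal counts blocks that datanodes will never report, such as the snapshot-only UC blocks above, blockSafe can never reach the threshold.

{code}
class SafeModeSketch {
  long blockTotal;              // blocks the NN expects to be reported
  long blockSafe;               // blocks reported with enough replicas
  double thresholdPct = 0.999;  // cf. dfs.namenode.safemode.threshold-pct

  // Illustrative only: if blockTotal includes snapshot-only UC blocks that
  // no datanode will ever report, this never becomes true.
  boolean canLeaveStartupSafeMode() {
    return blockSafe >= (long) (thresholdPct * blockTotal);
  }
}
{code}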



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845293#comment-13845293
 ] 

Hudson commented on HDFS-5504:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, 
 leads to NN safemode.
 

 Key: HDFS-5504
 URL: https://issues.apache.org/jira/browse/HDFS-5504
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5504.patch, HDFS-5504.patch


 1. HA installation; the standby NN is down.
 2. Delete snapshot is called; it deletes the blocks from the blocksmap and 
 all datanodes, and the log sync also happens.
 3. Before the next log roll the NN crashes.
 4. When the namenode restarts it loads the fsimage and the finalized edits 
 from shared storage and sets the safemode threshold, which still includes 
 the blocks from the deleted snapshot (because the delete-snapshot edit has 
 not been read yet, as the namenode was restarted before the last edits 
 segment was finalized).
 5. When it becomes active, it finalizes the edits and reads the delete 
 snapshot edit op, but at this point it does not reduce the safemode count, 
 so it continues in safemode.
 6. On the next restart, as the edits are already finalized, it reads them at 
 startup and sets the safemode threshold correctly.
 So one more restart will bring the NN out of safemode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845296#comment-13845296
 ] 

Hudson commented on HDFS-5425:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Renaming underconstruction file with snapshots can make NN failure on restart
 -

 Key: HDFS-5425
 URL: https://issues.apache.org/jira/browse/HDFS-5425
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Affects Versions: 2.2.0
Reporter: sathish
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, 
 HDFS-5425.patch


 I faced this when doing some snapshot operations like createSnapshot and 
 renameSnapshot; when I restarted my NN, it shut down with this exception:
 2013-10-24 21:07:03,040 FATAL 
 org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
 java.lang.IllegalStateException
   at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:133)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:670)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:655)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311)
 2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
 SHUTDOWN_MSG: 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845295#comment-13845295
 ] 

Hudson commented on HDFS-5476:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot: clean the blocks/files/directories under a renamed file/directory 
 while deletion
 --

 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5476.001.patch


 Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
 under the DstReference node for file/directory/snapshot deletion.
 Use case 1:
 # rename under-construction file with 0-sized blocks after snapshot.
 # delete the renamed directory.
 We need to make sure we delete the 0-sized block.
 Use case 2:
 # create snapshot s0 for /
 # create a new file under /foo/bar/
 # rename foo -> foo2
 # create snapshot s1
 # delete bar and foo2
 # delete snapshot s1
 We need to make sure we delete the file under /foo/bar since it is not 
 included in snapshot s0.
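
Use case 2 above as a hedged sketch against the public snapshot API (paths are illustrative, and it assumes a running cluster where snapshots have been allowed on the relevant directories):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameSnapshotCleanup {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path root = new Path("/");

    fs.createSnapshot(root, "s0");                  // 1. snapshot s0 for /
    Path bar = new Path("/foo/bar");
    fs.mkdirs(bar);
    fs.create(new Path(bar, "file")).close();       // 2. new file after s0
    fs.rename(new Path("/foo"), new Path("/foo2")); // 3. foo -> foo2
    fs.createSnapshot(root, "s1");                  // 4. snapshot s1
    fs.delete(new Path("/foo2/bar"), true);         // 5. delete bar and foo2
    fs.delete(new Path("/foo2"), true);
    fs.deleteSnapshot(root, "s1");                  // 6. the file is not in
                                                    // s0, so its blocks must
                                                    // be cleaned up here
  }
}
{code}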



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845291#comment-13845291
 ] 

Hudson commented on HDFS-5580:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang 
via junping_du) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550074)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


 Infinite loop in Balancer.waitForMoveCompletion
 ---

 Key: HDFS-5580
 URL: https://issues.apache.org/jira/browse/HDFS-5580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.4.0

 Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, 
 HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log


 In a recent 
 [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/]
 in HDFS-5574, TestBalancerWithNodeGroup timed out; this is also mentioned in 
 HDFS-4376 
 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402].
 It looks like the bug was introduced by HDFS-3495.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845292#comment-13845292
 ] 

Hudson commented on HDFS-5428:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
 HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, 
 HDFS-5428.004.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNamespace
 6. restart NN.
 NN enters safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks, those will not be updated in the 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845298#comment-13845298
 ] 

Hudson commented on HDFS-5257:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 addBlock() retry should return LocatedBlock with locations else client will 
 get AIOBE
 -

 Key: HDFS-5257
 URL: https://issues.apache.org/jira/browse/HDFS-5257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, 
 HDFS-5257.patch


 An {{addBlock()}} call retry should return the LocatedBlock with locations if 
 the block was created in the previous call and a failover/restart of the 
 namenode happened.
 Otherwise the client will get an {{ArrayIndexOutOfBoundsException}} while 
 creating the block, and the write will fail.
 {noformat}java.lang.ArrayIndexOutOfBoundsException: 0
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat}
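
A hedged sketch of where the AIOBE in the trace comes from (simplified; not the actual DFSOutputStream code): the client-side streamer indexes the first location of the LocatedBlock that addBlock() returned, so an empty locations array fails exactly as shown.

{code}
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

class CreateBlockOutputStreamSketch {
  // Simplified stand-in for createBlockOutputStream(): it dereferences
  // nodes[0], so a LocatedBlock with zero locations (as returned by the
  // buggy addBlock() retry) throws ArrayIndexOutOfBoundsException: 0.
  static DatanodeInfo firstNode(LocatedBlock lb) {
    DatanodeInfo[] nodes = lb.getLocations();
    return nodes[0];
  }
}
{code}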



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845297#comment-13845297
 ] 

Hudson commented on HDFS-5443:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Delete 0-sized block when deleting an under-construction file that is 
 included in snapshot
 --

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: 5443-test.patch, HDFS-5443.000.patch


 The Namenode can get stuck in safemode on restart if it crashes just after 
 the addBlock logsync and after a snapshot was taken of such a file. This 
 issue was reported by Prakash and Sathish.
 On looking into the issue, the following things happen:
 1) The client adds a block at the NN, which just did the logsync, so the NN 
 has the block ID persisted.
 2) Before the addBlock response is returned to the client, a snapshot is 
 taken of the root or a parent directory of that file.
 3) The parent directory of that file is deleted.
 4) Now the NN crashes without responding success to the client for that 
 addBlock call.
 Now on restart, the Namenode will get stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845290#comment-13845290
 ] 

Hudson commented on HDFS-5427:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 not able to read deleted files from snapshot directly under snapshottable dir 
 after checkpoint and NN restart
 -

 Key: HDFS-5427
 URL: https://issues.apache.org/jira/browse/HDFS-5427
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/bar
 3. create a snapshot s1 under /foo
 4. delete the file /foo/bar
 5. wait till checkpoint or do saveNamespace
 6. restart NN.
 7. Now try to read the file from the snapshot, /foo/.snapshot/s1/bar; the 
 client will get a BlockMissingException.
 The reason is that while loading the deleted file list for a snapshottable 
 dir from the fsimage, the blocks were not updated in the blocksmap.
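
The reproduction steps as a hedged sketch (paths illustrative; assumes snapshots are already allowed on /foo, and the checkpoint and restart happen out of band):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotReadAfterRestart {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path bar = new Path("/foo/bar");
    fs.create(bar).close();                      // 2. create /foo/bar
    fs.createSnapshot(new Path("/foo"), "s1");   // 3. snapshot s1
    fs.delete(bar, false);                       // 4. delete /foo/bar
    // 5-6. checkpoint (or saveNamespace) and NN restart happen here.
    try (FSDataInputStream in =
             fs.open(new Path("/foo/.snapshot/s1/bar"))) {
      in.read(); // 7. per this issue: BlockMissingException, because the
                 // fsimage loader skipped the blocksmap update
    }
  }
}
{code}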



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845294#comment-13845294
 ] 

Hudson commented on HDFS-5474:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Deletesnapshot can make Namenode in safemode on NN restarts.
 

 Key: HDFS-5474
 URL: https://issues.apache.org/jira/browse/HDFS-5474
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Fix For: 2.3.0

 Attachments: HDFS-5474-001.patch, HDFS-5474-002.patch


 When we deleteSnapshot, we delete the blocks associated with that snapshot, 
 and only after that do we logsync the deleteSnapshot op to the editlog.
 There is a chance that the blocks are removed from the blocksmap but a block 
 report arrives before the log sync; the NN may find that a block does not 
 exist in the blocksmap and may invalidate it, and as part of the heartbeat 
 the invalidation info can also go out. After these steps, if the Namenode 
 shuts down before actually doing the logsync, on restart it will still 
 consider the snapshot inodes and expect the blocks to be reported from the 
 DNs.
 A simple solution is to simply move the blocks removal down to after the 
 logsync, similar to the delete op.
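
A hedged sketch of the ordering fix proposed above (method bodies are stubs; the names are illustrative, not the actual FSNamesystem methods):

{code}
class DeleteSnapshotOrdering {
  void logSyncDeleteSnapshotOp()   { /* durably persist OP_DELETE_SNAPSHOT */ }
  void removeBlocksFromBlocksMap() { /* drop the snapshot-only blocks */ }

  // Reported ordering: a crash between the two calls loses the edit even
  // though the blocks (and DN replicas, via HB invalidation) are gone.
  void deleteSnapshotBuggy() {
    removeBlocksFromBlocksMap();
    logSyncDeleteSnapshotOp();
  }

  // Proposed ordering, matching the regular delete op: sync first.
  void deleteSnapshotFixed() {
    logSyncDeleteSnapshotOp();
    removeBlocksFromBlocksMap();
  }
}
{code}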



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure

2013-12-11 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845308#comment-13845308
 ] 

Binglin Chang commented on HDFS-4273:
-

seekToNewSource, not seekToBlockSource
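
For readers following the thread: a hedged, self-contained sketch of the seekToNewSource() flow the description refers to (String stands in for DatanodeInfo; this is a simplification, not the DFSInputStream source):

{code}
import java.util.HashMap;
import java.util.Map;

class ReadRetrySketch {
  private final Map<String, Boolean> deadNodes = new HashMap<>();
  private String currentNode = "dn1";

  // Stand-in for blockSeekTo()/chooseDataNode(): on "block missing" the
  // real code clears deadNodes and may pick the just-marked node again.
  private String blockSeekTo(long pos) {
    deadNodes.clear();
    return currentNode;
  }

  synchronized boolean seekToNewSource(long targetPos) {
    deadNodes.put(currentNode, true); // wish to get a different datanode
    String oldNode = currentNode;
    String newNode = blockSeekTo(targetPos);
    currentNode = newNode;
    return !oldNode.equals(newNode);  // false: the same node came back
  }
}
{code}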

 Problem in DFSInputStream read retry logic may cause early failure
 --

 Key: HDFS-4273
 URL: https://issues.apache.org/jira/browse/HDFS-4273
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor
 Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, 
 HDFS-4273.v4.patch, HDFS-4273.v5.patch, TestDFSInputStream.java


 Assume the following call logic:
 {noformat}
 readWithStrategy()
   -> blockSeekTo()
   -> readBuffer()
      -> reader.doRead()
      -> seekToNewSource() add currentNode to deadNodes, wish to get a different datanode
         -> blockSeekTo()
            -> chooseDataNode()
               -> block missing, clear deadNodes and pick the currentNode again
      seekToNewSource() return false
   readBuffer() re-throw the exception, quit loop
 readWithStrategy() got the exception, and may fail the read call before 
 trying MaxBlockAcquireFailures times.
 {noformat}
 Some issues with this logic:
 1. The seekToNewSource() logic is broken because it may clear deadNodes in 
 the middle.
 2. The variable int retries=2 in readWithStrategy seems to conflict with 
 MaxBlockAcquireFailures; should it be removed?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845394#comment-13845394
 ] 

Hudson commented on HDFS-5257:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 addBlock() retry should return LocatedBlock with locations else client will 
 get AIOBE
 -

 Key: HDFS-5257
 URL: https://issues.apache.org/jira/browse/HDFS-5257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, 
 HDFS-5257.patch


 An {{addBlock()}} call retry should return the LocatedBlock with locations if 
 the block was created in the previous call and a failover/restart of the 
 namenode happened.
 Otherwise the client will get an {{ArrayIndexOutOfBoundsException}} while 
 creating the block, and the write will fail.
 {noformat}java.lang.ArrayIndexOutOfBoundsException: 0
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845392#comment-13845392
 ] 

Hudson commented on HDFS-5425:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Renaming underconstruction file with snapshots can make NN failure on restart
 -

 Key: HDFS-5425
 URL: https://issues.apache.org/jira/browse/HDFS-5425
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Affects Versions: 2.2.0
Reporter: sathish
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, 
 HDFS-5425.patch


 I faced this when doing some snapshot operations like createSnapshot and 
 renameSnapshot; when I restarted my NN, it shut down with this exception:
 2013-10-24 21:07:03,040 FATAL 
 org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
 java.lang.IllegalStateException
   at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:133)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:670)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:655)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311)
 2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
 SHUTDOWN_MSG: 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845386#comment-13845386
 ] 

Hudson commented on HDFS-5427:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 not able to read deleted files from snapshot directly under snapshottable dir 
 after checkpoint and NN restart
 -

 Key: HDFS-5427
 URL: https://issues.apache.org/jira/browse/HDFS-5427
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/bar
 3. create a snapshot s1 under /foo
 4. delete the file /foo/bar
 5. wait till checkpoint or do saveNamespace
 6. restart NN.
 7. Now try to read the file from the snapshot, /foo/.snapshot/s1/bar; the 
 client will get a BlockMissingException.
 The reason is that while loading the deleted file list for a snapshottable 
 dir from the fsimage, the blocks were not updated in the blocksmap.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845387#comment-13845387
 ] 

Hudson commented on HDFS-5580:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang 
via junping_du) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550074)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


 Infinite loop in Balancer.waitForMoveCompletion
 ---

 Key: HDFS-5580
 URL: https://issues.apache.org/jira/browse/HDFS-5580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.4.0

 Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, 
 HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log


 In a recent 
 [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/]
 in HDFS-5574, TestBalancerWithNodeGroup timed out; this is also mentioned in 
 HDFS-4376 
 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402].
 It looks like the bug was introduced by HDFS-3495.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845391#comment-13845391
 ] 

Hudson commented on HDFS-5476:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot: clean the blocks/files/directories under a renamed file/directory 
 while deletion
 --

 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5476.001.patch


 Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
 under the DstReference node for file/directory/snapshot deletion.
 Use case 1:
 # rename under-construction file with 0-sized blocks after snapshot.
 # delete the renamed directory.
 We need to make sure we delete the 0-sized block.
 Use case 2:
 # create snapshot s0 for /
 # create a new file under /foo/bar/
 # rename foo -> foo2
 # create snapshot s1
 # delete bar and foo2
 # delete snapshot s1
 We need to make sure we delete the file under /foo/bar since it is not 
 included in snapshot s0.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845393#comment-13845393
 ] 

Hudson commented on HDFS-5443:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Delete 0-sized block when deleting an under-construction file that is 
 included in snapshot
 --

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: 5443-test.patch, HDFS-5443.000.patch


 The Namenode can get stuck in safemode on restart if it crashes just after 
 the addBlock logsync and after a snapshot was taken of such a file. This 
 issue was reported by Prakash and Sathish.
 On looking into the issue, the following things happen:
 1) The client adds a block at the NN, which just did the logsync, so the NN 
 has the block ID persisted.
 2) Before the addBlock response is returned to the client, a snapshot is 
 taken of the root or a parent directory of that file.
 3) The parent directory of that file is deleted.
 4) Now the NN crashes without responding success to the client for that 
 addBlock call.
 Now on restart, the Namenode will get stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845396#comment-13845396
 ] 

Hudson commented on HDFS-5074:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move entry for HDFS-5074 to correct section. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550027)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-5074. Allow starting up from an fsimage checkpoint in the middle of a 
segment. Contributed by Todd Lipcon. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550016)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/QJournalProtocol.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolServerSideTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LogsPurgeable.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/QJournalProtocol.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniQJMHACluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestNNWithQJM.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithQJM.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java


 Allow starting up from an fsimage checkpoint in the middle of a segment
 ---

 Key: HDFS-5074
 URL: 

[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845389#comment-13845389
 ] 

Hudson commented on HDFS-5504:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, 
 leads to NN safemode.
 

 Key: HDFS-5504
 URL: https://issues.apache.org/jira/browse/HDFS-5504
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5504.patch, HDFS-5504.patch


 1. HA installation; the standby NN is down.
 2. Delete snapshot is called; it deletes the blocks from the blocksmap and 
 all datanodes, and the log sync also happens.
 3. Before the next log roll the NN crashes.
 4. When the namenode restarts it loads the fsimage and the finalized edits 
 from shared storage and sets the safemode threshold, which still includes 
 the blocks from the deleted snapshot (because the delete-snapshot edit has 
 not been read yet, as the namenode was restarted before the last edits 
 segment was finalized).
 5. When it becomes active, it finalizes the edits and reads the delete 
 snapshot edit op, but at this point it does not reduce the safemode count, 
 so it continues in safemode.
 6. On the next restart, as the edits are already finalized, it reads them at 
 startup and sets the safemode threshold correctly.
 So one more restart will bring the NN out of safemode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845385#comment-13845385
 ] 

Hudson commented on HDFS-5283:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5283 to section branch-2.3.0 (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Critical
 Fix For: 2.3.0

 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch


 This was observed in one of our environments:
 1. An MR job was running; it had created some temporary files and was 
 writing to them.
 2. A snapshot was taken.
 3. The job was killed and the temporary files were deleted.
 4. The Namenode was restarted.
 5. After the restart the Namenode stayed in safemode, waiting for blocks.
 Analysis
 -
 1. The snapshot also includes the temporary files that were open, and the 
 original files were deleted later.
 2. The under-construction block count was taken from leases, which does not 
 cover UC blocks that exist only inside snapshots.
 3. So the safemode threshold count was too high and the NN did not come out 
 of safemode.
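
A rough sketch of the counting problem, with deliberately simplified types 
(this is not the actual FSNamesystem code):

{code}
import java.util.List;

class SafeModeTotals {
  static class Block {
    boolean underConstruction;
    boolean hasLease; // false for UC blocks reachable only via a snapshot
  }

  // Buggy total: only leased UC blocks are excluded, so UC blocks that live
  // only inside snapshots inflate the threshold, and the NN waits for blocks
  // that no datanode will ever report as complete.
  static long buggyTotal(List<Block> blocks) {
    long total = 0;
    for (Block b : blocks) {
      if (!(b.underConstruction && b.hasLease)) {
        total++;
      }
    }
    return total;
  }

  // Fixed total: every UC block is excluded, leased or not.
  static long fixedTotal(List<Block> blocks) {
    long total = 0;
    for (Block b : blocks) {
      if (!b.underConstruction) {
        total++;
      }
    }
    return total;
  }
}
{code}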



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845388#comment-13845388
 ] 

Hudson commented on HDFS-5428:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
 HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, 
 HDFS-5428.004.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 The NN enters safemode.
 Analysis:
 Snapshot nodes loaded from the fsimage are always complete and all of their 
 blocks are in COMPLETE state. 
 So when a Datanode reports RBW blocks, they are not updated in the 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845390#comment-13845390
 ] 

Hudson commented on HDFS-5474:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Deletesnapshot can make Namenode in safemode on NN restarts.
 

 Key: HDFS-5474
 URL: https://issues.apache.org/jira/browse/HDFS-5474
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Fix For: 2.3.0

 Attachments: HDFS-5474-001.patch, HDFS-5474-002.patch


 When we deletesnapshot, we delete the blocks associated with that snapshot 
 and only afterwards do a log sync of the deleteSnapshot op to the editlog.
 There is a window where the blocks have been removed from the blocks map but 
 the log sync has not happened yet: if a block report arrives, the NN finds 
 that a block no longer exists in the blocks map and may invalidate it, and 
 the invalidation info can go out with the heartbeat. If the Namenode then 
 shuts down before actually doing the log sync, on restart it will still have 
 the snapshot inodes and expect the DNs to report those blocks.
 The simple solution is to move the block removal down, after the log sync, 
 similar to the delete op.
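
A minimal sketch of that reordering ({{EditLog}} and {{BlockStore}} here are 
hypothetical stand-ins, not the real NameNode types):

{code}
import java.util.List;

class SnapshotDeleter {
  interface EditLog {
    void logDeleteSnapshot(String name);
    void logSync();
  }

  interface BlockStore {
    void removeBlocks(List<Long> blockIds);
  }

  private final EditLog editLog;
  private final BlockStore blocks;

  SnapshotDeleter(EditLog editLog, BlockStore blocks) {
    this.editLog = editLog;
    this.blocks = blocks;
  }

  // Buggy order: blocks vanish from the block map before the edit is
  // durable. A block report arriving in that window invalidates replicas
  // that a restarted NN will still expect to see.
  void deleteSnapshotBuggy(String name, List<Long> collected) {
    blocks.removeBlocks(collected);
    editLog.logDeleteSnapshot(name);
    editLog.logSync();
  }

  // Fixed order, mirroring the regular delete op: sync the edit first,
  // then drop the blocks.
  void deleteSnapshotFixed(String name, List<Long> collected) {
    editLog.logDeleteSnapshot(name);
    editLog.logSync();
    blocks.removeBlocks(collected);
  }
}
{code}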



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845444#comment-13845444
 ] 

Hudson commented on HDFS-5504:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, 
 leads to NN safemode.
 

 Key: HDFS-5504
 URL: https://issues.apache.org/jira/browse/HDFS-5504
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5504.patch, HDFS-5504.patch


 1. HA installation, standby NN is down.
 2. Delete snapshot is called; it deleted the blocks from the blocksmap and 
 all datanodes, and the log sync also happened.
 3. Before the next log roll, the NN crashed.
 4. When the namenode restarts, it loads the fsimage and the finalized edits 
 from shared storage and sets the safemode threshold, which still includes the 
 blocks from the deleted snapshot (the delete edit has not been read yet, 
 because the namenode crashed before the last edits segment was finalized).
 5. When it becomes active, it finalizes the edits and reads the delete 
 snapshot op, but at that point it does not reduce the safemode count, so it 
 continues in safemode.
 6. Only on the next restart, with the edits already finalized, does it read 
 them at startup and set the safemode threshold correctly.
 So one more extra restart brings the NN out of safemode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845441#comment-13845441
 ] 

Hudson commented on HDFS-5427:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 not able to read deleted files from snapshot directly under snapshottable dir 
 after checkpoint and NN restart
 -

 Key: HDFS-5427
 URL: https://issues.apache.org/jira/browse/HDFS-5427
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/bar
 3. create a snapshot s1 under /foo
 4. delete the file /foo/bar
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 7. Now try to read the file from snapshot /foo/.snapshot/s1/bar
 client will get BlockMissingException
 The reason: while loading the deleted file list for a snapshottable dir from 
 the fsimage, the blocks were not updated in the blocksmap.
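
In sketch form (simplified types, not the actual FSImageFormat loader):

{code}
import java.util.List;
import java.util.Map;

class SnapshotDiffLoader {
  static class FileEntry {
    final long[] blockIds;
    FileEntry(long[] blockIds) {
      this.blockIds = blockIds;
    }
  }

  // When the fsimage loader walks a snapshot's deleted-file list, each
  // file's blocks must still be registered in the block map; otherwise a
  // read through /foo/.snapshot/s1/... resolves to blocks the NN has no
  // locations for, and the client sees BlockMissingException.
  static void loadDeletedFiles(List<FileEntry> deleted,
      Map<Long, FileEntry> blocksMap) {
    for (FileEntry f : deleted) {
      for (long id : f.blockIds) {
        blocksMap.put(id, f); // the step that was missing
      }
    }
  }
}
{code}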



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845440#comment-13845440
 ] 

Hudson commented on HDFS-5283:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move HDFS-5283 to section branch-2.3.0 (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NN not coming out of startup safemode due to under construction blocks only 
 inside snapshots also counted in safemode threshold
 

 Key: HDFS-5283
 URL: https://issues.apache.org/jira/browse/HDFS-5283
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Critical
 Fix For: 2.3.0

 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, 
 HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch


 This was observed in one of our environments:
 1. An MR job was running; it had created some temporary files and was 
 writing to them.
 2. A snapshot was taken.
 3. The job was killed and the temporary files were deleted.
 4. The Namenode was restarted.
 5. After the restart the Namenode stayed in safemode, waiting for blocks.
 Analysis
 -
 1. The snapshot also includes the temporary files that were open, and the 
 original files were deleted later.
 2. The under-construction block count was taken from leases, which does not 
 cover UC blocks that exist only inside snapshots.
 3. So the safemode threshold count was too high and the NN did not come out 
 of safemode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845449#comment-13845449
 ] 

Hudson commented on HDFS-5257:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 addBlock() retry should return LocatedBlock with locations else client will 
 get AIOBE
 -

 Key: HDFS-5257
 URL: https://issues.apache.org/jira/browse/HDFS-5257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
 Fix For: 2.3.0

 Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, 
 HDFS-5257.patch


 A retried {{addBlock()}} call should return the LocatedBlock with locations 
 if the block was already created by the previous call and a failover/restart 
 of the namenode happened.
 Otherwise the client will get an {{ArrayIndexOutOfBoundsException}} while 
 creating the block and the write will fail.
 {noformat}java.lang.ArrayIndexOutOfBoundsException: 0
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat}
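
A hedged sketch of the retry guard (types heavily simplified; this is not the 
actual FSNamesystem code):

{code}
import java.util.List;

class AddBlockRetrySketch {
  static class LocatedBlock {
    final long id;
    final List<String> locations; // datanode targets
    LocatedBlock(long id, List<String> locations) {
      this.id = id;
      this.locations = locations;
    }
  }

  // On a retried addBlock after an NN failover/restart, the block may have
  // been allocated by the previous attempt. Returning the existing last
  // block *with* its target locations keeps the client's DataStreamer from
  // indexing into an empty locations array (the AIOBE above).
  static LocatedBlock addBlock(LocatedBlock lastBlock, boolean isRetry,
      long newId, List<String> newTargets) {
    if (isRetry && lastBlock != null && !lastBlock.locations.isEmpty()) {
      return lastBlock;
    }
    return new LocatedBlock(newId, newTargets);
  }
}
{code}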



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845446#comment-13845446
 ] 

Hudson commented on HDFS-5476:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot: clean the blocks/files/directories under a renamed file/directory 
 while deletion
 --

 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5476.001.patch


 Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
 under the DstReference node for file/directory/snapshot deletion.
 Use case 1:
 # rename under-construction file with 0-sized blocks after snapshot.
 # delete the renamed directory.
 We need to make sure we delete the 0-sized block.
 Use case 2:
 # create snapshot s0 for /
 # create a new file under /foo/bar/
 # rename foo --> foo2
 # create snapshot s1
 # delete bar and foo2
 # delete snapshot s1
 We need to make sure we delete the file under /foo/bar since it is not 
 included in snapshot s0.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845445#comment-13845445
 ] 

Hudson commented on HDFS-5474:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Deletesnapshot can make Namenode in safemode on NN restarts.
 

 Key: HDFS-5474
 URL: https://issues.apache.org/jira/browse/HDFS-5474
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Fix For: 2.3.0

 Attachments: HDFS-5474-001.patch, HDFS-5474-002.patch


 When we deletesnapshot, we delete the blocks associated with that snapshot 
 and only afterwards do a log sync of the deleteSnapshot op to the editlog.
 There is a window where the blocks have been removed from the blocks map but 
 the log sync has not happened yet: if a block report arrives, the NN finds 
 that a block no longer exists in the blocks map and may invalidate it, and 
 the invalidation info can go out with the heartbeat. If the Namenode then 
 shuts down before actually doing the log sync, on restart it will still have 
 the snapshot inodes and expect the DNs to report those blocks.
 The simple solution is to move the block removal down, after the log sync, 
 similar to the delete op.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845451#comment-13845451
 ] 

Hudson commented on HDFS-5074:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move entry for HDFS-5074 to correct section. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550027)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-5074. Allow starting up from an fsimage checkpoint in the middle of a 
segment. Contributed by Todd Lipcon. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550016)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/QJournalProtocol.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolServerSideTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LogsPurgeable.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/QJournalProtocol.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniQJMHACluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestNNWithQJM.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithQJM.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java


 Allow starting up from an fsimage checkpoint in the middle of a segment
 ---

 Key: HDFS-5074
 URL: 

[jira] [Commented] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845447#comment-13845447
 ] 

Hudson commented on HDFS-5425:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Renaming underconstruction file with snapshots can make NN failure on restart
 -

 Key: HDFS-5425
 URL: https://issues.apache.org/jira/browse/HDFS-5425
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Affects Versions: 2.2.0
Reporter: sathish
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, 
 HDFS-5425.patch


 I faced this while doing some snapshot operations like createSnapshot and 
 renameSnapshot: after I restarted my NN, it shut down with the following 
 exception:
 2013-10-24 21:07:03,040 FATAL 
 org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
 java.lang.IllegalStateException
   at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:133)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376)
   at 
 org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:670)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:655)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311)
 2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
 SHUTDOWN_MSG: 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845443#comment-13845443
 ] 

Hudson commented on HDFS-5428:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
 HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, 
 HDFS-5428.004.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 The NN enters safemode.
 Analysis:
 Snapshot nodes loaded from the fsimage are always complete and all of their 
 blocks are in COMPLETE state. 
 So when a Datanode reports RBW blocks, they are not updated in the 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845448#comment-13845448
 ] 

Hudson commented on HDFS-5443:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move 
HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 
into branch-2.3 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Delete 0-sized block when deleting an under-construction file that is 
 included in snapshot
 --

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: 5443-test.patch, HDFS-5443.000.patch


 The Namenode can get stuck in safemode on restart if it crashes just after 
 the addBlock log sync and after a snapshot was taken covering such a file. 
 This issue was reported by Prakash and Sathish.
 Looking into the issue, the following happens:
 1) The client added a block at the NN and the log sync completed, so the NN 
 has the block ID persisted.
 2) Before the addBlock response is returned to the client, a snapshot is 
 taken of the root or a parent directory of that file.
 3) The parent directory of that file is deleted.
 4) The NN crashes without responding success to the client for that addBlock 
 call.
 Now on restart, the Namenode gets stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845442#comment-13845442
 ] 

Hudson commented on HDFS-5580:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang 
via junping_du) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550074)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


 Infinite loop in Balancer.waitForMoveCompletion
 ---

 Key: HDFS-5580
 URL: https://issues.apache.org/jira/browse/HDFS-5580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.4.0

 Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, 
 HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log


 In recent 
 [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/]
  in HDFS-5574, TestBalancerWithNodeGroup timeout, this is also mentioned in 
 HDFS-4376 
 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402].
  
 Looks like the bug is introduced by HDFS-3495.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-4331) checkpoint between NN and SNN (secure cluster) does not happen once NN TGT expires

2013-12-11 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony resolved HDFS-4331.


  Resolution: Duplicate
Release Note: Code already in place

 checkpoint between NN and SNN (secure cluster) does not happen once NN TGT 
 expires 
 ---

 Key: HDFS-4331
 URL: https://issues.apache.org/jira/browse/HDFS-4331
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 1.1.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-4331.patch, nn-checkpoint-failed.log


 The NameNode fails to download the new FSImage from the SNN.
 The error indicates that the NameNode TGT has expired.
 It seems that the NN doesn't renew the ticket after it expires (10 hours 
 validity). 
 The NN checkpoint error log is attached.
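
For context, the mechanism that makes this a duplicate is roughly the 
following (a sketch around the real UserGroupInformation API):

{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

class TgtRefresh {
  // Called before each checkpoint transfer: a no-op while the TGT is still
  // valid, otherwise it re-acquires a ticket from the keytab.
  static void ensureFreshTgt() throws IOException {
    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
  }
}
{code}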



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845529#comment-13845529
 ] 

Kihwal Lee commented on HDFS-5496:
--

It would be nice if the web UI said something while the replication queues are 
being initialized. Showing the progress would be a plus.

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big name space, this can take hundreds of seconds 
 with the FSNamesystem write lock held.  During this time, important requests 
 (e.g. initial block reports, heartbeat, etc) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped. 
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed 
 (some may be done twice).  The replication monitor should run even before all 
 blocks are processed.
 This will allow namenode to exit safe mode and start serving immediately even 
 with a big name space. It will also reduce the HA failover latency.
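
A minimal sketch of the proposal, assuming a coarse namesystem lock (all names 
are illustrative, not the real FSNamesystem members):

{code}
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AsyncReplQueueInit implements Runnable {
  // Caps how long the write lock is held per iteration.
  private static final int BLOCKS_PER_LOCK_HOLD = 10000;

  private final ReentrantReadWriteLock nsLock;
  private final Iterator<Long> allBlocks;

  AsyncReplQueueInit(ReentrantReadWriteLock nsLock, Iterator<Long> allBlocks) {
    this.nsLock = nsLock;
    this.allBlocks = allBlocks;
  }

  @Override
  public void run() {
    while (true) {
      nsLock.writeLock().lock();
      try {
        int done = 0;
        while (allBlocks.hasNext() && done++ < BLOCKS_PER_LOCK_HOLD) {
          checkReplication(allBlocks.next());
        }
        if (!allBlocks.hasNext()) {
          return; // initialization complete
        }
      } finally {
        // Release between batches so block reports and heartbeats get in.
        nsLock.writeLock().unlock();
      }
    }
  }

  private void checkReplication(long blockId) {
    // enqueue if under- or over-replicated (omitted)
  }
}
{code}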



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-11 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5496:
-

Assignee: Vinay

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big name space, this can take hundreds of seconds 
 with the FSNamesystem write lock held.  During this time, important requests 
 (e.g. initial block reports, heartbeat, etc) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped. 
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed 
 (some may be done twice).  The replication monitor should run even before all 
 blocks are processed.
 This will allow namenode to exit safe mode and start serving immediately even 
 with a big name space. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5654) Add lock context support to FSNamesystemLock

2013-12-11 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-5654:
-

 Summary: Add lock context support to FSNamesystemLock
 Key: HDFS-5654
 URL: https://issues.apache.org/jira/browse/HDFS-5654
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Supporting new methods of locking the namesystem, i.e. coarse or fine-grained, 
needs an API to manage the locks (or any object conforming to the Lock 
interface) held during access to the namespace.
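
Shape-wise, something like the following (illustrative only; the actual patch 
may look different):

{code}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

interface LockContext {
  void acquireRead();
  void acquireWrite();
  void release();
}

// Trivial coarse implementation: delegate everything to the one global
// namesystem lock. A fine-grained implementation could hand out per-subtree
// locks behind the same interface. (Illustration only; not reentrancy-safe.)
class CoarseLockContext implements LockContext {
  private final ReentrantReadWriteLock global = new ReentrantReadWriteLock(true);
  private Lock held;

  public void acquireRead() {
    held = global.readLock();
    held.lock();
  }

  public void acquireWrite() {
    held = global.writeLock();
    held.lock();
  }

  public void release() {
    held.unlock();
  }
}
{code}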



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5655) Update FSNamesystem path operations to use a lock context

2013-12-11 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-5655:
-

 Summary: Update FSNamesystem path operations to use a lock context
 Key: HDFS-5655
 URL: https://issues.apache.org/jira/browse/HDFS-5655
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Most path-based methods should use the {{FSNamesystem.LockContext}} introduced 
by HDFS-5654.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5654) Add lock context support to FSNamesystemLock

2013-12-11 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-5654:
--

Attachment: HDFS-5654.patch

Adds an interface for a lock context to {{FSNamesystemLock}}, and provides a 
trivial implementation of a coarse locking context which just uses the 
{{FSNamesystemLock}} itself.  I'll update some path-based {{FSNamesystem}} 
methods in a follow-up jira.

 Add lock context support to FSNamesystemLock
 

 Key: HDFS-5654
 URL: https://issues.apache.org/jira/browse/HDFS-5654
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-5654.patch


 Supporting new methods of locking the namesystem, i.e. coarse or fine-grained, 
 needs an API to manage the locks (or any object conforming to the Lock 
 interface) held during access to the namespace.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5654) Add lock context support to FSNamesystemLock

2013-12-11 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-5654:
--

Status: Patch Available  (was: Open)

 Add lock context support to FSNamesystemLock
 

 Key: HDFS-5654
 URL: https://issues.apache.org/jira/browse/HDFS-5654
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-5654.patch


 Supporting new methods of locking the namesystem, i.e. coarse or fine-grained, 
 needs an API to manage the locks (or any object conforming to the Lock 
 interface) held during access to the namespace.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2

2013-12-11 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845567#comment-13845567
 ] 

Jonathan Eagles commented on HDFS-5023:
---

+1, Jing. If I don't hear anything on this issue today, I'll check this in 
tomorrow.

 TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2
 ---

 Key: HDFS-5023
 URL: https://issues.apache.org/jira/browse/HDFS-5023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots, test
Affects Versions: 2.4.0
Reporter: Ravi Prakash
Assignee: Mit Desai
  Labels: test
 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, 
 TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, 
 org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt


 The assertion on line 91 is failing. I am using Fedora 19 + JDK7. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845580#comment-13845580
 ] 

Kihwal Lee commented on HDFS-5496:
--

The following change would have been fine if leaving safe mode and initializing 
replication queues were synchronized.  It appears {{checkMode()}} can start a 
background initialization before leaving the safe mode. Since the queues are 
unconditionally cleared right before the following, an on-going initialization 
should be stopped and redone.

{code}
-if (!isInSafeMode() ||
-(isInSafeMode() && safeMode.isPopulatingReplQueues())) {
+// We only need to reprocess the queue in HA mode and not in safemode
+if (!isInSafeMode() && haEnabled) {
{code}

There have been discussions regarding removing safe mode extension and perhaps 
safe mode monitor. That will make the check/logic simpler.

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big name space, this can take hundreds of seconds 
 with the FSNamesystem write lock held.  During this time, important requests 
 (e.g. initial block reports, heartbeat, etc) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped. 
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed 
 (some may be done twice).  The replication monitor should run even before all 
 blocks are processed.
 This will allow namenode to exit safe mode and start serving immediately even 
 with a big name space. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2

2013-12-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845591#comment-13845591
 ] 

Jing Zhao commented on HDFS-5023:
-

+1. Thanks Mit!

 TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2
 ---

 Key: HDFS-5023
 URL: https://issues.apache.org/jira/browse/HDFS-5023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots, test
Affects Versions: 2.4.0
Reporter: Ravi Prakash
Assignee: Mit Desai
  Labels: test
 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, 
 TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, 
 org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt


 The assertion on line 91 is failing. I am using Fedora 19 + JDK7. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5350) Name Node should report fsimage transfer time as a metric

2013-12-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-5350:
--

Priority: Minor  (was: Major)

 Name Node should report fsimage transfer time as a metric
 -

 Key: HDFS-5350
 URL: https://issues.apache.org/jira/browse/HDFS-5350
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Rob Weltman
Assignee: Jimmy Xiang
Priority: Minor

 If the (Secondary) Name Node reported fsimage transfer times (perhaps the 
 last ten of them), monitoring tools could detect slowdowns that might 
 jeopardize cluster stability.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5647) Merge INodeDirectory.Feature and INodeFile.Feature

2013-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5647:
-

Attachment: HDFS-5647.003.patch

 Merge INodeDirectory.Feature and INodeFile.Feature
 --

 Key: HDFS-5647
 URL: https://issues.apache.org/jira/browse/HDFS-5647
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5647.000.patch, HDFS-5647.001.patch, 
 HDFS-5647.002.patch, HDFS-5647.003.patch


 HDFS-4685 implements ACLs for HDFS, which can benefit from the INode features 
 introduced in HDFS-5284. The current code separates the INode feature of 
 INodeFile and INodeDirectory into two different class hierarchies. This 
 hinders the implementation of ACL since ACL is a concept that applies to both 
 INodeFile and INodeDirectory.
 This jira proposes to merge the two class hierarchies (i.e., 
 INodeDirectory.Feature and INodeFile.Feature) to simplify the implementation 
 of ACLs.
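
In sketch form, the merge means something like this (names illustrative; the 
real classes may differ):

{code}
// One feature hierarchy shared by all inode types.
interface INodeFeature {
}

// ACLs apply to both files and directories, so the feature can live in the
// shared hierarchy...
class AclFeature implements INodeFeature {
}

// ...while type-specific features simply implement the same interface.
class DirectoryWithQuotaFeature implements INodeFeature {
}

class FileUnderConstructionFeature implements INodeFeature {
}
{code}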



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API

2013-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5650:
-

Attachment: HDFS-5650.005.patch

Thanks for the comments! Uploading the v5 patch to address the comments from 
Vinay and Chris.

 Remove AclReadFlag and AclWriteFlag in FileSystem API
 -

 Key: HDFS-5650
 URL: https://issues.apache.org/jira/browse/HDFS-5650
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, 
 HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, 
 HDFS-5650.005.patch


 AclReadFlag and AclWriteFlag were intended to capture various options used in 
 getfacl and setfacl. These options determine whether the tool should traverse 
 the filesystem recursively, follow the symlink, etc., but they are not part 
 of the core ACLs abstractions.
 The client program has more information and more flexibility to implement 
 these options. This jira proposes to remove these flags to simplify the APIs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5654) Add lock context support to FSNamesystemLock

2013-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845671#comment-13845671
 ] 

Hadoop QA commented on HDFS-5654:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618264/HDFS-5654.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5694//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5694//console

This message is automatically generated.

 Add lock context support to FSNamesystemLock
 

 Key: HDFS-5654
 URL: https://issues.apache.org/jira/browse/HDFS-5654
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-5654.patch


 Supporting new methods of locking the namesystem, i.e. coarse or fine-grained, 
 needs an API to manage the locks (or any object conforming to the Lock 
 interface) held during access to the namespace.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5242) Reduce contention on DatanodeInfo instances

2013-12-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845675#comment-13845675
 ] 

Kihwal Lee commented on HDFS-5242:
--

+1 for the patch.  However, the contention might have been unusually high if 
only a small number of data nodes were involved.

 Reduce contention on DatanodeInfo instances
 ---

 Key: HDFS-5242
 URL: https://issues.apache.org/jira/browse/HDFS-5242
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-5242.patch


 Synchronization in {{DatanodeInfo}} instances causes unnecessary contention 
 between call handlers.
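
One common shape for this kind of fix (illustrative only; the actual patch may 
do something different) is to replace synchronized getters for 
independently-updated fields with volatile reads, so RPC handlers reading node 
stats never queue behind each other:

{code}
class NodeStats {
  private volatile long capacity;
  private volatile long remaining;
  private volatile long lastUpdate;

  // Plain volatile reads: no monitor, no handler-vs-handler contention.
  long getCapacity() { return capacity; }
  long getRemaining() { return remaining; }
  long getLastUpdate() { return lastUpdate; }

  // Heartbeats update each field independently; if a consistent snapshot
  // across fields were required, a copy-on-write holder object would be the
  // next step up.
  void update(long cap, long rem, long now) {
    capacity = cap;
    remaining = rem;
    lastUpdate = now;
  }
}
{code}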



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845687#comment-13845687
 ] 

Colin Patrick McCabe commented on HDFS-5634:


bq. Do we mean to setCachingStrategy in DFSInputStream#getBlockReader? Also, I 
get that there are a zillion parameters for the BRL constructor, but builders 
are for when there are optional arguments. Here, it looks like we want to set 
all of them.

Actually, in the tests, we often don't set a lot of the arguments.  For 
example, the unit tests don't use the FISCache, may not set readahead, etc.  
Also, I think there's value in naming the arguments, since otherwise updating 
the callsites gets very, very difficult.

bq. We have both verifyChecksum and skipChecksum right now. Let's get rid of 
one, seems error-prone to be flipping booleans.

OK.  I updated {{BlockReaderFactory#newShortCircuitBlockReader}} to use 
{{skipChecksums}} as well.

A little note on the history here: prior to the introduction of mlock, it was 
more straightforward to have a simple positive boolean verifyChecksum than to 
have the skip boolean.  But now that we have mlock, verifyChecksum = true might 
be a lie, since mlock might mean we don't verify.

bq. The skipChecksum || mlocked.get() idiom is used in a few places, maybe 
should be a shouldSkipChecksum() method?

OK.

bq. IIUC, fillDataBuf fills the bounce buffer, and drainBounceBuffer empties 
it. Rename fillDataBuf to fillBounceBuffer for parity?

I renamed {{drainBounceBuffer}} to {{drainDataBuf}} for symmetry.

bq. I'm wondering what happens in the bounce buffer read paths when readahead 
is turned off. It looks like they use readahead to determine how much to read, 
regardless of the bytes needed, so what if it's zero?

We always buffer at least a single chunk, even if readahead is turned off.  The 
mechanics of checksumming require this.
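
Back-of-envelope view of that constraint (the chunk size is just the common 
default, not pulled from this patch): checksums are computed per fixed-size 
chunk, so even with readahead = 0 the reader must fetch at least one whole 
chunk to be able to verify anything.

{code}
class ChunkMath {
  static final int BYTES_PER_CHECKSUM = 512; // typical dfs.bytes-per-checksum

  // Round a requested read up to a whole number of chunks, minimum one.
  static int alignedReadLength(int requested) {
    int chunks = Math.max(1,
        (requested + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM);
    return chunks * BYTES_PER_CHECKSUM;
  }
}
{code}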

bq. For the slow lane, fillDataBuf doesn't actually fill the returned buf, so 
when we hit the EOF and break, it looks like we make the user read again to 
flush out the bounce buffer. Can we save this?

Yeah, the current code could result in us doing an extra {{pread}} even after 
we know we're at EOF.  Let me see if I can avoid that.

bq. fillDataBuf doesn't fill just the data buf, it also fills the checksum buf 
and verifies checksums via fillBuffer. Would be nice to javadoc this.

OK

bq. I noticed there are two readahead config options too, 
dfs.client.cache.readahead and dfs.datanode.readahead.bytes. Seems like we 
should try to emulate the same behavior as remote reads which (according to 
hdfs-default.xml) use the DN setting, and override with the client setting. 
Right now it's just using the DN readahead in BRL, so the test that sets client 
readahead to 0 isn't doing much.

Right now, the readahead is coming out of {{DFSClient#cachingStrategy}}, so it 
will be coming from {{dfs.client.cache.readahead}}, unless someone has 
overridden it for that {{DFSInputStream}} object.  The problem with defaulting 
to the DN setting, is that we don't know what that is (we're on the client, not 
the DN).

bq. I don't quite understand why we check needed  maxReadahead... for the fast 
lane. Once we're checksum aligned via draining the bounce buffer, can't we just 
stay in the fast lane? Seems like the slow vs. fast lane determination should 
be based on read alignment, not bytes left.

The issue is that we want to honor the readahead setting.  We would not be 
doing this if we did a shorter read directly into the provided buffer.

bq. It's a little weird to me that the readahead chunks is min'd with the 
buffer size (default 1MB). I get why (memory consumption), but this linkage 
should be documented somewhere.

I added a comment.

bq. DirectBufferPool, would it be better to use a Deque's stack operations 
rather than a Queue? Might give better cache locality to do LIFO rather than 
FIFO.

Interesting point.  I will try that and see what numbers I get.
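
A sketch of the LIFO idea (hypothetical, not the actual DirectBufferPool): a 
deque used stack-style returns the most recently freed buffer first, which is 
the one most likely to still be cache-warm.

{code}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedDeque;

class BufferStack {
  private final ConcurrentLinkedDeque<ByteBuffer> free =
      new ConcurrentLinkedDeque<ByteBuffer>();
  private final int size;

  BufferStack(int size) {
    this.size = size;
  }

  ByteBuffer take() {
    ByteBuffer b = free.pollFirst(); // LIFO: hottest buffer first
    return b != null ? b : ByteBuffer.allocateDirect(size);
  }

  void give(ByteBuffer b) {
    b.clear();
    free.addFirst(b); // push back on top of the stack
  }
}
{code}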

bq. TestEnhancedByteBufferAccess has an import-only change

OK.  I will avoid doing that to make merging easier.

bq. Kinda unrelated, but should the dfs.client.read.shortcircuit.* keys be in 
hdfs-default.xml? I also noticed that dfs.client.cache.readahead says this 
setting causes the datanode to... so the readahead documentation might need to 
be updated too.

I'll update it with the information about short-circuit

 allow BlockReaderLocal to switch between checksumming and not
 -

 Key: HDFS-5634
 URL: https://issues.apache.org/jira/browse/HDFS-5634
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch


 

[jira] [Created] (HDFS-5656) add some configuration keys to hdfs-default.xml

2013-12-11 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-5656:
--

 Summary: add some configuration keys to hdfs-default.xml
 Key: HDFS-5656
 URL: https://issues.apache.org/jira/browse/HDFS-5656
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Colin Patrick McCabe
Priority: Minor


Some configuration keys like {{dfs.client.read.shortcircuit}} are not present 
in {{hdfs-default.xml}} as they should be.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845689#comment-13845689
 ] 

Colin Patrick McCabe commented on HDFS-5634:


Update: there are a bunch of things in DFSConfigKeys that are not in 
hdfs-default.xml.  I created HDFS-5656 for this, since it's a change we'd want 
to make and quickly merge to branch-2.3, etc., and also because it can be 
decoupled from this JIRA.

 allow BlockReaderLocal to switch between checksumming and not
 -

 Key: HDFS-5634
 URL: https://issues.apache.org/jira/browse/HDFS-5634
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch


 BlockReaderLocal should be able to switch between checksumming and 
 non-checksumming, so that when we get notifications that something is mlocked 
 (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API

2013-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5650.
-

   Resolution: Fixed
Fix Version/s: HDFS ACLs (HDFS-4685)
 Hadoop Flags: Reviewed

+1 for the patch.  I committed this to the HDFS-4685 branch.  Thank you to 
Haohui for incorporating this valuable feedback on the API.  Thank you to Vinay 
for code reviews.

 Remove AclReadFlag and AclWriteFlag in FileSystem API
 -

 Key: HDFS-5650
 URL: https://issues.apache.org/jira/browse/HDFS-5650
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS ACLs (HDFS-4685)

 Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, 
 HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, 
 HDFS-5650.005.patch


 AclReadFlag and AclWriteFlag were intended to capture various options used in 
 getfacl and setfacl. These options determine whether the tool should traverse 
 the filesystem recursively, follow the symlink, etc., but they are not part 
 of the core ACLs abstractions.
 The client program has more information and more flexibility to implement 
 these options. This jira proposes to remove these flags to simplify the APIs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API

2013-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5650:


 Target Version/s: HDFS ACLs (HDFS-4685)
Affects Version/s: HDFS ACLs (HDFS-4685)

 Remove AclReadFlag and AclWriteFlag in FileSystem API
 -

 Key: HDFS-5650
 URL: https://issues.apache.org/jira/browse/HDFS-5650
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS ACLs (HDFS-4685)

 Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, 
 HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, 
 HDFS-5650.005.patch


 AclReadFlag and AclWriteFlag were intended to capture various options used in 
 getfacl and setfacl. These options determine whether the tool should traverse 
 the filesystem recursively, follow symlinks, etc., but they are not part 
 of the core ACL abstractions.
 The client program has more information and more flexibility to implement 
 these options. This jira proposes to remove these flags to simplify the APIs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API

2013-12-11 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845718#comment-13845718
 ] 

Chris Nauroth commented on HDFS-5650:
-

I have one more note on part of the change here.  We remove the path member 
from {{AclStatus}}.  The only reason for the path member was to support 
recursive getfacl.  If the recursion had been done server-side, then the result 
set would have needed to specify the file for each returned ACL.  Now that 
recursion will be driven from the client side, we don't need this member 
anymore.  FsShell will always know which path it was working on, so it can 
still print each file during a recursive getfacl.
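
To make the client-driven recursion concrete, here is a minimal sketch of a 
recursive getfacl walk, assuming the branch's {{FileSystem#getAclStatus}}.  It 
is an illustration, not FsShell's actual implementation:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclStatus;

public class GetfaclSketch {
  /** Print the ACL of path and, if it is a directory, of everything below. */
  static void printAclsRecursively(FileSystem fs, Path path) throws IOException {
    // The caller always knows the path, so AclStatus need not carry it.
    AclStatus acl = fs.getAclStatus(path);
    System.out.println("# file: " + path);
    for (AclEntry entry : acl.getEntries()) {
      System.out.println(entry);
    }
    FileStatus stat = fs.getFileStatus(path);
    if (stat.isDirectory()) {
      for (FileStatus child : fs.listStatus(path)) {
        printAclsRecursively(fs, child.getPath());
      }
    }
  }
}
{code}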

 Remove AclReadFlag and AclWriteFlag in FileSystem API
 -

 Key: HDFS-5650
 URL: https://issues.apache.org/jira/browse/HDFS-5650
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS ACLs (HDFS-4685)

 Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, 
 HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, 
 HDFS-5650.005.patch


 AclReadFlag and AclWriteFlag were intended to capture various options used in 
 getfacl and setfacl. These options determine whether the tool should traverse 
 the filesystem recursively, follow symlinks, etc., but they are not part 
 of the core ACL abstractions.
 The client program has more information and more flexibility to implement 
 these options. This jira proposes to remove these flags to simplify the APIs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5477) Block manager as a service

2013-12-11 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated HDFS-5477:
-

Attachment: Proposal.pdf

Fix formatting problems in PDF.

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf


 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is to reduce the memory footprint of the NN proper to support larger 
 namespaces, and to improve overall performance by decoupling the block 
 manager from the namespace and its lock.  Ideally, a distinct BM will be 
 transparent to clients and DNs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2

2013-12-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-5023:
--

Affects Version/s: 3.0.0

 TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2
 ---

 Key: HDFS-5023
 URL: https://issues.apache.org/jira/browse/HDFS-5023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Ravi Prakash
Assignee: Mit Desai
  Labels: java7, test
 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, 
 TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, 
 org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt


 The assertion on line 91 is failing. I am using Fedora 19 + JDK7. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7

2013-12-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-5023:
--

Summary: TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7  
(was: TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2)

 TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
 -

 Key: HDFS-5023
 URL: https://issues.apache.org/jira/browse/HDFS-5023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Ravi Prakash
Assignee: Mit Desai
  Labels: java7, test
 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, 
 TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, 
 org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt


 The assertion on line 91 is failing. I am using Fedora 19 + JDK7. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7

2013-12-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-5023:
--

Labels: java7 test  (was: test)

 TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
 -

 Key: HDFS-5023
 URL: https://issues.apache.org/jira/browse/HDFS-5023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Ravi Prakash
Assignee: Mit Desai
  Labels: java7, test
 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, 
 TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, 
 org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt


 The assertion on line 91 is failing. I am using Fedora 19 + JDK7. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7

2013-12-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated HDFS-5023:
--

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

 TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
 -

 Key: HDFS-5023
 URL: https://issues.apache.org/jira/browse/HDFS-5023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Ravi Prakash
Assignee: Mit Desai
  Labels: java7, test
 Fix For: 3.0.0, 2.4.0

 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, 
 TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, 
 org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt


 The assertion on line 91 is failing. I am using Fedora 19 + JDK7. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845738#comment-13845738
 ] 

Hudson commented on HDFS-5023:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4868 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4868/])
HDFS-5023. TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7 (Mit 
Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550261)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSnapshotPathINodes.java


 TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
 -

 Key: HDFS-5023
 URL: https://issues.apache.org/jira/browse/HDFS-5023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Ravi Prakash
Assignee: Mit Desai
  Labels: java7, test
 Fix For: 3.0.0, 2.4.0

 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, 
 TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, 
 org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt


 The assertion on line 91 is failing. I am using Fedora 19 + JDK7. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5607) libHDFS: add support for recursive flag in ACL functions.

2013-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5607.
-

Resolution: Fixed

I'm resolving this as won't fix.  This is no longer relevant after the API 
design changes in HDFS-5650.

 libHDFS: add support for recursive flag in ACL functions.
 -

 Key: HDFS-5607
 URL: https://issues.apache.org/jira/browse/HDFS-5607
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: libhdfs
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth

 Implement and test handling of recursive flag for all ACL functions in 
 libHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5599) DistributedFileSystem: add support for recursive flag in ACL methods.

2013-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5599.
-

Resolution: Won't Fix

I'm resolving this as won't fix.  This is no longer relevant after the API 
design changes in HDFS-5650.

 DistributedFileSystem: add support for recursive flag in ACL methods.
 -

 Key: HDFS-5599
 URL: https://issues.apache.org/jira/browse/HDFS-5599
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth

 Implement and test handling of recursive flag for all ACL methods in 
 {{DistributedFileSystem}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Reopened] (HDFS-5607) libHDFS: add support for recursive flag in ACL functions.

2013-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-5607:
-


 libHDFS: add support for recursive flag in ACL functions.
 -

 Key: HDFS-5607
 URL: https://issues.apache.org/jira/browse/HDFS-5607
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: libhdfs
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth

 Implement and test handling of recursive flag for all ACL functions in 
 libHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5611) WebHDFS: add support for recursive flag in ACL operations.

2013-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5611.
-

Resolution: Won't Fix

I'm resolving this as won't fix.  This is no longer relevant after the API 
design changes in HDFS-5650.

 WebHDFS: add support for recursive flag in ACL operations.
 --

 Key: HDFS-5611
 URL: https://issues.apache.org/jira/browse/HDFS-5611
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: webhdfs
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Renil J

 Implement and test handling of recursive flag for all ACL operations in 
 WebHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5607) libHDFS: add support for recursive flag in ACL functions.

2013-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5607.
-

Resolution: Won't Fix

 libHDFS: add support for recursive flag in ACL functions.
 -

 Key: HDFS-5607
 URL: https://issues.apache.org/jira/browse/HDFS-5607
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: libhdfs
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth

 Implement and test handling of recursive flag for all ACL functions in 
 libHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-4201) NPE in BPServiceActor#sendHeartBeat

2013-12-11 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4201:
---

  Resolution: Fixed
   Fix Version/s: (was: 3.0.0)
  2.3.0
Target Version/s: 2.3.0
  Status: Resolved  (was: Patch Available)

 NPE in BPServiceActor#sendHeartBeat
 ---

 Key: HDFS-4201
 URL: https://issues.apache.org/jira/browse/HDFS-4201
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Eli Collins
Assignee: Jimmy Xiang
Priority: Critical
 Fix For: 2.3.0

 Attachments: trunk-4201.patch, trunk-4201_v2.patch, 
 trunk-4201_v3.patch


 Saw the following NPE in a log.
 Think this is likely due to {{dn}} or {{dn.getFSDataset()}} being null (not 
 {{bpRegistration}}), due to a configuration or local directory failure.
 {code}
 2012-09-25 04:33:20,782 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 For namenode svsrs00127/11.164.162.226:8020 using DELETEREPORT_INTERVAL of 
 30 msec  BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; 
 heartBeatInterval=3000
 2012-09-25 04:33:20,782 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
 for Block pool BP-1678908700-11.164.162.226-1342785481826 (storage id 
 DS-1031100678-11.164.162.251-5010-1341933415989) service to 
 svsrs00127/11.164.162.226:8020
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:434)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:520)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
 at java.lang.Thread.run(Thread.java:722)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5647) Merge INodeDirectory.Feature and INodeFile.Feature

2013-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845746#comment-13845746
 ] 

Hadoop QA commented on HDFS-5647:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618277/HDFS-5647.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5695//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5695//console

This message is automatically generated.

 Merge INodeDirectory.Feature and INodeFile.Feature
 --

 Key: HDFS-5647
 URL: https://issues.apache.org/jira/browse/HDFS-5647
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5647.000.patch, HDFS-5647.001.patch, 
 HDFS-5647.002.patch, HDFS-5647.003.patch


 HDFS-4685 implements ACLs for HDFS, which can benefit from the INode features 
 introduced in HDFS-5284. The current code separates the INode feature of 
 INodeFile and INodeDirectory into two different class hierarchies. This 
 hinders the implementation of ACL since ACL is a concept that applies to both 
 INodeFile and INodeDirectory.
 This jira proposes to merge the two class hierarchies (i.e., 
 INodeDirectory.Feature and INodeFile.Feature) to simplify the implementation 
 of ACLs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5596) Implement RPC stubs

2013-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5596:
-

Summary: Implement RPC stubs  (was: DistributedFileSystem: implement 
getAcls and setAcl.)

 Implement RPC stubs
 ---

 Key: HDFS-5596
 URL: https://issues.apache.org/jira/browse/HDFS-5596
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Haohui Mai

 Implement and test {{getAcls}} and {{setAcl}} in {{DistributedFileSystem}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-4201) NPE in BPServiceActor#sendHeartBeat

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845747#comment-13845747
 ] 

Hudson commented on HDFS-4201:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4869 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4869/])
HDFS-4201. NPE in BPServiceActor#sendHeartBeat (jxiang via cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550269)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java


 NPE in BPServiceActor#sendHeartBeat
 ---

 Key: HDFS-4201
 URL: https://issues.apache.org/jira/browse/HDFS-4201
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Eli Collins
Assignee: Jimmy Xiang
Priority: Critical
 Fix For: 2.3.0

 Attachments: trunk-4201.patch, trunk-4201_v2.patch, 
 trunk-4201_v3.patch


 Saw the following NPE in a log.
 Think this is likely due to {{dn}} or {{dn.getFSDataset()}} being null (not 
 {{bpRegistration}}), due to a configuration or local directory failure.
 {code}
 2012-09-25 04:33:20,782 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 For namenode svsrs00127/11.164.162.226:8020 using DELETEREPORT_INTERVAL of 
 30 msec  BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; 
 heartBeatInterval=3000
 2012-09-25 04:33:20,782 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
 for Block pool BP-1678908700-11.164.162.226-1342785481826 (storage id 
 DS-1031100678-11.164.162.251-5010-1341933415989) service to 
 svsrs00127/11.164.162.226:8020
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:434)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:520)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
 at java.lang.Thread.run(Thread.java:722)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5596) Implement RPC stubs

2013-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5596:
-

Description: Implement RPC stubs for both {{DistributedFileSystem}} and 
{{NameNodeRpcServer}}.  (was: Implement and test {{getAcls}} and {{setAcl}} in 
{{DistributedFileSystem}}.)

 Implement RPC stubs
 ---

 Key: HDFS-5596
 URL: https://issues.apache.org/jira/browse/HDFS-5596
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Haohui Mai

 Implement RPC stubs for both {{DistributedFileSystem}} and 
 {{NameNodeRpcServer}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845751#comment-13845751
 ] 

Colin Patrick McCabe commented on HDFS-5634:


bq. DirectBufferPool, would it be better to use a Deque's stack operations 
rather than a Queue?  Might give better cache locality to do LIFO rather than 
FIFO.

I examined this code more carefully, and I found that it is actually FIFO at 
the moment.  The reason is that it uses 
{{ConcurrentLinkedQueue#add}} to add the elements, which adds them to the end.  
It then uses {{ConcurrentLinkedQueue#poll}} to get the elements, which takes 
them from the beginning.
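
For reference, the ordering difference is easy to demonstrate in isolation.  
This standalone sketch (not DirectBufferPool itself) shows that add/poll 
drains in insertion order, while a deque's stack operations give LIFO:

{code}
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueOrderDemo {
  public static void main(String[] args) {
    ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<Integer>();
    queue.add(1); queue.add(2); queue.add(3);
    // add() appends to the tail and poll() takes from the head: FIFO.
    System.out.println(queue.poll());  // prints 1

    ConcurrentLinkedDeque<Integer> deque = new ConcurrentLinkedDeque<Integer>();
    deque.push(1); deque.push(2); deque.push(3);
    // push() and pop() work on the same end: LIFO.
    System.out.println(deque.pop());   // prints 3
  }
}
{code}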

 allow BlockReaderLocal to switch between checksumming and not
 -

 Key: HDFS-5634
 URL: https://issues.apache.org/jira/browse/HDFS-5634
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch


 BlockReaderLocal should be able to switch between checksumming and 
 non-checksumming, so that when we get notifications that something is mlocked 
 (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-11 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5634:
---

Attachment: HDFS-5634.003.patch

 allow BlockReaderLocal to switch between checksumming and not
 -

 Key: HDFS-5634
 URL: https://issues.apache.org/jira/browse/HDFS-5634
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
 HDFS-5634.003.patch


 BlockReaderLocal should be able to switch between checksumming and 
 non-checksumming, so that when we get notifications that something is mlocked 
 (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5596) Implement RPC stubs

2013-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5596:
-

Attachment: HDFS-5596.000.patch

 Implement RPC stubs
 ---

 Key: HDFS-5596
 URL: https://issues.apache.org/jira/browse/HDFS-5596
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Haohui Mai
 Attachments: HDFS-5596.000.patch


 Implement RPC stubs for both {{DistributedFileSystem}} and 
 {{NameNodeRpcServer}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5597) DistributedFileSystem: implement modifyAclEntries, removeAclEntries and removeAcl.

2013-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-5597.
--

Resolution: Duplicate
  Assignee: Haohui Mai

This jira is implemented within the scope of HDFS-5596. Marking it as a 
duplicate.

 DistributedFileSystem: implement modifyAclEntries, removeAclEntries and 
 removeAcl.
 --

 Key: HDFS-5597
 URL: https://issues.apache.org/jira/browse/HDFS-5597
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Haohui Mai

 Implement and test {{modifyAclEntries}}, {{removeAclEntries}} and 
 {{removeAcl}} in {{DistributedFileSystem}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5431) support cachepool-based limit management in path-based caching

2013-12-11 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845756#comment-13845756
 ] 

Andrew Wang commented on HDFS-5431:
---

The checkLimit flag makes sense to me, except I'd prefer "force", or if you'd 
like it flipped, "enforce" or "strict".  This is pretty easy.

I agree on synchronously waiting on the CRM in the listed scenarios, and a CV 
would be a good way of doing this.  It's a bit complicated though, since I 
don't think we can get a FSN CV, especially with the new lock context in 
HDFS-5453 coming down the pipe.  I think kicking the CRM, releasing the FSN 
lock, waiting on the CRM CV, then re-getting the FSN lock should be okay, but 
it might be simpler to just call into CRM directly to do the rescan.

I'll try the CV version, but if it looks too messy, we can go with a direct 
call.
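
For illustration, the dance being described could be shaped roughly like the 
sketch below.  Every name in it ({{fsnLock}}, {{kickRescan}}, {{scanCount}}) 
is hypothetical; the real code would live behind the FSN and CRM classes:

{code}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RescanWaiter {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private final ReentrantLock crmLock = new ReentrantLock();
  private final Condition rescanFinished = crmLock.newCondition();
  private long scanCount; // bumped by the CRM thread under crmLock

  /** Caller is assumed to hold the FSN write lock on entry. */
  void waitForRescan() throws InterruptedException {
    fsnLock.writeLock().unlock(); // drop the FSN lock before blocking
    try {
      crmLock.lock();
      try {
        long target = scanCount + 1;
        kickRescan(); // wake the monitor thread (hypothetical)
        while (scanCount < target) {
          rescanFinished.await(); // CRM signals after a complete pass
        }
      } finally {
        crmLock.unlock();
      }
    } finally {
      fsnLock.writeLock().lock(); // re-take the FSN lock before returning
    }
  }

  private void kickRescan() { /* notify the CRM thread; elided */ }
}
{code}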

 support cachepool-based limit management in path-based caching
 --

 Key: HDFS-5431
 URL: https://issues.apache.org/jira/browse/HDFS-5431
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
 Attachments: hdfs-5431-1.patch, hdfs-5431-2.patch


 We should support cachepool-based quota management in path-based caching.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5598) DistributedFileSystem: implement removeDefaultAcl.

2013-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-5598.
--

Resolution: Duplicate
  Assignee: Haohui Mai

This jira is implemented within the scope of HDFS-5596. Marking it as a 
duplicate.

 DistributedFileSystem: implement removeDefaultAcl.
 --

 Key: HDFS-5598
 URL: https://issues.apache.org/jira/browse/HDFS-5598
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Haohui Mai

 Implement and test {{removeDefaultAcl}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-11 Thread Brandon Li (JIRA)
Brandon Li created HDFS-5657:


 Summary: race condition causes writeback state error in NFS gateway
 Key: HDFS-5657
 URL: https://issues.apache.org/jira/browse/HDFS-5657
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li


A race condition between the NFS gateway writeback executor thread and a new 
write handler thread can cause a writeback state check failure, e.g.,
{noformat}
2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
(Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
(OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
writes, fileId: 30938
2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
(AsyncDataService.java:run(136)) - Asyn data service got 
error:java.lang.IllegalStateException: The openFileCtx has false async status
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at 
org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
at 
org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
(RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
filesize=917504
2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
(WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 917504 
length:65536 stableHow:0
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5658) Implement ACL as a INode feature

2013-12-11 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5658:


 Summary: Implement ACL as a INode feature
 Key: HDFS-5658
 URL: https://issues.apache.org/jira/browse/HDFS-5658
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


HDFS-5284 introduces features as generic abstractions to extend the 
functionality of the inodes. The implementation of ACL should leverage the new 
abstractions.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-11 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845769#comment-13845769
 ] 

Brandon Li commented on HDFS-5657:
--

Here is how the race happens:

{noformat}
  /** Invoked by AsynDataService to write back to HDFS */
  void executeWriteBack() {
    Preconditions.checkState(asyncStatus,
        "The openFileCtx has false async status");  <== check failed here
    try {
      while (activeState) {
        WriteCtx toWrite = offerNextToWrite();
        if (toWrite != null) {
          // Do the write
          doSingleWrite(toWrite);   <=== a synchronized method, which
                                         sets asyncStatus to false
          updateLastAccessTime();
        } else {
          break;
        }
      }

      if (!activeState && LOG.isDebugEnabled()) {
        LOG.debug("The openFileCtx is not active anymore, fileId: "
            + latestAttr.getFileId());
      }
    } finally {
      // make sure we reset asyncStatus to false
      asyncStatus = false;  <== before this line is executed,
                                OpenFileCtx.checkAndStartWrite sets
                                asyncStatus to true and invokes a task.
                                When that task calls
                                executeWriteBack() again, the condition
                                check fails.
    }
  }
{noformat}
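
One way to close that window, sketched below, is to have the finally-block 
reset take the same monitor that checkAndStartWrite uses for its test-and-set, 
so a new task can only be scheduled after the old one has fully cleared the 
flag.  This is an illustration of the idea, not necessarily the committed fix:

{code}
class OpenFileCtxSketch {
  private boolean asyncStatus;

  /** Called from the finally block instead of a bare unsynchronized write. */
  synchronized void finishWriteBack() {
    asyncStatus = false;
  }

  /** Schedules a new writeback task only if the old one has fully cleared. */
  synchronized boolean checkAndStartWrite() {
    if (asyncStatus) {
      return false;     // a task is already scheduled or still winding down
    }
    asyncStatus = true; // flag set and scheduling decision made atomically
    return true;
  }
}
{code}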

 race condition causes writeback state error in NFS gateway
 --

 Key: HDFS-5657
 URL: https://issues.apache.org/jira/browse/HDFS-5657
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li

 A race condition between the NFS gateway writeback executor thread and a new 
 write handler thread can cause a writeback state check failure, e.g.,
 {noformat}
 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
 (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
 writes, fileId: 30938
 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
 (AsyncDataService.java:run(136)) - Asyn data service got 
 error:java.lang.IllegalStateException: The openFileCtx has false async status
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
 filesize=917504
 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
 (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
 917504 length:65536 stableHow:0
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5350) Name Node should report fsimage transfer time as a metric

2013-12-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-5350:
--

Attachment: trunk-5350.patch

Attached a patch that adds metrics for fsimage download/upload.
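
For context, a metrics2 rate pair is the natural shape for this.  The sketch 
below is illustrative (class and method names are mine); a {{MutableRate}} 
field named {{getImage}} exports {{GetImageNumOps}} and {{GetImageAvgTime}} 
automatically:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Checkpoint image transfer", context = "dfs")
class ImageTransferMetrics {
  @Metric("Time to download the fsimage") MutableRate getImage;
  @Metric("Time to upload the fsimage")   MutableRate putImage;

  static ImageTransferMetrics create() {
    // Registration lets the metrics system instantiate the @Metric fields.
    return DefaultMetricsSystem.instance().register(
        "ImageTransferMetrics", "fsimage transfer metrics",
        new ImageTransferMetrics());
  }

  void addGetImageTime(long millis) { getImage.add(millis); }
  void addPutImageTime(long millis) { putImage.add(millis); }
}
{code}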

 Name Node should report fsimage transfer time as a metric
 -

 Key: HDFS-5350
 URL: https://issues.apache.org/jira/browse/HDFS-5350
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Rob Weltman
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 3.0.0

 Attachments: trunk-5350.patch


 If the (Secondary) Name Node reported fsimage transfer times (perhaps the 
 last ten of them), monitoring tools could detect slowdowns that might 
 jeopardize cluster stability.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5350) Name Node should report fsimage transfer time as a metric

2013-12-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-5350:
--

Fix Version/s: 3.0.0
   Status: Patch Available  (was: Open)

 Name Node should report fsimage transfer time as a metric
 -

 Key: HDFS-5350
 URL: https://issues.apache.org/jira/browse/HDFS-5350
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Rob Weltman
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 3.0.0

 Attachments: trunk-5350.patch


 If the (Secondary) Name Node reported fsimage transfer times (perhaps the 
 last ten of them), monitoring tools could detect slowdowns that might 
 jeopardize cluster stability.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5350) Name Node should report fsimage transfer time as a metric

2013-12-11 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845805#comment-13845805
 ] 

Jimmy Xiang commented on HDFS-5350:
---

I tested the patch on my cluster. Here is the new metrics from the jmx page:
{noformat}
"GetImageNumOps" : 56,
"GetImageAvgTime" : 3.75,
"PutImageNumOps" : 51,
"PutImageAvgTime" : 80.0
{noformat}

 Name Node should report fsimage transfer time as a metric
 -

 Key: HDFS-5350
 URL: https://issues.apache.org/jira/browse/HDFS-5350
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Rob Weltman
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 3.0.0

 Attachments: trunk-5350.patch


 If the (Secondary) Name Node reported fsimage transfer times (perhaps the 
 last ten of them), monitoring tools could detect slowdowns that might 
 jeopardize cluster stability.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-11 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5634:
---

Attachment: HDFS-5634.004.patch

I optimized the CPU consumption a bit by caching the checksum size and 
bytes-per-checksum in final ints, avoiding the need to redo some 
multiplications on every read.  perf stat now gives me 305,384,306,460 cycles 
for TestParallelShortCircuitRead, as opposed to 321,040,227,686 cycles before.
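
The hoisting idea itself is simple enough to show in isolation.  This sketch 
uses made-up names and is not the actual BlockReaderLocal code:

{code}
import org.apache.hadoop.util.DataChecksum;

class ChecksumGeometry {
  private final int bytesPerChecksum; // cached once in a final int
  private final int checksumSize;     // instead of re-derived on every read

  ChecksumGeometry(DataChecksum checksum) {
    this.bytesPerChecksum = checksum.getBytesPerChecksum();
    this.checksumSize = checksum.getChecksumSize();
  }

  /** Offset of the checksum word covering the given data offset. */
  long checksumOffset(long dataOffset) {
    // One divide and one multiply per call; no repeated getter lookups.
    return (dataOffset / bytesPerChecksum) * checksumSize;
  }
}
{code}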

 allow BlockReaderLocal to switch between checksumming and not
 -

 Key: HDFS-5634
 URL: https://issues.apache.org/jira/browse/HDFS-5634
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
 HDFS-5634.003.patch, HDFS-5634.004.patch


 BlockReaderLocal should be able to switch between checksumming and 
 non-checksumming, so that when we get notifications that something is mlocked 
 (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5648) Get rid of perVolumeReplicaMap

2013-12-11 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5648:


Attachment: h5648.08.patch

Updated patch fixes an unrelated bug exposed by the earlier patch. 
DatanodeStorage was not overriding {{Object.equals()}} and 
{{Object.hashCode()}}.
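
For reference, the missing overrides would take roughly this shape inside 
DatanodeStorage, keyed on the storage ID (a sketch; the committed patch may 
differ in detail):

{code}
@Override
public boolean equals(Object other) {
  if (other == this) {
    return true;
  }
  if (!(other instanceof DatanodeStorage)) {
    return false;
  }
  DatanodeStorage that = (DatanodeStorage) other;
  // Two DatanodeStorage instances are equal iff they name the same storage.
  return this.getStorageID().equals(that.getStorageID());
}

@Override
public int hashCode() {
  return getStorageID().hashCode();
}
{code}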

 Get rid of perVolumeReplicaMap
 --

 Key: HDFS-5648
 URL: https://issues.apache.org/jira/browse/HDFS-5648
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: h5648.02.patch, h5648.08.patch


 The perVolumeReplicaMap in FsDatasetImpl.java is not necessary and can be 
 removed. We continue to use the existing volumeMap.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-12-11 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: h2832_20131211.patch

 Enable support for heterogeneous storages in HDFS
 -

 Key: HDFS-2832
 URL: https://issues.apache.org/jira/browse/HDFS-2832
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: 20130813-HeterogeneousStorage.pdf, 
 20131125-HeterogeneousStorage-TestPlan.pdf, 
 20131125-HeterogeneousStorage.pdf, 
 20131202-HeterogeneousStorage-TestPlan.pdf, 
 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, 
 editsStored, h2832_20131023.patch, h2832_20131023b.patch, 
 h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, 
 h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, 
 h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, 
 h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, 
 h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, 
 h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, 
 h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, 
 h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, 
 h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch


 HDFS currently supports a configuration where storages are a list of 
 directories. Typically each of these directories corresponds to a volume with 
 its own file system. All these directories are homogeneous and therefore 
 identified as a single storage at the namenode. I propose a change from the 
 current model, where a Datanode *is a* storage, to one where a Datanode 
 *is a collection of* storages.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845879#comment-13845879
 ] 

Hadoop QA commented on HDFS-5634:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618297/HDFS-5634.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5696//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5696//console

This message is automatically generated.

 allow BlockReaderLocal to switch between checksumming and not
 -

 Key: HDFS-5634
 URL: https://issues.apache.org/jira/browse/HDFS-5634
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
 HDFS-5634.003.patch, HDFS-5634.004.patch


 BlockReaderLocal should be able to switch between checksumming and 
 non-checksumming, so that when we get notifications that something is mlocked 
 (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5596) Implement RPC stubs

2013-12-11 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845915#comment-13845915
 ] 

Chris Nauroth commented on HDFS-5596:
-

Nice work, Haohui.  A few comments:
# {{ClientProtocol}}: All RPCs can be annotated idempotent.  The implementation 
will be such that repeated application of the same request yields the same 
result.  For example, a retried {{removeDefaultAcl}} call yields the same 
result whether the first call reaches the server, the second call reaches the 
server, or both.  The end result is always the prior ACL entries with all 
default entries removed.  (See the sketch after this list.)
# {{ReadonlyIterableAdaptor}}: (Optional) Do you think this is worth promoting 
to a top-level class in {{org.apache.hadoop.hdfs.util}}?  It's not directly 
coupled to the rest of the serialization code, and perhaps it will be useful 
elsewhere.
# {{DFSClient}}: There are a few more exception types that would be helpful to 
unwrap on the modification operations.  I think the full list of interesting 
exceptions for all modification operations would be: 
{{AccessControlException}}, {{FileNotFoundException}}, {{SafeModeException}}, 
{{UnresolvedPathException}}, {{SnapshotAccessControlException}}, and 
{{NSQuotaExceededException}}.  However, I'm also wondering if we ought to 
simplify the whole thing and call {{unwrapRemoteException}} with no args for 
all of these new methods.  What do you think?
# {{TestPBHelper}}: Let's add a test for conversion of {{AclStatus}} too.
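
Regarding item 1, the annotation shape would be roughly the following.  The 
interface name is a stand-in and the signatures are abbreviated from the 
branch:

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.io.retry.Idempotent;

interface AclProtocolSketch {
  @Idempotent
  void removeDefaultAcl(String src) throws IOException;

  @Idempotent
  void modifyAclEntries(String src, List<AclEntry> aclSpec) throws IOException;
}
{code}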



 Implement RPC stubs
 ---

 Key: HDFS-5596
 URL: https://issues.apache.org/jira/browse/HDFS-5596
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Haohui Mai
 Attachments: HDFS-5596.000.patch


 Implement RPC stubs for both {{DistributedFileSystem}} and 
 {{NameNodeRpcServer}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5350) Name Node should report fsimage transfer time as a metric

2013-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845961#comment-13845961
 ] 

Hadoop QA commented on HDFS-5350:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618308/trunk-5350.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5697//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5697//console

This message is automatically generated.

 Name Node should report fsimage transfer time as a metric
 -

 Key: HDFS-5350
 URL: https://issues.apache.org/jira/browse/HDFS-5350
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Rob Weltman
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 3.0.0

 Attachments: trunk-5350.patch


 If the (Secondary) Name Node reported fsimage transfer times (perhaps the 
 last ten of them), monitoring tools could detect slowdowns that might 
 jeopardize cluster stability.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845977#comment-13845977
 ] 

Hadoop QA commented on HDFS-5634:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618311/HDFS-5634.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5698//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5698//console

This message is automatically generated.

 allow BlockReaderLocal to switch between checksumming and not
 -

 Key: HDFS-5634
 URL: https://issues.apache.org/jira/browse/HDFS-5634
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
 HDFS-5634.003.patch, HDFS-5634.004.patch


 BlockReaderLocal should be able to switch between checksumming and 
 non-checksumming, so that when we get notifications that something is mlocked 
 (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845992#comment-13845992
 ] 

Hadoop QA commented on HDFS-2832:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618314/h2832_20131211.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 48 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
-12 warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5699//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5699//console

This message is automatically generated.

 Enable support for heterogeneous storages in HDFS
 -

 Key: HDFS-2832
 URL: https://issues.apache.org/jira/browse/HDFS-2832
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: 20130813-HeterogeneousStorage.pdf, 
 20131125-HeterogeneousStorage-TestPlan.pdf, 
 20131125-HeterogeneousStorage.pdf, 
 20131202-HeterogeneousStorage-TestPlan.pdf, 
 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, 
 editsStored, h2832_20131023.patch, h2832_20131023b.patch, 
 h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, 
 h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, 
 h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, 
 h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, 
 h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, 
 h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, 
 h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, 
 h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, 
 h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch


 HDFS currently supports a configuration where storages are a list of 
 directories. Typically each of these directories corresponds to a volume with 
 its own file system. All these directories are homogeneous and therefore 
 identified as a single storage at the namenode. I propose a change from the 
 current model, where a Datanode *is a* storage, to one where a Datanode 
 *is a collection of* storages.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-11 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5657:
-

Attachment: HDFS-5657.001.patch

 race condition causes writeback state error in NFS gateway
 --

 Key: HDFS-5657
 URL: https://issues.apache.org/jira/browse/HDFS-5657
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5657.001.patch


 A race condition between the NFS gateway writeback executor thread and a new 
 write handler thread can cause a writeback state check failure, e.g.,
 {noformat}
 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
 (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
 writes, fileId: 30938
 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
 (AsyncDataService.java:run(136)) - Asyn data service got 
 error:java.lang.IllegalStateException: The openFileCtx has false async status
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
 filesize=917504
 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
 (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
 917504 length:65536 stableHow:0
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-11 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5657:
-

Status: Patch Available  (was: Open)

 race condition causes writeback state error in NFS gateway
 --

 Key: HDFS-5657
 URL: https://issues.apache.org/jira/browse/HDFS-5657
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5657.001.patch


 A race condition between the NFS gateway writeback executor thread and a new 
 write handler thread can cause a writeback state check failure, e.g.,
 {noformat}
 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
 (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
 writes, fileId: 30938
 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
 (AsyncDataService.java:run(136)) - Asyn data service got 
 error:java.lang.IllegalStateException: The openFileCtx has false async status
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
 filesize=917504
 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
 (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
 917504 length:65536 stableHow:0
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure

2013-12-11 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846002#comment-13846002
 ] 

Liang Xie commented on HDFS-4273:
-

Oh, my stupid mistake :)

 Problem in DFSInputStream read retry logic may cause early failure
 --

 Key: HDFS-4273
 URL: https://issues.apache.org/jira/browse/HDFS-4273
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor
 Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, 
 HDFS-4273.v4.patch, HDFS-4273.v5.patch, TestDFSInputStream.java


 Assume the following call logic:
 {noformat}
 readWithStrategy()
   -> blockSeekTo()
   -> readBuffer()
      -> reader.doRead()
      -> seekToNewSource()  adds currentNode to deadNodes, hoping to get a
         different datanode
         -> blockSeekTo()
            -> chooseDataNode()
               -> block missing, clear deadNodes and pick the currentNode again
         seekToNewSource() returns false
      readBuffer() re-throws the exception and quits the loop
 readWithStrategy() gets the exception, and may fail the read call before 
 having tried MaxBlockAcquireFailures times.
 {noformat}
 Some issues with this logic:
 1. The seekToNewSource() logic is broken because it may clear deadNodes in 
 the middle.
 2. The variable {{int retries=2}} in readWithStrategy seems to conflict with 
 MaxBlockAcquireFailures; should it be removed?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

