[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869379#comment-13869379 ] Uma Maheswara Rao G commented on HDFS-5710: --- @Jing Zhao, would you like to comment on this patch? FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ :
{code}
2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496
2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already.
2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception.
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871)
	at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112)
	at java.lang.Thread.run(Thread.java:724)
{code}
Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to the NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
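The null check asked for in the issue title can be sketched roughly as follows; the class and method shown here are illustrative stand-ins, not the actual FSDirectory code:

```java
// Hypothetical sketch of the missing null check; names are invented for
// illustration and do not match the real FSDirectory internals.
class FullPathNameSketch {

    // Builds the full path from inode name components, or returns null when
    // the inode chain has already been removed (e.g. the file was deleted
    // concurrently while the ReplicationMonitor was still holding the block).
    static String getFullPathName(String[] inodeNames) {
        if (inodeNames == null) {
            // Previously the array was dereferenced unconditionally -> NPE.
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (String name : inodeNames) {
            sb.append('/').append(name);
        }
        return sb.toString();
    }
}
```

Callers such as INodeFile#getName would then need to tolerate a null result, which matches the WARN log line above ("Corresponding file might have deleted already").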
[jira] [Updated] (HDFS-5760) Fix HttpServer construct
[ https://issues.apache.org/jira/browse/HDFS-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Charles updated HDFS-5760: --- Attachment: HDFS-5760-1.patch The attached patch fixes the issue for HBASE-6581 and should not have side effects on any other HttpServer consumers. Fix HttpServer construct Key: HDFS-5760 URL: https://issues.apache.org/jira/browse/HDFS-5760 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eric Charles Attachments: HDFS-5760-1.patch o.a.h.h.HttpServer can be instantiated and configured:
1. Via the classical constructor
2. Via the static build method
These two methods do not populate the (deprecated) hostname and port, nor the jetty Connector, in the same way. This causes issues when using HBase on Hadoop 3 (HBASE-6581). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5760) Fix HttpServer construct
Eric Charles created HDFS-5760: -- Summary: Fix HttpServer construct Key: HDFS-5760 URL: https://issues.apache.org/jira/browse/HDFS-5760 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eric Charles -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5761) DataNode fail to validate integrity for checksum type NULL when DataNode recovers
Kousuke Saruta created HDFS-5761: Summary: DataNode fail to validate integrity for checksum type NULL when DataNode recovers Key: HDFS-5761 URL: https://issues.apache.org/jira/browse/HDFS-5761 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta When the DataNode goes down while writing blocks, those blocks are not finalized, and the next time the DataNode starts, integrity validation runs. But if NULL is used as the checksum algorithm (dfs.checksum.type can be set to NULL), the DataNode fails to validate integrity and cannot start up. The cause is in BlockPoolSlice#validateIntegrity, which contains the following code. {code} long numChunks = Math.min( (blockFileLen + bytesPerChecksum - 1)/bytesPerChecksum, (metaFileLen - crcHeaderLen)/checksumSize); {code} When the NULL checksum is chosen, checksumSize is 0, so an ArithmeticException (division by zero) is thrown and the DataNode cannot start. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
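The failing expression can be reproduced, and one possible guard sketched, in a few lines; the method name below is invented for illustration and is not the actual BlockPoolSlice#validateIntegrity code:

```java
// Minimal sketch of the divide-by-zero described above, plus one possible
// guard. Illustrative only; not the real BlockPoolSlice implementation.
class NumChunksSketch {

    // Original expression:
    //   Math.min((blockFileLen + bytesPerChecksum - 1) / bytesPerChecksum,
    //            (metaFileLen - crcHeaderLen) / checksumSize)
    // With dfs.checksum.type = NULL, checksumSize == 0 and the second operand
    // divides by zero. A guarded variant falls back to the data-file estimate.
    static long numChunks(long blockFileLen, int bytesPerChecksum,
                          long metaFileLen, int crcHeaderLen, int checksumSize) {
        long chunksByData = (blockFileLen + bytesPerChecksum - 1) / bytesPerChecksum;
        if (checksumSize == 0) {
            return chunksByData; // NULL checksum: no CRC data to cross-check
        }
        return Math.min(chunksByData, (metaFileLen - crcHeaderLen) / checksumSize);
    }
}
```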
[jira] [Updated] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5761: - Summary: DataNode fails to validate integrity for checksum type NULL when DataNode recovers (was: DataNode fail to validate integrity for checksum type NULL when DataNode recovers) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5761: - Attachment: HDFS-5761.patch I've attached a patch for this issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5761: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869719#comment-13869719 ] Uma Maheswara Rao G commented on HDFS-5761: --- Thanks for filing a JIRA. I noticed this when I was looking at HDFS-5728. Actually, the integrity validation check is not necessary when the checksum type is set to NULL; it should consider the full file length as-is. I think the array below becomes a 0-length array when checksumSize is 0? {code} byte[] buf = new byte[lastChunkSize+checksumSize]; {code} So, how about just considering blockFileLength when the CRC type is NULL? Since the CRC is null, we need not care about the integrity check against the CRC file at all, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
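The alternative suggested in this comment (skip CRC-based validation entirely for the NULL checksum type, rather than just guarding the division) might look roughly like this; the class and method names are invented for illustration and are not the actual BlockPoolSlice code:

```java
// Illustrative sketch of the comment's suggestion; not the real code.
class NullChecksumValidationSketch {

    // Number of trustworthy bytes in a recovered block file. With a NULL
    // checksum type there is no CRC to validate against, so the whole data
    // file is accepted as-is instead of computing chunk counts at all.
    static long validatedLength(long blockFileLen, int checksumSize,
                                long lastChecksummedOffset) {
        if (checksumSize == 0) {
            return blockFileLen; // NULL checksum: skip CRC-based validation
        }
        // Normal path: trust only bytes covered by stored checksums.
        return Math.min(blockFileLen, lastChecksummedOffset);
    }
}
```

This also avoids the zero-length {{buf}} array mentioned above, since the buffer is only needed on the CRC-validation path.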
[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869791#comment-13869791 ] Jing Zhao commented on HDFS-5710: - +1 patch looks good to me. I will commit it shortly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5477) Block manager as a service
[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869798#comment-13869798 ] Colin Patrick McCabe commented on HDFS-5477: Hi Daryn, This seems like a great direction for HDFS to go in. Just a few comments. You list scalability as a primary concern. However, even if we separate the BM from the namespace management, a cluster with a large number of blocks will still have a giant BM heap (if I understand correctly). So perhaps what we need is the ability to have multiple block manager daemons? It seems like there will be a lot of messages that will necessarily flow between the namespace daemon and the block management daemon(s). What IPC mechanism are you considering? TCP socket? UNIX domain socket? Shared memory? Shared memory would clearly be the highest performance, and perhaps we should consider that. Is there an upstream svn branch for this yet? Block manager as a service -- Key: HDFS-5477 URL: https://issues.apache.org/jira/browse/HDFS-5477 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, Standalone BM.pdf The block manager needs to evolve towards having the ability to run as a standalone service to improve NN vertical and horizontal scalability. The goal is reducing the memory footprint of the NN proper to support larger namespaces, and improve overall performance by decoupling the block manager from the namespace and its lock. Ideally, a distinct BM will be transparent to clients and DNs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869803#comment-13869803 ] Jing Zhao commented on HDFS-5738: - # The current 002 patch still cannot be compiled. Looks like you are missing the following changes in the patch:
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/n
index 344a6a0..18dd768 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
@@ -195,6 +195,7 @@ private void loadINode(InputStream in, FileHeader.Section header)
   static final class Saver {
     final SaveNamespaceContext context;
     private MD5Hash savedDigest;
+    private long currentOffset = PRE_ALLOCATED_HEADER_SIZE;

     Saver(SaveNamespaceContext context) {
       this.context = context;
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto b/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
index 5df8fd1..0855102 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
@@ -55,6 +55,7 @@ message FileHeader {
   message Section {
     optional string name = 1;
     optional uint64 length = 2;
+    optional uint64 offset = 3;
   }
   repeated Section sections = 5;
 }
{code}
# In the meanwhile, for non-snapshot information, we also need to handle FileUnderConstruction information. This can be handled in either this jira or a separate jira (such as HDFS-5743). # The section index information may be moved to the end of the fsimage as a footer? This can simplify the current code and avoid the 1KB allocation. This is optional and we can continue improving the protobuf definition in new jiras.
Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869813#comment-13869813 ] Hadoop QA commented on HDFS-5761: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622648/HDFS-5761.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5865//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5865//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5710: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks to Ted for the report and thank you Uma for the fix! I've committed this to trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869822#comment-13869822 ] Hudson commented on HDFS-5710: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4992 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4992/]) HDFS-5710. FSDirectory#getFullPathName should check inodes against null. Contributed by Uma Maheswara Rao G. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1557803) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HDFS-4239: - Assignee: Jimmy Xiang Means of telling the datanode to stop using a sick disk --- Key: HDFS-4239 URL: https://issues.apache.org/jira/browse/HDFS-4239 Project: Hadoop HDFS Issue Type: Improvement Reporter: stack Assignee: Jimmy Xiang If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally, or just exhibiting high latency -- your choices are: 1. Decommission the entire datanode. If the datanode is carrying 6 or 12 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low-latency serving: e.g. hosting an HBase cluster. 2. Stop the datanode, unmount the bad disk, and restart the datanode (you can't unmount the disk while it is in use). The latter is better in that only the bad disk's data is rereplicated, not all of the datanode's data. Is it possible to do better, say, send the datanode a signal to tell it to stop using a disk an operator has designated 'bad'? This would be like option #2 above minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while. Nice to have would be being able to tell the datanode to resume using a disk after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5760) Fix HttpServer construct
[ https://issues.apache.org/jira/browse/HDFS-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869877#comment-13869877 ] Haohui Mai commented on HDFS-5760: -- [~echarles], the first approach has been deprecated for a while. These methods will be removed from trunk shortly. Can you please fix HBase instead? Fix HttpServer construct Key: HDFS-5760 URL: https://issues.apache.org/jira/browse/HDFS-5760 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eric Charles Attachments: HDFS-5760-1.patch o.a.h.h.HttpServer can can be instanciated and configured: 1. Via classical constructor 2. Via static build method Those 2 methods don't populate the same way the (deprecated) hostname and port, nor the jetty Connector. This gives issue when using hbase on hadoop3 (HBASE-6581) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5612) NameNode: change all permission checks to enforce ACLs in addition to permissions.
[ https://issues.apache.org/jira/browse/HDFS-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5612: Attachment: HDFS-5612.3.patch I'm attaching version 3 of this patch. This version has been updated in reaction to the recent changes on HDFS-5758. {{FSPermissionChecker}} has been updated to pull the relevant pieces of the whole logical ACL from either {{FsPermission}} or the list of {{AclEntry}}. I've also added comments to document the invariants described earlier. Overall, the HDFS-5758 changes didn't add that much complexity to {{FSPermissionChecker}}, so I'm satisfied with the end result. NameNode: change all permission checks to enforce ACLs in addition to permissions. -- Key: HDFS-5612 URL: https://issues.apache.org/jira/browse/HDFS-5612 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5612.1.patch, HDFS-5612.2.patch, HDFS-5612.3.patch All {{NameNode}} code paths that enforce permissions must be updated so that they also enforce ACLs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1387#comment-1387 ] Eric Sirianni commented on HDFS-5318: - bq. 1. Block is finalized, r/w replica is lost, r/o replica is available. In this case the existing NN replication mechanisms will cause an extra replica to be created Isn't this case equivalent to the case where the R/W replica is offline in general (i.e. not just for pipeline recovery)? bq. q. what happens if a client attempts to append before the replication happens? Independent of how replicas are counted, whenever a R/W replica is offline, appends will not be possible (in the current implementation) until a new R/W replica is created (via inter-datanode replication from a R/O replica). Are you proposing a solution to this (ability to create an append pipeline from only R/O replicas)? bq. 4. Client should be able to bootstrap a write pipeline with read-only replicas. Not sure I fully understand here. Is this how you envision solving the append problem when R/W replica is offline? Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. 
I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
* {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks)
** → Block B has {{ReplicationCount == 2}}
* {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share)
** → Block B has {{ReplicationCount == 1}}
For example, if block B has the following location tuples:
* {{DN_1, STORAGE_A}}
* {{DN_2, STORAGE_A}}
* {{DN_3, STORAGE_B}}
* {{DN_4, STORAGE_B}}
the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
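The distinct-{{StorageID}} counting rule described above can be sketched as follows; the String pair here is a simplified stand-in for the real (DatanodeID, StorageID) location tuple, not the actual Namenode data structures:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the proposed rule: physical replicas of a block = number of
// distinct StorageIDs among its reported locations. Simplified types only.
class ReplicaCountSketch {

    // Each location is a {datanodeId, storageId} pair.
    static int physicalReplicas(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] dnAndStorage : locations) {
            storageIds.add(dnAndStorage[1]); // index 1 = StorageID
        }
        return storageIds.size();
    }
}
```

Applied to the example tuples above (DN_1/STORAGE_A, DN_2/STORAGE_A, DN_3/STORAGE_B, DN_4/STORAGE_B), this yields 2 rather than the 4 that location counting would report.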
[jira] [Commented] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeSpi interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870043#comment-13870043 ] David Powell commented on HDFS-5751: Alas, I am intimately familiar with the reimplementation necessary, and wish there were less of it to do and to maintain. That said, precluding alternate implementations because creating one would require more than the ideal amount of work feels like throwing the baby out with the bathwater. Moving the abstraction lower is along the lines of what I had in mind when I suggested the middle ground of changes that reduce mainline maintenance burden while preserving a usable interface for others. I think the lower surface of the official FsDatasetImpl is far too low, however, and that comparing HDFS with ext3fs both underestimates the complexity and modularity of HDFS and overestimates the versatility of the simple interface a traditional filesystem consumes. Which is to say, I think there is a class of problems which would lead one to replace a traditional filesystem entirely, but could be solved much more elegantly in HDFS given its components' architectural separation. Remove the FsDatasetSpi and FsVolumeSpi interfaces -- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity.
A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However, there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is the additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead, we can eliminate the SPIs and just hide the disk read/write routines behind a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
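The dependency-injection alternative can be sketched in plain Java. This is illustrative only: the JIRA suggests Guice would supply the binding, and the interface and class names below are hypothetical, not Hadoop classes:

```java
import java.util.Arrays;

// Hypothetical sketch: hide only the disk read/write routines behind a
// small interface, so a test can inject a zero-filled in-memory version
// (the role SimulatedFSDataset plays today) without a full SPI layer.
public class DatasetSketch {
    interface BlockIO {
        byte[] readBlock(long blockId, int len);
    }

    // Production binding: would perform real disk I/O (stubbed here).
    static class DiskBlockIO implements BlockIO {
        public byte[] readBlock(long blockId, int len) {
            throw new UnsupportedOperationException("real disk I/O");
        }
    }

    // Test binding: returns zeroes for all reads, like SimulatedFSDataset.
    static class ZeroBlockIO implements BlockIO {
        public byte[] readBlock(long blockId, int len) {
            return new byte[len]; // all zeroes
        }
    }

    final BlockIO io; // injected (e.g. by Guice) instead of a factory class

    DatasetSketch(BlockIO io) {
        this.io = io;
    }

    public static void main(String[] args) {
        DatasetSketch ds = new DatasetSketch(new ZeroBlockIO());
        System.out.println(Arrays.equals(ds.io.readBlock(1L, 4), new byte[4]));
    }
}
```

The common-case code path stays straight-line: the DataNode constructs its dataset with whatever {{BlockIO}} the injector provides, rather than routing through an SPI factory.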
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870054#comment-13870054 ] Haohui Mai commented on HDFS-5738: -- Thanks Jing for the review. The v3 patch changes the following: # It moves the file header to the end of the fsimage. # It adds an entry in the INodeDirectorySection only if the corresponding directory has children. # Minor refactoring and cleanups. I plan to handle FileUnderConstruction in a separate jira. Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch, HDFS-5738.003.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.003.patch Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch, HDFS-5738.003.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5741) BlockInfo#findDataNode can be removed
[ https://issues.apache.org/jira/browse/HDFS-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5741: Summary: BlockInfo#findDataNode can be removed (was: BlockInfo#findDataNode can be deprecated) BlockInfo#findDataNode can be removed - Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor NN now tracks replicas by storage, so {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}}. {{BlockManager#reportDiff}} is being fixed as part of HDFS-5483, this Jira is to fix the rest of the callers. [suggested by [~sirianni] on HDFS-5483] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
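The replacement described in HDFS-5741 can be illustrated with simplified types. This sketch is not the real {{BlockInfo}} code; the classes below are invented stand-ins:

```java
import java.util.List;

// Illustrative only (simplified types, not the real BlockInfo): with
// per-storage replica tracking, locating a datanode's replica of a block
// means finding that node's storage entry, so findDataNode is redundant.
public class StorageLookup {
    static class StorageInfo {
        final String datanodeId, storageId;
        StorageInfo(String dn, String sid) {
            datanodeId = dn;
            storageId = sid;
        }
    }

    // Analogous in spirit to BlockInfo#findStorageInfo: scan the block's
    // storage list for the one attached to the given datanode.
    static StorageInfo findStorageInfo(List<StorageInfo> storages, String dn) {
        for (StorageInfo s : storages) {
            if (s.datanodeId.equals(dn)) {
                return s; // the replica's storage on this datanode
            }
        }
        return null; // this datanode holds no replica of the block
    }
}
```

Callers that previously asked "which datanode holds this replica?" now ask "which storage on that datanode holds it?", and the datanode is recoverable from the storage entry.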
[jira] [Commented] (HDFS-5741) BlockInfo#findDataNode can be removed
[ https://issues.apache.org/jira/browse/HDFS-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870085#comment-13870085 ] Arpit Agarwal commented on HDFS-5741: - Thanks, edited description. BlockInfo#findDataNode can be removed - Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor NN now tracks replicas by storage, so {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}}. {{BlockManager#reportDiff}} is being fixed as part of HDFS-5483, this Jira is to fix the rest of the callers. [suggested by [~sirianni] on HDFS-5483] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870079#comment-13870079 ] Jing Zhao commented on HDFS-5579: - The javadoc warning and TestSafeMode failure should be unrelated. I will commit the patch shortly. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that sometimes decommissioning DataNodes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager:computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks that belong to under-construction files; however, in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), any block needing replication keeps the decommission in progress, whether or not it belongs to an under-construction file. That's why the decommission sometimes takes a very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
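The mismatch HDFS-5579 describes can be modeled in a few lines. This is a hypothetical simplification, not the real {{BlockManager}} code: replication work skips blocks of under-construction files, but the in-progress check does not, so a node whose only under-replicated blocks belong to open files never finishes decommissioning:

```java
import java.util.List;

// Hypothetical model of the HDFS-5579 inconsistency (simplified types).
public class DecommissionSketch {
    static class Block {
        final boolean underConstruction;
        final int liveReplicas, expectedReplicas;
        Block(boolean uc, int live, int expected) {
            underConstruction = uc;
            liveReplicas = live;
            expectedReplicas = expected;
        }
        boolean underReplicated() { return liveReplicas < expectedReplicas; }
    }

    // computeReplicationWorkForBlocks never schedules UC blocks.
    static boolean willSchedule(Block b) {
        return !b.underConstruction && b.underReplicated();
    }

    // Buggy isReplicationInProgress: counts UC blocks too, so decommission
    // can wait forever on blocks that will never be scheduled.
    static boolean inProgressBuggy(List<Block> blocks) {
        for (Block b : blocks) {
            if (b.underReplicated()) return true;
        }
        return false;
    }

    // Consistent check: only count blocks replication would actually schedule.
    static boolean inProgressFixed(List<Block> blocks) {
        for (Block b : blocks) {
            if (willSchedule(b)) return true;
        }
        return false;
    }
}
```

For an under-replicated block of an open file, {{inProgressBuggy}} reports progress forever while {{willSchedule}} is false, which is the stuck state the reporter observed; making the two checks agree is the shape of the fix.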
[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5138: - Attachment: HDFS-5138.patch {quote} + // This is expected to happen for a stanby NN. Typo (standby) {quote} Thanks, fixed. {quote} + // Either they all return the same thing or this call fails, so we can + // just return the first result. Would be good to assert that - eg in case one of the JNs crashed in the middle of a previously attempted upgrade sequence. {quote} Sure, done. {quote} * @param useLock true - enables locking on the storage directory and false * disables locking + * @param isShared whether or not this dir is shared between two NNs. true + * enables locking on the storage directory, false disables locking I think this doc is now wrong because you inverted the sense of these booleans - we don't lock the shared dir. {quote} Good catch. Fixed. {quote} + public synchronized void doFinalizeOfSharedLog() throws IOException { + public synchronized boolean canRollBackSharedLog(Storage prevStorage, Style nit: extra space in the above two methods {quote} Fixed. {quote} + if (!sd.isShared()) { + // This will be done on transition to active. Worth a LOG.info or even warn here {quote} Added the following: {code} LOG.info("Not doing recovery on " + sd + " now. Will be done on" + " transition to active."); {code} bq. Currently it seems like whichever SBN starts up first has to be the one who does the transition to active. Maybe a follow-up JIRA could be to relax that constraint? Seems like it should be fine for either one of the NNs to actually do the upgrade - the lock file is just to make sure they agree on the target ctime. Agree this seems like a good idea, but agree it can reasonably be done in a follow-up JIRA. If you agree, I'll file it when we commit this one. {quote} + 'dfsadmin -finalizeUpgrade' command while the NNs are running and one of them + is active.
The active NN at the time this happens will perform the upgrade of + the shared log, and both of the NNs will finalize the upgrade in their local I think here you mean the finalization of the shared log {quote} Sure did. Fixed. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5752) Add a new DFSAdminCommand for rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5752: - Attachment: h5752_20140114.patch h5752_20140114.patch: adds cli usage. Add a new DFSAdminCommand for rolling upgrade - Key: HDFS-5752 URL: https://issues.apache.org/jira/browse/HDFS-5752 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5752_20140112.patch, h5752_20140114.patch We need to add a new DFSAdmin command to start, finalize, and query rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.
[ https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870172#comment-13870172 ] Chris Nauroth commented on HDFS-5608: - [~sachinjose2...@gmail.com], thank you for addressing the feedback. Here are some additional comments based on the new patch: # {{DFSConfigKeys}}: The regex does not allow for default ACL entries. Basically, these look identical to the access entries but have default: prepended. Also, the regex only allows entries of type user or group, so it would reject the mask and other entries. Also, the regex does not allow for an ACL entry that does not have a permission. For {{removeAclEntries}}, the user supplies an ACL spec with no permissions in the entries. You might want to take a look at the CLI implementation and HADOOP-10213 for more examples of this. # {{JsonUtil#toJsonString}}: I'm curious if we can skip the manual conversion to {{entriesStringlist}} and just pass the {{List<AclEntry>}} directly. Will the JSON conversion automatically use the {{toString}} representation? If this conversion really is necessary, then a small improvement would be to use Guava's {{Lists.newArrayListWithCapacity}} and pass {{entries.size()}} for the capacity to prevent allocating too large of an array or causing an immediate reallocation if the default initial array size turns out to be too small. # {{JsonUtil#toAclStatus}}: Some of my earlier comments about the regex are applicable here too. This method needs to be able to handle default ACL entries and call {{setScope}} on the builder. It needs to be able to handle the mask entry. Use other instead of others (singular, not plural). # {{AclPermissionParam#DEFAULT}}: Was the default value meant to be empty string? I don't think there is any specific ACL value that we could choose as the default, because it could risk accidentally expanding access if the caller forgets to provide the query parameter.
# {{AclPermissionParam#parseAclSpec}}: This is another spot where the earlier feedback on the regex has an impact. This logic is very similar to {{AclCommands#SetfaclCommand#parseAclSpec}}. Can we work out a way for both the CLI and WebHDFS to use a common method? The parsing logic would be the same for both. # {{PutOpParam}}: I think {{GETACLS}} needs to be removed. # Nitpick: the Hadoop project code standard wraps lines at 80 characters, indents code blocks by 2 spaces, and uses spaces (not tabs) for indentation. There are a few places in the patch that need to be converted to this standard. WebHDFS: implement GETACLSTATUS and SETACL. --- Key: HDFS-5608 URL: https://issues.apache.org/jira/browse/HDFS-5608 Project: Hadoop HDFS Issue Type: Sub-task Components: webhdfs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Sachin Jose Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
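The regex points in the review above can be sketched concretely. This is NOT the committed {{DFSConfigKeys}} value, just a hypothetical pattern covering the cases the reviewer lists: an optional default: scope, all four entry types (user, group, mask, other), an optional name, and an optional permission field (absent permissions are what {{removeAclEntries}} supplies):

```java
import java.util.regex.Pattern;

// Hypothetical ACL-entry validation regex sketch, addressing the review
// feedback above (not Hadoop's actual pattern).
public class AclEntryPattern {
    static final Pattern ACL_ENTRY = Pattern.compile(
        "^(default:)?(user|group|mask|other):[^:]*(:([rwx-]{3})?)?$");

    static boolean isValid(String entry) {
        return ACL_ENTRY.matcher(entry).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("default:user:alice:rwx")); // default scope
        System.out.println(isValid("mask::r-x"));              // mask entry
        System.out.println(isValid("user:bob"));               // no permission
        System.out.println(isValid("owner:bob:rwx"));          // invalid type
    }
}
```

The original regex (per the review) would reject the first three of these and accept only user/group access entries with permissions.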
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870192#comment-13870192 ] Arpit Agarwal commented on HDFS-5318: - Yes, we must support starting a pipeline from read-only replicas when the r/w replica is offline, else append will be broken. One way to do it is to generate a copy of the replica on a read-write storage and then kick off the pipeline. There is some precedent for doing so ({{DFSOutputStream#addDatanode2ExistingPipeline}}). Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}}s to be pluggable behind the {{FsDatasetSpi}} interface.
With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}}s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
Colin Patrick McCabe created HDFS-5762: -- Summary: BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that client can determine whether the file is at an end by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
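The contract HDFS-5762 asks for can be sketched as follows. This is a hypothetical simplification, not the real {{BlockReaderLocal}}: the point is that a zero-length read must still report EOF with -1 when the position is at or past the end of the block:

```java
// Hypothetical sketch of the desired read contract (simplified; not the
// actual BlockReaderLocal implementation).
public class EofRead {
    static int read(byte[] buf, int off, int len, long pos, long blockLen) {
        if (pos >= blockLen) {
            return -1; // at EOF: signal it even when len == 0
        }
        if (len == 0) {
            return 0;  // mid-file zero-length read transfers nothing
        }
        int n = (int) Math.min(len, blockLen - pos);
        // ... a real reader would copy n bytes into buf here ...
        return n;
    }
}
```

With this ordering of checks, a libhdfs-style probe (zero-length read, test for -1) distinguishes "at EOF" from "more data available" without transferring any bytes.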
[jira] [Work started] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5762 started by Colin Patrick McCabe. BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that clients can determine whether the file is at EOF by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5762: --- Attachment: HDFS-5762.001.patch BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that clients can determine whether the file is at EOF by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5762: --- Target Version/s: 2.4.0 Status: Patch Available (was: In Progress) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that clients can determine whether the file is at EOF by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870242#comment-13870242 ] Konstantin Shvachko commented on HDFS-5138: --- Aaron, I understand you made -rollback an offline operation for NN, which works as -format. That is, NN makes changes in the directory structure and shuts down. How will that work with DataNodes? They also need to be started with -rollback in order to roll back to the old state. In the current world you just call {{start-hdfs -rollback}} and the cluster is up and running with the previous software version and the previous data. What is the procedure in your edition? Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870258#comment-13870258 ] Aaron T. Myers commented on HDFS-5138: -- Hi Konst, thanks for bringing this up - I should've mentioned it. The DN rollback procedure is left unchanged by this patch, so you just start up the DNs with the '-rollback' option as before. When the DN registers with an NN which has already been rolled back, the DN will perform rollback of its data dirs just like normal, i.e. all that matters is that the NN has already rolled back, not whether or not the running NN was started with the '-rollback' option. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5194) Robust support for alternate FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870291#comment-13870291 ] Arpit Agarwal commented on HDFS-5194: - David, comprehensive doc - you've done a lot of work on this. I skimmed through it (won't have time to read it in detail until next week). It would be useful to have a high level requirements or use cases section. The one requirement that jumped out from a quick read was the need to support non file-addressable stores. Robust support for alternate FsDatasetSpi implementations - Key: HDFS-5194 URL: https://issues.apache.org/jira/browse/HDFS-5194 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client Reporter: David Powell Priority: Minor Attachments: HDFS-5194.design.09112013.pdf, HDFS-5194.patch.09112013 The existing FsDatasetSpi interface is well-positioned to permit extending Hadoop to run natively on non-traditional storage architectures. Before this can be done, however, a number of gaps need to be addressed. This JIRA documents those gaps, suggests some solutions, and puts forth a sample implementation of some of the key changes needed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870294#comment-13870294 ] Hadoop QA commented on HDFS-5138: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622729/HDFS-5138.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5866//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5866//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5866//console This message is automatically generated. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. 
Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5579: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for the contribution [~zhaoyunjiong]! Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 2.4.0 Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that sometimes decommissioning DataNodes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager:computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks that belong to under-construction files; however, in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), any block needing replication keeps the decommission in progress, whether or not it belongs to an under-construction file. That's why the decommission sometimes takes a very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870311#comment-13870311 ] Hudson commented on HDFS-5579: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4993 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4993/]) HDFS-5579. Under construction files make DataNode decommission take very long hours. Contributed by zhaoyunjiong. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1557904) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockCollection.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870341#comment-13870341 ] Konstantin Shvachko commented on HDFS-5535: --- Thanks for the design doc, guys. A few questions. (Quotations from the document are in italic) # ??The total time required to upgrade a cluster MUST not exceed #Nodes_in_cluster * 10 seconds.?? Not sure I understood how you measure the time to upgrade. Administrators should be able to spend as much time as they need. On the other hand, I can write a script that calls upgrade commands in sequence, then push a button and the upgrade is done for me. Just trying to understand the meaning of the requirement. # ??During upgrade or downgrade, no data loss MUST occur.?? Not clear what this means in case a bug in the new software led to a loss of data. Probably meant to say that the old software should be able to support whatever state of the file system is left after the upgrade experiment was terminated? # Does finalize require a checkpoint in the design? # ??For rollback, NN read editlog in startup as usual. It stops at the marker position, writes the fsimage back to disk and then discards the editlog.?? What happens if the editlog is corrupted by the new software and the marker is not recognizable? Maybe it needs to roll edits in some special way to indicate the start of the rolling upgrade? # ??Software version is the version of the running software. In the current rolling upgrade mechanism?? What is the current rolling upgrade mechanism? It would make more sense to me if the word current were removed from the above phrase. # What is MTTR? # Looks like Lite-Decom and “Optimizing DN Restart time” are competing proposals. Which one do you actually propose? It sounds like both are still being designed? The last question is because this seems to be the most intricate part of the issue.
Conceptually rolling upgrades are possible with a simple patch, which eliminates the Software Version verification, plus very careful cluster administration, of course. And the trick indeed is to avoid client failures so that HBase and other apps could run during the upgrade. Umbrella jira for improved HDFS rolling upgrades Key: HDFS-5535 URL: https://issues.apache.org/jira/browse/HDFS-5535 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, ha, hdfs-client, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Nathan Roberts Attachments: HDFSRollingUpgradesHighLevelDesign.pdf In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial High level design document will be attached to this jira, and sub-jiras will itemize the individual tasks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870352#comment-13870352 ] Konstantin Shvachko commented on HDFS-5138: --- This is less intuitive than the current state of the art, because after NN rollback you need to start the NameNode as -regular, while DataNodes start with the -rollback startup option. Also just mentioning there could be some collisions with the rolling upgrade design, which I just finished reading. I think HDFS-5535 assumes the current (pre-your-patch) behaviours of -rollback and -finalize. For -finalize the problem could be that you remove it as a startup option. Maybe Suresh can elaborate better on this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870353#comment-13870353 ] Hadoop QA commented on HDFS-5762: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622753/HDFS-5762.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5867//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5867//console This message is automatically generated. BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. 
BlockReaderLocal should do this, so that clients can determine whether they are at the end of the file by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a zero-length read to determine whether direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
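The contract the report asks for can be sketched like this; the class and fields are illustrative, not the real BlockReaderLocal:

```java
// Hedged sketch of the read semantics HDFS-5762 wants: a zero-length read
// returns -1 once the reader is positioned at EOF, and 0 otherwise, matching
// the other block readers. Illustrative names only.
class ZeroLengthReadSketch {
    private long pos;          // current position within the block
    private final long length; // total readable length

    ZeroLengthReadSketch(long length) { this.length = length; }

    int read(byte[] buf, int off, int len) {
        if (pos >= length) {
            return -1; // signal EOF even when len == 0
        }
        if (len == 0) {
            return 0;  // nothing requested and not yet at EOF
        }
        int n = (int) Math.min(len, length - pos);
        pos += n;      // a real reader would copy n bytes into buf here
        return n;
    }
}
```

A caller such as libhdfs can then probe for EOF with `read(buf, 0, 0)` and branch on the -1.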
[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5138: - Attachment: HDFS-5138.patch Attached patch adds an exclude directive for the findbugs warning. It was about later loading a variable which we had previously confirmed was null, but all we're doing is checking it for equality against another value which may also reasonably be null. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5763) Service ACL not refresh on both ANN and SNN
Fengdong Yu created HDFS-5763: - Summary: Service ACL not refresh on both ANN and SNN Key: HDFS-5763 URL: https://issues.apache.org/jira/browse/HDFS-5763 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Fengdong Yu Configured hadoop-policy.xml on the active NN, then ran: hdfs dfsadmin -refreshServiceAcl, but the service ACL was refreshed only on the standby NN or the active NN, not both. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5764) GSetByHashMap breaks contract of GSet
Hiroshi Ikeda created HDFS-5764: --- Summary: GSetByHashMap breaks contract of GSet Key: HDFS-5764 URL: https://issues.apache.org/jira/browse/HDFS-5764 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Hiroshi Ikeda Priority: Trivial The contract of GSet says it is ensured to throw NullPointerException if a given argument is null for many methods, but GSetByHashMap doesn't. I think just writing non-null preconditions into GSet is all that is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
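The missing precondition can be sketched as a thin wrapper; GSetSketch is a hypothetical stand-in, not the real GSetByHashMap:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the non-null precondition the GSet contract promises: throw
// NullPointerException on null arguments, which a bare HashMap delegate
// does not do for containsKey. Illustrative class, not the real HDFS one.
class GSetSketch<K, E extends K> {
    private final Map<K, E> map = new HashMap<>();

    E put(E element) {
        if (element == null) {
            throw new NullPointerException("element == null");
        }
        return map.put(element, element);
    }

    boolean contains(K key) {
        if (key == null) {
            throw new NullPointerException("key == null");
        }
        return map.containsKey(key);
    }
}
```

Note that `HashMap.containsKey(null)` silently returns false, which is exactly why a plain delegate breaks the documented contract.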
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870385#comment-13870385 ] zhaoyunjiong commented on HDFS-5579: Thanks for your time to review the patch, Jing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5764) GSetByHashMap breaks contract of GSet
[ https://issues.apache.org/jira/browse/HDFS-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870397#comment-13870397 ] Hiroshi Ikeda commented on HDFS-5764: - Sorry, just after searching I realized I created the issue in the wrong place. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5764) GSetByHashMap breaks contract of GSet
[ https://issues.apache.org/jira/browse/HDFS-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda resolved HDFS-5764. - Resolution: Invalid -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870400#comment-13870400 ] Akira AJISAKA commented on HDFS-4922: - Thanks for updating! Minor comment: {code} + Local block reader maintains a chunk buffer, This controls the maximum chunks + can be filled in the chunk buffer for each read. + The buffer size was specified in bytes, but + It would be better to be integral multiple of dfs.bytes-per-checksum {code} Some capital letters should be converted to lower case. The fix below looks good to me. {code} + Local block reader maintains a chunk buffer, this controls the maximum chunks + can be filled in the chunk buffer for each read. + The buffer size was specified in bytes. It would be better to be integral + multiple of dfs.bytes-per-checksum for better performance. {code} Also, these parameters are currently not described in hdfs-default.xml. Would you add them? Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch Explain the default value and add one configuration key which doesn't show up in the document but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
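The "integral multiple of dfs.bytes-per-checksum" advice might look like the following hdfs-site.xml fragment; the buffer-size property name below is an assumption for illustration, since the comment does not quote it:

```xml
<!-- Hypothetical hdfs-site.xml fragment: choose a short-circuit chunk-buffer
     size that is an integral multiple of dfs.bytes-per-checksum (default
     512 bytes), e.g. 131072 = 256 * 512. Property name below is assumed. -->
<property>
  <name>dfs.bytes-per-checksum</name>
  <value>512</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <value>131072</value>
</property>
```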
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870408#comment-13870408 ] Todd Lipcon commented on HDFS-5138: --- +1 pending Jenkins results. Please don't forget to file the follow-up JIRA we discussed above. Thanks! -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870417#comment-13870417 ] Aaron T. Myers commented on HDFS-5138: -- Thanks for the comments, Konst. bq. This is less intuitive than the current state of the art. Because after NN rollback you need to start NameNode as -regular, while DataNodes with -rollback startup option. It's different, but it's not obvious to me that it's necessarily less intuitive. I've personally always found it a bit strange that to roll back you need to start the NN _once_ with the '-rollback' option, which will result in it doing some things at startup, and then starting up as normal. This might seem to imply that the NN is running in some sort of rollback mode, when in fact the act of rolling back has already completed, and thereafter you should always start the NN without the '-rollback' option. bq. Also just mentioning there could be some collisions with the rolling upgrade design, which I just finished reading. I think HDFS-5535 assumes current (pre-your-patch) behaviours of -rollback and -finalize. For -finalize the problem could be that you remove it as a start up option. May be Suresh can elaborate better on this. Needing to roll back should (hopefully!) be such a rare occurrence that it doesn't seem unreasonable to me to not do that in a rolling way. Removing the '-finalize' startup option, I would think, should make the whole thing easier; the option doesn't seem to me to have any benefits vs. just using the finalizeUpgrade RPC. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.004.patch Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch, HDFS-5738.003.patch, HDFS-5738.004.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information are out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870454#comment-13870454 ] Haohui Mai commented on HDFS-5738: -- The v4 patch changes FileHeader into FileSummary. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5752) Add a new DFSAdminCommand for rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870460#comment-13870460 ] Jing Zhao commented on HDFS-5752: - The patch looks good to me. Some minor comments: # We may want to add more javadoc for ClientProtocol#rollingUpgrade # Since ClientProtocol#rollingUpgrade returns long, shall we also let DistributedFileSystem#rollingUpgrade and DFSClient#rollingUpgrade return long? +1 after addressing the comments. Add a new DFSAdminCommand for rolling upgrade - Key: HDFS-5752 URL: https://issues.apache.org/jira/browse/HDFS-5752 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5752_20140112.patch, h5752_20140114.patch We need to add a new DFSAdmin command to start, finalize and query rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
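One way the three actions named in the summary ("start, finalize and query") might be parsed by such a subcommand; the enum and method below are illustrative, not taken from the patch:

```java
// Hypothetical argument handling for a "-rollingUpgrade" DFSAdmin subcommand
// (HDFS-5752). Action names come from the jira summary; everything else here
// is an illustrative sketch, not the patch's actual code.
class RollingUpgradeArgSketch {
    enum Action { QUERY, START, FINALIZE }

    static Action parse(String arg) {
        if (arg == null || arg.isEmpty()) {
            return Action.QUERY; // assume a bare subcommand just queries status
        }
        switch (arg.toLowerCase()) {
            case "query":    return Action.QUERY;
            case "start":    return Action.START;
            case "finalize": return Action.FINALIZE;
            default:
                throw new IllegalArgumentException("Unknown action: " + arg);
        }
    }
}
```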
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870462#comment-13870462 ] Hadoop QA commented on HDFS-5138: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622787/HDFS-5138.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal: org.apache.hadoop.hdfs.TestClientReportBadBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5868//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5868//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HDFS-4922: -- Attachment: HDFS-4922-006.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870474#comment-13870474 ] Fengdong Yu commented on HDFS-4922: --- Hi [~ajisakaa], I refreshed the patch, and I'll file a separate jira to describe these parameters in hdfs-default.xml. Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870484#comment-13870484 ] Jing Zhao commented on HDFS-5738: - The v4 patch looks great to me. One nit: we should use a different class for the following code {code} private static final Log LOG = LogFactory .getLog(DelegationTokenSecretManager.class); {code} Besides, the current digest computation can only be understood by the saver and the loader. Maybe we should consider pulling the whole digest computation process out of the FSImage saving/loading procedure, and do this computation based on the complete FSImage file. But this can be discussed in a separate jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
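The suggestion of computing the digest over the complete FSImage file, rather than inside the save/load path, could look roughly like this; MD5 is used only as a familiar example digest, and the helper is hypothetical:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of digesting a complete serialized file as one byte stream, so any
// external tool can verify it without understanding the saver/loader. The
// class and method are illustrative, not HDFS code.
class WholeFileDigestSketch {
    static String md5Hex(byte[] fileBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(fileBytes);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is always available in the JDK", e);
        }
    }
}
```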