[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869379#comment-13869379 ] Uma Maheswara Rao G commented on HDFS-5710: --- @Jing Zhao, would you like to comment on this patch? FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ :
{code}
2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496
2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already.
2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception.
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871)
	at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112)
	at java.lang.Thread.run(Thread.java:724)
{code}
Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to the NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
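The null check asked for in the issue title can be sketched roughly as follows; the class and method shown here are illustrative stand-ins, not the actual FSDirectory code:

```java
// Hypothetical sketch of the missing null check; names are invented for
// illustration and do not match the real FSDirectory internals.
class FullPathNameSketch {

    // Builds the full path from inode name components, or returns null when
    // the inode chain has already been removed (e.g. the file was deleted
    // concurrently while the ReplicationMonitor was still holding the block).
    static String getFullPathName(String[] inodeNames) {
        if (inodeNames == null) {
            // Previously the array was dereferenced unconditionally -> NPE.
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (String name : inodeNames) {
            sb.append('/').append(name);
        }
        return sb.toString();
    }
}
```

Callers such as INodeFile#getName would then need to tolerate a null result, which matches the WARN log line above ("Corresponding file might have deleted already").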
[jira] [Updated] (HDFS-5760) Fix HttpServer construct
[ https://issues.apache.org/jira/browse/HDFS-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Charles updated HDFS-5760: --- Attachment: HDFS-5760-1.patch The attached patch fixes the issue for HBASE-6581 and should not have side effects on any other HttpServer consumers. Fix HttpServer construct Key: HDFS-5760 URL: https://issues.apache.org/jira/browse/HDFS-5760 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eric Charles Attachments: HDFS-5760-1.patch o.a.h.h.HttpServer can be instantiated and configured:
1. Via the classical constructor
2. Via the static build method
These two methods do not populate the (deprecated) hostname and port, nor the jetty Connector, in the same way. This causes issues when using HBase on Hadoop 3 (HBASE-6581). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5760) Fix HttpServer construct
Eric Charles created HDFS-5760: -- Summary: Fix HttpServer construct Key: HDFS-5760 URL: https://issues.apache.org/jira/browse/HDFS-5760 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eric Charles -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5761) DataNode fail to validate integrity for checksum type NULL when DataNode recovers
Kousuke Saruta created HDFS-5761: Summary: DataNode fail to validate integrity for checksum type NULL when DataNode recovers Key: HDFS-5761 URL: https://issues.apache.org/jira/browse/HDFS-5761 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta When the DataNode goes down while writing blocks, those blocks are not finalized, and the next time the DataNode starts, integrity validation runs. But if NULL is used as the checksum algorithm (dfs.checksum.type can be set to NULL), the DataNode fails to validate integrity and cannot start up. The cause is in BlockPoolSlice#validateIntegrity, which contains the following code. {code} long numChunks = Math.min( (blockFileLen + bytesPerChecksum - 1)/bytesPerChecksum, (metaFileLen - crcHeaderLen)/checksumSize); {code} When the NULL checksum is chosen, checksumSize is 0, so an ArithmeticException (division by zero) is thrown and the DataNode cannot start. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
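The failing expression can be reproduced, and one possible guard sketched, in a few lines; the method name below is invented for illustration and is not the actual BlockPoolSlice#validateIntegrity code:

```java
// Minimal sketch of the divide-by-zero described above, plus one possible
// guard. Illustrative only; not the real BlockPoolSlice implementation.
class NumChunksSketch {

    // Original expression:
    //   Math.min((blockFileLen + bytesPerChecksum - 1) / bytesPerChecksum,
    //            (metaFileLen - crcHeaderLen) / checksumSize)
    // With dfs.checksum.type = NULL, checksumSize == 0 and the second operand
    // divides by zero. A guarded variant falls back to the data-file estimate.
    static long numChunks(long blockFileLen, int bytesPerChecksum,
                          long metaFileLen, int crcHeaderLen, int checksumSize) {
        long chunksByData = (blockFileLen + bytesPerChecksum - 1) / bytesPerChecksum;
        if (checksumSize == 0) {
            return chunksByData; // NULL checksum: no CRC data to cross-check
        }
        return Math.min(chunksByData, (metaFileLen - crcHeaderLen) / checksumSize);
    }
}
```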
[jira] [Updated] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5761: - Summary: DataNode fails to validate integrity for checksum type NULL when DataNode recovers (was: DataNode fail to validate integrity for checksum type NULL when DataNode recovers) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5761: - Attachment: HDFS-5761.patch I've attached a patch for this issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5761: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869719#comment-13869719 ] Uma Maheswara Rao G commented on HDFS-5761: --- Thanks for filing a JIRA. I noticed this when I was looking at HDFS-5728. Actually, the integrity validation check is not necessary when the checksum type is set to NULL; it should consider the full file length as-is. I think the array below becomes a 0-length array when checksumSize is 0? {code} byte[] buf = new byte[lastChunkSize+checksumSize]; {code} So, how about just considering blockFileLength when the CRC type is NULL? Since the CRC is null, we need not care about the integrity check against the CRC file at all, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
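The alternative suggested in this comment (skip CRC-based validation entirely for the NULL checksum type, rather than just guarding the division) might look roughly like this; the class and method names are invented for illustration and are not the actual BlockPoolSlice code:

```java
// Illustrative sketch of the comment's suggestion; not the real code.
class NullChecksumValidationSketch {

    // Number of trustworthy bytes in a recovered block file. With a NULL
    // checksum type there is no CRC to validate against, so the whole data
    // file is accepted as-is instead of computing chunk counts at all.
    static long validatedLength(long blockFileLen, int checksumSize,
                                long lastChecksummedOffset) {
        if (checksumSize == 0) {
            return blockFileLen; // NULL checksum: skip CRC-based validation
        }
        // Normal path: trust only bytes covered by stored checksums.
        return Math.min(blockFileLen, lastChecksummedOffset);
    }
}
```

This also avoids the zero-length {{buf}} array mentioned above, since the buffer is only needed on the CRC-validation path.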
[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869791#comment-13869791 ] Jing Zhao commented on HDFS-5710: - +1 patch looks good to me. I will commit it shortly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5477) Block manager as a service
[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869798#comment-13869798 ] Colin Patrick McCabe commented on HDFS-5477: Hi Daryn, This seems like a great direction for HDFS to go in. Just a few comments. You list scalability as a primary concern. However, even if we separate the BM from the namespace management, a cluster with a large number of blocks will still have a giant BM heap (if I understand correctly). So perhaps what we need is the ability to have multiple block manager daemons? It seems like there will be a lot of messages that will necessarily flow between the namespace daemon and the block management daemon(s). What IPC mechanism are you considering? TCP socket? UNIX domain socket? Shared memory? Shared memory would clearly be the highest performance, and perhaps we should consider that. Is there an upstream svn branch for this yet? Block manager as a service -- Key: HDFS-5477 URL: https://issues.apache.org/jira/browse/HDFS-5477 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, Standalone BM.pdf The block manager needs to evolve towards having the ability to run as a standalone service to improve NN vertical and horizontal scalability. The goal is reducing the memory footprint of the NN proper to support larger namespaces, and improve overall performance by decoupling the block manager from the namespace and its lock. Ideally, a distinct BM will be transparent to clients and DNs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869803#comment-13869803 ] Jing Zhao commented on HDFS-5738: - # The current 002 patch still cannot be compiled. Looks like you are missing the following changes in the patch:
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/n
index 344a6a0..18dd768 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
@@ -195,6 +195,7 @@ private void loadINode(InputStream in, FileHeader.Section header)
   static final class Saver {
     final SaveNamespaceContext context;
     private MD5Hash savedDigest;
+    private long currentOffset = PRE_ALLOCATED_HEADER_SIZE;

     Saver(SaveNamespaceContext context) {
       this.context = context;
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto b/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
index 5df8fd1..0855102 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
@@ -55,6 +55,7 @@ message FileHeader {
   message Section {
     optional string name = 1;
     optional uint64 length = 2;
+    optional uint64 offset = 3;
   }
   repeated Section sections = 5;
 }
{code}
# In the meanwhile, for non-snapshot information, we also need to handle FileUnderConstruction information. This can be handled in either this jira or a separate jira (such as HDFS-5743). # The section index information may be moved to the end of the fsimage as a footer? This can simplify the current code and avoid the 1KB allocation. This is optional and we can continue improving the protobuf definition in new jiras.
Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5761) DataNode fails to validate integrity for checksum type NULL when DataNode recovers
[ https://issues.apache.org/jira/browse/HDFS-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869813#comment-13869813 ] Hadoop QA commented on HDFS-5761: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622648/HDFS-5761.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5865//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5865//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5710: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks to Ted for the report and thank you Uma for the fix! I've committed this to trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869822#comment-13869822 ] Hudson commented on HDFS-5710: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4992 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4992/]) HDFS-5710. FSDirectory#getFullPathName should check inodes against null. Contributed by Uma Maheswara Rao G. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1557803) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HDFS-4239: - Assignee: Jimmy Xiang Means of telling the datanode to stop using a sick disk --- Key: HDFS-4239 URL: https://issues.apache.org/jira/browse/HDFS-4239 Project: Hadoop HDFS Issue Type: Improvement Reporter: stack Assignee: Jimmy Xiang If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally, or just exhibiting high latency -- your choices are: 1. Decommission the entire datanode. If the datanode is carrying 6 or 12 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low-latency serving: e.g. hosting an HBase cluster. 2. Stop the datanode, unmount the bad disk, and restart the datanode (you can't unmount the disk while it is in use). The latter is better in that only the bad disk's data is rereplicated, not all of the datanode's data. Is it possible to do better, say, send the datanode a signal to tell it to stop using a disk an operator has designated 'bad'? This would be like option #2 above minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while. Nice to have would be being able to tell the datanode to resume using a disk after it's been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5760) Fix HttpServer construct
[ https://issues.apache.org/jira/browse/HDFS-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869877#comment-13869877 ] Haohui Mai commented on HDFS-5760: -- [~echarles], the first approach has been deprecated for a while. These methods will be removed from trunk shortly. Can you please fix HBase instead? Fix HttpServer construct Key: HDFS-5760 URL: https://issues.apache.org/jira/browse/HDFS-5760 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eric Charles Attachments: HDFS-5760-1.patch o.a.h.h.HttpServer can can be instanciated and configured: 1. Via classical constructor 2. Via static build method Those 2 methods don't populate the same way the (deprecated) hostname and port, nor the jetty Connector. This gives issue when using hbase on hadoop3 (HBASE-6581) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5612) NameNode: change all permission checks to enforce ACLs in addition to permissions.
[ https://issues.apache.org/jira/browse/HDFS-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5612: Attachment: HDFS-5612.3.patch I'm attaching version 3 of this patch. This version has been updated in reaction to the recent changes on HDFS-5758. {{FSPermissionChecker}} has been updated to pull the relevant pieces of the whole logical ACL from either {{FsPermission}} or the list of {{AclEntry}}. I've also added comments to document the invariants described earlier. Overall, the HDFS-5758 changes didn't add that much complexity to {{FSPermissionChecker}}, so I'm satisfied with the end result. NameNode: change all permission checks to enforce ACLs in addition to permissions. -- Key: HDFS-5612 URL: https://issues.apache.org/jira/browse/HDFS-5612 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5612.1.patch, HDFS-5612.2.patch, HDFS-5612.3.patch All {{NameNode}} code paths that enforce permissions must be updated so that they also enforce ACLs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1387#comment-1387 ] Eric Sirianni commented on HDFS-5318: - bq. 1. Block is finalized, r/w replica is lost, r/o replica is available. In this case the existing NN replication mechanisms will cause an extra replica to be created Isn't this case equivalent to the case where the R/W replica is offline in general (i.e. not just for pipeline recovery)? bq. q. what happens if a client attempts to append before the replication happens? Independent of how replicas are counted, whenever a R/W replica is offline, appends will not be possible (in the current implementation) until a new R/W replica is created (via inter-datanode replication from a R/O replica). Are you proposing a solution to this (ability to create an append pipeline from only R/O replicas)? bq. 4. Client should be able to bootstrap a write pipeline with read-only replicas. Not sure I fully understand here. Is this how you envision solving the append problem when R/W replica is offline? Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. 
I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}} s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}} s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
* {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks)
** → Block B has {{ReplicationCount == 2}}
* {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share)
** → Block B has {{ReplicationCount == 1}}
For example, if block B has the following location tuples:
* {{DN_1, STORAGE_A}}
* {{DN_2, STORAGE_A}}
* {{DN_3, STORAGE_B}}
* {{DN_4, STORAGE_B}}
the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
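The distinct-{{StorageID}} counting rule described above can be sketched as follows; the String pair here is a simplified stand-in for the real (DatanodeID, StorageID) location tuple, not the actual Namenode data structures:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the proposed rule: physical replicas of a block = number of
// distinct StorageIDs among its reported locations. Simplified types only.
class ReplicaCountSketch {

    // Each location is a {datanodeId, storageId} pair.
    static int physicalReplicas(List<String[]> locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] dnAndStorage : locations) {
            storageIds.add(dnAndStorage[1]); // index 1 = StorageID
        }
        return storageIds.size();
    }
}
```

Applied to the example tuples above (DN_1/STORAGE_A, DN_2/STORAGE_A, DN_3/STORAGE_B, DN_4/STORAGE_B), this yields 2 rather than the 4 that location counting would report.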
[jira] [Commented] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeSpi interfaces
[ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870043#comment-13870043 ] David Powell commented on HDFS-5751: Alas, I am intimately familiar with the reimplementation necessary, and wish there were less of it to do and to maintain. That said, precluding alternate implementations because creating one would require more than the ideal amount of work feels like throwing the baby out with the bathwater. Moving the abstraction lower is along the lines of what I had in mind when I suggested the middle ground of changes that reduce mainline maintenance burden while preserving a usable interface for others. I think the lower surface of the official FsDatasetImpl is far too low, however, and that comparing HDFS with ext3fs both underestimates the complexity and modularity of HDFS and overestimates the versatility of the simple interface a traditional filesystem consumes. Which is to say, I think there is a class of problems which would lead one to replace a traditional filesystem entirely, but could be solved much more elegantly in HDFS given its components' architectural separation. Remove the FsDatasetSpi and FsVolumeSpi interfaces -- Key: HDFS-5751 URL: https://issues.apache.org/jira/browse/HDFS-5751 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, test Affects Versions: 3.0.0 Reporter: Arpit Agarwal The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes. The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity.
A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}. However, there are a few problems with this approach: # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug. # There is the additional burden of maintaining two different dataset implementations. # Fidelity between the two implementations is poor. Instead, we can eliminate the SPIs and just hide the disk read/write routines behind a dependency injection framework like Google Guice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
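The dependency-injection alternative can be sketched in plain Java. This is illustrative only: the JIRA suggests Guice would supply the binding, and the interface and class names below are hypothetical, not Hadoop classes:

```java
import java.util.Arrays;

// Hypothetical sketch: hide only the disk read/write routines behind a
// small interface, so a test can inject a zero-filled in-memory version
// (the role SimulatedFSDataset plays today) without a full SPI layer.
public class DatasetSketch {
    interface BlockIO {
        byte[] readBlock(long blockId, int len);
    }

    // Production binding: would perform real disk I/O (stubbed here).
    static class DiskBlockIO implements BlockIO {
        public byte[] readBlock(long blockId, int len) {
            throw new UnsupportedOperationException("real disk I/O");
        }
    }

    // Test binding: returns zeroes for all reads, like SimulatedFSDataset.
    static class ZeroBlockIO implements BlockIO {
        public byte[] readBlock(long blockId, int len) {
            return new byte[len]; // all zeroes
        }
    }

    final BlockIO io; // injected (e.g. by Guice) instead of a factory class

    DatasetSketch(BlockIO io) {
        this.io = io;
    }

    public static void main(String[] args) {
        DatasetSketch ds = new DatasetSketch(new ZeroBlockIO());
        System.out.println(Arrays.equals(ds.io.readBlock(1L, 4), new byte[4]));
    }
}
```

The common-case code path stays straight-line: the DataNode constructs its dataset with whatever {{BlockIO}} the injector provides, rather than routing through an SPI factory.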
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870054#comment-13870054 ] Haohui Mai commented on HDFS-5738: -- Thanks Jing for the review. The v3 patch changes the following: # It moves the file header to the end of the fsimage. # It adds an entry in the INodeDirectorySection only if the corresponding directory has children. # Minor refactoring and cleanups. I plan to handle FileUnderConstruction in a separate jira. Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch, HDFS-5738.003.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.003.patch Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch, HDFS-5738.003.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5741) BlockInfo#findDataNode can be removed
[ https://issues.apache.org/jira/browse/HDFS-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5741: Summary: BlockInfo#findDataNode can be removed (was: BlockInfo#findDataNode can be deprecated) BlockInfo#findDataNode can be removed - Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor NN now tracks replicas by storage, so {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}}. {{BlockManager#reportDiff}} is being fixed as part of HDFS-5483, this Jira is to fix the rest of the callers. [suggested by [~sirianni] on HDFS-5483] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
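The replacement described in HDFS-5741 can be illustrated with simplified types. This sketch is not the real {{BlockInfo}} code; the classes below are invented stand-ins:

```java
import java.util.List;

// Illustrative only (simplified types, not the real BlockInfo): with
// per-storage replica tracking, locating a datanode's replica of a block
// means finding that node's storage entry, so findDataNode is redundant.
public class StorageLookup {
    static class StorageInfo {
        final String datanodeId, storageId;
        StorageInfo(String dn, String sid) {
            datanodeId = dn;
            storageId = sid;
        }
    }

    // Analogous in spirit to BlockInfo#findStorageInfo: scan the block's
    // storage list for the one attached to the given datanode.
    static StorageInfo findStorageInfo(List<StorageInfo> storages, String dn) {
        for (StorageInfo s : storages) {
            if (s.datanodeId.equals(dn)) {
                return s; // the replica's storage on this datanode
            }
        }
        return null; // this datanode holds no replica of the block
    }
}
```

Callers that previously asked "which datanode holds this replica?" now ask "which storage on that datanode holds it?", and the datanode is recoverable from the storage entry.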
[jira] [Commented] (HDFS-5741) BlockInfo#findDataNode can be removed
[ https://issues.apache.org/jira/browse/HDFS-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870085#comment-13870085 ] Arpit Agarwal commented on HDFS-5741: - Thanks, edited description. BlockInfo#findDataNode can be removed - Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor NN now tracks replicas by storage, so {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}}. {{BlockManager#reportDiff}} is being fixed as part of HDFS-5483, this Jira is to fix the rest of the callers. [suggested by [~sirianni] on HDFS-5483] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870079#comment-13870079 ] Jing Zhao commented on HDFS-5579: - The javadoc warning and TestSafeMode failure should be unrelated. I will commit the patch shortly. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that sometimes decommissioning DataNodes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager:computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks that belong to under-construction files; however, in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), any block needing replication keeps the decommission in progress, whether or not it belongs to an under-construction file. That's why the decommission sometimes takes a very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
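The mismatch HDFS-5579 describes can be modeled in a few lines. This is a hypothetical simplification, not the real {{BlockManager}} code: replication work skips blocks of under-construction files, but the in-progress check does not, so a node whose only under-replicated blocks belong to open files never finishes decommissioning:

```java
import java.util.List;

// Hypothetical model of the HDFS-5579 inconsistency (simplified types).
public class DecommissionSketch {
    static class Block {
        final boolean underConstruction;
        final int liveReplicas, expectedReplicas;
        Block(boolean uc, int live, int expected) {
            underConstruction = uc;
            liveReplicas = live;
            expectedReplicas = expected;
        }
        boolean underReplicated() { return liveReplicas < expectedReplicas; }
    }

    // computeReplicationWorkForBlocks never schedules UC blocks.
    static boolean willSchedule(Block b) {
        return !b.underConstruction && b.underReplicated();
    }

    // Buggy isReplicationInProgress: counts UC blocks too, so decommission
    // can wait forever on blocks that will never be scheduled.
    static boolean inProgressBuggy(List<Block> blocks) {
        for (Block b : blocks) {
            if (b.underReplicated()) return true;
        }
        return false;
    }

    // Consistent check: only count blocks replication would actually schedule.
    static boolean inProgressFixed(List<Block> blocks) {
        for (Block b : blocks) {
            if (willSchedule(b)) return true;
        }
        return false;
    }
}
```

For an under-replicated block of an open file, {{inProgressBuggy}} reports progress forever while {{willSchedule}} is false, which is the stuck state the reporter observed; making the two checks agree is the shape of the fix.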
[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5138: - Attachment: HDFS-5138.patch {quote} + // This is expected to happen for a stanby NN. Typo (standby) {quote} Thanks, fixed. {quote} + // Either they all return the same thing or this call fails, so we can + // just return the first result. Would be good to assert that - eg in case one of the JNs crashed in the middle of a previously attempted upgrade sequence. {quote} Sure, done. {quote} * @param useLock true - enables locking on the storage directory and false * disables locking + * @param isShared whether or not this dir is shared between two NNs. true + * enables locking on the storage directory, false disables locking I think this doc is now wrong because you inverted the sense of these booleans - we don't lock the shared dir. {quote} Good catch. Fixed. {quote} + public synchronized void doFinalizeOfSharedLog() throws IOException { + public synchronized boolean canRollBackSharedLog(Storage prevStorage, Style nit: extra space in the above two methods {quote} Fixed. {quote} + if (!sd.isShared()) { + // This will be done on transition to active. Worth a LOG.info or even warn here {quote} Added the following: {code} LOG.info("Not doing recovery on " + sd + " now. Will be done on" + " transition to active."); {code} bq. Currently it seems like whichever SBN starts up first has to be the one who does the transition to active. Maybe a follow-up JIRA could be to relax that constraint? Seems like it should be fine for either one of the NNs to actually do the upgrade - the lock file is just to make sure they agree on the target ctime. Agree this seems like a good idea, but agree it can reasonably be done in a follow-up JIRA. If you agree, I'll file it when we commit this one. {quote} + 'dfsadmin -finalizeUpgrade' command while the NNs are running and one of them + is active.
The active NN at the time this happens will perform the upgrade of + the shared log, and both of the NNs will finalize the upgrade in their local I think here you mean the finalization of the shared log {quote} Sure did. Fixed. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5752) Add a new DFSAdminCommand for rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5752: - Attachment: h5752_20140114.patch h5752_20140114.patch: adds cli usage. Add a new DFSAdminCommand for rolling upgrade - Key: HDFS-5752 URL: https://issues.apache.org/jira/browse/HDFS-5752 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5752_20140112.patch, h5752_20140114.patch We need to add a new DFSAdmin command to start, finalize, and query rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.
[ https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870172#comment-13870172 ] Chris Nauroth commented on HDFS-5608: - [~sachinjose2...@gmail.com], thank you for addressing the feedback. Here are some additional comments based on the new patch: # {{DFSConfigKeys}}: The regex does not allow for default ACL entries. Basically, these look identical to the access entries but have default: prepended. Also, the regex only allows entries of type user or group, so it would reject the mask and other entries. Also, the regex does not allow for an ACL entry that does not have a permission. For {{removeAclEntries}}, the user supplies an ACL spec with no permissions in the entries. You might want to take a look at the CLI implementation and HADOOP-10213 for more examples of this. # {{JsonUtil#toJsonString}}: I'm curious if we can skip the manual conversion to {{entriesStringlist}} and just pass the {{List<AclEntry>}} directly. Will the JSON conversion automatically use the {{toString}} representation? If this conversion really is necessary, then a small improvement would be to use Guava's {{Lists.newArrayListWithCapacity}} and pass {{entries.size()}} for the capacity to prevent allocating too large of an array or causing an immediate reallocation if the default initial array size turns out to be too small. # {{JsonUtil#toAclStatus}}: Some of my earlier comments about the regex are applicable here too. This method needs to be able to handle default ACL entries and call {{setScope}} on the builder. It needs to be able to handle the mask entry. Use other instead of others (singular, not plural). # {{AclPermissionParam#DEFAULT}}: Was the default value meant to be empty string? I don't think there is any specific ACL value that we could choose as the default, because it could risk accidentally expanding access if the caller forgets to provide the query parameter.
# {{AclPermissionParam#parseAclSpec}}: This is another spot where the earlier feedback on the regex has an impact. This logic is very similar to {{AclCommands#SetfaclCommand#parseAclSpec}}. Can we work out a way for both the CLI and WebHDFS to use a common method? The parsing logic would be the same for both. # {{PutOpParam}}: I think {{GETACLS}} needs to be removed. # Nitpick: the Hadoop project code standard wraps lines at 80 characters, indents code blocks by 2 spaces, and uses spaces (not tabs) for indentation. There are a few places in the patch that need to be converted to this standard. WebHDFS: implement GETACLSTATUS and SETACL. --- Key: HDFS-5608 URL: https://issues.apache.org/jira/browse/HDFS-5608 Project: Hadoop HDFS Issue Type: Sub-task Components: webhdfs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Sachin Jose Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
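The regex points in the review above can be sketched concretely. This is NOT the committed {{DFSConfigKeys}} value, just a hypothetical pattern covering the cases the reviewer lists: an optional default: scope, all four entry types (user, group, mask, other), an optional name, and an optional permission field (absent permissions are what {{removeAclEntries}} supplies):

```java
import java.util.regex.Pattern;

// Hypothetical ACL-entry validation regex sketch, addressing the review
// feedback above (not Hadoop's actual pattern).
public class AclEntryPattern {
    static final Pattern ACL_ENTRY = Pattern.compile(
        "^(default:)?(user|group|mask|other):[^:]*(:([rwx-]{3})?)?$");

    static boolean isValid(String entry) {
        return ACL_ENTRY.matcher(entry).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("default:user:alice:rwx")); // default scope
        System.out.println(isValid("mask::r-x"));              // mask entry
        System.out.println(isValid("user:bob"));               // no permission
        System.out.println(isValid("owner:bob:rwx"));          // invalid type
    }
}
```

The original regex (per the review) would reject the first three of these and accept only user/group access entries with permissions.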
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870192#comment-13870192 ] Arpit Agarwal commented on HDFS-5318: - Yes, we must support starting a pipeline from read-only replicas when the r/w replica is offline, else append will be broken. One way to do it is to generate a copy of the replica on a read-write storage and then kick off the pipeline. There is some precedent for doing so ({{DFSOutputStream#addDatanode2ExistingPipeline}}). Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}}s to be pluggable behind the {{FsDatasetSpi}} interface.
With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}}s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: * {{DN_A != DN_B S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
Colin Patrick McCabe created HDFS-5762: -- Summary: BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that client can determine whether the file is at an end by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
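The contract HDFS-5762 asks for can be sketched as follows. This is a hypothetical simplification, not the real {{BlockReaderLocal}}: the point is that a zero-length read must still report EOF with -1 when the position is at or past the end of the block:

```java
// Hypothetical sketch of the desired read contract (simplified; not the
// actual BlockReaderLocal implementation).
public class EofRead {
    static int read(byte[] buf, int off, int len, long pos, long blockLen) {
        if (pos >= blockLen) {
            return -1; // at EOF: signal it even when len == 0
        }
        if (len == 0) {
            return 0;  // mid-file zero-length read transfers nothing
        }
        int n = (int) Math.min(len, blockLen - pos);
        // ... a real reader would copy n bytes into buf here ...
        return n;
    }
}
```

With this ordering of checks, a libhdfs-style probe (zero-length read, test for -1) distinguishes "at EOF" from "more data available" without transferring any bytes.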
[jira] [Work started] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5762 started by Colin Patrick McCabe. BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that clients can determine whether the file is at EOF by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5762: --- Attachment: HDFS-5762.001.patch BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that clients can determine whether the file is at EOF by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5762: --- Target Version/s: 2.4.0 Status: Patch Available (was: In Progress) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. BlockReaderLocal should do this, so that clients can determine whether the file is at EOF by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a 0-length read to determine if direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870242#comment-13870242 ] Konstantin Shvachko commented on HDFS-5138: --- Aaron, I understand you made -rollback an offline operation for NN, which works as -format. That is, NN makes changes in the directory structure and shuts down. How will that work with DataNodes? They also need to be started with -rollback in order to roll back to the old state. In the current world you just call {{start-hdfs -rollback}} and the cluster is up and running with the previous software version and the previous data. What is the procedure in your edition? Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870258#comment-13870258 ] Aaron T. Myers commented on HDFS-5138: -- Hi Konst, thanks for bringing this up - I should've mentioned it. The DN rollback procedure is left unchanged by this patch, so you just start up the DNs with the '-rollback' option as before. When the DN registers with an NN which has already been rolled back, the DN will perform rollback of its data dirs just like normal, i.e. all that matters is that the NN has already rolled back, not whether or not the running NN was started with the '-rollback' option. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5194) Robust support for alternate FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870291#comment-13870291 ] Arpit Agarwal commented on HDFS-5194: - David, comprehensive doc - you've done a lot of work on this. I skimmed through it (won't have time to read it in detail until next week). It would be useful to have a high level requirements or use cases section. The one requirement that jumped out from a quick read was the need to support non file-addressable stores. Robust support for alternate FsDatasetSpi implementations - Key: HDFS-5194 URL: https://issues.apache.org/jira/browse/HDFS-5194 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client Reporter: David Powell Priority: Minor Attachments: HDFS-5194.design.09112013.pdf, HDFS-5194.patch.09112013 The existing FsDatasetSpi interface is well-positioned to permit extending Hadoop to run natively on non-traditional storage architectures. Before this can be done, however, a number of gaps need to be addressed. This JIRA documents those gaps, suggests some solutions, and puts forth a sample implementation of some of the key changes needed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870294#comment-13870294 ] Hadoop QA commented on HDFS-5138: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622729/HDFS-5138.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5866//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5866//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5866//console This message is automatically generated. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. 
Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and DN's upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5579: Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for the contribution [~zhaoyunjiong]! Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 2.4.0 Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that sometimes decommissioning DataNodes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager:computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks that belong to under-construction files; however, in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), any block needing replication keeps the decommission in progress, whether or not it belongs to an under-construction file. That's why the decommission sometimes takes a very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870311#comment-13870311 ] Hudson commented on HDFS-5579: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4993 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4993/]) HDFS-5579. Under construction files make DataNode decommission take very long hours. Contributed by zhaoyunjiong. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1557904) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockCollection.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870341#comment-13870341 ] Konstantin Shvachko commented on HDFS-5535: --- Thanks for the design doc, guys. A few questions. (Quotations from the document are in italic) # ??The total time required to upgrade a cluster MUST not exceed #Nodes_in_cluster * 10 seconds.?? Not sure I understood how you measure the time to upgrade. Administrators should be able to spend as much time as they need. On the other hand, I can write a script that calls upgrade commands in sequence, then push a button and the upgrade is done for me. Just trying to understand the meaning of the requirement. # ??During upgrade or downgrade, no data loss MUST occur.?? Not clear what this means in case a bug in the new software led to a loss of data. Probably meant to say that the old software should be able to support whatever state of the file system is left after the upgrade experiment was terminated? # Does finalize require a checkpoint in the design? # ??For rollback, NN read editlog in startup as usual. It stops at the marker position, writes the fsimage back to disk and then discards the editlog.?? What happens if the editlog is corrupted by the new software and the marker is not recognizable? Maybe it needs to roll edits in some special way to indicate the start of the rolling upgrade? # ??Software version is the version of the running software. In the current rolling upgrade mechanism?? What is the current rolling upgrade mechanism? It would make more sense to me if the word current were removed from the above phrase. # What is MTTR? # Looks like Lite-Decom and “Optimizing DN Restart time” are competing proposals. Which one do you actually propose? It sounds like both are still being designed? The last question is because this seems to be the most intricate part of the issue.
Conceptually rolling upgrades are possible with a simple patch, which eliminates the Software Version verification, plus very careful cluster administration, of course. And the trick indeed is to avoid client failures so that HBase and other apps could run during the upgrade. Umbrella jira for improved HDFS rolling upgrades Key: HDFS-5535 URL: https://issues.apache.org/jira/browse/HDFS-5535 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, ha, hdfs-client, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Nathan Roberts Attachments: HDFSRollingUpgradesHighLevelDesign.pdf In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial High level design document will be attached to this jira, and sub-jiras will itemize the individual tasks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870352#comment-13870352 ] Konstantin Shvachko commented on HDFS-5138: --- This is less intuitive than the current state of the art, because after NN rollback you need to start the NameNode as -regular, while DataNodes start with the -rollback startup option. Also just mentioning there could be some collisions with the rolling upgrade design, which I just finished reading. I think HDFS-5535 assumes the current (pre-your-patch) behaviours of -rollback and -finalize. For -finalize the problem could be that you remove it as a startup option. Maybe Suresh can elaborate better on this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5762) BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
[ https://issues.apache.org/jira/browse/HDFS-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870353#comment-13870353 ] Hadoop QA commented on HDFS-5762: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622753/HDFS-5762.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5867//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5867//console This message is automatically generated. BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads -- Key: HDFS-5762 URL: https://issues.apache.org/jira/browse/HDFS-5762 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5762.001.patch Unlike the other block readers, BlockReaderLocal currently doesn't return -1 on EOF when doing zero-length reads. This behavior, in turn, propagates to the DFSInputStream. 
BlockReaderLocal should do this, so that clients can determine whether they are at the end of the file by doing a zero-length read and checking for -1. One place this shows up is in libhdfs, which does such a zero-length read to determine whether direct (i.e., ByteBuffer) reads are supported. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
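The contract the report asks for can be sketched like this; the class and fields are illustrative, not the real BlockReaderLocal:

```java
// Hedged sketch of the read semantics HDFS-5762 wants: a zero-length read
// returns -1 once the reader is positioned at EOF, and 0 otherwise, matching
// the other block readers. Illustrative names only.
class ZeroLengthReadSketch {
    private long pos;          // current position within the block
    private final long length; // total readable length

    ZeroLengthReadSketch(long length) { this.length = length; }

    int read(byte[] buf, int off, int len) {
        if (pos >= length) {
            return -1; // signal EOF even when len == 0
        }
        if (len == 0) {
            return 0;  // nothing requested and not yet at EOF
        }
        int n = (int) Math.min(len, length - pos);
        pos += n;      // a real reader would copy n bytes into buf here
        return n;
    }
}
```

A caller such as libhdfs can then probe for EOF with `read(buf, 0, 0)` and branch on the -1.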
[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5138: - Attachment: HDFS-5138.patch Attached patch adds an exclude directive for the findbugs warning. It was about later loading a variable which we had previously confirmed was null, but all we're doing is checking it for equality against another value which may also reasonably be null. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5763) Service ACL not refresh on both ANN and SNN
Fengdong Yu created HDFS-5763: - Summary: Service ACL not refresh on both ANN and SNN Key: HDFS-5763 URL: https://issues.apache.org/jira/browse/HDFS-5763 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Fengdong Yu Configured hadoop-policy.xml on the active NN, then ran: hdfs dfsadmin -refreshServiceAcl, but the service ACL was refreshed only on the standby NN or the active NN, not both. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5764) GSetByHashMap breaks contract of GSet
Hiroshi Ikeda created HDFS-5764: --- Summary: GSetByHashMap breaks contract of GSet Key: HDFS-5764 URL: https://issues.apache.org/jira/browse/HDFS-5764 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Hiroshi Ikeda Priority: Trivial The contract of GSet says it is ensured to throw NullPointerException if a given argument is null for many methods, but GSetByHashMap doesn't. I think just writing non-null preconditions into GSet is all that is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
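The missing precondition can be sketched as a thin wrapper; GSetSketch is a hypothetical stand-in, not the real GSetByHashMap:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the non-null precondition the GSet contract promises: throw
// NullPointerException on null arguments, which a bare HashMap delegate
// does not do for containsKey. Illustrative class, not the real HDFS one.
class GSetSketch<K, E extends K> {
    private final Map<K, E> map = new HashMap<>();

    E put(E element) {
        if (element == null) {
            throw new NullPointerException("element == null");
        }
        return map.put(element, element);
    }

    boolean contains(K key) {
        if (key == null) {
            throw new NullPointerException("key == null");
        }
        return map.containsKey(key);
    }
}
```

Note that `HashMap.containsKey(null)` silently returns false, which is exactly why a plain delegate breaks the documented contract.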
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870385#comment-13870385 ] zhaoyunjiong commented on HDFS-5579: Thanks for your time to review the patch, Jing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5764) GSetByHashMap breaks contract of GSet
[ https://issues.apache.org/jira/browse/HDFS-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870397#comment-13870397 ] Hiroshi Ikeda commented on HDFS-5764: - Sorry, just after searching I realized I created the issue in the wrong place. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5764) GSetByHashMap breaks contract of GSet
[ https://issues.apache.org/jira/browse/HDFS-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda resolved HDFS-5764. - Resolution: Invalid -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870400#comment-13870400 ] Akira AJISAKA commented on HDFS-4922: - Thanks for updating! Minor comment: {code} + Local block reader maintains a chunk buffer, This controls the maximum chunks + can be filled in the chunk buffer for each read. + The buffer size was specified in bytes, but + It would be better to be integral multiple of dfs.bytes-per-checksum {code} Some capital letters should be converted to lower case. The fix below looks good to me. {code} + Local block reader maintains a chunk buffer, this controls the maximum chunks + can be filled in the chunk buffer for each read. + The buffer size was specified in bytes. It would be better to be integral + multiple of dfs.bytes-per-checksum for better performance. {code} Also, these parameters are currently not described in hdfs-default.xml. Would you add them? Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922.patch Explain the default value and add one configuration key which doesn't show up in the document but exists in the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
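The "integral multiple of dfs.bytes-per-checksum" advice might look like the following hdfs-site.xml fragment; the buffer-size property name below is an assumption for illustration, since the comment does not quote it:

```xml
<!-- Hypothetical hdfs-site.xml fragment: choose a short-circuit chunk-buffer
     size that is an integral multiple of dfs.bytes-per-checksum (default
     512 bytes), e.g. 131072 = 256 * 512. Property name below is assumed. -->
<property>
  <name>dfs.bytes-per-checksum</name>
  <value>512</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <value>131072</value>
</property>
```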
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870408#comment-13870408 ] Todd Lipcon commented on HDFS-5138: --- +1 pending Jenkins results. Please don't forget to file the follow-up JIRA we discussed above. Thanks! -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870417#comment-13870417 ] Aaron T. Myers commented on HDFS-5138: -- Thanks for the comments, Konst. bq. This is less intuitive than the current state of the art. Because after NN rollback you need to start NameNode as -regular, while DataNodes with -rollback startup option. It's different, but it's not obvious to me that it's necessarily less intuitive. I've personally always found it a bit strange that to roll back you need to start the NN _once_ with the '-rollback' option, which will result in it doing some things at startup, and then starting up as normal. This might seem to imply that the NN is running in some sort of rollback mode, when in fact the act of rolling back has already completed, and thereafter you should always start the NN without the '-rollback' option. bq. Also just mentioning there could be some collisions with the rolling upgrade design, which I just finished reading. I think HDFS-5535 assumes current (pre-your-patch) behaviours of -rollback and -finalize. For -finalize the problem could be that you remove it as a start up option. May be Suresh can elaborate better on this. Needing to roll back should (hopefully!) be such a rare occurrence that it doesn't seem unreasonable to me to not do that in a rolling way. Removing the '-finalize' startup option, I would think, should make the whole thing easier; the option doesn't seem to me to have any benefits vs. just using the finalizeUpgrade RPC. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.004.patch Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch, HDFS-5738.002.patch, HDFS-5738.003.patch, HDFS-5738.004.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information are out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870454#comment-13870454 ] Haohui Mai commented on HDFS-5738: -- The v4 patch changes FileHeader into FileSummary. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5752) Add a new DFSAdminCommand for rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870460#comment-13870460 ] Jing Zhao commented on HDFS-5752: - The patch looks good to me. Some minor comments: # We may want to add more javadoc for ClientProtocol#rollingUpgrade # Since ClientProtocol#rollingUpgrade returns long, shall we also let DistributedFileSystem#rollingUpgrade and DFSClient#rollingUpgrade return long? +1 after addressing the comments. Add a new DFSAdminCommand for rolling upgrade - Key: HDFS-5752 URL: https://issues.apache.org/jira/browse/HDFS-5752 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5752_20140112.patch, h5752_20140114.patch We need to add a new DFSAdmin command to start, finalize and query rolling upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
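One way the three actions named in the summary ("start, finalize and query") might be parsed by such a subcommand; the enum and method below are illustrative, not taken from the patch:

```java
// Hypothetical argument handling for a "-rollingUpgrade" DFSAdmin subcommand
// (HDFS-5752). Action names come from the jira summary; everything else here
// is an illustrative sketch, not the patch's actual code.
class RollingUpgradeArgSketch {
    enum Action { QUERY, START, FINALIZE }

    static Action parse(String arg) {
        if (arg == null || arg.isEmpty()) {
            return Action.QUERY; // assume a bare subcommand just queries status
        }
        switch (arg.toLowerCase()) {
            case "query":    return Action.QUERY;
            case "start":    return Action.START;
            case "finalize": return Action.FINALIZE;
            default:
                throw new IllegalArgumentException("Unknown action: " + arg);
        }
    }
}
```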
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870462#comment-13870462 ] Hadoop QA commented on HDFS-5138: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622787/HDFS-5138.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal: org.apache.hadoop.hdfs.TestClientReportBadBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5868//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5868//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated HDFS-4922: -- Attachment: HDFS-4922-006.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870474#comment-13870474 ] Fengdong Yu commented on HDFS-4922: --- Hi [~ajisakaa], I refreshed the patch, and I'll file a separate jira to describe these parameters in hdfs-default.xml. Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870484#comment-13870484 ] Jing Zhao commented on HDFS-5738: - The v4 patch looks great to me. One nit: we should use a different class for the following code {code} private static final Log LOG = LogFactory .getLog(DelegationTokenSecretManager.class); {code} Besides, the current digest computation can only be understood by the saver and the loader. Maybe we should consider pulling the whole digest computation process out of the FSImage saving/loading procedure, and do this computation based on the complete FSImage file. But this can be discussed in a separate jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
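The suggestion of computing the digest over the complete FSImage file, rather than inside the save/load path, could look roughly like this; MD5 is used only as a familiar example digest, and the helper is hypothetical:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of digesting a complete serialized file as one byte stream, so any
// external tool can verify it without understanding the saver/loader. The
// class and method are illustrative, not HDFS code.
class WholeFileDigestSketch {
    static String md5Hex(byte[] fileBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(fileBytes);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is always available in the JDK", e);
        }
    }
}
```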