[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865202#comment-13865202 ]

zhaoyunjiong commented on HDFS-5579:
------------------------------------

It's already in the patch:
{code}
+    if (bc.isUnderConstruction()) {
+      if (block.equals(bc.getLastBlock()) && curReplicas > minReplication) {
+        continue;
+      }
+      underReplicatedInOpenFiles++;
+    }
{code}

Under construction files make DataNode decommission take very long hours
------------------------------------------------------------------------

                 Key: HDFS-5579
                 URL: https://issues.apache.org/jira/browse/HDFS-5579
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 1.2.0, 2.2.0
            Reporter: zhaoyunjiong
            Assignee: zhaoyunjiong
         Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch

We noticed that decommissioning DataNodes sometimes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager#computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) will not replicate blocks that belong to under-construction files, while BlockManager#isReplicationInProgress(DatanodeDescriptor srcNode) keeps the decommission in progress as long as any block still needs replication, whether or not it belongs to an under-construction file. That mismatch is why decommission sometimes takes a very long time.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
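The mismatch described in HDFS-5579 can be sketched as two predicates that disagree. This is a toy model, not the real BlockManager API: BlockInfo and the three methods below are invented for illustration, with the expected replication factor fixed at 3.

```java
// Toy model of the HDFS-5579 mismatch: the replication scheduler skips
// under-construction blocks, but the decommission check still waits on them.
public class DecommissionCheck {
    public static class BlockInfo {
        final boolean underConstruction; // block's file is still open for write
        final boolean lastBlock;         // last block of that open file
        final int curReplicas;
        final int minReplication;
        public BlockInfo(boolean uc, boolean last, int cur, int min) {
            underConstruction = uc; lastBlock = last;
            curReplicas = cur; minReplication = min;
        }
    }

    static final int EXPECTED_REPLICATION = 3; // illustrative default

    // What computeReplicationWorkForBlocks effectively does: it never
    // schedules work for blocks of under-construction files.
    public static boolean willScheduleReplication(BlockInfo b) {
        return !b.underConstruction && b.curReplicas < EXPECTED_REPLICATION;
    }

    // Before the patch: isReplicationInProgress counted every
    // under-replicated block, so decommission waited forever on blocks
    // the scheduler above would never touch.
    public static boolean decommissionWaitsBefore(BlockInfo b) {
        return b.curReplicas < EXPECTED_REPLICATION;
    }

    // After the patch: the last block of an open file that already has
    // more than minReplication replicas no longer blocks decommission.
    public static boolean decommissionWaitsAfter(BlockInfo b) {
        if (b.underConstruction && b.lastBlock
                && b.curReplicas > b.minReplication) {
            return false;
        }
        return b.curReplicas < EXPECTED_REPLICATION;
    }
}
```

For the last block of an open file with 2 of 3 replicas, the scheduler never acts, yet the old check keeps decommission waiting; the patched check lets it finish.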
[jira] [Created] (HDFS-5729) Lower chance to hit NPE in allocateNodeLocal
wenwupeng created HDFS-5729:
-------------------------------

             Summary: Lower chance to hit NPE in allocateNodeLocal
                 Key: HDFS-5729
                 URL: https://issues.apache.org/jira/browse/HDFS-5729
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: wenwupeng

We occasionally hit an NPE in allocateNodeLocal when running a benchmark (4 times out of 20 runs).

Steps:
1. Set up a Hadoop 2.2.0 environment.
2. Run:
{noformat}
for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 10; done
{noformat}

{noformat}
2014-01-08 03:56:14,082 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

Will attach log and configuration files later.

Note: my topology file:
{noformat}
10.111.89.230 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.231 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.232 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.239 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.233 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.234 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.240 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.236 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.241 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.238 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
10.111.89.242 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
{noformat}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
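The stack trace points at AppSchedulingInfo.allocateNodeLocal dereferencing a per-location request entry that may already be gone when the node heartbeat is processed. The following is a hedged, heavily simplified sketch of the defensive null-check pattern; NodeLocalGuard, its map, and its methods are invented stand-ins, not the real YARN scheduler code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a per-location outstanding-request table.
// Names and structure are illustrative only (hypothetical, not YARN's).
public class NodeLocalGuard {
    private final Map<String, Integer> outstanding = new HashMap<>();

    public void addRequest(String location, int count) {
        outstanding.merge(location, count, Integer::sum);
    }

    // Buggy shape: unboxing outstanding.get(location) throws NPE once the
    // entry was removed after all requests there were satisfied.
    // Guarded shape: treat a missing or drained entry as "nothing to do".
    public boolean allocateNodeLocal(String location) {
        Integer remaining = outstanding.get(location);
        if (remaining == null || remaining <= 0) {
            return false; // no outstanding request at this location
        }
        outstanding.put(location, remaining - 1);
        return true;
    }
}
```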
[jira] [Updated] (HDFS-5729) Lower chance to hit NPE in allocateNodeLocal
[ https://issues.apache.org/jira/browse/HDFS-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wenwupeng updated HDFS-5729:
----------------------------

    Attachment: log.tar.gz
                conf.tar.gz

Attaching the logs and configuration files.

Lower chance to hit NPE in allocateNodeLocal
--------------------------------------------

                 Key: HDFS-5729
                 URL: https://issues.apache.org/jira/browse/HDFS-5729
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: wenwupeng
         Attachments: conf.tar.gz, log.tar.gz

We occasionally hit an NPE in allocateNodeLocal when running a benchmark (4 times out of 20 runs).

Steps:
1. Set up a Hadoop 2.2.0 environment.
2. Run:
{noformat}
for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 10; done
{noformat}

{noformat}
2014-01-08 03:56:14,082 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

Will attach log and configuration files later.

Note: my topology file:
{noformat}
10.111.89.230 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.231 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.232 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.239 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.233 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.234 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.240 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.236 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.241 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.238 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
10.111.89.242 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
{noformat}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-4273) Fix some issue in DFSInputstream
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865220#comment-13865220 ]

Hadoop QA commented on HDFS-4273:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12621932/HDFS-4273.v8.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5844//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5844//console

This message is automatically generated.

Fix some issue in DFSInputstream
--------------------------------

                 Key: HDFS-4273
                 URL: https://issues.apache.org/jira/browse/HDFS-4273
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.0.2-alpha
            Reporter: Binglin Chang
            Assignee: Binglin Chang
            Priority: Minor
         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, HDFS-4273.v7.patch, HDFS-4273.v8.patch, TestDFSInputStream.java

The following issues in DFSInputStream are addressed in this jira:

1. read may not retry enough in some cases, causing early failure.

Assume the following call logic:
{noformat}
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource(): add currentNode to deadNodes, hoping to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing, clear deadNodes and pick the currentNode again
        seekToNewSource() returns false
     readBuffer() re-throws the exception
  quit loop
readWithStrategy() gets the exception, and may fail the read call before MaxBlockAcquireFailures retries have been attempted.
{noformat}

2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race condition: it is cleared to 0 while still in use by another thread, so some read threads may never quit. Changing failures to a local variable solves this issue.

3. If the local datanode is added to deadNodes, it is never removed from deadNodes when that DN comes back alive. We need a way to remove the local datanode from deadNodes once it becomes live again.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
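Issue 2 above (the shared failures counter) comes down to where the retry count lives. A toy sketch of the fixed shape, not the real DFSInputStream code; attemptsUntilDone and its boolean-array input are invented for illustration:

```java
// Toy illustration of HDFS-4273 issue 2: retry bookkeeping kept in an
// instance field is racy across threads (another thread can reset it
// mid-loop, so the loop may never hit its limit); a local variable
// belongs to exactly one call and cannot be reset from outside.
public class RetryCounter {
    static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    // outcomes[i] says whether the i-th read attempt would succeed.
    public static int attemptsUntilDone(boolean[] outcomes) {
        int failures = 0; // local, per-call; in the bug it was a shared field
        int attempts = 0;
        for (boolean succeeded : outcomes) {
            attempts++;
            if (succeeded) {
                return attempts; // read finally worked
            }
            if (++failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
                break; // give up after the configured retry budget
            }
        }
        return attempts;
    }
}
```

Because failures is on the stack, a concurrent reader clearing its own count cannot keep this call's loop alive past MAX_BLOCK_ACQUIRE_FAILURES.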
[jira] [Created] (HDFS-5730) Inconsistent Audit logging for HDFS APIs
Uma Maheswara Rao G created HDFS-5730:
-----------------------------------------

             Summary: Inconsistent Audit logging for HDFS APIs
                 Key: HDFS-5730
                 URL: https://issues.apache.org/jira/browse/HDFS-5730
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.2.0, 3.0.0
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G

When looking at the audit logs in HDFS, I am seeing some inconsistencies between what was logged with audit earlier and what was added recently. For more details, please check the comments.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5730) Inconsistent Audit logging for HDFS APIs
[ https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865273#comment-13865273 ]

Uma Maheswara Rao G commented on HDFS-5730:
-------------------------------------------

HDFS audit logging interface:
{code}
  /**
   * Same as
   * {@link #logAuditEvent(boolean, String, InetAddress, String, String, String, FileStatus)}
   * with additional parameters related to logging delegation token tracking
   * IDs.
   *
   * @param succeeded Whether authorization succeeded.
   * @param userName Name of the user executing the request.
   * @param addr Remote address of the request.
   * @param cmd The requested command.
   * @param src Path of affected source file.
   * @param dst Path of affected destination file (if any).
   * @param stat File information for operations that change the file's metadata
   *          (permissions, owner, times, etc).
   * @param ugi UserGroupInformation of the current user, or null if not logging
   *          token tracking information
   * @param dtSecretManager The token secret manager, or null if not logging
   *          token tracking information
   */
  public abstract void logAuditEvent(boolean succeeded, String userName,
      InetAddress addr, String cmd, String src, String dst,
      FileStatus stat, UserGroupInformation ugi,
      DelegationTokenSecretManager dtSecretManager);
{code}

Here the succeeded parameter indicates whether the authorization check succeeded. Recent APIs such as addCacheDirective, modifyCacheDirective, removeCacheDirective, etc. use that parameter to indicate whether the whole operation succeeded or not:
{code}
boolean success = false;
...
writeLock();
try {
  checkOperation(OperationCategory.WRITE);
  if (isInSafeMode()) {
    throw new SafeModeException(
        "Cannot add cache directive", safeMode);
  }
  cacheManager.modifyDirective(directive, pc, flags);
  getEditLog().logModifyCacheDirectiveInfo(directive, cacheEntry != null);
  success = true;
} finally {
  writeUnlock();
  if (success) {
    getEditLog().logSync();
  }
  if (isAuditEnabled() && isExternalInvocation()) {
    logAuditEvent(success, "modifyCacheDirective", null, null, null);
  }
  RetryCache.setState(cacheEntry, success);
}
{code}

But all the older APIs like startFile, etc. handled AccessControlException explicitly and passed the first parameter as false on failure; there is no log for other IOExceptions. The snapshot-related APIs followed yet another pattern: they log only on success.
{code}
String createSnapshot(String snapshotRoot, String snapshotName)
    throws SafeModeException, IOException {
  ...
  getEditLog().logSync();
  if (auditLog.isInfoEnabled() && isExternalInvocation()) {
    logAuditEvent(true, "createSnapshot", snapshotRoot, snapshotPath, null);
  }
  return snapshotPath;
}
{code}

So, we have to unify the audit logging here across all APIs.

Inconsistent Audit logging for HDFS APIs
----------------------------------------

                 Key: HDFS-5730
                 URL: https://issues.apache.org/jira/browse/HDFS-5730
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.0.0, 2.2.0
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G

When looking at the audit logs in HDFS, I am seeing some inconsistencies between what was logged with audit earlier and what was added recently. For more details, please check the comments.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
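One way to unify the three patterns above is a single success-flag shape used by every API: set the flag only after the operation completes, and always log from a finally block so both success and failure (of any exception type) are audited. A minimal sketch; runAudited, auditLog, and the Op interface are hypothetical stand-ins, not FSNamesystem's actual internals:

```java
// Sketch of a unified audit-logging shape (hypothetical helper names).
public class AuditPattern {
    public interface Op { void run(); }

    static String lastAudit; // captured here only so the sketch is testable

    static void auditLog(boolean succeeded, String cmd) {
        lastAudit = (succeeded ? "allowed=true" : "allowed=false") + " cmd=" + cmd;
    }

    // Every API follows the same shape: the finally block always logs,
    // and success reflects whether the whole operation completed.
    public static void runAudited(String cmd, Op op) {
        boolean success = false;
        try {
            op.run();
            success = true;
        } finally {
            auditLog(success, cmd);
        }
    }
}
```

With this shape, a denied or failed modifyCacheDirective logs allowed=false and a completed createSnapshot logs allowed=true, with no per-API variation.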
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865281#comment-13865281 ]

Uma Maheswara Rao G commented on HDFS-5721:
-------------------------------------------

{quote}
There are also other places with similar issues that do not get closed in a finally block, i.e. Namenode#format(), FSNamesystem#loadFromDisk(), etc. I think we should fix all these similar issues in one JIRA
{quote}
I agree to close the streams. Actually, in most of these cases the JVM terminates immediately after the command execution (ex: format, etc.); the system will not run for long with the leaked streams. But if we face any issue because these streams are not closed, closing them would be fine. Am I missing something here?

sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
-------------------------------------------------------------------------------------------

                 Key: HDFS-5721
                 URL: https://issues.apache.org/jira/browse/HDFS-5721
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ted Yu
            Assignee: Ted Yu
            Priority: Minor
         Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt

At line 901:
{code}
FSImage sharedEditsImage = new FSImage(conf,
    Lists.<URI>newArrayList(),
    sharedEditsDirs);
{code}
sharedEditsImage is not closed before the method returns.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
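The conventional fix for this kind of leak is try-with-resources (or an explicit close in a finally block), which guarantees close() on every exit path, including early returns and exceptions. A minimal sketch; TrackingImage and initializeSharedEdits are toy stand-ins for FSImage and the NameNode method, assuming only that the resource implements Closeable:

```java
import java.io.Closeable;

// Minimal sketch of the close-before-return pattern discussed in
// HDFS-5721; TrackingImage is a toy stand-in for FSImage.
public class CloseOnExit {
    public static class TrackingImage implements Closeable {
        public boolean closed = false;
        @Override
        public void close() {
            closed = true;
        }
    }

    // try-with-resources closes the image on every exit path, so the
    // method can no longer return with the resource still open.
    public static TrackingImage initializeSharedEdits() {
        TrackingImage img = new TrackingImage();
        try (TrackingImage sharedEditsImage = img) {
            // ... format and copy the shared edits here ...
        }
        return img; // returned only so callers can verify it was closed
    }
}
```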
[jira] [Commented] (HDFS-5729) Lower chance to hit NPE in allocateNodeLocal
[ https://issues.apache.org/jira/browse/HDFS-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865288#comment-13865288 ]

Uma Maheswara Rao G commented on HDFS-5729:
-------------------------------------------

Should we move this to YARN?

Lower chance to hit NPE in allocateNodeLocal
--------------------------------------------

                 Key: HDFS-5729
                 URL: https://issues.apache.org/jira/browse/HDFS-5729
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: wenwupeng
         Attachments: conf.tar.gz, log.tar.gz

We occasionally hit an NPE in allocateNodeLocal when running a benchmark (4 times out of 20 runs).

Steps:
1. Set up a Hadoop 2.2.0 environment.
2. Run:
{noformat}
for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 10; done
{noformat}

{noformat}
2014-01-08 03:56:14,082 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

Will attach log and configuration files later.

Note: my topology file:
{noformat}
10.111.89.230 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.231 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.232 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.239 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com
10.111.89.233 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.234 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.240 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com
10.111.89.236 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.241 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com
10.111.89.238 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
10.111.89.242 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com
{noformat}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865317#comment-13865317 ]

Hudson commented on HDFS-5726:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #446 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/446/])
HDFS-5726. Fix compilation error in AbstractINodeDiff for JDK7. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556433)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java

Fix compilation error in AbstractINodeDiff for JDK7
---------------------------------------------------

                 Key: HDFS-5726
                 URL: https://issues.apache.org/jira/browse/HDFS-5726
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: namenode
    Affects Versions: 3.0.0
            Reporter: Jing Zhao
            Assignee: Jing Zhao
            Priority: Minor
             Fix For: 3.0.0
         Attachments: HDFS-5726.000.patch

HDFS-5715 breaks the JDK7 build with the following error:
{code}
[ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff
{code}
This jira will fix the issue.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865313#comment-13865313 ]

Hudson commented on HDFS-5715:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #446 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/446/])
HDFS-5715. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556353)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeMap.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeSymlink.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeWithAdditionalFields.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodesInPath.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiffList.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiff.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiffList.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectorySnapshottable.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/Snapshot.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSDirectory.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSnapshotPathINodes.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotTestHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestINodeFileUnderConstructionWithSnapshot.java
*
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865316#comment-13865316 ]

Hudson commented on HDFS-5649:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #446 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/446/])
HDFS-5649. Unregister NFS and Mount service when NFS gateway is shutting down. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556405)
* /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/portmap/PortmapRequest.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Unregister NFS and Mount service when NFS gateway is shutting down
------------------------------------------------------------------

                 Key: HDFS-5649
                 URL: https://issues.apache.org/jira/browse/HDFS-5649
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: nfs
    Affects Versions: 3.0.0
            Reporter: Brandon Li
            Assignee: Brandon Li
             Fix For: 2.3.0
         Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch

The services should be unregistered if the gateway is asked to shut down gracefully.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865314#comment-13865314 ]

Hudson commented on HDFS-5724:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #446 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/446/])
HDFS-5724. modifyCacheDirective logging audit log command wrongly as addCacheDirective (Uma Maheswara Rao G via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556386)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

modifyCacheDirective logging audit log command wrongly as addCacheDirective
---------------------------------------------------------------------------

                 Key: HDFS-5724
                 URL: https://issues.apache.org/jira/browse/HDFS-5724
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.0.0
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G
            Priority: Minor
              Labels: caching
         Attachments: HDFS-5724.patch

modifyCacheDirective:
{code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinay updated HDFS-5723:
------------------------

    Assignee: Vinay
      Status: Patch Available  (was: Open)

Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
----------------------------------------------------------------------------------------------------

                 Key: HDFS-5723
                 URL: https://issues.apache.org/jira/browse/HDFS-5723
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.2.0
            Reporter: Vinay
            Assignee: Vinay
         Attachments: HDFS-5723.patch

Scenario:
1. 3-node cluster with dfs.client.block.write.replace-datanode-on-failure.enable set to false.
2. One file is written with 3 replicas: blk_id_gs1.
3. One of the datanodes, DN1, goes down.
4. The file is opened for append and some more data is added and synced (to only the 2 live nodes, DN2 and DN3) -- blk_id_gs2.
5. Now DN1 is restarted.
6. In its block report, DN1 reports the FINALIZED block blk_id_gs1, which should be marked corrupted. But since the NN has the appended block's state as UnderConstruction, it does not detect this block as corrupt and adds DN1 to the valid block locations.

As long as the namenode stays alive, this datanode is considered to hold a valid replica, and read/append will fail on that datanode.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
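The scenario above boils down to a generation-stamp comparison during block-report processing: a FINALIZED replica whose generation stamp predates the NameNode's current one for the block should be marked corrupt even while the block is under construction. A toy sketch of that check; ReplicaCheck and processReportedFinalized are invented names, not the real BlockManager code:

```java
// Toy version of the check HDFS-5723 asks for; names are illustrative.
public class ReplicaCheck {
    public enum ReplicaState { VALID, CORRUPT }

    // A restarted DN reports a FINALIZED replica with the generation
    // stamp it knew before going down (gs1), while the append bumped the
    // NameNode's stamp to gs2. A stale stamp means the replica missed
    // the appended data and must be treated as corrupt, not added as a
    // valid location.
    public static ReplicaState processReportedFinalized(long reportedGenStamp,
                                                        long currentGenStamp) {
        return reportedGenStamp < currentGenStamp
                ? ReplicaState.CORRUPT
                : ReplicaState.VALID;
    }
}
```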
[jira] [Updated] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
[ https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-3752:
---------------------------

    Attachment: HDFS-3752-testcase.patch

BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
---------------------------------------------------------------------------------------------------

                 Key: HDFS-3752
                 URL: https://issues.apache.org/jira/browse/HDFS-3752
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: ha
    Affects Versions: 2.0.0-alpha
            Reporter: Vinay
         Attachments: HDFS-3752-testcase.patch

1. Do {{saveNameSpace}} on the ANN node by entering safemode.
2. On another new node, install a standby NN and run BOOTSTRAPSTANDBY.
3. Now the standby NN is not able to copy the fsimage_txid from the ANN.

This is because the SNN is not able to find the next txid (txid+1) in shared storage. Just after {{saveNameSpace}}, shared storage has a new log segment with only the START_LOG_SEGMENT edit op, and BookKeeper is not able to read the last entry from an in-progress ledger.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinay updated HDFS-5723:
------------------------

    Attachment: HDFS-5723.patch

Attached the patch. Please review.

Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
----------------------------------------------------------------------------------------------------

                 Key: HDFS-5723
                 URL: https://issues.apache.org/jira/browse/HDFS-5723
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.2.0
            Reporter: Vinay
         Attachments: HDFS-5723.patch

Scenario:
1. 3-node cluster with dfs.client.block.write.replace-datanode-on-failure.enable set to false.
2. One file is written with 3 replicas: blk_id_gs1.
3. One of the datanodes, DN1, goes down.
4. The file is opened for append and some more data is added and synced (to only the 2 live nodes, DN2 and DN3) -- blk_id_gs2.
5. Now DN1 is restarted.
6. In its block report, DN1 reports the FINALIZED block blk_id_gs1, which should be marked corrupted. But since the NN has the appended block's state as UnderConstruction, it does not detect this block as corrupt and adds DN1 to the valid block locations.

As long as the namenode stays alive, this datanode is considered to hold a valid replica, and read/append will fail on that datanode.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
[ https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865367#comment-13865367 ] Rakesh R commented on HDFS-3752: Hi, as I understood from the discussion, when bootstrapping the standby it is not really required to see the transactions present in the 'in_progress' node, and skipping 'in_progress' will not cause any inconsistencies. In any case, the StandbyToActive transition will always ensure that the delta edit transactions are read from the shared edit dirs, so the node can reliably start as Active. bq. we could add an easy workaround flag, like bootstrapStandby -skipSharedEditsCheck, since the check here is just to help out the user and not actually necessary for correct operation. I also agree with skipping the shared edits check during bootstrapStandby; in that case no special fix is required for this JIRA. Presently there are no test cases for bootstrap with BKJM shared edits, and I've tried a few. Could you please review the attached test case patch? If everyone agrees, we can push this in and close this JIRA once HDFS-4120 is in. Any thoughts? BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM --- Key: HDFS-3752 URL: https://issues.apache.org/jira/browse/HDFS-3752 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0-alpha Reporter: Vinay Attachments: HDFS-3752-testcase.patch 1. Do {{saveNameSpace}} on the ANN node after entering safemode. 2. On another new node, install a standby NN and run BOOTSTRAPSTANDBY. 3. Now the Standby NN will not be able to copy the fsimage_txid from the ANN. This is because the SNN is not able to find the next txid (txid+1) in shared storage. Just after {{saveNameSpace}}, the shared storage will have a new log segment with only the START_LOG_SEGMENT edit op, and BookKeeper will not be able to read the last entry from an in-progress ledger. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865432#comment-13865432 ] Hudson commented on HDFS-5715: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1638 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1638/]) HDFS-5715. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556353) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeSymlink.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeWithAdditionalFields.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodesInPath.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiffList.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiff.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiffList.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectorySnapshottable.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/Snapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSnapshotPathINodes.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotTestHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestINodeFileUnderConstructionWithSnapshot.java *
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865435#comment-13865435 ] Hudson commented on HDFS-5649: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1638 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1638/]) HDFS-5649. Unregister NFS and Mount service when NFS gateway is shutting down. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556405) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/portmap/PortmapRequest.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.3.0 Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shut down gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865436#comment-13865436 ] Hudson commented on HDFS-5726: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1638 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1638/]) HDFS-5726. Fix compilation error in AbstractINodeDiff for JDK7. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556433) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5727) introduce a self-maintaining io queue handling mechanism
[ https://issues.apache.org/jira/browse/HDFS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Chen updated HDFS-5727: --- Summary: introduce a self-maintaining io queue handling mechanism (was: introduce a self-maintain io queue handling mechanism) introduce a self-maintaining io queue handling mechanism Key: HDFS-5727 URL: https://issues.apache.org/jira/browse/HDFS-5727 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support io priority or io request reordering inside the datanode. I propose introducing a self-maintaining io queue mechanism to handle io request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable even though we have a background thread throttling... I have not thought through the reordering fully, but reordering in an application-side queue would definitely beat relying on the OS's io queue merging as we currently do. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5727) introduce a self-maintain io queue handling mechanism
[ https://issues.apache.org/jira/browse/HDFS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Chen updated HDFS-5727: --- Description: Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support io priority or io request reordering inside the datanode. I propose introducing a self-maintaining io queue mechanism to handle io request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable even though we have a background thread throttling... I have not thought through the reordering fully, but reordering in an application-side queue would definitely beat relying on the OS's io queue merging as we currently do. was: Currently the datanode read/write SLA is dfficult to be ganranteed for HBase online requirement. One of major reasons is we don't support io priority or io reqeust reorder inside datanode. I proposal introducing a self-maintain io queue mechanism to handle io request priority. Image there're lots of concurrent read/write reqeust from HBase side, and a background datanode block scanner is running(default is every 21 days, IIRC) just in time, then the HBase read/write 99% or 99.9% percentile latency would be vulnerable despite we have a bg thread throttling... the reorder stuf i have not thought clearly enough, but definitely the reorder in the queue in the app side would beat the currently relying OS's io queue merge. introduce a self-maintain io queue handling mechanism - Key: HDFS-5727 URL: https://issues.apache.org/jira/browse/HDFS-5727 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support io priority or io request reordering inside the datanode. I propose introducing a self-maintaining io queue mechanism to handle io request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable even though we have a background thread throttling... I have not thought through the reordering fully, but reordering in an application-side queue would definitely beat relying on the OS's io queue merging as we currently do. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
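The prioritisation being proposed can be sketched with a standard Java priority queue. All names below are illustrative assumptions, not part of the actual proposal: the idea is simply that client-facing io is served ahead of background block-scanner io regardless of arrival order.

```java
import java.util.concurrent.PriorityBlockingQueue;

public class IoRequestQueue {
    // Lower ordinal = served first.
    enum Priority { CLIENT, BACKGROUND }

    static final class IoRequest implements Comparable<IoRequest> {
        final Priority priority;
        final String description;
        IoRequest(Priority priority, String description) {
            this.priority = priority;
            this.description = description;
        }
        @Override
        public int compareTo(IoRequest other) {
            return priority.compareTo(other.priority);
        }
    }

    // Returns the description of the next request that would be served.
    static String nextToServe(PriorityBlockingQueue<IoRequest> queue) {
        IoRequest next = queue.poll();
        return next == null ? null : next.description;
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<IoRequest> queue = new PriorityBlockingQueue<>();
        // The scanner request arrives first...
        queue.add(new IoRequest(Priority.BACKGROUND, "block-scanner read"));
        // ...but the later client request is served ahead of it.
        queue.add(new IoRequest(Priority.CLIENT, "HBase client read"));
        System.out.println(nextToServe(queue)); // HBase client read
    }
}
```

Note that `PriorityBlockingQueue` does not guarantee FIFO order among equal-priority requests; a real implementation would likely add a sequence number as a tiebreaker, and this sketch says nothing about the reordering/merging question raised above.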
[jira] [Created] (HDFS-5731) Refactoring to define interfaces between BM and NN and simplify the flow between them
Amir Langer created HDFS-5731: - Summary: Refactoring to define interfaces between BM and NN and simplify the flow between them Key: HDFS-5731 URL: https://issues.apache.org/jira/browse/HDFS-5731 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Start the separation of the BlockManager (BM) from the NameNode (NN) by simplifying the flow between the two components and defining API interfaces between them. The two components still exist in the same VM and use the same memory space (sharing the same instances). Logic for calls from Datanodes should be in the BM. The NN should interact with the BM using a few calls, and the BM should use the return types as much as possible to pass information to the NN. The APIs between them should be defined as interfaces so that later they can be improved to not share object instances and be turned into a real protocol. This still assumes a one-to-one relation between NN and BM in the same VM, and does not handle the lifecycle of the service. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
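A minimal sketch of the direction described here, using hypothetical interface names (none of these come from the attached patch): datanode-facing logic sits behind a BM interface, and the NN learns results only through return types rather than shared mutable state.

```java
public class BlockManagerApiSketch {
    // Hypothetical result type: the NN consumes information via the return
    // value instead of reaching into BM-internal structures.
    interface BlockReportResult {
        int blocksProcessed();
    }

    // Hypothetical BM-facing API for datanode calls.
    interface BlockManagerService {
        BlockReportResult processBlockReport(String datanodeId, long[] blockIds);
    }

    // Same-VM implementation; later this could become a real protocol.
    static class InProcessBlockManager implements BlockManagerService {
        public BlockReportResult processBlockReport(String datanodeId,
                                                    long[] blockIds) {
            // Datanode-facing logic stays inside the BM.
            final int n = blockIds.length;
            return () -> n;
        }
    }

    public static void main(String[] args) {
        BlockManagerService bm = new InProcessBlockManager();
        BlockReportResult r = bm.processBlockReport("dn-1", new long[]{1L, 2L, 3L});
        System.out.println(r.blocksProcessed()); // 3
    }
}
```

Because the NN only sees the interfaces, swapping the in-process implementation for an RPC-backed one (as the later subtasks propose) would not change the NN-side code.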
[jira] [Updated] (HDFS-5731) Refactoring to define interfaces between BM and NN and simplify the flow between them
[ https://issues.apache.org/jira/browse/HDFS-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-5731: -- Description: Start the separation of BlockManager (BM) from NameNode (NN) by simplifying the flow between the two components and defining API interfaces between them. The two components still exist in the same VM and use the same memory space (using the same instances). Logic to calls from Datanodes should be in the BM. NN should interact with BM using few calls and BM should use the return types as much as possible to pass information to the NN. APIs between them should be defined as interfaces so later it can be improved to not use the same object instances and turned into a real protocol. This still assumes a one to one relation between NN and BM, same VM and does not handle lifecycle of the service. This task should maintain backward compatibility was: Start the separation of BlockManager (BM) from NameNode (NN) by simplifying the flow between the two components and defining API interfaces between them. The two components still exist in the same VM and use the same memory space (using the same instances). Logic to calls from Datanodes should be in the BM. NN should interact with BM using few calls and BM should use the return types as much as possible to pass information to the NN. APIs between them should be defined as interfaces so later it can be improved to not use the same object instances and turned into a real protocol. This still assumes a one to one relation between NN and BM, same VM and does not handle lifecycle of the service. 
Refactoring to define interfaces between BM and NN and simplify the flow between them - Key: HDFS-5731 URL: https://issues.apache.org/jira/browse/HDFS-5731 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Start the separation of the BlockManager (BM) from the NameNode (NN) by simplifying the flow between the two components and defining API interfaces between them. The two components still exist in the same VM and use the same memory space (sharing the same instances). Logic for calls from Datanodes should be in the BM. The NN should interact with the BM using a few calls, and the BM should use the return types as much as possible to pass information to the NN. The APIs between them should be defined as interfaces so that later they can be improved to not share object instances and be turned into a real protocol. This still assumes a one-to-one relation between NN and BM in the same VM, and does not handle the lifecycle of the service. This task should maintain backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5732) Separate memory space between BM and NN
Amir Langer created HDFS-5732: - Summary: Separate memory space between BM and NN Key: HDFS-5732 URL: https://issues.apache.org/jira/browse/HDFS-5732 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Change the created APIs to not rely on the same instance being shared by both BM and NN. Use immutable objects / keep state in sync. BM and NN will still exist in the same VM; work on a new BM service as an independent process is deferred to later tasks. Also, a one-to-one relation between BM and NN is assumed. This task should maintain backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5733) Separate concurrency control between BM and NN
Amir Langer created HDFS-5733: - Summary: Separate concurrency control between BM and NN Key: HDFS-5733 URL: https://issues.apache.org/jira/browse/HDFS-5733 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Amir Langer Replace the BM's usage of the namesystem locking mechanism with its own concurrency control over its internal state. Both NN and BM will still run in the same VM. This task should maintain backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
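A minimal sketch of what BM-internal concurrency control could look like, assuming a simple read/write lock; all names are hypothetical and not from this JIRA. The point is only that the BM guards its own state with its own lock instead of taking the global namesystem lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BlockManagerLocking {
    // BM-internal lock, independent of any namesystem-wide lock.
    private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();
    private int blockCount;

    void addBlock() {
        bmLock.writeLock().lock();
        try {
            blockCount++; // mutation of BM-internal state under the BM lock
        } finally {
            bmLock.writeLock().unlock();
        }
    }

    int getBlockCount() {
        bmLock.readLock().lock(); // readers do not block each other
        try {
            return blockCount;
        } finally {
            bmLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        BlockManagerLocking bm = new BlockManagerLocking();
        bm.addBlock();
        bm.addBlock();
        System.out.println(bm.getBlockCount()); // 2
    }
}
```

A read/write lock is just one candidate; the real design question in this subtask is which BM operations currently rely on the namesystem lock for mutual exclusion with NN-side code.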
[jira] [Commented] (HDFS-5727) introduce a self-maintaining io queue handling mechanism
[ https://issues.apache.org/jira/browse/HDFS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865467#comment-13865467 ] Richard Chen commented on HDFS-5727: Interesting, but if you can improve the language further, you will help the audience better understand what you intend to do. My team is working on something similar. I am thinking of adding your problem to our design scope. We can certainly collaborate on this. Let me know your thoughts. introduce a self-maintaining io queue handling mechanism Key: HDFS-5727 URL: https://issues.apache.org/jira/browse/HDFS-5727 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support io priority or io request reordering inside the datanode. I propose introducing a self-maintaining io queue mechanism to handle io request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable even though we have a background thread throttling... I have not thought through the reordering fully, but reordering in an application-side queue would definitely beat relying on the OS's io queue merging as we currently do. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5734) A NN-internal RPC BM service
Amir Langer created HDFS-5734: - Summary: A NN-internal RPC BM service Key: HDFS-5734 URL: https://issues.apache.org/jira/browse/HDFS-5734 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Separate the BM from the NN by running it with its own thread-pool and RPC protocol, but still in the same process as the NN. NN and BM will interact through a loopback call that simulates a separate service. This sprint still assumes a one-to-one relation between NN and BM and does not split the BM into a separate process; it only simulates such a split inside the same VM. This allows us to defer any configuration issues / testing support / script changes to later tasks. This task will therefore also not handle any HA issues for the BM itself. It will, however, deal with having BM code actually running in a different thread from the NN code and will handle building the initialisation / lifecycle code for an independent BM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5735) Testing support for BM as a service
Amir Langer created HDFS-5735: - Summary: Testing support for BM as a service Key: HDFS-5735 URL: https://issues.apache.org/jira/browse/HDFS-5735 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Testing support for an independent BM service. Modify tests to start it / use MiniDFSCluster if they require a BM. Verify that all tests still pass with an independent BM (running off MiniDFSCluster). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5736) BM service as a separate process
Amir Langer created HDFS-5736: - Summary: BM service as a separate process Key: HDFS-5736 URL: https://issues.apache.org/jira/browse/HDFS-5736 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Add scripts / config. to allow running BM as a separate service. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5731) Refactoring to define interfaces between BM and NN and simplify the flow between them
[ https://issues.apache.org/jira/browse/HDFS-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-5731: -- Attachment: 0001-Separation-of-BM-from-NN-Step1-introduce-APIs-as-int.patch The patch contains changes done on top of trunk and was last rebased to start from commit: HADOOP-10175. Har files system authority should preserve userinfo. Contributed by Chuan Liu. git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1553169 13f79535-47bb-0310-9956-ffa450edef68 Refactoring to define interfaces between BM and NN and simplify the flow between them - Key: HDFS-5731 URL: https://issues.apache.org/jira/browse/HDFS-5731 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Attachments: 0001-Separation-of-BM-from-NN-Step1-introduce-APIs-as-int.patch Start the separation of the BlockManager (BM) from the NameNode (NN) by simplifying the flow between the two components and defining API interfaces between them. The two components still exist in the same VM and use the same memory space (sharing the same instances). Logic for calls from Datanodes should be in the BM. The NN should interact with the BM using a few calls, and the BM should use the return types as much as possible to pass information to the NN. The APIs between them should be defined as interfaces so that later they can be improved to not share object instances and be turned into a real protocol. This still assumes a one-to-one relation between NN and BM in the same VM, and does not handle the lifecycle of the service. This task should maintain backward compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction
[ https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865487#comment-13865487 ] Hadoop QA commented on HDFS-5723: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621965/HDFS-5723.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5845//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5845//console This message is automatically generated. Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction Key: HDFS-5723 URL: https://issues.apache.org/jira/browse/HDFS-5723 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5723.patch Scenario: 1. 3-node cluster with dfs.client.block.write.replace-datanode-on-failure.enable set to false. 2. One file is written with 3 replicas, blk_id_gs1. 3. One of the datanodes, DN1, is down. 4.
The file was opened with append, and some more data was added to the file and synced (to only the 2 live nodes DN2 and DN3) -- blk_id_gs2. 5. Now DN1 is restarted. 6. In its block report, DN1 reported the FINALIZED block blk_id_gs1, which should be marked corrupt. But since the NN has the appended block's state as UnderConstruction, it does not detect this block as corrupt and adds it to the valid block locations. As long as the namenode is alive, this datanode will also be considered a valid replica, and read/append will fail on that datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865503#comment-13865503 ] Hudson commented on HDFS-5724: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1663 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1663/]) HDFS-5724. modifyCacheDirective logging audit log command wrongly as addCacheDirective (Uma Maheswara Rao G via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556386) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
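The bug class here is a copy-pasted command name passed to the audit logger. A minimal runnable sketch of the corrected behaviour follows; `logAuditEvent` below is a simplified stand-in, not FSNamesystem's real signature:

```java
// Sketch of the fix described by the issue title: modifyCacheDirective must
// log its own command name, not the copy-pasted "addCacheDirective".
public class AuditLogDemo {
    static String lastCmd; // captures what the audit log would record

    // Stand-in for FSNamesystem.logAuditEvent (hypothetical signature).
    static void logAuditEvent(boolean success, String cmd) {
        lastCmd = cmd;
    }

    static void modifyCacheDirective() {
        // Before the patch this wrongly passed "addCacheDirective".
        logAuditEvent(true, "modifyCacheDirective");
    }

    public static void main(String[] args) {
        modifyCacheDirective();
        System.out.println(lastCmd); // modifyCacheDirective
    }
}
```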
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865502#comment-13865502 ] Hudson commented on HDFS-5715: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1663 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1663/]) HDFS-5715. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556353) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeSymlink.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeWithAdditionalFields.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodesInPath.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiffList.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiff.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiffList.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectorySnapshottable.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/Snapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSnapshotPathINodes.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotTestHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestINodeFileUnderConstructionWithSnapshot.java *
[jira] [Commented] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865506#comment-13865506 ] Hudson commented on HDFS-5726: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1663 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1663/]) HDFS-5726. Fix compilation error in AbstractINodeDiff for JDK7. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556433) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865505#comment-13865505 ] Hudson commented on HDFS-5649: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1663 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1663/]) HDFS-5649. Unregister NFS and Mount service when NFS gateway is shutting down. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556405) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/portmap/PortmapRequest.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.3.0 Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shut down gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: hdfs-5721-v3.txt Patch v3 addresses Junping's comment. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt, hdfs-5721-v3.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5734) A NN-internal RPC BM service
[ https://issues.apache.org/jira/browse/HDFS-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865628#comment-13865628 ] jay vyas commented on HDFS-5734: Sorry to ask, but... what's BM? Is that the BackupNameNode? A NN-internal RPC BM service Key: HDFS-5734 URL: https://issues.apache.org/jira/browse/HDFS-5734 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Separate the BM from the NN by running it with its own thread pool and RPC protocol, but still in the same process as the NN. The NN and BM will interact through a loopback call that simulates a separate service. This sprint still assumes a one-to-one relation between the NN and BM and does not split the BM into a separate process; it only simulates such a split inside the same VM. This allows us to defer any configuration, testing-support, and script changes to later tasks. This task will therefore also not handle any HA issues for the BM itself. It will, however, deal with having the BM code actually run in a different thread from the NN code, and will handle building the initialisation / lifecycle code for an independent BM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865669#comment-13865669 ] Hadoop QA commented on HDFS-5721: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621987/hdfs-5721-v3.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5846//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5846//console This message is automatically generated. 
sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt, hdfs-5721-v3.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-2261) AOP unit tests are not getting compiled or run
[ https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned HDFS-2261: -- Assignee: (was: Karthik Kambatla) AOP unit tests are not getting compiled or run --- Key: HDFS-2261 URL: https://issues.apache.org/jira/browse/HDFS-2261 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha, 2.0.4-alpha Environment: https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console -compile-fault-inject ant target Reporter: Giridharan Kesavan Priority: Minor Attachments: hdfs-2261.patch The tests in src/test/aop are not getting compiled or run. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865705#comment-13865705 ] Jing Zhao commented on HDFS-5579: -
{code}
+if (bc.isUnderConstruction()) {
+  if (block.equals(bc.getLastBlock()) && curReplicas >= minReplication) {
+    continue;
+  }
+  underReplicatedInOpenFiles++;
+}
{code}
Here if {{block}} is not the last block, and {{block}} is not under-replicated, underReplicatedInOpenFiles will still increase? Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that decommissioning DataNodes sometimes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager#computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks that belong to under-construction files; however, in BlockManager#isReplicationInProgress(DatanodeDescriptor srcNode), if any block still needs replication, regardless of whether it belongs to an under-construction file, the decommission stays in progress. That is why decommission sometimes takes such a long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
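Jing Zhao's concern can be checked with a stand-alone model of the patched loop. This is an illustration, not the Hadoop code: plain ints stand in for the real BlockInfo / BlockCollection types, and the skip condition is assumed to be {{curReplicas >= minReplication}}.

```java
public class DecommissionCheckModel {

    /**
     * Stand-alone model of the patched loop in isReplicationInProgress():
     * block i is skipped only when it is the LAST block of the open file
     * and already has at least minReplication replicas; every other block
     * increments underReplicatedInOpenFiles, even when fully replicated.
     */
    static int countUnderReplicatedInOpenFiles(int numBlocks,
                                               int[] curReplicas,
                                               int minReplication) {
        int underReplicatedInOpenFiles = 0;
        for (int i = 0; i < numBlocks; i++) {
            boolean isLastBlock = (i == numBlocks - 1);
            if (isLastBlock && curReplicas[i] >= minReplication) {
                continue;  // mirrors the patch's "continue" for the last block
            }
            underReplicatedInOpenFiles++;
        }
        return underReplicatedInOpenFiles;
    }
}
```

With two blocks, minReplication 1, and replica counts {3, 1}, the model returns 1: the fully replicated non-last block still increments the counter, which is exactly the behavior the comment questions.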
[jira] [Created] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
Chris Nauroth created HDFS-5737: --- Summary: Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
[ https://issues.apache.org/jira/browse/HDFS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5737 started by Chris Nauroth. Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. - Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
[ https://issues.apache.org/jira/browse/HDFS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5737: Attachment: (was: HDFS-5673.1.patch) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. - Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5737.1.patch The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
[ https://issues.apache.org/jira/browse/HDFS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5737: Attachment: HDFS-5737.1.patch Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. - Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5737.1.patch The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
[ https://issues.apache.org/jira/browse/HDFS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5737: Attachment: HDFS-5673.1.patch Here is a patch to fix the bug. # The easiest way to fix this is to do another sort at the start of {{AclTransformation#copyDefaultsIfNeeded}}. # This bug had been causing us to produce invalid default ACLs that are missing the base entries (owner, group, other). As an extra defense, I changed the validation logic so that it requires the base entries for both access and default. Previously, this was just enforced for access. To do this, I rewrote this portion of the logic to use the search approach, similar to what people found more readable for {{AclTransformation#copyDefaultsIfNeeded}}. In theory, the checks on the default ACL should never fail, because we should always copy the missing required entries from the access ACL. However, if there is a bug, then it's better to bail earlier instead of producing an invalid default ACL that gets used later. # Added one more test in {{TestAclTransformation}}. This test failed before I made the fix in {{AclTransformation}}. Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. - Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5737.1.patch The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
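The failure mode behind this bug, a binary search that misses an element because the list is not yet sorted, can be demonstrated with plain JDK collections (an illustration only, not the actual {{AclTransformation}} code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortBeforeSearch {

    /** Index of key if found, negative otherwise (JDK binary search). */
    static int find(List<Integer> entries, int key) {
        // Collections.binarySearch is only specified for sorted lists;
        // on an unsorted list it can miss an element that is present.
        return Collections.binarySearch(entries, key);
    }

    public static void main(String[] args) {
        // Replacement entries arrive in caller order, not sorted order.
        List<Integer> entries = new ArrayList<>(List.of(30, 10, 20));
        System.out.println(find(entries, 30) >= 0);  // false: 30 is present but missed

        Collections.sort(entries);                   // the fix: sort first
        System.out.println(find(entries, 30) >= 0);  // true
    }
}
```

This mirrors the fix in the patch: re-sort the entry list before {{copyDefaultsIfNeeded}} runs its searches.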
[jira] [Updated] (HDFS-5714) Use byte array to represent UnderConstruction feature and Snapshot feature for INodeFile
[ https://issues.apache.org/jira/browse/HDFS-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5714: Attachment: HDFS-5714.000.patch Early patch for review. In general, the patch: 1. Encodes the whole FileDiffList into a byte array. Rather than always keeping the byte array in memory, the patch currently encodes a FileDiffList to a byte array only when loading it from the FSImage for the first time. Later, if the corresponding snapshot information is accessed, the byte array is decoded back into the FileDiffList and is not encoded again (until the next NN restart). 2. Removes ClientNode from FileUnderConstructionFeature and uses a byte array to represent the ClientName and ClientMachine. Use byte array to represent UnderConstruction feature and Snapshot feature for INodeFile Key: HDFS-5714 URL: https://issues.apache.org/jira/browse/HDFS-5714 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5714.000.patch Currently we define specific classes to represent different INode features, such as FileUnderConstructionFeature and FileWithSnapshotFeature. When recording this feature information in memory, the internal fields and object references can still cost a lot of memory. For example, for FileWithSnapshotFeature, not counting the INode's local name, a FileDiff list of size n can cost around 120n bytes. In order to decrease the memory usage, we plan to use byte arrays to record the UnderConstruction feature and Snapshot feature for INodeFile. Specifically, if we use protobuf's encoding, the memory usage for a FileWithSnapshotFeature can be less than 56n bytes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
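The encode-once / decode-on-access scheme described above can be sketched with a tiny holder class. This is a simplified model only: the real patch encodes a FileDiffList with protobuf, while a String stands in here and JDK charset encoding plays the role of the serializer.

```java
import java.nio.charset.StandardCharsets;

public class LazyDecoded {
    // Compact encoded form held in memory until first access.
    private byte[] encoded;
    private String decoded;  // stand-in for the decoded FileDiffList

    LazyDecoded(String value) {
        // Encode once, e.g. when loading from the FSImage.
        this.encoded = value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes on first access; the object form is kept from then on. */
    String get() {
        if (decoded == null) {
            decoded = new String(encoded, StandardCharsets.UTF_8);
            encoded = null;  // the byte array is not re-created until restart
        }
        return decoded;
    }
}
```

The trade-off matches the patch's description: cold entries stay in the compact byte form, and entries that are actually accessed pay the decode cost once and then remain as objects.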
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.000.patch Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information are out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5738) Serialize INode information in protobuf
Haohui Mai created HDFS-5738: Summary: Serialize INode information in protobuf Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information are out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865762#comment-13865762 ] Jing Zhao commented on HDFS-5738: - Can you give more details about how you serialize the inode information (e.g., traversing the FSDirectory tree, using the inodesMap, etc.)? This information will help others get a better understanding of your patch. Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865782#comment-13865782 ] Eric Sirianni commented on HDFS-5483: - Arpit - I noticed that the supplied patch only ignores the extra replica in the full Block Report code path ({{processReport()}}). Doesn't this leave the assertion still exposed on the {{BLOCK_RECEIVED}} ({{processIncrementalReportedBlock()}}) path? It seems like this code might need to be changed to search based on storage ID also:
{code}
if (reportedState == ReplicaState.FINALIZED
    && (storedBlock.findDatanode(dn) < 0
        || corruptReplicas.isReplicaCorrupt(storedBlock, dn))) {
  toAdd.add(storedBlock);
}
{code}
NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block.
Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5739) ACL RPC must allow null name or null permissions in ACL entries.
Chris Nauroth created HDFS-5739: --- Summary: ACL RPC must allow null name or null permissions in ACL entries. Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5739) ACL RPC must allow null name or null permissions in ACL entries.
[ https://issues.apache.org/jira/browse/HDFS-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5739 started by Chris Nauroth. ACL RPC must allow null name or null permissions in ACL entries. Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5677 started by Vincent Sheffer. Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5739) ACL RPC must allow null name or null permissions in ACL entries.
[ https://issues.apache.org/jira/browse/HDFS-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5739: Attachment: HDFS-5739.1.patch This patch switches the fields to optional in the protobuf spec, updates the translation logic in {{PBHelper}} and expands on the tests in {{TestPBHelper}} to cover these cases. ACL RPC must allow null name or null permissions in ACL entries. Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5739.1.patch Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5740) getmerge file system shell command needs error message for user error
John Pfuntner created HDFS-5740: --- Summary: getmerge file system shell command needs error message for user error Key: HDFS-5740 URL: https://issues.apache.org/jira/browse/HDFS-5740 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 1.1.2 Environment: {noformat}[jpfuntner@h58 tmp]$ cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.0 (Santiago) [jpfuntner@h58 tmp]$ hadoop version Hadoop 1.1.2.21 Subversion -r Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013 From source with checksum ce0aa0de785f572347f1afee69c73861{noformat} Reporter: John Pfuntner Priority: Minor I naively tried a {{getmerge}} operation but it didn't seem to do anything and there was no error message: {noformat}[jpfuntner@h58 tmp]$ hadoop fs -mkdir /user/jpfuntner/tmp [jpfuntner@h58 tmp]$ num=0; while [ $num -lt 5 ]; do echo file$num | hadoop fs -put - /user/jpfuntner/tmp/file$num; let num=num+1; done [jpfuntner@h58 tmp]$ ls -A [jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/file* files.txt [jpfuntner@h58 tmp]$ ls -A [jpfuntner@h58 tmp]$ hadoop fs -ls /user/jpfuntner/tmp Found 5 items -rw--- 3 jpfuntner hdfs 6 2014-01-08 17:37 /user/jpfuntner/tmp/file0 -rw--- 3 jpfuntner hdfs 6 2014-01-08 17:37 /user/jpfuntner/tmp/file1 -rw--- 3 jpfuntner hdfs 6 2014-01-08 17:37 /user/jpfuntner/tmp/file2 -rw--- 3 jpfuntner hdfs 6 2014-01-08 17:37 /user/jpfuntner/tmp/file3 -rw--- 3 jpfuntner hdfs 6 2014-01-08 17:37 /user/jpfuntner/tmp/file4 [jpfuntner@h58 tmp]$ {noformat} It was pointed out to me that I made a mistake and my source should have been a directory not a set of regular files. 
It works if I use the directory: {noformat}[jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/ files.txt [jpfuntner@h58 tmp]$ ls -A files.txt .files.txt.crc [jpfuntner@h58 tmp]$ cat files.txt file0 file1 file2 file3 file4 [jpfuntner@h58 tmp]$ {noformat} I think the {{getmerge}} command should issue an error message to let the user know they made a mistake. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
[ https://issues.apache.org/jira/browse/HDFS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865903#comment-13865903 ] Haohui Mai commented on HDFS-5737: -- The patch looks good to me. +1. However, there are a couple efficiency issues that can be addressed in separate jiras: # Implement your own binary search so that (1) it supports finding in a sub list of the collection, and (2) it always returns the lowest element in the list. That way you can make finding the pivot more efficient, and you don't need to create sub lists in {{copyDefaultsIfNeeded}}. # Since you know the pivot, you can insert the default entries at the pivot position and sort that sub list. Alternatively you can separate the ACLs into default entries and access entries, and concat them at the very end. Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. - Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5737.1.patch The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
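The binary search suggested here, one that always lands on the lowest matching element and can search a sub-range without building sub lists, is the classic lower-bound search. A generic sketch, not tied to the ACL entry types:

```java
public class LowerBound {

    /**
     * Lower-bound search on the sub-range [from, to) of a sorted array:
     * returns the index of the first element >= key. Unlike
     * Arrays.binarySearch, the result is deterministic when duplicates are
     * present, and a slice can be searched without copying it out.
     */
    static int lowerBound(int[] a, int from, int to, int key) {
        int lo = from, hi = to;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;  // overflow-safe midpoint
            if (a[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 2, 5, 7};
        System.out.println(lowerBound(a, 0, a.length, 2));  // 1: lowest duplicate
        System.out.println(lowerBound(a, 2, 5, 2));         // 2: search within a slice
    }
}
```

Applied to ACL entries, the comparator would order by scope and type; the [from, to) bounds let the access and default partitions of one list be searched independently, as the comment suggests.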
[jira] [Commented] (HDFS-5739) ACL RPC must allow null name or null permissions in ACL entries.
[ https://issues.apache.org/jira/browse/HDFS-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865918#comment-13865918 ] Haohui Mai commented on HDFS-5739: -- The name part looks good. Since {{AclEntry#permissions}} is an enum, from a semantic point of view I would prefer that it be non-nullable. Is it possible to simply ignore the value in {{removeAclEntries}}? ACL RPC must allow null name or null permissions in ACL entries. Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5739.1.patch Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5612) NameNode: change all permission checks to enforce ACLs in addition to permissions.
[ https://issues.apache.org/jira/browse/HDFS-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865931#comment-13865931 ] Haohui Mai commented on HDFS-5612: -- Can you specify the invariants (i.e., the correctness conditions) of a valid list of AclEntry? I think it is important to document them as {{checkAcl}} depends on these invariants. It seems that the following invariants hold for a valid list of AclEntry:
# The list has to be sorted.
# Each entry in the list is unique.
# Default entries do not have names.
# There is at least one user / group / other entry that does not have a name. (Why?)
I guess it is not immediately clear to me what the semantics of the name of an entry are. Can you please explain? NameNode: change all permission checks to enforce ACLs in addition to permissions. -- Key: HDFS-5612 URL: https://issues.apache.org/jira/browse/HDFS-5612 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5612.1.patch, HDFS-5612.2.patch All {{NameNode}} code paths that enforce permissions must be updated so that they also enforce ACLs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865941#comment-13865941 ] Haohui Mai commented on HDFS-5738: -- This patch serializes the inode information into two sections, INODE and INODE_DIRECTORY. At a high level, the inode information can be seen as a graph, where the inodes are the vertices and the references are the edges. The INODE section records the information about each inode, such as atime / mtime. The INODE_DIRECTORY section records all the children of each inode. The design simplifies the serialization of snapshot information. Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
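For illustration only, the vertices/edges split described above might look roughly like the following. Every message and field name here is invented; the real definitions are in the attached HDFS-5738.000.patch.

```proto
// Hypothetical sketch -- names and field numbers are NOT from the patch.
message INodeSection {                   // vertices: one entry per inode
  message INode {
    required uint64 id = 1;
    optional bytes name = 2;
    optional uint64 modificationTime = 3;  // attributes such as mtime / atime
    optional uint64 accessTime = 4;
  }
  repeated INode inodes = 1;
}

message INodeDirectorySection {          // edges: the children of each inode
  message DirEntry {
    required uint64 parent = 1;
    repeated uint64 children = 2;        // inode ids of all children
  }
  repeated DirEntry entries = 1;
}
```

Keeping the parent-child references in their own section means a reader interested only in inode attributes can skip the edge data entirely, which is one way the design could simplify snapshot serialization.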
[jira] [Updated] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
[ https://issues.apache.org/jira/browse/HDFS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5737: Hadoop Flags: Reviewed Thanks for the review, Haohui. I'll commit this in a moment.
bq. Implement your own binary search so that (1) it supports finding in a sub list of the collection, and (2) it always returns the lowest element in the list. That way you can make finding the pivot more efficient, and you don't need to create sub lists in copyDefaultsIfNeeded.
My understanding is that {{ArrayList#subList}} returns an alternative view over the same underlying array, just with a different offset and length to pin it within the requested range. This would mean that there is no cost incurred for copying the underlying data, just some extra math to deal with offset calculations, so perhaps the efficiency gain would be minor. Here is the code for {{ArrayList#subList}}: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java#876 Agreed on point 2, though, that we'd need a custom binary search variant if we want to do that. {{Collections#binarySearch}} can't do it. Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. - Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5737.1.patch The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
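The view semantics of {{ArrayList#subList}} are easy to confirm with a tiny standalone demo (unrelated to the patch itself): a write through the sub list is visible in the backing list, showing that no copy of the underlying data is made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListView {
    public static void main(String[] args) {
        List<String> acl = new ArrayList<>(Arrays.asList(
                "user::rw-", "group::r--", "other::r--", "default:user::rw-"));
        // subList returns a view over the same backing array: no copy is made,
        // only an offset/length wrapper around the original list.
        List<String> defaults = acl.subList(3, acl.size());
        defaults.set(0, "default:user::rwx");
        // The write through the view is visible in the original list.
        System.out.println(acl.get(3));   // prints "default:user::rwx"
    }
}
```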
[jira] [Commented] (HDFS-5483) NN should gracefully handle multiple block replicas on same DN
[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865965#comment-13865965 ] Arpit Agarwal commented on HDFS-5483: - Eric, the blockreceived path won't assert since it doesn't try to manipulate the BlockInfo list directly. However, looking at it some more, I think we can eliminate the findDatanode routine, or at least make it 'private'. I'll file a separate Jira for it. NN should gracefully handle multiple block replicas on same DN -- Key: HDFS-5483 URL: https://issues.apache.org/jira/browse/HDFS-5483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Fix For: 3.0.0 Attachments: h5483.02.patch {{BlockManager#reportDiff}} can cause an assertion failure in {{BlockInfo#moveBlockToHead}} if the block report shows the same block as belonging to more than one storage. The issue is that {{moveBlockToHead}} assumes it will find the DatanodeStorageInfo for the given block.
Exception details: {code} java.lang.AssertionError: Index is out of bound at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5741) BlockInfo#findDataNode can be deprecated
Arpit Agarwal created HDFS-5741: --- Summary: BlockInfo#findDataNode can be deprecated Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}} everywhere else except in {{#addStorage}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5741) BlockInfo#findDataNode can be deprecated
[ https://issues.apache.org/jira/browse/HDFS-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5741: Remaining Estimate: (was: 2h) Original Estimate: (was: 2h) BlockInfo#findDataNode can be deprecated Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}} everywhere else except in {{#addStorage}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5741) BlockInfo#findDataNode can be deprecated
[ https://issues.apache.org/jira/browse/HDFS-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5741: Priority: Minor (was: Major) BlockInfo#findDataNode can be deprecated Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor NN now tracks replicas by storage, so {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}}. {{BlockManager#reportDiff}} is being fixed as part of HDFS-5483, this Jira is to fix the rest of the callers. [suggested by [~sirianni] on HDFS-5483] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5741) BlockInfo#findDataNode can be deprecated
[ https://issues.apache.org/jira/browse/HDFS-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5741: Description: NN now tracks replicas by storage, so {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}}. {{BlockManager#reportDiff}} is being fixed as part of HDFS-5483, this Jira is to fix the rest of the callers. [suggested by [~sirianni] on HDFS-5483] was:{{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}} everywhere else except in {{#addStorage}}. BlockInfo#findDataNode can be deprecated Key: HDFS-5741 URL: https://issues.apache.org/jira/browse/HDFS-5741 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal NN now tracks replicas by storage, so {{BlockInfo#findDataNode}} can be replaced with {{BlockInfo#findStorageInfo}}. {{BlockManager#reportDiff}} is being fixed as part of HDFS-5483, this Jira is to fix the rest of the callers. [suggested by [~sirianni] on HDFS-5483] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5739) ACL RPC must allow null name or null permissions in ACL entries.
[ https://issues.apache.org/jira/browse/HDFS-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5739: Attachment: HDFS-5739.2.patch Thanks for the review, Haohui. I'm attaching patch version 2 to show what this looks like when we keep permissions required.
bq. Is it possible to simply ignore the value in removeAclEntries?
Yes, the logic currently ignores it. If we wanted to strictly match existing implementations like Linux, then we would actually send an error back to the user if they tried to specify permissions in a remove call. I don't know that we need to be rigid about that, and we could always choose to implement that check at the CLI layer if we want it, so I'm fine with this approach. The effect of this is that protobuf will default-initialize the enum field to the 0'th element (NONE) on conversion from proto to model. For symmetry, this patch adds the corresponding logic in the conversion from model to proto too. ACL RPC must allow null name or null permissions in ACL entries. Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5739.1.patch, HDFS-5739.2.patch Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
Arpit Agarwal created HDFS-5742: --- Summary: DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5739) ACL RPC must allow null name or null permissions in ACL entries.
[ https://issues.apache.org/jira/browse/HDFS-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865994#comment-13865994 ] Haohui Mai commented on HDFS-5739: -- I think that it is fine to check it at the CLI layer. +1 on the v2 patch. ACL RPC must allow null name or null permissions in ACL entries. Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5739.1.patch, HDFS-5739.2.patch Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5612) NameNode: change all permission checks to enforce ACLs in addition to permissions.
[ https://issues.apache.org/jira/browse/HDFS-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865997#comment-13865997 ] Chris Nauroth commented on HDFS-5612: - Sure thing. Here is a list of the invariants. I'll also fold this list into the comments in a new patch later.
# The list must be sorted.
# Each entry in the list is unique.
# There is exactly one each of the unnamed user / group / other entries. These entries are identical to the classic owner / group / other permissions encoded in permission bits today. The ACL enforcement algorithm states that owner permissions trump named user permissions. This becomes important if the file owner also has a named user entry in the ACL. Assume the file owner is haohui, and the owner permissions are rw-, but there is also a named user entry for user:haohui:r--. In this case, the owner entry must take precedence over the named user entry so that you get read-write access. Additionally, the effective permissions granted to a user through groups must include the permissions of the file's group (if the user is a member).
# The mask entry, if present, must not have a name. (The name would be meaningless.)
# The owner entry must not have a name. (The name would be meaningless.)
# There may be any number of named user entries. These entries are used if the username is a specific match (assuming the user is not the owner as discussed above).
# There may be any number of named group entries. Assuming the user is not the owner, and there is no named user entry matching that user, and the user is a member of at least one named group or the file's group, then the user's effective permissions are the union of permissions for all such groups in which the user is a member.
# Default entries are ignored during permission enforcement.
Regarding default entries, these are not used during permission enforcement at all, so there really are no invariants related to the default ACL within the context of {{checkAcl}}. However, the default ACL on a directory will be copied to the access ACL of its newly created child inodes. Since the default ACL eventually becomes an access ACL for a different inode, we can say that the same set of invariants must hold for the default ACL entries. (Otherwise, we'd have a violation of invariants later when it comes time to run {{checkAcl}} on that child inode.) NameNode: change all permission checks to enforce ACLs in addition to permissions. -- Key: HDFS-5612 URL: https://issues.apache.org/jira/browse/HDFS-5612 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5612.1.patch, HDFS-5612.2.patch All {{NameNode}} code paths that enforce permissions must be updated so that they also enforce ACLs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
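The precedence described in the invariants above (owner first, then named user, then the union of matching group entries, then other) can be sketched as simplified Java. Everything here is an assumption for illustration: the method name, the int permission masks (r=4, w=2, x=1), and the mask handling are invented, and this is not the actual {{checkAcl}} implementation.

```java
import java.util.List;
import java.util.Set;

public class AclCheckSketch {
    // Permissions simplified to an int bitmask (r=4, w=2, x=1); these are
    // illustrative stand-ins, not HDFS's FsAction type.
    public static int effectivePermissions(String user, Set<String> userGroups,
            String owner, int ownerPerms, String fileGroup, int groupPerms,
            int otherPerms, List<String> namedUsers, List<Integer> namedUserPerms,
            List<String> namedGroups, List<Integer> namedGroupPerms, int mask) {
        // 1. The owner entry trumps any named user entry for the same name.
        if (user.equals(owner)) {
            return ownerPerms;                    // owner is not filtered by the mask
        }
        // 2. A named user entry is used next, filtered by the mask.
        int i = namedUsers.indexOf(user);
        if (i >= 0) {
            return namedUserPerms.get(i) & mask;
        }
        // 3. Otherwise, union the permissions of every matching group entry
        //    (the file's group plus named groups), filtered by the mask.
        int union = 0;
        boolean matched = false;
        if (userGroups.contains(fileGroup)) {
            union |= groupPerms;
            matched = true;
        }
        for (int g = 0; g < namedGroups.size(); g++) {
            if (userGroups.contains(namedGroups.get(g))) {
                union |= namedGroupPerms.get(g);
                matched = true;
            }
        }
        if (matched) {
            return union & mask;
        }
        // 4. Fall through to the other entry.
        return otherPerms;
    }

    public static void main(String[] args) {
        // haohui owns the file (rw- = 6) and also has a named entry (r-- = 4);
        // the owner entry wins, so this prints 6.
        System.out.println(effectivePermissions("haohui",
                java.util.Collections.emptySet(), "haohui", 6, "hadoop", 4, 0,
                java.util.Arrays.asList("haohui"), java.util.Arrays.asList(4),
                java.util.Collections.emptyList(), java.util.Collections.emptyList(), 7));
    }
}
```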
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5742: Attachment: HDFS-5742.01.patch DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5739) ACL RPC must allow null name or unspecified permissions in ACL entries.
[ https://issues.apache.org/jira/browse/HDFS-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5739: Summary: ACL RPC must allow null name or unspecified permissions in ACL entries. (was: ACL RPC must allow null name or null permissions in ACL entries.) ACL RPC must allow null name or unspecified permissions in ACL entries. --- Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5739.1.patch, HDFS-5739.2.patch Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5737) Replacing only the default ACL can fail to copy unspecified base entries from the access ACL.
[ https://issues.apache.org/jira/browse/HDFS-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5737. - Resolution: Fixed Fix Version/s: HDFS ACLs (HDFS-4685) I committed this to the HDFS-4685 feature branch. Replacing only the default ACL can fail to copy unspecified base entries from the access ACL. - Key: HDFS-5737 URL: https://issues.apache.org/jira/browse/HDFS-5737 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: HDFS ACLs (HDFS-4685) Attachments: HDFS-5737.1.patch The final round of changes in HDFS-5673 switched to a search approach instead of a scan approach for finding base access entries that need to be copied to the default ACL. However, in the case of doing full replacement on the default ACL, the list may not be sorted properly at this point in the code, causing the searches to miss the access entries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5739) ACL RPC must allow null name or unspecified permissions in ACL entries.
[ https://issues.apache.org/jira/browse/HDFS-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5739. - Resolution: Fixed Fix Version/s: HDFS ACLs (HDFS-4685) Hadoop Flags: Reviewed I committed the v2 patch to the HDFS-4685 feature branch. Thanks again for the review, Haohui. ACL RPC must allow null name or unspecified permissions in ACL entries. --- Key: HDFS-5739 URL: https://issues.apache.org/jira/browse/HDFS-5739 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: HDFS ACLs (HDFS-4685) Attachments: HDFS-5739.1.patch, HDFS-5739.2.patch Currently, the ACL RPC defines ACL entries with required fields for name and permissions. These fields actually need to be optional. The name can be null to represent unnamed ACL entries, such as the file owner or mask. Permissions can be null when passed in an ACL spec to remove ACL entries via {{FileSystem#removeAclEntries}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5743) Use protobuf to serialize snapshot information
Haohui Mai created HDFS-5743: Summary: Use protobuf to serialize snapshot information Key: HDFS-5743 URL: https://issues.apache.org/jira/browse/HDFS-5743 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Jing Zhao This jira tracks the efforts of using protobuf to serialize snapshot-related information in FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5742: Status: Patch Available (was: Open) DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866011#comment-13866011 ] Jing Zhao commented on HDFS-5742: - +1 Patch looks good to me. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5743) Use protobuf to serialize snapshot information
[ https://issues.apache.org/jira/browse/HDFS-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5743: - Target Version/s: HDFS-5698 (FSImage in protobuf) Use protobuf to serialize snapshot information -- Key: HDFS-5743 URL: https://issues.apache.org/jira/browse/HDFS-5743 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Jing Zhao This jira tracks the efforts of using protobuf to serialize snapshot-related information in FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Target Version/s: HDFS-5698 (FSImage in protobuf) Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information are out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5717) Save FSImage header in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5717: - Target Version/s: HDFS-5698 (FSImage in protobuf) Save FSImage header in protobuf --- Key: HDFS-5717 URL: https://issues.apache.org/jira/browse/HDFS-5717 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5717.000.patch, HDFS-5717.001.patch, HDFS-5717.002.patch This jira introduces the basic framework to serialize and deserialize FSImage in protobuf, and it serializes some header information in the new protobuf format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866104#comment-13866104 ] Haohui Mai commented on HDFS-5722: -- Indeed the difficulties come from the efficiency perspective. Currently, skipping N bytes in a compressed stream requires decompressing the data. This can be problematic because N can be huge (e.g., when skipping the inode section, N can be as large as a couple of GB). Just to clarify, the code will continue to support old FSImages that have compression enabled. This jira only proposes to move compression support out of the new FSImage format. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression; there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery.
It also retains the benefits of reducing the number of bytes to be transferred across the wire, since there is compression at the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
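A minimal sketch of the transport-layer idea: the image stays uncompressed on disk, so section offsets remain seekable, while the bytes are gzip'd only for the transfer. The class and method names are invented for illustration; this is not the actual SNN/SBN image-transfer servlet code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

public class TransferCompressionSketch {
    /**
     * Compresses an uncompressed on-disk image only while it crosses the wire.
     * The stored file keeps fixed section offsets (readers can seek directly
     * to a section), yet the peer still receives gzip'd bytes.
     */
    public static byte[] compressForWire(byte[] imageBytes) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(imageBytes);
            }   // close() flushes the gzip trailer
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // in-memory streams should not fail
        }
    }

    public static void main(String[] args) {
        byte[] image = new byte[1 << 20];   // stand-in image: 1 MB of zeros
        byte[] onWire = compressForWire(image);
        System.out.println(onWire.length < image.length);  // prints true
    }
}
```

In a real server the gzip stream would wrap the HTTP response's output stream directly, so the full compressed image never needs to be buffered in memory.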
[jira] [Updated] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5742: Status: Open (was: Patch Available) Withdrawing the patch for now. There are other bugs in DatanodeCluster. Will submit a combined patch later. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866114#comment-13866114 ] Haohui Mai commented on HDFS-5722: -- Had an offline discussion with @Jing Zhao, and dug into the original jira (HDFS-1435) that did the compression work. One concern is that it might increase disk I/O when writing the FSImage uncompressed to disk. The following table shows that it does not seem to be a problem: https://issues.apache.org/jira/browse/HDFS-1435?focusedCommentId=12921060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12921060 Based on the data, I think it makes sense to move compression out of the FSImage format. The code can compress the data on the fly when transferring it through HTTP, or write the FSImage uncompressed onto the disk, and compute the digest and compress the whole file in the background. Both solutions can reduce the time that the NN spends in safe mode when saving the namespace. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression; there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable.
This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefits of reducing the number of bytes to be transferred across the wire, since there is compression at the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup
[ https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866118#comment-13866118 ] Jing Zhao commented on HDFS-5653: - For the current patch, since getCurrentProxyInfo and getProxy are called in different places, is it possible that a failover happened in the middle (triggered by another RPC call, e.g.)? I think another possible solution is to let getProxy return (Proxy + extra tag), where the tag can be used to indicate the NN. Log namenode hostname in various exceptions being thrown in a HA setup -- Key: HDFS-5653 URL: https://issues.apache.org/jira/browse/HDFS-5653 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.2.0 Reporter: Arpit Gupta Assignee: Haohui Mai Priority: Minor Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch, HDFS-5653.002.patch, HDFS-5653.003.patch In an HA setup, any time we see an exception such as safe mode or namenode-in-standby, etc., we don't know which namenode it came from. The user has to go to the logs of the namenode and determine which one was active and/or standby around the same time. I think it would help with debugging if any such exceptions could include the namenode hostname so the user could know exactly which namenode served the request. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5717) Save FSImage header in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866145#comment-13866145 ] Jing Zhao commented on HDFS-5717: - [~wheat9], could you update the description and give more details about your basic design? +1 after that. Save FSImage header in protobuf --- Key: HDFS-5717 URL: https://issues.apache.org/jira/browse/HDFS-5717 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5717.000.patch, HDFS-5717.001.patch, HDFS-5717.002.patch This jira introduces the basic framework to serialize and deserialize FSImage in protobuf, and it serializes some header information in the new protobuf format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5717) Save FSImage header in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866147#comment-13866147 ] Jing Zhao commented on HDFS-5717: - bq. Don't call newLoader newLoader is the method that creates the real Loader instance, which can be either the old loader or the new loader supporting protobuf. Thus the name newLoader makes sense to me. Save FSImage header in protobuf --- Key: HDFS-5717 URL: https://issues.apache.org/jira/browse/HDFS-5717 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5717.000.patch, HDFS-5717.001.patch, HDFS-5717.002.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5742) DatanodeCluster (mini cluster of DNs) fails to start
[ https://issues.apache.org/jira/browse/HDFS-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866161#comment-13866161 ] Hadoop QA commented on HDFS-5742: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622054/HDFS-5742.02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5847//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5847//console This message is automatically generated. DatanodeCluster (mini cluster of DNs) fails to start Key: HDFS-5742 URL: https://issues.apache.org/jira/browse/HDFS-5742 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Priority: Minor Attachments: HDFS-5742.01.patch, HDFS-5742.02.patch DatanodeCluster fails to start with NPE in MiniDFSCluster. Looks like a simple bug in {{MiniDFSCluster#determineDfsBaseDir}} - missing check for null configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
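The fix described in the report (a missing null check in {{MiniDFSCluster#determineDfsBaseDir}}) might look roughly like the simplified sketch below. The Conf interface, property key, and default value are stand-ins for the real Configuration API, not the actual patch:

```java
// Simplified sketch of the kind of fix described in the report: fall back to
// a default base directory when no Configuration was supplied, instead of
// dereferencing a null reference. Names and defaults are illustrative.
public class DfsBaseDir {
  static final String DEFAULT_BASE_DIR = "build/test/data/dfs";

  // Stand-in for org.apache.hadoop.conf.Configuration.
  interface Conf {
    String get(String key);
  }

  static String determineDfsBaseDir(Conf conf) {
    if (conf != null) {                    // the missing null check
      String dir = conf.get("hdfs.minidfs.basedir");
      if (dir != null) {
        return dir;
      }
    }
    return DEFAULT_BASE_DIR;               // safe fallback instead of an NPE
  }
}
```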
[jira] [Commented] (HDFS-3544) Ability to use SimpleRegeratingCode to fix missing blocks
[ https://issues.apache.org/jira/browse/HDFS-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866170#comment-13866170 ] Chris Li commented on HDFS-3544: Any updates on this issue? We're interested in trying this out to save space on our cold files. Ability to use SimpleRegeratingCode to fix missing blocks - Key: HDFS-3544 URL: https://issues.apache.org/jira/browse/HDFS-3544 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/raid Reporter: dhruba borthakur Assignee: Weiyan Wang Reed-Solomon encoding (n, k) has n storage nodes and can tolerate n-k failures. Regenerating a block needs to access k blocks, which is a problem when n and k are large. Instead, we can use simple regenerating codes (n, k, f) that first do Reed-Solomon (n, k) and then XOR with stripe size f. Then a single disk failure needs to access only f nodes, and f can be very small. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
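To see why the local XOR group reduces repair cost, a toy sketch: with an XOR parity over a small group of blocks, one lost block is rebuilt from the parity plus the surviving group members, instead of reading k blocks for a full Reed-Solomon decode. This is a self-contained illustration, not the contrib/raid code:

```java
// Toy illustration of the local-repair idea behind simple regenerating
// codes: a local XOR parity lets one lost block be rebuilt from its small
// group, rather than from k blocks as a full Reed-Solomon decode requires.
public class XorRepair {

  // XOR parity over a group of equal-length blocks.
  static byte[] parity(byte[][] group) {
    byte[] p = new byte[group[0].length];
    for (byte[] block : group) {
      for (int i = 0; i < p.length; i++) {
        p[i] ^= block[i];
      }
    }
    return p;
  }

  // A missing block is the XOR of the parity with the surviving blocks.
  static byte[] recover(byte[] parity, byte[][] survivors) {
    byte[][] all = new byte[survivors.length + 1][];
    all[0] = parity;
    System.arraycopy(survivors, 0, all, 1, survivors.length);
    return parity(all);
  }

  // Lose b1 from a 3-block group and rebuild it from p, b0, b2.
  static boolean demo() {
    byte[] b0 = {1, 2, 3}, b1 = {4, 5, 6}, b2 = {7, 8, 9};
    byte[] p = parity(new byte[][]{b0, b1, b2});
    byte[] rebuilt = recover(p, new byte[][]{b0, b2});
    return java.util.Arrays.equals(rebuilt, b1);
  }
}
```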
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866181#comment-13866181 ] Colin Patrick McCabe commented on HDFS-5722: [~tlipcon], [~atm], [~hairong], how do you feel about removing support for on-disk FSImage compression? It seems to me that we should just add an option for doing HTTP compression, but keep the old option for on-disk compression. It concerns me that someone with a small disk might upgrade to a new version of Hadoop and then be unable to save his (much larger) fsimage on a small partition once compression support has been removed. I also think that for really large FSImages, loading a compressed version could be faster, if the compression were offloaded to a worker thread like Todd suggested in HDFS-1435. The FSImage is always read sequentially. If we implement optional sections, that won't change this fact. So I just don't see a reason for messing with this. But maybe there's something I have overlooked. Thoughts? Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5717) Save FSImage header in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5717: - Description: This jira introduces several basic components to serialize / deserialize the FSImage in protobuf, including: * Using protobuf to describe the skeleton of the new FSImage format. * Introducing a separate code path to serialize and deserialize the new FSImage format. * Saving the summary of the FSImage in the new format. was:This jira introduces the basic framework to serialize and deserialize FSImage in protobuf, and it serializes some header information in the new protobuf format. Save FSImage header in protobuf --- Key: HDFS-5717 URL: https://issues.apache.org/jira/browse/HDFS-5717 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5717.000.patch, HDFS-5717.001.patch, HDFS-5717.002.patch This jira introduces several basic components to serialize / deserialize the FSImage in protobuf, including: * Using protobuf to describe the skeleton of the new FSImage format. * Introducing a separate code path to serialize and deserialize the new FSImage format. * Saving the summary of the FSImage in the new format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5717) Save FSImage header in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-5717. - Resolution: Fixed Fix Version/s: HDFS-5698 (FSImage in protobuf) Hadoop Flags: Reviewed I've committed this. Save FSImage header in protobuf --- Key: HDFS-5717 URL: https://issues.apache.org/jira/browse/HDFS-5717 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-5698 (FSImage in protobuf) Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS-5698 (FSImage in protobuf) Attachments: HDFS-5717.000.patch, HDFS-5717.001.patch, HDFS-5717.002.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5738) Serialize INode information in protobuf
[ https://issues.apache.org/jira/browse/HDFS-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5738: - Attachment: HDFS-5738.001.patch Rebased on the current branch. Serialize INode information in protobuf --- Key: HDFS-5738 URL: https://issues.apache.org/jira/browse/HDFS-5738 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5738.000.patch, HDFS-5738.001.patch This jira proposes to serialize inode information with protobuf. Snapshot-related information is out of the scope of this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: HDFS-5579-branch-1.2.patch HDFS-5579.patch Good point. Thanks Jing. Updated patches to fix this problem. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579-branch-1.2.patch, HDFS-5579.patch, HDFS-5579.patch We noticed that decommissioning DataNodes sometimes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager#computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks that belong to under-construction files; however, in BlockManager#isReplicationInProgress(DatanodeDescriptor srcNode), if any block still needs replication, regardless of whether it belongs to an under-construction file, the decommission remains in progress. That's why the decommission sometimes takes a very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
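The check the updated patch adds (quoted earlier in this thread) can be sketched in simplified form: the last block of an under-construction file stops counting against decommission once it has at least minReplication replicas. The types below are stand-ins for the real BlockManager structures:

```java
// Simplified sketch of the decommission check discussed in this issue: when
// counting blocks that still hold up decommission, skip the last block of an
// under-construction file once it has at least minReplication replicas, so
// open files cannot stall decommission indefinitely. Types are stand-ins
// for the real BlockManager structures.
public class DecommissionCheck {

  static class Block {
    final long id;
    Block(long id) { this.id = id; }
  }

  static class BlockCollection {
    final boolean underConstruction;
    final Block lastBlock;
    BlockCollection(boolean uc, Block last) {
      underConstruction = uc;
      lastBlock = last;
    }
  }

  // Returns true if this block should still hold up decommission progress.
  static boolean needsReplication(Block block, BlockCollection bc,
                                  int curReplicas, int minReplication) {
    if (bc.underConstruction
        && block == bc.lastBlock        // the real patch compares with equals()
        && curReplicas >= minReplication) {
      return false;  // last block of an open file with enough replicas: skip
    }
    return true;
  }

  static boolean demo() {
    Block last = new Block(1);
    BlockCollection open = new BlockCollection(true, last);
    // Open file's last block with 2 >= 1 replicas no longer blocks
    // decommission; any other under-replicated block still does.
    return !needsReplication(last, open, 2, 1)
        && needsReplication(new Block(2), open, 0, 1);
  }
}
```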
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579-branch-1.2.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866258#comment-13866258 ] Hadoop QA commented on HDFS-5579: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622097/HDFS-5579-branch-1.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5848//console This message is automatically generated. Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: HDFS-5579.patch Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5579: --- Attachment: (was: HDFS-5579.patch) Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5645: - Attachment: h5645_20130109.patch h5645_20130109.patch: updated with the branch. Since the patch also applies to trunk, let me try submitting it. Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch During upgrade, a marker can be inserted into the editlog streams so that it is possible to roll back to the marker transaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5645) Support upgrade marker in editlog streams
[ https://issues.apache.org/jira/browse/HDFS-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5645: - Status: Patch Available (was: Open) Support upgrade marker in editlog streams - Key: HDFS-5645 URL: https://issues.apache.org/jira/browse/HDFS-5645 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: editsStored, h5645_20130103.patch, h5645_20130109.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-5721: - Issue Type: Improvement (was: Bug) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt, hdfs-5721-v3.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
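A hedged sketch of the fix pattern for this kind of leak: make sure close() runs on every exit path, e.g. via try-with-resources. FakeImage below is a stand-in for the real FSImage, which may not implement AutoCloseable in this branch; in that case a try/finally calling close() achieves the same effect:

```java
// Illustrative sketch of the resource-leak fix pattern described above:
// close the image on every exit path. FakeImage stands in for the real
// org.apache.hadoop.hdfs.server.namenode.FSImage.
public class SharedEditsClose {

  static class FakeImage implements AutoCloseable {
    boolean closed;
    @Override public void close() { closed = true; }
  }

  // try-with-resources guarantees close() even if the body throws.
  static FakeImage initializeSharedEdits() {
    FakeImage img = new FakeImage();
    try (FakeImage sharedEditsImage = img) {
      // ... format the shared edits dirs using sharedEditsImage ...
    }
    return img;  // returned only so callers can observe the closed flag
  }
}
```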