[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864006#comment-13864006 ] Hadoop QA commented on HDFS-5715: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621702/HDFS-5715.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5833//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5833//console This message is automatically generated. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate their associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
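To make the proposed change concrete, here is a minimal, self-contained Java sketch of a diff that records a snapshot id instead of holding a Snapshot reference. The stub types below (Snapshot, SnapshotTable, the resolve helper) are illustrative assumptions, not the actual HDFS classes or the attached patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Stub standing in for the real Snapshot class.
class Snapshot {
  final int id;
  Snapshot(int id) { this.id = id; }
}

// Stub id-to-snapshot lookup, standing in for the snapshot manager.
class SnapshotTable {
  private final Map<Integer, Snapshot> byId = new HashMap<>();
  void put(Snapshot s) { byId.put(s.id, s); }
  Snapshot get(int id) { return byId.get(id); }
}

// Before, a diff held a Snapshot object reference; after, it records only
// the integer id, which serializes compactly (e.g. into the byte-array
// representation proposed in HDFS-5714) and is resolved on demand.
class FileDiff {
  private final int snapshotId;
  FileDiff(int snapshotId) { this.snapshotId = snapshotId; }
  Snapshot resolve(SnapshotTable snapshots) { return snapshots.get(snapshotId); }
}
{code}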
[jira] [Updated] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5704: Status: Patch Available (was: Open) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch Currently, every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue: the total size of the editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of the new block, rather than the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
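The n^2 claim is easy to see with a small, runnable sketch: re-logging the whole block list on each allocation writes 1 + 2 + ... + n entries for an n-block file, while an add-block-only record writes n entries in total. The class below is purely illustrative, not HDFS code:
{code}
import java.util.ArrayList;
import java.util.List;

public class EditLogGrowth {
  public static void main(String[] args) {
    int n = 1000;                        // blocks allocated for one file
    List<Long> blocks = new ArrayList<>();
    long updateBlocksEntries = 0;        // entries an OP_UPDATE_BLOCKS-style op writes
    long addBlockEntries = 0;            // entries an OP_ADD_BLOCK-style op writes
    for (long id = 0; id < n; id++) {
      blocks.add(id);
      updateBlocksEntries += blocks.size(); // re-logs the whole list: 1 + 2 + ... + n
      addBlockEntries += 1;                 // logs only the newly allocated block
    }
    // Prints 500500 vs 1000, i.e. ~n^2/2 vs n.
    System.out.println(updateBlocksEntries + " vs " + addBlockEntries);
  }
}
{code}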
[jira] [Updated] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5704: Attachment: HDFS-5704.001.patch Thanks for the review, Suresh! Updated the patch to address your comments. Also added two unit tests to cover some basic scenarios. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch Currently, every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue: the total size of the editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of the new block, rather than the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864016#comment-13864016 ] Yu Li commented on HDFS-2994: - I happened to find that this JIRA has already been integrated into the 2.1.1-beta release, but the status here remains unresolved. Could someone update the status? :-) If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
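The failure mode in the last paragraph is a classic stale-reference race. Here is a minimal sketch of the pattern, using a plain map as a stand-in for FSDirectory; the names and structure are assumptions for illustration, not the HDFS implementation:
{code}
import java.util.HashMap;
import java.util.Map;

public class StaleNodeSketch {
  static final Map<String, Object> dir = new HashMap<>();

  // Succeeds only if oldNode is still the node currently stored for src,
  // mirroring why replaceNode can fail once the INode has been swapped.
  static boolean replaceNode(String src, Object oldNode, Object newNode) {
    return dir.replace(src, oldNode, newNode);
  }

  public static void main(String[] args) {
    String src = "/benchmarks/TestDFSIO/io_data/test_io_6";
    dir.put(src, new Object());
    Object stale = dir.get(src);   // reference captured early in startFile
    dir.put(src, new Object());    // lease recovery replaces the node
    System.out.println(replaceNode(src, stale, new Object()));   // false
    Object current = dir.get(src); // fix direction: re-resolve after recovery
    System.out.println(replaceNode(src, current, new Object())); // true
  }
}
{code}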
[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864021#comment-13864021 ] Hadoop QA commented on HDFS-2994: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598329/HDFS-2994-2.0.6-alpha.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5835//console This message is automatically generated. If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864039#comment-13864039 ] Todd Lipcon commented on HDFS-5138: --- General: - thanks for the description in the above JIRA comment. Can you transfer this comment somewhere into the docs, perhaps hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithQJM.apt.vm or a new page? Perhaps with a slightly more user-facing angle. - what would happen if the admin called finalizeUpgrade() when neither node had yet transitioned to active? I don't see any sanity check here... is it possible you'd end up leaving the shared dir in an orphaned upgrading state and never end up finalizing it? Similarly, what happens if you start one NN with -upgrade, and you start the other one without -upgrade? It seems to me it should check for the upgrade lock file in the shared dir and say "looks like an upgrade is in progress, please start the SBN with -upgrade". - there are a few TODOs in the code that probably need to be addressed - nothing big, just a few things you may have missed. JournalManager.java: - would be good to add Javadoc on the new methods, so that JM implementors know what the upgrade process looks like, i.e. what is pre-upgrade, etc.? QuorumJournalManager.java: - the "Could not perform upgrade or more JournalNodes" error message has some missing words in it. +throw new IOException("Failed to lock shared log."); - this line should be unreachable, right? maybe an AssertionError("Unreachable code") would make more sense? Also this same exception message is used down below in canRollBack, which isn't quite right. Journal: - when you upgrade the journal, I'd think you'd want to copy over all the data from the PersistentLongFiles into the new dir? FileJournalManager: - worth considering a protobuf for the shared log lock, in case we want to add other fields to it later (instead of the custom serialization you do now) - need try...finally around the code where you write the shared log lock. On the read side you're also forgetting to close it. - the creation of the shared log lock file is non-atomic... I'm worried that we may hit the race in practice, since the AtomicFileOutputStream implies an fsync, which means that between the exists() check and the rename to the lock file, you may really have a decently long time window. Maybe we can use locking code like Storage does? Feel free to punt to a follow-up. FSNamesystem.java: - can you add a doc on doUpgradeOfSharedLogOnTransitionToActive()? NNUpgradeUtil.java: - why are some of the functions package-private and others are public? - make it an abstract class or give it a private constructor so it can't be instantiated, since it's just static methods - brief javadocs would be nice for these methods, even though they're straight refactors of existing code. FSEditLog.java: - in canRollBack(), you throw an exception if there is no shared log. That doesn't seem right... - capitalization of RollBack vs Rollback is a little inconsistent. Looks like Rollback is consistently used prior to this patch, so probably best to stick with that.
FSImage.java: - in the switch statement on the startup option, I think you should keep the ROLLBACK case, but just have it throw AssertionError -- just to make sure we don't accidentally have some case where we're passing it there but shouldn't be. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
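For the try...finally point raised under FileJournalManager above, this is presumably the intended pattern; the path and payload below are assumptions for illustration, not the actual patch:
{code}
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class LockFileWrite {
  static void writeLockFile(String path, long layoutVersion) throws IOException {
    DataOutputStream out = new DataOutputStream(new FileOutputStream(path));
    try {
      out.writeLong(layoutVersion); // whatever fields the lock file carries
      out.flush();
    } finally {
      out.close();                  // runs even if the write throws
    }
  }
}
{code}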
[jira] [Commented] (HDFS-5667) Include DatanodeStorage in StorageReport
[ https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864107#comment-13864107 ] Hudson commented on HDFS-5667: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5667. Add test missed in previous checkin (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555956) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java HDFS-5667. Include DatanodeStorage in StorageReport. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555929) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/common/TestJspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java Include DatanodeStorage in StorageReport Key: HDFS-5667 URL: https://issues.apache.org/jira/browse/HDFS-5667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Eric Sirianni Assignee: Arpit Agarwal Fix For: 3.0.0, 2.4.0 Attachments: h5667.02.patch, h5667.03.patch, h5667.04.patch, h5667.05.patch The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542: {code}
+  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
     synchronized (storageMap) {
       DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
       if (storage == null) {
@@ -670,8 +658,6 @@
             " for DN " + getXferAddr());
         storage = new DatanodeStorageInfo(this, s);
         storageMap.put(s.getStorageID(), storage);
-      } else {
-        storage.setState(s.getState());
       }
       return storage;
     }
{code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code}
+    if (storage == null) {
+      // This is seen during cluster initialization when the heartbeat
+      // is received before the initial block reports from each storage.
+      storage = updateStorage(new DatanodeStorage(report.getStorageID()));
{code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
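A rough Java sketch of the direction the description proposes, with stub types standing in for the real HDFS classes (this is not the attached patch):
{code}
// Stub: carries both the id and the state, so a report always has fresh state.
class DatanodeStorage {
  enum State { NORMAL, READ_ONLY }
  final String storageID;
  final State state;
  DatanodeStorage(String storageID, State state) {
    this.storageID = storageID;
    this.state = state;
  }
}

// The report carries the full DatanodeStorage instead of only the storage ID.
// On the wire the new field would be optional, and the old StorageUuid field
// kept, so updated and old nodes remain protocol-compatible.
class StorageReport {
  final DatanodeStorage storage;
  final long capacity;
  final long remaining;
  StorageReport(DatanodeStorage storage, long capacity, long remaining) {
    this.storage = storage;
    this.capacity = capacity;
    this.remaining = remaining;
  }
}
{code}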
[jira] [Commented] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864112#comment-13864112 ] Hadoop QA commented on HDFS-5704: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621762/HDFS-5704.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.TestFileAppendRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5834//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5834//console This message is automatically generated. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch Currently, every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue: the total size of the editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of the new block, rather than the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5719) FSImage#doRollback() should close prevState before return
[ https://issues.apache.org/jira/browse/HDFS-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864108#comment-13864108 ] Hudson commented on HDFS-5719: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5719. FSImage#doRollback() should close prevState before return. Contributed by Ted Yu (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java FSImage#doRollback() should close prevState before return - Key: HDFS-5719 URL: https://issues.apache.org/jira/browse/HDFS-5719 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5719.txt {code} FSImage prevState = new FSImage(conf); {code} prevState should be closed before return from doRollback() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
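The fix the summary describes is the standard close-on-all-paths pattern; a minimal sketch assuming an FSImage-like Closeable (not the actual patch):
{code}
import java.io.Closeable;
import java.io.IOException;

public class RollbackSketch {
  static void doRollback(Closeable prevState) throws IOException {
    try {
      // ... inspect and validate the previous state, perform the rollback ...
    } finally {
      prevState.close(); // close on every exit path, not just on success
    }
  }
}
{code}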
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864102#comment-13864102 ] Hudson commented on HDFS-2832: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-2832. Update CHANGES.txt to reflect merge to branch-2 (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556088) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch, h2832_20131211b.patch, h2832_branch-2_20131226.patch, h2832_branch-2_20140103.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864110#comment-13864110 ] Hudson commented on HDFS-5589: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5589. Namenode loops caching and uncaching when data should be uncached. (awang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555996) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5220) Expose group resolution time as metric
[ https://issues.apache.org/jira/browse/HDFS-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864109#comment-13864109 ] Hudson commented on HDFS-5220: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5220. Expose group resolution time as metric (jxiang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555976) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUserGroupInformation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java Expose group resolution time as metric -- Key: HDFS-5220 URL: https://issues.apache.org/jira/browse/HDFS-5220 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Rob Weltman Assignee: Jimmy Xiang Fix For: 2.4.0 Attachments: 2.4-5220.addendum, 2.4-5220.patch, hdfs-5220.addendum, hdfs-5220.patch, hdfs-5220_v2.patch It would help detect issues with authentication configuration and with overloading an authentication source if the name node exposed the time taken for group resolution as a metric. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
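A hedged sketch of the idea: time each group lookup and hand the elapsed milliseconds to a metrics callback. The resolver and sink below are stand-ins, not Hadoop's Groups or metrics APIs:
{code}
import java.util.List;
import java.util.function.Function;
import java.util.function.LongConsumer;

public class TimedGroupResolution {
  static List<String> getGroups(String user,
                                Function<String, List<String>> resolver,
                                LongConsumer metricSink) {
    long start = System.nanoTime();
    try {
      return resolver.apply(user);         // e.g. an LDAP or shell lookup
    } finally {
      metricSink.accept((System.nanoTime() - start) / 1_000_000L); // ms
    }
  }
}
{code}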
[jira] [Commented] (HDFS-4834) Add -exclude path to fsck
[ https://issues.apache.org/jira/browse/HDFS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864115#comment-13864115 ] Gerardo Vázquez commented on HDFS-4834: --- Seems to be a duplicate of HDFS-4993. Add -exclude path to fsck Key: HDFS-4834 URL: https://issues.apache.org/jira/browse/HDFS-4834 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gerardo Vázquez Priority: Minor fsck would fail if the current file being checked is deleted. If you are loading and deleting loaded files quite often, this would lead to many fsck attempts until you can do a complete check. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864219#comment-13864219 ] Hudson commented on HDFS-2832: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-2832. Update CHANGES.txt to reflect merge to branch-2 (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556088) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch, h2832_20131211b.patch, h2832_branch-2_20131226.patch, h2832_branch-2_20140103.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5220) Expose group resolution time as metric
[ https://issues.apache.org/jira/browse/HDFS-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864226#comment-13864226 ] Hudson commented on HDFS-5220: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5220. Expose group resolution time as metric (jxiang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555976) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUserGroupInformation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java Expose group resolution time as metric -- Key: HDFS-5220 URL: https://issues.apache.org/jira/browse/HDFS-5220 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Rob Weltman Assignee: Jimmy Xiang Fix For: 2.4.0 Attachments: 2.4-5220.addendum, 2.4-5220.patch, hdfs-5220.addendum, hdfs-5220.patch, hdfs-5220_v2.patch It would help detect issues with authentication configuration and with overloading an authentication source if the name node exposed the time taken for group resolution as a metric. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5719) FSImage#doRollback() should close prevState before return
[ https://issues.apache.org/jira/browse/HDFS-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864225#comment-13864225 ] Hudson commented on HDFS-5719: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5719. FSImage#doRollback() should close prevState before return. Contributed by Ted Yu (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java FSImage#doRollback() should close prevState before return - Key: HDFS-5719 URL: https://issues.apache.org/jira/browse/HDFS-5719 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5719.txt {code} FSImage prevState = new FSImage(conf); {code} prevState should be closed before return from doRollback() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5667) Include DatanodeStorage in StorageReport
[ https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864224#comment-13864224 ] Hudson commented on HDFS-5667: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5667. Add test missed in previous checkin (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555956) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java HDFS-5667. Include DatanodeStorage in StorageReport. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555929) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/common/TestJspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java Include DatanodeStorage in StorageReport Key: HDFS-5667 URL: https://issues.apache.org/jira/browse/HDFS-5667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Eric Sirianni Assignee: Arpit Agarwal Fix For: 3.0.0, 2.4.0 Attachments: h5667.02.patch, h5667.03.patch, h5667.04.patch, h5667.05.patch The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542: {code}
+  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
     synchronized (storageMap) {
       DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
       if (storage == null) {
@@ -670,8 +658,6 @@
             " for DN " + getXferAddr());
         storage = new DatanodeStorageInfo(this, s);
         storageMap.put(s.getStorageID(), storage);
-      } else {
-        storage.setState(s.getState());
       }
       return storage;
     }
{code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code}
+    if (storage == null) {
+      // This is seen during cluster initialization when the heartbeat
+      // is received before the initial block reports from each storage.
+      storage = updateStorage(new DatanodeStorage(report.getStorageID()));
{code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864227#comment-13864227 ] Hudson commented on HDFS-5589: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5589. Namenode loops caching and uncaching when data should be uncached. (awang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555996) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5719) FSImage#doRollback() should close prevState before return
[ https://issues.apache.org/jira/browse/HDFS-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864289#comment-13864289 ] Hudson commented on HDFS-5719: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-5719. FSImage#doRollback() should close prevState before return. Contributed by Ted Yu (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java FSImage#doRollback() should close prevState before return - Key: HDFS-5719 URL: https://issues.apache.org/jira/browse/HDFS-5719 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5719.txt {code} FSImage prevState = new FSImage(conf); {code} prevState should be closed before return from doRollback() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5220) Expose group resolution time as metric
[ https://issues.apache.org/jira/browse/HDFS-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864290#comment-13864290 ] Hudson commented on HDFS-5220: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-5220. Expose group resolution time as metric (jxiang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555976) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUserGroupInformation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java Expose group resolution time as metric -- Key: HDFS-5220 URL: https://issues.apache.org/jira/browse/HDFS-5220 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Rob Weltman Assignee: Jimmy Xiang Fix For: 2.4.0 Attachments: 2.4-5220.addendum, 2.4-5220.patch, hdfs-5220.addendum, hdfs-5220.patch, hdfs-5220_v2.patch It would help detect issues with authentication configuration and with overloading an authentication source if the name node exposed the time taken for group resolution as a metric. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864283#comment-13864283 ] Hudson commented on HDFS-2832: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-2832. Update CHANGES.txt to reflect merge to branch-2 (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556088) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch, h2832_20131211b.patch, h2832_branch-2_20131226.patch, h2832_branch-2_20140103.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864291#comment-13864291 ] Hudson commented on HDFS-5589: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-5589. Namenode loops caching and uncaching when data should be uncached. (awang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555996) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
Uma Maheswara Rao G created HDFS-5724: - Summary: modifyCacheDirective logging audit log command wrongly as addCacheDirective Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
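Given the summary, the fix is presumably the one-word change below, restoring the operation name to match the method; this mirrors the snippet above rather than quoting the attached patch:
{code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "modifyCacheDirective", null, null, null);
}
{code}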
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5724: -- Attachment: HDFS-5724.patch modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5724: -- Status: Patch Available (was: Open) modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864449#comment-13864449 ] Tsuyoshi OZAWA commented on HDFS-5167: -- [~jingzhao], could you check the latest patch if you have a chance? Add metrics about the NameNode retry cache -- Key: HDFS-5167 URL: https://issues.apache.org/jira/browse/HDFS-5167 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Jing Zhao Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: HDFS-5167.1.patch, HDFS-5167.2.patch, HDFS-5167.3.patch, HDFS-5167.4.patch, HDFS-5167.5.patch, HDFS-5167.6.patch, HDFS-5167.6.patch, HDFS-5167.7.patch, HDFS-5167.8.patch, HDFS-5167.9-2.patch, HDFS-5167.9.patch It will be helpful to have metrics in NameNode about the retry cache, such as the retry count etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5724: -- Labels: caching (was: ) modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864451#comment-13864451 ] Andrew Wang commented on HDFS-5724: --- Thanks Uma, nice catch. +1 pending jenkins, no test is fine here. modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864467#comment-13864467 ] Jing Zhao commented on HDFS-5167: - Sorry for the long delay, [~ozawa]. I will review it today. Add metrics about the NameNode retry cache -- Key: HDFS-5167 URL: https://issues.apache.org/jira/browse/HDFS-5167 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Jing Zhao Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: HDFS-5167.1.patch, HDFS-5167.2.patch, HDFS-5167.3.patch, HDFS-5167.4.patch, HDFS-5167.5.patch, HDFS-5167.6.patch, HDFS-5167.6.patch, HDFS-5167.7.patch, HDFS-5167.8.patch, HDFS-5167.9-2.patch, HDFS-5167.9.patch It will be helpful to have metrics in NameNode about the retry cache, such as the retry count etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864482#comment-13864482 ] Tsuyoshi OZAWA commented on HDFS-5167: -- [~jingzhao], it's OK, no problem :-) Add metrics about the NameNode retry cache -- Key: HDFS-5167 URL: https://issues.apache.org/jira/browse/HDFS-5167 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Jing Zhao Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: HDFS-5167.1.patch, HDFS-5167.2.patch, HDFS-5167.3.patch, HDFS-5167.4.patch, HDFS-5167.5.patch, HDFS-5167.6.patch, HDFS-5167.6.patch, HDFS-5167.7.patch, HDFS-5167.8.patch, HDFS-5167.9-2.patch, HDFS-5167.9.patch It will be helpful to have metrics in NameNode about the retry cache, such as the retry count etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5710: -- Attachment: HDFS-5710.patch Just returning an empty string in case the inodes become null when the method is called without holding the global lock. getFullPathName is called in many places; instead of returning null and adding null checks everywhere, returning an empty string may be OK. Attached a simple patch with the change. FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ : {code}
2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496
2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already.
2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception.
java.lang.NullPointerException
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871)
  at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112)
  at java.lang.Thread.run(Thread.java:724)
{code} Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
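A minimal sketch of the guard the comment describes, with a deliberately simplified signature (the real method builds the path from an INode array inside FSDirectory):
{code}
static String getFullPathName(Object[] inodes) {
  if (inodes == null) {
    // Race: the file may be deleted concurrently when this is called
    // without the global lock, so fall back to an empty path.
    return "";
  }
  StringBuilder path = new StringBuilder();
  for (Object inode : inodes) {
    path.append('/').append(inode); // placeholder for inode.getLocalName()
  }
  return path.toString();
}
{code}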
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864509#comment-13864509 ] Aaron T. Myers commented on HDFS-5138: -- Thanks a lot for the comments, Todd and Suresh. I've got some obligations during the first part of today but will try to get back to you later today or tomorrow. Suresh - as regards a design doc, I could potentially write up a small one if you really think it's necessary, but there really aren't all that many subtle points here, and hopefully by answering the (very good!) questions you've raised everything will become much clearer. The core of the patch isn't even all that large - there's a ton of plumbing of new RPCs, etc. that makes it look more complex than it is. One of the goals I had in producing it was to leave the existing non-HA upgrade system as untouched as possible, to reduce the possibility of regressions so we can put this in a 2.x update ASAP. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2994: -- Resolution: Fixed Fix Version/s: 2.1.1-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Fix For: 2.1.1-beta Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5724: - Target Version/s: 3.0.0 modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
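For readers skimming this thread, the presumed shape of the fix is a one-word change to the audit string (the committed patch may differ in detail):
{code}
// in FSNamesystem#modifyCacheDirective: log the correct command name
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "modifyCacheDirective", null, null, null);
}
{code}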
[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864520#comment-13864520 ] Suresh Srinivas commented on HDFS-2994: --- [~carp84], thanks for pointing it out. You are right. This was fixed in 2.1.1-beta. Marking this as resolved. If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Fix For: 2.1.1-beta Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864525#comment-13864525 ] Suresh Srinivas commented on HDFS-5138: --- bq. Suresh - as regards a design doc, I could potentially write up a small one if you really think it's necessary, but there really aren't all that many subtle points here [~atm], you probably are right. Perhaps answering my questions will do. I may also take the answers from you and post a one-pager to describe how I understand it, to see if I got it right. That could perhaps be the document that we can post in this jira, if you agree. BTW, have you looked at HDFS-5535? Is there anything we can leverage from that, especially around the rollback marker in the editlog, etc.? Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5704: Attachment: HDFS-5704.002.patch editsStored Update the patch to fix TestFileAppendRestart. TestOfflineEditsViewer requires a new editsStored binary file to pass. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch, HDFS-5704.002.patch, editsStored Currently every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue. The total size of editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of a block, and not the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
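To make the quadratic blow-up concrete: a back-of-the-envelope sketch, assuming (purely for illustration) a fixed cost of B bytes per block entry in the editlog. OP_UPDATE_BLOCKS rewrites all i blocks on the i-th allocation, so the total is B*n*(n+1)/2 bytes for an n-block file, while an OP_ADD_BLOCK-style record that logs only the new block totals roughly B*n:
{code}
public class EditLogGrowth {
  public static void main(String[] args) {
    long bytesPerBlockEntry = 30;  // illustrative constant, not a measured value
    long n = 10_000;               // number of blocks in one file
    // OP_UPDATE_BLOCKS: the i-th allocation rewrites all i blocks so far.
    long updateBlocksTotal = bytesPerBlockEntry * n * (n + 1) / 2;
    // OP_ADD_BLOCK: each allocation logs only the newly added block.
    long addBlockTotal = bytesPerBlockEntry * n;
    System.out.println("OP_UPDATE_BLOCKS total bytes: " + updateBlocksTotal);
    System.out.println("OP_ADD_BLOCK total bytes:     " + addBlockTotal);
  }
}
{code}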
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864547#comment-13864547 ] Aaron T. Myers commented on HDFS-5138: -- bq. BTW, have you looked at HDFS-5535? Is there anything we can leverage from that, especially around the rollback marker in the editlog, etc.? Yes, I have looked at that. It's a good idea, but with this patch I was explicitly trying to _not_ redo the existing upgrade/rollback system, and instead just extend the non-HA upgrade/rollback system to work in an HA setup. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5710: Status: Patch Available (was: Open) FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ : {code} 2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496 2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already. 2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception. java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871) at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482) at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112) at java.lang.Thread.run(Thread.java:724) {code} Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864577#comment-13864577 ] Hadoop QA commented on HDFS-5724: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621814/HDFS-5724.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5836//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5836//console This message is automatically generated. modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5382) Implement the UI of browsing filesystems in HTML 5 page
[ https://issues.apache.org/jira/browse/HDFS-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864591#comment-13864591 ] Kihwal Lee commented on HDFS-5382: -- bq. Since the same HTTP server serves both WebHDFS and the web UI, it seems to me that the right fix is to allow WebHDFS to use the customized auth filters as well. This can be a bit complicated due to the limitation of HttpServer and WebHDFS compatibility. [~szetszwo]: Nicholas, what do you think about Haohui's proposal? Implement the UI of browsing filesystems in HTML 5 page --- Key: HDFS-5382 URL: https://issues.apache.org/jira/browse/HDFS-5382 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5382.000.patch, HDFS-5382.001.patch, HDFS-5382.002.patch, HDFS-5382.003.patch, browse-dir.png, file-info.png The UI of browsing filesystems can be implemented as an HTML 5 application. The UI can pull the data from WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
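For context on what "pull the data from WebHDFS" means in practice, the UI issues REST calls such as LISTSTATUS against the NameNode's HTTP server. A minimal Java client sketch (the host, port, and path below are placeholders, not values from this jira):
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsListStatus {
  public static void main(String[] args) throws Exception {
    // LISTSTATUS is a standard WebHDFS operation served on the NN web port.
    URL url = new URL("http://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON document containing FileStatus objects
      }
    }
  }
}
{code}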
[jira] [Updated] (HDFS-5612) NameNode: change all permission checks to enforce ACLs in addition to permissions.
[ https://issues.apache.org/jira/browse/HDFS-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5612: Attachment: HDFS-5612.2.patch I'm attaching patch version 2 with the following changes: # Rebased against current HDFS-4685 branch, which is up to date with trunk since yesterday. # Corrected comment about sorting on {{FSPermissionChecker#checkAcl}} in reaction to recent changes on the finalized HDFS-5673 patch. # Refactored several common methods to {{AclTestHelpers}}. NameNode: change all permission checks to enforce ACLs in addition to permissions. -- Key: HDFS-5612 URL: https://issues.apache.org/jira/browse/HDFS-5612 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5612.1.patch, HDFS-5612.2.patch All {{NameNode}} code paths that enforce permissions must be updated so that they also enforce ACLs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5715: Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review Arpit! I've committed this to trunk. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864695#comment-13864695 ] Hadoop QA commented on HDFS-5704: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621827/HDFS-5704.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5837//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5837//console This message is automatically generated. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch, HDFS-5704.002.patch, editsStored Currently every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue. The total size of editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of a block, and not the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5725) Remove compression support from FSImage
Haohui Mai created HDFS-5725: Summary: Remove compression support from FSImage Key: HDFS-5725 URL: https://issues.apache.org/jira/browse/HDFS-5725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai As proposed in HDFS-5722, this jira removes the support of compression in the FSImage format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5725) Remove compression support from FSImage
[ https://issues.apache.org/jira/browse/HDFS-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5725: - Attachment: HDFS-5725.000.patch Remove compression support from FSImage --- Key: HDFS-5725 URL: https://issues.apache.org/jira/browse/HDFS-5725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5725.000.patch As proposed in HDFS-5722, this jira removes the support of compression in the FSImage format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HDFS-5677: - Fix Version/s: 2.3.0 3.0.0 Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
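A sketch of the kind of cross-check being requested, using only the Hadoop Configuration API; the nameservice ID and the warning text are illustrative, and the real validation would presumably live in the DN/NN startup path rather than a standalone tool:
{code}
import org.apache.hadoop.conf.Configuration;

public class HaConfigCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    String nameservice = "myCluster"; // example nameservice from the description
    String ids = conf.get("dfs.ha.namenodes." + nameservice, "");
    for (String id : ids.split(",")) {
      String nn = id.trim();
      String rpcKey = "dfs.namenode.rpc-address." + nameservice + "." + nn;
      String serviceRpcKey = "dfs.namenode.servicerpc-address." + nameservice + "." + nn;
      // Warn when a declared namenode ID has no matching address property.
      if (conf.get(rpcKey) == null && conf.get(serviceRpcKey) == null) {
        System.err.println("WARN: namenode '" + nn + "' is declared in dfs.ha.namenodes."
            + nameservice + " but has no rpc-address or servicerpc-address configured");
      }
    }
  }
}
{code}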
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864735#comment-13864735 ] Colin Patrick McCabe commented on HDFS-5722: Can you be more specific about how on-disk compression might not fit well with the new design of the fsimage? As far as I know, the FSImage is always loaded in sequential order, from start to finish. Having optional protobuf fields doesn't change that fact. In general, it is not possible to skip forward by an arbitrary number of protobuf types, since you don't know in advance how big each one is. Sorry if there's part of the discussion I missed, but I don't see any discussion about making the FSImage seekable in HDFS-5698. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression; there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefits of reducing the number of bytes to be transferred across the wire since there is compression at the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
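To illustrate the proposal of compressing at the transport layer, here is a generic, hedged sketch (not the actual SNN/SBN image servlet): the on-disk bytes stay uncompressed, and gzip is applied on the wire only when the client advertises support for it:
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class TransportCompressionSketch {
  static void sendImage(InputStream fsimage, OutputStream rawOut,
                        String acceptEncodingHeader) throws IOException {
    // Compress on the wire only when the client sends "Accept-Encoding: gzip".
    OutputStream out = (acceptEncodingHeader != null
        && acceptEncodingHeader.contains("gzip"))
        ? new GZIPOutputStream(rawOut)
        : rawOut;
    byte[] buf = new byte[8192];
    int n;
    while ((n = fsimage.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    out.flush();
    if (out instanceof GZIPOutputStream) {
      ((GZIPOutputStream) out).finish(); // write the gzip trailer
    }
  }
}
{code}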
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5724: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed to trunk. thanks, Uma. modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864749#comment-13864749 ] Jing Zhao commented on HDFS-5579: - The latest patch looks good to me. The only comment is that the following comment from Vinay does not seem to have been addressed. +1 after fixing this. {quote} 4. + underReplicatedInOpenFiles++; This should be incremented only if enough replicas are not there. {quote} Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that sometimes decommissioning DataNodes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager:computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks which belong to under-construction files; however, in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is any block needing replication, no matter whether it belongs to an under-construction file or not, the decommission progress will continue running. That's the reason the decommission sometimes takes a very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
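Vinay's point, restated as a fragment: the counter should move only when live replicas actually fall short of the replication factor. The helper names below are hypothetical, introduced only for illustration, and are not the exact BlockManager fields:
{code}
int liveReplicas = getLiveReplicaCount(block);           // hypothetical helper
int expectedReplication = getExpectedReplication(block); // hypothetical helper

if (liveReplicas < expectedReplication) {
  underReplicatedInOpenFiles++; // increment only when replicas are insufficient
}
{code}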
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864768#comment-13864768 ] Karthik Kambatla commented on HDFS-5715: Looks like this breaks the build: mvn clean install -DskipTests fails after this patch. [~jingzhao] - can you look into it? Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864800#comment-13864800 ] Hudson commented on HDFS-5724: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4971 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4971/]) HDFS-5724. modifyCacheDirective logging audit log command wrongly as addCacheDirective (Uma Maheswara Rao G via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556386) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5649: - Attachment: HDFS-5649.002.patch Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864823#comment-13864823 ] Jing Zhao commented on HDFS-5715: - It works fine on my machine. What's the error msg in your build? Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HDFS-5721: Assignee: Ted Yu sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
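The natural shape of such a fix is to close the image on all exit paths. A hedged fragment mirroring the code quoted above (the attached patch may differ in detail):
{code}
FSImage sharedEditsImage = new FSImage(conf,
    Lists.<URI>newArrayList(), sharedEditsDirs);
try {
  // ... existing initializeSharedEdits() work that uses sharedEditsImage ...
} finally {
  sharedEditsImage.close(); // release the image's resources before returning
}
{code}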
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: hdfs-5721-v1.txt sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Status: Patch Available (was: Open) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864844#comment-13864844 ] Hadoop QA commented on HDFS-5649: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621874/HDFS-5649.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5839//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5839//console This message is automatically generated. Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864850#comment-13864850 ] Brandon Li commented on HDFS-5649: -- I've manually tested the patch and verified that the services were unregistered when the gateway was shut down. Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864853#comment-13864853 ] Karthik Kambatla commented on HDFS-5715: Interesting! maven - 3.0.3, jdk - 1.7.0_40 {noformat} [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[32,48] OutputFormat is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[33,48] XMLSerializer is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,4] OutputFormat is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,33] OutputFormat is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,4] XMLSerializer is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,35] XMLSerializer is internal proprietary API and may be removed in a future release [INFO] 7 errors {noformat} Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864856#comment-13864856 ] Karthik Kambatla commented on HDFS-5715: Just verified it builds fine with JDK6. To make sure it is this JIRA, I dropped the commit and it builds fine against JDK7 as well. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864860#comment-13864860 ] Jing Zhao commented on HDFS-5715: - I see. Let me check with JDK7. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864862#comment-13864862 ] Colin Patrick McCabe commented on HDFS-5715: This has broken the build for me as well. It's not your fault, it passed Jenkins and seems to be fine on JDK6. It seems like we need to start talking about JDK7 build slaves. In the meantime, can we get a fix or revert? The issue is this: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864870#comment-13864870 ] Brandon Li commented on HDFS-5649: -- Thank you, Jing, for the review. I've committed the patch. Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864883#comment-13864883 ] Hudson commented on HDFS-5649: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4972 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4972/]) HDFS-5649. Unregister NFS and Mount service when NFS gateway is shutting down. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556405) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/portmap/PortmapRequest.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5649: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.3.0 Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864894#comment-13864894 ] Jing Zhao commented on HDFS-5715: - Let me create a new jira to fix it. Thanks Karthik and Colin! Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5649: - Fix Version/s: 2.3.0 Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.3.0 Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
Jing Zhao created HDFS-5726: --- Summary: Fix compilation error in AbstractINodeDiff for JDK7 Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864898#comment-13864898 ] Jing Zhao commented on HDFS-5715: - Created HDFS-5726 for this. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5726: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5726: Attachment: HDFS-5726.000.patch I tried this patch on my local machine (Oracle JDK7 + OS X) and it works for me. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
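For readers wondering why this compiles on JDK6 but not JDK7: javac 7 is stricter than javac 6 about accessing a private member through a type variable, which matches the message above. A self-contained illustration with made-up names (the actual HDFS-5726 patch presumably switches to an accessor or relaxes the field's visibility; see the attached patch for the real change):
{code}
abstract class Diff<D extends Diff<D>> {
  private final int snapshotId;

  Diff(int snapshotId) { this.snapshotId = snapshotId; }

  final int getSnapshotId() { return snapshotId; }

  int compareSnapshotIds(D that) {
    // return this.snapshotId - that.snapshotId;   // JDK7: "snapshotId has private access in Diff"
    return this.snapshotId - that.getSnapshotId(); // compiles on both JDK6 and JDK7
  }
}
{code}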
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5726: - Priority: Minor (was: Major) Hadoop Flags: Reviewed +1 patch looks good. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864971#comment-13864971 ] Aaron T. Myers commented on HDFS-5138: -- Hi Suresh, hopefully the below answers to your questions clear things up. Please let me know if anything is unclear, or if you have any more questions. bq. Can you describe what is the difference between this vs older version of finalize? The command -finalize is fairly well known and this change will be backward compatible. I'm guessing you mean incompatible here. There's no this vs. older version of finalize. For as long as I can remember, we've always supported two ways of finalizing an upgrade: either by shutting down the running NN and then starting it again with the -finalize startup option, or by just running `hdfs dfsadmin -finalizeUpgrade' which makes an RPC to a running NN. The trouble with the startup option in an HA scenario is that an NN can't guarantee that it will be active at the time it starts, since determining who is active and who is standby is handled externally to the NN. I don't see any reason to prefer using the startup option even in a non-HA setup, so it seemed like we could remove it here. I could certainly just remove support for it in the HA case, if you'd prefer. bq. Sorry I am not sure I understand this. Why does HA rollback become more difficult? In the case of the '-upgrade' flag it's reasonable to only do the upgrade on transition to active, since we have to load the current fsimage/edits anyway before doing the upgrade, and the act of upgrading moves the transaction ID forward. In the case of '-rollback', however, it doesn't make much sense to start up in the standby state, load the full fsimage/edits, and then roll back everything, and reload the old fsimage upon becoming active. Given that the act of rolling back does not require loading the fsimage/edits at all, just moving some directories around, it seems to make sense to me that this should not be a mode but rather just a standalone command that runs and then exits. bq. Why is the lock file required? Why cannot NN just write an editlog about upgrade intent, including the new layout version? During rollback we can discount the editlog starting from the upgrade intent log. In fact we can also consider requiring users to save the namespace with empty editlogs? With this, perhaps we can avoid the following: This is again because an HA NN that is just starting up should not be writing to the shared log, but two HA NNs that are starting up need to synchronize/agree on the new CTime to use during upgrade. This needs to be known before doing the saveNamespace which is part of the upgrade process. You could imagine writing the new CTime to the edit log upon transitioning to the active state, but this would require the NNs to do the saveNamespace upon transitioning to active and/or when reading from the shared log as part of being the standby. It seems quite problematic to do the long, blocking operation of writing out a potentially large fsimage file in either of these places. bq. You mean finalize of the shared log in above? Yep, sure did. My bad. :) Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. 
Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving the DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not greatly increase the maintenance window, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864978#comment-13864978 ] Hadoop QA commented on HDFS-5721: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621877/hdfs-5721-v1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5840//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5840//console This message is automatically generated. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864993#comment-13864993 ] Ted Yu commented on HDFS-5721: -- I ran TestHASafeMode locally on Mac but didn't reproduce the test failure. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864999#comment-13864999 ] Arpit Agarwal commented on HDFS-5318: - Hi Eric, I took a look at pipeline recovery today. It looks like the following cases are of interest: # Block is finalized, r/w replica is lost, r/o replica is available. In this case the existing NN replication mechanisms will cause an extra replica to be created (q. what happens if a client attempts to append before the replication happens? The client probably needs to be fixed to handle this). # Block is RBW, r/w replica is lost, r/o replica is available. In the usual case DFSClientOutputStream will recover the write pipeline by selecting another DN, transferring the block contents to the new DN, and inserting it into the write pipeline. However pipeline recovery will not work when the single replica in the pipeline is lost, as you guys already mentioned on HDFS-5318. In that case I think you can use either the client-side setting or the block placement policy option being discussed there. Updating the suggested approach: # Each DataNode presents a different StorageID for the same physical storage. # Read-only replicas are not counted towards satisfying the replication factor. This assumes that read-only replicas are 'shared' (i.e. what you called using writability of a replica as a proxy for deducing whether or not that replica is shared). # Read-only replicas cannot be pruned (follows from (2)). # Client should be able to bootstrap a write pipeline with read-only replicas. # Read-only storages will not be used for block placement. I am not sure if there are any special conditions wrt lease recovery that also need to be considered. Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}}s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}}s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A)}} and {{(DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
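To make the proposed counting rule concrete, here is a minimal sketch of counting distinct {{StorageID}}s for a block. The helper below is hypothetical, not NameNode code; it simply illustrates the semantics described above.
{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the proposed counting rule: replicas on
// storages reporting the same StorageID are one physical copy.
class ReplicaCounting {
  static int physicalReplicaCount(Iterable<String> storageIdsForBlock) {
    Set<String> distinct = new HashSet<String>();
    for (String storageId : storageIdsForBlock) {
      distinct.add(storageId);  // duplicates collapse into one entry
    }
    return distinct.size();
  }
}
{code}
For the four-location example above ({{STORAGE_A}} twice, {{STORAGE_B}} twice), this returns 2 rather than 4.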
[jira] [Commented] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865006#comment-13865006 ] Hadoop QA commented on HDFS-5726: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621893/HDFS-5726.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5841//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5841//console This message is automatically generated. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5726.000.patch HDFS-5715 breaks JDK7 build for the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
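For readers hitting the same compiler behavior, here is a hedged reconstruction of the failure shape; the class below is illustrative, not the real AbstractINodeDiff. javac 7 rejects access to a private member through a reference whose static type is a type variable (something javac 6 incorrectly allowed), and the usual fix is to go through an accessor:
{code}
import java.util.List;

// Illustrative only: a self-referential generic like AbstractINodeDiff's.
abstract class Diff<D extends Diff<D>> {
  private final int snapshotId;
  Diff(int snapshotId) { this.snapshotId = snapshotId; }

  final int getSnapshotId() { return snapshotId; }

  int laterSnapshotId(List<D> diffs, int index) {
    D posterior = diffs.get(index + 1);
    // return posterior.snapshotId;   // JDK7: "snapshotId has private access"
    return posterior.getSnapshotId(); // compiles on both JDK6 and JDK7
  }
}
{code}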
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865017#comment-13865017 ] Junping Du commented on HDFS-5721: -- Thanks Ted for the patch! I think the TestHASafeMode failure is unrelated, as I have seen it fail intermittently. However, would you mind fixing other similar FSImage usages in this JIRA? I did a quick search and found the following issues: NameNode.java line 818: fsImage is created and formatted but not closed if any exceptions are thrown. FSNamesystem.java line 603: fsImage is created and loaded into the namesystem but not closed if anything goes wrong. BootstrapStandby.java line 192: image is created and initialized but not closed if exceptions are thrown. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
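As context for the close-on-all-paths pattern being requested here, a minimal sketch assuming the surrounding initializeSharedEdits-style code (conf, sharedEditsDirs, and a LOG field already in scope); this illustrates the try/finally idiom, not the exact patch:
{code}
FSImage sharedEditsImage = new FSImage(conf,
    Lists.<URI>newArrayList(), sharedEditsDirs);
try {
  // ... format / initialize the shared edits with the image ...
} finally {
  // Close on every path, including exceptions. IOUtils.cleanup logs and
  // swallows secondary close failures, so the original exception wins.
  IOUtils.cleanup(LOG, sharedEditsImage);
}
{code}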
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5726: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the review, Nicholas! I've committed this to trunk. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5726.000.patch HDFS-5715 breaks JDK7 build for the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865024#comment-13865024 ] Hudson commented on HDFS-5726: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4973 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4973/]) HDFS-5726. Fix compilation error in AbstractINodeDiff for JDK7. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556433) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5726.000.patch HDFS-5715 breaks JDK7 build for the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: hdfs-5721-v2.txt Patch v2 addresses Junping's comments. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865040#comment-13865040 ] Colin Patrick McCabe commented on HDFS-5722: bq. The design requires putting offset and length in the FSImage, and having compression inside the file makes things difficult. Therefore this jira proposes to move compression from FSImage to the higher-level application logic. I don't see why having compression makes things difficult. If the software wants to skip an N-byte section it doesn't understand, it just asks the {{CompressedStream}} to skip N bytes. The stream takes care of the details of translating that into byte offsets in the file. It may be more efficient to do this when compression is not enabled, but that is no reason to break the configurations of users who do have compression enabled now. I like the idea of implementing compression in the HTTP server code. But I don't see why we need to remove a feature that people are using, the on-disk FSImage compression feature. Possibly we should deprecate this feature, since HTTP compression is better for most use cases. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression; there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefit of reducing the number of bytes transferred across the wire, since there is compression at the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
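A minimal sketch of what the proposed transport-level compression could look like in the image-serving servlet, assuming standard javax.servlet and java.util.zip APIs and an fsImageInputStream already opened by the caller; this illustrates the proposal, not the actual patch:
{code}
// Hedged sketch: compress the fsimage on the wire instead of on disk.
// The on-disk file stays uncompressed (and seekable); only the HTTP
// response is gzipped, and only if the client advertises support.
String accept = request.getHeader("Accept-Encoding");
java.io.OutputStream out = response.getOutputStream();
if (accept != null && accept.contains("gzip")) {
  response.setHeader("Content-Encoding", "gzip");
  out = new java.util.zip.GZIPOutputStream(out);
}
IOUtils.copyBytes(fsImageInputStream, out, 4096, true); // closes both
{code}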
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: (was: hdfs-5721-v2.txt) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: hdfs-5721-v2.txt sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5727) introduce a self-maintain io queue handling mechanism
Liang Xie created HDFS-5727: --- Summary: introduce a self-maintain io queue handling mechanism Key: HDFS-5727 URL: https://issues.apache.org/jira/browse/HDFS-5727 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support IO priority or IO request reordering inside the datanode. I propose introducing a self-maintained IO queue mechanism to handle IO request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable despite the background thread throttling we have... I have not thought through the reordering fully yet, but reordering in an application-side queue would definitely beat the current reliance on the OS's IO queue merging. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5727) introduce a self-maintain io queue handling mechanism
[ https://issues.apache.org/jira/browse/HDFS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865046#comment-13865046 ] Liang Xie commented on HDFS-5727: - So far no design doc is available; I've just put the raw thought here as a placeholder. I hope to start working on it 3~4 weeks from now, since other higher-priority issues need to be done these days. introduce a self-maintain io queue handling mechanism - Key: HDFS-5727 URL: https://issues.apache.org/jira/browse/HDFS-5727 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support IO priority or IO request reordering inside the datanode. I propose introducing a self-maintained IO queue mechanism to handle IO request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable despite the background thread throttling we have... I have not thought through the reordering fully yet, but reordering in an application-side queue would definitely beat the current reliance on the OS's IO queue merging. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
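Absent a design doc, one way to picture the idea is a priority queue in front of the disk. The sketch below uses invented types (IoScheduler, IoRequest) and is only a rough illustration of prioritizing client IO over scanner IO, not anything from a patch:
{code}
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch: client-facing reads/writes outrank background
// block-scanner IO, so a scan burst cannot inflate p99 read latency.
class IoScheduler {
  enum Priority { CLIENT, REPLICATION, SCANNER }

  static class IoRequest implements Comparable<IoRequest> {
    final Priority priority;
    final Runnable work;
    IoRequest(Priority priority, Runnable work) {
      this.priority = priority;
      this.work = work;
    }
    public int compareTo(IoRequest other) {
      return priority.compareTo(other.priority); // CLIENT dequeued first
    }
  }

  private final PriorityBlockingQueue<IoRequest> queue =
      new PriorityBlockingQueue<IoRequest>();

  void submit(IoRequest request) { queue.put(request); }

  void dispatchLoop() throws InterruptedException {
    while (true) {
      queue.take().work.run(); // always runs the highest-priority request
    }
  }
}
{code}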
[jira] [Created] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
Vinay created HDFS-5728: --- Summary: [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865059#comment-13865059 ] Vinay commented on HDFS-5728: - 2013-12-28 13:22:30,467 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to updateBlock (newblock=BP-720706819-x-1389113739092:blk_5575900364052391670_517444, datanode=tmm-e8:11242) java.io.IOException: File length mismatched. The length of /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670 is 62484480 but r=ReplicaUnderRecovery, blk_5575900364052391670_320295, RUR getNumBytes() = 62455808 getBytesOnDisk() = 62455808 getVisibleLength()= -1 getVolume() = /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current getBlockFile()= /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670 recoveryId=517444 original=ReplicaWaitingToBeRecovered, blk_5575900364052391670_320295, RWR getNumBytes() = 62455808 getBytesOnDisk() = 62455808 getVisibleLength()= -1 getVolume() = /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current getBlockFile()= /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670 unlinked=false at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkReplicaFiles(FsDatasetImpl.java:1063) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.updateReplicaUnderRecovery(FsDatasetImpl.java:1541) at org.apache.hadoop.hdfs.server.datanode.DataNode.updateReplicaUnderRecovery(DataNode.java:1907) at org.apache.hadoop.hdfs.server.datanode.DataNode$BlockRecord.updateReplicaUnderRecovery(DataNode.java:1938) at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:2090) at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1988) at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:225) at org.apache.hadoop.hdfs.server.datanode.DataNode$2.run(DataNode.java:1869) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
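For reference, a hedged sketch (illustrative names, not the exact FsDatasetImpl code) of how a replica's length is derived from the meta file, which reproduces the mismatch in the log above assuming the default 512 bytes per checksum, 4-byte CRC32 entries, and a 7-byte meta header:
{code}
// Illustration of the length calculation driving the mismatch: the
// recovered length comes from the number of CRC entries in the meta
// file, not from the size of the block file on disk.
class MetaLength {
  static long lengthFromMetaFile(long metaFileLen, int bytesPerChecksum,
      int checksumSize, int headerLen) {
    long numChunks = (metaFileLen - headerLen) / checksumSize;
    return numChunks * bytesPerChecksum;
  }
}
{code}
With those defaults, a meta file holding CRCs for 121984 chunks yields 121984 * 512 = 62455808 bytes, even though 62484480 bytes (122040 chunks) of block data exist on disk, so every recovery attempt recomputes the same mismatch.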
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865062#comment-13865062 ] Ming Ma commented on HDFS-5585: --- Interesting. Should the client ask the DN to do the upgrade via ClientDatanodeProtocol, or should the client ask the NN to do the upgrade via the refreshNodes approach, with the NN then asking the DNs? The nice thing about going through the NN is that the NN has the state and is able to decide the order in which DNs are restarted, to minimize the impact on write and read operations. Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Several new methods may need to be added to ClientDatanodeProtocol to support querying the version, initiating an upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
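To make the discussion concrete, a purely hypothetical sketch of the shape such methods might take; the interface name and signatures below are invented for illustration and are not the actual ClientDatanodeProtocol additions that would be committed:
{code}
import java.io.IOException;

// Hypothetical sketch only: the kind of query/initiate methods the
// description alludes to, whichever protocol ends up carrying them.
interface DatanodeUpgradeProtocol {
  /** Query the software version the DN is currently running. */
  String getSoftwareVersion() throws IOException;

  /** Ask the DN to prepare for an upgrade or graceful restart. */
  void startUpgrade() throws IOException;
}
{code}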
[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5728: Status: Patch Available (was: Open) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5728: Attachment: HDFS-5728.patch Attached the patch; please review. [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865125#comment-13865125 ] Hadoop QA commented on HDFS-5721: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621915/hdfs-5721-v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5842//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5842//console This message is automatically generated. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4273) Fix some issue in DFSInputstream
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-4273: Attachment: HDFS-4273.v8.patch Updated the patch to remove the changes related to expiring local deadNodes. Will create another jira to address that. Fix some issue in DFSInputstream Key: HDFS-4273 URL: https://issues.apache.org/jira/browse/HDFS-4273 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, HDFS-4273.v7.patch, HDFS-4273.v8.patch, TestDFSInputStream.java The following issues in DFSInputStream are addressed in this jira: 1. read may not retry enough in some cases, causing early failure. Assume the following call logic {noformat} readWithStrategy() - blockSeekTo() - readBuffer() - reader.doRead() - seekToNewSource() add currentNode to deadnode, wish to get a different datanode - blockSeekTo() - chooseDataNode() - block missing, clear deadNodes and pick the currentNode again seekToNewSource() return false readBuffer() re-throw the exception quit loop readWithStrategy() got the exception, and may fail the read call before tried MaxBlockAcquireFailures. {noformat} 2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race condition: it is cleared to 0 while it is still being used by another thread, so it is possible that some read thread may never quit. Changing failures to a local variable solves this issue. 3. If the local datanode is added to deadNodes, it will not be removed from deadNodes when the DN comes back alive. We need a way to remove the local datanode from deadNodes when it becomes live again. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
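A minimal sketch of the fix described in issue 2, keeping the failure counter local to each read call; this illustrates the idiom, not the actual DFSInputStream code (doReadOnce is a hypothetical stand-in for one read attempt):
{code}
import java.io.IOException;

// Sketch: the counter is local to each call rather than a shared field,
// so concurrent readers cannot reset each other's retry budget.
abstract class RetryingRead {
  abstract int doReadOnce() throws IOException; // one read attempt

  int readWithRetries(int maxBlockAcquireFailures) throws IOException {
    int failures = 0; // per-call counter, not a shared this.failures
    while (true) {
      try {
        return doReadOnce();
      } catch (IOException e) {
        if (++failures >= maxBlockAcquireFailures) {
          throw e; // give up only after this call's own retries run out
        }
        // otherwise: refetch block locations and retry
      }
    }
  }
}
{code}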
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865141#comment-13865141 ] Ming Ma commented on HDFS-5535: --- Nice work. Some comments: 1. HDFS configuration update is another scenario; it could differ from a code upgrade in terms of the design. For example, if we can support dynamic config reload on the DN to handle certain config changes, no DN restart would be required. 2. The write pipeline pause-and-resume approach is interesting, as the NN isn't involved. One scenario similar to DN rolling upgrade is a top-of-rack switch upgrade that takes 30 minutes. During those 30 minutes, we don't want the NN to consider the DNs dead and trigger replication. For this specific scenario, the write pipeline pause-and-resume approach might not be enough. Umbrella jira for improved HDFS rolling upgrades Key: HDFS-5535 URL: https://issues.apache.org/jira/browse/HDFS-5535 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, ha, hdfs-client, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Nathan Roberts Attachments: HDFSRollingUpgradesHighLevelDesign.pdf In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial high-level design document will be attached to this jira, and sub-jiras will itemize the individual tasks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865142#comment-13865142 ] Junping Du commented on HDFS-5721: -- Thanks Ted for the patch! The v2 patch looks good overall; the only issue is that we should replace *System.out* below with LOG. {code} + System.out.println("Encountered exception during format: " + ioe); {code} +1 once this issue is addressed. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
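A one-line sketch of the suggested change, assuming the class already has a commons-logging Log field named LOG, as elsewhere in the NameNode code:
{code}
// Hedged sketch: log the failure (with the stack trace) instead of
// printing to stdout.
LOG.warn("Encountered exception during format", ioe);
{code}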
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865158#comment-13865158 ] Hadoop QA commented on HDFS-5728: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621917/HDFS-5728.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5843//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5843//console This message is automatically generated. [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)