[jira] [Commented] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569503#comment-13569503 ]

Hudson commented on HDFS-4456:
------------------------------

Integrated in Hadoop-Yarn-trunk #115 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/115/])
HDFS-4456. Add concat to HttpFS and WebHDFS REST API docs. (plamenj2003 via tucu) (Revision 1441603)

Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441603
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ConcatSourcesParam.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm

Add concat to HttpFS and WebHDFS REST API docs
----------------------------------------------

Key: HDFS-4456
URL: https://issues.apache.org/jira/browse/HDFS-4456
Project: Hadoop HDFS
Issue Type: New Feature
Components: webhdfs
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Plamen Jeliazkov
Fix For: 2.0.3-alpha
Attachments: HDFS-3598.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, HDFS-4456.trunk.patch

HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated accordingly.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
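The concat operation documented by this issue is invoked over WebHDFS as an HTTP POST with an `op=CONCAT` query parameter and a comma-separated `sources` list (the parameter name corresponds to the ConcatSourcesParam class in the file list above). A minimal sketch of building such a request URL; the namenode host, port, and paths are hypothetical:

```java
// Sketch of building a WebHDFS CONCAT request URL:
//   POST /webhdfs/v1/<target>?op=CONCAT&sources=<comma-separated paths>
// Host, port, and paths below are illustrative assumptions.
public class WebHdfsConcatUrl {

    /** Builds the CONCAT URL; target and sources are absolute HDFS paths. */
    static String concatUrl(String host, int port, String target, String... sources) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + target
                + "?op=CONCAT&sources=" + String.join(",", sources);
    }

    public static void main(String[] args) {
        // The request itself would be sent as an HTTP POST to this URL.
        System.out.println(concatUrl("namenode.example.com", 50070,
                "/user/x/target", "/user/x/part1", "/user/x/part2"));
    }
}
```

The source files are appended to the target in the order given, so the order of the `sources` list matters.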
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569505#comment-13569505 ]

Hudson commented on HDFS-4452:
------------------------------

Integrated in Hadoop-Yarn-trunk #115 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/115/])
HDFS-4452. getAdditionalBlock() can create multiple blocks if the client times out and retries. Contributed by Konstantin Shvachko. (Revision 1441681)

Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441681
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java

getAdditionalBlock() can create multiple blocks if the client times out and retries.
------------------------------------------------------------------------------------

Key: HDFS-4452
URL: https://issues.apache.org/jira/browse/HDFS-4452
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.0.2-alpha
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Priority: Critical
Fix For: 2.0.3-alpha
Attachments: getAdditionalBlock-branch2.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java

The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can time out and will reissue the same request. The two requests race with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in creating two new blocks on the NameNode while the client knows of only one of them. This eventually results in {{NotReplicatedYetException}} because the extra block is never reported by any DataNode, which stalls file creation and puts the file in an invalid state with an empty block in the middle.
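The race described above can be made concrete with a toy model (this is a sketch of the idempotent-retry idea, not the actual FSNamesystem code): a retried addBlock() is recognized by comparing the last block the client reports against the file's penultimate block, and the block the first call already allocated is returned instead of allocating a duplicate.

```java
// Toy model of an idempotent addBlock(): a retry that arrives after the first
// call already allocated a block gets that same block back, so no second
// (never-reported) block is created. Not the real NameNode code.
import java.util.ArrayList;
import java.util.List;

public class AddBlockRetrySketch {
    final List<Long> blocks = new ArrayList<>(); // block IDs of the file, in order
    private long nextId = 1;

    /** previous = the last block ID the client knows about, or -1 for none. */
    synchronized long addBlock(long previous) {
        int n = blocks.size();
        // Retry case: the client's "last block" matches our penultimate block,
        // so the final block was already created by the earlier call.
        boolean isRetry = n >= 1
                && (n == 1 ? previous == -1 : blocks.get(n - 2) == previous);
        if (isRetry) {
            return blocks.get(n - 1); // hand back the block the first call made
        }
        long id = nextId++;
        blocks.add(id);
        return id;
    }

    public static void main(String[] args) {
        AddBlockRetrySketch file = new AddBlockRetrySketch();
        long b1 = file.addBlock(-1);
        long again = file.addBlock(-1);  // client retry after a timeout
        System.out.println(b1 == again); // same block, no duplicate allocated
    }
}
```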
[jira] [Commented] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync followed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569518#comment-13569518 ]

Hudson commented on HDFS-3119:
------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #513 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/513/])
merge -r 1311379:1311380 Merging from trunk to branch-0.23 to fix HDFS-3119 (Revision 1441656)

Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441656
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestOverReplicatedBlocks.java

Overreplicated block is not deleted even after the replication factor is reduced after sync followed by closing that file
-------------------------------------------------------------------------------------------------------------------------

Key: HDFS-3119
URL: https://issues.apache.org/jira/browse/HDFS-3119
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 0.24.0
Reporter: J.Andreina
Assignee: Ashish Singhi
Priority: Minor
Labels: patch
Fix For: 0.24.0, 2.0.0-alpha, 0.23.7
Attachments: HDFS-3119-1.patch, HDFS-3119-1.patch, HDFS-3119.patch

Cluster setup: 1 NN, 2 DN, replication factor 2, block report interval 3 sec, block size 256 MB

step1: write a file filewrite.txt of size 90 bytes with sync (not closed)
step2: change the replication factor to 1 using the command: ./hdfs dfs -setrep 1 /filewrite.txt
step3: close the file

* At the NN side the log "Decreasing replication from 2 to 1 for /filewrite.txt" has occurred, but the overreplicated blocks are not deleted even after the block report is sent from the DN
* While listing the file in the console using ./hdfs dfs -ls, the replication factor for that file is shown as 1
* The fsck report for that file displays that the file is replicated to 2 datanodes
[jira] [Commented] (HDFS-4444) Add space between total transaction time and number of transactions in FSEditLog#printStatistics
[ https://issues.apache.org/jira/browse/HDFS-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569528#comment-13569528 ]

Hudson commented on HDFS-4444:
------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #513 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/513/])
HDFS-4444. Add space between total transaction time and number of transactions in FSEditLog#printStatistics. (Stephen Chu via tgraves) (Revision 1441652)

Result = SUCCESS
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441652
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java

Add space between total transaction time and number of transactions in FSEditLog#printStatistics
------------------------------------------------------------------------------------------------

Key: HDFS-4444
URL: https://issues.apache.org/jira/browse/HDFS-4444
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Trivial
Fix For: 1.2.0, 2.0.3-alpha, 0.23.7
Attachments: HDFS-4444.patch.001, HDFS-4444.patch.branch-1

Currently, when we log statistics, we see something like
{code}
13/01/25 23:16:59 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
{code}
Notice how the total transaction time value and the "Number of transactions batched in Syncs" label need a space to separate them.

FSEditLog#printStatistics:
{code}
private void printStatistics(boolean force) {
  long now = now();
  if (lastPrintTime + 60000 > now && !force) {
    return;
  }
  lastPrintTime = now;
  StringBuilder buf = new StringBuilder();
  buf.append("Number of transactions: ");
  buf.append(numTransactions);
  buf.append(" Total time for transactions(ms): ");
  buf.append(totalTimeTransactions);
  buf.append("Number of transactions batched in Syncs: ");
  buf.append(numTransactionsBatchedInSync);
  buf.append(" Number of syncs: ");
  buf.append(editLogStream.getNumSync());
  buf.append(" SyncTimes(ms): ");
  buf.append(journalSet.getSyncTimes());
  LOG.info(buf);
}
{code}
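The fix the issue asks for amounts to one leading space. A minimal self-contained sketch (a hypothetical helper, not the actual FSEditLog patch) showing the corrected concatenation:

```java
// Sketch of the spacing fix: a leading space on the label keeps the previous
// value and the next label from running together as "0Number of...".
public class PrintStatisticsSpacing {

    static String statsFragment(long totalTimeMs, long batchedInSync) {
        StringBuilder buf = new StringBuilder();
        buf.append("Total time for transactions(ms): ").append(totalTimeMs);
        // Leading space added here; without it the log reads "...(ms): 0Number of..."
        buf.append(" Number of transactions batched in Syncs: ").append(batchedInSync);
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(statsFragment(0, 0));
    }
}
```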
[jira] [Commented] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks
[ https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569529#comment-13569529 ]

Hudson commented on HDFS-2476:
------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #513 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/513/])
merge -r 1201990:1201991 Merging from trunk to branch-0.23 to fix HDFS-2476 (Revision 1441463)

Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441463
Files :
* /hadoop/common/branches/branch-0.23
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightLinkedSet.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightHashSet.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightLinkedSet.java

More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks
----------------------------------------------------------------------------------------

Key: HDFS-2476
URL: https://issues.apache.org/jira/browse/HDFS-2476
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Affects Versions: 0.23.0
Reporter: Tomasz Nykiel
Assignee: Tomasz Nykiel
Fix For: 2.0.0-alpha, 0.23.7
Attachments: hashStructures.patch, hashStructures.patch-2, hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, hashStructures.patch-6, hashStructures.patch-7, hashStructures.patch-8, hashStructures.patch-9

This patch introduces two hash data structures for storing under-replicated, over-replicated and invalidated blocks:
1. LightWeightHashSet
2. LightWeightLinkedSet

Currently in all these cases we are using java.util.TreeSet, which adds unnecessary overhead. The main bottlenecks addressed by this patch are:
- cluster instability times, when these queues (especially under-replicated) tend to grow quite drastically,
- initial cluster startup, when the queues are initialized after leaving safemode,
- block reports,
- explicit acks for block addition and deletion.

1. The introduced structures are CPU-optimized.
2. They shrink and expand according to current capacity.
3. Add/contains/delete ops are performed in O(1) time (unlike the current O(log n) for TreeSet).
4. The sets are equipped with fast access methods for polling a number of elements (get+remove), which are used for handling the queues.
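The poll-style access pattern in point 4 can be illustrated with a LinkedHashSet stand-in; the real implementations are LightWeightHashSet and LightWeightLinkedSet in org.apache.hadoop.hdfs.util, and this is only a toy sketch of "get + remove" polling:

```java
// Toy illustration of poll-N access: return up to n elements and remove them
// from the set in a single pass, the pattern the queues use for draining work.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PollNSketch {

    /** Returns up to n elements and removes them from the set in one pass. */
    static <T> List<T> pollN(Set<T> set, int n) {
        List<T> out = new ArrayList<>(n);
        Iterator<T> it = set.iterator();
        while (it.hasNext() && out.size() < n) {
            out.add(it.next());
            it.remove(); // poll = get and delete together
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> blocks = new LinkedHashSet<>(List.of(1, 2, 3, 4, 5));
        System.out.println(pollN(blocks, 3)); // [1, 2, 3]
        System.out.println(blocks);           // [4, 5]
    }
}
```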
[jira] [Commented] (HDFS-1765) Block Replication should respect under-replication block priority
[ https://issues.apache.org/jira/browse/HDFS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569530#comment-13569530 ]

Hudson commented on HDFS-1765:
------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #513 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/513/])
merge -r 1213536:1213537 Merging from trunk to branch-0.23 to fix HDFS-1765 (Revision 1441577)

Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441577
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java

Block Replication should respect under-replication block priority
-----------------------------------------------------------------

Key: HDFS-1765
URL: https://issues.apache.org/jira/browse/HDFS-1765
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Uma Maheswara Rao G
Fix For: 2.0.0-alpha, 0.23.7
Attachments: HDFS-1765.patch, HDFS-1765.patch, HDFS-1765.patch, HDFS-1765.patch, HDFS-1765.pdf, underReplicatedQueue.pdf
Time Spent: 0.5h
Remaining Estimate: 0h

Currently under-replicated blocks are assigned different priorities depending on how many replicas a block has. However, the replication monitor works on blocks in a round-robin fashion, so newly added high-priority blocks won't get replicated until all low-priority blocks are done. One example: on the decommissioning datanode WebUI we often observe that blocks with only decommissioning replicas do not get scheduled to replicate before other blocks, risking data availability if the node is shut down for repair before decommission completes.
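The scheduling idea behind the fix can be sketched with plain queues (a toy model, not the actual UnderReplicatedBlocks code): take the next block from the highest-priority non-empty queue, instead of visiting the priority queues round-robin.

```java
// Toy sketch: replication work is drawn from the highest-priority non-empty
// queue first, so blocks at risk (e.g. only replica on a decommissioning
// node) are scheduled before well-replicated-but-short blocks.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ReplicationPrioritySketch {

    // queues.get(0) is the highest priority; the last queue is the lowest.
    static Long nextBlockToReplicate(List<Deque<Long>> queues) {
        for (Deque<Long> q : queues) {
            if (!q.isEmpty()) {
                return q.poll();
            }
        }
        return null; // nothing pending
    }

    public static void main(String[] args) {
        List<Deque<Long>> queues = new ArrayList<>();
        for (int i = 0; i < 3; i++) queues.add(new ArrayDeque<>());
        queues.get(2).add(7L); // low-priority block enqueued first
        queues.get(0).add(1L); // high-priority block arrives later
        System.out.println(nextBlockToReplicate(queues)); // 1: priority wins over arrival order
    }
}
```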
[jira] [Commented] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569538#comment-13569538 ]

Hudson commented on HDFS-4456:
------------------------------

Integrated in Hadoop-Hdfs-trunk #1304 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1304/])
HDFS-4456. Add concat to HttpFS and WebHDFS REST API docs. (plamenj2003 via tucu) (Revision 1441603)

Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441603
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ConcatSourcesParam.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm

Add concat to HttpFS and WebHDFS REST API docs
----------------------------------------------

Key: HDFS-4456
URL: https://issues.apache.org/jira/browse/HDFS-4456
Project: Hadoop HDFS
Issue Type: New Feature
Components: webhdfs
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Plamen Jeliazkov
Fix For: 2.0.3-alpha
Attachments: HDFS-3598.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, HDFS-4456.trunk.patch

HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated accordingly.
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569540#comment-13569540 ]

Hudson commented on HDFS-4452:
------------------------------

Integrated in Hadoop-Hdfs-trunk #1304 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1304/])
HDFS-4452. getAdditionalBlock() can create multiple blocks if the client times out and retries. Contributed by Konstantin Shvachko. (Revision 1441681)

Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441681
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java

getAdditionalBlock() can create multiple blocks if the client times out and retries.
------------------------------------------------------------------------------------

Key: HDFS-4452
URL: https://issues.apache.org/jira/browse/HDFS-4452
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.0.2-alpha
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Priority: Critical
Fix For: 2.0.3-alpha
Attachments: getAdditionalBlock-branch2.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java

The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can time out and will reissue the same request. The two requests race with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in creating two new blocks on the NameNode while the client knows of only one of them. This eventually results in {{NotReplicatedYetException}} because the extra block is never reported by any DataNode, which stalls file creation and puts the file in an invalid state with an empty block in the middle.
[jira] [Commented] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
[ https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569546#comment-13569546 ]

Hudson commented on HDFS-4464:
------------------------------

Integrated in Hadoop-Hdfs-Snapshots-Branch-build #89 (See [https://builds.apache.org/job/Hadoop-Hdfs-Snapshots-Branch-build/89/])
HDFS-4464. Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot and rename it to destroySubtreeAndCollectBlocks. (Revision 1441680)

Result = FAILURE
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441680
Files :
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-2802.txt
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeSymlink.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiffList.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectorySnapshottable.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectoryWithSnapshot.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/INodeFileUnderConstructionWithSnapshot.java
* /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/INodeFileWithSnapshot.java

Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
----------------------------------------------------------------

Key: HDFS-4464
URL: https://issues.apache.org/jira/browse/HDFS-4464
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Fix For: Snapshot (HDFS-2802)
Attachments: h4464_20120201b.patch, h4464_20120201.patch

Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive methods for deleting inodes and collecting blocks for further block deletion/update.
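The combined recursion the issue describes can be sketched with a toy inode tree (hypothetical classes, not the real INode hierarchy): a single pass both destroys the subtree and collects the blocks of its files for later deletion/update.

```java
// Toy sketch of destroySubtreeAndCollectBlocks: post-order recursion that
// unlinks inodes and gathers their blocks in one traversal, instead of two
// separate recursive walks.
import java.util.ArrayList;
import java.util.List;

public class DestroySubtreeSketch {

    static class INode {
        final List<INode> children = new ArrayList<>();
        final List<Long> blocks = new ArrayList<>();
    }

    static void destroySubtreeAndCollectBlocks(INode node, List<Long> collected) {
        for (INode child : node.children) {
            destroySubtreeAndCollectBlocks(child, collected);
        }
        collected.addAll(node.blocks); // gather this inode's blocks
        node.children.clear();         // unlink the subtree
        node.blocks.clear();
    }

    public static void main(String[] args) {
        INode root = new INode();
        INode file = new INode();
        file.blocks.add(20L);
        root.children.add(file);
        root.blocks.add(10L);
        List<Long> collected = new ArrayList<>();
        destroySubtreeAndCollectBlocks(root, collected);
        System.out.println(collected); // [20, 10]
    }
}
```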
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569557#comment-13569557 ]

Hudson commented on HDFS-4452:
------------------------------

Integrated in Hadoop-Mapreduce-trunk #1332 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1332/])
HDFS-4452. getAdditionalBlock() can create multiple blocks if the client times out and retries. Contributed by Konstantin Shvachko. (Revision 1441681)

Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441681
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java

getAdditionalBlock() can create multiple blocks if the client times out and retries.
------------------------------------------------------------------------------------

Key: HDFS-4452
URL: https://issues.apache.org/jira/browse/HDFS-4452
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.0.2-alpha
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Priority: Critical
Fix For: 2.0.3-alpha
Attachments: getAdditionalBlock-branch2.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java

The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can time out and will reissue the same request. The two requests race with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in creating two new blocks on the NameNode while the client knows of only one of them. This eventually results in {{NotReplicatedYetException}} because the extra block is never reported by any DataNode, which stalls file creation and puts the file in an invalid state with an empty block in the middle.
[jira] [Commented] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569555#comment-13569555 ]

Hudson commented on HDFS-4456:
------------------------------

Integrated in Hadoop-Mapreduce-trunk #1332 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1332/])
HDFS-4456. Add concat to HttpFS and WebHDFS REST API docs. (plamenj2003 via tucu) (Revision 1441603)

Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441603
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ConcatSourcesParam.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm

Add concat to HttpFS and WebHDFS REST API docs
----------------------------------------------

Key: HDFS-4456
URL: https://issues.apache.org/jira/browse/HDFS-4456
Project: Hadoop HDFS
Issue Type: New Feature
Components: webhdfs
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Plamen Jeliazkov
Fix For: 2.0.3-alpha
Attachments: HDFS-3598.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, HDFS-4456.trunk.patch

HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated accordingly.
[jira] [Commented] (HDFS-4197) SNN JMX is lacking checkpoint info
[ https://issues.apache.org/jira/browse/HDFS-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569561#comment-13569561 ]

Daisuke Kobayashi commented on HDFS-4197:
-----------------------------------------

Just found HDFS-3409; this is a dup of it. Does it need to expose the info on the 2NN?

SNN JMX is lacking checkpoint info
----------------------------------

Key: HDFS-4197
URL: https://issues.apache.org/jira/browse/HDFS-4197
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.0.2-alpha
Reporter: Andy Isaacson
Assignee: Daisuke Kobayashi
Labels: newbie

The SecondaryNameNode status.jsp page contains the following:
{noformat}
SecondaryNameNode Status
Name Node Address    : snn1/172.29.122.91:50020
Start Time           : Fri Nov 09 09:25:29 PST 2012
Last Checkpoint Time : Thu Nov 15 15:35:06 PST 2012
Checkpoint Period    : 30 seconds
Checkpoint Size      : 39.06 KB (= 40000 bytes)
Checkpoint Dirs      : [file:///tmp/hdfs-adi/dfs/namesecondary]
Checkpoint Edits Dirs: [file:///tmp/hdfs-adi/dfs/namesecondary]
{noformat}
The JMX page at {{:50090/jmx}} should also provide this info, at the very least the Last Checkpoint Time and Checkpoint Size, so that users are not tempted to scrape the {{status.jsp}} output. Perhaps I'm missing it, but these data seem to be missing.
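One way the checkpoint info could be surfaced for the {{/jmx}} page is as attributes of a registered MXBean. The bean name, attribute names, and values below are hypothetical sketches (the sample size matches the 39.06 KB figure above), not the actual SecondaryNameNode implementation:

```java
// Sketch: exposing checkpoint info as JMX attributes via a platform MXBean.
// Names and values are illustrative assumptions, not Hadoop's real beans.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CheckpointInfoJmxSketch {

    public interface CheckpointInfoMXBean {
        long getLastCheckpointTime();
        long getCheckpointSize();
    }

    public static class CheckpointInfo implements CheckpointInfoMXBean {
        public long getLastCheckpointTime() { return 1353022506000L; } // sample epoch millis
        public long getCheckpointSize() { return 40000L; }             // sample: 39.06 KB
    }

    /** Registers the bean (if absent) and reads one attribute back via JMX. */
    public static Object readAttribute(String attr) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name =
                new ObjectName("Hadoop:service=SecondaryNameNode,name=CheckpointInfo");
        if (!server.isRegistered(name)) {
            server.registerMBean(new CheckpointInfo(), name);
        }
        return server.getAttribute(name, attr);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("CheckpointSize = " + readAttribute("CheckpointSize"));
    }
}
```

Attributes registered this way are what an HTTP JMX servlet like {{/jmx}} serializes to JSON, which is why exposing them beats scraping status.jsp.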
[jira] [Commented] (HDFS-4414) Create a DiffReport class to represent the diff between snapshots to end users
[ https://issues.apache.org/jira/browse/HDFS-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569634#comment-13569634 ]

Suresh Srinivas commented on HDFS-4414:
---------------------------------------

Comments:
# Remove the unnecessary import change in DFSClient.java
# In DFSClient.java and DistributedFileSystem.java, point to the documentation in the ClientProtocol#getSnapshotDiffReport() method instead of repeating the same javadoc.
# ClientProtocol.java - The methods getSnapshotDiffReport, allowSnapshot, and disallowSnapshot should document the specific exceptions thrown by each method and the conditions in which they are thrown. This is unrelated to the change in this patch and can be done as a separate jira.
# The SnapshotDiffReport class should be in the o.a.h.protocol package.
# INodeDirectoryWithSnapshot - javadoc typo fromEarlierSnapshot - fromEarlier
# SnapshotDiffReport.java - lineSepearator could be a static final variable
# We should create another jira to convert the current implementation into an iterative report. Otherwise, dealing with a large set of changes in a single response will result in issues. This should work similar to the iterative ls operation.

+1 with these changes.

Create a DiffReport class to represent the diff between snapshots to end users
------------------------------------------------------------------------------

Key: HDFS-4414
URL: https://issues.apache.org/jira/browse/HDFS-4414
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-4414.001.patch, HDFS-4414.003.patch, HDFS-4414.004.patch, HDFS-4414+4131.002.patch

HDFS-4131 computes the difference between two snapshots (or between a snapshot and the current tree). In this jira we create a DiffReport class to represent the diff to end users.
[jira] [Updated] (HDFS-4414) Add support for getting snapshot diff from DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-4414: -- Summary: Add support for getting snapshot diff from DistributedFileSystem (was: Create a DiffReport class to represent the diff between snapshots to end users) Add support for getting snapshot diff from DistributedFileSystem Key: HDFS-4414 URL: https://issues.apache.org/jira/browse/HDFS-4414 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4414.001.patch, HDFS-4414.003.patch, HDFS-4414.004.patch, HDFS-4414+4131.002.patch HDFS-4131 computes the difference between two snapshots (or between a snapshot and the current tree). In this jira we create a DiffReport class to represent the diff to end users.
[jira] [Updated] (HDFS-4414) Add support for getting snapshot diff from DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4414: Attachment: HDFS-4414.005.patch Thanks for the comments, Suresh! Uploading the new patch. Will create jiras to address 3 and 7. Add support for getting snapshot diff from DistributedFileSystem Key: HDFS-4414 URL: https://issues.apache.org/jira/browse/HDFS-4414 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4414.001.patch, HDFS-4414.003.patch, HDFS-4414.004.patch, HDFS-4414.005.patch, HDFS-4414+4131.002.patch HDFS-4131 computes the difference between two snapshots (or between a snapshot and the current tree). In this jira we create a DiffReport class to represent the diff to end users.
[jira] [Resolved] (HDFS-4414) Add support for getting snapshot diff from DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-4414. --- Resolution: Fixed Fix Version/s: Snapshot (HDFS-2802) Hadoop Flags: Reviewed Committed the patch to the HDFS-2802 branch. Thank you Jing! Add support for getting snapshot diff from DistributedFileSystem Key: HDFS-4414 URL: https://issues.apache.org/jira/browse/HDFS-4414 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: Snapshot (HDFS-2802) Attachments: HDFS-4414.001.patch, HDFS-4414.003.patch, HDFS-4414.004.patch, HDFS-4414.005.patch, HDFS-4414+4131.002.patch HDFS-4131 computes the difference between two snapshots (or between a snapshot and the current tree). In this jira we create a DiffReport class to represent the diff to end users.
[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent
[ https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569653#comment-13569653 ] Suresh Srinivas commented on HDFS-4350: --- +1 for the branch-1 patch as well. Make enabling of stale marking on read and write paths independent -- Key: HDFS-4350 URL: https://issues.apache.org/jira/browse/HDFS-4350 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, hdfs-4350-4.patch, hdfs-4350-5.patch, hdfs-4350-6.patch, hdfs-4350-7.patch, hdfs-4350-branch-1-1.patch, hdfs-4350-branch-1-2.patch, hdfs-4350-branch-1-3.patch, hdfs-4350.txt Marking of datanodes as stale for the read and write path was introduced in HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently exists a dependency: you cannot enable write marking without also enabling read marking, since the first key enables both checking of staleness and read marking. I propose renaming the first key to {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and making checking enabled if either of the keys is set. This will allow read and write marking to be enabled independently.
[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent
[ https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569654#comment-13569654 ] Hudson commented on HDFS-4350: -- Integrated in Hadoop-trunk-Commit #3315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3315/]) HDFS-4350. Make enabling of stale marking on read and write paths independent. Contributed by Andrew Wang. (Revision 1441819) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441819 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java Make enabling of stale marking on read and write paths independent -- Key: HDFS-4350 URL: https://issues.apache.org/jira/browse/HDFS-4350 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch,
hdfs-4350-3.patch, hdfs-4350-4.patch, hdfs-4350-5.patch, hdfs-4350-6.patch, hdfs-4350-7.patch, hdfs-4350-branch-1-1.patch, hdfs-4350-branch-1-2.patch, hdfs-4350-branch-1-3.patch, hdfs-4350.txt Marking of datanodes as stale for the read and write path was introduced in HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently exists a dependency: you cannot enable write marking without also enabling read marking, since the first key enables both checking of staleness and read marking. I propose renaming the first key to {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and making checking enabled if either of the keys is set. This will allow read and write marking to be enabled independently.
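The decoupling HDFS-4350 describes can be sketched in a few lines. This is an illustrative Python model, not the actual DatanodeManager code: staleness *checking* becomes enabled whenever either the read-avoidance or the write-avoidance key is set, so the two paths no longer depend on each other. The read key name below is the one given in the release note; the write key name is an assumption here.

```python
# Sketch (not Hadoop code) of the flag derivation after HDFS-4350:
# checking is on if EITHER consumer of staleness information is on.

READ_KEY = "dfs.namenode.avoid.read.stale.datanode"
WRITE_KEY = "dfs.namenode.avoid.write.stale.datanode"  # assumed key name


def stale_flags(conf):
    """Derive (check_stale, avoid_for_read, avoid_for_write) from a config dict."""
    avoid_read = conf.get(READ_KEY, False)
    avoid_write = conf.get(WRITE_KEY, False)
    # Before HDFS-4350, one key gated both checking and read marking, so
    # write marking alone was impossible; now either key enables checking.
    return (avoid_read or avoid_write, avoid_read, avoid_write)
```

With this derivation, setting only the write key yields checking plus write avoidance, which is exactly the previously impossible combination.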
[jira] [Updated] (HDFS-3703) Decrease the datanode failure detection time
[ https://issues.apache.org/jira/browse/HDFS-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-3703: -- Fix Version/s: (was: 2.0.3-alpha) (was: 3.0.0) 2.0.2-alpha Decrease the datanode failure detection time Key: HDFS-3703 URL: https://issues.apache.org/jira/browse/HDFS-3703 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 1.0.3, 2.0.0-alpha, 3.0.0 Reporter: nkeywal Assignee: Jing Zhao Fix For: 1.1.0, 2.0.2-alpha Attachments: 3703-hadoop-1.0.txt, HDFS-3703-branch-1.1-read-only.patch, HDFS-3703-branch-1.1-read-only.patch, HDFS-3703-branch2.patch, HDFS-3703.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-with-write.patch By default, if a box dies, the datanode will be marked as dead by the namenode after 10:30 minutes. In the meantime, this datanode will still be proposed by the namenode to write blocks or to read replicas. The same happens if the datanode crashes: there are no shutdown hooks to tell the namenode we're not there anymore. It is especially an issue with HBase. The HBase regionserver timeout for production is often 30s. So with these configs, when a box dies HBase starts to recover after 30s while, for 10 minutes, the namenode will still consider the blocks on the same box as available. Beyond the write errors, this will trigger a lot of missed reads: - during the recovery, HBase needs to read the blocks used on the dead box (the ones in the 'HBase Write-Ahead-Log') - after the recovery, reading these data blocks (the 'HBase region') will fail 33% of the time with the default number of replicas, slowing the data access, especially when the errors are socket timeouts (i.e. around 60s most of the time).
Globally, it would be ideal if HDFS settings could be under HBase settings. As a side note, HBase relies on ZooKeeper to detect regionserver issues.
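The "10:30 minutes" figure in the description above comes from the namenode's heartbeat-expiry arithmetic. The sketch below shows the commonly cited formula (twice the heartbeat recheck interval plus ten heartbeat intervals) with what I believe are the stock defaults of a 5-minute recheck and a 3-second heartbeat; treat the constant names and defaults as assumptions and verify them against your Hadoop version.

```python
# Sketch of how the namenode's dead-node timeout is derived:
# 2 * recheck interval + 10 * heartbeat interval. Defaults below are
# the stock values as commonly documented; verify against your version.

def heartbeat_expire_interval_ms(recheck_interval_ms=5 * 60 * 1000,
                                 heartbeat_interval_s=3):
    """Milliseconds after the last heartbeat before a datanode is declared dead."""
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000
```

With the defaults this evaluates to 630,000 ms, i.e. the 10 minutes 30 seconds quoted above; HDFS-3703's staleness work adds a much shorter (30 s) intermediate state rather than shrinking this interval.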
[jira] [Updated] (HDFS-3703) Decrease the datanode failure detection time
[ https://issues.apache.org/jira/browse/HDFS-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-3703: -- Fix Version/s: (was: 2.0.2-alpha) 2.0.3-alpha Decrease the datanode failure detection time Key: HDFS-3703 URL: https://issues.apache.org/jira/browse/HDFS-3703 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 1.0.3, 2.0.0-alpha, 3.0.0 Reporter: nkeywal Assignee: Jing Zhao Fix For: 1.1.0, 2.0.3-alpha Attachments: 3703-hadoop-1.0.txt, HDFS-3703-branch-1.1-read-only.patch, HDFS-3703-branch-1.1-read-only.patch, HDFS-3703-branch2.patch, HDFS-3703.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-with-write.patch By default, if a box dies, the datanode will be marked as dead by the namenode after 10:30 minutes. In the meantime, this datanode will still be proposed by the namenode to write blocks or to read replicas. The same happens if the datanode crashes: there are no shutdown hooks to tell the namenode we're not there anymore. It is especially an issue with HBase. The HBase regionserver timeout for production is often 30s. So with these configs, when a box dies HBase starts to recover after 30s while, for 10 minutes, the namenode will still consider the blocks on the same box as available. Beyond the write errors, this will trigger a lot of missed reads: - during the recovery, HBase needs to read the blocks used on the dead box (the ones in the 'HBase Write-Ahead-Log') - after the recovery, reading these data blocks (the 'HBase region') will fail 33% of the time with the default number of replicas, slowing the data access, especially when the errors are socket timeouts (i.e. around 60s most of the time). Globally, it would be ideal if HDFS settings could be under HBase settings.
As a side note, HBase relies on ZooKeeper to detect regionserver issues.
[jira] [Updated] (HDFS-4350) Make enabling of stale marking on read and write paths independent
[ https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-4350: -- Resolution: Fixed Fix Version/s: 2.0.3-alpha 1.2.0 Target Version/s: 1.2.0, 2.0.3-alpha (was: 1.2.0, 3.0.0) Release Note: This patch makes an incompatible configuration change, as described below: In releases 1.1.0 and other point releases 1.1.x, the configuration parameter dfs.namenode.check.stale.datanode could be used to turn on checking for stale nodes. This configuration is no longer supported in release 1.2.0 onwards and is renamed dfs.namenode.avoid.read.stale.datanode. How the feature works and how to configure it: As described in the HDFS-3703 release notes, the datanode stale period can be configured using the parameter dfs.namenode.stale.datanode.interval in seconds (default value is 30 seconds). The NameNode can be configured to use this staleness information for reads using the configuration dfs.namenode.avoid.read.stale.datanode. When this parameter is set to true, the namenode picks a stale datanode as the last target to read from when returning block locations for reads. Using staleness information for writes is as described in the release notes of HDFS-3912. Hadoop Flags: Incompatible change, Reviewed (was: Incompatible change) Status: Resolved (was: Patch Available) I committed the patch to trunk, branch-2 and branch-1. Thank you Andrew!
Make enabling of stale marking on read and write paths independent -- Key: HDFS-4350 URL: https://issues.apache.org/jira/browse/HDFS-4350 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 1.2.0, 2.0.3-alpha Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, hdfs-4350-4.patch, hdfs-4350-5.patch, hdfs-4350-6.patch, hdfs-4350-7.patch, hdfs-4350-branch-1-1.patch, hdfs-4350-branch-1-2.patch, hdfs-4350-branch-1-3.patch, hdfs-4350.txt Marking of datanodes as stale for the read and write path was introduced in HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently exists a dependency: you cannot enable write marking without also enabling read marking, since the first key enables both checking of staleness and read marking. I propose renaming the first key to {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and making checking enabled if either of the keys is set. This will allow read and write marking to be enabled independently.
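Following the release note above, an hdfs-site.xml sketch of enabling the write path independently of the read path. The interval key and the read key are named in the release notes; the write key name (dfs.namenode.avoid.write.stale.datanode) is assumed from HDFS-3912, and the interval's unit (the note says seconds, stock configs commonly use milliseconds) should be checked against your version.

```xml
<!-- Sketch only: enable write-path stale avoidance without the read path. -->
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <!-- staleness threshold; 30 s default per the HDFS-3703 notes.
       Unit (s vs ms) varies by docs; verify against your version. -->
  <value>30000</value>
</property>
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>false</value>
</property>
<property>
  <!-- assumed key name, from HDFS-3912 -->
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
```

Before HDFS-4350 this combination was impossible, since write marking implicitly required the read-side key to be on.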
[jira] [Resolved] (HDFS-4343) When storageID of dfs.data.dir of being inconsistent, restart datanode will be failure.
[ https://issues.apache.org/jira/browse/HDFS-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-4343. --- Resolution: Invalid Closing this jira as Invalid. The behavior in the description is the expected behavior. When storage directories are in an inconsistent state, invalid or stale directories are expected to be cleaned manually before restarting a datanode. If you disagree, feel free to reopen the jira. Please justify why you think it is a valid jira when you reopen it. When storageID of dfs.data.dir of being inconsistent, restart datanode will be failure. --- Key: HDFS-4343 URL: https://issues.apache.org/jira/browse/HDFS-4343 Project: Hadoop HDFS Issue Type: Bug Components: datanode Environment: namenode datanode Reporter: liuyang Attachments: hadoop-root-datanode-167-52-0-55.log, VERSION-1, VERSION-2 A datanode has multiple storage directories configured using dfs.data.dir. When the storageIDs in the VERSION files in these directories are inconsistent, the datanode fails to start up. Consider a scenario: when old data in a storage directory is not cleared, the storage ID from it will not match the storage ID in the other storage directories. In this situation, the DataNode will quit and the restart fails.
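The manual cleanup step described above starts with finding which directory disagrees. A hypothetical helper sketch (not a Hadoop tool): VERSION files are key=value property text, so comparing the storageID entry across the configured dfs.data.dir directories flags the stale one. The function names and the choice of the first directory as the reference are assumptions for illustration.

```python
# Hypothetical helper, not part of Hadoop: parse key=value VERSION text
# from each configured storage directory and report directories whose
# storageID disagrees with the first one, so stale directories can be
# cleaned manually before the datanode is restarted.

def parse_version(text):
    """Parse VERSION-file style 'key=value' lines into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def inconsistent_storage_dirs(version_texts):
    """Map dir -> storageID for dirs disagreeing with the first dir listed."""
    ids = {d: parse_version(t).get("storageID")
           for d, t in version_texts.items()}
    reference = next(iter(ids.values()), None)
    return {d: sid for d, sid in ids.items() if sid != reference}
```

An empty result means the directories agree; any entries it returns are the candidates to inspect and clear before restarting.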
[jira] [Commented] (HDFS-3374) hdfs' TestDelegationToken fails intermittently with a race condition
[ https://issues.apache.org/jira/browse/HDFS-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569658#comment-13569658 ] Suresh Srinivas commented on HDFS-3374: --- bq. I will upload a branch-1 patch to remove the synchronization in ExpiredTokenRemover.run(). Can you please do this in a separate jira? hdfs' TestDelegationToken fails intermittently with a race condition Key: HDFS-3374 URL: https://issues.apache.org/jira/browse/HDFS-3374 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 1.0.3 Attachments: HDFS-3374-branch-1.0.patch, hdfs-3374.patch, HDFS-3374.patch, HDFS-3374.trunk.patch The testcase is failing because the MiniDFSCluster is shut down before the secret manager can change the key, which calls system.exit with no edit streams available. {code} [junit] 2012-05-04 15:03:51,521 WARN common.Storage (FSImage.java:updateRemovedDirs(224)) - Removing storage dir /home/horton/src/hadoop/build/test/data/dfs/name1 [junit] 2012-05-04 15:03:51,522 FATAL namenode.FSNamesystem (FSEditLog.java:fatalExit(388)) - No edit streams are accessible [junit] java.lang.Exception: No edit streams are accessible [junit] at org.apache.hadoop.hdfs.server.namenode.FSEditLog.fatalExit(FSEditLog.java:388) [junit] at org.apache.hadoop.hdfs.server.namenode.FSEditLog.exitIfNoStreams(FSEditLog.java:407) [junit] at org.apache.hadoop.hdfs.server.namenode.FSEditLog.removeEditsAndStorageDir(FSEditLog.java:432) [junit] at org.apache.hadoop.hdfs.server.namenode.FSEditLog.removeEditsStreamsAndStorageDirs(FSEditLog.java:468) [junit] at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1028) [junit] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logUpdateMasterKey(FSNamesystem.java:5641) [junit] at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logUpdateMasterKey(DelegationTokenSecretManager.java:286) [junit] at
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.updateCurrentKey(AbstractDelegationTokenSecretManager.java:150) [junit] at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.rollMasterKey(AbstractDelegationTokenSecretManager.java:174) [junit] at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:385) [junit] at java.lang.Thread.run(Thread.java:662) [junit] Running org.apache.hadoop.hdfs.security.TestDelegationToken [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] Test org.apache.hadoop.hdfs.security.TestDelegationToken FAILED (crashed) {code}
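The race in the stack trace above — a background key-rolling thread still writing to the edit log while the cluster tears it down — follows a generic stop-the-worker-first pattern. A hedged Python sketch, not Hadoop's actual ExpiredTokenRemover: the roller waits on a stop event rather than sleeping, and shutdown stops and joins the thread before the log is closed, so no roll can land after close.

```python
import threading

# Generic sketch of avoiding the shutdown race, NOT Hadoop's actual
# ExpiredTokenRemover: the roller thread waits on a stop event, and
# shutdown() stops and joins it BEFORE closing the "edit log", so a
# key roll can never run against a closed log.

class KeyRoller:
    def __init__(self, log, interval_s=0.01):
        self.log = log                      # stand-in for the edit log
        self.interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        # Event.wait() returns True as soon as the event is set, so the
        # loop exits promptly instead of sleeping through an interval.
        while not self._stop.wait(self.interval_s):
            self.log.append("rollMasterKey")   # would call logUpdateMasterKey()

    def start(self):
        self._thread.start()

    def shutdown(self):
        self._stop.set()
        self._thread.join()                    # roller fully stopped here
        self.log.append("editLogClosed")       # only now tear down the log
```

The test failure above is the inverted ordering: the MiniDFSCluster closed the log first, and the roller's next logUpdateMasterKey() found no edit streams.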
[jira] [Updated] (HDFS-3771) Namenode can't restart due to corrupt edit logs, timing issue with shutdown and edit log rolling
[ https://issues.apache.org/jira/browse/HDFS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3771: -- Affects Version/s: (was: 2.0.0-alpha) This isn't needed in 2.x - perhaps the 0.23.x maintainers want to keep this open for 0.23.x? Otherwise feel free to close. (I removed the 2.x affects version) Namenode can't restart due to corrupt edit logs, timing issue with shutdown and edit log rolling Key: HDFS-3771 URL: https://issues.apache.org/jira/browse/HDFS-3771 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.3 Environment: QE, 20 node Federated cluster with 3 NNs and 15 DNs, using Kerberos based security Reporter: patrick white Priority: Critical Our 0.23.3 nightly HDFS regression suite encountered a particularly nasty issue recently, which resulted in the cluster's default Namenode being unable to restart; this was on a 20 node Federated cluster with security. The cause appears to be that the NN was just starting to roll its edit log when a shutdown occurred; the shutdown was intentional, to restart the cluster as part of an automated test. The tests that were running do not appear to be the issue in themselves; the cluster was just wrapping up an adminReport subset, and this failure case has not reproduced so far, nor was it failing previously. It looks like a chance occurrence of sending the shutdown just as the edit log roll was begun. From the NN log, the following sequence is noted: 1. an InvalidateBlocks operation had completed 2. FSNamesystem: Roll Edit Log from [Secondary Namenode IPaddr] 3. FSEditLog: Ending log segment 23963 4. FSEditLog: Starting log segment at 23967 5. NameNode: SHUTDOWN_MSG = the NN shuts down and then is restarted... 6. FSImageTransactionalStorageInspector: Logs beginning at txid 23967 were all in-progress 7. FSImageTransactionalStorageInspector: Marking log at /grid/[PATH]/edits_inprogress_0023967 as corrupt since it has no transactions in it. 8.
NameNode: Exception in namenode join [main]java.lang.IllegalStateException: No non-corrupt logs for txid 23967 = NN start attempts continue to cycle trying to restart but can't, failing on the same exception due to the lack of non-corrupt edit logs. If the observations are correct and the issue is from the shutdown happening as the edit logs are rolling, does the NN have an equivalent to the conventional fs 'sync' blocking action that should be called, or perhaps has a timing hole?