[jira] [Commented] (HDFS-4376) Intermittent timeout of TestBalancerWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796500#comment-13796500 ] Hadoop QA commented on HDFS-4376: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608660/HDFS-4376-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5207//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5207//console This message is automatically generated. Intermittent timeout of TestBalancerWithNodeGroup - Key: HDFS-4376 URL: https://issues.apache.org/jira/browse/HDFS-4376 Project: Hadoop HDFS Issue Type: Bug Components: balancer, test Affects Versions: 2.0.3-alpha Reporter: Aaron T. 
Myers Assignee: Junping Du Attachments: BalancerTest-HDFS-4376-v1.tar.gz, HDFS-4376-v1.patch, HDFS-4376-v2.patch, HDFS-4376-v3.patch, test-balancer-with-node-group-timeout.txt HDFS-4261 fixed several issues with the balancer and balancer tests, and reduced the frequency with which TestBalancerWithNodeGroup times out. Despite this, occasional timeouts still occur in this test. This JIRA is to track and fix this problem. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796509#comment-13796509 ] Luke Lu commented on HDFS-5042: --- Looks like this is the most compelling reason to use XFS, where *all* transactions prior to the fsync() triggered log force are guaranteed to be on disk once the fsync completes. There are no plans to change this behavior, either, because we rely on this architectural characteristic to provide strong ordering of metadata operations in many places. Completed files lost after power failure Key: HDFS-5042 URL: https://issues.apache.org/jira/browse/HDFS-5042 Project: Hadoop HDFS Issue Type: Bug Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5) Reporter: Dave Latham Priority: Critical We suffered a cluster wide power failure after which HDFS lost data that it had acknowledged as closed and complete. The client was HBase which compacted a set of HFiles into a new HFile, then after closing the file successfully, deleted the previous versions of the file. The cluster then lost power, and when brought back up the newly created file was marked CORRUPT. Based on reading the logs it looks like the replicas were created by the DataNodes in the 'blocksBeingWritten' directory. Then when the file was closed they were moved to the 'current' directory. After the power cycle those replicas were again in the blocksBeingWritten directory of the underlying file system (ext3). When those DataNodes reported in to the NameNode it deleted those replicas and lost the file. Some possible fixes could be having the DataNode fsync the directory(s) after moving the block from blocksBeingWritten to current to ensure the rename is durable or having the NameNode accept replicas from blocksBeingWritten under certain circumstances. 
Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode): {noformat} RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589 DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340 DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327 DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file 
/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932 RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 7 file(s) in n of users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m --- CRASH, RESTART - NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on
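The first fix proposed above (have the DataNode fsync the directory after moving a block from blocksBeingWritten to current) can be sketched in plain Java NIO. This is only an illustration under stated assumptions, not the actual DataNode code; syncing a directory by opening it with FileChannel and calling force() is a platform-dependent (Linux) technique:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Sketch: make a rename durable across power failure by fsync'ing the
// destination directory afterwards. Without the directory sync, ext3 may
// lose the rename itself even though the file data was already flushed.
public class DurableRename {
    public static void renameDurably(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        // fsync the parent directory so the directory entry for the rename
        // is on disk (works on Linux; may throw on other platforms)
        try (FileChannel dir = FileChannel.open(dst.getParent(), StandardOpenOption.READ)) {
            dir.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("durable-rename");
        Path src = Files.createFile(dir.resolve("blk_tmp"));
        renameDurably(src, dir.resolve("blk_final"));
        System.out.println("renamed and synced: " + dir.resolve("blk_final"));
    }
}
```

This is the same trick journaled applications (e.g. Lucene, Kafka) use to persist renames; it would not be needed on XFS given the log-force ordering described in the comment above.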
[jira] [Created] (HDFS-5367) Restore fsimage locked NameNode too long when the size of fsimage is big
zhaoyunjiong created HDFS-5367: -- Summary: Restore fsimage locked NameNode too long when the size of fsimage is big Key: HDFS-5367 URL: https://issues.apache.org/jira/browse/HDFS-5367 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhaoyunjiong Assignee: zhaoyunjiong Our cluster has a 40 GB fsimage, and we write one copy of the edit log to NFS. After a temporary NFS failure, the NameNode tries to recover the NFS directory during the next checkpoint and saves the 40 GB fsimage to it. This takes a long time (40 GB / 128 MB/s = 320 seconds), holds the FSNamesystem lock for that whole period, and brought down our cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
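A quick sanity check of the 320-second figure, using the reporter's own numbers (a 40 GB image and an effective NFS write rate of about 128 MB/s; neither value is a Hadoop constant):

```java
// Back-of-envelope estimate of how long the FSNamesystem lock is held
// while the fsimage is saved to NFS during checkpoint recovery.
public class CheckpointEstimate {
    // bytes / bytesPerSecond, rounded down to whole seconds
    public static long estimateSeconds(long imageBytes, long bytesPerSec) {
        return imageBytes / bytesPerSec;
    }

    public static void main(String[] args) {
        long image = 40L << 30;   // 40 GiB fsimage
        long rate = 128L << 20;   // ~128 MiB/s sustained write rate to NFS
        System.out.println(estimateSeconds(image, rate) + " seconds"); // 320 seconds
    }
}
```

Five-plus minutes of a held FSNamesystem lock is far past typical heartbeat and RPC timeouts, which is why the save alone can take the cluster down.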
[jira] [Updated] (HDFS-5367) Restore fsimage locked NameNode too long when the size of fsimage is big
[ https://issues.apache.org/jira/browse/HDFS-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5367: --- Attachment: (was: HDFS-5367) Restore fsimage locked NameNode too long when the size of fsimage is big - Key: HDFS-5367 URL: https://issues.apache.org/jira/browse/HDFS-5367 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhaoyunjiong Assignee: zhaoyunjiong Our cluster has a 40 GB fsimage, and we write one copy of the edit log to NFS. After a temporary NFS failure, the NameNode tries to recover the NFS directory during the next checkpoint and saves the 40 GB fsimage to it. This takes a long time (40 GB / 128 MB/s = 320 seconds), holds the FSNamesystem lock for that whole period, and brought down our cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5367) Restore fsimage locked NameNode too long when the size of fsimage is big
[ https://issues.apache.org/jira/browse/HDFS-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5367: --- Attachment: HDFS-5367 The fsimage restored when the SecondaryNameNode calls rollEditLog will soon be replaced when the SecondaryNameNode calls rollFsImage, so I think restoring the fsimage is not necessary. Restore fsimage locked NameNode too long when the size of fsimage is big - Key: HDFS-5367 URL: https://issues.apache.org/jira/browse/HDFS-5367 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhaoyunjiong Assignee: zhaoyunjiong Our cluster has a 40 GB fsimage, and we write one copy of the edit log to NFS. After a temporary NFS failure, the NameNode tries to recover the NFS directory during the next checkpoint and saves the 40 GB fsimage to it. This takes a long time (40 GB / 128 MB/s = 320 seconds), holds the FSNamesystem lock for that whole period, and brought down our cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5367) Restore fsimage locked NameNode too long when the size of fsimage is big
[ https://issues.apache.org/jira/browse/HDFS-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-5367: --- Attachment: HDFS-5367-branch-1.2.patch This patch avoids restoring the fsimage so that rollEditLog finishes as soon as possible. Restore fsimage locked NameNode too long when the size of fsimage is big - Key: HDFS-5367 URL: https://issues.apache.org/jira/browse/HDFS-5367 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5367-branch-1.2.patch Our cluster has a 40 GB fsimage, and we write one copy of the edit log to NFS. After a temporary NFS failure, the NameNode tries to recover the NFS directory during the next checkpoint and saves the 40 GB fsimage to it. This takes a long time (40 GB / 128 MB/s = 320 seconds), holds the FSNamesystem lock for that whole period, and brought down our cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5283: Attachment: HDFS-5283.patch Updated the patch addressing the review comments. Please review. NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold Key: HDFS-5283 URL: https://issues.apache.org/jira/browse/HDFS-5283 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.1.1-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch This was observed in one of our environments: 1. An MR job was running which had created some temporary files and was writing to them. 2. A snapshot was taken. 3. The job was killed and the temporary files were deleted. 4. The Namenode was restarted. 5. After the restart the Namenode stayed in safemode waiting for blocks. Analysis - 1. The snapshot also includes the temporary files which were open, and the original files were deleted later. 2. The under-construction block count was taken from leases, so UC blocks present only inside snapshots were not considered. 3. As a result the safemode threshold count was too high and the NN did not come out of safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
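The analysis above can be made concrete with toy numbers (illustrative only, not taken from the report): if under-construction (UC) blocks that exist only inside snapshots are not subtracted when computing the safe-mode target, the target can exceed the number of complete blocks the datanodes can ever report, so the NameNode waits forever:

```java
// Sketch of the safe-mode target arithmetic (simplified, not the actual
// BlockManager/SafeModeInfo code). The NN leaves safemode once the number
// of reported safe blocks reaches (total - underConstruction) * threshold.
public class SafeModeTarget {
    public static long target(long totalBlocks, long ucBlocks, double threshold) {
        return (long) ((totalBlocks - ucBlocks) * threshold);
    }

    public static void main(String[] args) {
        long total = 1000;             // blocks known to the NN after restart
        long ucFromLeases = 0;         // leases gone: files were deleted
        long ucOnlyInSnapshots = 10;   // UC blocks surviving only in a snapshot
        long reportable = total - ucOnlyInSnapshots; // 990 complete blocks on DNs

        long buggy = target(total, ucFromLeases, 0.999);                       // 999 > 990: stuck
        long fixed = target(total, ucFromLeases + ucOnlyInSnapshots, 0.999);   // 989 <= 990: leaves safemode
        System.out.println("buggy target=" + buggy + ", fixed target=" + fixed
            + ", reportable=" + reportable);
    }
}
```

With the snapshot-only UC blocks counted, the target drops below the reportable block count and startup safemode can complete.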
[jira] [Updated] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5368: Description: The Namenode entered safemode during restart. 1. After the restart the NN entered the safemode extension period. 2. During this time a deadlock occurred between the datanode heartbeat and the SafeModeMonitor thread. {noformat} Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 {noformat} Check the attached jstack for the complete stack was: Namenode entered to safemode during restart 1. After restart NN entered to safemode extention. 2. During this time deadlock happened between datanode heartbeat and SafemodeMonitor() thread. {normat}Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953{noformat} Check attached jstack for complete stack Namenode deadlock during safemode extension --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Priority: Blocker The Namenode entered safemode during restart. 1. After the restart the NN entered the safemode extension period. 2. During this time a deadlock occurred between the datanode heartbeat and the SafeModeMonitor thread. {noformat} Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 {noformat} Check the attached jstack for the complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5368: Description: The Namenode entered safemode during restart. 1. After the restart the NN entered the safemode extension period. 2. During this time a deadlock occurred between the datanode heartbeat and the SafeModeMonitor thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check the attached jstack for the complete stack was: Namenode entered to safemode during restart 1. After restart NN entered to safemode extention. 2. During this time deadlock happened between datanode heartbeat and SafemodeMonitor() thread. {normat} Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 {noformat} Check attached jstack for complete stack Namenode deadlock during safemode extension --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Priority: Blocker The Namenode entered safemode during restart. 1. After the restart the NN entered the safemode extension period. 2. During this time a deadlock occurred between the datanode heartbeat and the SafeModeMonitor thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check the attached jstack for the complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5368: Attachment: NN-deadlock.zip Namenode deadlock during safemode extention --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: NN-deadlock.zip Namenode entered to safemode during restart 1. After restart NN entered to safemode extention. 2. During this time deadlock happened between datanode heartbeat and SafemodeMonitor() thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check attached jstack for complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796612#comment-13796612 ] Vinay commented on HDFS-5368: - HDFS-3486 was fixed in Branch-1. Namenode deadlock during safemode extention --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: NN-deadlock.zip Namenode entered to safemode during restart 1. After restart NN entered to safemode extention. 2. During this time deadlock happened between datanode heartbeat and SafemodeMonitor() thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check attached jstack for complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5368: Attachment: HDFS-5368.patch Attaching a patch which moves {{namenode.isInSafeMode()}} out of the {{datanodeMap}} synchronization. Namenode deadlock during safemode extension --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5368.patch, NN-deadlock.zip The Namenode entered safemode during restart. 1. After the restart the NN entered the safemode extension period. 2. During this time a deadlock occurred between the datanode heartbeat and the SafeModeMonitor thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check the attached jstack for the complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
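The jstack shows a classic lock-ordering cycle: the heartbeat handler holds the datanodeMap monitor and then queries SafeModeInfo, while SafeModeMonitor holds SafeModeInfo and then wants datanodeMap. The patch's idea can be sketched with simplified stand-ins for the two monitors (names and structure are illustrative, not the actual FSNamesystem code):

```java
import java.util.TreeMap;

// Simplified sketch of the lock-ordering fix.
// Before: heartbeat held datanodeMap, then queried safeModeInfo, while
// SafeModeMonitor held safeModeInfo, then scanned datanodeMap -> cycle.
// After: the safemode check runs outside the datanodeMap block, so the
// heartbeat path never holds both monitors at once.
public class HeartbeatSketch {
    private final TreeMap<String, Long> datanodeMap = new TreeMap<>();
    private final Object safeModeInfo = new Object();
    private volatile boolean inSafeMode = true;

    // Models isInSafeMode(): synchronizes on the SafeModeInfo monitor.
    boolean isInSafeMode() {
        synchronized (safeModeInfo) {
            return inSafeMode;
        }
    }

    public boolean handleHeartbeat(String dn, long now) {
        // moved out of the synchronized block: no nested monitor acquisition
        boolean stillInSafeMode = isInSafeMode();
        synchronized (datanodeMap) {
            datanodeMap.put(dn, now);
        }
        return stillInSafeMode;
    }

    public static void main(String[] args) {
        HeartbeatSketch fsn = new HeartbeatSketch();
        System.out.println("in safemode during heartbeat: "
            + fsn.handleHeartbeat("dn1", System.currentTimeMillis()));
    }
}
```

Because neither thread ever waits for one monitor while holding the other, the cycle from the jstack above cannot form.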
[jira] [Updated] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5368: Status: Patch Available (was: Open) Namenode deadlock during safemode extention --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5368.patch, NN-deadlock.zip Namenode entered to safemode during restart 1. After restart NN entered to safemode extention. 2. During this time deadlock happened between datanode heartbeat and SafemodeMonitor() thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check attached jstack for complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5368: Affects Version/s: 2.2.0 3.0.0 Namenode deadlock during safemode extention --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5368.patch, NN-deadlock.zip Namenode entered to safemode during restart 1. After restart NN entered to safemode extention. 2. During this time deadlock happened between datanode heartbeat and SafemodeMonitor() thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check attached jstack for complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796661#comment-13796661 ] Hadoop QA commented on HDFS-5283: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608680/HDFS-5283.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSUpgradeFromImage org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot org.apache.hadoop.hdfs.TestDecommission The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5208//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5208//console This message is automatically generated. 
NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold Key: HDFS-5283 URL: https://issues.apache.org/jira/browse/HDFS-5283 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.1.1-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch This is observed in one of our env: 1. A MR Job was running which has created some temporary files and was writing to them. 2. Snapshot was taken 3. And Job was killed and temporary files were deleted. 4. Namenode restarted. 5. After restart Namenode was in safemode waiting for blocks Analysis - 1. Since the snapshot taken also includes the temporary files which were open, and later original files are deleted. 2. UnderConstruction blocks count was taken from leases. not considered the UC blocks only inside snapshots 3. So safemode threshold count was more and NN did not come out of safemode -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5368) Namenode deadlock during safemode extension
[ https://issues.apache.org/jira/browse/HDFS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796710#comment-13796710 ] Hadoop QA commented on HDFS-5368: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608692/HDFS-5368.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5209//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5209//console This message is automatically generated. Namenode deadlock during safemode extention --- Key: HDFS-5368 URL: https://issues.apache.org/jira/browse/HDFS-5368 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5368.patch, NN-deadlock.zip Namenode entered to safemode during restart 1. After restart NN entered to safemode extention. 2. 
During this time deadlock happened between datanode heartbeat and SafemodeMonitor() thread. Found one Java-level deadlock: = org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953: waiting to lock monitor 0x18c3b42c (object 0x0439c6f8, a java.util.TreeMap), which is held by IPC Server handler 2 on 62212 IPC Server handler 2 on 62212: waiting to lock monitor 0x18c3987c (object 0x043849a0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo), which is held by org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor@9fe953 Check attached jstack for complete stack -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5283: Attachment: HDFS-5283.patch Updated the patch. {{assert hasReadLock();}} is replaced with {{assert hasReadOrWriteLock();}}, because {{isInSnapshot()}} is called while holding the writeLock, so {{hasReadLock()}} returned false and the assertion failed. NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold Key: HDFS-5283 URL: https://issues.apache.org/jira/browse/HDFS-5283 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.1.1-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch This was observed in one of our environments: 1. An MR job was running which had created some temporary files and was writing to them. 2. A snapshot was taken. 3. The job was killed and the temporary files were deleted. 4. The Namenode was restarted. 5. After the restart the Namenode stayed in safemode waiting for blocks. Analysis - 1. The snapshot also includes the temporary files which were open, and the original files were deleted later. 2. The under-construction block count was taken from leases, so UC blocks present only inside snapshots were not considered. 3. As a result the safemode threshold count was too high and the NN did not come out of safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
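The relaxed assertion can be illustrated with a minimal FSNamesystem-style lock helper built on java.util.concurrent's ReentrantReadWriteLock (a sketch, not the actual Hadoop code): a thread that holds only the write lock has a read hold count of zero, which is exactly why the original {{assert hasReadLock();}} fired under the write lock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of FSNamesystem-style lock helpers. hasReadLock() is false for a
// pure writer, so code that may run under either lock should assert
// hasReadOrWriteLock() instead.
public class NamesystemLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void writeLock()   { lock.writeLock().lock(); }
    public void writeUnlock() { lock.writeLock().unlock(); }

    public boolean hasReadLock()  { return lock.getReadHoldCount() > 0; }
    public boolean hasWriteLock() { return lock.isWriteLockedByCurrentThread(); }
    public boolean hasReadOrWriteLock() { return hasReadLock() || hasWriteLock(); }

    public static void main(String[] args) {
        NamesystemLockSketch ns = new NamesystemLockSketch();
        ns.writeLock();
        // Under the write lock only: hasReadLock() is false,
        // hasReadOrWriteLock() is true.
        System.out.println("hasReadLock=" + ns.hasReadLock()
            + " hasReadOrWriteLock=" + ns.hasReadOrWriteLock());
        ns.writeUnlock();
    }
}
```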
[jira] [Updated] (HDFS-5346) Replication queues should not be initialized in the middle of IBR processing.
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-5346: --- Status: Open (was: Patch Available) Replication queues should not be initialized in the middle of IBR processing. - Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. -- This message was sent by Atlassian JIRA (v6.1#6144)
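One way to keep the queue initialization out of the block-report path can be sketched with hypothetical names (this is a sketch of the general deferral idea, not the HDFS-5346 patch itself): incrementSafeBlockCount() only records that the threshold was crossed, and a SafeModeMonitor-style thread performs the expensive one-time initialization outside report processing:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: defer initializeReplicationQueues() out of the (initial)
// block-report path so IBR processing stays fast.
public class DeferredReplQueueInit {
    private final AtomicBoolean thresholdReached = new AtomicBoolean(false);
    private final AtomicBoolean queuesInitialized = new AtomicBoolean(false);
    private long safeBlocks;           // single report-processing thread assumed
    private final long threshold;

    public DeferredReplQueueInit(long threshold) { this.threshold = threshold; }

    // Called while processing block reports: cheap, no initialization here.
    public void incrementSafeBlockCount() {
        if (++safeBlocks >= threshold) {
            thresholdReached.set(true);
        }
    }

    // Called periodically by a SafeModeMonitor-style thread; returns true
    // the one time it actually performs the initialization.
    public boolean monitorTick() {
        if (thresholdReached.get() && queuesInitialized.compareAndSet(false, true)) {
            initializeReplicationQueues();   // hypothetical expensive step
            return true;
        }
        return false;
    }

    private void initializeReplicationQueues() { /* scan blocks, build queues */ }

    public static void main(String[] args) {
        DeferredReplQueueInit sm = new DeferredReplQueueInit(2);
        sm.incrementSafeBlockCount();
        System.out.println(sm.monitorTick()); // false: still below threshold
        sm.incrementSafeBlockCount();
        System.out.println(sm.monitorTick()); // true: init runs exactly once
    }
}
```

The compareAndSet guard guarantees the initialization runs exactly once even if several monitor ticks observe the threshold.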
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796863#comment-13796863 ] Hadoop QA commented on HDFS-5283:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608707/HDFS-5283.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5210//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5210//console
This message is automatically generated.

NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
Key: HDFS-5283 URL: https://issues.apache.org/jira/browse/HDFS-5283 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.1.1-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch
[jira] [Updated] (HDFS-5346) Replication queues should not be initialized in the middle of IBR processing.
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-5346: --- Attachment: HDFS-5346.patch
Attaching the same patch for trunk, even though the branch-23 patch applies to trunk with some fuzz.

Replication queues should not be initialized in the middle of IBR processing.
Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch
[jira] [Updated] (HDFS-5346) Replication queues should not be initialized in the middle of IBR processing.
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-5346: --- Status: Patch Available (was: Open)

Replication queues should not be initialized in the middle of IBR processing.
Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch
[jira] [Updated] (HDFS-5346) Replication queues should not be initialized in the middle of IBR processing.
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-5346: --- Attachment: HDFS-5346.branch-23.patch
Hmm. We realized we can set dfs.namenode.replqueue.threshold-pct to 1.0, or even 1.5, to make sure the replication queues are initialized only when the NN enters the safemode extension period. Thus I am truncating the patch to include only the optimization of the condition so that it does not traverse the TreeMap.

Replication queues should not be initialized in the middle of IBR processing.
Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch
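The workaround described in the comment above can be expressed in hdfs-site.xml. A value above 1.0 is intentionally unreachable by the safe-block ratio, so replication-queue initialization is delayed until the safemode extension period; this is a sketch of that configuration, and the exact semantics should be checked against your Hadoop version's hdfs-default.xml:

```xml
<!-- hdfs-site.xml: sketch of the workaround above; a threshold above
     1.0 keeps replication queues from initializing mid-IBR. -->
<property>
  <name>dfs.namenode.replqueue.threshold-pct</name>
  <value>1.5</value>
</property>
```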
[jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796979#comment-13796979 ] Chris Nauroth commented on HDFS-5096: -
Agreed with Andrew that we're getting close. Almost all of my prior feedback has been addressed. I found a few more small things after reviewing test code. Here is the full list of remaining feedback (some of it redundant, but this way you don't have to look at multiple old comments).
hdfs-default.xml: Let's document {{dfs.namenode.path.based.cache.refresh.interval.ms}}.
{{IntrusiveCollection#addFirst}}: This method appears to be called only from test code. Do you want to keep it, or is it better to delete it?
{{TestPathBasedCacheRequests#waitForCachedBlocks}}: This is another spot where I think we should use {{GenericTestUtils#waitFor}}. Even though the JUnit-level timeouts would abort, this tends to leave the process hanging around. {{GenericTestUtils#waitFor}} would throw and exit more cleanly.

Automatically cache new data added to a cached path
Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch

For some applications, it's convenient to specify a path to cache and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file: it would be nice to re-cache a block at the new appended length, and to cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition; it would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e. when the block replica is finalized.
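The {{GenericTestUtils#waitFor}} pattern recommended in the review above boils down to polling a condition with an interval and a timeout, and throwing on timeout rather than hanging. This is a generic sketch of that pattern with hypothetical names, not the actual Hadoop test utility:

```java
import java.util.function.Supplier;

/** Sketch of a poll-until-true helper in the spirit of
 *  GenericTestUtils#waitFor (hypothetical, not Hadoop's code). */
public final class WaitFor {
    private WaitFor() {}

    public static void waitFor(Supplier<Boolean> check,
                               long intervalMs, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.get()) {
            if (System.currentTimeMillis() > deadline) {
                // Throwing lets the test exit cleanly instead of
                // leaving a hung process behind, as noted above.
                throw new IllegalStateException(
                    "Timed out waiting for condition");
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted", ie);
            }
        }
    }
}
```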
[jira] [Assigned] (HDFS-5203) Concurrent clients that add a cache directive on the same path may prematurely uncache from each other.
[ https://issues.apache.org/jira/browse/HDFS-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reassigned HDFS-5203: --- Assignee: Chris Nauroth Concurrent clients that add a cache directive on the same path may prematurely uncache from each other. --- Key: HDFS-5203 URL: https://issues.apache.org/jira/browse/HDFS-5203 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth When a client adds a cache directive, we assign it a unique ID and return that ID to the client. If multiple clients add a cache directive for the same path, then we return the same ID. If one client then removes the cache entry for that ID, then it is removed for all clients. Then, when this change becomes visible in subsequent cache reports, the datanodes may {{munlock}} the block before the other clients are done with it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5369) Support negative caching of user-group mapping
Andrew Wang created HDFS-5369: - Summary: Support negative caching of user-group mapping Key: HDFS-5369 URL: https://issues.apache.org/jira/browse/HDFS-5369 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Andrew Wang

We've seen a situation at a couple of our customers where interactions from an unknown user lead to a high rate of group-mapping calls. In one case this was happening at a rate of 450 calls per second with the shell-based group mapping, enough to severely impact overall namenode performance and also to produce large amounts of log spam (a stack trace is printed each time). Let's consider negative caching of group mapping, as well as quashing the rate of this log message.
[jira] [Commented] (HDFS-5203) Concurrent clients that add a cache directive on the same path may prematurely uncache from each other.
[ https://issues.apache.org/jira/browse/HDFS-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796994#comment-13796994 ] Chris Nauroth commented on HDFS-5203: -
Now that the big changes in HDFS-5096 are winding down, I'm planning on revisiting HDFS-5203 soon and preparing a patch.

Concurrent clients that add a cache directive on the same path may prematurely uncache from each other.
Key: HDFS-5203 URL: https://issues.apache.org/jira/browse/HDFS-5203 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth
[jira] [Commented] (HDFS-5369) Support negative caching of user-group mapping
[ https://issues.apache.org/jira/browse/HDFS-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797004#comment-13797004 ] Andrew Wang commented on HDFS-5369: ---
I saw some discussion about negative caching in HADOOP-8088, where the conclusion was that other services on the NN host perform caching, avoiding expensive round trips for an LDAP lookup. However, ~450 shell calls per second is expensive even if the result is cached, and even with JNI it still seems like unnecessary overhead.

Support negative caching of user-group mapping
Key: HDFS-5369 URL: https://issues.apache.org/jira/browse/HDFS-5369 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Andrew Wang
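The negative caching proposed here can be sketched as follows. NegativeGroupCache and backendLookup are hypothetical names, not the actual Hadoop Groups implementation; the idea is just to remember unknown users for a short TTL so repeated lookups don't shell out (or hit LDAP) hundreds of times per second:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch of negative caching for a user-to-groups lookup
 * (hypothetical names, not Hadoop's Groups class). A miss is cached
 * for ttlMs so repeated lookups of an unknown user stay cheap.
 */
public class NegativeGroupCache {
    private final Map<String, Long> negative = new ConcurrentHashMap<>();
    private final long ttlMs;
    private long backendCalls = 0;  // counts expensive backend lookups

    public NegativeGroupCache(long ttlMs) { this.ttlMs = ttlMs; }

    public List<String> getGroups(String user) {
        Long until = negative.get(user);
        long now = System.currentTimeMillis();
        if (until != null && now < until) {
            return Collections.emptyList();  // cached miss, no backend call
        }
        backendCalls++;
        List<String> groups = backendLookup(user);
        if (groups.isEmpty()) {
            negative.put(user, now + ttlMs);  // remember the miss
        }
        return groups;
    }

    /** Stand-in for the real shell/LDAP group mapping. */
    protected List<String> backendLookup(String user) {
        return Collections.emptyList();  // every user unknown in this sketch
    }

    public long getBackendCalls() { return backendCalls; }
}
```

Quashing the log message would be handled separately, e.g. by rate-limiting the stack-trace logging on the miss path.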
[jira] [Assigned] (HDFS-5369) Support negative caching of user-group mapping
[ https://issues.apache.org/jira/browse/HDFS-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reassigned HDFS-5369: - Assignee: Andrew Wang

Support negative caching of user-group mapping
Key: HDFS-5369 URL: https://issues.apache.org/jira/browse/HDFS-5369 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Andrew Wang Assignee: Andrew Wang
[jira] [Commented] (HDFS-5346) Replication queues should not be initialized in the middle of IBR processing.
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797023#comment-13797023 ] Hadoop QA commented on HDFS-5346:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608720/HDFS-5346.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5211//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5211//console
This message is automatically generated.

Replication queues should not be initialized in the middle of IBR processing.
Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch
[jira] [Updated] (HDFS-5363) Create SPENGO-authenticated connection in URLConnectionFactory instead WebHdfsFileSystem
[ https://issues.apache.org/jira/browse/HDFS-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5363: - Attachment: HDFS-5363.001.patch

Create SPENGO-authenticated connection in URLConnectionFactory instead WebHdfsFileSystem
Key: HDFS-5363 URL: https://issues.apache.org/jira/browse/HDFS-5363 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5363.000.patch, HDFS-5363.001.patch

Currently the WebHdfsFileSystem class creates the HTTP connections for URLs that require SPNEGO authentication. This patch moves that logic to URLConnectionFactory, the factory class that is supposed to create all HTTP connections of the WebHdfs client.
[jira] [Updated] (HDFS-5363) Create SPENGO-authenticated connection in URLConnectionFactory instead WebHdfsFileSystem
[ https://issues.apache.org/jira/browse/HDFS-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5363: - Issue Type: Sub-task (was: Improvement) Parent: HDFS-5305

Create SPENGO-authenticated connection in URLConnectionFactory instead WebHdfsFileSystem
Key: HDFS-5363 URL: https://issues.apache.org/jira/browse/HDFS-5363 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5363.000.patch, HDFS-5363.001.patch
[jira] [Commented] (HDFS-5247) Namenode should close editlog and unlock storage when removing failed storage dir
[ https://issues.apache.org/jira/browse/HDFS-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797045#comment-13797045 ] Suresh Srinivas commented on HDFS-5247: ---
This is a rare enough problem that it can be worked around by monitoring the available disk space. This part of the code has been quite brittle; some of the changes in this area have resulted in more serious bugs and subsequent bug fixes for stabilization. My preference is to leave this as is, since monitoring disk space can avoid this issue.

Namenode should close editlog and unlock storage when removing failed storage dir
Key: HDFS-5247 URL: https://issues.apache.org/jira/browse/HDFS-5247 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.1 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 1.2.1 Attachments: HDFS-5247-branch-1.2.patch

When one of dfs.name.dir failed, the namenode didn't close the editlog and unlock the storage:
java 24764 hadoop 78uW REG 252,32 0 393219 /volume1/nn/dfs/in_use.lock (deleted)
java 24764 hadoop 107u REG 252,32 1155072 393229 /volume1/nn/dfs/current/edits.new (deleted)
java 24764 hadoop 119u REG 252,32 0 393238 /volume1/nn/dfs/current/fstime.tmp
java 24764 hadoop 140u REG 252,32 1761805 393239 /volume1/nn/dfs/current/edits
If this dir is at its space limit, then restoring this storage may fail.
[jira] [Commented] (HDFS-5358) Add replication field to PathBasedCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797052#comment-13797052 ] Chris Nauroth commented on HDFS-5358: - I have a failure in {{TestOfflineEditsViewer#testStored}} since this patch. It looks like we forgot to commit an updated editsStored binary file. [~cmccabe] or [~andrew.wang], do you still have the correct version locally, and if so, would you please commit it? Thanks! Add replication field to PathBasedCacheDirective Key: HDFS-5358 URL: https://issues.apache.org/jira/browse/HDFS-5358 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-4949 Attachments: HDFS-5358-caching.001.patch, HDFS-5358-caching.002.patch Add a 'replication' field to PathBasedCacheDirective, so that administrators can configure how many cached replicas of a block the cluster should try to maintain. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5096: --- Attachment: HDFS-5096-caching.012.patch

Automatically cache new data added to a cached path
Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch
[jira] [Commented] (HDFS-5358) Add replication field to PathBasedCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797055#comment-13797055 ] Andrew Wang commented on HDFS-5358: ---
Probably the same binary diff issue as last time. I'm +1 if anyone wants to just commit new files, seems unnecessary to do another JIRA.

Add replication field to PathBasedCacheDirective
Key: HDFS-5358 URL: https://issues.apache.org/jira/browse/HDFS-5358 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-4949 Attachments: HDFS-5358-caching.001.patch, HDFS-5358-caching.002.patch
[jira] [Created] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
Kousuke Saruta created HDFS-5370: Summary: Typo in Error Message: different between range in condition and range in error message Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0 Reporter: Kousuke Saruta Priority: Minor Fix For: 3.0.0 In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5370: - Attachment: HDFS-5370.patch
I've attached a patch for this issue.

Typo in Error Message: different between range in condition and range in error message
Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0 Reporter: Kousuke Saruta Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5370.patch
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HDFS-5370: - Status: Patch Available (was: Open)

Typo in Error Message: different between range in condition and range in error message
Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0 Reporter: Kousuke Saruta Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5370.patch
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797078#comment-13797078 ] Tsz Wo (Nicholas), SZE commented on HDFS-5283: --
+1 patch looks good.
Since isInSnapshot() is being called holding the writeLock, hasReadlock() returning false ...
It is a bug. Let's fix it separately. I will file a JIRA.

NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
Key: HDFS-5283 URL: https://issues.apache.org/jira/browse/HDFS-5283 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.1.1-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch
[jira] [Commented] (HDFS-5360) Improvement of usage message of renameSnapshot and deleteSnapshot
[ https://issues.apache.org/jira/browse/HDFS-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797076#comment-13797076 ] Shinichi Yamashita commented on HDFS-5360: --
Thank you for your comment. I agree with you. The argument information is included in USAGE, so we should confirm whether the number of arguments is right. And I hadn't noticed the spelling mistake.

Improvement of usage message of renameSnapshot and deleteSnapshot
Key: HDFS-5360 URL: https://issues.apache.org/jira/browse/HDFS-5360 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5360.patch

When the argument of the hdfs dfs -createSnapshot command is inappropriate, it is displayed as follows.
{code}
[hadoop@trunk ~]$ hdfs dfs -createSnapshot
-createSnapshot: snapshotDir is missing.
Usage: hadoop fs [generic options] -createSnapshot snapshotDir [snapshotName]
{code}
On the other hand, the -renameSnapshot and -deleteSnapshot commands display the following, which is not kind to the user.
{code}
[hadoop@trunk ~]$ hdfs dfs -renameSnapshot
renameSnapshot: args number not 3: 0
[hadoop@trunk ~]$ hdfs dfs -deleteSnapshot
deleteSnapshot: args number not 2: 0
{code}
This changes -renameSnapshot and -deleteSnapshot to output messages similar to -createSnapshot's.
[jira] [Updated] (HDFS-5360) Improvement of usage message of renameSnapshot and deleteSnapshot
[ https://issues.apache.org/jira/browse/HDFS-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-5360: - Attachment: HDFS-5360.patch
I attach a revised patch.

Improvement of usage message of renameSnapshot and deleteSnapshot
Key: HDFS-5360 URL: https://issues.apache.org/jira/browse/HDFS-5360 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5360.patch, HDFS-5360.patch
[jira] [Commented] (HDFS-5360) Improvement of usage message of renameSnapshot and deleteSnapshot
[ https://issues.apache.org/jira/browse/HDFS-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797083#comment-13797083 ] Andrew Wang commented on HDFS-5360: ---
+1 pending Jenkins, thanks for your contribution.

Improvement of usage message of renameSnapshot and deleteSnapshot
Key: HDFS-5360 URL: https://issues.apache.org/jira/browse/HDFS-5360 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5360.patch, HDFS-5360.patch
[jira] [Updated] (HDFS-5371) Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled
[ https://issues.apache.org/jira/browse/HDFS-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5371: Description: Currently when dfs.client.test.drop.namenode.response.number is enabled for testing, the client will start failover and try the other NN. But in most of the testing cases we do not need to trigger the client failover here since if the drop-response number is 1 the next response received from the other NN will also be dropped. We can let the client just simply retry the same NN. Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled Key: HDFS-5371 URL: https://issues.apache.org/jira/browse/HDFS-5371 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5371.000.patch Currently when dfs.client.test.drop.namenode.response.number is enabled for testing, the client will start failover and try the other NN. But in most of the testing cases we do not need to trigger the client failover here since if the drop-response number is 1 the next response received from the other NN will also be dropped. We can let the client just simply retry the same NN. -- This message was sent by Atlassian JIRA (v6.1#6144)
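The retry-versus-failover choice described above reduces to a simple decision: when the drop-response test setting is active, stay on the same NN. The enum and method below are a hypothetical illustration, not the actual HDFS client retry-policy API:

```java
// Hypothetical illustration of the proposed behavior: under the
// drop-response test setting, retry the same NN, because the next
// response from the other NN would be dropped as well. Not the
// actual RetryPolicy API.
public class DropResponseRetry {
    enum Action { RETRY_SAME_NN, FAILOVER_AND_RETRY }

    static Action onDroppedResponse(boolean dropResponseTestEnabled) {
        return dropResponseTestEnabled
            ? Action.RETRY_SAME_NN
            : Action.FAILOVER_AND_RETRY;
    }
}
```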
[jira] [Updated] (HDFS-5371) Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled
[ https://issues.apache.org/jira/browse/HDFS-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5371: Attachment: HDFS-5371.000.patch Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled Key: HDFS-5371 URL: https://issues.apache.org/jira/browse/HDFS-5371 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5371.000.patch Currently when dfs.client.test.drop.namenode.response.number is enabled for testing, the client will start failover and try the other NN. But in most of the testing cases we do not need to trigger the client failover here since if the drop-response number is 1 the next response received from the other NN will also be dropped. We can let the client just simply retry the same NN. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5371) Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled
Jing Zhao created HDFS-5371: --- Summary: Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled Key: HDFS-5371 URL: https://issues.apache.org/jira/browse/HDFS-5371 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5367) Restore fsimage locked NameNode too long when the size of fsimage are big
[ https://issues.apache.org/jira/browse/HDFS-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797094#comment-13797094 ] Benoy Antony commented on HDFS-5367: +1. This solves a problem on our clusters. Please review and commit. John, could you please provide a patch for trunk as well? Restore fsimage locked NameNode too long when the size of fsimage are big - Key: HDFS-5367 URL: https://issues.apache.org/jira/browse/HDFS-5367 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5367-branch-1.2.patch Our cluster has a 40G fsimage, and we write one copy of the edit log to NFS. After NFS temporarily failed, the NameNode tried to recover it while doing a checkpoint and saved the 40G fsimage to NFS. That takes some time (40G / 128MB/s = 320 seconds), during which FSNamesystem is locked, and this brought down our cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
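The 320-second figure quoted above is straightforward arithmetic: image size divided by write throughput. A quick sketch to check it (hypothetical helper, not Hadoop code):

```java
// Back-of-envelope estimate of how long FSNamesystem stays locked while
// a fsimage is saved to NFS. Hypothetical helper, not Hadoop code.
public class SaveTimeEstimate {
    static long saveSeconds(long imageBytes, long bytesPerSecond) {
        return imageBytes / bytesPerSecond;
    }
}
```

A 40G image at 128MB/s gives 40 * 1024 / 128 = 320 seconds of lock hold, matching the report.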
[jira] [Updated] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5283: - Resolution: Fixed Fix Version/s: 2.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I have committed this. Thanks, Vinay! NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold Key: HDFS-5283 URL: https://issues.apache.org/jira/browse/HDFS-5283 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.1.1-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Fix For: 2.3.0 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch This was observed in one of our environments: 1. An MR job was running, which had created some temporary files and was writing to them. 2. A snapshot was taken. 3. The job was killed and the temporary files were deleted. 4. The NameNode was restarted. 5. After the restart the NameNode was in safemode, waiting for blocks. Analysis - 1. The snapshot also includes the temporary files which were open, and the original files were later deleted. 2. The under-construction block count was taken from leases, which does not consider UC blocks that exist only inside snapshots. 3. So the safemode threshold count was too high and the NN did not come out of safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5372) In FSNamesystem, hasReadLock() returns false if the current thread holds the write lock
Tsz Wo (Nicholas), SZE created HDFS-5372: Summary: In FSNamesystem, hasReadLock() returns false if the current thread holds the write lock Key: HDFS-5372 URL: https://issues.apache.org/jira/browse/HDFS-5372 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE This bug was discovered by [~vinayrpet] in [this comment|https://issues.apache.org/jira/browse/HDFS-5283?focusedCommentId=13796752page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13796752]. -- This message was sent by Atlassian JIRA (v6.1#6144)
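With a `ReentrantReadWriteLock`, the read hold count of a thread that holds only the write lock is zero, so a `hasReadLock()` based purely on the read hold count reports false even though the write-lock holder has read access. A minimal sketch of the pitfall and the obvious fix, assuming the lock-hold-count approach; this is not the actual FSNamesystem code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the hasReadLock() pitfall: a check based only on the read
// hold count misses the case where the current thread holds the write
// lock, which also grants read access. Not the actual FSNamesystem code.
public class ReadLockCheck {
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

    // Buggy: returns false while the current thread holds the write lock.
    static boolean hasReadLockBuggy() {
        return fsLock.getReadHoldCount() > 0;
    }

    // Fixed: a write-lock holder is also considered a read-lock holder.
    static boolean hasReadLockFixed() {
        return fsLock.getReadHoldCount() > 0
            || fsLock.isWriteLockedByCurrentThread();
    }
}
```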
[jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797122#comment-13797122 ] Colin Patrick McCabe commented on HDFS-5096: bq. hdfs-default.xml: Let's document dfs.namenode.path.based.cache.refresh.interval.ms. Added. bq. IntrusiveCollection#addFirst: This method appears to be only called from test code. Do you want to keep it, or is it better to delete it? It's a pretty small function. I'd like to keep it in case it's needed later. Since we have a doubly-linked list, being able to add at the beginning or the end is a nice feature. bq. TestPathBasedCacheRequests#waitForCachedBlocks: This is another spot where I think we should use GenericTestUtils#waitFor. Even though the JUnit-level timeouts would abort, this tends to leave the process hanging around. GenericTestUtils#waitFor would throw and exit more cleanly. Good idea. Automatically cache new data added to a cached path --- Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e. block replica is finalized. 
-- This message was sent by Atlassian JIRA (v6.1#6144)
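The review comment above prefers a polling wait that fails by throwing, rather than a JUnit-level timeout that can leave the process hanging. A sketch of that waitFor idea, with names that are illustrative rather than Hadoop's actual GenericTestUtils signatures:

```java
import java.util.function.Supplier;

// Sketch of a waitFor-style polling helper: re-check a condition until it
// holds or a deadline passes, then fail by throwing so the test exits
// cleanly. Illustrative only, not Hadoop's GenericTestUtils#waitFor.
public class WaitFor {
    static void waitFor(Supplier<Boolean> check, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.get()) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("Timed out waiting for condition");
            }
            Thread.sleep(intervalMs);
        }
    }
}
```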
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated HDFS-5370: - Assignee: Kousuke Saruta Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5358) Add replication field to PathBasedCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797130#comment-13797130 ] Chris Nauroth commented on HDFS-5358: - Thanks, Andrew. I just committed a fix. The problem was that editsStored didn't have the replication field on the {{AddPathBasedCacheDirectiveOp}}, so it would fail in deserialization. The editsStored.xml file was already updated to include replication though. The easiest thing to do was to run offline edits viewer to convert editsStored.xml to editsStored binary and check that in. Note however that the test won't pass until HDFS-5096 goes in. During that code review, we found a {{NullPointerException}} in {{setCacheReplication}}. It made sense to fix it over there along with all of the refactoring that happened. Add replication field to PathBasedCacheDirective Key: HDFS-5358 URL: https://issues.apache.org/jira/browse/HDFS-5358 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HDFS-4949 Attachments: HDFS-5358-caching.001.patch, HDFS-5358-caching.002.patch Add a 'replication' field to PathBasedCacheDirective, so that administrators can configure how many cached replicas of a block the cluster should try to maintain. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated HDFS-5370: - Hadoop Flags: Reviewed Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated HDFS-5370: - Affects Version/s: 2.2.0 Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated HDFS-5370: - Fix Version/s: 2.2.1 Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797136#comment-13797136 ] Tsuyoshi OZAWA commented on HDFS-5370: -- +1, LGTM. Pending Jenkins. Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HDFS-5313) NameNode hangs during startup trying to apply OP_ADD_PATH_BASED_CACHE_DIRECTIVE.
[ https://issues.apache.org/jira/browse/HDFS-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5313. - Resolution: Duplicate Assignee: Chris Nauroth I've confirmed that HDFS-5096 fixes this bug. NameNode hangs during startup trying to apply OP_ADD_PATH_BASED_CACHE_DIRECTIVE. Key: HDFS-5313 URL: https://issues.apache.org/jira/browse/HDFS-5313 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth During namenode startup, if the edits contain a {{OP_ADD_PATH_BASED_CACHE_DIRECTIVE}} for an existing file, then the process hangs while trying to apply the op. This is because of a call to {{FSDirectory#setCacheReplication}}, which calls {{FSDirectory#waitForReady}}, but of course nothing is ever going to mark the directory ready, because it's still in the process of loading. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797144#comment-13797144 ] Chris Nauroth commented on HDFS-5096: - +1 for the patch, pending resolution of feedback from Andrew too. Thanks very much, Colin! I've had a chance to take this patch for a manual test run too in a pseudo-distributed deployment. I created some files in a directory, and then applied a cache directive on that directory. All of the existing files got cached relatively quickly due to {{CacheReplicationMonitor#kick}}. Next, I added some new files in the same directory. After the {{dfs.namenode.path.based.cache.refresh.interval.ms}} elapsed, {{CacheReplicationMonitor}} scanned again and cached the new files. I ran pmap to confirm that the block files were memory-mapped into the datanode process. I also put my namenode through a restart to confirm that we had fixed the hanging problem I reported in HDFS-5313. I'll close that issue now. It all looks good! Automatically cache new data added to a cached path --- Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. 
In both cases, this automatic caching would happen after the file is closed, i.e. block replica is finalized. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797164#comment-13797164 ] Hudson commented on HDFS-5283: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4612 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4612/]) Add TestOpenFilesWithSnapshot.java for HDFS-5283. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1532860) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java HDFS-5283. Under construction blocks only inside snapshots should not be counted in safemode threshhold. Contributed by Vinay (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1532857) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold Key: HDFS-5283 URL: https://issues.apache.org/jira/browse/HDFS-5283 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.1.1-beta Reporter: Vinay Assignee: Vinay Priority: Blocker Fix For: 2.3.0 Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch This is observed in one of our env: 1. A MR Job was running which has created some temporary files and was writing to them. 2. Snapshot was taken 3. 
The job was killed and the temporary files were deleted. 4. The NameNode was restarted. 5. After the restart the NameNode was in safemode, waiting for blocks. Analysis - 1. The snapshot also includes the temporary files which were open, and the original files were later deleted. 2. The under-construction block count was taken from leases, which does not consider UC blocks that exist only inside snapshots. 3. So the safemode threshold count was too high and the NN did not come out of safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5373) hdfs cacheadmin -addDirective short usage does not mention -replication parameter.
Chris Nauroth created HDFS-5373: --- Summary: hdfs cacheadmin -addDirective short usage does not mention -replication parameter. Key: HDFS-5373 URL: https://issues.apache.org/jira/browse/HDFS-5373 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth The short description of hdfs cacheadmin -addDirective does not mention that you can set the -replication parameter. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5373) hdfs cacheadmin -addDirective short usage does not mention -replication parameter.
[ https://issues.apache.org/jira/browse/HDFS-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797184#comment-13797184 ] Chris Nauroth commented on HDFS-5373: - The long usage does mention the -replication parameter. The problem is limited to just the short usage. This was probably just a minor oversight from HDFS-5358. hdfs cacheadmin -addDirective short usage does not mention -replication parameter. -- Key: HDFS-5373 URL: https://issues.apache.org/jira/browse/HDFS-5373 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth The short description of hdfs cacheadmin -addDirective does not mention that you can set the -replication parameter. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5373) hdfs cacheadmin -addDirective short usage does not mention -replication parameter.
[ https://issues.apache.org/jira/browse/HDFS-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5373: Priority: Trivial (was: Major) hdfs cacheadmin -addDirective short usage does not mention -replication parameter. -- Key: HDFS-5373 URL: https://issues.apache.org/jira/browse/HDFS-5373 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial The short description of hdfs cacheadmin -addDirective does not mention that you can set the -replication parameter. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5363) Create SPENGO-authenticated connection in URLConnectionFactory instead WebHdfsFileSystem
[ https://issues.apache.org/jira/browse/HDFS-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797188#comment-13797188 ] Hadoop QA commented on HDFS-5363: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608755/HDFS-5363.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpURLTimeouts {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5212//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5212//console This message is automatically generated. 
Create SPENGO-authenticated connection in URLConnectionFactory instead WebHdfsFileSystem Key: HDFS-5363 URL: https://issues.apache.org/jira/browse/HDFS-5363 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5363.000.patch, HDFS-5363.001.patch Currently the WebHdfsFileSystem class creates the HTTP connections for URLs that require SPNEGO authentication. This patch moves that logic into URLConnectionFactory, the factory class that is supposed to create all HTTP connections for the WebHdfs client. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Work started] (HDFS-5373) hdfs cacheadmin -addDirective short usage does not mention -replication parameter.
[ https://issues.apache.org/jira/browse/HDFS-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5373 started by Chris Nauroth. hdfs cacheadmin -addDirective short usage does not mention -replication parameter. -- Key: HDFS-5373 URL: https://issues.apache.org/jira/browse/HDFS-5373 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-5373.1.patch The short description of hdfs cacheadmin -addDirective does not mention that you can set the -replication parameter. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5373) hdfs cacheadmin -addDirective short usage does not mention -replication parameter.
[ https://issues.apache.org/jira/browse/HDFS-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5373: Attachment: HDFS-5373.1.patch Here is a trivial patch to update the short usage string. I also updated testCacheAdminConf.xml so that it tries passing -replication. [~andrew.wang] or [~cmccabe], does this look good? hdfs cacheadmin -addDirective short usage does not mention -replication parameter. -- Key: HDFS-5373 URL: https://issues.apache.org/jira/browse/HDFS-5373 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-5373.1.patch The short description of hdfs cacheadmin -addDirective does not mention that you can set the -replication parameter. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5373) hdfs cacheadmin -addDirective short usage does not mention -replication parameter.
[ https://issues.apache.org/jira/browse/HDFS-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797200#comment-13797200 ] Andrew Wang commented on HDFS-5373: --- +1 thanks Chris hdfs cacheadmin -addDirective short usage does not mention -replication parameter. -- Key: HDFS-5373 URL: https://issues.apache.org/jira/browse/HDFS-5373 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-5373.1.patch The short description of hdfs cacheadmin -addDirective does not mention that you can set the -replication parameter. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797215#comment-13797215 ] Hadoop QA commented on HDFS-5370: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608762/HDFS-5370.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5213//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5213//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5213//console This message is automatically generated. 
Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HDFS-5373) hdfs cacheadmin -addDirective short usage does not mention -replication parameter.
[ https://issues.apache.org/jira/browse/HDFS-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5373. - Resolution: Fixed Fix Version/s: HDFS-4949 Hadoop Flags: Reviewed Thanks, Andrew. I've committed this to the HDFS-4949 branch. hdfs cacheadmin -addDirective short usage does not mention -replication parameter. -- Key: HDFS-5373 URL: https://issues.apache.org/jira/browse/HDFS-5373 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: HDFS-4949 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: HDFS-4949 Attachments: HDFS-5373.1.patch The short description of hdfs cacheadmin -addDirective does not mention that you can set the -replication parameter. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797228#comment-13797228 ] Suresh Srinivas commented on HDFS-5370: --- +1 for the change. I do not think the Jenkins -1 is related to this straightforward patch. Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5370: -- Priority: Trivial (was: Minor) Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Trivial Fix For: 3.0.0, 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using = but the error message says . -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5096: --- Attachment: HDFS-5096-caching.002.patch thanks, Chris. minor fixup here: the rescan thread now removes CacheBlock objects from the pending uncached list for a DN if the nodes are no longer cached on that DN. Automatically cache new data added to a cached path --- Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.002.patch, HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e. block replica is finalized. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797246#comment-13797246 ] Hudson commented on HDFS-5370: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4616 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4616/]) HDFS-5370. Typo in Error Message: different between range in condition and range in error message. Contributed by Kousuke Saruta. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532899) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Trivial Fix For: 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using >= but the error message says >. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5370) Typo in Error Message: different between range in condition and range in error message
[ https://issues.apache.org/jira/browse/HDFS-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5370: -- Resolution: Fixed Fix Version/s: (was: 3.0.0) Status: Resolved (was: Patch Available) I have committed the patch to branch-2.2 and other branches leading up to it. Thank you Kousuke Saruta. Typo in Error Message: different between range in condition and range in error message --- Key: HDFS-5370 URL: https://issues.apache.org/jira/browse/HDFS-5370 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Trivial Fix For: 2.2.1 Attachments: HDFS-5370.patch In DFSInputStream#getBlockAt, there is an if statement with a condition using >= but the error message says >. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5346) Avoid calling
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5346: - Summary: Avoid calling (was: Replication queues should not be initialized in the middle of IBR processing.) Avoid calling -- Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5346) Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5346: - Summary: Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing (was: Avoid calling ) Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5346) Replication queues should not be initialized in the middle of IBR processing.
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797248#comment-13797248 ] Kihwal Lee commented on HDFS-5346: -- bq. We realized we can set dfs.namenode.replqueue.threshold-pct to 1.0 or even 1.5 to make sure that only when the NN enters the Safemode extension period are the replication queues initialized. Thanks for the analysis, Ravi. As you said, setting this config to something > 1.0 will prevent the replication queues from being initialized in the middle of block report processing. Since the main loop of SafeModeMonitor in trunk/branch-2 and leaveSafeMode() called by SafeModeMonitor in branch-0.23 are acquiring the FSN lock, nothing will get in between replication queue initialization and leaving safe mode to cause delays. +1 The patch looks good. I will change the title of this jira to reflect the actual change. Replication queues should not be initialized in the middle of IBR processing. - Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. -- This message was sent by Atlassian JIRA (v6.1#6144)
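The remaining part of HDFS-5346, avoiding a per-block getNumLiveDataNodes() call, boils down to hoisting an expensive call out of a hot loop. Below is an illustrative Java sketch of that transformation; the class, the block representation, and the stand-in for getNumLiveDataNodes() are invented for the example and are not the actual FSNamesystem code.

```java
import java.util.List;

// Sketch of the HDFS-5346 optimization: instead of recomputing an
// expensive cluster-wide count once per block, sample it once per
// report and reuse the value. Names here are hypothetical.
public class PerBlockCallHoisting {
    static int expensiveCalls = 0;

    // Stands in for getNumLiveDataNodes(): cheap here, but in the
    // real namenode it inspects datanode state under a lock.
    static int getNumLiveDataNodes() {
        expensiveCalls++;
        return 42;
    }

    static void processReportNaive(List<String> blocks) {
        for (String b : blocks) {
            int live = getNumLiveDataNodes(); // called once per block
            // ... safe-mode bookkeeping using 'live' ...
        }
    }

    static void processReportHoisted(List<String> blocks) {
        int live = getNumLiveDataNodes(); // called once per report
        for (String b : blocks) {
            // ... same bookkeeping, reusing 'live' ...
        }
    }
}
```

With N blocks per report, the naive form makes N expensive calls where one suffices, which matches the degraded block-report processing speed described in the issue.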
[jira] [Updated] (HDFS-5346) Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5346: - Description: When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. Update: The main issue can be resolved by config. The other issue of calling was:When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. 
It appears that the block report processing speed degrades considerably during this time. Update: The main issue can be resolved by config. The other issue of calling -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5346) Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5346: - Description: When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. Update: The main issue can be resolved by config. The other issue of calling getNumLiveDataNodes() for each block in the block report will be addressed in this jira was: When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. Update: The main issue can be resolved by config. The other issue of calling Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. 
If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. Update: The main issue can be resolved by config. The other issue of calling getNumLiveDataNodes() for each block in the block report will be addressed in this jira -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5346) Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5346: - Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to branch-0.23, branch-2 and trunk. Thanks for working on the fix, Ravi. Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 3.0.0, 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. Update: The main issue can be resolved by config. The other issue of calling getNumLiveDataNodes() for each block in the block report will be addressed in this jira -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797270#comment-13797270 ] Andrew Wang commented on HDFS-5096: --- I'm +1 pending some nitty things. It's mostly just rolling through my previous comments. Great work Colin! CachedBlock * Add a little more to the {{CachedBlock#triplets}} javadoc that specifies {{element, prev, next}}. You could even just copy the javadoc from {{BlockInfo}}. * getDatanodes javadoc should mention pending uncached blocks too * Class javadoc explaining the use of the GSet and IntrusiveCollection * A short is 16 bits, the comment on {{replicationAndMark}} indicates it's 8 bits. CacheReplicationMonitor * I think this should be a {{>=}}: {code} if (numCached > neededCached) { {code} Follow-on work (some might just be part of HDFS-5366): * Refactor out a separate {{CacheReplicationPolicy}} class with more smarts (is this HDFS-5366?) * Take into account DN decommissioning status when doing caching/uncaching, this should be easy to fix as part of HDFS-5366 * Incremental kicking of the CRMon on PBCE changes * Kicking on a DN failure Automatically cache new data added to a cached path --- Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.002.patch, HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. 
Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e. block replica is finalized. -- This message was sent by Atlassian JIRA (v6.1#6144)
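The comparison-operator point in the review above is a classic boundary question: when the number of cached replicas exactly equals the target, should the monitor treat the block as satisfied? A hypothetical sketch with illustrative names (not the real CacheReplicationMonitor code) shows the difference:

```java
// Illustrative-only: ">" treats a block whose cached count exactly
// equals its target as still needing work; ">=" treats it as done.
public class CachedEnough {
    static boolean satisfiedStrict(int numCached, int neededCached) {
        return numCached > neededCached;
    }

    static boolean satisfiedInclusive(int numCached, int neededCached) {
        return numCached >= neededCached;
    }
}
```

With the strict form, a fully cached block (e.g. 3 of 3) would keep generating caching work; the inclusive form stops as soon as the target is met.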
[jira] [Created] (HDFS-5374) Remove deadcode in DFSOutputStream
Suresh Srinivas created HDFS-5374: - Summary: Remove deadcode in DFSOutputStream Key: HDFS-5374 URL: https://issues.apache.org/jira/browse/HDFS-5374 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Priority: Trivial Attachments: HDFS-4374.patch Deadcode: {code} if (one.isHeartbeatPacket()) { //heartbeat packet } {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
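The quoted fragment is dead code because the conditional's body contains only a comment, so the branch has no observable effect and can be removed outright. A self-contained Java illustration follows; the isHeartbeatPacket convention here is invented for the example and does not reflect the real DFSOutputStream internals.

```java
// Sketch of the dead-code pattern quoted in HDFS-5374: an if whose
// body is empty (comment only) changes nothing and can be deleted.
public class DeadBranch {
    // Hypothetical convention, purely for illustration.
    static boolean isHeartbeatPacket(int seqno) {
        return seqno == -1;
    }

    static String handleNaive(int seqno) {
        if (isHeartbeatPacket(seqno)) {
            // heartbeat packet  <-- empty body: dead code
        }
        return "sent " + seqno;
    }

    // Equivalent behavior with the dead branch removed.
    static String handleClean(int seqno) {
        return "sent " + seqno;
    }
}
```

Both methods behave identically for every input, which is exactly why the branch is safe to delete.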
[jira] [Assigned] (HDFS-5374) Remove deadcode in DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas reassigned HDFS-5374: - Assignee: Suresh Srinivas Remove deadcode in DFSOutputStream -- Key: HDFS-5374 URL: https://issues.apache.org/jira/browse/HDFS-5374 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Suresh Srinivas Priority: Trivial Attachments: HDFS-4374.patch Deadcode: {code} if (one.isHeartbeatPacket()) { //heartbeat packet } {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5346) Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing
[ https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797275#comment-13797275 ] Hudson commented on HDFS-5346: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4618 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4618/]) HDFS-5346. Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing. Contributed by Ravi Prakash. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532915) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing Key: HDFS-5346 URL: https://issues.apache.org/jira/browse/HDFS-5346 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Ravi Prakash Fix For: 3.0.0, 2.3.0, 0.23.10 Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch When initial block reports are being processed, checkMode() is called from incrementSafeBlockCount(). This causes the replication queues to be initialized in the middle of processing a block report in the IBR processing mode. If there are many block reports waiting to be processed, SafeModeMonitor won't be able to make name node leave the safe mode soon. It appears that the block report processing speed degrades considerably during this time. Update: The main issue can be resolved by config. The other issue of calling getNumLiveDataNodes() for each block in the block report will be addressed in this jira -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5374) Remove deadcode in DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5374: -- Status: Patch Available (was: Open) Remove deadcode in DFSOutputStream -- Key: HDFS-5374 URL: https://issues.apache.org/jira/browse/HDFS-5374 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Suresh Srinivas Priority: Trivial Attachments: HDFS-4374.patch Deadcode: {code} if (one.isHeartbeatPacket()) { //heartbeat packet } {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5374) Remove deadcode in DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5374: -- Attachment: HDFS-4374.patch Removed the dead code. I also fixed some typos and java warnings. Remove deadcode in DFSOutputStream -- Key: HDFS-5374 URL: https://issues.apache.org/jira/browse/HDFS-5374 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Suresh Srinivas Priority: Trivial Attachments: HDFS-4374.patch Deadcode: {code} if (one.isHeartbeatPacket()) { //heartbeat packet } {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5371) Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled
[ https://issues.apache.org/jira/browse/HDFS-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5371: Status: Patch Available (was: Open) Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled Key: HDFS-5371 URL: https://issues.apache.org/jira/browse/HDFS-5371 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5371.000.patch Currently when dfs.client.test.drop.namenode.response.number is enabled for testing, the client will start failover and try the other NN. But in most of the testing cases we do not need to trigger the client failover here since if the drop-response number is 1 the next response received from the other NN will also be dropped. We can let the client just simply retry the same NN. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5374) Remove deadcode in DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797296#comment-13797296 ] Brandon Li commented on HDFS-5374: -- +1 Remove deadcode in DFSOutputStream -- Key: HDFS-5374 URL: https://issues.apache.org/jira/browse/HDFS-5374 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Suresh Srinivas Priority: Trivial Attachments: HDFS-4374.patch Deadcode: {code} if (one.isHeartbeatPacket()) { //heartbeat packet } {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5375) hdfs.cmd does not expose several snapshot commands.
[ https://issues.apache.org/jira/browse/HDFS-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5375: Attachment: HDFS-5375.1.patch Here is a patch to add the commands to the cmd file. Thanks to [~rramya] for finding and reporting the bug. hdfs.cmd does not expose several snapshot commands. --- Key: HDFS-5375 URL: https://issues.apache.org/jira/browse/HDFS-5375 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-5375.1.patch We need to update hdfs.cmd to expose the snapshotDiff and lsSnapshottableDir commands on Windows. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5375) hdfs.cmd does not expose several snapshot commands.
[ https://issues.apache.org/jira/browse/HDFS-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5375: Status: Patch Available (was: Open) hdfs.cmd does not expose several snapshot commands. --- Key: HDFS-5375 URL: https://issues.apache.org/jira/browse/HDFS-5375 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-5375.1.patch We need to update hdfs.cmd to expose the snapshotDiff and lsSnapshottableDir commands on Windows. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5375) hdfs.cmd does not expose several snapshot commands.
Chris Nauroth created HDFS-5375: --- Summary: hdfs.cmd does not expose several snapshot commands. Key: HDFS-5375 URL: https://issues.apache.org/jira/browse/HDFS-5375 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor We need to update hdfs.cmd to expose the snapshotDiff and lsSnapshottableDir commands on Windows. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5366) recaching improvements
[ https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797319#comment-13797319 ] Andrew Wang commented on HDFS-5366: --- One interesting idea from the block replication code is having priorities for replication work based on the current and expected replication factor. Maybe a 0 of 3 case should be rescheduled elsewhere more quickly than the 10.5 minute dead datanode interval, while we let a mild case of 2 of 3 sit. I don't think this will require tracking our own list of stale or dead nodes, just a list of nodes we've already tried for an outstanding request. We reset if we've tried all targets. I seem to remember the block recovery code or something doing this. Avoiding stale nodes might also be good enough, if we think that heartbeats are a good proxy for the DN's ability to cache/uncache. This probably isn't true for uncaching though, since as you've noted, a hung client could just hold onto a ZCR lease. recaching improvements -- Key: HDFS-5366 URL: https://issues.apache.org/jira/browse/HDFS-5366 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe There are a few things about our HDFS-4949 recaching strategy that could be improved. * We should monitor the DN's maximum and current mlock'ed memory consumption levels, so that we don't ask the DN to do stuff it can't. * We should not try to initiate caching on stale DataNodes (although we should not recache things stored on such nodes until they're declared dead). * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few times before giving up. Currently, we only send it once. -- This message was sent by Atlassian JIRA (v6.1#6144)
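The prioritization idea in the comment above can be made concrete with a tiny, purely illustrative scoring function (not part of any actual Hadoop code): blocks with zero cached replicas are the most urgent, under-replicated blocks less so, satisfied blocks not at all.

```java
// Hypothetical sketch of priority-by-replication-gap, mirroring how
// the block replication code orders its work queues: lower value
// means more urgent. Thresholds and names are invented.
public class CachePriority {
    static int priority(int cached, int needed) {
        if (cached == 0 && needed > 0) {
            return 0; // "0 of 3": reschedule quickly
        }
        if (cached < needed) {
            return 1; // "2 of 3": mild, can wait
        }
        return 2;     // target met: no work needed
    }
}
```

A scheduler could then drain priority-0 work well before the 10.5 minute dead-datanode interval while letting priority-1 cases sit, as the comment suggests.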
[jira] [Commented] (HDFS-5375) hdfs.cmd does not expose several snapshot commands.
[ https://issues.apache.org/jira/browse/HDFS-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797317#comment-13797317 ] Jing Zhao commented on HDFS-5375: - The patch looks pretty good to me. +1 hdfs.cmd does not expose several snapshot commands. --- Key: HDFS-5375 URL: https://issues.apache.org/jira/browse/HDFS-5375 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: HDFS-5375.1.patch We need to update hdfs.cmd to expose the snapshotDiff and lsSnapshottableDir commands on Windows. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5096: --- Attachment: HDFS-5096-caching.014.patch Automatically cache new data added to a cached path --- Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.002.patch, HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch, HDFS-5096-caching.014.patch For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e. block replica is finalized. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5366) recaching improvements
[ https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797325#comment-13797325 ] Colin Patrick McCabe commented on HDFS-5366: As Andrew pointed out on HDFS-5096, we should also kick the CRMon on a DN failure. We should also avoid scheduling new work on decommissioning nodes (as well as stale nodes). recaching improvements -- Key: HDFS-5366 URL: https://issues.apache.org/jira/browse/HDFS-5366 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe There are a few things about our HDFS-4949 recaching strategy that could be improved. * We should monitor the DN's maximum and current mlock'ed memory consumption levels, so that we don't ask the DN to do stuff it can't. * We should not try to initiate caching on stale DataNodes (although we should not recache things stored on such nodes until they're declared dead). * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few times before giving up. Currently, we only send it once. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797324#comment-13797324 ] Colin Patrick McCabe commented on HDFS-5096: bq. Add a little more to the CachedBlock#triplets javadoc that specifies element, prev, next. You could even just copy the javadoc from BlockInfo. ok bq. Class javadoc explaining the use of the GSet and IntrusiveCollection ok bq. A short is 16 bits, the comment on replicationAndMark indicates it's 8 bits. fixed bq. I think this should be a >=: agree for the follow-on work, I added a comment about kicking on a DN failure and avoiding decommissioned DNs to HDFS-5366 incremental rescan is down the road, I think. we should do the pool management stuff before that... thanks. will commit shortly. Automatically cache new data added to a cached path --- Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Andrew Wang Assignee: Colin Patrick McCabe Attachments: HDFS-5096-caching.002.patch, HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e. block replica is finalized. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5371) Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled
[ https://issues.apache.org/jira/browse/HDFS-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797350#comment-13797350 ] Hadoop QA commented on HDFS-5371:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608767/HDFS-5371.000.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5216//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5216//console

This message is automatically generated.
Let client retry the same NN when dfs.client.test.drop.namenode.response.number is enabled

Key: HDFS-5371
URL: https://issues.apache.org/jira/browse/HDFS-5371
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Attachments: HDFS-5371.000.patch

Currently, when dfs.client.test.drop.namenode.response.number is enabled for testing, the client will start failover and try the other NN. But in most testing cases we do not need to trigger client failover here, since if the drop-response number is 1 the next response received from the other NN will also be dropped. We can let the client simply retry the same NN.

-- This message was sent by Atlassian JIRA (v6.1#6144)
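The retry-vs-failover behavior described above can be sketched in miniature. This is a hypothetical Java sketch (the class and method names `FlakyNameNode` and `callWithRetry` are invented, not the actual DFSClient retry machinery): with a "drop the next N responses" fault injected, simply retrying the same node succeeds once the drops are exhausted, so no failover is needed.

```java
import java.io.IOException;

public class SameNodeRetrySketch {

    /** Simulates a NameNode whose next {@code dropsRemaining} responses are dropped. */
    static class FlakyNameNode {
        private int dropsRemaining;
        FlakyNameNode(int dropsRemaining) { this.dropsRemaining = dropsRemaining; }
        String call() throws IOException {
            if (dropsRemaining > 0) {
                dropsRemaining--;
                throw new IOException("response dropped");
            }
            return "ok";
        }
    }

    /** Retry the same node instead of failing over to another one. */
    static String callWithRetry(FlakyNameNode nn, int maxRetries) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return nn.call();
            } catch (IOException e) {
                last = e;  // remember the failure and retry the same node
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        // One dropped response: the second attempt against the same NN succeeds.
        System.out.println(callWithRetry(new FlakyNameNode(1), 2));  // prints "ok"
    }
}
```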
[jira] [Commented] (HDFS-5336) DataNode should not output 'StartupProgress' metrics
[ https://issues.apache.org/jira/browse/HDFS-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797354#comment-13797354 ] Chris Nauroth commented on HDFS-5336:

Thanks for the patch, Akira. I built and verified that startup progress metrics only showed up in the namenode and not the datanode.

bq. Change the context of the startup metrics from 'default' to 'dfs'.

Unfortunately, I think this would be backwards-incompatible. For example, if someone was using metrics filtering, then their filtering configuration under the old context would stop working. Can we please remove this part of the change?

DataNode should not output 'StartupProgress' metrics

Key: HDFS-5336
URL: https://issues.apache.org/jira/browse/HDFS-5336
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.1.0-beta
Environment: trunk
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
Labels: metrics
Attachments: HDFS-5336.patch

I found the following metrics output from the DataNode:

{code}
1381355455731 default.StartupProgress: Hostname=trunk, ElapsedTime=0, PercentComplete=0.0, LoadingFsImageCount=0, LoadingFsImageElapsedTime=0, LoadingFsImageTotal=0, LoadingFsImagePercentComplete=0.0, LoadingEditsCount=0, LoadingEditsElapsedTime=0, LoadingEditsTotal=0, LoadingEditsPercentComplete=0.0, SavingCheckpointCount=0, SavingCheckpointElapsedTime=0, SavingCheckpointTotal=0, SavingCheckpointPercentComplete=0.0, SafeModeCount=0, SafeModeElapsedTime=0, SafeModeTotal=0, SafeModePercentComplete=0.0
{code}

The DataNode should not output 'StartupProgress' metrics because these metrics show the progress of NameNode startup.

-- This message was sent by Atlassian JIRA (v6.1#6144)
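The backwards-compatibility concern about renaming the context can be illustrated with a metrics2 sink configuration. This is an assumed hadoop-metrics2.properties fragment (the sink instance name and filename are invented for illustration, not taken from the patch): a sink restricted to one context silently loses records if that context is renamed.

```properties
# hadoop-metrics2.properties (illustrative; sink name and filename assumed)
namenode.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
namenode.sink.file.filename=namenode-metrics.out
# A sink can be restricted to records from a single metrics context.
# StartupProgress currently registers under the "default" context, so a
# deployment filtering on "default" would silently stop receiving these
# records if the context were renamed to "dfs".
namenode.sink.file.context=default
```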
[jira] [Resolved] (HDFS-5096) Automatically cache new data added to a cached path
[ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-5096.

Resolution: Fixed
Fix Version/s: HDFS-4949
Target Version/s: HDFS-4949

Thanks for the reviews, Andrew and Chris.

Automatically cache new data added to a cached path
---
Key: HDFS-5096
URL: https://issues.apache.org/jira/browse/HDFS-5096
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Andrew Wang
Assignee: Colin Patrick McCabe
Fix For: HDFS-4949
Attachments: HDFS-5096-caching.002.patch, HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch, HDFS-5096-caching.009.patch, HDFS-5096-caching.010.patch, HDFS-5096-caching.011.patch, HDFS-5096-caching.012.patch, HDFS-5096-caching.014.patch

For some applications, it's convenient to specify a path to cache and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file: it would be nice to re-cache a block at the new appended length and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition; it would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e., the block replica is finalized.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5376) Incremental rescanning of cached blocks and cache entries
Andrew Wang created HDFS-5376:

Summary: Incremental rescanning of cached blocks and cache entries
Key: HDFS-5376
URL: https://issues.apache.org/jira/browse/HDFS-5376
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Affects Versions: HDFS-4949
Reporter: Andrew Wang
Assignee: Andrew Wang

{{CacheReplicationMonitor#rescan}} is invoked whenever a new cache entry is added or removed. This involves a complete rescan of all cache entries and cached blocks, which is potentially expensive. It'd be better to do an incremental scan instead. This would also let us incrementally re-scan on namespace changes like rename and create for better caching latency.

-- This message was sent by Atlassian JIRA (v6.1#6144)
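The full-scan vs incremental-scan trade-off described above can be sketched as follows. This is a toy Java sketch with invented names (`IncrementalRescanSketch`, `dirtyEntries`), not the actual CacheReplicationMonitor: instead of walking every cache entry on each add/remove, only entries changed since the last scan are revisited.

```java
import java.util.HashSet;
import java.util.Set;

public class IncrementalRescanSketch {
    private final Set<String> allEntries = new HashSet<>();
    private final Set<String> dirtyEntries = new HashSet<>();
    private int entriesScanned = 0;  // total scan work performed so far

    public void addEntry(String path) {
        allEntries.add(path);
        dirtyEntries.add(path);  // only new/changed entries need a rescan
    }

    /** Full rescan: O(all entries) of work on every change. */
    public void fullRescan() {
        entriesScanned += allEntries.size();
        dirtyEntries.clear();
    }

    /** Incremental rescan: O(changed entries) of work only. */
    public void incrementalRescan() {
        entriesScanned += dirtyEntries.size();
        dirtyEntries.clear();
    }

    public int getEntriesScanned() { return entriesScanned; }
}
```

With many existing entries and one new one, `incrementalRescan` touches a single entry where `fullRescan` would touch them all, which is also what would make cheap re-scans on rename/create feasible.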
[jira] [Commented] (HDFS-5360) Improvement of usage message of renameSnapshot and deleteSnapshot
[ https://issues.apache.org/jira/browse/HDFS-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797339#comment-13797339 ] Hadoop QA commented on HDFS-5360:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608765/HDFS-5360.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5214//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5214//console

This message is automatically generated.

Improvement of usage message of renameSnapshot and deleteSnapshot

Key: HDFS-5360
URL: https://issues.apache.org/jira/browse/HDFS-5360
Project: Hadoop HDFS
Issue Type: Improvement
Components: snapshots
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
Priority: Minor
Attachments: HDFS-5360.patch, HDFS-5360.patch

When the argument of the hdfs dfs -createSnapshot command is inappropriate, the usage is displayed as follows:
{code}
[hadoop@trunk ~]$ hdfs dfs -createSnapshot
-createSnapshot: snapshotDir is missing.
Usage: hadoop fs [generic options] -createSnapshot snapshotDir [snapshotName]
{code}

On the other hand, the -renameSnapshot and -deleteSnapshot commands display the following messages, which are not kind to the user:

{code}
[hadoop@trunk ~]$ hdfs dfs -renameSnapshot
renameSnapshot: args number not 3: 0
[hadoop@trunk ~]$ hdfs dfs -deleteSnapshot
deleteSnapshot: args number not 2: 0
{code}

This patch changes -renameSnapshot and -deleteSnapshot to output a message similar to -createSnapshot's.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5336) DataNode should not output 'StartupProgress' metrics
[ https://issues.apache.org/jira/browse/HDFS-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5336:

Attachment: HDFS-5336.2.patch

DataNode should not output 'StartupProgress' metrics

Key: HDFS-5336
URL: https://issues.apache.org/jira/browse/HDFS-5336
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.1.0-beta
Environment: trunk
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
Labels: metrics
Attachments: HDFS-5336.2.patch, HDFS-5336.patch

I found the following metrics output from the DataNode:

{code}
1381355455731 default.StartupProgress: Hostname=trunk, ElapsedTime=0, PercentComplete=0.0, LoadingFsImageCount=0, LoadingFsImageElapsedTime=0, LoadingFsImageTotal=0, LoadingFsImagePercentComplete=0.0, LoadingEditsCount=0, LoadingEditsElapsedTime=0, LoadingEditsTotal=0, LoadingEditsPercentComplete=0.0, SavingCheckpointCount=0, SavingCheckpointElapsedTime=0, SavingCheckpointTotal=0, SavingCheckpointPercentComplete=0.0, SafeModeCount=0, SafeModeElapsedTime=0, SafeModeTotal=0, SafeModePercentComplete=0.0
{code}

The DataNode should not output 'StartupProgress' metrics because these metrics show the progress of NameNode startup.

-- This message was sent by Atlassian JIRA (v6.1#6144)