[jira] [Updated] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5428:
    Attachment: HDFS-5428.001.patch

Uploaded a new patch that replaces the block, but without replacing the INodeFile.

under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

Key: HDFS-5428
URL: https://issues.apache.org/jira/browse/HDFS-5428
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.001.patch, HDFS-5428.patch

1. Allow snapshots under dir /foo
2. Create a file /foo/test/bar and start writing to it
3. Create a snapshot s1 under /foo after a block is allocated and some data has been written to it
4. Delete the directory /foo/test
5. Wait till checkpoint, or do saveNamespace
6. Restart NN. NN enters safemode.

Analysis:
Snapshot inodes loaded from the fsimage are always complete, and all of their blocks will be in COMPLETE state. So when the Datanode reports RBW blocks, those will not be updated in the blocksmap. Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.

-- This message was sent by Atlassian JIRA (v6.1#6144)
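The analysis above can be illustrated with a toy model. All class and method names here are hypothetical; this is NOT the real BlocksMap/BlockManager code, just a sketch of the reported behavior: once the fsimage records the snapshot copy's blocks as COMPLETE with a fixed length, a shorter FINALIZED replica is flagged corrupt and an RBW replica is never counted.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the failure mode described above (hypothetical names).
// After the restart, the snapshot copy's blocks were loaded from the
// fsimage as COMPLETE, so every replica report is checked against the
// recorded "final" length.
public class BlockReportModel {
    // blockId -> block length recorded in the fsimage
    private final Map<Long, Long> completeBlocks = new HashMap<>();

    public void loadFromFsimage(long blockId, long recordedLength) {
        completeBlocks.put(blockId, recordedLength);
    }

    // Decide what happens when a DataNode reports one replica.
    public String processReplica(long blockId, String replicaState, long onDiskLength) {
        Long recorded = completeBlocks.get(blockId);
        if (recorded == null) {
            return "IGNORED";       // unknown block: not added to the blocksmap
        }
        if ("RBW".equals(replicaState)) {
            return "NOT_COUNTED";   // an RBW replica cannot satisfy a COMPLETE block
        }
        if (onDiskLength != recorded) {
            return "CORRUPT";       // FINALIZED replica with a length mismatch
        }
        return "COUNTED";           // contributes to leaving safemode
    }
}
```

Since the writer never finished, the DataNodes hold only RBW replicas (or FINALIZED replicas of a different length), so under this model no replica is ever COUNTED and the safe-block threshold is never reached.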
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815766#comment-13815766 ]

Vinay commented on HDFS-5428:

bq. So here my question is whether it's possible that we just replace the last block of the snapshot INode with a BlockInfoUC (but without replacing the INodeFile with an INodeFileUC)?

If we replace it, the problem is: if the same INode is referring to a completed file in the normal path (might be due to rename and lease recovery), then replacing the last block in this INode may not be correct.

One more problem here is that the snapshotUCMap will not always contain the latest snapshot inode, which will be written to the fsimage as an under-construction file. For example:
1. While the file is being written, after allocating block b1, take snapshot s1.
2. The file is renamed.
3. The file is closed by lease recovery, then appended again with one more block b2, and before closing one more snapshot s2 is taken.
4. Finally the file is deleted.
5. Now while writing the inode tree to the fsimage, the inode in s2 comes first and then s1, so only the INode in s1 will be marked as under construction. But the actual under-construction INode is in snapshot s2.
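Vinay's ordering scenario can be sketched as a toy model, under one reading of the concern (hypothetical names; the real FSImage saver walks INodes and snapshot diffs): if the saver flags "the copy with this inode id" and a later-visited copy overrides an earlier mark, then with s2 written before s1 the flag lands on the s1 copy rather than on the s2 copy that actually holds the open-file state.

```java
import java.util.List;

// Toy model of the ordering hazard (hypothetical names, NOT real HDFS code):
// the saver keys only on the inode id, and the copy visited last keeps the
// under-construction flag.
public class SnapshotSaveOrderModel {
    public static class InodeCopy {
        final String snapshot;   // which snapshot this copy belongs to
        final long inodeId;
        boolean savedAsUC = false;
        public InodeCopy(String snapshot, long inodeId) {
            this.snapshot = snapshot;
            this.inodeId = inodeId;
        }
    }

    // Walk the copies in fsimage-save order; return the snapshot whose copy
    // ends up flagged as under construction.
    public static String flagUnderConstruction(List<InodeCopy> saveOrder, long ucInodeId) {
        InodeCopy flagged = null;
        for (InodeCopy c : saveOrder) {
            if (c.inodeId == ucInodeId) {
                if (flagged != null) {
                    flagged.savedAsUC = false; // later copy steals the flag
                }
                c.savedAsUC = true;
                flagged = c;
            }
        }
        return flagged == null ? null : flagged.snapshot;
    }
}
```

With save order [s2 copy, s1 copy], the function returns "s1": the wrong copy carries the flag, which is the mismatch Vinay describes.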
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815775#comment-13815775 ]

Jing Zhao commented on HDFS-5428:

bq. if the same INode is referring to a completed file [might be due to rename and lease recovery] in normal path

We will replace the whole INode if it is in the normal path. We only replace its last block if the file exists only in a snapshot. But the next time we do a checkpoint, we may need to check a file's last block to decide whether it is a fileUC.

Another option here is to replace the inode in all cases. To cover the challenge that we cannot get the full snapshot path, we can use the inode id to get the inode first, then scan the diff list of its parent to do the replacement. This will be inefficient, but might be OK as long as we do not have a lot of snapshots and inodeUCs.

bq. Now while writing the inode tree to fsimage, inode in s2 comes first and then s1, then only INode in s1 will be marked as underconstruction. but actual underconstruction is INode in S2 snapshot

For rename, we will only have one INode here, which is referenced by two INodeReference instances stored in s1 and s2. And since we only record the inode id in the snapshotUCMap, this scenario might be fine?
[jira] [Commented] (HDFS-5411) Update Bookkeeper dependency to 4.2.1
[ https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815819#comment-13815819 ]

Rakesh R commented on HDFS-5411:

Thanks a lot for giving it a try with 4.2.2. I'll take a look at this.

Update Bookkeeper dependency to 4.2.1

Key: HDFS-5411
URL: https://issues.apache.org/jira/browse/HDFS-5411
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Robert Rati
Priority: Minor
Attachments: HDFS-5411.patch

Update the bookkeeper dependency to 4.2.1. This eases compilation on Fedora platforms.
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815821#comment-13815821 ]

Vinay commented on HDFS-5428:

bq. We will replace the whole Inode if it is in normal path.

Here we will replace the whole INode only if it is under construction. What if the same file is closed and present in some other path?

bq. Another option here is that we replace the inode for all the cases. To cover the challenge that we cannot get the full snapshot path, we can use the inode id to get the inode first, then scan the diff list of its parent to do the replacement. This will be inefficient but might be ok in case that we do not have a lot of snapshots and inodeUC.

To what level of scanning can we go? And how can we find out all the previous locations of the inode? The same INode might have been renamed to different locations in snapshots.

bq. For rename, we will only have one INode here, which is referenced by two INodeReference instances stored in s1 and s2. And since we only record inode id in snapshotUCMap, this scenario might be fine?

I am not sure about this. As far as I have seen while debugging, if any modification is done (such as adding one more block) on a snapshotted inode, a new inode instance will be saved inside the snapshot diffs, not an INodeReference. An INodeReference will be used only if there is no modification between the two inodes' attributes other than the name. Actually, I noticed this because I already faced these problems while preparing my patch.
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815827#comment-13815827 ]

Hadoop QA commented on HDFS-5428:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612548/HDFS-5428.001.patch
against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5353//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5353//console

This message is automatically generated.
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815873#comment-13815873 ]

Vinay commented on HDFS-5443:

The patch will not clear the blocks in this case:
1. Rename an under-construction file/directory with 0-sized blocks after a snapshot.
2. Delete the renamed directory.
Because the INode is saved into the snapshot during the rename itself, the update will not happen during the deletion.

Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

Key: HDFS-5443
URL: https://issues.apache.org/jira/browse/HDFS-5443
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: sathish
Attachments: 5443-test.patch, HDFS-5443.000.patch

This issue is reported by Prakash and Sathish. On looking into the issue, the following things are happening:
1) Client added a block at the NN and just did logsync, so the NN has the block ID persisted.
2) Before returning the addBlock response to the client, take a snapshot of the root or a parent directory of that file.
3) Delete the parent directory of that file.
4) Now crash the NN without responding success to the client for that addBlock call.
Now on restart, the Namenode will be stuck in safemode.
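The safemode arithmetic behind this hang can be sketched as a toy model (hypothetical class and method names, not the real NameNode code): the 0-sized block persisted by the addBlock logsync is counted in the expected block total at startup, but no DataNode ever held it, so it can never be reported.

```java
// Toy model of safemode accounting (hypothetical names, NOT real HDFS code).
// A block persisted via the addBlock logsync but never written to any
// DataNode keeps the safe-block ratio below the threshold forever.
public class SafeModeModel {
    private int totalBlocks = 0;
    private int safeBlocks = 0;

    public void trackBlockFromImage() { totalBlocks++; } // counted at startup
    public void reportBlock() { safeBlocks++; }          // from a DataNode block report

    // The default threshold in HDFS is 0.999f (dfs.namenode.safemode.threshold-pct).
    public boolean canLeaveSafeMode(float threshold) {
        return totalBlocks == 0 || (float) safeBlocks / totalBlocks >= threshold;
    }
}
```

With 100 tracked blocks, one of them the phantom 0-sized block, only 99 can ever be reported: 99/100 = 0.99 < 0.999, so the model never leaves safemode. Deleting the 0-sized block along with the file (as the patch's retitle suggests) removes it from the total, and 99/99 passes the threshold.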
[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815876#comment-13815876 ]

Vinay commented on HDFS-5443:

But, without making this patch much more complex, if you want to go ahead with committing it, I have no objection, as this will be covered anyway in HDFS-5428.
[jira] [Commented] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816175#comment-13816175 ]

Arpit Agarwal commented on HDFS-5472:

+1 for the patch. I will commit it shortly.

Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark

Key: HDFS-5472
URL: https://issues.apache.org/jira/browse/HDFS-5472
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h5472_20131106.patch

- DatanodeDescriptor should be initialized with updateHeartbeat for updating the timestamps.
- NNThroughputBenchmark should create DatanodeRegistrations with real datanode UUIDs.
[jira] [Resolved] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal resolved HDFS-5472.
    Resolution: Fixed
    Fix Version/s: Heterogeneous Storage (HDFS-2832)
    Hadoop Flags: Reviewed

Committed this to branch HDFS-2832. Thanks Nicholas.
[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816184#comment-13816184 ]

Brandon Li commented on HDFS-5252:

Thank you, Jing, for the review. I've committed the patch.

Stable write is not handled correctly in someplace

Key: HDFS-5252
URL: https://issues.apache.org/jira/browse/HDFS-5252
Project: Hadoop HDFS
Issue Type: Sub-task
Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch

When the client asks for a stable write but the prerequisite writes have not been transferred to the NFS gateway, the stableness can't be honored. The NFS gateway has to treat the write as an unstable write and set the flag to UNSTABLE in the write response.

One bug was found during a test with an Ubuntu client copying one 1KB file. For small files like a 1KB file, the Ubuntu client does one stable write (with the FILE_SYNC flag). However, the NFS gateway missed one place (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send a COMMIT anymore. The following test tries to read the data back and of course fails to do so, since the data was not synced.
[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-5252:
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-2832:
    Attachment: H2832_20131107.patch

Enable support for heterogeneous storages in HDFS

Key: HDFS-2832
URL: https://issues.apache.org/jira/browse/HDFS-2832
Project: Hadoop HDFS
Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch

HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages.
[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816196#comment-13816196 ]

Hudson commented on HDFS-5252:

SUCCESS: Integrated in Hadoop-trunk-Commit #4700 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4700/])
HDFS-5252. Stable write is not handled correctly in someplace. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539740)
* /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/READ3Request.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/Nfs3Utils.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-5252:
    Fix Version/s: 2.2.1
[jira] [Updated] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5443:
    Summary: Delete 0-sized block when deleting an under-construction file that is included in snapshot (was: Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.)
[jira] [Updated] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5443:
    Description:
Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
This issue is reported by Prakash and Sathish. On looking into the issue, the following things are happening:
1) Client added a block at the NN and just did logsync, so the NN has the block ID persisted.
2) Before returning the addBlock response to the client, take a snapshot of the root or a parent directory of that file.
3) Delete the parent directory of that file.
4) Now crash the NN without responding success to the client for that addBlock call.
Now on restart, the Namenode will be stuck in safemode.

    was:
This issue is reported by Prakash and Sathish. On looking into the issue, the following things are happening:
1) Client added a block at the NN and just did logsync, so the NN has the block ID persisted.
2) Before returning the addBlock response to the client, take a snapshot of the root or a parent directory of that file.
3) Delete the parent directory of that file.
4) Now crash the NN without responding success to the client for that addBlock call.
Now on restart, the Namenode will be stuck in safemode.
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816211#comment-13816211 ]

Andrew Wang commented on HDFS-5326:

bq. reordering methods
I think you missed one reordering in FSEditLog :)

bq. Let's do this as part of HDFS-5471 if it looks good... similarly with refactoring pc#checkPermission.
OK, I'll cross-post my cleanup comments there. +1 once addressed (and the Findbugs warning), thanks Colin.

add modifyDirective to cacheAdmin

Key: HDFS-5326
URL: https://issues.apache.org/jira/browse/HDFS-5326
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch, HDFS-5326.007.patch

We should add a way of modifying cache directives on the command line, similar to how modifyCachePool works.
[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816218#comment-13816218 ]

Jing Zhao commented on HDFS-5443:

bq. because INode is saved to snapshot while renaming itself. so updation will not happen during deletion.

Thanks for the comments, Vinay! I still think our current rename implementation will not lead to this scenario, but let's continue this discussion in HDFS-5428 and add a possible fix there. I will commit the current patch shortly.
[jira] [Commented] (HDFS-5471) CacheAdmin -listPools fails when pools exist that user does not have permissions to
[ https://issues.apache.org/jira/browse/HDFS-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816220#comment-13816220 ] Andrew Wang commented on HDFS-5471: --- Colin asked to bump some CacheManager cleanup work from HDFS-5326 to this JIRA, cross-posting: * Add and modify aren't that different besides the difference in required, optional, and default fields. I just first validate all present fields in the directive, then enforce required fields, then fill in default values. * Modify and remove have the same checks for an existing entry. * Add and modify have the same checks for an existing cache pool. * All three do write checks on a cache pool; moving this into FSPermissionChecker or a helper method was an easy savings. * Success/fail logs are inconsistently formatted. I'd like something like "methodName: successfully <verb> <directive>" for success and "methodName: failed to <verb> <noun> <parameters>" for failure, with the exception e attached: {code}
LOG.warn("addDirective " + directive + ": failed", e);
LOG.info("addDirective " + directive + ": succeeded.");
...
LOG.warn("modifyDirective " + idString + ": error", e);
LOG.info("modifyDirective " + idString + ": applied " + directive);
...
LOG.warn("removeDirective " + id + ": failed", e);
LOG.info("removeDirective " + id + ": removed");
{code} CacheAdmin -listPools fails when pools exist that user does not have permissions to --- Key: HDFS-5471 URL: https://issues.apache.org/jira/browse/HDFS-5471 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0 Reporter: Stephen Chu When a user does not have read permissions to a cache pool and executes hdfs cacheadmin -listPools, the command errors out, complaining about missing required fields with something like: {code} [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): Message missing required fields: ownerName, groupName, mode, weight at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ListCachePoolsResponseElementProto$Builder.build(ClientNamenodeProtocolProtos.java:51722) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCachePools(ClientNamenodeProtocolServerSideTranslatorPB.java:1200) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2057) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1515) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2051) at org.apache.hadoop.hdfs.tools.CacheAdmin$ListCachePoolsCommand.run(CacheAdmin.java:675) at
org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:85) at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:90) [schu@hdfs-nfs ~]$ {code} In this example, the pool root has 750 permissions, and the root superuser is able to successfully -listPools: {code} [root@hdfs-nfs ~]# hdfs cacheadmin -listPools Found 4 results. NAME OWNER GROUP MODE WEIGHT bar root root rwxr-xr-x 100 foo root root rwxr-xr-x 100 root root root rwxr-x--- 100 schu root root rwxr-xr-x 100 [root@hdfs-nfs ~]# {code} When we modify the root pool to mode 755, schu user can now -listPools successfully without error. {code} [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools Found 4 results. NAME OWNER GROUP MODE WEIGHT bar root root rwxr-xr-x 100 foo root root rwxr-xr-x 100 root root root rwxr-xr-x 100 schu root root rwxr-xr-x 100 [schu@hdfs-nfs ~]$ {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
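The log-format convention proposed in the cleanup comment above can be captured in a tiny helper. This is an illustrative sketch only; the class and method names are hypothetical, not the actual CacheManager code:

```java
// Hypothetical helper expressing the suggested convention
// "methodName: successfully <verb> <noun>" / "methodName: failed to <verb> <noun>".
public class CacheLogFormat {
    // Message for the info-level success log.
    static String success(String method, String verb, String noun) {
        return method + ": successfully " + verb + " " + noun;
    }

    // Message for the warn-level failure log (the exception is passed
    // separately to LOG.warn as the second argument).
    static String failure(String method, String verb, String noun) {
        return method + ": failed to " + verb + " " + noun;
    }
}
```

With such a helper, every add/modify/remove path would emit uniformly shaped messages instead of the ad-hoc concatenations quoted above.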
[jira] [Updated] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5443: Resolution: Fixed Fix Version/s: 2.3.0 Assignee: Jing Zhao (was: sathish) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Delete 0-sized block when deleting an under-construction file that is included in snapshot -- Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Fix For: 2.3.0 Attachments: 5443-test.patch, HDFS-5443.000.patch The Namenode can get stuck in safemode on restart if it crashes just after the addBlock logsync and after a snapshot has been taken for such a file. This issue was reported by Prakash and Sathish. On looking into the issue, the following happens: 1) Client adds a block at the NN, which does a logsync, so the NN has the block ID persisted. 2) Before the addBlock response is returned to the client, take a snapshot of the root or a parent directory of that file. 3) Delete the parent directory of that file. 4) Now crash the NN without responding success to the client for that addBlock call. On restart, the Namenode will get stuck in safemode. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816235#comment-13816235 ] Hudson commented on HDFS-5443: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4701 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4701/]) HDFS-5443. Delete 0-sized block when deleting an under-construction file that is included in snapshot. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1539754) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFileUnderConstruction.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java Delete 0-sized block when deleting an under-construction file that is included in snapshot -- Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Fix For: 2.3.0 Attachments: 5443-test.patch, HDFS-5443.000.patch The Namenode can get stuck in safemode on restart if it crashes just after the addBlock logsync and after a snapshot has been taken for such a file. This issue was reported by Prakash and Sathish. On looking into the issue, the following happens: 1) Client adds a block at the NN, which does a logsync, so the NN has the block ID persisted. 2) Before the addBlock response is returned to the client, take a snapshot of the root or a parent directory of that file. 3) Delete the parent directory of that file. 4) Now crash the NN without responding success to the client for that addBlock call. On restart, the Namenode will get stuck in safemode.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: (was: HDFS-5326.008.patch) add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch, HDFS-5326.007.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: HDFS-5326.008.patch fix findbugs warning add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch, HDFS-5326.007.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Attachment: HDFS-5326.008.patch complete reordering add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch, HDFS-5326.007.patch, HDFS-5326.008.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5471) CacheAdmin -listPools fails when pools exist that user does not have permissions to
[ https://issues.apache.org/jira/browse/HDFS-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5471: --- Assignee: Andrew Wang CacheAdmin -listPools fails when pools exist that user does not have permissions to --- Key: HDFS-5471 URL: https://issues.apache.org/jira/browse/HDFS-5471 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0 Reporter: Stephen Chu Assignee: Andrew Wang When a user does not have read permissions to a cache pool and executes hdfs cacheadmin -listPools the command will error complaining about missing required fields with something like: {code} [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools Exception in thread main org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): Message missing required fields: ownerName, groupName, mode, weight at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ListCachePoolsResponseElementProto$Builder.build(ClientNamenodeProtocolProtos.java:51722) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCachePools(ClientNamenodeProtocolServerSideTranslatorPB.java:1200) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2057) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1515) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2051) at 
org.apache.hadoop.hdfs.tools.CacheAdmin$ListCachePoolsCommand.run(CacheAdmin.java:675) at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:85) at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:90) [schu@hdfs-nfs ~]$ {code} In this example, the pool root has 750 permissions, and the root superuser is able to successfully -listPools: {code} [root@hdfs-nfs ~]# hdfs cacheadmin -listPools Found 4 results. NAME OWNER GROUP MODE WEIGHT bar root root rwxr-xr-x 100 foo root root rwxr-xr-x 100 root root root rwxr-x--- 100 schu root root rwxr-xr-x 100 [root@hdfs-nfs ~]# {code} When we modify the root pool to mode 755, schu user can now -listPools successfully without error. {code} [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools Found 4 results. NAME OWNER GROUP MODE WEIGHT bar root root rwxr-xr-x 100 foo root root rwxr-xr-x 100 root root root rwxr-xr-x 100 schu root root rwxr-xr-x 100 [schu@hdfs-nfs ~]$ {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5475) NN should not allow more than one replica per storage
Arpit Agarwal created HDFS-5475: --- Summary: NN should not allow more than one replica per storage Key: HDFS-5475 URL: https://issues.apache.org/jira/browse/HDFS-5475 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal NN chooses a provisional target storage when allocating a new block and records that block in the blockList of that storage. However the datanode is free to choose a different storage for the block. On the next block report the NN ends up with two blockList entries for the same replica+DN combination. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5475) NN incorrectly tracks more than one replica per DN
[ https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5475: Summary: NN incorrectly tracks more than one replica per DN (was: NN should not allow more than one replica per storage) NN incorrectly tracks more than one replica per DN -- Key: HDFS-5475 URL: https://issues.apache.org/jira/browse/HDFS-5475 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal NN chooses a provisional target storage when allocating a new block and records that block in the blockList of that storage. However the datanode is free to choose a different storage for the block. On the next block report the NN ends up with two blockList entries for the same replica+DN combination. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5468) CacheAdmin help command does not recognize commands
[ https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816340#comment-13816340 ] Colin Patrick McCabe commented on HDFS-5468: +1. The audit warning is bogus based on it not finding an apache release header on some pid files that were left over from a previous jenkins job CacheAdmin help command does not recognize commands --- Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-5468.patch Currently, the hdfs cacheadmin -help command will not recognize correct command inputs: {code} [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help {code} In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
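The comparison bug described in the HDFS-5468 report above (input stripped of leading hyphens, registered names still hyphen-prefixed, so the lookup never matches) can be sketched in a few lines. Normalizing both sides before comparing fixes it. Names here are illustrative, not the actual CacheAdmin code:

```java
import java.util.Arrays;
import java.util.List;

public class HelpLookup {
    // Registered command names, hyphen-prefixed as in the usage output.
    static final List<String> COMMANDS = Arrays.asList(
        "-addDirective", "-removeDirective", "-listPools", "-help");

    static String stripLeadingHyphens(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == '-') i++;
        return s.substring(i);
    }

    // Buggy lookup: a stripped input can never equal a hyphen-prefixed name.
    static boolean buggyFind(String input) {
        return COMMANDS.contains(stripLeadingHyphens(input));
    }

    // Fixed lookup: strip hyphens from both sides before comparing.
    static boolean fixedFind(String input) {
        String needle = stripLeadingHyphens(input);
        for (String c : COMMANDS) {
            if (stripLeadingHyphens(c).equals(needle)) return true;
        }
        return false;
    }
}
```

With the fix, both `hdfs cacheadmin -help listPools` and `hdfs cacheadmin -help -listPools` resolve to the same command.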
[jira] [Updated] (HDFS-5468) CacheAdmin help command does not recognize commands
[ https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5468: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) CacheAdmin help command does not recognize commands --- Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5468.patch Currently, the hdfs cacheadmin -help command will not recognize correct command inputs: {code} [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help {code} In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5364: - Attachment: HDFS-5364.008.patch Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch The NFS gateway can run out of memory when the stream timeout is set to a relatively long period (e.g., 1 minute) and a user uploads thousands of files in parallel. Each stream's DFSClient creates a DataStreamer thread, so the gateway will eventually run out of memory by creating too many threads. The NFS gateway should have an OpenFileCtx cache to limit the total number of open files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816367#comment-13816367 ] Brandon Li commented on HDFS-5364: -- {quote}2 and 3 are optimization of the eviction method. As we discussed offline, I will file a following up JIRA for that.{quote} The new patch adds the optimization of the eviction method. Also, the scan() method no longer holds the lock the whole time. A unit test is added for the scan() method. Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch The NFS gateway can run out of memory when the stream timeout is set to a relatively long period (e.g., 1 minute) and a user uploads thousands of files in parallel. Each stream's DFSClient creates a DataStreamer thread, so the gateway will eventually run out of memory by creating too many threads. The NFS gateway should have an OpenFileCtx cache to limit the total number of open files. -- This message was sent by Atlassian JIRA (v6.1#6144)
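The core idea behind the OpenFileCtx cache — bound the number of concurrently open streams and evict the least-recently-used one — can be sketched with an access-ordered `LinkedHashMap`. The real OpenFileCtxCache in the patch has its own eviction and scan logic (and must close the evicted stream), so this class and its names are purely illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative bounded LRU cache, not the actual OpenFileCtxCache.
public class BoundedOpenFileCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxOpenFiles;

    public BoundedOpenFileCache(int maxOpenFiles) {
        // true => access-order iteration, which gives LRU eviction order.
        super(16, 0.75f, true);
        this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the bound is exceeded.
        // A real implementation would also close the evicted stream here
        // (and would likely prefer evicting inactive/committed streams).
        return size() > maxOpenFiles;
    }
}
```

This keeps the DataStreamer thread count proportional to the cache bound rather than to the number of files a client uploads in parallel.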
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816378#comment-13816378 ] Chris Nauroth commented on HDFS-5394: - I just tested with patch version 7, and the datanode didn't uncache previously cached blocks after receiving the DNA_CACHE message. Debug logging shows that it's due to the following logic in {{FsDatasetCache#uncacheBlock}}. I assume {{case CACHED}} should be doing the same as the {{default}} block and submitting an {{UncachingTask}}. {code}
case CACHED:
  if (LOG.isDebugEnabled()) {
    LOG.debug("Block with id " + blockId + ", pool " + bpid +
        " does not need to be uncached, because it is " +
        "in state " + prevValue.state + ".");
  }
  break;
{code} fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, HDFS-5394.007.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5468) CacheAdmin help command does not recognize commands
[ https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816384#comment-13816384 ] Hudson commented on HDFS-5468: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4702 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4702/]) HDFS-5468. CacheAdmin help command does not recognize commands (Stephen Chu via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1539786) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testCacheAdminConf.xml CacheAdmin help command does not recognize commands --- Key: HDFS-5468 URL: https://issues.apache.org/jira/browse/HDFS-5468 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.3.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5468.patch Currently, the hdfs cacheadmin -help command will not recognize correct command inputs: {code} [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools Sorry, I don't know the command 'listPools'. Valid command names are: -addDirective, -removeDirective, -removeDirectives, -listDirectives, -addPool, -modifyPool, -removePool, -listPools, -help {code} In the code, we strip the input command of leading hyphens, but then compare it to the command names, which are all prefixed by a hyphen. Also, cacheadmin -removeDirectives requires specifying a path with -path but -path is not shown in the usage. We should fix this as well. 
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816394#comment-13816394 ] Tsz Wo (Nicholas), SZE commented on HDFS-2832: -- With a billion nodes the probability of a collision in a 128-bit space is less than 1 in 10^20. ... Let n be the number of possible IDs. Let m be the number of nodes. The probability of no collision is P = n!/((n-m)! n^m). Putting n=2^128 and m=10^9, we have * P ~= 0.99999999999999999999853063206294150856 The probability of collision is * 1-P ~= 1.4693679370584914464 * 10^(-21) < 10^(-20). However, randomly generated UUIDs only have 122 random bits according to [Wikipedia|http://en.wikipedia.org/wiki/UUID#Random_UUID_probability_of_duplicates]. Now putting n=2^122 and m=10^9, we have * P ~= 0.99999999999999999990596045202825654743 The probability of collision is * 1-P ~= 9.403954797174345257 * 10^(-20) < 10^(-19). A similar result can be obtained using the approximation P ~= exp(-m^2/(2n)). Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose a change from the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
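The approximation P ~= exp(-m^2/(2n)) quoted above is easy to check numerically. A minimal sketch (not Hadoop code) — `Math.expm1` computes exp(x)-1 accurately for tiny x, which keeps the minuscule collision probability 1-P from underflowing to zero in double arithmetic:

```java
public class CollisionOdds {
    // Birthday-bound collision probability for m random IDs drawn from a
    // space of 2^bits values, via the approximation 1 - exp(-m^2 / (2n)).
    static double collisionProbability(double bits, double m) {
        double n = Math.pow(2.0, bits);
        // 1 - exp(-x) computed stably as -expm1(-x); for small x this is ~x.
        return -Math.expm1(-(m * m) / (2.0 * n));
    }
}
```

For m = 10^9 this yields roughly 1.47 * 10^(-21) at 128 random bits and 9.40 * 10^(-20) at 122 random bits, matching the exact figures in the comment above.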
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816397#comment-13816397 ] Tsz Wo (Nicholas), SZE commented on HDFS-2832: -- ... Even though unlikely, a collision if it happens creates a serious problem for the system integrity. Does it concern you? It depends on how small the probability is - certainly not for 10^(-19). - Below is quoted from [Wikipedia|http://en.wikipedia.org/wiki/UUID#Random_UUID_probability_of_duplicates] {quote} To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^(−11)), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs. {quote} - I bet you have heard the [risk of cosmic rays|http://stackoverflow.com/questions/2580933/cosmic-rays-what-is-the-probability-they-will-affect-a-program] argument. Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose a change from the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816399#comment-13816399 ] Hadoop QA commented on HDFS-5364: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612683/HDFS-5364.008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5356//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5356//console This message is automatically generated. 
Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch NFS gateway can run out of memory when the stream timeout is set to a relatively long period(e.g., 1 minute) and user uploads thousands of files in parallel. Each stream DFSClient creates a DataStreamer thread, and will eventually run out of memory by creating too many threads. NFS gateway should have a OpenFileCtx cache to limit the total opened files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816403#comment-13816403 ] Colin Patrick McCabe commented on HDFS-5394: Good catch. That was a bug introduced by the latest round of shuffling everything around. The default and CACHED cases were switched. fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, HDFS-5394.007.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
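Putting the two comments together (Chris's debug trace and Colin's confirmation that the default and CACHED cases were switched), the intended handling can be sketched as: a replica already in CACHED state must get an uncaching task submitted, while transitional states are skipped. The enum values and names below are illustrative, not the exact HDFS-5394 patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class UncacheSketch {
    // Simplified replica cache states; the real FsDatasetCache state
    // machine has more transitions.
    enum State { CACHING, CACHING_CANCELLED, CACHED, UNCACHING }

    static final ExecutorService POOL = Executors.newSingleThreadExecutor();

    // Returns true when an uncaching task was actually scheduled.
    static boolean uncacheBlock(State prev, Runnable uncachingTask) {
        switch (prev) {
        case CACHING:
        case CACHING_CANCELLED:
        case UNCACHING:
            // Nothing to do: caching never completed, was cancelled,
            // or uncaching is already in flight.
            return false;
        case CACHED:
            // The reported bug: this case only logged "does not need to be
            // uncached" and broke out. The fix submits the uncaching task.
            POOL.submit(uncachingTask);
            return true;
        default:
            return false;
        }
    }
}
```

This is the behavior the `{{default}}` block already had in patch version 7; the fix is effectively swapping the two cases back.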
[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5394: --- Attachment: HDFS-5394.008.patch fix uncaching issue discovered by chris fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, HDFS-5394.007.patch, HDFS-5394.008.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5364: - Attachment: HDFS-5364.009.patch Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, HDFS-5364.009.patch NFS gateway can run out of memory when the stream timeout is set to a relatively long period (e.g., 1 minute) and a user uploads thousands of files in parallel. Each stream's DFSClient creates a DataStreamer thread, and will eventually run out of memory by creating too many threads. NFS gateway should have an OpenFileCtx cache to limit the total opened files.
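The bounded-cache idea above (cap open write streams to bound DataStreamer threads) can be sketched with an access-ordered LinkedHashMap. `BoundedStreamCache` is a hypothetical name; the real OpenFileCtxCache in the patch differs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded open-stream cache: least-recently-used entries are
// evicted once capacity is reached. Evicting is where a real cache would
// also close the underlying stream, releasing its DataStreamer thread.
public class BoundedStreamCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxOpen;

    public BoundedStreamCache(int maxOpen) {
        super(16, 0.75f, true);  // accessOrder=true gives LRU iteration
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxOpen;  // evict the LRU entry past capacity
    }
}
```

With the cap in place, a burst of thousands of parallel uploads holds at most `maxOpen` streams (and threads) open at once.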
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816421#comment-13816421 ] Jing Zhao commented on HDFS-5364: - Thanks for addressing all the comments, Brandon! The new patch looks good to me. +1 pending Jenkins.
[jira] [Updated] (HDFS-5376) Incremental rescanning of cached blocks and cache entries
[ https://issues.apache.org/jira/browse/HDFS-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5376: --- Issue Type: Wish (was: Sub-task) Parent: (was: HDFS-4949) Incremental rescanning of cached blocks and cache entries - Key: HDFS-5376 URL: https://issues.apache.org/jira/browse/HDFS-5376 Project: Hadoop HDFS Issue Type: Wish Components: namenode Affects Versions: HDFS-4949 Reporter: Andrew Wang Assignee: Andrew Wang {{CacheReplicationMonitor#rescan}} is invoked whenever a new cache entry is added or removed. This involves a complete rescan of all cache entries and cached blocks, which is potentially expensive. It'd be better to do an incremental scan instead. This would also let us incrementally re-scan on namespace changes like rename and create for better caching latency.
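The incremental-scan idea above amounts to tracking which entries changed since the last pass and only revisiting those. A toy sketch under that assumption (illustrative names, not the CacheReplicationMonitor code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: a full rescan touches every entry on every change; an
// incremental rescan touches only the entries dirtied since the last
// scan. The evaluation counter makes the difference in work visible.
public class IncrementalRescanSketch {
    private final Map<Long, String> entries = new HashMap<>();
    private final Set<Long> dirty = new HashSet<>();
    private int evaluations = 0;  // per-entry work performed so far

    public void addEntry(long id, String path) {
        entries.put(id, path);
        dirty.add(id);  // only this entry needs re-evaluation
    }

    /** Full rescan: O(all entries), regardless of what changed. */
    public void fullRescan() {
        for (Long id : entries.keySet()) evaluations++;
        dirty.clear();
    }

    /** Incremental rescan: O(entries changed since the last scan). */
    public void incrementalRescan() {
        for (Long id : dirty) evaluations++;
        dirty.clear();
    }

    public int getEvaluations() { return evaluations; }
}
```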
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816457#comment-13816457 ] Jing Zhao commented on HDFS-5428: - From HDFS-5443: bq. Patch will not clear the blocks in this case. So I checked the rename case. Looks like we have a bug there and we fail to clean the blocks for INodeFile/INodeFileUnderConstruction in some cases after rename. I will fix it in a new jira.
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816455#comment-13816455 ] Hadoop QA commented on HDFS-5326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612666/HDFS-5326.008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5355//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5355//console This message is automatically generated. 
add modifyDirective to cacheAdmin - Key: HDFS-5326 URL: https://issues.apache.org/jira/browse/HDFS-5326 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, HDFS-5326.006.patch, HDFS-5326.007.patch, HDFS-5326.008.patch We should add a way of modifying cache directives on the command-line, similar to how modifyCachePool works.
[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816474#comment-13816474 ] Colin Patrick McCabe commented on HDFS-5326: As described earlier, the test failure is just the fact that Jenkins failed to apply the binary diff to the editsStored file. Eclipse:eclipse has been failing today in several other jobs... it seems to be an environment issue. Thanks for the +1. Will commit shortly.
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816492#comment-13816492 ] Andrew Wang commented on HDFS-5394: --- Thanks for bumping Colin, basically just rollup in this review: bq. Could this be written with value.state == State.CACHING_CANCELLED instead? My point here was about the logic, since I did a find usages on CACHING_CANCELLED in Eclipse and only saw it being set. Right now it checks not CACHED which should be equivalent to is CACHING_CANCELLED because of the state transition invariants, and ideally with this kind of logic, we transition based on being *in* a state rather than *not being* in a state. bq. I would rather not do that, since right now we can look at entries in the map and instantly know that anything in state UNCACHING has an associated Runnable scheduled in the Executor. I guess this makes sense in light of HDFS-5182, since uncaching might require waiting for clients while cancelling caching shouldn't. In either case though, something needs to happen, it's just that instead of deferring the work to an UncachingTask, it's deferred to the end of the CachingTask. bq. waitFor Makes sense, though I'll note that 6,000,000 is 100 minutes, not ten minutes :) Overkill. bq. catching FileNotFoundException This is better, thanks. As a general comment, I'd like to avoid relying on NN retries if possible, but I guess it's okay for now. Test: * Do we need that {{Preconditions}} check in {{setUp}}? There's already an assumeTrue for the same thing right above it, so I don't think it'll do anything. * I'd like to see the {{LogVerificationAppender}} used in {{testUncachingBlocksBeforeCachingFinishes}} too. This seems like it might be flaky though. What was wrong with the old approach that used a barrier to force ordering? Also need to run through the Jenkins stuff still. 
The javac warning is fine (the new usage of Unsafe to get the page size) but the rest needs to be touched up. Not sure about the test failure.
[jira] [Work stopped] (HDFS-5166) caching PB cleanups
[ https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5166 stopped by Colin Patrick McCabe. caching PB cleanups --- Key: HDFS-5166 URL: https://issues.apache.org/jira/browse/HDFS-5166 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Once we have a better idea of what we need in the RPCs, let's do some protobuf cleanups on the caching RPCs. We may want to factor some fields out into a common type, for example.
[jira] [Resolved] (HDFS-5166) caching PB cleanups
[ https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-5166. Resolution: Duplicate
[jira] [Commented] (HDFS-5166) caching PB cleanups
[ https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816496#comment-13816496 ] Colin Patrick McCabe commented on HDFS-5166: We did this as part of HDFS-5326.
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed to trunk
[jira] [Work started] (HDFS-5166) caching PB cleanups
[ https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5166 started by Colin Patrick McCabe.
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816735#comment-13816735 ] Colin Patrick McCabe commented on HDFS-5394: bq. CACHING_CANCELLED discussion Yeah, it does make more sense to explicitly check for the states we expect to be in, rather than having a catch-all. I have changed this to use {{Preconditions}} to assert that we are in the correct state, since that seemed more appropriate, and also to be clearer about needing to be in the {{CACHING}} or {{CACHING_CANCELLED}} state there. bq. Makes sense, though I'll note that 6,000,000 is 100 minutes, not ten minutes. Overkill. Noted. Reduced this to 10 minutes, which should be ample. bq. Do we need that Preconditions check in setUp? There's already an assumeTrue for the same thing right above it, so I don't think it'll do anything. No, it's a repeat of the previous one. Removed. bq. I'd like to see the LogVerificationAppender used in testUncachingBlocksBeforeCachingFinishes too. This seems like it might be flaky though. What was wrong with the old approach that used a barrier to force ordering? The problem is we don't have a barrier in all the places we would need it. We'd need to know that the DN had received the DNA_CACHE heartbeat response and initiated caching during the 3-second window it has to do so, in order to know that we would later see a log message about cancellation. To check for the log message would be, as you guessed, flaky and we don't need another flaky test. I'd like to keep a LogVerificationAppender for this test in mind as a future improvement, but still get this fix committed soon since HDFS-5366, HDFS-5320, HDFS-5451, and HDFS-5431 all depend on this patch to some extent. Perhaps we can roll a test improvement for this into HDFS-5451, since that JIRA is all about debuggability and logging.
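The "assert the states you expect to be in" pattern discussed in HDFS-5394 can be sketched as follows; a plain IllegalStateException stands in for Guava's Preconditions.checkState (which the patch uses), and the names are illustrative:

```java
// Sketch: transition based on being *in* an expected state, not on
// *not being* in some other state. An explicit whitelist fails loudly
// if a new state is later added and this code path is not updated.
public class CachingStateSketch {
    enum State { CACHING, CACHING_CANCELLED, CACHED, UNCACHING }

    static void onCachingTaskFinished(State s) {
        // Equivalent in spirit to Preconditions.checkState(...).
        if (s != State.CACHING && s != State.CACHING_CANCELLED) {
            throw new IllegalStateException("unexpected state: " + s);
        }
        // ... complete the caching, or roll back the mmap/mlock if the
        // request was cancelled mid-flight ...
    }
}
```

Checking "not CACHED" happens to be equivalent today given the transition invariants, but the whitelist form keeps the equivalence from silently breaking.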
[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5394: --- Attachment: HDFS-5394.009.patch rebase on trunk; reduce test timeouts; add preconditions
[jira] [Commented] (HDFS-5451) add more debugging for cache rescan
[ https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816750#comment-13816750 ] Andrew Wang commented on HDFS-5451: --- Cross-posting my comment from HDFS-5394 as a follow-on here: bq. I'd like to see the LogVerificationAppender used in testUncachingBlocksBeforeCachingFinishes too. This seems like it might be flaky though. What was wrong with the old approach that used a barrier to force ordering? add more debugging for cache rescan --- Key: HDFS-5451 URL: https://issues.apache.org/jira/browse/HDFS-5451 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Andrew Wang It would be nice to have a message at DEBUG level that described all the decisions we made for cache entries. That way we could turn on this debugging to get more information. We should also store the number of bytes each PBCE wanted, and the number of bytes it got, plus the number of inodes it got, and output those in {{listDirectives}}.
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816749#comment-13816749 ] Andrew Wang commented on HDFS-5394: --- +1 pending Jenkins, thanks Colin. I'll cross-post the LogVerificationAppender improvement to HDFS-5451; agree we should get rolling on the rest.
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816753#comment-13816753 ] Hadoop QA commented on HDFS-5364: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612696/HDFS-5364.009.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5358//console This message is automatically generated.
[jira] [Commented] (HDFS-5451) add more debugging for cache rescan
[ https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816774#comment-13816774 ] Colin Patrick McCabe commented on HDFS-5451: one way to support using the {{LogVerificationAppender}} in {{testUncachingBlocksBeforeCachingFinishes}} would be to use {{Mockito}} to detect when we had started caching on the DN, and only have the test proceed after that.
[jira] [Commented] (HDFS-5366) recaching improvements
[ https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816802#comment-13816802 ] Chris Nauroth commented on HDFS-5366: - I tested this patch and found that blocks were never uncaching. The NameNode never sent DNA_UNCACHE messages to the DataNode. The reason is that there are separate calls to {{DatanodeManager#getCacheCommand}} to get the DNA_CACHE set followed by the DNA_UNCACHE set. The method internally resets the last message time for the DataNode. This means that when it's time to send messages, the first call for the DNA_CACHE messages succeeds and resets the clock for that DataNode to right now. Then, the second call for the DNA_UNCACHE messages always returns null, because it looks like it's not time to send messages. To solve this, we need to set the DataNode's last caching directive sent time just once, after calculating both the DNA_CACHE and DNA_UNCACHE commands. I changed the code as follows to do this. Feel free to incorporate it into the next patch. (I'm not uploading a new patch right now, because I don't want to detangle it out of the HDFS-5394 patch applied in my environment.) 
In {{DatanodeManager#handleHeartbeat}}:
{code}
long monoTimeMs = Time.monotonicNow();
if (sendCachingCommands) {
  if ((monoTimeMs - nodeinfo.getLastCachingDirectiveSentTimeMs()) >=
      timeBetweenResendingCachingDirectivesMs) {
    DatanodeCommand pendingCacheCommand = getCacheCommand(
        nodeinfo.getPendingCached(), nodeinfo, DatanodeProtocol.DNA_CACHE,
        blockPoolId);
    if (pendingCacheCommand != null) {
      cmds.add(pendingCacheCommand);
    }
    DatanodeCommand pendingUncacheCommand = getCacheCommand(
        nodeinfo.getPendingUncached(), nodeinfo, DatanodeProtocol.DNA_UNCACHE,
        blockPoolId);
    if (pendingUncacheCommand != null) {
      cmds.add(pendingUncacheCommand);
    }
    nodeinfo.setLastCachingDirectiveSentTimeMs(monoTimeMs);
  }
}
{code}
And {{DatanodeManager#getCacheCommand}}:
{code}
/**
 * Convert a CachedBlocksList into a DatanodeCommand with a list of blocks.
 *
 * @param list The {@link CachedBlocksList}. This function clears the list.
 * @param datanode The datanode.
 * @param action The action to perform in the command.
 * @param poolId The block pool id.
 * @return A DatanodeCommand to be sent back to the DN, or null if
 *         there is nothing to be done.
 */
private DatanodeCommand getCacheCommand(CachedBlocksList list,
    DatanodeDescriptor datanode, int action, String poolId) {
  int length = list.size();
  if (length == 0) {
    return null;
  }
  // Read and clear the existing cache commands.
  long[] blockIds = new long[length];
  int i = 0;
  for (Iterator<CachedBlock> iter = list.iterator(); iter.hasNext(); ) {
    CachedBlock cachedBlock = iter.next();
    blockIds[i++] = cachedBlock.getBlockId();
    iter.remove();
  }
  return new BlockIdCommand(action, poolId, blockIds);
}
{code}
I re-tested with these changes, and it worked.
recaching improvements -- Key: HDFS-5366 URL: https://issues.apache.org/jira/browse/HDFS-5366 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5366-caching.001.patch There are a few things about our HDFS-4949 recaching strategy that could be improved. * We should monitor the DN's maximum and current mlock'ed memory consumption levels, so that we don't ask the DN to do stuff it can't. * We should not try to initiate caching on stale or decommissioning DataNodes (although we should not recache things stored on such nodes until they're declared dead). * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few times before giving up. Currently, we only send it once.
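The timing bug Chris Nauroth describes above distills to a rate-limit timestamp being reset inside the getter: the first call (DNA_CACHE) resets the clock, so the second call (DNA_UNCACHE) in the same heartbeat always looks "too soon". A toy sketch under that assumption, with illustrative names:

```java
// Sketch of the reset-too-early bug and its fix. In the buggy form the
// check and the timestamp reset live in one method, so only the first
// of two back-to-back queries in a heartbeat can ever succeed. The fix
// separates the check from the reset so the caller resets once, after
// building both commands.
public class ResendTimerSketch {
    private long lastSentMs = 0;
    private final long intervalMs;

    public ResendTimerSketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Buggy form: checks and resets in one call. */
    public boolean buggyShouldSend(long nowMs) {
        if (nowMs - lastSentMs < intervalMs) {
            return false;
        }
        lastSentMs = nowMs;  // hidden reset: a second call at the same
        return true;         // nowMs now returns false
    }

    /** Fixed form: check is side-effect free ... */
    public boolean shouldSend(long nowMs) {
        return nowMs - lastSentMs >= intervalMs;
    }

    /** ... and the caller marks the send exactly once. */
    public void markSent(long nowMs) {
        lastSentMs = nowMs;
    }
}
```

In DatanodeManager terms: call the side-effect-free check once per heartbeat, compute both the DNA_CACHE and DNA_UNCACHE commands, then set the last-sent time.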
[jira] [Updated] (HDFS-5475) NN incorrectly tracks more than one replica per DN
[ https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5475: Attachment: h5475.01.patch Patch to update target storage in {{BlockInfo}} and {{BlockInfoUnderConstruction}} from block reports. This fixes {{TestGetBlocks}}. NN incorrectly tracks more than one replica per DN -- Key: HDFS-5475 URL: https://issues.apache.org/jira/browse/HDFS-5475 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: h5475.01.patch NN chooses a provisional target storage when allocating a new block and records that block in the blockList of that storage. However the datanode is free to choose a different storage for the block. On the next block report the NN ends up with two blockList entries for the same replica+DN combination.
[jira] [Commented] (HDFS-5366) recaching improvements
[ https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816843#comment-13816843 ] Colin Patrick McCabe commented on HDFS-5366: good find, Chris. We definitely should update the {{lastCachingDirectiveSentTimeMs}} just once in that function. As you said, I'm waiting for 5394 to land before rebasing this. I kicked the Jenkins build, but it's still pending.
[jira] [Updated] (HDFS-5475) NN incorrectly tracks more than one replica per DN
[ https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5475: - Hadoop Flags: Reviewed +1 patch looks good.
[jira] [Resolved] (HDFS-5475) NN incorrectly tracks more than one replica per DN
[ https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-5475. - Resolution: Fixed Fix Version/s: Heterogeneous Storage (HDFS-2832) Thanks for the quick review, Nicholas! I committed it to branch HDFS-2832.
[jira] [Created] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
Jing Zhao created HDFS-5476: --- Summary: Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion Key: HDFS-5476 URL: https://issues.apache.org/jira/browse/HDFS-5476 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jing Zhao Assignee: Jing Zhao Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree under the DstReference node for file/directory/snapshot deletion. Use case 1: # rename an under-construction file with 0-sized blocks after snapshot. # delete the renamed directory. We need to make sure we delete the 0-sized block. Use case 2: # create snapshot s0 for / # create a new file under /foo/bar/ # rename foo to foo2 # create snapshot s1 # delete bar and foo2 # delete snapshot s1 We need to make sure we delete the file under /foo/bar since it is not included in snapshot s0. -- This message was sent by Atlassian JIRA (v6.1#6144)
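The invariant behind use case 2 can be modeled in a few lines (a toy sketch with illustrative names, not the real INode/DstReference machinery): a block is collectable only when neither the live namespace nor any remaining snapshot still references it, so deleting snapshot s1 must free a file that was created after s0 and already removed from the live tree.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of destroyAndCollectBlocks semantics.
public class SnapshotBlockModel {
    private final Set<Long> live = new HashSet<>();               // live namespace
    private final Map<String, Set<Long>> snapshots = new HashMap<>();

    public void addLiveBlock(long id) { live.add(id); }

    // A snapshot captures the blocks referenced at snapshot time.
    public void takeSnapshot(String name) {
        snapshots.put(name, new HashSet<>(live));
    }

    // Delete from the live namespace; return blocks collectable right now
    // (those not retained by any snapshot).
    public Set<Long> deleteLive(Collection<Long> ids) {
        live.removeAll(ids);
        Set<Long> collectable = new HashSet<>(ids);
        for (Set<Long> snap : snapshots.values()) collectable.removeAll(snap);
        return collectable;
    }

    // Delete a snapshot; return blocks no longer referenced anywhere.
    public Set<Long> deleteSnapshot(String name) {
        Set<Long> released = snapshots.remove(name);
        if (released == null) return Collections.emptySet();
        Set<Long> collectable = new HashSet<>(released);
        collectable.removeAll(live);
        for (Set<Long> snap : snapshots.values()) collectable.removeAll(snap);
        return collectable;
    }
}
```

Tracing use case 2: the file's block is absent from s0, captured by s1, survives the live delete (s1 still holds it), and must be collected when s1 is deleted.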
[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5476: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5476: Attachment: HDFS-5476.001.patch Upload the initial patch including two unit tests to reproduce the two use cases described above.
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816882#comment-13816882 ] Jing Zhao commented on HDFS-5428: - Created HDFS-5476 for the rename issue.
[jira] [Created] (HDFS-5477) Block manager as a service
Daryn Sharp created HDFS-5477: - Summary: Block manager as a service Key: HDFS-5477 URL: https://issues.apache.org/jira/browse/HDFS-5477 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The block manager needs to evolve towards having the ability to run as a standalone service to improve NN vertical and horizontal scalability. The goal is reducing the memory footprint of the NN proper to support larger namespaces, and improve overall performance by decoupling the block manager from the namespace and its lock. Ideally, a distinct BM will be transparent to clients and DNs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5477) Block manager as a service
[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816886#comment-13816886 ] Daryn Sharp commented on HDFS-5477: --- This is a joint effort between researchers and coworkers. I will post preliminary design docs soon.
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816897#comment-13816897 ] Hadoop QA commented on HDFS-5394: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612725/HDFS-5394.009.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1545 javac compiler warnings (more than the trunk's current 1544 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5357//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5357//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5357//console This message is automatically generated. 
fix race conditions in DN caching and uncaching --- Key: HDFS-5394 URL: https://issues.apache.org/jira/browse/HDFS-5394 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, HDFS-5394.007.patch, HDFS-5394.008.patch, HDFS-5394.009.patch The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
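The two hazards named in the description can be sketched with plain-Java concurrency primitives (a hypothetical illustration, not the actual FsDatasetCache code): `putIfAbsent` ensures a second cache request for the same replica never triggers a second mmap/mlock, and a cancellation flag lets an uncache that races with an in-flight caching operation be honored once the mapping step finishes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of race-safe replica caching on a datanode.
public class ReplicaCacheSketch {
    static final class Entry {
        final AtomicBoolean cancelled = new AtomicBoolean(false);
        volatile boolean mapped = false;  // true once mmap/mlock completed
    }

    private final ConcurrentMap<Long, Entry> cache = new ConcurrentHashMap<>();

    // Returns true only for the caller that should perform the expensive map,
    // so a duplicate cache command becomes a no-op instead of a second mmap.
    public boolean startCaching(long blockId) {
        return cache.putIfAbsent(blockId, new Entry()) == null;
    }

    // Called after the mapping step; drops the entry if an uncache arrived
    // while the caching was still in flight.
    public void finishCaching(long blockId) {
        Entry e = cache.get(blockId);
        if (e == null) return;
        if (e.cancelled.get()) {
            cache.remove(blockId);       // caching was cancelled mid-flight
        } else {
            e.mapped = true;
        }
    }

    public void uncache(long blockId) {
        Entry e = cache.get(blockId);
        if (e == null) return;
        if (e.mapped) {
            cache.remove(blockId);       // already mapped: unmap and drop
        } else {
            e.cancelled.set(true);       // still in flight: mark cancelled
        }
    }

    public boolean isCached(long blockId) {
        Entry e = cache.get(blockId);
        return e != null && e.mapped;
    }
}
```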
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-2832: Attachment: h2832_20131107b.patch Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and are therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5364: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Add OpenFileCtx cache - Key: HDFS-5364 URL: https://issues.apache.org/jira/browse/HDFS-5364 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.2.1 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, HDFS-5364.009.patch The NFS gateway can run out of memory when the stream timeout is set to a relatively long period (e.g., 1 minute) and a user uploads thousands of files in parallel. For each stream, DFSClient creates a DataStreamer thread, and the gateway will eventually run out of memory by creating too many threads. The NFS gateway should have an OpenFileCtx cache to limit the total number of open files. -- This message was sent by Atlassian JIRA (v6.1#6144)
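The cap on open-file contexts described above is essentially a bounded LRU cache. A minimal sketch using an access-ordered `LinkedHashMap` (a hypothetical illustration; the real OpenFileCtx cache must also close the evicted stream, which is elided here):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache for open-file contexts.
public class OpenFileCtxCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public OpenFileCtxCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict (and, in a real gateway, close) the least recently used
        // context once the cap is exceeded.
        return size() > maxEntries;
    }
}
```

With a cap of 2, inserting `a`, `b`, touching `a`, then inserting `c` evicts `b`, bounding the number of live DataStreamer threads regardless of how many files are uploaded in parallel.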
[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816912#comment-13816912 ] Brandon Li commented on HDFS-5364: -- Thank you, Jing. I've committed the patch.
[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache
[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5364: - Fix Version/s: 2.2.1
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816945#comment-13816945 ] Colin Patrick McCabe commented on HDFS-5394: Thanks for the +1, will commit shortly. As mentioned earlier, the javac and javadoc warnings are about the use of {{sun.misc.Unsafe}}, which we can't really avoid here. I will increase {{OK_JAVADOC_WARNINGS}} in {{test-patch.sh}} to prevent warning spew. The javac warning will be ignored automatically on the next build after submission.
[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5394: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816954#comment-13816954 ] Hadoop QA commented on HDFS-5476: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612754/HDFS-5476.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5359//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5359//console This message is automatically generated.
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816964#comment-13816964 ] Hudson commented on HDFS-5394: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4704 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4704/]) HDFS-5394: Fix race conditions in DN caching and uncaching (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1539909) * /hadoop/common/trunk/dev-support/test-patch.sh * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
[jira] [Created] (HDFS-5478) File size reports as zero after writing and calling FSDataOutputStream#hsync()
Brett Randall created HDFS-5478: --- Summary: File size reports as zero after writing and calling FSDataOutputStream#hsync() Key: HDFS-5478 URL: https://issues.apache.org/jira/browse/HDFS-5478 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Environment: RHEL/OEL 6u3 Reporter: Brett Randall Using a Java client to write to an FSDataOutputStream. After some data is written and hsync() is called, {{hdfs dfs -get /path/to/file}} gets a file containing the data written so far, all good. {{hdfs dfs -ls /path/to/file}}, however, reports a zero-byte file, presumably until the stream is closed (it then shows the correct size). Hue File Browser (running CDH4) also shows zero bytes until the stream is closed. See also http://grokbase.com/t/hadoop/hdfs-user/113j63nrce/zero-file-size-after-hsync which discusses the same problem. After the buffer is flushed it would be good if the reported file size were updated. -- This message was sent by Atlassian JIRA (v6.1#6144)
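The behavior reported here comes down to data durability versus length metadata: hsync() makes the written bytes durable on the datanodes, but the NameNode's recorded file length, which is what `-ls` reports, is by default updated only on close. (As an assumption worth verifying against the version in use, HdfsDataOutputStream can be asked to also update the length at sync time via SyncFlag.UPDATE_LENGTH.) The split can be modeled in plain Java, with no Hadoop dependency:

```java
// Toy model (not Hadoop code) of data durability vs. length metadata:
// hsync() makes the data durable but leaves the reported length alone,
// which is why `hdfs dfs -ls` can show zero bytes until close.
public class VisibleLengthModel {
    private final StringBuilder data = new StringBuilder(); // bytes "on the DNs"
    private long reportedLength = 0;  // what `-ls` would answer

    public void write(String bytes) { data.append(bytes); }

    // Durability only; the length metadata is not touched by default.
    public void hsync() { }

    // Analogous to syncing with an explicit update-length request.
    public void hsyncUpdateLength() { reportedLength = data.length(); }

    public void close() { reportedLength = data.length(); }

    public long lsLength() { return reportedLength; }
}
```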
[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5476: - Hadoop Flags: Reviewed +1 patch looks good.
[jira] [Created] (HDFS-5479) Fix test failures in Balancer.
Junping Du created HDFS-5479: Summary: Fix test failures in Balancer. Key: HDFS-5479 URL: https://issues.apache.org/jira/browse/HDFS-5479 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Junping Du Many test failures related to the balancer, as https://builds.apache.org/job/PreCommit-HDFS-Build/5360/#showFailuresLink shows. -- This message was sent by Atlassian JIRA (v6.1#6144)