[jira] [Commented] (HDFS-12984) BlockPoolSlice can leak in a mini dfs cluster
[ https://issues.apache.org/jira/browse/HDFS-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324092#comment-16324092 ]

Robert Joseph Evans commented on HDFS-12984:
--------------------------------------------

+1 for committing it.

> BlockPoolSlice can leak in a mini dfs cluster
> ---------------------------------------------
>
>                 Key: HDFS-12984
>                 URL: https://issues.apache.org/jira/browse/HDFS-12984
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.5
>            Reporter: Robert Joseph Evans
>            Assignee: Ajay Kumar
>         Attachments: HDFS-12984.001.patch, Screen Shot 2018-01-05 at 4.38.06 PM.png, Screen Shot 2018-01-05 at 5.26.54 PM.png, Screen Shot 2018-01-05 at 5.31.52 PM.png
>
> When running some unit tests for Storm we found that we would occasionally get out-of-memory errors in the HDFS integration tests.
> When I got a heap dump I found that the ShutdownHookManager was full of BlockPoolSlice$1 instances, which hold a reference to the BlockPoolSlice, which in turn holds a reference to the DataNode, and so on.
> It looks like when shutdown is called on the BlockPoolSlice there is no way to remove the shutdown hook, because no reference to it is saved.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12984) BlockPoolSlice can leak in a mini dfs cluster
[ https://issues.apache.org/jira/browse/HDFS-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317259#comment-16317259 ]

Robert Joseph Evans commented on HDFS-12984:
--------------------------------------------

Thanks [~ajayydv], looks good to me; I am +1.

[~kihwal], it has been a long time since I checked anything into Hadoop. Would you be willing to merge this in, and preferably take a look at it too?

> BlockPoolSlice can leak in a mini dfs cluster
> ---------------------------------------------
[jira] [Commented] (HDFS-12984) BlockPoolSlice can leak in a mini dfs cluster
[ https://issues.apache.org/jira/browse/HDFS-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316626#comment-16316626 ]

Robert Joseph Evans commented on HDFS-12984:
--------------------------------------------

[~ajayydv], I also ran into issues trying to reproduce this in some environments. Specifically, I could never make it happen on my MBP, and I don't know why. But if you look at the code inside BlockPoolSlice

https://github.com/apache/hadoop/blob/01f3f2167ec20b52a18bc2cf250fb4229cfd2c14/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L165-L173

if an instance of this is ever created, it can never be collected. I am not sure why BlockPoolSlice instances are created sometimes by a MiniDFSCluster and not others; I am not familiar enough with the internals of the DataNode to say off the top of my head.

Glad to see you going in the right direction, and I agree that removing everything from the ShutdownHooksManager is far from ideal, but I didn't see this happening, at least not with 2.7.5 and 2.6.2.

> BlockPoolSlice can leak in a mini dfs cluster
> ---------------------------------------------
[jira] [Created] (HDFS-12984) BlockPoolSlice can leak in a mini dfs cluster
Robert Joseph Evans created HDFS-12984:
------------------------------------------

             Summary: BlockPoolSlice can leak in a mini dfs cluster
                 Key: HDFS-12984
                 URL: https://issues.apache.org/jira/browse/HDFS-12984
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.7.5
            Reporter: Robert Joseph Evans

When running some unit tests for Storm we found that we would occasionally get out-of-memory errors in the HDFS integration tests.

When I got a heap dump I found that the ShutdownHookManager was full of BlockPoolSlice$1 instances, which hold a reference to the BlockPoolSlice, which in turn holds a reference to the DataNode, and so on.

It looks like when shutdown is called on the BlockPoolSlice there is no way to remove the shutdown hook, because no reference to it is saved.
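The fix implied by the description — keep a reference to the registered hook so shutdown can deregister it — can be sketched as follows. This is a minimal illustration, not Hadoop's actual code: `HookRegistry` is a hypothetical stand-in for Hadoop's ShutdownHookManager.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the leak pattern and its fix. HookRegistry is a hypothetical
// stand-in for Hadoop's ShutdownHookManager; the real API differs.
public class HookLeakSketch {
    static class HookRegistry {
        private final Set<Runnable> hooks = new HashSet<>();
        void register(Runnable r) { hooks.add(r); }
        void unregister(Runnable r) { hooks.remove(r); }
        int size() { return hooks.size(); }
    }

    static class Slice {
        private final HookRegistry registry;
        private final Runnable hook; // keep a reference to the hook...

        Slice(HookRegistry registry) {
            this.registry = registry;
            // A hook registered inline (new Runnable() { ... }) with no saved
            // reference can never be unregistered, so it pins the slice --
            // and everything the slice references -- until JVM exit.
            this.hook = () -> { /* flush dirty state, etc. */ };
            registry.register(hook);
        }

        void shutdown() {
            registry.unregister(hook); // ...so shutdown() can remove it
        }
    }

    public static void main(String[] args) {
        HookRegistry registry = new HookRegistry();
        Slice slice = new Slice(registry);
        slice.shutdown();
        System.out.println(registry.size()); // 0: nothing left pinning the slice
    }
}
```

In a long-lived process that repeatedly creates and tears down mini clusters (as in the Storm tests above), the unregister step is what lets each slice become collectable.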
[jira] [Updated] (HDFS-3594) ListPathsServlet should not log a warning for paths that do not exist
[ https://issues.apache.org/jira/browse/HDFS-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-3594:
--------------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Patch Available)

> ListPathsServlet should not log a warning for paths that do not exist
> ---------------------------------------------------------------------
>
>                 Key: HDFS-3594
>                 URL: https://issues.apache.org/jira/browse/HDFS-3594
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>         Attachments: HDFS-3594.patch, HDFS-3594.patch
>
> ListPathsServlet logs a warning message every time someone requests a listing for a directory that does not exist. This should be a debug or at most an info message, because this is expected behavior: people will ask for things that do not exist.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3594) ListPathsServlet should not log a warning for paths that do not exist
[ https://issues.apache.org/jira/browse/HDFS-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969564#comment-13969564 ]

Robert Joseph Evans commented on HDFS-3594:
-------------------------------------------

Yup, HADOOP-10015 makes the log statement a debug, so it is no longer an issue.

> ListPathsServlet should not log a warning for paths that do not exist
> ---------------------------------------------------------------------
[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI
[ https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886702#comment-13886702 ]

Robert Joseph Evans commented on HDFS-5852:
-------------------------------------------

+1 for HDFS-5852.best.txt. I love purple (Y!) :)

> Change the colors on the hdfs UI
> --------------------------------
>
>                 Key: HDFS-5852
>                 URL: https://issues.apache.org/jira/browse/HDFS-5852
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>              Labels: webui
>             Fix For: 2.3.0
>         Attachments: HDFS-5852.best.txt, HDFS-5852v2.txt, HDFS-5852v3-dkgreen.txt, color-rationale.png, compromise_gray.png, dkgreen.png, hdfs-5852.txt, new_hdfsui_colors.png
>
> The HDFS UI colors are too close to HWX green. Here is a patch that steers clear of vendor colors. I made it a blocker thinking this is something we'd want to fix before we release Apache Hadoop 2.3.0.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-4948) mvn site for hadoop-hdfs-nfs fails
Robert Joseph Evans created HDFS-4948:
------------------------------------------

             Summary: mvn site for hadoop-hdfs-nfs fails
                 Key: HDFS-4948
                 URL: https://issues.apache.org/jira/browse/HDFS-4948
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.0.0
            Reporter: Robert Joseph Evans

Running mvn site on trunk results in the following error.

{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (default) on project hadoop-hdfs-nfs: An Ant BuildException has occured: Warning: Could not find file /home/evans/src/hadoop-git/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/resources/hdfs-nfs-default.xml to copy. -> [Help 1]
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4731) Loading data from HDFS to tape
[ https://issues.apache.org/jira/browse/HDFS-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved HDFS-4731.
---------------------------------------
    Resolution: Invalid

Is this a question about how you would go about doing this? If so, please use u...@hadoop.apache.org instead; JIRA is not the place for this. If you are proposing a new feature to be added to Hadoop, please provide better details about the new feature. As this JIRA sounds a lot more like the former, I am closing it as invalid. If I am wrong and this is a new feature request, please feel free to reopen the JIRA.

> Loading data from HDFS to tape
> ------------------------------
>
>                 Key: HDFS-4731
>                 URL: https://issues.apache.org/jira/browse/HDFS-4731
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: prashanthi
>
> I want to load my HDFS data directly to a tape or external storage device. Please let me know if there is any way to do this.
[jira] [Commented] (HDFS-4632) globStatus using backslash for escaping does not work on Windows
[ https://issues.apache.org/jira/browse/HDFS-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612831#comment-13612831 ]

Robert Joseph Evans commented on HDFS-4632:
-------------------------------------------

HADOOP-8139 is another JIRA that had a lot of discussion about this subject. I believe that it was never resolved specifically because of the Windows issue. You probably want to read through the discussion there as well.

> globStatus using backslash for escaping does not work on Windows
> ----------------------------------------------------------------
>
>                 Key: HDFS-4632
>                 URL: https://issues.apache.org/jira/browse/HDFS-4632
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>
> {{Path}} normalizes backslashes to forward slashes on Windows. Later, when passed to {{FileSystem#globStatus}}, the path is no longer treated as an escape sequence.
[jira] [Commented] (HDFS-4199) Provide test for HdfsVolumeId
[ https://issues.apache.org/jira/browse/HDFS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504832#comment-13504832 ]

Robert Joseph Evans commented on HDFS-4199:
-------------------------------------------

The changes look good to me. I am a +1 on them, but I would like feedback from Andrew on this before I check it in.

> Provide test for HdfsVolumeId
> -----------------------------
>
>                 Key: HDFS-4199
>                 URL: https://issues.apache.org/jira/browse/HDFS-4199
>             Project: Hadoop HDFS
>          Issue Type: Test
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Ivan A. Veselovsky
>            Assignee: Ivan A. Veselovsky
>            Priority: Minor
>         Attachments: HADOOP-9053.patch, HDFS-4199--b.patch, HDFS-4199--c.patch, HDFS-4199--d.patch, HDFS-4199.patch
>
> Provide test for HdfsVolumeId to improve the code coverage.
[jira] [Commented] (HDFS-4199) Provide test for HdfsVolumeId
[ https://issues.apache.org/jira/browse/HDFS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500615#comment-13500615 ]

Robert Joseph Evans commented on HDFS-4199:
-------------------------------------------

The changes look fairly simple and straightforward, and they match the code. However, I am just a bit concerned that we are testing, and thereby locking in, functionality that is arguably wrong. We are testing that new HdfsVolumeId(A, false).equals(new HdfsVolumeId(A, true)). If you look at how the code actually works, it starts out by creating a bunch of invalid ids with null for the id, then goes off and replaces them with valid IDs once it finds them. I personally don't think that a valid volume ID should ever be equal to an invalid one. I added Andrew, who originally wrote this code, to see if he can take a look at it and tell us whether this is expected behavior or not.

> Provide test for HdfsVolumeId
> -----------------------------
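The equality concern above can be made concrete with a small sketch. `VolumeId` below is a hypothetical simplification for illustration, not the real HdfsVolumeId class: it shows the alternative semantics being argued for, where the validity flag participates in equals() and hashCode() so a valid id never equals an invalid one.

```java
import java.util.Arrays;

// Hypothetical simplification of HdfsVolumeId. If equals() compared only
// the id bytes, new VolumeId(A, false) would equal new VolumeId(A, true);
// including the validity flag in equals()/hashCode() avoids that.
public final class VolumeId {
    private final byte[] id;
    private final boolean valid;

    public VolumeId(byte[] id, boolean valid) {
        this.id = id == null ? null : id.clone();
        this.valid = valid;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VolumeId)) return false;
        VolumeId other = (VolumeId) o;
        // A valid volume id is never equal to an invalid one.
        return valid == other.valid && Arrays.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        // Keep hashCode() consistent with equals(): mix in the flag too.
        return 31 * Arrays.hashCode(id) + (valid ? 1 : 0);
    }
}
```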
[jira] [Updated] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-4182:
--------------------------------------
    Attachment: HDFS-4182.txt

This patch fixes an NPE that was found by the existing unit tests and adds some more tests to validate that the changes are working. I also manually brought up a cluster and saw that the NameCache moved out of initializing.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
>
>                 Key: HDFS-4182
>                 URL: https://issues.apache.org/jira/browse/HDFS-4182
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.4, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>         Attachments: HDFS-4182.txt, HDFS-4182.txt
>
> We recently saw an issue where a 2NN ran out of memory, even though it had a relatively small fsimage. When we looked at the heap dump, we saw that all of the memory had gone to entries in the NameCache. It appears that the NameCache is staying in initializing mode forever, and therefore a long-running 2NN leaks entries.
[jira] [Updated] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-4182:
--------------------------------------
    Attachment: HDFS-4182.txt

Adds the requested annotation.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Updated] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-4182:
--------------------------------------
        Status: Patch Available  (was: Open)

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Updated] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-4182:
--------------------------------------
    Attachment: HDFS-4182.txt

Updated to address Suresh's comments.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Commented] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497363#comment-13497363 ]

Robert Joseph Evans commented on HDFS-4182:
-------------------------------------------

Jenkins came back with a +1, and with a +1 from Suresh and a +1 from Kihwal, I will check this in.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Comment Edited] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497363#comment-13497363 ]

Robert Joseph Evans edited comment on HDFS-4182 at 11/14/12 7:17 PM:
---------------------------------------------------------------------

Jenkins came back with a +1, and with a +1 from Suresh and a +1 from Kihwal, I will check this in.

was (Author: revans2):
Jenkins came back with a +1 and with a +1 for Surash and a +1 for Kihwal, I will check this in.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Commented] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497438#comment-13497438 ]

Robert Joseph Evans commented on HDFS-4182:
-------------------------------------------

The patch is in for trunk and branch-2. I am working on a patch for branch-0.23 because there were merge conflicts. It looks like the leak exists there too.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Updated] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-4182:
--------------------------------------
    Attachment: HDFS-4182-branch-0.23.txt

I am attaching the upmerged patch for review. I am still running the unit tests and manual tests to be sure that the leak is plugged.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Commented] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497529#comment-13497529 ]

Robert Joseph Evans commented on HDFS-4182:
-------------------------------------------

All of the unit tests for branch-0.23 passed, so with Daryn's +1 I'll check this into branch-0.23 too.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Updated] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-4182:
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.23.5
                   2.0.3-alpha
                   3.0.0
           Status: Resolved  (was: Patch Available)

I put this into trunk, branch-2, and branch-0.23.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Commented] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496262#comment-13496262 ]

Robert Joseph Evans commented on HDFS-4182:
-------------------------------------------

Todd, are you working on a patch for this? It seems critical enough that I really would like to get a patch in for 0.23.5, but I also don't want to start working on something if you are already doing it.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Updated] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HDFS-4182:
--------------------------------------
    Attachment: HDFS-4182.txt

The patch I am attaching does not include any tests yet; I wanted to see if the direction I was going in seemed OK. I changed FSDirectory.reset to also reset the NameCache and mark the directory as not ready. Then, in the SecondaryNameNode, after loading the new image it informs the FSDirectory that the image was loaded. I am going to run some manual tests and then see if I can write some unit tests for it.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
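The reset-on-image-load approach described above can be sketched like this. `InternCache` is a hypothetical, heavily simplified model of the NameCache's "initializing" mode (the promotion logic is omitted), used only to show why a cache that never leaves initialization leaks and why resetting it on each checkpoint releases the entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of the NameCache's initializing mode: while
// initializing, every name seen is remembered so frequent ones can later be
// interned. If initialization never ends (as in the long-running 2NN), the
// transient map grows with every checkpoint; resetting it whenever a fresh
// image is about to be loaded releases those entries.
public class InternCache {
    private Map<String, Integer> transientCounts = new HashMap<>();
    private boolean initialized = false;

    public void put(String name) {
        if (!initialized) {
            // Leak source: unbounded growth if initialized never becomes true.
            transientCounts.merge(name, 1, Integer::sum);
        }
    }

    public int transientSize() {
        return transientCounts.size();
    }

    // Called from a (hypothetical) FSDirectory.reset() before each
    // checkpoint image load: drop all transient entries.
    public void reset() {
        transientCounts = new HashMap<>();
        initialized = false;
    }

    // Called once the image has been loaded: stop tracking new names.
    public void imageLoaded() {
        initialized = true;
    }
}
```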
[jira] [Commented] (HDFS-4182) SecondaryNameNode leaks NameCache entries
[ https://issues.apache.org/jira/browse/HDFS-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496586#comment-13496586 ]

Robert Joseph Evans commented on HDFS-4182:
-------------------------------------------

Ya, I thought about disabling the NameCache, because it is not really needed. If you think that would be less of an impact I am happy to switch over to that instead.

> SecondaryNameNode leaks NameCache entries
> -----------------------------------------
[jira] [Commented] (HDFS-4172) namenode does not URI-encode parameters when building URI for datanode request
[ https://issues.apache.org/jira/browse/HDFS-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495441#comment-13495441 ]

Robert Joseph Evans commented on HDFS-4172:
-------------------------------------------

I only have two very minor comments:
# There are tabs in the code in a few places (mostly in toValueString()).
# In StringParam.toValueString() it is probably not necessary to call value.toString(). Very minor.

> namenode does not URI-encode parameters when building URI for datanode request
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-4172
>                 URL: https://issues.apache.org/jira/browse/HDFS-4172
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.4
>            Reporter: Derek Dagit
>            Assignee: Derek Dagit
>            Priority: Minor
>         Attachments: HDFS-4172.patch
>
> Param values such as foo&bar or foo=bar are not escaped in Param.toSortedString(). When these are given as, say, token parameter values, a string like token=foo&bar&token=foo=bar is returned.
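The ambiguity described in the issue can be seen with the standard `java.net.URLEncoder`. `buildQuery` below is a hypothetical helper for illustration, not the actual WebHDFS Param code: it shows that percent-encoding the value before joining it into the query string removes the ambiguity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch of the bug: a parameter value containing '&' or '=' must be
// percent-encoded before it is joined into a query string, or the result
// is ambiguous to the receiving datanode.
public class ParamEncoding {
    static String buildQuery(String name, String value)
            throws UnsupportedEncodingException {
        return name + "=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Unescaped, token=foo=bar leaves the value ambiguous (is it "foo"
        // or "foo=bar"?); encoded, the '=' inside the value is unmistakable.
        System.out.println(buildQuery("token", "foo=bar")); // token=foo%3Dbar
        System.out.println(buildQuery("token", "foo&bar")); // token=foo%26bar
    }
}
```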
[jira] [Commented] (HDFS-4172) namenode does not URI-encode parameters when building URI for datanode request
[ https://issues.apache.org/jira/browse/HDFS-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13495447#comment-13495447 ] Robert Joseph Evans commented on HDFS-4172: --- I am not sure what happened with the test failure. It timed out in Jenkins, but when I run it manually with the patch it passes. I ran it 4 times to be sure. namenode does not URI-encode parameters when building URI for datanode request -- Key: HDFS-4172 URL: https://issues.apache.org/jira/browse/HDFS-4172 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.4 Reporter: Derek Dagit Assignee: Derek Dagit Priority: Minor Attachments: HDFS-4172.patch Param values such as foobar or foo=bar Are not escaped in Param.toSortedString() When these are given as, say, token parameter values, a string like token=foobartoken=foo=bar is returned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4172) namenode does not URI-encode parameters when building URI for datanode request
[ https://issues.apache.org/jira/browse/HDFS-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13495501#comment-13495501 ] Robert Joseph Evans commented on HDFS-4172: --- The new patch looks good to me. I am +1. I'll check it in. namenode does not URI-encode parameters when building URI for datanode request -- Key: HDFS-4172 URL: https://issues.apache.org/jira/browse/HDFS-4172 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.4 Reporter: Derek Dagit Assignee: Derek Dagit Priority: Minor Attachments: HDFS-4172.patch Param values such as foobar or foo=bar Are not escaped in Param.toSortedString() When these are given as, say, token parameter values, a string like token=foobartoken=foo=bar is returned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4172) namenode does not URI-encode parameters when building URI for datanode request
[ https://issues.apache.org/jira/browse/HDFS-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-4172: -- Resolution: Fixed Fix Version/s: 0.23.5 2.0.3-alpha 3.0.0 Status: Resolved (was: Patch Available) Thanks Derek, I put this into trunk, branch-2, and branch-0.23 namenode does not URI-encode parameters when building URI for datanode request -- Key: HDFS-4172 URL: https://issues.apache.org/jira/browse/HDFS-4172 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.4 Reporter: Derek Dagit Assignee: Derek Dagit Priority: Minor Fix For: 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: HDFS-4172.patch Param values such as foobar or foo=bar Are not escaped in Param.toSortedString() When these are given as, say, token parameter values, a string like token=foobartoken=foo=bar is returned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4172) namenode does not URI-encode parameters when building URI for datanode request
[ https://issues.apache.org/jira/browse/HDFS-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494416#comment-13494416 ] Robert Joseph Evans commented on HDFS-4172: --- The patch looks good to me. +1 pending Jenkins. namenode does not URI-encode parameters when building URI for datanode request -- Key: HDFS-4172 URL: https://issues.apache.org/jira/browse/HDFS-4172 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.4 Reporter: Derek Dagit Assignee: Derek Dagit Priority: Minor Attachments: HDFS-4172.patch Param values such as foobar or foo=bar Are not escaped in Param.toSortedString() When these are given as, say, token parameter values, a string like token=foobartoken=foo=bar is returned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK
[ https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened HDFS-3809: --- Branch-2 is failing with {noformat} main: [exec] bkjournal.proto:30:12: NamespaceInfoProto is not defined. {noformat} after this was merged in. Please either fix it or revert the change. Make BKJM use protobufs for all serialization with ZK - Key: HDFS-3809 URL: https://issues.apache.org/jira/browse/HDFS-3809 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.0.0, 2.0.3-alpha Attachments: 0004-HDFS-3809-for-branch-2.patch, HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff HDFS uses protobufs for serialization in many places. Protobufs allow fields to be added without breaking bc or requiring new parsing code to be written. For this reason, we should use them in BKJM also. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3809) Make BKJM use protobufs for all serialization with ZK
[ https://issues.apache.org/jira/browse/HDFS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487094#comment-13487094 ] Robert Joseph Evans commented on HDFS-3809: --- Thanks for doing that Uma. It looks like there is something about the build scripts that is causing it, because hdfs.proto, where NameSpaceInfoProto is defined, is more or less identical between trunk and branch-2. Make BKJM use protobufs for all serialization with ZK - Key: HDFS-3809 URL: https://issues.apache.org/jira/browse/HDFS-3809 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.0.0, 2.0.3-alpha Attachments: 0004-HDFS-3809-for-branch-2.patch, HDFS-3809.diff, HDFS-3809.diff, HDFS-3809.diff HDFS uses protobufs for serialization in many places. Protobufs allow fields to be added without breaking bc or requiring new parsing code to be written. For this reason, we should use them in BKJM also. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3996) Add debug log removed in HDFS-3873 back
[ https://issues.apache.org/jira/browse/HDFS-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3996: -- Fix Version/s: 0.23.5 I pulled this into branch-0.23 too. Add debug log removed in HDFS-3873 back --- Key: HDFS-3996 URL: https://issues.apache.org/jira/browse/HDFS-3996 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Fix For: 2.0.3-alpha, 0.23.5 Attachments: hdfs-3996.txt Per HDFS-3873 let's add the debug log back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3483) Better error message when hdfs fsck is run against a ViewFS config
[ https://issues.apache.org/jira/browse/HDFS-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3483: -- Fix Version/s: 0.23.5 I pulled this into branch-0.23 too. Better error message when hdfs fsck is run against a ViewFS config -- Key: HDFS-3483 URL: https://issues.apache.org/jira/browse/HDFS-3483 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Stephen Chu Assignee: Stephen Fritz Labels: newbie Fix For: 2.0.3-alpha, 0.23.5 Attachments: core-site.xml, HDFS-3483.patch, hdfs-site.xml I'm running a HA + secure + federated cluster. When I run hdfs fsck /nameservices/ha-nn-uri/, I see the following: bash-3.2$ hdfs fsck /nameservices/ha-nn-uri/ FileSystem is viewfs://oracle/ DFSck exiting. Any path I enter will return the same message. Attached are my core-site.xml and hdfs-site.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4016) back-port HDFS-3582 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477051#comment-13477051 ] Robert Joseph Evans commented on HDFS-4016: --- The patch looks good to me. I am running the unit tests and I will try to bring up a small cluster. If everything goes OK I'll check it in. +1 Thanks for the work Ivan. back-port HDFS-3582 to branch-0.23 -- Key: HDFS-4016 URL: https://issues.apache.org/jira/browse/HDFS-4016 Project: Hadoop HDFS Issue Type: Bug Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky Priority: Minor Attachments: HDFS-4016-branch-0.23.patch We suggest a patch that back-ports the change https://issues.apache.org/jira/browse/HDFS-3582 to branch 0.23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4016) back-port HDFS-3582 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-4016: -- Resolution: Fixed Fix Version/s: 0.23.5 Status: Resolved (was: Patch Available) Thanks again Ivan. I put this into branch-0.23 back-port HDFS-3582 to branch-0.23 -- Key: HDFS-4016 URL: https://issues.apache.org/jira/browse/HDFS-4016 Project: Hadoop HDFS Issue Type: Bug Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky Priority: Minor Fix For: 0.23.5 Attachments: HDFS-4016-branch-0.23.patch We suggest a patch that back-ports the change https://issues.apache.org/jira/browse/HDFS-3582 to branch 0.23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3224) Bug in check for DN re-registration with different storage ID
[ https://issues.apache.org/jira/browse/HDFS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13474215#comment-13474215 ] Robert Joseph Evans commented on HDFS-3224: --- It looks like a clean port to 0.23. +1 feel free to check it in. Bug in check for DN re-registration with different storage ID - Key: HDFS-3224 URL: https://issues.apache.org/jira/browse/HDFS-3224 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins Assignee: Jason Lowe Priority: Minor Fix For: 2.0.3-alpha Attachments: HDFS-3224-branch0.23.patch, HDFS-3224.patch, HDFS-3224.patch, HDFS-3224.patch, HDFS-3224.patch DatanodeManager#registerDatanode checks the host to node map using an IP:port key, however the map is keyed on IP, so this check will always fail. It's performing the check to determine if a DN with the same IP and storage ID has already registered, and if so to remove this DN from the map and indicate that eg it's no longer hosting these blocks. This bug has been here forever. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4016) back-port HDFS-3582 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-4016: -- Target Version/s: 0.23.5 back-port HDFS-3582 to branch-0.23 -- Key: HDFS-4016 URL: https://issues.apache.org/jira/browse/HDFS-4016 Project: Hadoop HDFS Issue Type: Bug Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky Priority: Minor Attachments: HDFS-4016-branch-0.23.patch We suggest a patch that back-ports the change https://issues.apache.org/jira/browse/HDFS-3582 to branch 0.23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4016) back-port HDFS-3582 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471652#comment-13471652 ] Robert Joseph Evans commented on HDFS-4016: --- There seems to be a lot in this patch that is not in the original. It looks like you pulled in Time.java so that you could also update GenericTestUtils.waitFor, which is not related to HDFS-3582. In fact, it looks like GenericTestUtils does not need to be updated at all. Please revert it and Time.java, so that if we ever do decide to port HDFS-3641 and others it will not be so confusing or difficult. The same goes for DFSConfigKeys.java: none of the changes in there are used at all. Also, the isActive method inside FSEditLog.java looks like it can still be marked as private. Other than that, the port looks good. back-port HDFS-3582 to branch-0.23 -- Key: HDFS-4016 URL: https://issues.apache.org/jira/browse/HDFS-4016 Project: Hadoop HDFS Issue Type: Bug Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky Priority: Minor Attachments: HDFS-4016-branch-0.23.patch We suggest a patch that back-ports the change https://issues.apache.org/jira/browse/HDFS-3582 to branch 0.23.
[jira] [Updated] (HDFS-3919) MiniDFSCluster:waitClusterUp can hang forever
[ https://issues.apache.org/jira/browse/HDFS-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3919: -- Fix Version/s: 0.23.5 I just pulled this into branch-0.23 MiniDFSCluster:waitClusterUp can hang forever - Key: HDFS-3919 URL: https://issues.apache.org/jira/browse/HDFS-3919 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.1-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Priority: Minor Fix For: 2.0.3-alpha, 0.23.5 Attachments: hdfs3919.txt A test run hung due to a known system config issue, but the hang was interesting: {noformat} 2012-09-11 13:22:41,888 WARN hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(925)) - Waiting for the Mini HDFS Cluster to start... 2012-09-11 13:22:42,889 WARN hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(925)) - Waiting for the Mini HDFS Cluster to start... 2012-09-11 13:22:43,889 WARN hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(925)) - Waiting for the Mini HDFS Cluster to start... 2012-09-11 13:22:44,890 WARN hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(925)) - Waiting for the Mini HDFS Cluster to start... {noformat} The MiniDFSCluster should give up after a few seconds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3373) FileContext HDFS implementation can leak socket caches
[ https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465670#comment-13465670 ] Robert Joseph Evans commented on HDFS-3373: --- The 0.23 patch looks like a fairly straightforward port of the trunk version, but what happened to hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestSocketCache.java? FileContext HDFS implementation can leak socket caches -- Key: HDFS-3373 URL: https://issues.apache.org/jira/browse/HDFS-3373 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: John George Fix For: 2.0.3-alpha Attachments: HDFS-3373.branch-23.patch, HDFS-3373.branch23.patch, HDFS-3373.trunk.patch, HDFS-3373.trunk.patch.1, HDFS-3373.trunk.patch.2, HDFS-3373.trunk.patch.3, HDFS-3373.trunk.patch.3, HDFS-3373.trunk.patch.4 As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, and thus never calls DFSClient.close(). This means that, until finalizers run, DFSClient will hold on to its SocketCache object and potentially have a lot of outstanding sockets/fds held on to.
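The leak pattern in HDFS-3373 — a cached resource released only in close(), behind an API that exposes no close() — can be modeled in a few lines. This is a toy sketch, not the DFSClient or FileContext code:

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class LeakySocketCacheDemo {
    // Toy model of the leak: a client holds a socket cache that is only
    // released in close(). An API with no close() method (like FileContext
    // before this fix) keeps the cache alive until the finalizer
    // eventually runs -- if it ever does.
    static final AtomicInteger openCaches = new AtomicInteger();

    static class Client implements Closeable {
        Client() { openCaches.incrementAndGet(); }
        @Override public void close() { openCaches.decrementAndGet(); }
    }

    public static void main(String[] args) {
        // With an explicit close() available, try-with-resources releases
        // the cache deterministically instead of waiting on finalization.
        try (Client c = new Client()) {
            // work with the client...
        }
        System.out.println(openCaches.get());
    }
}
```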
[jira] [Commented] (HDFS-3373) FileContext HDFS implementation can leak socket caches
[ https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465721#comment-13465721 ] Robert Joseph Evans commented on HDFS-3373: --- Makes sense. Because it is such a straightforward patch, I feel OK checking the code in. Thanks for the work John. FileContext HDFS implementation can leak socket caches -- Key: HDFS-3373 URL: https://issues.apache.org/jira/browse/HDFS-3373 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: John George Fix For: 2.0.3-alpha Attachments: HDFS-3373.branch-23.patch, HDFS-3373.branch23.patch, HDFS-3373.trunk.patch, HDFS-3373.trunk.patch.1, HDFS-3373.trunk.patch.2, HDFS-3373.trunk.patch.3, HDFS-3373.trunk.patch.3, HDFS-3373.trunk.patch.4 As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, and thus never calls DFSClient.close(). This means that, until finalizers run, DFSClient will hold on to its SocketCache object and potentially have a lot of outstanding sockets/fds held on to.
[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches
[ https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3373: -- Resolution: Fixed Fix Version/s: 0.23.4 Target Version/s: 2.0.0-alpha, 0.23.3 (was: 0.23.3, 2.0.0-alpha) Status: Resolved (was: Patch Available) I pulled this into branch-0.23 too FileContext HDFS implementation can leak socket caches -- Key: HDFS-3373 URL: https://issues.apache.org/jira/browse/HDFS-3373 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: John George Fix For: 0.23.4, 2.0.3-alpha Attachments: HDFS-3373.branch-23.patch, HDFS-3373.branch23.patch, HDFS-3373.trunk.patch, HDFS-3373.trunk.patch.1, HDFS-3373.trunk.patch.2, HDFS-3373.trunk.patch.3, HDFS-3373.trunk.patch.3, HDFS-3373.trunk.patch.4 As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, and thus never calls DFSClient.close(). This means that, until finalizers run, DFSClient will hold on to its SocketCache object and potentially have a lot of outstanding sockets/fds held on to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3831) Failure to renew tokens due to test-sources left in classpath
[ https://issues.apache.org/jira/browse/HDFS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464895#comment-13464895 ] Robert Joseph Evans commented on HDFS-3831: --- The change looks fine to me. It is simple and removes the need for Mockito/Junit from the FakeRenewer. +1 I would like to ultimately see the tests removed from the classpath. But that can happen later as we try to clean up the classpath in general. I'll check this in. Failure to renew tokens due to test-sources left in classpath - Key: HDFS-3831 URL: https://issues.apache.org/jira/browse/HDFS-3831 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: HDFS-3831.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3831) Failure to renew tokens due to test-sources left in classpath
[ https://issues.apache.org/jira/browse/HDFS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3831: -- Resolution: Fixed Fix Version/s: 2.0.3-alpha 3.0.0 0.23.4 Status: Resolved (was: Patch Available) Thanks Jason, I put this into trunk, branch-2, and branch-0.23 Failure to renew tokens due to test-sources left in classpath - Key: HDFS-3831 URL: https://issues.apache.org/jira/browse/HDFS-3831 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 0.23.4, 3.0.0, 2.0.3-alpha Attachments: HDFS-3831.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
[ https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3860: -- Fix Version/s: 0.23.4 I pulled this into branch-0.23 too HeartbeatManager#Monitor may wrongly hold the writelock of namesystem - Key: HDFS-3860 URL: https://issues.apache.org/jira/browse/HDFS-3860 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 0.23.4, 2.0.2-alpha Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the monitor thread will acquire the write lock of namesystem, and recheck the safemode. If it is in safemode, the monitor thread will return from the heartbeatCheck function without release the write lock. This may cause the monitor thread wrongly holding the write lock forever. The attached test case tries to simulate this bad scenario. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3626) Creating file with invalid path can corrupt edit log
[ https://issues.apache.org/jira/browse/HDFS-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3626: -- Fix Version/s: (was: 0.23.3) 0.23.4 Creating file with invalid path can corrupt edit log Key: HDFS-3626 URL: https://issues.apache.org/jira/browse/HDFS-3626 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.23.4, 3.0.0, 2.0.2-alpha Attachments: hdfs-3626.txt, hdfs-3626.txt, hdfs-3626.txt, hdfs-3626.txt Joris Bontje reports the following: The following command results in a corrupt NN editlog (note the double slash and reading from stdin): $ cat /usr/share/dict/words | hadoop fs -put - hdfs://localhost:8020//path/file After this, restarting the namenode will result into the following fatal exception: {code} 2012-07-10 06:29:19,910 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_173-188 expecting start txid #173 2012-07-10 06:29:19,912 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation MkdirOp [length=0, path=/, timestamp=1341915658216, permissions=cloudera:supergroup:rwxr-xr-x, opCode=OP_MKDIR, txid=182] java.lang.ArrayIndexOutOfBoundsException: -1 {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
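The trigger for HDFS-3626 is a path containing a doubled slash (`//path/file`) reaching the edit log and producing a bogus MkdirOp for `/`. A minimal sketch of the kind of normalization check that prevents this class of bug — illustrative only, not the actual NameNode path-validation code:

```java
public class PathNormalizer {
    // Hypothetical illustration: collapse repeated separators before a
    // path reaches the edit log, so "//path/file" cannot be logged as a
    // directory operation on "/".
    static String normalize(String path) {
        String collapsed = path.replaceAll("/+", "/");
        // keep the root itself, but strip any other trailing slash
        if (collapsed.length() > 1 && collapsed.endsWith("/")) {
            collapsed = collapsed.substring(0, collapsed.length() - 1);
        }
        return collapsed;
    }

    public static void main(String[] args) {
        System.out.println(normalize("//path/file"));
    }
}
```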
[jira] [Updated] (HDFS-3553) Hftp proxy tokens are broken
[ https://issues.apache.org/jira/browse/HDFS-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3553: -- Fix Version/s: (was: 0.23.3) 0.23.4 Hftp proxy tokens are broken Key: HDFS-3553 URL: https://issues.apache.org/jira/browse/HDFS-3553 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.2, 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Fix For: 0.23.4, 3.0.0, 2.0.2-alpha Attachments: HDFS-3553-1.branch-1.0.patch, HDFS-3553-2.branch-1.0.patch, HDFS-3553-3.branch-1.0.patch, HDFS-3553.branch-1.0.patch, HDFS-3553.branch-23.patch, HDFS-3553.trunk.patch Proxy tokens are broken for hftp. The impact is systems using proxy tokens, such as oozie jobs, cannot use hftp. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3108) [UI] Few Namenode links are not working
[ https://issues.apache.org/jira/browse/HDFS-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3108: -- Fix Version/s: (was: 0.23.3) 0.23.4 [UI] Few Namenode links are not working --- Key: HDFS-3108 URL: https://issues.apache.org/jira/browse/HDFS-3108 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.23.1 Reporter: Brahma Reddy Battula Priority: Minor Fix For: 0.23.4 Attachments: Scenario2_Trace.txt Scenario 1 == Once tail a file from UI and click on Go Back to File View,I am getting HTTP ERROR 404 Scenario 2 === Frequently I am getting following execption If a click on (BrowseFileSystem or anyfile)java.lang.IllegalArgumentException: java.net.UnknownHostException: HOST-10-18-40-24 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3087) Decomissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3087: -- Fix Version/s: (was: 0.23.3) 0.23.4 Decomissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 0.23.0, 0.24.0, 0.23.2, 0.23.4 If a data node is added to the exclude list and the name node is restarted, the decomissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decomissioning completes very quick, without replicating the blocks on that node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3973) Old trash directories are never deleted on upgrade from 1.x
Robert Joseph Evans created HDFS-3973: - Summary: Old trash directories are never deleted on upgrade from 1.x Key: HDFS-3973 URL: https://issues.apache.org/jira/browse/HDFS-3973 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3, 2.0.2-alpha Reporter: Robert Joseph Evans The older format of the trash checkpoint for 1.x is yyMMddHHmm; the new format is yyMMddHHmmss(-\d+)?. So if you upgrade from an old cluster to a new one, all of the entries in .trash will never be deleted, because they are currently always ignored on deletion. We should support deleting the older format as well.
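The two checkpoint-directory formats from HDFS-3973 can be matched side by side. This is a sketch of the recognition logic only — hypothetical class and method names, not the actual Trash emptier code:

```java
import java.util.regex.Pattern;

public class TrashCheckpointFormats {
    // Old (1.x) checkpoint directory names look like yyMMddHHmm
    // (10 digits); newer ones look like yyMMddHHmmss with an optional
    // -N suffix. A deleter that honors both formats could match names
    // against these patterns.
    static final Pattern OLD_FORMAT = Pattern.compile("^\\d{10}$");
    static final Pattern NEW_FORMAT = Pattern.compile("^\\d{12}(-\\d+)?$");

    static boolean isCheckpoint(String name) {
        return OLD_FORMAT.matcher(name).matches()
            || NEW_FORMAT.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCheckpoint("1210011430"));     // old format
        System.out.println(isCheckpoint("121001143059-1")); // new format
    }
}
```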
[jira] [Commented] (HDFS-3971) Add a resume feature to the copyFromLocal and put commands
[ https://issues.apache.org/jira/browse/HDFS-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463192#comment-13463192 ] Robert Joseph Evans commented on HDFS-3971: --- It almost sounds like you want to turn this into something like rsync. I think it would be much more useful to just add in an rsync command with a similar set of features and flags rather than trying to reinvent it piecemeal. Then it can look at time stamps on the files, and possibly checksums as well, to pick up where it left off on a failure. Add a resume feature to the copyFromLocal and put commands -- Key: HDFS-3971 URL: https://issues.apache.org/jira/browse/HDFS-3971 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 2.0.1-alpha Reporter: Adam Muise Priority: Minor Fix For: 2.0.1-alpha Add a resume feature to the copyFromLocal command. Failures in large transfers result in a great deal of wasted time. For large files, it would be good to be able to continue from the last good block onwards. The file would have to be unavailable to other clients for reads or regular writes until the resume process was completed.
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448781#comment-13448781 ] Robert Joseph Evans commented on HDFS-3731: --- Thanks for reassigning this to me. I have been distracted by a number of other things, but I should get back to it shortly. 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Robert Joseph Evans Priority: Blocker Fix For: 2.2.0-alpha Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. The {{DataNode}} will only have one block pool after upgrading from a 1.x release. (This is because in the 1.x releases, there were no block pools-- or equivalently, everything was in the same block pool). During the upgrade, we should hardlink the block files from the {{blocksBeingWritten}} directory into the {{rbw}} directory of this block pool. Similarly, on {{-finalize}}, we should delete the {{blocksBeingWritten}} directory.
[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446386#comment-13446386 ] Robert Joseph Evans commented on HDFS-3873: --- It looks good to me. I am not really an expert on HFTP, but this is a simple enough change that I feel OK giving it a +1; please use your discretion before checking it in. I am not sure why Jenkins ran the tests again and they failed, but when I run them with your patch they pass, except for TestHftpDelegationToken, which is a known issue. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443221#comment-13443221 ] Robert Joseph Evans commented on HDFS-3731: --- Any update on branch-0.23? Do you want me to look into it? 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Colin Patrick McCabe Priority: Blocker Fix For: 2.2.0-alpha Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. The {{DataNode}} will only have one block pool after upgrading from a 1.x release. (This is because in the 1.x releases, there were no block pools-- or equivalently, everything was in the same block pool). During the upgrade, we should hardlink the block files from the {{blocksBeingWritten}} directory into the {{rbw}} directory of this block pool. Similarly, on {{-finalize}}, we should delete the {{blocksBeingWritten}} directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443551#comment-13443551 ] Robert Joseph Evans commented on HDFS-3731: --- Do you have a list of ones you know about? If not I can start pulling on that thread tomorrow. 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Colin Patrick McCabe Priority: Blocker Fix For: 2.2.0-alpha Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. The {{DataNode}} will only have one block pool after upgrading from a 1.x release. (This is because in the 1.x releases, there were no block pools-- or equivalently, everything was in the same block pool). During the upgrade, we should hardlink the block files from the {{blocksBeingWritten}} directory into the {{rbw}} directory of this block pool. Similarly, on {{-finalize}}, we should delete the {{blocksBeingWritten}} directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3841) Port HDFS-3835 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved HDFS-3841. --- Resolution: Fixed Fix Version/s: 0.23.3 Daryn just checked this in to branch-0.23 Port HDFS-3835 to branch-0.23 - Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.3 Attachments: HDFS-3841.txt, HDFS-3841.txt, HDFS-3841.txt HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3841) Port HDFS-3835 to branch-0.23
Robert Joseph Evans created HDFS-3841: - Summary: Port HDFS-3835 to branch-0.23 Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3841) Port HDFS-3835 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3841: -- Attachment: HDFS-3841.txt This patch only applies to branch-0.23. The main difference between this patch and HDFS-3835 is that the DelegationTokenSecretManager is in a different location, so FSImage was modified to use the new location. Also, the tests do not compile because HDFS-2579 is not part of 0.23, so DFS_NAMENODE_DELEGATION_TOKEN_ALWAYS_USE_KEY does not work. In response I removed the test. It seemed riskier to try to pull out DFS_NAMENODE_DELEGATION_TOKEN_ALWAYS_USE_KEY support than to simply remove the test. Port HDFS-3835 to branch-0.23 - Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3841.txt HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3841) Port HDFS-3835 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3841: -- Status: Patch Available (was: Open) test-patch is not going to work. I have run several of the HDFS tests manually without any failures. I will update the JIRA once my test run completes. Port HDFS-3835 to branch-0.23 - Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3841.txt HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3841) Port HDFS-3835 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439823#comment-13439823 ] Robert Joseph Evans commented on HDFS-3841: --- You are correct, my bad. I commented it out to validate that it runs, and I forgot to remove it. I'll upload a new patch. Thanks for the catch. Port HDFS-3835 to branch-0.23 - Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3841.txt HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3841) Port HDFS-3835 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3841: -- Attachment: HDFS-3841.txt Patch updated without test. Port HDFS-3835 to branch-0.23 - Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3841.txt, HDFS-3841.txt HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3841) Port HDFS-3835 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3841: -- Attachment: HDFS-3841.txt Patch with space between if and ( Port HDFS-3835 to branch-0.23 - Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3841.txt, HDFS-3841.txt, HDFS-3841.txt HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3843) Large dist cache can block tasktracker heartbeat
Robert Joseph Evans created HDFS-3843: - Summary: Large dist cache can block tasktracker heartbeat Key: HDFS-3843 URL: https://issues.apache.org/jira/browse/HDFS-3843 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0, 0.20.205.0 Reporter: Robert Joseph Evans -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3843) Large dist cache can block tasktracker heartbeat
[ https://issues.apache.org/jira/browse/HDFS-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439874#comment-13439874 ] Robert Joseph Evans commented on HDFS-3843: --- MAPREDUCE-2494 introduced a new lock when releasing a dist cache entry, and that lock introduced this problem. Thanks to Koji for finding and debugging this. Essentially the heartbeat thread holds a lock on the TaskTracker object. So does the job cleanup thread, which also holds the TrackerDistributedCacheManager's big list lock (this is the lock that MAPREDUCE-2494 added). The thread that deletes things from the dist cache also grabs that big lock, and at the same time grabs locks in turn for every entry in the dist cache. While an entry in the dist cache is being downloaded, the downloading thread also holds the lock for that dist cache entry. So this can result in the following chain: the downloading thread holds a dist cache entry lock, which blocks the dist cache delete thread, which holds the full dist cache map lock, which blocks the job cleanup thread, which holds the TaskTracker lock, which blocks the heartbeat thread. This can be seen below. I think it is probably best to change the dist cache entries' locks so that, when we go to delete an entry whose lock is held, we skip that entry instead of blocking on it. {noformat} Here, tracing from the heartbeat thread. 
1= main prio=10 tid=0x0875c400 nid=0x3fca waiting for monitor entry [0xf73e6000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1790) - waiting to lock 0xb4299248 (a org.apache.hadoop.mapred.TaskTracker) at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653) at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744) Looking for lock 0xb4299248 2= taskCleanup daemon prio=10 tid=0x0949ac00 nid=0x405c waiting for monitor entry [0xadead000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CacheStatus.decRefCount(TrackerDistributedCacheManager.java:597) - waiting to lock 0xb4214308 (a java.util.LinkedHashMap) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.releaseCache(TrackerDistributedCacheManager.java:233) at org.apache.hadoop.filecache.TaskDistributedCacheManager.release(TaskDistributedCacheManager.java:254) at org.apache.hadoop.mapred.TaskTracker.purgeJob(TaskTracker.java:2066) - locked 0xb51e5d78 (a org.apache.hadoop.mapred.TaskTracker$RunningJob) - locked 0xb4299248 (a org.apache.hadoop.mapred.TaskTracker) at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:439) at java.lang.Thread.run(Thread.java:619) Looking for the lock 0xb4214308 3= Thread-27 prio=10 tid=0xae501400 nid=0x4021 waiting for monitor entry [0xae4ad000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.filecache.TrackerDistributedCacheManager$BaseDirManager.checkAndCleanup(TrackerDistributedCacheManager.java:1019) - waiting to lock 0xb52776c0 (a org.apache.hadoop.filecache.TrackerDistributedCacheManager$CacheStatus) - locked 0xb4214308 (a java.util.LinkedHashMap) at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:948) Looking for the lock 0xb52776c0 
4= Thread-187419 daemon prio=10 tid=0xaa103400 nid=0x3758 runnable [0xad75c000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked 0xb52998d0 (a sun.nio.ch.Util$1) - locked 0xb52998e0 (a java.util.Collections$UnmodifiableSet) - locked 0xb5299880 (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) - locked 0xb5505ec8 (a java.io.BufferedInputStream) at java.io.DataInputStream.read(DataInputStream.java:132) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:153) at
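The proposed fix — skip cache entries whose lock is currently held instead of blocking on them — can be sketched with {{ReentrantLock.tryLock()}}. The class and field names below are illustrative stand-ins, not the actual TrackerDistributedCacheManager internals:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the cleanup thread tries each entry's lock and skips entries that
// are busy (e.g. being downloaded), so it never joins the blocking chain
// that ends at the heartbeat thread.
public class CacheCleanupSketch {
    static class CacheEntry {
        final ReentrantLock lock = new ReentrantLock();
        volatile boolean deleted = false;
    }

    static int cleanup(Map<String, CacheEntry> cache) {
        int cleaned = 0;
        for (CacheEntry e : cache.values()) {
            if (e.lock.tryLock()) {       // acquire only if uncontended
                try {
                    e.deleted = true;     // stand-in for the on-disk deletion
                    cleaned++;
                } finally {
                    e.lock.unlock();
                }
            }
            // else: entry is busy; leave it for the next cleanup pass
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, CacheEntry> cache = new ConcurrentHashMap<>();
        CacheEntry idle = new CacheEntry();
        CacheEntry busy = new CacheEntry();
        busy.lock.lock();                 // simulate an in-progress download
        cache.put("idle", idle);
        cache.put("busy", busy);
        System.out.println(cleanup(cache)); // prints 1: busy entry skipped
    }
}
```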
[jira] [Commented] (HDFS-3843) Large dist cache can block tasktracker heartbeat
[ https://issues.apache.org/jira/browse/HDFS-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439877#comment-13439877 ] Robert Joseph Evans commented on HDFS-3843: --- I forgot to add in that I tested this on 0.23, and mrv2 does not have this issue at all. I added in a dist cache entry that takes 30 min to download and the job succeeded. Large dist cache can block tasktracker heartbeat Key: HDFS-3843 URL: https://issues.apache.org/jira/browse/HDFS-3843 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0, 1.0.0 Reporter: Robert Joseph Evans -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3841) Port HDFS-3835 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439895#comment-13439895 ] Robert Joseph Evans commented on HDFS-3841: --- Thanks for the reviews. The HDFS unit tests all pass. I asked Daryn to check it in when he gets a chance. Port HDFS-3835 to branch-0.23 - Key: HDFS-3841 URL: https://issues.apache.org/jira/browse/HDFS-3841 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3841.txt, HDFS-3841.txt, HDFS-3841.txt HDFS-3835 does not cleanly merge into branch-0.23. This is to port it over. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2745) unclear to users which command to use to access the filesystem
[ https://issues.apache.org/jira/browse/HDFS-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438027#comment-13438027 ] Robert Joseph Evans commented on HDFS-2745: --- The changes look good to me +1, non-binding. unclear to users which command to use to access the filesystem -- Key: HDFS-2745 URL: https://issues.apache.org/jira/browse/HDFS-2745 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0, 1.2.0, 2.2.0-alpha Reporter: Thomas Graves Assignee: Andrew Wang Priority: Critical Attachments: hdfs-2745-1.patch It's unclear to users which command to use to access the filesystem. Need some background and then we can fix accordingly. We have 3 choices: hadoop dfs - says it's deprecated and to use hdfs. If I run hdfs usage it doesn't list any options like -ls in the usage, although there is an hdfs dfs command hdfs dfs - not in the usage of hdfs. If we recommend it when running hadoop dfs it should at least be in the usage. hadoop fs - seems like the one to use; it appears generic for any filesystem. Any input on what is the recommended way to do this? Based on that we can fix up the other issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434190#comment-13434190 ] Robert Joseph Evans commented on HDFS-3731: --- I am not an HDFS expert but the patch looks good to me. +1 non-binding. 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Colin Patrick McCabe Priority: Blocker Attachments: HDFS-3731.002.patch, HDFS-3731.003.patch Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431254#comment-13431254 ] Robert Joseph Evans commented on HDFS-3731: --- Is there any update on this? There has been no activity for about a week, and this seems fairly critical to fix. 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Colin Patrick McCabe Priority: Blocker Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3751) DN should log warnings for lengthy disk IOs
[ https://issues.apache.org/jira/browse/HDFS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427320#comment-13427320 ] Robert Joseph Evans commented on HDFS-3751: --- If we are collecting this data to be able to output a warning, it would be good to also keep metrics for each disk. This would potentially give us the ability in the future to have an admin look at the disk metrics and look for outliers. They could then investigate further and possibly remove the failing disk. DN should log warnings for lengthy disk IOs --- Key: HDFS-3751 URL: https://issues.apache.org/jira/browse/HDFS-3751 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 1.2.0, 2.1.0-alpha Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Occasionally failing disks or other OS-and-below issues cause a single IO to take tens of seconds, or even minutes in the case of failures. This often results in timeout exceptions at the client side which are hard to diagnose. It would be easier to root-cause these issues if the DN logged a WARN like IO of 64kb to volume /data/1/dfs/dn for block 12345234 client 1.2.3.4 took 61.3 seconds or somesuch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
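The per-disk metrics idea above could look something like this sketch: record each IO's latency per volume, emit the WARN past a threshold, and keep a running maximum so outlier disks stand out. All names here are illustrative, not actual DataNode APIs:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-volume latency tracker, not a real Hadoop class.
public class DiskLatencySketch {
    static final long WARN_THRESHOLD_MS = 30_000;
    static final Map<String, Long> maxLatencyMs = new ConcurrentHashMap<>();

    static void recordIo(String volume, long elapsedMs) {
        maxLatencyMs.merge(volume, elapsedMs, Math::max); // keep the worst case
        if (elapsedMs > WARN_THRESHOLD_MS) {
            System.err.println("WARN: IO to volume " + volume
                + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        recordIo("/data/1/dfs/dn", 12);
        recordIo("/data/1/dfs/dn", 61_300); // triggers the warning
        recordIo("/data/2/dfs/dn", 9);
        // An admin scanning these maxima would flag /data/1 as the outlier.
        System.out.println(maxLatencyMs);
    }
}
```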
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425798#comment-13425798 ] Robert Joseph Evans commented on HDFS-3731: --- I thought that hardlinks to directories are not typically supported. HFS+ on the Mac is the only filesystem I know of that allows them. I am nervous about implementing an upgrade path that will only work on a Mac. Did you actually mean a symbolic link, or did you intend to hardlink all of the files in the directories? 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Todd Lipcon Priority: Blocker Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
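The per-file alternative raised in the comment above — hardlink each block file rather than the directory — can be sketched with standard java.nio.file calls. The class and method names are illustrative, not the actual DataNode upgrade code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: hardlink every regular file from a source directory (e.g. a 1.x
// blocksBeingWritten dir) into a destination directory (e.g. the block
// pool's rbw dir). Directory hardlinks are avoided entirely.
public class HardlinkSketch {
    static int linkAll(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.createLink(dstDir.resolve(f.getFileName()), f);
                    linked++;
                }
            }
        }
        return linked;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("bbw");
        Path dst = src.resolveSibling(src.getFileName() + "-rbw");
        Files.write(src.resolve("blk_1"), new byte[]{1, 2, 3});
        System.out.println(linkAll(src, dst)); // prints 1
    }
}
```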
[jira] [Commented] (HDFS-3696) Create files with WebHdfsFileSystem goes OOM when file size is big
[ https://issues.apache.org/jira/browse/HDFS-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423124#comment-13423124 ] Robert Joseph Evans commented on HDFS-3696: --- Thanks for the patch for branch-0.23. +1 (non-binding) for it. I reviewed the change and ran the tests. Create files with WebHdfsFileSystem goes OOM when file size is big -- Key: HDFS-3696 URL: https://issues.apache.org/jira/browse/HDFS-3696 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Assignee: Tsz Wo (Nicholas), SZE Priority: Critical Fix For: 0.23.3 Attachments: h3696_20120724.patch, h3696_20120724_0.23.patch, h3696_20120724_b-1.patch When doing fs -put to a WebHdfsFileSystem (webhdfs://), the FsShell goes OOM if the file size is large. When I tested, 20MB files were fine, but 200MB didn't work. I also tried reading a large file by issuing -cat and piping to a slow sink in order to force buffering. The read path didn't have this problem. The memory consumption stayed the same regardless of progress. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423182#comment-13423182 ] Robert Joseph Evans commented on HDFS-3731: --- I am a bit confused by this, as I am not an expert on HDFS. I am mainly concerned with whether this impacts 0.23 (I assume it does) and, if so, what that impact is. Does it mean that the datanode could drop the last block from a file because that block is in a bbw file as the datanode is upgraded? You mention HBase here; does this only impact a block that is being written with hsync? 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Priority: Blocker Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0
[ https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423303#comment-13423303 ] Robert Joseph Evans commented on HDFS-3731: --- Thanks for the clarification Suresh. If it is simple to put this into 0.23 I really would appreciate it. If not I can do the porting myself when the time comes. 2.0 release upgrade must handle blocks being written from 1.0 - Key: HDFS-3731 URL: https://issues.apache.org/jira/browse/HDFS-3731 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Suresh Srinivas Priority: Blocker Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 release. Problem reported by Brahma Reddy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3667) Add retry support to WebHdfsFileSystem
[ https://issues.apache.org/jira/browse/HDFS-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13421678#comment-13421678 ] Robert Joseph Evans commented on HDFS-3667: --- Nicholas, If it is too much of a pain to separate them, that is OK. I want the OOM fix in 0.23, but I realize that is not a priority for a lot of others and I can port it over myself once this goes into branch-2. Add retry support to WebHdfsFileSystem -- Key: HDFS-3667 URL: https://issues.apache.org/jira/browse/HDFS-3667 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3667_20120718.patch, h3667_20120721.patch, h3667_20120722.patch DFSClient (i.e. DistributedFileSystem) has a configurable retry policy and it retries on exceptions such as connection failure, safemode. WebHdfsFileSystem should have similar retry support. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3626) Creating file with invalid path can corrupt edit log
[ https://issues.apache.org/jira/browse/HDFS-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417343#comment-13417343 ] Robert Joseph Evans commented on HDFS-3626: --- This is especially true with ViewFs. A symbolic link for one client could point to a totally different file/directory for another client. Creating file with invalid path can corrupt edit log Key: HDFS-3626 URL: https://issues.apache.org/jira/browse/HDFS-3626 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Attachments: hdfs-3626.txt, hdfs-3626.txt Joris Bontje reports the following: The following command results in a corrupt NN editlog (note the double slash and reading from stdin): $ cat /usr/share/dict/words | hadoop fs -put - hdfs://localhost:8020//path/file After this, restarting the namenode will result into the following fatal exception: {code} 2012-07-10 06:29:19,910 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_173-188 expecting start txid #173 2012-07-10 06:29:19,912 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation MkdirOp [length=0, path=/, timestamp=1341915658216, permissions=cloudera:supergroup:rwxr-xr-x, opCode=OP_MKDIR, txid=182] java.lang.ArrayIndexOutOfBoundsException: -1 {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3318) Hftp hangs on transfers >2GB
[ https://issues.apache.org/jira/browse/HDFS-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415140#comment-13415140 ] Robert Joseph Evans commented on HDFS-3318: --- Yes this probably also impacts branch-1. Hftp hangs on transfers >2GB Key: HDFS-3318 URL: https://issues.apache.org/jira/browse/HDFS-3318 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.24.0, 0.23.3, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Fix For: 0.23.3 Attachments: HDFS-3318-1.patch, HDFS-3318.patch Hftp transfers >2GB hang after the transfer is complete. The problem appears to be caused by java internally using an int for the content length. When it overflows 2GB, it won't check the bounds of the reads on the input stream. The client continues reading after all data is received, and the client blocks until the server times out the connection -- _many_ minutes later. In conjunction with hftp timeouts, all transfers >2G fail with a read timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
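The overflow the description blames can be shown with a few lines of arithmetic. This is a sketch, not the Hftp client's actual code — the names are invented — but it illustrates both the bug (an int silently wrapping past 2 GB) and the fix pattern (tracking the expected length in a long and stopping once it is consumed):

```java
// Sketch: why an int content length breaks transfers over 2 GB.
public class ContentLengthOverflow {
    // A 3 GB transfer does not fit in a signed 32-bit int.
    static final long THREE_GB = 3L * 1024 * 1024 * 1024;

    // What an int-based length field effectively does: silently wraps
    // negative once the value exceeds 2^31 - 1, defeating bounds checks.
    static int asInt(long contentLength) {
        return (int) contentLength;
    }

    // Fix pattern: keep the expected length in a long and stop reading
    // once that many bytes have been consumed, instead of waiting for
    // the server to time out the connection.
    static boolean shouldKeepReading(long bytesReadSoFar, long contentLength) {
        return bytesReadSoFar < contentLength;
    }
}
```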
[jira] [Commented] (HDFS-3622) Backport HDFS-3541 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13415197#comment-13415197 ] Robert Joseph Evans commented on HDFS-3622: --- HDFS-3541 also depends on HDFS-2878, so I am going to include that here too. It is just a fix to some tests. Backport HDFS-3541 to branch-0.23 - Key: HDFS-3622 URL: https://issues.apache.org/jira/browse/HDFS-3622 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans HDFS-3541 Deadlock between recovery, xceiver and packet responder does not apply directly to branch-0.23, but the bug exists there too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3577) WebHdfsFileSystem can not read files larger than 24KB
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3577: -- Target Version/s: 1.1.0, 0.23.3, 2.1.0-alpha (was: 1.1.0, 2.1.0-alpha) Affects Version/s: 0.23.3 This impacts branch-0.23 as well. I really would like to see whatever fix happens go into branch-0.23 as well. I applied the latest patch and it looks to apply fairly cleanly. If it does not apply cleanly when the final fix is checked in, I will be happy to port it. WebHdfsFileSystem can not read files larger than 24KB - Key: HDFS-3577 URL: https://issues.apache.org/jira/browse/HDFS-3577 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Tsz Wo (Nicholas), SZE Priority: Blocker Attachments: h3577_20120705.patch, h3577_20120708.patch, h3577_20120714.patch If reading a file large enough for which the httpserver running webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of webhdfs), then the WebHdfsFileSystem client fails with an IOException with message *Content-Length header is missing*. It looks like WebHdfsFileSystem is delegating opening of the inputstream to the *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* header, but when using chunked transfer encoding the *Content-Length* header is not present and the *URLOpener.openInputStream()* method throws an exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
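The failure mode — unconditionally requiring Content-Length even though chunked responses legitimately omit it — can be sketched like this. The header handling below is hypothetical; the real ByteRangeInputStream.URLOpener differs in detail, but the decision it gets wrong is the same:

```java
// Sketch of header handling that tolerates chunked transfer encoding.
import java.util.Map;

public class ResponseLength {
    /**
     * Returns the expected stream length, or -1 when the response uses
     * chunked transfer encoding and the length is legitimately unknown.
     * Only when neither header is present is it a genuine error.
     */
    static long expectedLength(Map<String, String> headers) {
        String te = headers.get("Transfer-Encoding");
        if (te != null && te.equalsIgnoreCase("chunked")) {
            return -1L; // length unknown until the stream ends; do not fail
        }
        String cl = headers.get("Content-Length");
        if (cl == null) {
            throw new IllegalStateException("Content-Length header is missing");
        }
        return Long.parseLong(cl);
    }
}
```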
[jira] [Updated] (HDFS-3622) Backport HDFS-3541 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3622: -- Attachment: HDFS-3622.txt Backport HDFS-3541 to branch-0.23 - Key: HDFS-3622 URL: https://issues.apache.org/jira/browse/HDFS-3622 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3622.txt HDFS-3541 Deadlock between recovery, xceiver and packet responder does not apply directly to branch-0.23, but the bug exists there too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3622) Backport HDFS-3541 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3622: -- Status: Patch Available (was: Open) Backport HDFS-3541 to branch-0.23 - Key: HDFS-3622 URL: https://issues.apache.org/jira/browse/HDFS-3622 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3622.txt HDFS-3541 Deadlock between recovery, xceiver and packet responder does not apply directly to branch-0.23, but the bug exists there too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3622) Backport HDFS-3541 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13415355#comment-13415355 ] Robert Joseph Evans commented on HDFS-3622: --- I ran all of the HDFS tests on branch-0.23 and they all pass. Backport HDFS-3541 to branch-0.23 - Key: HDFS-3622 URL: https://issues.apache.org/jira/browse/HDFS-3622 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3622.txt HDFS-3541 Deadlock between recovery, xceiver and packet responder does not apply directly to branch-0.23, but the bug exists there too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3486) offlineimageviewer can't read fsimage files that contain persistent delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3486: -- Fix Version/s: 0.23.3 offlineimageviewer can't read fsimage files that contain persistent delegation tokens - Key: HDFS-3486 URL: https://issues.apache.org/jira/browse/HDFS-3486 Project: Hadoop HDFS Issue Type: Bug Components: security, tools Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 0.23.3, 2.0.1-alpha Attachments: HDFS-3486.001.patch, HDFS-3486.002.patch OfflineImageViewer (oiv) crashes when trying to read fsimage files that contain persistent delegation tokens. Example stack trace: {code} Caused by: java.lang.IndexOutOfBoundsException at java.io.DataInputStream.readFully(DataInputStream.java:175) at org.apache.hadoop.io.Text.readFields(Text.java:284) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.readFields(AbstractDelegationTokenIdentifier.java:178) at org.apache.hadoop.hdfs.tools.offlineImageViewer.ImageLoaderCurrent.processDelegationTokens(ImageLoaderCurrent.java:222) at org.apache.hadoop.hdfs.tools.offlineImageViewer.ImageLoaderCurrent.loadImage(ImageLoaderCurrent.java:186) at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.go(OfflineImageViewer.java:129) {code} The oiv and loadFSImage code paths are separate. The issue here seems to be that the loadFSImage code path has diverged from the oiv code path. 
On the loadFSImage code path (from FSImageFormat#loadCurrentTokens): {code} /** * Private helper methods to load Delegation tokens from fsimage */ private synchronized void loadCurrentTokens(DataInputStream in) throws IOException { int numberOfTokens = in.readInt(); for (int i = 0; i < numberOfTokens; i++) { DelegationTokenIdentifier id = new DelegationTokenIdentifier(); id.readFields(in); long expiryTime = in.readLong(); addPersistedDelegationToken(id, expiryTime); } } {code} Notice how it loads an 8-byte expiry long after every DelegationTokenIdentifier. On the oiv code path (from ImageLoaderCurrent#processDelegationTokens): {code} int numDTokens = in.readInt(); v.visitEnclosingElement(ImageElement.DELEGATION_TOKENS, ImageElement.NUM_DELEGATION_TOKENS, numDTokens); for(int i=0; i<numDTokens; i++){ DelegationTokenIdentifier id = new DelegationTokenIdentifier(); id.readFields(in); v.visit(ImageElement.DELEGATION_TOKEN_IDENTIFIER, id.toString()); } {code} Notice how it does *not* load that 8-byte expiry long after every DelegationTokenIdentifier. This bug seems to have been introduced by change 916534, the same change which introduced persistent delegation tokens. So I don't think oiv was ever able to decode them in the past. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
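The fix implied by the two excerpts is for the viewer to consume the expiry long as well; skipping it leaves every subsequent read off by 8 bytes per token. A self-contained round-trip sketch (stub types, not Hadoop's — identifiers are reduced to a single int to keep the example short):

```java
// Round-trip sketch of the record layout: (int id, long expiry) pairs.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TokenStreamSkew {
    // Serialize numTokens records of (int id, long expiry), matching the
    // layout loadCurrentTokens writes and expects.
    static byte[] writeTokens(int numTokens) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(numTokens);
            for (int i = 0; i < numTokens; i++) {
                out.writeInt(i);          // stand-in for the identifier
                out.writeLong(1000L + i); // the expiry the oiv path skipped
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Correct reader: mirrors loadCurrentTokens by consuming the expiry.
    static long readLastExpiry(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int n = in.readInt();
            long expiry = -1;
            for (int i = 0; i < n; i++) {
                in.readInt();           // identifier
                expiry = in.readLong(); // the 8 bytes oiv forgot to read
            }
            return expiry;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```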
[jira] [Updated] (HDFS-2978) The NameNode should expose name dir statuses via JMX
[ https://issues.apache.org/jira/browse/HDFS-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-2978: -- Fix Version/s: 0.23.3 The NameNode should expose name dir statuses via JMX Key: HDFS-2978 URL: https://issues.apache.org/jira/browse/HDFS-2978 Project: Hadoop HDFS Issue Type: New Feature Components: name-node Affects Versions: 0.23.0, 1.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 1.0.2, 0.23.3, 2.0.0-alpha Attachments: HDFS-2978-branch-1.patch, HDFS-2978.patch, HDFS-2978.patch We currently display this info on the NN web UI, so users who wish to monitor this must either do it manually or parse HTML. We should publish this information via JMX. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3581) FSPermissionChecker#checkPermission sticky bit check missing range check
[ https://issues.apache.org/jira/browse/HDFS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3581: -- Fix Version/s: 0.23.3 FSPermissionChecker#checkPermission sticky bit check missing range check - Key: HDFS-3581 URL: https://issues.apache.org/jira/browse/HDFS-3581 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 0.23.3, 2.0.1-alpha Attachments: hdfs-3581.txt The checkStickyBit call in FSPermissionChecker#checkPermission is missing a range check which results in an index out of bounds when accessing root. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
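The bug class here — indexing the parent of the last path component without first checking the path is deep enough — underflows for the root path. An illustrative sketch only (FSPermissionChecker actually walks an INode[] array; these names are invented):

```java
// Sketch of the missing range check: for "/" the parent index is -1.
public class StickyBitCheck {
    // The "parent of the last component" index is pathLength - 2, which
    // is negative when the path is just the root. This is the check the
    // original code was missing.
    static boolean parentIndexInRange(int pathLength) {
        return pathLength - 2 >= 0;
    }

    static String parentOf(String[] components) {
        if (!parentIndexInRange(components.length)) {
            return null; // root has no parent; nothing to sticky-check
        }
        return components[components.length - 2];
    }
}
```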
[jira] [Created] (HDFS-3622) Backport HDFS-3541 to branch-0.23
Robert Joseph Evans created HDFS-3622: - Summary: Backport HDFS-3541 to branch-0.23 Key: HDFS-3622 URL: https://issues.apache.org/jira/browse/HDFS-3622 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans HDFS-3541 Deadlock between recovery, xceiver and packet responder does not apply directly to branch-0.23, but the bug exists there too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409849#comment-13409849 ] Robert Joseph Evans commented on HDFS-3541: --- @Uma, Sorry it took me so long to respond. Yes, I would be happy to look into doing the porting, as the patch does not apply cleanly. I filed HDFS-3622 to track this work. Deadlock between recovery, xceiver and packet responder --- Key: HDFS-3541 URL: https://issues.apache.org/jira/browse/HDFS-3541 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.3, 2.0.1-alpha Reporter: suja s Assignee: Vinay Fix For: 2.0.1-alpha, 3.0.0 Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch Block Recovery initiated while write in progress at Datanode side. Found a deadlock between recovery, xceiver and packet responder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3594) ListPathsServlet should not log a warning for paths that do not exist
Robert Joseph Evans created HDFS-3594: - Summary: ListPathsServlet should not log a warning for paths that do not exist Key: HDFS-3594 URL: https://issues.apache.org/jira/browse/HDFS-3594 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.3 Reporter: Robert Joseph Evans ListPathsServlet logs a warning message every time someone requests a listing for a directory that does not exist. This should be a debug or at most an info message, because this is expected behavior. People will ask for things that do not exist. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3442) Incorrect count for Missing Replicas in FSCK report
[ https://issues.apache.org/jira/browse/HDFS-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3442: -- Fix Version/s: (was: 2.0.1-alpha) 0.23.3 Incorrect count for Missing Replicas in FSCK report --- Key: HDFS-3442 URL: https://issues.apache.org/jira/browse/HDFS-3442 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: suja s Assignee: Andrew Wang Priority: Minor Fix For: 0.23.3 Attachments: HDFS-3442-2.patch, HDFS-3442-3.patch, HDFS-3442.patch Scenario: Cluster running in HA mode with 2 DNs. Files are written with replication factor as 3. There are 7 blocks in cluster. FSCK report is including all blocks in UnderReplicated Blocks as well as Missing Replicas. HOST-XX-XX-XX-102:/home/Apr4/hadoop-2.0.0-SNAPSHOT/bin # ./hdfs fsck / Connecting to namenode via http://XX.XX.XX.55:50070 FSCK started by root (auth:SIMPLE) from /XX.XX.XX.102 for path / at Wed Apr 04 17:28:37 IST 2012 . /1: Under replicated BP-534619337-XX.XX.XX.55-1333526344705:blk_2551710840802340037_1002. Target Replicas is 3 but found 2 replica(s). . /2: Under replicated BP-534619337-XX.XX.XX.55-1333526344705:blk_-3851276776144500288_1004. Target Replicas is 3 but found 2 replica(s). . /3: Under replicated BP-534619337-XX.XX.XX.55-1333526344705:blk_-3210606555285049524_1006. Target Replicas is 3 but found 2 replica(s). . /4: Under replicated BP-534619337-XX.XX.XX.55-1333526344705:blk_4028835120510075310_1008. Target Replicas is 3 but found 2 replica(s). . /5: Under replicated BP-534619337-XX.XX.XX.55-1333526344705:blk_-5238093749956876969_1010. Target Replicas is 3 but found 2 replica(s). . /testrenamed/file1renamed: Under replicated BP-534619337-XX.XX.XX.55-1333526344705:blk_-5669194716756513504_1012. Target Replicas is 3 but found 2 replica(s). . /testrenamed/file2: Under replicated BP-534619337-XX.XX.XX.55-1333526344705:blk_8510284478280941311_1014. Target Replicas is 3 but found 2 replica(s). 
Status: HEALTHY Total size:33215 B Total dirs:3 Total files: 7 (Files currently being written: 1) Total blocks (validated): 7 (avg. block size 4745 B) Minimally replicated blocks: 7 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 7 (100.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:3 Average block replication: 2.0 Corrupt blocks:0 Missing replicas: 7 (50.0 %) Number of data-nodes: 2 Number of racks: 1 FSCK ended at Wed Apr 04 17:28:37 IST 2012 in 2 milliseconds The filesystem under path '/' is HEALTHY Also it indicates a measure as 50% in brackets (There are only 7 blocks in cluster and so if all 7 are included as Missing replicas it should be 100%) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
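The puzzling "(50.0 %)" appears consistent with fsck dividing missing replicas by replicas actually found rather than by replicas expected — an inference from the numbers in the report, not a statement about the fixed code: 7 blocks x target 3 = 21 expected, 7 x 2 = 14 found, 21 - 14 = 7 missing, and 7/14 = 50%. The arithmetic for both candidate denominators:

```java
// Two candidate denominators for the fsck "Missing replicas" percentage
// (method names invented for illustration).
public class MissingReplicaRatio {
    static double percentOfFound(long missing, long found) {
        return 100.0 * missing / found;    // 7 / 14 -> 50.0, as reported
    }

    static double percentOfExpected(long missing, long expected) {
        return 100.0 * missing / expected; // 7 / 21 -> ~33.3
    }
}
```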
[jira] [Created] (HDFS-3591) Backport HDFS-3357 to branch-0.23
Robert Joseph Evans created HDFS-3591: - Summary: Backport HDFS-3357 to branch-0.23 Key: HDFS-3591 URL: https://issues.apache.org/jira/browse/HDFS-3591 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans I would like to have HDFS-3357 in branch-0.23, but it is not a trivial upmerge. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3591) Backport HDFS-3357 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3591: -- Status: Patch Available (was: Open) This patch does not apply to trunk or branch-2; it only applies to branch-0.23, as it is backporting code already in trunk and branch-2. I ran all of the HDFS and common tests, and they all passed for me. I also brought up a small 3 node cluster and ran a few tests on it, and they all passed. Backport HDFS-3357 to branch-0.23 - Key: HDFS-3591 URL: https://issues.apache.org/jira/browse/HDFS-3591 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: HDFS-3357-branch-0.23.txt I would like to have HDFS-3357 in branch-0.23, but it is not a trivial upmerge. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3554) TestRaidNode is failing
[ https://issues.apache.org/jira/browse/HDFS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400500#comment-13400500 ] Robert Joseph Evans commented on HDFS-3554: --- It looks like there is no history server up and running. In Yarn there is a race in the client. If the client asks for status while the AM is still up and running, it will talk to the AM. If the AM has exited, which it tends to do when the MR job has completed, the client will fall over to the history server. It looks like while you are running with the minicluster there is no corresponding history server to fulfill the request. TestRaidNode is failing --- Key: HDFS-3554 URL: https://issues.apache.org/jira/browse/HDFS-3554 Project: Hadoop HDFS Issue Type: Bug Components: contrib/raid, test Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Weiyan Wang After MAPREDUCE-3868 re-enabled raid, TestRaidNode has been failing in Jenkins builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3549) dist tar build fails in hadoop-hdfs-raid project
[ https://issues.apache.org/jira/browse/HDFS-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397800#comment-13397800 ] Robert Joseph Evans commented on HDFS-3549: --- +1 the change looks good to me, but I am not an HDFS committer so you are going to need someone else to +1 and commit it. dist tar build fails in hadoop-hdfs-raid project Key: HDFS-3549 URL: https://issues.apache.org/jira/browse/HDFS-3549 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: HDFS-3549.patch Trying to build the distribution tarball in a clean tree via {{mvn install -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip}} fails with this error: {noformat} main: [exec] tar: hadoop-hdfs-raid-3.0.0-SNAPSHOT: Cannot stat: No such file or directory [exec] tar: Exiting with failure status due to previous errors {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3541: -- Target Version/s: 0.23.3 Affects Version/s: 0.23.3 I really would like to see this fixed in 0.23 as well. Deadlock between recovery, xceiver and packet responder --- Key: HDFS-3541 URL: https://issues.apache.org/jira/browse/HDFS-3541 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.3, 2.0.1-alpha Reporter: suja s Assignee: Vinay Attachments: DN_dump.rar Block Recovery initiated while write in progress at Datanode side. Found a deadlock between recovery, xceiver and packet responder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3527) Distributed cache object changed Error
[ https://issues.apache.org/jira/browse/HDFS-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395962#comment-13395962 ] Robert Joseph Evans commented on HDFS-3527: --- Just guessing here from the name of the JIRA, but a Distributed cache object changed error typically happens when the file on HDFS changes in-between the submission of a job and a container being launched that is going to download the file. It sounds to me like it may be a test error where you are submitting lots of jobs and, in between them, changing an object in HDFS that is shared by all of the jobs. I cannot be sure without more information though. Distributed cache object changed Error -- Key: HDFS-3527 URL: https://issues.apache.org/jira/browse/HDFS-3527 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.0.0-alpha Reporter: Sujay Rau I'm writing some automation test code that basically runs the teragen, terasort, teravalidate sequence while repeatedly doing a manual failover throughout. About every fourth time I run the test script, the terasort phase crashes and returns the following error: http://c0405.hal.cloudera.com:50030/jobtasks.jsp?jobid=job_201206041130_1482&type=setup&pagenum=1&state=killed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3483) hdfs fsck doesn't run with ViewFS path
[ https://issues.apache.org/jira/browse/HDFS-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288652#comment-13288652 ] Robert Joseph Evans commented on HDFS-3483: --- Daryn, I tend to disagree that we don't want to expose the mapping. I think it is incredibly useful to be able to know what is happening here, and expose it to the end user so they can then reason about what they want to have happen. For example doing a mv from one federated namespace to another will either be very slow, or it will fail; I don't remember which it is right now. In either case it would be good to expose the mounting to both the end user, and also programmatically so that appropriate steps can be taken in those situations. Even if the step is just to call up ops and complain that they have the namespaces all wrong for what they want to do. All OSes expose it: type mount on Linux and it will list where each file system is mounted. hdfs fsck doesn't run with ViewFS path -- Key: HDFS-3483 URL: https://issues.apache.org/jira/browse/HDFS-3483 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Stephen Chu Labels: newbie Attachments: core-site.xml, hdfs-site.xml I'm running a HA + secure + federated cluster. When I run hdfs fsck /nameservices/ha-nn-uri/, I see the following: bash-3.2$ hdfs fsck /nameservices/ha-nn-uri/ FileSystem is viewfs://oracle/ DFSck exiting. Any path I enter will return the same message. Attached are my core-site.xml and hdfs-site.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3376) DFSClient fails to make connection to DN if there are many unusable cached sockets
[ https://issues.apache.org/jira/browse/HDFS-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269864#comment-13269864 ] Robert Joseph Evans commented on HDFS-3376: --- Hey Todd, I have been trying to follow some of the fixes you have been putting into the HDFS socket caching. I was wondering if you would be willing to pull HDFS-3357 and this one, HDFS-3376, into branch-0.23. They both seem to apply cleanly, but I am not an HDFS committer, so I cannot do this myself. DFSClient fails to make connection to DN if there are many unusable cached sockets -- Key: HDFS-3376 URL: https://issues.apache.org/jira/browse/HDFS-3376 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 2.0.0 Attachments: hdfs-3376.txt After fixing the datanode side of keepalive to properly disconnect stale clients, (HDFS-3357), the client side has the following issue: when it connects to a DN, it first tries to use cached sockets, and will try a configurable number of sockets from the cache. If there are more cached sockets than the configured number of retries, and all of them have been closed by the datanode side, then the client will throw an exception and mark the replica node as dead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
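The failure mode described in the issue can be modeled in a few lines (names invented; the real DFSClient code differs): when the cache holds more stale sockets than the retry budget, every attempt burns a dead socket and the replica node is wrongly abandoned.

```java
// Model of "try cached sockets up to maxRetries times before giving up".
// true = socket still usable, false = closed by the datanode side.
import java.util.Deque;

public class CachedSocketRetry {
    static boolean connect(Deque<Boolean> cache, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Boolean cached = cache.poll();
            if (cached == null) {
                return true; // cache empty: fall through to a fresh socket
            }
            if (cached) {
                return true; // usable cached socket
            }
            // stale socket: discard and retry
        }
        return false; // budget exhausted on stale sockets only -> node marked dead
    }
}
```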
[jira] [Commented] (HDFS-3376) DFSClient fails to make connection to DN if there are many unusable cached sockets
[ https://issues.apache.org/jira/browse/HDFS-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270022#comment-13270022 ] Robert Joseph Evans commented on HDFS-3376: --- Todd, You are much more of an expert on this than I am. I think HADOOP-8280 and HADOOP-8350 look fine to pull in too. Thanks for the help with this. Aaron, I spoke with Suresh off-line about it when I took over as release manager for branch-0.23, as I was curious about it. He thought that I could not. I don't really see it being too much of a problem just yet, because there have not been very many HDFS issues that are applicable to branch-0.23. Although I am in the process of going through the full HDFS list to see if I have missed anything. DFSClient fails to make connection to DN if there are many unusable cached sockets -- Key: HDFS-3376 URL: https://issues.apache.org/jira/browse/HDFS-3376 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 2.0.0 Attachments: hdfs-3376.txt After fixing the datanode side of keepalive to properly disconnect stale clients, (HDFS-3357), the client side has the following issue: when it connects to a DN, it first tries to use cached sockets, and will try a configurable number of sockets from the cache. If there are more cached sockets than the configured number of retries, and all of them have been closed by the datanode side, then the client will throw an exception and mark the replica node as dead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3359) DFSClient.close should close cached sockets
[ https://issues.apache.org/jira/browse/HDFS-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3359: -- Attachment: hdfs-3359-branch-0.23.txt I really would like to see this fix go into 0.23 as well. The patch did not apply cleanly so I have created my own. If someone could please review and commit this to 0.23 I really would appreciate it. DFSClient.close should close cached sockets --- Key: HDFS-3359 URL: https://issues.apache.org/jira/browse/HDFS-3359 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.22.0, 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 2.0.0 Attachments: hdfs-3359-branch-0.23.txt, hdfs-3359.txt, hdfs-3359.txt Some applications like the TT/JT (pre-2.0) and probably the RM/NM cycle through DistributedFileSystem objects reasonably frequently. So long as they call close() it isn't a big problem, except that currently DFSClient.close() doesn't explicitly close the SocketCache. So unless a full GC runs (causing the references to get finalized), many SocketCaches can get orphaned, each with many open sockets inside. We should fix the close() function to close all cached sockets. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
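The fix pattern the comment asks to backport — close() explicitly draining the socket cache instead of relying on finalization — in a minimal hedged sketch (hypothetical names, not the actual DFSClient API):

```java
// Sketch: a client whose close() drains its socket cache so orphaned
// instances do not hold sockets open until a full GC runs.
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ClientWithSocketCache implements Closeable {
    private final List<Closeable> socketCache = new ArrayList<>();

    void cache(Closeable socket) { socketCache.add(socket); }

    int cachedCount() { return socketCache.size(); }

    @Override
    public void close() {
        // Explicitly close every cached socket instead of waiting for
        // finalization, which may never run before fd exhaustion.
        for (Closeable s : socketCache) {
            try {
                s.close();
            } catch (IOException ignored) {
                // best effort: keep closing the rest of the cache
            }
        }
        socketCache.clear();
    }
}
```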