[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886254#action_12886254 ]

Hairong Kuang commented on HDFS-1094:
-------------------------------------

I did not mean #racks per block. I meant racks for a file.

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>            Key: HDFS-1094
>            URL: https://issues.apache.org/jira/browse/HDFS-1094
>        Project: Hadoop HDFS
>     Issue Type: Improvement
>     Components: name-node
>       Reporter: dhruba borthakur
>       Assignee: Rodrigo Schmidt
>    Attachments: prob.pdf, prob.pdf
>
> The current HDFS implementation specifies that the first replica is local and the other two replicas are on any two random nodes on a random remote rack. This means that if any three datanodes die together, there is a non-trivial probability of losing at least one block in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the probability of losing a block.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886257#action_12886257 ]

Rodrigo Schmidt commented on HDFS-1094:
---------------------------------------

As Joydeep wrote, we didn't think this was a major problem. What is your proposal to fix it?
[jira] Created: (HDFS-1287) Why TreeSet is used when collecting block information FSDataSet::getBlockReport
Why TreeSet is used when collecting block information FSDataSet::getBlockReport
-------------------------------------------------------------------------------

                Key: HDFS-1287
                URL: https://issues.apache.org/jira/browse/HDFS-1287
            Project: Hadoop HDFS
         Issue Type: Improvement
           Reporter: NarayanaSwamy

As a return value we are converting this to an array and returning it, and in the namenode we are also just iterating ... so can we use a list instead of a set? (As the block ids are unique, there may not be duplicates.)
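The suggestion above can be illustrated with a short sketch (hypothetical names, not the actual FSDataset code): since block ids are already unique, a plain ArrayList preserves the no-duplicates property while avoiding the TreeSet's per-insert ordering cost and per-node object overhead.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportSketch {
    // Hypothetical stand-in for collecting block ids for a block report.
    // Block ids are already unique on a datanode, so no set is needed
    // to deduplicate; a list avoids the O(log n) TreeSet insertions.
    static long[] reportWithList(long[] blockIds) {
        List<Long> blocks = new ArrayList<>(blockIds.length);
        for (long id : blockIds) {
            blocks.add(id); // O(1) amortized append, no ordering work
        }
        long[] out = new long[blocks.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = blocks.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] report = reportWithList(new long[] {3L, 1L, 2L});
        System.out.println(report.length); // 3 entries, no dedup needed
    }
}
```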
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886363#action_12886363 ]

Hairong Kuang commented on HDFS-1094:
-------------------------------------

For a large file it does matter, especially in the use case of compacting a large number of small files (like reduce results) into one by concatenating or archiving. Anyway, whether it matters or not, my question is: why do you want to have this rack limitation?
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886386#action_12886386 ]

Suresh Srinivas commented on HDFS-1052:
---------------------------------------

Gulin,
1. An application could choose to use one of the namenodes as the default file system in its configuration. In that case /a/b/c will be resolved relative to that namespace.
2. There is a proposal in HDFS-1053 for client-side mount tables, where a client can define its namespace and how it maps to the server-side namespace. In that case /a/b/c will be resolved in the context of the client-side mount table.

> HDFS scalability with multiple namenodes
> ----------------------------------------
>
>                Key: HDFS-1052
>                URL: https://issues.apache.org/jira/browse/HDFS-1052
>            Project: Hadoop HDFS
>         Issue Type: New Feature
>         Components: name-node
>   Affects Versions: 0.22.0
>           Reporter: Suresh Srinivas
>           Assignee: Suresh Srinivas
>        Attachments: Block pool proposal.pdf, Mulitple Namespaces5.pdf
>
> HDFS currently uses a single namenode that limits scalability of the cluster. This jira proposes an architecture to scale the nameservice horizontally using multiple namenodes.
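The client-side mount table idea referenced from HDFS-1053 can be sketched as a longest-matching-prefix lookup. This is a hypothetical illustration (class name, URIs, and fallback behavior are assumptions, not the proposed API):

```java
import java.util.TreeMap;

public class MountTableSketch {
    // Hypothetical client-side mount table: each path prefix maps to a
    // namenode URI, and the longest matching prefix decides which
    // namespace a path like /a/b/c resolves in.
    private final TreeMap<String, String> mounts = new TreeMap<>();

    void addMount(String prefix, String namenode) {
        mounts.put(prefix, namenode);
    }

    String resolve(String path) {
        String bestPrefix = null;
        for (String prefix : mounts.keySet()) {
            if (path.startsWith(prefix)
                && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix; // keep the longest matching prefix
            }
        }
        if (bestPrefix == null) {
            // No mount matches: fall back to the configured default file system.
            return "default:" + path;
        }
        return mounts.get(bestPrefix) + path;
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        table.addMount("/a", "hdfs://nn1");
        table.addMount("/a/b", "hdfs://nn2");
        System.out.println(table.resolve("/a/b/c")); // longest prefix /a/b wins
    }
}
```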
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886384#action_12886384 ]

Suresh Srinivas commented on HDFS-1052:
---------------------------------------

Min, yes, a distributed namespace could be another proposal to solve this problem. However, it is a much more complicated solution to develop, takes much longer, and involves a lot of changes to the system. That does not fit the timeline in which we need a solution to namenode scalability.
[jira] Created: (HDFS-1288) start-all.sh / stop-all.sh does not seem to work with HDFS
start-all.sh / stop-all.sh does not seem to work with HDFS
----------------------------------------------------------

                Key: HDFS-1288
                URL: https://issues.apache.org/jira/browse/HDFS-1288
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: scripts
   Affects Versions: 0.21.0
           Reporter: Aaron Kimball
           Priority: Minor

The start-all.sh / stop-all.sh scripts shipping with the combined hadoop-0.21.0-rc1 do not start/stop the DFS daemons unless $HADOOP_HDFS_HOME is explicitly set.
[jira] Commented: (HDFS-1288) start-all.sh / stop-all.sh does not seem to work with HDFS
[ https://issues.apache.org/jira/browse/HDFS-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886405#action_12886405 ]

Aaron Kimball commented on HDFS-1288:
-------------------------------------

If I explicitly set $HADOOP_HDFS_HOME=$HADOOP_HOME/hdfs, then it works fine. What is curious is that I do not need to explicitly set $HADOOP_MAPRED_HOME, so there's some asymmetry in how these scripts handle HDFS and mapred. At the very least, they should print a warning that they couldn't do the dfs-side work if they can't find the scripts.
[jira] Commented: (HDFS-1288) start-all.sh / stop-all.sh does not seem to work with HDFS
[ https://issues.apache.org/jira/browse/HDFS-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886416#action_12886416 ]

Allen Wittenauer commented on HDFS-1288:
----------------------------------------

This is a regression and should be a blocker.
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886418#action_12886418 ]

Rodrigo Schmidt commented on HDFS-1094:
---------------------------------------

I don't think I understand your use case. It only seems to be advantageous to do what you say if there are multiple readers for the same file. We designed it this way because it would be relatively easy to understand, implement, generalize, and plan for (as users). But I'm quite open to options. What would you propose instead?
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886430#action_12886430 ]

Konstantin Shvachko commented on HDFS-1094:
-------------------------------------------

The math (in the pdf file) looks good to me. Data loss probability P depends on time T. The assumption here, correct me if it's wrong, is that f nodes fail simultaneously. Otherwise, we should take into account the replication process, which will be restoring some blocks while other nodes are still up, decreasing the probability of data loss. The probability of losing f nodes simultaneously at a particular moment does not depend on time; the probability of a simultaneous failure of f nodes during a specific period depends on the length of the period. So if you choose the parameter p in the document correctly (depending on the time period), then you get the probability of a data loss during this period. The assumption p = 0.01 or 0.001 seems arbitrary, but it probably does not matter as you compare different strategies with the same value.

What is missing in the analysis is that the probability of losing a whole rack is much higher than the probability of losing any 20 machines in the cluster. It should actually be equivalent to the probability of losing one machine, because you lose one switch and the whole rack is out. That was one of the main reasons why we decided to replicate off rack.

Rodrigo, did I understand correctly that your idea is to experiment with replication within the rack, that is, all replicas are placed on different machines in the same rack?
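The "non-trivial probability" claim behind this discussion can be checked with a small calculation: under fully random 3-way placement, a block is lost only if all three of its replicas land on failed nodes, so f simultaneous failures among n datanodes lose a given block with probability C(f,3)/C(n,3). A sketch (the parameter values below are assumptions for illustration, not figures from the attached prob.pdf):

```java
public class BlockLossSketch {
    // n choose k, computed as a double to avoid overflow for large n.
    static double choose(int n, int k) {
        double r = 1.0;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    // Probability that at least one of `blocks` 3-way replicated blocks
    // loses all replicas when f of n datanodes fail simultaneously,
    // assuming fully random placement (every 3-node subset equally likely).
    static double lossProbability(int n, int f, long blocks) {
        double pBlock = choose(f, 3) / choose(n, 3);
        return 1.0 - Math.pow(1.0 - pBlock, blocks);
    }

    public static void main(String[] args) {
        // Assumed cluster: 1000 nodes, 3 simultaneous failures, 10M blocks.
        // Even though each block's loss probability is ~6e-9, the cluster-wide
        // probability of losing at least one block is a few percent.
        System.out.println(lossProbability(1000, 3, 10_000_000L));
    }
}
```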
[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886451#action_12886451 ]

Hudson commented on HDFS-1045:
------------------------------

Integrated in Hadoop-Common-trunk-Commit #322 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/322/]): HADOOP-6853. Common component of HDFS-1045.

> In secure clusters, re-login is necessary for https clients before opening connections
> --------------------------------------------------------------------------------------
>
>                Key: HDFS-1045
>                URL: https://issues.apache.org/jira/browse/HDFS-1045
>            Project: Hadoop HDFS
>         Issue Type: Bug
>         Components: security
>           Reporter: Jakob Homan
>           Assignee: Jakob Homan
>        Attachments: HDFS-1045-Y20.patch
>
> Ticket credentials expire and therefore clients opening https connections (only the NN and SNN doing image/edits exchange) should re-login before opening those connections.
[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.
[ https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-1006:
------------------------------

    Status: Patch Available (was: Open)

Re-submitting to Hudson.

> getImage/putImage http requests should be https for the case of security enabled.
> ---------------------------------------------------------------------------------
>
>                Key: HDFS-1006
>                URL: https://issues.apache.org/jira/browse/HDFS-1006
>            Project: Hadoop HDFS
>         Issue Type: Bug
>   Affects Versions: 0.22.0
>           Reporter: Boris Shkolnik
>           Assignee: Boris Shkolnik
>            Fix For: 0.22.0
>        Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, HDFS-1006-Y20.patch
>
> should use https:// and port 50475
[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.
[ https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-1006:
------------------------------

    Attachment: HDFS-1006-trunk-2.patch

Updated patch for Devaraj's comments. It now throws an exception if using a wildcard address with security. Once we have a unit-testing framework for Kerberos (please, please, please), we'll need to come up with a good way of dealing with this.
[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.
[ https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-1006:
------------------------------

    Status: Open (was: Patch Available)
[jira] Commented: (HDFS-1286) TestFileAppend4 sometimes failed with already locked storage
[ https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886504#action_12886504 ]

Todd Lipcon commented on HDFS-1286:
-----------------------------------

In both cases it looks like a lack of entropy on the build box caused the first test to time out:
{code}
java.lang.Exception: test timed out after 60000 milliseconds
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:199)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:453)
{code}
The TFA4 tests use a lot of random bytes, and apparently the entropy pool is a bit low on the hudson box. We use this trick on our build boxes here: http://www.chrissearle.org/blog/technical/increase_entropy_26_kernel_linux_box

Can someone with access to the Hudson box get this set up?

> TestFileAppend4 sometimes failed with already locked storage
> ------------------------------------------------------------
>
>                Key: HDFS-1286
>                URL: https://issues.apache.org/jira/browse/HDFS-1286
>            Project: Hadoop HDFS
>         Issue Type: Bug
>         Components: test
>   Affects Versions: 0.22.0
>           Reporter: Todd Lipcon
>            Fix For: 0.22.0
>        Attachments: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log, TestFileAppend4.testRecoverFinalizedBlock.log
>
> Some test runs seem to fail with "already locked" errors, though the tests pass locally. For example:
> http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/423/testReport/
> http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/testReport/
[jira] Updated: (HDFS-1286) Dry entropy pool on Hudson boxes causing test timeouts
[ https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1286:
------------------------------

       Summary: Dry entropy pool on Hudson boxes causing test timeouts (was: TestFileAppend4 sometimes failed with already locked storage)
    Issue Type: Task (was: Bug)
[jira] Updated: (HDFS-1286) TestFileAppend4 sometimes failed with already locked storage
[ https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-1286:
--------------------------------------

    Attachment: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log
                TestFileAppend4.testRecoverFinalizedBlock.log

Attaching the log files.
[jira] Created: (HDFS-1289) Datanode secure mode is broken
Datanode secure mode is broken
------------------------------

                Key: HDFS-1289
                URL: https://issues.apache.org/jira/browse/HDFS-1289
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: data-node
           Reporter: Kan Zhang
           Assignee: Kan Zhang

HDFS-520 introduced a new DataNode constructor, which tries to set up an RPC connection to the NN before a Kerberos login is done. This causes the datanode to fail.
[jira] Commented: (HDFS-1286) Dry entropy pool on Hudson boxes causing test timeouts
[ https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886508#action_12886508 ]

Todd Lipcon commented on HDFS-1286:
-----------------------------------

Alternatively, we could probably change the tests to use sequential or other pseudo-random bytes, but the entropy issue has caused lots of spurious test timeouts for us in the past, so fixing Hudson is probably worth it.
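The pseudo-random-bytes alternative mentioned above might look like the following sketch (hypothetical helper, not the actual TestFileAppend4 code): a plain `java.util.Random` never blocks on the kernel entropy pool, unlike the SecureRandom seeding visible in the stack trace, so test data generation cannot stall when /dev/random runs dry.

```java
import java.util.Random;

public class TestDataSketch {
    // Generate test file contents with a seeded PRNG instead of
    // SecureRandom. java.util.Random draws no kernel entropy, so it
    // never blocks, and a fixed seed makes the test data reproducible.
    static byte[] pseudoRandomBytes(int len, long seed) {
        byte[] buf = new byte[len];
        new Random(seed).nextBytes(buf); // deterministic, entropy-free
        return buf;
    }

    public static void main(String[] args) {
        byte[] data = pseudoRandomBytes(1024, 42L);
        System.out.println(data.length); // prints 1024
    }
}
```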
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886521#action_12886521 ]

Rodrigo Schmidt commented on HDFS-1094:
---------------------------------------

@Konstantin: The new policy will have, for every block, a limited window of racks to choose from, and a limited window of machines within those racks. For every block, we will keep the idea of having one copy local to the writer and two copies on a remote rack, but always respecting this limited window of choices.
[jira] Commented: (HDFS-1140) Speedup INode.getPathComponents
[ https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886528#action_12886528 ]

Konstantin Shvachko commented on HDFS-1140:
-------------------------------------------

I filed HDFS-1284 and HDFS-1285 to address the two other test failures. I checked the javadoc warnings locally and don't see anything related to this jira.

> Speedup INode.getPathComponents
> -------------------------------
>
>                Key: HDFS-1140
>                URL: https://issues.apache.org/jira/browse/HDFS-1140
>            Project: Hadoop HDFS
>         Issue Type: Improvement
>         Components: name-node
>   Affects Versions: 0.22.0
>           Reporter: Dmytro Molkov
>           Assignee: Dmytro Molkov
>           Priority: Minor
>            Fix For: 0.22.0
>        Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch, HDFS-1140.patch
>
> When the namenode is loading the image, a significant amount of time is spent in DFSUtil.string2Bytes. We have a very specific workload here: the path that the namenode calls getPathComponents for shares N - 1 components with the previous path this method was called for (assuming the current path has N components). Hence we can improve the image load time by caching the result of the previous conversion. We thought of using a simple LRU cache for components, but in practice String.getBytes gets optimized during runtime and an LRU cache doesn't perform as well; however, keeping just the latest path's components and their byte translations in two arrays gives quite a performance boost. I got another 20% off of the time to load the image on our cluster (30 seconds vs 24), and I wrote a simple benchmark that tests performance with and without caching.
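The caching idea described in this jira can be sketched as follows (a simplification with hypothetical names, not the committed patch): remember the previous path's components and reuse the byte conversion for every leading component that matches, since consecutive paths during image loading share all but their last component.

```java
import java.nio.charset.StandardCharsets;

public class PathComponentsCache {
    // Cache of the previously converted path. During image loading,
    // consecutive getPathComponents calls share N-1 leading components
    // with the prior path, so most conversions can be reused.
    private String[] lastComponents = new String[0];
    private byte[][] lastBytes = new byte[0][];

    byte[][] getPathComponents(String path) {
        String[] parts = path.split("/");
        byte[][] out = new byte[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            if (i < lastComponents.length && parts[i].equals(lastComponents[i])) {
                out[i] = lastBytes[i]; // reuse the previous conversion
            } else {
                out[i] = parts[i].getBytes(StandardCharsets.UTF_8);
            }
        }
        lastComponents = parts;
        lastBytes = out;
        return out;
    }
}
```

Only the latest path is remembered, matching the observation in the description that a single-entry cache beats an LRU cache for this workload.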
[jira] Updated: (HDFS-1140) Speedup INode.getPathComponents
[ https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-1140:
--------------------------------------

        Status: Resolved (was: Patch Available)
    Resolution: Fixed

I just committed this. Thank you Dmytro.
[jira] Updated: (HDFS-1289) Datanode secure mode is broken
[ https://issues.apache.org/jira/browse/HDFS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HDFS-1289:
----------------------------

    Status: Patch Available (was: Open)
[jira] Updated: (HDFS-1289) Datanode secure mode is broken
[ https://issues.apache.org/jira/browse/HDFS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HDFS-1289:
----------------------------

    Attachment: h1289-01.patch

A small patch that moves the login call earlier (per Devaraj).
[jira] Created: (HDFS-1290) decommissioned nodes report not consistent / clear
decommissioned nodes report not consistent / clear
--------------------------------------------------

                Key: HDFS-1290
                URL: https://issues.apache.org/jira/browse/HDFS-1290
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: name-node
   Affects Versions: 0.20.1
        Environment: fedora 12
           Reporter: Arun Ramakrishnan

After I add the list of decommissioned nodes to the exclude list and run -refreshNodes, the decommissioned/excluded nodes show up in both the live node list and the dead node list in the web UI.

When I do -report from the command line, it says:
Datanodes available: 14 (20 total, 6 dead)
The problem here is that there are only 14 nodes total, including the 6 added to the exclude list.

In the node-level status for each of the nodes, the excluded nodes say:
Decommission Status : Normal
DFS Used%: 100%
DFS Remaining%: 0%
But all the nodes say the same thing. I think if it said something like "in progress", it would be more informative. Note: one thing distinguishing these excluded nodes is that they all report 0% or 100% values in -report.

From https://issues.apache.org/jira/browse/HDFS-1125 I know that one may have to restart the cluster to completely remove the nodes, but I have no clue when I should restart. Ultimately, what's needed is some indication of when the decommission is complete, so that one can remove all references to the excluded nodes (from excludes, slaves) and restart the cluster.
[jira] Commented: (HDFS-1140) Speedup INode.getPathComponents
[ https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886556#action_12886556 ]

Hudson commented on HDFS-1140:
------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #334 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/334/]): HDFS-1140. Speedup INode.getPathComponents. Contributed by Dmytro Molkov.
[jira] Commented: (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart
[ https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886554#action_12886554 ] Arun Ramakrishnan commented on HDFS-1125:
Related to step (c): when would one know that decommissioning is finished? Also, I suppose you can remove nodes from the excludes file at the same time you remove them from the slaves file?

Removing a datanode (failed or decommissioned) should not require a namenode restart
Key: HDFS-1125
URL: https://issues.apache.org/jira/browse/HDFS-1125
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.2
Reporter: Alex Loddengaard
Priority: Minor

I've heard of several Hadoop users using dfsadmin -report to monitor the number of dead nodes, and alerting if that number is not 0. This mechanism tends to work pretty well, except when a node is decommissioned or fails, because then the namenode requires a restart for said node to be entirely removed from HDFS. More details here: http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results Removal from the exclude file and a refresh should get rid of the dead node.
[jira] Commented: (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart
[ https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886560#action_12886560 ] Allen Wittenauer commented on HDFS-1125: It will show up in the dead node list.
[jira] Updated: (HDFS-1290) decommissioned nodes report not consistent / clear
[ https://issues.apache.org/jira/browse/HDFS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Ramakrishnan updated HDFS-1290: revised the issue description. The updated text drops the quoted "DFS Used%: 100%" / "DFS Remaining%: 0%" lines and now reads "they all report 0 or 100% for all the values in -report"; it is otherwise identical to the original report above.
[jira] Commented: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.
[ https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886570#action_12886570 ] Devaraj Das commented on HDFS-1006: +1

getImage/putImage http requests should be https for the case of security enabled.
Key: HDFS-1006
URL: https://issues.apache.org/jira/browse/HDFS-1006
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
Fix For: 0.22.0
Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, HDFS-1006-Y20.patch

Should use https:// and port 50475.
[jira] Commented: (HDFS-1272) HDFS changes corresponding to rename of TokenStorage to Credentials
[ https://issues.apache.org/jira/browse/HDFS-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886576#action_12886576 ] Jitendra Nath Pandey commented on HDFS-1272: javadoc, findbugs, and javac warnings were tested manually.

HDFS changes corresponding to rename of TokenStorage to Credentials
Key: HDFS-1272
URL: https://issues.apache.org/jira/browse/HDFS-1272
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
Attachments: HDFS-1272.1.patch

TokenStorage was renamed to Credentials as part of MAPREDUCE-1528 and HADOOP-6845. This jira tracks the corresponding HDFS changes.
[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.
[ https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1006: Status: Resolved (was: Patch Available); Hadoop Flags: [Reviewed]; Resolution: Fixed. I've committed this. Resolving as fixed.
[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1045: Attachment: HDFS-1045-trunk.patch. Patch for trunk. Straightforward port, except that, as noted in HDFS-1006, the bugfix that had been done on 20 for that patch is done here, since there was a dependency on 1045.

In secure clusters, re-login is necessary for https clients before opening connections
Key: HDFS-1045
URL: https://issues.apache.org/jira/browse/HDFS-1045
Project: Hadoop HDFS
Issue Type: Bug
Components: security
Reporter: Jakob Homan
Assignee: Jakob Homan
Attachments: HDFS-1045-trunk.patch, HDFS-1045-Y20.patch

Ticket credentials expire, and therefore clients opening https connections (only the NN and SNN doing image/edits exchange) should re-login before opening those connections.
[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1045: Status: Open (was: Patch Available).
[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1045: Status: Patch Available (was: Open); Affects Version/s: 0.22.0; Fix Version/s: 0.22.0. Submitting patch.
[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1045: Status: Patch Available (was: Open). Re-submitting patch.
[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1045: Attachment: HDFS-1045-trunk-2.patch. Forgot to commit before making patch. Updated file.
[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886584#action_12886584 ] Devaraj Das commented on HDFS-1045: +1
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886595#action_12886595 ] Hairong Kuang commented on HDFS-1094:
> I don't think I understand your use case. It only seems advantageous to do what you say if there are multiple readers for the same file. We designed it this way because it would be relatively easy to understand, implement, generalize, and plan for (as users).

You still do not get my point. I think the goal of this policy is to reduce data loss by limiting the number of nodes on which a file's data is placed. Is this additional limitation on racks necessary? Or are you saying it is just easy for you to implement? I do not see how this helps users understand or plan. In general, having fewer configuration parameters is easier for users to understand and plan for.

Intelligent block placement policy to decrease probability of block loss
Key: HDFS-1094
URL: https://issues.apache.org/jira/browse/HDFS-1094
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
Attachments: prob.pdf, prob.pdf

The current HDFS implementation specifies that the first replica is local and the other two replicas are on two random nodes on a random remote rack. This means that if any three datanodes die together, there is a non-trivial probability of losing at least one block in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the probability of losing a block.
[jira] Commented: (HDFS-1289) Datanode secure mode is broken
[ https://issues.apache.org/jira/browse/HDFS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886596#action_12886596 ] Hadoop QA commented on HDFS-1289:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449028/h1289-01.patch against trunk revision 961966.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/console
This message is automatically generated.

Datanode secure mode is broken
Key: HDFS-1289
URL: https://issues.apache.org/jira/browse/HDFS-1289
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Reporter: Kan Zhang
Assignee: Kan Zhang
Attachments: h1289-01.patch

HDFS-520 introduced a new DataNode constructor, which tries to set up an RPC connection to the NN before a Kerberos login is done. This causes the datanode to fail.
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886599#action_12886599 ] Rodrigo Schmidt commented on HDFS-1094: I think I get your point. I don't think you get mine, though. I just want to know what else you have in mind. Let me put it this way: if you were to implement this, what would you do?
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886600#action_12886600 ] Rodrigo Schmidt commented on HDFS-1094: Just complementing my previous comment, I don't know of a better way to implement this that wouldn't be either overly complicated or hard to configure and understand. I'm open to other ideas, but you have to give me some.
[jira] Commented: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.
[ https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886601#action_12886601 ] Hudson commented on HDFS-1006: Integrated in Hadoop-Hdfs-trunk-Commit #335 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/335/]) HDFS-1006. getImage/putImage http requests should be https for the case of security enabled.
[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886605#action_12886605 ] Hadoop QA commented on HDFS-1045:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449050/HDFS-1045-trunk-2.patch against trunk revision 962380.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/console
This message is automatically generated.
[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886613#action_12886613 ] Hairong Kuang commented on HDFS-1094: Hi Rodrigo, I do not know your algorithm, so I have no idea how relaxing the rack restriction would complicate your implementation. :) Assume that a user wants to place a file's blocks on at most N datanodes and the cluster has R racks: if you place blocks on at most N/R datanodes per rack, isn't that a special case of your proposal? Of course, there are other algorithms too...
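Hairong's suggestion above can be made concrete with a toy calculation (an illustration of the idea only, not HDFS code; the class and method names are hypothetical): given a per-file budget of N datanodes spread across R racks, cap each rack at ceil(N/R) of the file's nodes, so the rack restriction falls out of the node budget rather than being a separate configuration knob.

```java
// Toy illustration of the per-rack cap sketched above (not HDFS code):
// with a per-file node budget n and r racks, allowing at most ceil(n / r)
// of the file's nodes per rack bounds how much of the file any single
// rack failure can affect, without a separate racks-per-file parameter.
public class RackCap {
    public static int perRackCap(int n, int r) {
        return (n + r - 1) / r;  // integer form of ceil(n / r)
    }
}
```

For example, a budget of 12 nodes on a 4-rack cluster yields a cap of 3 nodes per rack; a budget of 10 nodes on the same cluster also caps each rack at 3.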
[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections
[ https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886615#action_12886615 ] Hadoop QA commented on HDFS-1045:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449050/HDFS-1045-trunk-2.patch against trunk revision 962380.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/console
This message is automatically generated.