[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886254#action_12886254
 ] 

Hairong Kuang commented on HDFS-1094:
-

I did not mean #racks per block. I meant #racks for a file.

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss if there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886257#action_12886257
 ] 

Rodrigo Schmidt commented on HDFS-1094:
---

As Joydeep wrote, we didn't think this was a major problem. What is your 
proposal to fix that?

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss if there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1287) Why TreeSet is used when collecting block information FSDataSet::getBlockReport

2010-07-08 Thread NarayanaSwamy (JIRA)
Why TreeSet is used when collecting block information FSDataSet::getBlockReport
---

 Key: HDFS-1287
 URL: https://issues.apache.org/jira/browse/HDFS-1287
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: NarayanaSwamy


As a return value we convert this to an array and return it, and in the 
namenode we also just iterate over it, so can we use a list instead of a set? 
(Since the block ids are unique, there should not be duplicates.)
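
As a concrete illustration of the suggestion, a minimal sketch (with a 
hypothetical stand-in Block type, not the actual FSDataset code):

{code}
// Hypothetical sketch, not the actual FSDataset code: since block ids are
// unique, an ArrayList needs no de-duplication or ordering work, so each
// insert is O(1) amortized instead of O(log n) for a TreeSet.
import java.util.ArrayList;
import java.util.List;

class Block { long blockId; }  // stand-in for org.apache.hadoop.hdfs.protocol.Block

class BlockReportSketch {
  static Block[] getBlockReport(Iterable<Block> storedBlocks) {
    List<Block> blocks = new ArrayList<Block>();
    for (Block b : storedBlocks) {
      blocks.add(b);
    }
    // Callers iterate the array anyway, so no sorted structure is needed.
    return blocks.toArray(new Block[blocks.size()]);
  }
}
{code}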

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886363#action_12886363
 ] 

Hairong Kuang commented on HDFS-1094:
-

For a large file it does matter, especially in the use case of compacting a 
large number of small files (like reduce results) into one by concatenating or 
archiving.

Anyway, whether it matters or not, my question is: why do you want to have this 
rack limitation?

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss if there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes

2010-07-08 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886386#action_12886386
 ] 

Suresh Srinivas commented on HDFS-1052:
---

Gulin,
# An application could choose to use one of the namenodes as the default file 
system in its configuration. In that case /a/b/c will be resolved relative to 
that namespace.
# There is a proposal in HDFS-1053 for client-side mount tables, where a client 
can define its own namespace and how it maps to the server-side namespaces. In 
that case /a/b/c will be resolved in the context of the client-side mount table 
(see the sketch below).
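
For illustration only, a toy sketch of longest-prefix resolution; the names and 
structure here are assumptions of mine, not the HDFS-1053 design:

{code}
// Toy illustration, not the HDFS-1053 API: map path prefixes to namenode
// URIs and resolve a path against the longest matching prefix, falling
// back to the configured default file system.
import java.util.Map;
import java.util.TreeMap;

class MountTableSketch {
  // TreeMap iterates prefixes in ascending order, so among nested prefixes
  // ("/a", "/a/b") the longer match is seen last and wins.
  private final TreeMap<String, String> mounts = new TreeMap<String, String>();
  private final String defaultFs;

  MountTableSketch(String defaultFs) { this.defaultFs = defaultFs; }

  void addMount(String prefix, String namenodeUri) {
    mounts.put(prefix, namenodeUri);
  }

  String resolve(String path) {
    String target = defaultFs;
    for (Map.Entry<String, String> e : mounts.entrySet()) {
      if (path.startsWith(e.getKey())) {
        target = e.getValue();
      }
    }
    return target + path;
  }
}
// e.g. a table mapping "/a" to a second namenode resolves /a/b/c there,
// while an empty table falls back to the default file system (case 1 above).
{code}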


 HDFS scalability with multiple namenodes
 

 Key: HDFS-1052
 URL: https://issues.apache.org/jira/browse/HDFS-1052
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Affects Versions: 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: Block pool proposal.pdf, Mulitple Namespaces5.pdf


 HDFS currently uses a single namenode that limits scalability of the cluster. 
 This jira proposes an architecture to scale the nameservice horizontally 
 using multiple namenodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes

2010-07-08 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886384#action_12886384
 ] 

Suresh Srinivas commented on HDFS-1052:
---

Min, yes, a distributed namespace could be another proposal to solve this 
problem. However, it is a much more complicated solution to develop, would take 
much longer, and involves a lot of changes to the system. This does not fit the 
timeline in which we need a solution to namenode scalability.

 HDFS scalability with multiple namenodes
 

 Key: HDFS-1052
 URL: https://issues.apache.org/jira/browse/HDFS-1052
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Affects Versions: 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: Block pool proposal.pdf, Mulitple Namespaces5.pdf


 HDFS currently uses a single namenode that limits scalability of the cluster. 
 This jira proposes an architecture to scale the nameservice horizontally 
 using multiple namenodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1288) start-all.sh / stop-all.sh does not seem to work with HDFS

2010-07-08 Thread Aaron Kimball (JIRA)
start-all.sh / stop-all.sh does not seem to work with HDFS
--

 Key: HDFS-1288
 URL: https://issues.apache.org/jira/browse/HDFS-1288
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Priority: Minor


The start-all.sh / stop-all.sh script shipping with the combined 
hadoop-0.21.0-rc1 does not start/stop the DFS daemons unless $HADOOP_HDFS_HOME 
is explicitly set.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1288) start-all.sh / stop-all.sh does not seem to work with HDFS

2010-07-08 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886405#action_12886405
 ] 

Aaron Kimball commented on HDFS-1288:
-

If I explicitly set $HADOOP_HDFS_HOME=$HADOOP_HOME/hdfs then it works fine. But 
what is curious is that I do not need to explicitly set $HADOOP_MAPRED_HOME.

So there's some asymmetry in how these scripts treat HDFS and MapReduce. At the 
very least, they should print a warning that they couldn't do the DFS-side work 
if they can't find the scripts.


 start-all.sh / stop-all.sh does not seem to work with HDFS
 --

 Key: HDFS-1288
 URL: https://issues.apache.org/jira/browse/HDFS-1288
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Priority: Minor

 The start-all.sh / stop-all.sh script shipping with the combined 
 hadoop-0.21.0-rc1 does not start/stop the DFS daemons unless 
 $HADOOP_HDFS_HOME is explicitly set.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1288) start-all.sh / stop-all.sh does not seem to work with HDFS

2010-07-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886416#action_12886416
 ] 

Allen Wittenauer commented on HDFS-1288:


This is a regression and should be a blocker.

 start-all.sh / stop-all.sh does not seem to work with HDFS
 --

 Key: HDFS-1288
 URL: https://issues.apache.org/jira/browse/HDFS-1288
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Priority: Minor

 The start-all.sh / stop-all.sh script shipping with the combined 
 hadoop-0.21.0-rc1 does not start/stop the DFS daemons unless 
 $HADOOP_HDFS_HOME is explicitly set.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886418#action_12886418
 ] 

Rodrigo Schmidt commented on HDFS-1094:
---

I don't think I understand your use case. It only seems to be advantageous to 
do what you say if there are multiple readers for the same file.

We designed it this way because it would be relatively easy to understand, 
implement, generalize, and plan for (as users).

But I'm quite open to options. What would you propose instead?

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss if there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886430#action_12886430
 ] 

Konstantin Shvachko commented on HDFS-1094:
---

The math looks good to me (in the pdf file).

 Data loss probability P depends on time T.

Here the assumption is, correct me if it's wrong, that f nodes fail 
simultaneously. Otherwise, we should take into account the replication process, 
which will be restoring some blocks while other nodes are still up, decreasing 
the probability of data loss. The probability of losing f nodes simultaneously 
at a particular moment does not depend on time. The probability of a 
simultaneous failure of f nodes during a specific period of time depends on the 
length of the period. So if you choose the parameter p in the document 
correctly (depending on the time period), then you get the probability of a 
data loss during this period of time.
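
To make this concrete, here is a back-of-the-envelope version of that estimate 
(the notation is mine and may differ from the attachment): with n nodes, r 
replicas per block, B blocks, and placement treated as uniformly random over 
nodes (ignoring rack constraints),

{code}
% a given block is lost iff all r of its replicas land among the f failed nodes
P(\text{block lost} \mid f) = \binom{f}{r} \Big/ \binom{n}{r}

% losing at least one of B blocks, treating blocks as roughly independent
P(\text{data loss} \mid f) \approx 1 - \Bigl( 1 - \binom{f}{r} \Big/ \binom{n}{r} \Bigr)^{B}

% time enters only through f: over a period T each node fails with
% probability p(T), so
f \sim \mathrm{Binomial}\bigl( n, p(T) \bigr)
{code}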

The assumption p = 0.01 or 0.001 seems arbitrary, but it probably does not 
matter as you compare different strategies with the same value.

What is missing in the analysis is that the probability of losing a whole rack 
is much higher than the probability of losing any 20 machines in the cluster. 
It should actually be equivalent to the probability of losing one machine, 
because you lose one switch and the whole rack is out.
And that was one of the main reasons why we decided to replicate off rack.
Rodrigo, did I understand correctly that your idea is to experiment with 
replication within the rack, that is, all replicas are placed on different 
machines in the same rack?

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss if there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886451#action_12886451
 ] 

Hudson commented on HDFS-1045:
--

Integrated in Hadoop-Common-trunk-Commit #322 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/322/])
HADOOP-6853. Common component of HDFS-1045.


 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: HDFS-1045-Y20.patch


 Ticket credentials expire and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1006:
--

Status: Patch Available  (was: Open)

Re-submitting to Hudson.

 getImage/putImage http requests should be https for the case of security 
 enabled.
 -

 Key: HDFS-1006
 URL: https://issues.apache.org/jira/browse/HDFS-1006
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, 
 HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, 
 HDFS-1006-Y20.patch


 should use https:// and port 50475

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1006:
--

Attachment: HDFS-1006-trunk-2.patch

Updated patch for Devaraj's comments.  Now throws an exception if using the 
wildcard address with security, although once we have a unit testing framework 
for Kerberos - please, please, please - we'll need to come up with a good way 
of dealing with this.
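
Roughly, the new check amounts to something like this (an illustrative sketch; 
the patch's actual condition and message may differ):

{code}
// Illustrative sketch only, not the patch: reject a wildcard bind address
// when security is enabled, since a concrete host is needed to resolve the
// Kerberos principal for the https endpoint.
class BindAddressCheckSketch {
  static void checkBindAddress(String addr, boolean securityEnabled) {
    if (securityEnabled && "0.0.0.0".equals(addr)) {
      throw new IllegalArgumentException(
          "Cannot use a wildcard bind address when security is enabled");
    }
  }
}
{code}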

 getImage/putImage http requests should be https for the case of security 
 enabled.
 -

 Key: HDFS-1006
 URL: https://issues.apache.org/jira/browse/HDFS-1006
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, 
 HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, 
 HDFS-1006-Y20.patch


 should use https:// and port 50475

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1006:
--

Status: Open  (was: Patch Available)

 getImage/putImage http requests should be https for the case of security 
 enabled.
 -

 Key: HDFS-1006
 URL: https://issues.apache.org/jira/browse/HDFS-1006
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, 
 HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, 
 HDFS-1006-Y20.patch


 should use https:// and port 50475

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1286) TestFileAppend4 sometimes failed with already locked storage

2010-07-08 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886504#action_12886504
 ] 

Todd Lipcon commented on HDFS-1286:
---

In both cases it looks like a lack of entropy on the build box caused the first 
test to time out:
{code}
java.lang.Exception: test timed out after 6 milliseconds
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:199)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at 
sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:453)
{code}

The TFA4 tests use a lot of random bytes, and apparently the entropy pool is a 
bit low on the hudson box.

We use this trick on our build boxes here:
http://www.chrissearle.org/blog/technical/increase_entropy_26_kernel_linux_box

Can someone with access to the Hudson box get this set up?

 TestFileAppend4 sometimes failed with already locked storage
 --

 Key: HDFS-1286
 URL: https://issues.apache.org/jira/browse/HDFS-1286
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log, 
 TestFileAppend4.testRecoverFinalizedBlock.log


 Some test runs seem to fail with already locked errors, though they pass 
 locally. For example:
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/423/testReport/
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/testReport/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1286) Dry entropy pool on Hudson boxes causing test timeouts

2010-07-08 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1286:
--

   Summary: Dry entropy pool on Hudson boxes causing test timeouts  (was: 
TestFileAppend4 sometimes failed with already locked storage)
Issue Type: Task  (was: Bug)

 Dry entropy pool on Hudson boxes causing test timeouts
 --

 Key: HDFS-1286
 URL: https://issues.apache.org/jira/browse/HDFS-1286
 Project: Hadoop HDFS
  Issue Type: Task
  Components: test
Affects Versions: 0.22.0
Reporter: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log, 
 TestFileAppend4.testRecoverFinalizedBlock.log


 Some test runs seem to fail with already locked errors, though they pass 
 locally. For example:
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/423/testReport/
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/testReport/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1286) TestFileAppend4 sometimes failed with already locked storage

2010-07-08 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-1286:
--

Attachment: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log
TestFileAppend4.testRecoverFinalizedBlock.log

Attaching the log files.

 TestFileAppend4 sometimes failed with already locked storage
 --

 Key: HDFS-1286
 URL: https://issues.apache.org/jira/browse/HDFS-1286
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log, 
 TestFileAppend4.testRecoverFinalizedBlock.log


 Some test runs seem to fail with already locked errors, though they pass 
 locally. For example:
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/423/testReport/
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/testReport/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1289) Datanode secure mode is broken

2010-07-08 Thread Kan Zhang (JIRA)
Datanode secure mode is broken
--

 Key: HDFS-1289
 URL: https://issues.apache.org/jira/browse/HDFS-1289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Kan Zhang
Assignee: Kan Zhang


HDFS-520 introduced a new DataNode constructor, which tries to set up an RPC 
connection to the NN before a Kerberos login is done. This causes the 
datanode to fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1286) Dry entropy pool on Hudson boxes causing test timeouts

2010-07-08 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886508#action_12886508
 ] 

Todd Lipcon commented on HDFS-1286:
---

Alternatively, we could probably change the tests to use sequential or other 
pseudo-random bytes, but the entropy issue has caused lots of spurious test 
timeouts for us in the past, so fixing Hudson is probably worth it.
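
For reference, the test-side alternative would look something like this (a 
minimal sketch, not the TestFileAppend4 code):

{code}
// Minimal sketch of the alternative: fill test buffers from a seeded
// java.util.Random, which is deterministic and never blocks on the kernel
// entropy pool, unlike SecureRandom seeding via /dev/random.
import java.util.Random;

class TestDataSketch {
  static byte[] pseudoRandomBytes(int len, long seed) {
    byte[] buf = new byte[len];
    new Random(seed).nextBytes(buf);  // reproducible across runs as a bonus
    return buf;
  }
}
{code}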

 Dry entropy pool on Hudson boxes causing test timeouts
 --

 Key: HDFS-1286
 URL: https://issues.apache.org/jira/browse/HDFS-1286
 Project: Hadoop HDFS
  Issue Type: Task
  Components: test
Affects Versions: 0.22.0
Reporter: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log, 
 TestFileAppend4.testRecoverFinalizedBlock.log


 Some test runs seem to fail with already locked errors, though they pass 
 locally. For example:
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/423/testReport/
 http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/testReport/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886521#action_12886521
 ] 

Rodrigo Schmidt commented on HDFS-1094:
---

@Konstantin: The new policy will have, for every block, a limited window of 
racks you can choose from, and a limited window of machines within such racks. 
For every block, we will keep the idea of having a copy that is local to the 
writer, and two copies at a remote rack, but always respecting this limited 
window of choices.
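
If it helps to pin that down, here is a rough sketch of the limited-window idea 
as described above; the structure is a guess at the shape, not the actual patch:

{code}
// Rough sketch, not the actual patch: each file gets an offset into the
// rack list, and remote racks are only chosen from a small window starting
// at that offset, which bounds the set of (rack, node) pairs any one
// file's blocks can land on.
import java.util.List;
import java.util.Random;

class WindowPlacementSketch {
  private final Random rand = new Random();

  String chooseRemoteRack(List<String> racks, int fileOffset, int rackWindow) {
    // Only rackWindow racks, starting at the file's offset, are candidates.
    int i = fileOffset + rand.nextInt(rackWindow);
    return racks.get(i % racks.size());
  }
}
{code}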

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss if there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1140) Speedup INode.getPathComponents

2010-07-08 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886528#action_12886528
 ] 

Konstantin Shvachko commented on HDFS-1140:
---

I filed HDFS-1284 and HDFS-1285 to address the two other test failures. I 
checked the javadoc warnings locally and don't see anything related to this 
jira.

 Speedup INode.getPathComponents
 ---

 Key: HDFS-1140
 URL: https://issues.apache.org/jira/browse/HDFS-1140
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch, 
 HDFS-1140.patch


 When the namenode is loading the image, a significant amount of time is 
 spent in DFSUtil.string2Bytes. We have a very specific workload here: the 
 path that the namenode calls getPathComponents for shares N - 1 components 
 with the previous path this method was called for (assuming the current path 
 has N components).
 Hence we can improve the image load time by caching the result of the 
 previous conversion.
 We thought of using a simple LRU cache for components, but the reality is 
 that String.getBytes gets optimized at runtime and an LRU cache doesn't 
 perform as well; however, keeping just the latest path's components and 
 their byte translations in two arrays gives quite a performance boost.
 I could get another 20% off the time to load the image on our cluster (30 
 seconds vs 24), and I wrote a simple benchmark that tests performance with 
 and without caching.
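
A sketch of that caching scheme as described (illustrative only, not the 
committed patch):

{code}
// Illustrative sketch of the description above, not the committed patch:
// keep the previous path's components and their byte conversions in two
// arrays; for the next path, reuse the cached bytes for every component
// that matches and convert only the new ones (usually just the last).
class PathComponentCacheSketch {
  private String[] lastComponents = new String[0];
  private byte[][] lastBytes = new byte[0][];

  synchronized byte[][] getPathComponents(String path) {
    String[] comps = path.split("/");
    byte[][] out = new byte[comps.length][];
    for (int i = 0; i < comps.length; i++) {
      if (i < lastComponents.length && comps[i].equals(lastComponents[i])) {
        out[i] = lastBytes[i];         // cache hit: no re-encoding
      } else {
        out[i] = comps[i].getBytes();  // cache miss: convert once
      }
    }
    lastComponents = comps;            // remember for the next call
    lastBytes = out;
    return out;
  }
}
{code}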

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1140) Speedup INode.getPathComponents

2010-07-08 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-1140:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this. Thank you Dmytro.

 Speedup INode.getPathComponents
 ---

 Key: HDFS-1140
 URL: https://issues.apache.org/jira/browse/HDFS-1140
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch, 
 HDFS-1140.patch


 When the namenode is loading the image, a significant amount of time is 
 spent in DFSUtil.string2Bytes. We have a very specific workload here: the 
 path that the namenode calls getPathComponents for shares N - 1 components 
 with the previous path this method was called for (assuming the current path 
 has N components).
 Hence we can improve the image load time by caching the result of the 
 previous conversion.
 We thought of using a simple LRU cache for components, but the reality is 
 that String.getBytes gets optimized at runtime and an LRU cache doesn't 
 perform as well; however, keeping just the latest path's components and 
 their byte translations in two arrays gives quite a performance boost.
 I could get another 20% off the time to load the image on our cluster (30 
 seconds vs 24), and I wrote a simple benchmark that tests performance with 
 and without caching.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1289) Datanode secure mode is broken

2010-07-08 Thread Kan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kan Zhang updated HDFS-1289:


Status: Patch Available  (was: Open)

 Datanode secure mode is broken
 --

 Key: HDFS-1289
 URL: https://issues.apache.org/jira/browse/HDFS-1289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Kan Zhang
Assignee: Kan Zhang
 Attachments: h1289-01.patch


 HDFS-520 introduced a new DataNode constructor, which tries to set up an RPC 
 connection to the NN before a Kerberos login is done. This causes the 
 datanode to fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1289) Datanode secure mode is broken

2010-07-08 Thread Kan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kan Zhang updated HDFS-1289:


Attachment: h1289-01.patch

A small patch that moves the login call earlier (per Devaraj).
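
In other words, the startup ordering becomes something like this (the stub 
names are illustrative, not the real DataNode code):

{code}
// The ordering fix as described, with illustrative stub names (not the
// real DataNode code): log in from the keytab first, then open the RPC
// connection, so the connection is created as the Kerberos principal.
class DataNodeStartupSketch {
  void start() {
    loginFromKeytab();     // 1. previously happened after the RPC setup
    connectToNameNode();   // 2. now runs with credentials already in place
  }
  private void loginFromKeytab()   { /* e.g. a UserGroupInformation keytab login */ }
  private void connectToNameNode() { /* e.g. create the NN RPC proxy */ }
}
{code}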

 Datanode secure mode is broken
 --

 Key: HDFS-1289
 URL: https://issues.apache.org/jira/browse/HDFS-1289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Kan Zhang
Assignee: Kan Zhang
 Attachments: h1289-01.patch


 HDFS-520 introduced a new DataNode constructor, which tries to set up an RPC 
 connection to the NN before a Kerberos login is done. This causes the 
 datanode to fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1290) decommissioned nodes report not consistent / clear

2010-07-08 Thread Arun Ramakrishnan (JIRA)
decommissioned nodes report not consistent / clear
--

 Key: HDFS-1290
 URL: https://issues.apache.org/jira/browse/HDFS-1290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
 Environment: fedora 12
Reporter: Arun Ramakrishnan


After I add the list of decommissioned nodes to the exclude list and run 
-refreshNodes, in the WebUI the decommissioned/excluded nodes show up in both 
the live node list and the dead node list.

When I do -report from the command line:

  Datanodes available: 14 (20 total, 6 dead)

The problem here is that there are only 14 nodes in total, including the 6 
added to the exclude list.

Now, in the node-level status for each of the nodes, the excluded nodes say

 Decommission Status : Normal
 DFS Used%: 100%
 DFS Remaining%: 0%

But all the nodes say the same thing. I think if it said something like 
in-progress, it would be more informative.
Note: one thing distinguishing these excluded nodes is that they all report 0 
or 100% values in -report.

At this point I know from https://issues.apache.org/jira/browse/HDFS-1125 that 
one may have to restart the cluster to completely remove the nodes, but I have 
no clue when I should restart.

Ultimately, what's needed is some indication of when the decommissioning is 
complete, so that all references to the excluded nodes (from excludes, slaves) 
can be removed and the cluster restarted.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1140) Speedup INode.getPathComponents

2010-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886556#action_12886556
 ] 

Hudson commented on HDFS-1140:
--

Integrated in Hadoop-Hdfs-trunk-Commit #334 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/334/])
HDFS-1140. Speedup INode.getPathComponents. Contributed by Dmytro Molkov.


 Speedup INode.getPathComponents
 ---

 Key: HDFS-1140
 URL: https://issues.apache.org/jira/browse/HDFS-1140
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch, 
 HDFS-1140.patch


 When the namenode is loading the image, a significant amount of time is 
 spent in DFSUtil.string2Bytes. We have a very specific workload here: the 
 path that the namenode calls getPathComponents for shares N - 1 components 
 with the previous path this method was called for (assuming the current path 
 has N components).
 Hence we can improve the image load time by caching the result of the 
 previous conversion.
 We thought of using a simple LRU cache for components, but the reality is 
 that String.getBytes gets optimized at runtime and an LRU cache doesn't 
 perform as well; however, keeping just the latest path's components and 
 their byte translations in two arrays gives quite a performance boost.
 I could get another 20% off the time to load the image on our cluster (30 
 seconds vs 24), and I wrote a simple benchmark that tests performance with 
 and without caching.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart

2010-07-08 Thread Arun Ramakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886554#action_12886554
 ] 

Arun Ramakrishnan commented on HDFS-1125:
-

Related to step c: when would one know if decommissioning is finished?
Also, I suppose you can remove nodes from the excludes file at the same time 
you remove them from the slaves file?

 Removing a datanode (failed or decommissioned) should not require a namenode 
 restart
 

 Key: HDFS-1125
 URL: https://issues.apache.org/jira/browse/HDFS-1125
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Alex Loddengaard
Priority: Minor

 I've heard of several Hadoop users using dfsadmin -report to monitor the 
 number of dead nodes, and alert if that number is not 0.  This mechanism 
 tends to work pretty well, except when a node is decommissioned or fails, 
 because then the namenode requires a restart for said node to be entirely 
 removed from HDFS.  More details here:
 http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results
 Removal from the exclude file and a refresh should get rid of the dead node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart

2010-07-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886560#action_12886560
 ] 

Allen Wittenauer commented on HDFS-1125:


It will show up in the dead node list.

 Removing a datanode (failed or decommissioned) should not require a namenode 
 restart
 

 Key: HDFS-1125
 URL: https://issues.apache.org/jira/browse/HDFS-1125
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Alex Loddengaard
Priority: Minor

 I've heard of several Hadoop users using dfsadmin -report to monitor the 
 number of dead nodes, and alert if that number is not 0.  This mechanism 
 tends to work pretty well, except when a node is decommissioned or fails, 
 because then the namenode requires a restart for said node to be entirely 
 removed from HDFS.  More details here:
 http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results
 Removal from the exclude file and a refresh should get rid of the dead node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1290) decommissioned nodes report not consistent / clear

2010-07-08 Thread Arun Ramakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Ramakrishnan updated HDFS-1290:


Description: 
After I add the list of decommissioned nodes to the exclude list and run 
-refreshNodes, in the WebUI the decommissioned/excluded nodes show up in both 
the live node list and the dead node list.

When I do -report from the command line:

  Datanodes available: 14 (20 total, 6 dead)

The problem here is that there are only 14 nodes in total, including the 6 
added to the exclude list.

Now, in the node-level status for each of the nodes, the excluded nodes say

 Decommission Status : Normal

But all the nodes say the same thing. I think if it said something like 
in-progress, it would be more informative.
Note: one thing distinguishing these excluded nodes is that they all report 0 
or 100% for all the values in -report.

At this point I know from https://issues.apache.org/jira/browse/HDFS-1125 that 
one may have to restart the cluster to completely remove the nodes, but I have 
no clue when I should restart.

Ultimately, what's needed is some indication of when the decommissioning is 
complete, so that all references to the excluded nodes (from excludes, slaves) 
can be removed and the cluster restarted.



  was:
after i add the list of decom nodes to exclude list and -refreshNodes.
In the WebUI the decom/excluded nodes show up in both the live node list and 
the dead node list.

when I do -report from the command line.

  Datanodes available: 14 (20 total, 6 dead)

The problem here is that is only 14 nodes total including the 6 added to the 
exclude list. 

Now, in the node level status for each of the nodes, the excluded nodes say 

 Decommission Status : Normal
 DFS Used%: 100%
 DFS Remaining%: 0%

 But, all the nodes say the same thing. I think if it said something like 
in-progress, it would be more informative. 
note. one thing distinguishing these excluded nodes is that they all report 0 
or 100% values in -report.

Cause, at this point i know from 
https://issues.apache.org/jira/browse/HDFS-1125 that one may have to restart 
the cluster to completely remove the nodes.
But, i have no clue when i should restart.

Ultimately, whats needed is some indication to when the decomission is complete 
so that all references to the excluded nodes ( from excludes, slaves ) and 
restart the cluster.




 decommissioned nodes report not consistent / clear
 --

 Key: HDFS-1290
 URL: https://issues.apache.org/jira/browse/HDFS-1290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
 Environment: fedora 12
Reporter: Arun Ramakrishnan

 After I add the list of decommissioned nodes to the exclude list and run 
 -refreshNodes, in the WebUI the decommissioned/excluded nodes show up in both 
 the live node list and the dead node list.
 When I do -report from the command line:
 
   Datanodes available: 14 (20 total, 6 dead)
 
 The problem here is that there are only 14 nodes in total, including the 6 
 added to the exclude list.
 Now, in the node-level status for each of the nodes, the excluded nodes say
 
  Decommission Status : Normal
 
 But all the nodes say the same thing. I think if it said something like 
 in-progress, it would be more informative.
 Note: one thing distinguishing these excluded nodes is that they all report 0 
 or 100% for all the values in -report.
 At this point I know from 
 https://issues.apache.org/jira/browse/HDFS-1125 that one may have to restart 
 the cluster to completely remove the nodes, but I have no clue when I should 
 restart.
 Ultimately, what's needed is some indication of when the decommissioning is 
 complete, so that all references to the excluded nodes (from excludes, 
 slaves) can be removed and the cluster restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.

2010-07-08 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886570#action_12886570
 ] 

Devaraj Das commented on HDFS-1006:
---

+1

 getImage/putImage http requests should be https for the case of security 
 enabled.
 -

 Key: HDFS-1006
 URL: https://issues.apache.org/jira/browse/HDFS-1006
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, 
 HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, 
 HDFS-1006-Y20.patch


 should use https:// and port 50475

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1272) HDFS changes corresponding to rename of TokenStorage to Credentials

2010-07-08 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886576#action_12886576
 ] 

Jitendra Nath Pandey commented on HDFS-1272:


The javadoc, findbugs, and javac warnings were checked manually.

 HDFS changes corresponding to rename of TokenStorage to Credentials
 ---

 Key: HDFS-1272
 URL: https://issues.apache.org/jira/browse/HDFS-1272
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HDFS-1272.1.patch


 TokenStorage is renamed to Credentials as part of MAPREDUCE-1528 and 
 HADOOP-6845. This jira tracks hdfs changes corresponding to that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1006:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

I've committed this.  Resolving as fixed.

 getImage/putImage http requests should be https for the case of security 
 enabled.
 -

 Key: HDFS-1006
 URL: https://issues.apache.org/jira/browse/HDFS-1006
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, 
 HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, 
 HDFS-1006-Y20.patch


 should use https:// and port 50475

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1045:
--

Attachment: HDFS-1045-trunk.patch

Patch for trunk. Straightforward port, except that, as noted in HDFS-1006, the 
bugfix that had been done on 20 for that patch is done here, since there was a 
dependency on 1045.
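
The pattern being ported is essentially re-login-then-connect. A minimal 
sketch, assuming Hadoop's UserGroupInformation API; the patch's actual call 
sites may differ:

{code}
// Minimal sketch of re-login-then-connect, assuming the UserGroupInformation
// API; the patch's actual call sites may differ.
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import org.apache.hadoop.security.UserGroupInformation;

class ReloginSketch {
  static URLConnection openSecureConnection(URL url) throws IOException {
    // Refresh the Kerberos TGT from the keytab if it is near expiry,
    // then open the https connection as the refreshed principal.
    UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab();
    return url.openConnection();
  }
}
{code}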

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: HDFS-1045-trunk.patch, HDFS-1045-Y20.patch


 Ticket credentials expire and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1045:
--

Status: Open  (was: Patch Available)

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: HDFS-1045-trunk-2.patch, HDFS-1045-trunk.patch, 
 HDFS-1045-Y20.patch


 Ticket credentials expire and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1045:
--

   Status: Patch Available  (was: Open)
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0

submitting patch.

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: HDFS-1045-trunk.patch, HDFS-1045-Y20.patch


 Ticket credentials expire and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1045:
--

Status: Patch Available  (was: Open)

re-submitting patch.

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: HDFS-1045-trunk-2.patch, HDFS-1045-trunk.patch, 
 HDFS-1045-Y20.patch


 Ticket credentials expire and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1045:
--

Attachment: HDFS-1045-trunk-2.patch

Forgot to commit before making patch.  Updated file.

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: HDFS-1045-trunk-2.patch, HDFS-1045-trunk.patch, 
 HDFS-1045-Y20.patch


 Ticket credentials expire and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886584#action_12886584
 ] 

Devaraj Das commented on HDFS-1045:
---

+1

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: HDFS-1045-trunk-2.patch, HDFS-1045-trunk.patch, 
 HDFS-1045-Y20.patch


 Ticket credentials expire and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886595#action_12886595
 ] 

Hairong Kuang commented on HDFS-1094:
-

 I don't think I understand your use case. It only seems to be advantageous to 
 do what you say if there are multiple readers for the same file. We designed 
 it this way because it would be relatively easy to understand, implement, 
 generalize, and plan for (as users).

You still do not get my point. I think the goal of this policy is to reduce 
data loss by limiting the # of nodes on which a file's data is placed. Is this 
additional limitation of racks necessary? Or are you saying it is just easy for 
you to implement? I do not see how this helps users understand or plan. In 
general, having fewer configuration parameters makes it easier for users to 
understand and plan.

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss if there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1289) Datanode secure mode is broken

2010-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886596#action_12886596
 ] 

Hadoop QA commented on HDFS-1289:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449028/h1289-01.patch
  against trunk revision 961966.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/211/console

This message is automatically generated.

 Datanode secure mode is broken
 --

 Key: HDFS-1289
 URL: https://issues.apache.org/jira/browse/HDFS-1289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Kan Zhang
Assignee: Kan Zhang
 Attachments: h1289-01.patch


 HDFS-520 introduced a new DataNode constructor, which tries to set up an RPC 
 connection to the NN before a Kerberos login is done. This causes the 
 datanode to fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886599#action_12886599
 ] 

Rodrigo Schmidt commented on HDFS-1094:
---

I think I get your point. I don't think you get mine, though. I just want to 
know what else you have in mind.

Let me put it this way: If you were to implement this, what would you do?

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss whether there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886600#action_12886600
 ] 

Rodrigo Schmidt commented on HDFS-1094:
---

To complement my previous comment: I don't know of a better way to implement 
this that wouldn't be either overly complicated or hard to configure and 
understand. I'm open to other ideas, but you have to give me some.

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss whether there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1006) getImage/putImage http requests should be https for the case of security enabled.

2010-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886601#action_12886601
 ] 

Hudson commented on HDFS-1006:
--

Integrated in Hadoop-Hdfs-trunk-Commit #335 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/335/])
HDFS-1006. getImage/putImage http requests should be https for the case of 
security enabled.


 getImage/putImage http requests should be https for the case of security 
 enabled.
 -

 Key: HDFS-1006
 URL: https://issues.apache.org/jira/browse/HDFS-1006
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: HDFS-1006-BP20.patch, hdfs-1006-bugfix-1.patch, 
 HDFS-1006-trunk-2.patch, HDFS-1006-trunk.patch, HDFS-1006-Y20.1.patch, 
 HDFS-1006-Y20.patch


 These requests should use https:// and port 50475.
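
Concretely, the change amounts to building the image-transfer URL with the
https scheme and the secure port. A minimal sketch, assuming the /getimage
servlet path of that era; the host name is a placeholder.

  import java.net.URL;

  public class ImageTransferUrl {
    public static void main(String[] args) throws Exception {
      String nnHost = "nn.example.com";  // placeholder NameNode host
      // https on port 50475, per the description, instead of plain http
      URL getImage = new URL("https", nnHost, 50475, "/getimage?getimage=1");
      System.out.println(getImage);  // https://nn.example.com:50475/getimage?getimage=1
    }
  }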

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886605#action_12886605
 ] 

Hadoop QA commented on HDFS-1045:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12449050/HDFS-1045-trunk-2.patch
  against trunk revision 962380.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/427/console

This message is automatically generated.

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: HDFS-1045-trunk-2.patch, HDFS-1045-trunk.patch, 
 HDFS-1045-Y20.patch


 Ticket credentials expire, and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.
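
The shape of the fix is to refresh the ticket immediately before connecting.
A minimal sketch, not the actual patch: the keytab re-login call on
UserGroupInformation is the standard API, while the URL is a placeholder.

  import java.net.HttpURLConnection;
  import java.net.URL;
  import org.apache.hadoop.security.UserGroupInformation;

  public class ReloginBeforeConnect {
    static HttpURLConnection open(URL url) throws Exception {
      // Re-login from the keytab if the TGT is close to expiring;
      // effectively a no-op while the current ticket is still fresh.
      UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
      return (HttpURLConnection) url.openConnection();
    }

    public static void main(String[] args) throws Exception {
      open(new URL("https://nn.example.com:50470/getimage?getimage=1"));
    }
  }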

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-08 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886613#action_12886613
 ] 

Hairong Kuang commented on HDFS-1094:
-

Hi Rodrigo, I do not know your algorithm, so I have no idea how relaxing the 
rack restriction would complicate your implementation. :)

Assume that a user wants to place a file's blocks on at most N datanodes and a 
cluster has R racks. If you place blocks on at most N/R datanodes per rack, 
isn't that a special case of your proposal? Of course, there are other 
algorithms too...
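
The per-rack cap in that scheme is just ceil(N/R). A minimal sketch of the
arithmetic (names are illustrative, not HDFS code):

  public class RackCap {
    // At most N datanodes hold a file's blocks, spread over R racks:
    // no rack may hold more than ceil(N/R) of them.
    static int maxNodesPerRack(int n, int racks) {
      return (n + racks - 1) / racks;
    }
    public static void main(String[] args) {
      System.out.println(maxNodesPerRack(12, 5));  // prints 3
      System.out.println(maxNodesPerRack(10, 5));  // prints 2
    }
  }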

 Intelligent block placement policy to decrease probability of block loss
 

 Key: HDFS-1094
 URL: https://issues.apache.org/jira/browse/HDFS-1094
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Rodrigo Schmidt
 Attachments: prob.pdf, prob.pdf


 The current HDFS implementation specifies that the first replica is local and 
 the other two replicas are on any two random nodes on a random remote rack. 
 This means that if any three datanodes die together, then there is a 
 non-trivial probability of losing at least one block in the cluster. This 
 JIRA is to discuss whether there is a better algorithm that can lower the 
 probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1045) In secure clusters, re-login is necessary for https clients before opening connections

2010-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886615#action_12886615
 ] 

Hadoop QA commented on HDFS-1045:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12449050/HDFS-1045-trunk-2.patch
  against trunk revision 962380.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/212/console

This message is automatically generated.

 In secure clusters, re-login is necessary for https clients before opening 
 connections
 --

 Key: HDFS-1045
 URL: https://issues.apache.org/jira/browse/HDFS-1045
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: HDFS-1045-trunk-2.patch, HDFS-1045-trunk.patch, 
 HDFS-1045-Y20.patch


 Ticket credentials expire, and therefore clients opening https connections 
 (only the NN and SNN doing image/edits exchange) should re-login before 
 opening those connections.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.