[jira] Commented: (HDFS-1167) [Herriot] New property for local conf directory in system-test-hdfs.xml file.

2010-07-19 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889802#action_12889802
 ] 

Vinay Kumar Thota commented on HDFS-1167:
-

I see 6 failures, and they are unrelated to this patch. I don't think the 
patch could cause these failures, because its scope is just adding a new 
property to an XML file. 

> [Herriot] New property for local conf directory in system-test-hdfs.xml file.
> -
>
> Key: HDFS-1167
> URL: https://issues.apache.org/jira/browse/HDFS-1167
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: HDFS-1167.patch, HDFS-1167.patch
>
>
> Adding new property in system-test.xml file for local configuration directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-07-19 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-918:
---

Attachment: hbase-hdfs-benchmarks.ods

I benchmarked on EC2 this weekend. I set up a clean 0.20.2-append, a copy with 
my multiplex patch applied, and a third copy that only ports filechannel 
pooling to the current architecture (I can submit that patch later; it's at 
home).


All runs were with HBase block caching disabled to highlight the difference in 
filesystem access speeds.  

This is running across a fairly small dataset (a little less than 1 GB), so all 
files are presumably in memory for the majority of the test duration.

The run involved 6 clients reading 1,000,000 rows each, divided over 10 mappers.  
Cluster setup was 3x EC2 High-CPU XL: 1 NN/JT/ZK/Master and 2x DN/TT/RS.  Ran 
in 3 batches of 3 runs each. The cluster was restarted between batches for each 
run type because the DN implementation changes.


Topline numbers (rest are in document):

Total Run Averages  

||Test||clean||pool||multiplex||
|random|21159050.44|19448216.89|16806247|
|scan|436106.89|442452.54|443262.56|
|sequential|19298239.78|17871047.67|14987028.44|

Pool is a 7.5% gain; multiplex is more like 20% for random reads.

Batches 2+3 only (batch 1 was a little messed up and doesn't track with the others):

||Test||clean||pool||multiplex||
|random|20555308.67|18425017|16987643.33|
|scan|426849|427277.98|448031|
|sequential|18665323.67|16969885.83|15102404|

Pool is a 10% gain; multiplex is around 17% for random reads.

Per row for random read (batches 2+3 only):
clean: 3.42ms
pool: 3.07ms
multiplex: 2.83ms


> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> 
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hbase-hdfs-benchmarks.ods, hdfs-918-20100201.patch, 
> hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch, 
> hdfs-918-20100309.patch, hdfs-918-branch20-append.patch, 
> hdfs-918-branch20.2.patch, hdfs-918-TRUNK.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.
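The single-selector-plus-small-pool idea in the description can be sketched with plain java.nio primitives. This is a hypothetical illustration, not code from any of the attached patches: one Selector watches all channels, and a channel that becomes readable is handed to a small worker pool instead of getting a dedicated per-request thread. A Pipe stands in for a client connection so the sketch is self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiplexSketch {
    // Returns the number of bytes a pooled worker read from the ready channel.
    static int demo() throws Exception {
        Selector selector = Selector.open();              // one selector for all requests
        ExecutorService pool = Executors.newFixedThreadPool(2); // small shared worker pool
        try {
            Pipe pipe = Pipe.open();                      // stands in for a client socket
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);
            pipe.sink().write(ByteBuffer.wrap("packet".getBytes())); // a request arrives
            selector.select(1000);                        // wait until some channel is ready
            int read = 0;
            for (SelectionKey key : selector.selectedKeys()) {
                ReadableByteChannel ch = (ReadableByteChannel) key.channel();
                // hand the ready channel to the pool instead of a per-request thread
                Future<Integer> n = pool.submit(() -> ch.read(ByteBuffer.allocate(64)));
                read += n.get();
            }
            selector.selectedKeys().clear();
            return read;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes read by pooled worker: " + demo());
    }
}
```

In a real datanode the worker would send the next packet of block data and re-arm the key's interest set; the sketch only shows the dispatch structure.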




[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-19 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889910#action_12889910
 ] 

Scott Carey commented on HDFS-1094:
---

This needs to change "p" from a constant to a function of the TTR window.

The "probability of a single node failing" alone is meaningless; it's 
concurrent failure that is the issue. The odds of concurrent node failure are 
linearly proportional to TTR. I think this model needs to assume one failure 
at odds = 1.0, then use the odds of concurrent failure for the next 2 failures 
within the time window. A 'constant' chance of failure raises the question: 
".001 chance of failure per _what_?" The first failure happens; that is 
assumed. Then the next two happen with given odds within a time window.

Assume Hadoop failure replication is optimized (which it isn't; the DN dishes 
out block replication requests too slowly). Then:
TTR is inversely proportional to the number of racks in a group for rack 
failure.
TTR is inversely proportional to the number of racks in a group for single 
node failure _IF_ the combined bandwidth of the group's machines in a rack is 
at least 2x the between-rack bandwidth; otherwise it is inversely proportional 
to the ratio of rack bandwidth to node group bandwidth.

The result is that only the "medium" sized groups above are viable; otherwise 
it takes too long to get data replicated when a failure happens. Also, the TTR 
affects the odds of data loss disproportionately for larger replication counts.

> Intelligent block placement policy to decrease probability of block loss
> 
>
> Key: HDFS-1094
> URL: https://issues.apache.org/jira/browse/HDFS-1094
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: Rodrigo Schmidt
> Attachments: calculate_probs.py, failure_rate.py, prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and 
> the other two replicas are on any two random nodes on a random remote rack. 
> This means that if any three datanodes die together, then there is a 
> non-trivial probability of losing at least one block in the cluster. This 
> JIRA is to discuss if there is a better algorithm that can lower probability 
> of losing a block.




[jira] Commented: (HDFS-1306) TestFileAppend4 fails

2010-07-19 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889915#action_12889915
 ] 

Suresh Srinivas commented on HDFS-1306:
---

Here are test results from Hudson to track the failures:
 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/435/testReport/


> TestFileAppend4 fails
> -
>
> Key: HDFS-1306
> URL: https://issues.apache.org/jira/browse/HDFS-1306
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>
> Following tests are failing on trunk:
> TestFileAppend4.testRecoverFinalizedBlock 
> TestFileAppend4.testCompleteOtherLeaseHoldersFile 




[jira] Commented: (HDFS-1298) Add support in HDFS to update statistics that tracks number of file system operations in FileSystem

2010-07-19 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889916#action_12889916
 ] 

Suresh Srinivas commented on HDFS-1298:
---

Nicholas, between the two Hudson runs, only one has these tests failing. Also, 
I could not reproduce this on my local machine.

> Add support in HDFS to update statistics that tracks number of file system 
> operations in FileSystem
> ---
>
> Key: HDFS-1298
> URL: https://issues.apache.org/jira/browse/HDFS-1298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1298.patch, HDFS-1298.y20.patch
>
>
> See HADOOP-6859 for the new statistics.




[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-19 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889976#action_12889976
 ] 

Rodrigo Schmidt commented on HDFS-1094:
---

Hi Scott,

I totally understand your concerns. BTW, thanks for the thorough analysis you 
gave. That was impressive!

I would say that the main problem is the complexity of the calculations. It was 
already hard to calculate things with this simplified model, and adding more 
variables would make it harder. Besides, I think we are more interested in 
permanent failures -- those that cannot be recovered -- since we are trying to 
reduce the odds of permanently losing data.

I'm not saying we should not use TTR. I'm just saying that we didn't aim for 
that in our evaluation.

I guess the main thing we wanted to take from these numbers was the comparison 
between the different algorithms, and whether it was worth changing the block 
placement policy or not. 

I think this is a great discussion JIRA, and I'm really happy with all the 
great ideas people have been giving. Having said that, it's clear to me that 
there is no definitive solution to the problem. Depending on how you approach 
it, different policies will be optimal. For instance, take DEFAULT, RING, and 
DISJOINT: they are all great and valid approaches, each with its ups and downs. 
I believe the main result of this JIRA is that people will start creating 
different probability models and algorithms depending on their use cases, and 
we will see several block placement policies come out of it.


> Intelligent block placement policy to decrease probability of block loss
> 
>
> Key: HDFS-1094
> URL: https://issues.apache.org/jira/browse/HDFS-1094
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: Rodrigo Schmidt
> Attachments: calculate_probs.py, failure_rate.py, prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and 
> the other two replicas are on any two random nodes on a random remote rack. 
> This means that if any three datanodes die together, then there is a 
> non-trivial probability of losing at least one block in the cluster. This 
> JIRA is to discuss if there is a better algorithm that can lower probability 
> of losing a block.




[jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss

2010-07-19 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1289#action_1289
 ] 

Scott Carey commented on HDFS-1094:
---

bq. I think we are more interested in permanent failures - those that cannot be 
recovered

That simplifies things.  We can ignore rack failure, which is predominantly an 
availability problem, not a data loss problem.

That makes the TTR issue primarily about how many nodes a group has per rack. 
So the short answer for keeping TTR from growing too large comes down to the 
number of group nodes in each rack.

R = replication count.
Let's say that p_0 is our baseline probability that R nodes will 
simultaneously fail (R=3, p_0 = 0.001 in the calculations above).

Now let's find p, the adjusted probability taking TTR into account.

N = machines per rack in a group.
RB = rack bandwidth.
NB = node bandwidth.

if (NB * N) / RB >= (R-1), then p = p_0.
else, p = p_0 * ((RB * (R-1)) / (NB * N)) ^ (R-1).

Assuming p_0 = 0.001 and R = 3 this is more clearly presented as:

IF (NB * N) >= (RB * 2)
  p = 0.001 
ELSE 
  p = 0.001 * ((2* RB)/(NB * N)) ^2

If Rack Bandwidth is 10x node bandwidth, then this is:
IF N >= 20, then p = 0.001
ELSE p = 0.001 * (20/N) ^ 2
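The branching formula above can be sanity-checked in a few lines. This is only a sketch of the model exactly as stated; the method and parameter names are made up for illustration:

```java
public class TtrAdjustedLoss {
    // p0: baseline odds that R nodes fail together; nodeBw/rackBw in the same units.
    static double adjusted(double p0, int r, double nodeBw, double rackBw, int nodesPerRack) {
        double groupBw = nodeBw * nodesPerRack;
        if (groupBw >= rackBw * (r - 1)) return p0;      // recovery is rack-limited anyway
        double ttrFactor = (rackBw * (r - 1)) / groupBw; // how much longer recovery takes
        return p0 * Math.pow(ttrFactor, r - 1);          // concurrent-failure odds scale with TTR^(R-1)
    }

    public static void main(String[] args) {
        // Rack bandwidth 10x node bandwidth, R = 3, p_0 = 0.001, as in the comment.
        for (int n : new int[] {5, 10, 20}) {
            System.out.printf("N=%d -> p=%.4f%n", n, adjusted(0.001, 3, 1.0, 10.0, n));
        }
    }
}
```

With rack bandwidth 10x node bandwidth this gives p = 0.016 at N=5 (the 16x adjustment), 0.004 at N=10 (4x), and the baseline 0.001 at N=20, matching the adjusted tables below.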

In the 100-rack, 2000-node example you have the following as a subsection:
{noformat}
RING GROUPS (window = 10 racks, 5 machines) => 0.000352741
DISJOINT GROUPS (window = 10 racks, 5 machines) => 0.000177175

RING GROUPS (window = 10 racks, 10 machines) => 0.00151145
DISJOINT GROUPS (window = 10 racks, 10 machines) => 0.000768873

RING GROUPS (window = 10 racks, 20 machines) => 0.00550483
DISJOINT GROUPS (window = 10 racks, 20 machines) => 0.00307498
{noformat}

I'm not sure whether RING GROUPS handle TTR issues better than DISJOINT GROUPS; 
I haven't thought that one through. But assuming it's the same, adjusting for 
TTR gives these adjusted data loss odds for the above:

{noformat}
(TTR is 4x longer, so probability of loss is 16x)
RING GROUPS (window = 10 racks, 5 machines) => 0.005643856
DISJOINT GROUPS (window = 10 racks, 5 machines) => 0.0028348

(TTR is 2x longer so probability of loss is 4x)
RING GROUPS (window = 10 racks, 10 machines) => 0.0060458
DISJOINT GROUPS (window = 10 racks, 10 machines) => 0.003075492

RING GROUPS (window = 10 racks, 20 machines) => 0.00550483
DISJOINT GROUPS (window = 10 racks, 20 machines) => 0.00307498
{noformat}

Interestingly, this almost exactly compensates for the shrinking of group size 
once TTR is limited to the intra-rack bandwidth in the group.

Note that p_0 itself increases with cluster size: the more nodes, the higher 
the likelihood of co-occurrence of node failures.

My conclusion is that this is especially useful for large clusters with large 
numbers of nodes per rack, or larger ratios of intra-rack to inter-rack 
bandwidth. For clusters on the other end of those spectrums, it's hard to beat 
the current placement algorithm as long as replication of missing blocks is 
done at maximum pace.
Of course, in the real world the Namenode does not issue block replication 
requests at the maximum pace. I did some tests a few days ago with three 
scenarios (decommission, node error, missing node at startup) and found that 
the bottleneck in replication is how fast the NN schedules block replication, 
not the network or the data nodes. It schedules block replication in batches 
that are too small to saturate the network or disks, and does not schedule 
batches aggressively enough. So one way to increase data reliability in the 
cluster is to work on that and thereby reduce TTR.


> Intelligent block placement policy to decrease probability of block loss
> 
>
> Key: HDFS-1094
> URL: https://issues.apache.org/jira/browse/HDFS-1094
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: Rodrigo Schmidt
> Attachments: calculate_probs.py, failure_rate.py, prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and 
> the other two replicas are on any two random nodes on a random remote rack. 
> This means that if any three datanodes die together, then there is a 
> non-trivial probability of losing at least one block in the cluster. This 
> JIRA is to discuss if there is a better algorithm that can lower probability 
> of losing a block.




[jira] Commented: (HDFS-1085) hftp read failing silently

2010-07-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890005#action_12890005
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1085:
--

None of the failed tests are related to this. The relevant JIRAs are listed 
below.
- org.apache.hadoop.hdfs.TestFiHFlush.hFlushFi01_a; see HDFS-1206
- org.apache.hadoop.hdfs.TestFileAppend4; see HDFS-1306
- org.apache.hadoop.hdfs.security.token.block.TestBlockToken.testBlockTokenRpc; 
see HDFS-1284
- org.apache.hadoop.hdfs.server.common.TestJspHelper.testGetUgi; see HDFS-1285


> hftp read  failing silently
> ---
>
> Key: HDFS-1085
> URL: https://issues.apache.org/jira/browse/HDFS-1085
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Reporter: Koji Noguchi
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.22.0
>
> Attachments: h1085_20100713.patch, h1085_20100716b_y0.20.1xx.patch, 
> h1085_20100716b_y0.20.1xx_with_test.patch, h1085_20100716c.patch, 
> h1085_20100716c_y0.20.1xx.patch, h1085_20100716d.patch, 
> h1085_20100716d_y0.20.1xx.patch, h1085_20100716d_y0.20.1xx_test.patch
>
>
> When performing a massive distcp through hftp, we saw many tasks fail with 
> {quote}
> 2010-04-06 17:56:43,005 INFO org.apache.hadoop.tools.DistCp: FAIL 
> 2010/0/part-00032 : java.io.IOException: File size not matched: copied 
> 193855488 bytes (184.9m) to tmpfile 
> (=hdfs://omehost.com:8020/somepath/part-00032)
> but expected 1710327403 bytes (1.6g) from 
> hftp://someotherhost/somepath/part-00032
> at 
> org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:435)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:543)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:310)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.Child.main(Child.java:159)
> {quote}
> This means that read itself didn't fail but the resulted file was somehow 
> smaller.




[jira] Commented: (HDFS-1298) Add support in HDFS to update statistics that tracks number of file system operations in FileSystem

2010-07-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890008#action_12890008
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1298:
--

They are probably not related to your patch but I think you might have 
overlooked them.  Just want to make sure.

> Add support in HDFS to update statistics that tracks number of file system 
> operations in FileSystem
> ---
>
> Key: HDFS-1298
> URL: https://issues.apache.org/jira/browse/HDFS-1298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1298.patch, HDFS-1298.y20.patch
>
>
> See HADOOP-6859 for the new statistics.




[jira] Updated: (HDFS-1307) Add start time, end time and total time taken for FSCK to FSCK report

2010-07-19 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1307:
--

Attachment: HDFS-1307.2.patch

Looks like the tests depend on the last line of the output printing the DFS 
status. TestFsck failed because the patch printed fsck end time and total time 
as the last line. The new patch restores the expected last line.

> Add start time, end time and total time taken for FSCK to FSCK report
> -
>
> Key: HDFS-1307
> URL: https://issues.apache.org/jira/browse/HDFS-1307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node, tools
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1307.1.patch, HDFS-1307.2.patch, HDFS-1307.patch, 
> HDFS-1307.y20.patch
>
>
> FSCK is a long running operation and makes namenode very busy when it runs. 
> Adding information such as start time, end time and time taken helps in 
> determining when the FSCK was run and the impact of that on Namenode.




[jira] Commented: (HDFS-1286) Dry entropy pool on Hudson boxes causing test timeouts

2010-07-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890014#action_12890014
 ] 

Konstantin Shvachko commented on HDFS-1286:
---

Interesting catch. We should ask somebody to look at the Hudson machines.
On the other hand, we should also avoid using random data in tests: tests 
should be reproducible by definition, and random bytes do not achieve that. So 
replacing random with sequential sounds like the right direction to me.
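The replace-random-with-sequential direction can be sketched as follows. This is an illustrative helper, not the actual test code: a deterministic byte pattern never touches the OS entropy pool (unlike SecureRandom, which can block when /dev/random runs dry) and makes failing runs reproducible.

```java
import java.util.Arrays;

public class DeterministicTestData {
    // Fill a buffer with a repeatable byte pattern instead of drawing from
    // SecureRandom; seedOffset lets different tests get different (but fixed) data.
    static byte[] sequentialBytes(int len, int seedOffset) {
        byte[] b = new byte[len];
        for (int i = 0; i < len; i++) {
            b[i] = (byte) ((i + seedOffset) & 0xff);  // wraps every 256 bytes
        }
        return b;
    }

    public static void main(String[] args) {
        byte[] a = sequentialBytes(16, 0);
        byte[] c = sequentialBytes(16, 0);
        // Identical across runs and machines, so a test failure is reproducible.
        System.out.println(Arrays.equals(a, c));
    }
}
```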

> Dry entropy pool on Hudson boxes causing test timeouts
> --
>
> Key: HDFS-1286
> URL: https://issues.apache.org/jira/browse/HDFS-1286
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: TestFileAppend4.testCompleteOtherLeaseHoldersFile.log, 
> TestFileAppend4.testRecoverFinalizedBlock.log
>
>
> Some test runs seem to fail with "already locked" errors, though it passes 
> locally. For example:
> http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/423/testReport/
> http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/421/testReport/




[jira] Updated: (HDFS-1307) Add start time, end time and total time taken for FSCK to FSCK report

2010-07-19 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1307:
--

Attachment: HDFS-1307.3.patch

New patch with minor modification...

> Add start time, end time and total time taken for FSCK to FSCK report
> -
>
> Key: HDFS-1307
> URL: https://issues.apache.org/jira/browse/HDFS-1307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node, tools
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1307.1.patch, HDFS-1307.2.patch, HDFS-1307.3.patch, 
> HDFS-1307.patch, HDFS-1307.y20.patch
>
>
> FSCK is a long running operation and makes namenode very busy when it runs. 
> Adding information such as start time, end time and time taken helps in 
> determining when the FSCK was run and the impact of that on Namenode.




[jira] Commented: (HDFS-1307) Add start time, end time and total time taken for FSCK to FSCK report

2010-07-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890022#action_12890022
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1307:
--

+1 the new patch looks good.

> Add start time, end time and total time taken for FSCK to FSCK report
> -
>
> Key: HDFS-1307
> URL: https://issues.apache.org/jira/browse/HDFS-1307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node, tools
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1307.1.patch, HDFS-1307.2.patch, HDFS-1307.3.patch, 
> HDFS-1307.patch, HDFS-1307.y20.patch
>
>
> FSCK is a long running operation and makes namenode very busy when it runs. 
> Adding information such as start time, end time and time taken helps in 
> determining when the FSCK was run and the impact of that on Namenode.




[jira] Resolved: (HDFS-1303) StreamFile.doGet(..) uses an additional RPC to get file length

2010-07-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-1303.
--

  Assignee: Tsz Wo (Nicholas), SZE
Resolution: Duplicate

> StreamFile.doGet(..) uses an additional RPC to get file length
> --
>
> Key: HDFS-1303
> URL: https://issues.apache.org/jira/browse/HDFS-1303
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> {code}
> //StreamFile.doGet(..)
> long fileLen = dfs.getFileInfo(filename).getLen();
> FSInputStream in = dfs.open(filename);
> {code}
> In the code above, it is unnecessary to call getFileInfo(..), which makes an 
> additional RPC to the namenode.  The file length can be obtained from the 
> input stream after open(..).
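The suggested fix can be sketched with a hypothetical length-aware stream. The class and accessor below are stand-ins for illustration (the real DFS input stream knows the file length once the file is open, per the description above); the point is that one open() call replaces the getFileInfo-then-open pair.

```java
import java.io.ByteArrayInputStream;

// Hypothetical stand-in for a DFS input stream: a stream that knows its own
// length, so callers need not make a separate getFileInfo() RPC first.
class LengthAwareInputStream extends ByteArrayInputStream {
    LengthAwareInputStream(byte[] data) { super(data); }
    long getFileLength() { return count; }  // length known at open time
}

public class StreamLenSketch {
    public static void main(String[] args) {
        // One round trip: open, then ask the stream itself for the length.
        LengthAwareInputStream in = new LengthAwareInputStream(new byte[1234]);
        System.out.println(in.getFileLength());
    }
}
```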




[jira] Updated: (HDFS-1085) hftp read failing silently

2010-07-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1085:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Thanks, Suresh, for the review.

I have committed this.

> hftp read  failing silently
> ---
>
> Key: HDFS-1085
> URL: https://issues.apache.org/jira/browse/HDFS-1085
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Reporter: Koji Noguchi
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.22.0
>
> Attachments: h1085_20100713.patch, h1085_20100716b_y0.20.1xx.patch, 
> h1085_20100716b_y0.20.1xx_with_test.patch, h1085_20100716c.patch, 
> h1085_20100716c_y0.20.1xx.patch, h1085_20100716d.patch, 
> h1085_20100716d_y0.20.1xx.patch, h1085_20100716d_y0.20.1xx_test.patch
>
>
> When performing a massive distcp through hftp, we saw many tasks fail with 
> {quote}
> 2010-04-06 17:56:43,005 INFO org.apache.hadoop.tools.DistCp: FAIL 
> 2010/0/part-00032 : java.io.IOException: File size not matched: copied 
> 193855488 bytes (184.9m) to tmpfile 
> (=hdfs://omehost.com:8020/somepath/part-00032)
> but expected 1710327403 bytes (1.6g) from 
> hftp://someotherhost/somepath/part-00032
> at 
> org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:435)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:543)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:310)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.Child.main(Child.java:159)
> {quote}
> This means that read itself didn't fail but the resulted file was somehow 
> smaller.




[jira] Updated: (HDFS-1081) Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

2010-07-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1081:
--

Attachment: results.xlsx

Raw data of results.

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> 
>
> Key: HDFS-1081
> URL: https://issues.apache.org/jira/browse/HDFS-1081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: bm1081.scala, HADOOP-1081-Y20-1.patch, 
> HADOOP-1081-Y20-2.patch, HDFS-1081-trunk.patch, results.xlsx
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on Y20. 
> This JIRA is for correcting and tracking it both on Y20 and trunk.




[jira] Updated: (HDFS-1081) Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

2010-07-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1081:
--

Attachment: HDFS-1081-trunk.patch

Corrected patch; the previous one was not synced with trunk.

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> 
>
> Key: HDFS-1081
> URL: https://issues.apache.org/jira/browse/HDFS-1081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: bm1081.scala, HADOOP-1081-Y20-1.patch, 
> HADOOP-1081-Y20-2.patch, HDFS-1081-trunk.patch, HDFS-1081-trunk.patch, 
> results.xlsx
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on Y20. 
> This JIRA is for correcting and tracking it both on Y20 and trunk.




[jira] Updated: (HDFS-1081) Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

2010-07-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1081:
--

Attachment: HDFS-1081-trunk.patch

Patch for trunk. Basically the same as the 20S patch, but with modifications 
for the new BlockManager.  We've been running the 20 patch in production and 
it's solid.

This optimization was benchmarked on a 5 DN cluster using the (to-be-attached) 
script to measure performance time with and without patch on trunk.  
Results:
Round trip times for getBlockLocations call (in milliseconds) for files with 
specified number of blocks across 100 calls for each # of blocks.

*without patch*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|2.20|2.07|2.06|2.05|2.05|2.01|2.01|2.01|2.03|2.07|42.23|4.05|7.02|16.08|30.47|50.57|
|median|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|42.00|4.00|7.00|16.00|28.00|49.50|
|std dev|1.54|0.38|0.28|0.36|0.26|0.10|0.10|0.10|0.17|0.70|1.24|0.33|0.32|1.04|27.00|5.04|

*With patch*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|1.15|1.01|1.02|1.09|1.00|1.01|1.00|1.01|1.00|1.01|40.76|2.00|3.97|11.61|25.61|88.02|
|median|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|41.00|2.00|4.00|10.00|24.00|71.00|
|std dev|1.22|0.10|0.14|0.90|0.00|0.10|0.00|0.10|0.00|0.10|0.67|0.00|1.33|8.16|6.38|115.07|

*raw difference: how much less time it took with the patch (negative numbers are better)*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|-1.05|-1.06|-1.04|-0.96|-1.05|-1.00|-1.01|-1.00|-1.03|-1.06|-1.47|-2.05|-3.05|-4.47|-4.86|37.45|
|median|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-2.00|-3.00|-6.00|-4.00|21.50|
|std dev|-0.33|-0.28|-0.14|0.54|-0.26|0.00|-0.10|0.00|-0.17|-0.60|-0.57|-0.33|1.01|7.12|-20.62|110.03|

*% difference: amount of time the patched call took compared to the unpatched time*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|0.52|0.49|0.50|0.53|0.49|0.50|0.50|0.50|0.49|0.49|0.97|0.49|0.57|0.72|0.84|1.74|
|median|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.98|0.50|0.57|0.63|0.86|1.43|
|std dev|0.79|0.26|0.51|2.51|0.00|1.00|0.00|1.00|0.00|0.14|0.54|0.00|4.19|7.83|0.24|22.84|

For files with 1-100 blocks we cut the time in half.  

At 20 blocks I see a big spike in the amount of time to do the processing, but 
this is in both the patched and unpatched versions.  I'm not sure what's 
causing this; it warrants looking into.  

This patch saves a lot of NN CPU time by doing the big calculation only once, but 
it could currently make better use of the network.  This starts to show up at 
250+ blocks, where we send an increasingly large amount of data that eventually 
overwhelms the CPU savings.  Files with 250+ blocks are exceedingly rare in 
HDFS, and this can also be improved; I'll open another JIRA to optimize it.

I think the data support this particular optimization.  Patch is ready for 
review.
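The per-block-count summary statistics quoted above (mean, median, and std dev over 100 calls) can be reproduced with a small helper along these lines. This is an illustrative sketch only, not the attached bm1081.scala benchmark script; the class and method names are made up:

```java
import java.util.Arrays;

public class LatencyStats {
    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Median: middle value of the sorted samples
    // (average of the two middle values for even-length input).
    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Population standard deviation.
    static double stdDev(double[] xs) {
        double m = mean(xs), ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / xs.length);
    }

    public static void main(String[] args) {
        // Hypothetical round-trip times in milliseconds for one block count.
        double[] samples = {2, 2, 2, 3, 2, 2, 1, 2, 2, 2};
        System.out.println("mean=" + mean(samples)
            + " median=" + median(samples)
            + " stddev=" + stdDev(samples));
    }
}
```

In the real benchmark each column of the tables is one such sample set, gathered by timing getFileBlockLocations against files of a fixed block count.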

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> 
>
> Key: HDFS-1081
> URL: https://issues.apache.org/jira/browse/HDFS-1081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: HADOOP-1081-Y20-1.patch, HADOOP-1081-Y20-2.patch, 
> HDFS-1081-trunk.patch
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on in Y20. 
> This JIRA is for tracking and correcting it on both Y20 and trunk.




[jira] Updated: (HDFS-1081) Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

2010-07-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1081:
--

Attachment: bm1081.scala

Script to benchmark this optimization.

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> 
>
> Key: HDFS-1081
> URL: https://issues.apache.org/jira/browse/HDFS-1081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: bm1081.scala, HADOOP-1081-Y20-1.patch, 
> HADOOP-1081-Y20-2.patch, HDFS-1081-trunk.patch, results.xlsx
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on in Y20. 
> This JIRA is for tracking and correcting it on both Y20 and trunk.




[jira] Updated: (HDFS-1081) Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

2010-07-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1081:
--

   Status: Patch Available  (was: Open)
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0

Submitting patch.

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> 
>
> Key: HDFS-1081
> URL: https://issues.apache.org/jira/browse/HDFS-1081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: bm1081.scala, HADOOP-1081-Y20-1.patch, 
> HADOOP-1081-Y20-2.patch, HDFS-1081-trunk.patch, HDFS-1081-trunk.patch, 
> results.xlsx
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on in Y20. 
> This JIRA is for tracking and correcting it on both Y20 and trunk.




[jira] Updated: (HDFS-1307) Add start time, end time and total time taken for FSCK to FSCK report

2010-07-19 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1307:
--

Status: Patch Available  (was: Open)

> Add start time, end time and total time taken for FSCK to FSCK report
> -
>
> Key: HDFS-1307
> URL: https://issues.apache.org/jira/browse/HDFS-1307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node, tools
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1307.1.patch, HDFS-1307.2.patch, HDFS-1307.3.patch, 
> HDFS-1307.patch, HDFS-1307.y20.patch
>
>
> FSCK is a long-running operation that makes the namenode very busy while it 
> runs. Adding the start time, end time, and total time taken helps determine 
> when FSCK was run and what its impact on the Namenode was.
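The timing information proposed here could be appended to the report along these lines. This is a sketch under assumptions: the actual report format and method names in the fsck code (NamenodeFsck) may differ from what is shown.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FsckTiming {
    // Builds start/end/elapsed lines for an fsck-style report.
    // Illustrative only; the real report wording may differ.
    static String timingFooter(long startMs, long endMs) {
        SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
        return "FSCK started at " + fmt.format(new Date(startMs)) + "\n"
             + "FSCK ended at " + fmt.format(new Date(endMs)) + "\n"
             + "FSCK took " + (endMs - startMs) + " milliseconds\n";
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long end = start + 1234;  // pretend the scan took 1234 ms
        System.out.print(timingFooter(start, end));
    }
}
```

Recording both wall-clock endpoints and the elapsed time makes it easy to correlate an fsck run with Namenode load spikes in the logs.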




[jira] Updated: (HDFS-1307) Add start time, end time and total time taken for FSCK to FSCK report

2010-07-19 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1307:
--

Status: Open  (was: Patch Available)

> Add start time, end time and total time taken for FSCK to FSCK report
> -
>
> Key: HDFS-1307
> URL: https://issues.apache.org/jira/browse/HDFS-1307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node, tools
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1307.1.patch, HDFS-1307.2.patch, HDFS-1307.3.patch, 
> HDFS-1307.patch, HDFS-1307.y20.patch
>
>
> FSCK is a long-running operation that makes the namenode very busy while it 
> runs. Adding the start time, end time, and total time taken helps determine 
> when FSCK was run and what its impact on the Namenode was.




[jira] Created: (HDFS-1308) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem (part of MR-1718)

2010-07-19 Thread Boris Shkolnik (JIRA)
 job conf key for the services name of DelegationToken for HFTP url is 
constructed incorrectly in HFTPFileSystem (part of MR-1718)
--

 Key: HDFS-1308
 URL: https://issues.apache.org/jira/browse/HDFS-1308
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik


change HFTP init code that checks for existing delegation tokens
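For context, the token-service key an HFTP client derives from the URL is essentially a host:port string; a minimal sketch of that derivation is below. The helper name and default port are illustrative assumptions — the real code lives in HftpFileSystem and Hadoop's security utilities, and the job-conf key it feeds differs from this sketch.

```java
import java.net.URI;

public class HftpTokenService {
    // Derives a "host:port" token-service key from an HFTP URL, falling back
    // to a default port when the URL omits one. Illustrative only.
    static String buildService(URI uri, int defaultPort) {
        int port = (uri.getPort() == -1) ? defaultPort : uri.getPort();
        return uri.getHost() + ":" + port;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildService(new URI("hftp://nn.example.com:50470/user/data"), 50470));
        System.out.println(buildService(new URI("hftp://nn.example.com/user/data"), 50470));
    }
}
```

The bug being fixed is that the conf key built from this service name did not match the key under which the delegation token was stored, so the init-time lookup for an existing token missed.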




[jira] Updated: (HDFS-1308) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem (part of MR-1718)

2010-07-19 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1308:
-

Attachment: HDFS-1308.patch

>  job conf key for the services name of DelegationToken for HFTP url is 
> constructed incorrectly in HFTPFileSystem (part of MR-1718)
> --
>
> Key: HDFS-1308
> URL: https://issues.apache.org/jira/browse/HDFS-1308
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1308.patch
>
>
> change HFTP init code that checks for existing delegation tokens




[jira] Updated: (HDFS-1308) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem (part of MR-1718)

2010-07-19 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1308:
-

Status: Patch Available  (was: Open)

>  job conf key for the services name of DelegationToken for HFTP url is 
> constructed incorrectly in HFTPFileSystem (part of MR-1718)
> --
>
> Key: HDFS-1308
> URL: https://issues.apache.org/jira/browse/HDFS-1308
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1308.patch
>
>
> change HFTP init code that checks for existing delegation tokens




[jira] Commented: (HDFS-1081) Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

2010-07-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890084#action_12890084
 ] 

Owen O'Malley commented on HDFS-1081:
-

+1

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> 
>
> Key: HDFS-1081
> URL: https://issues.apache.org/jira/browse/HDFS-1081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: bm1081.scala, HADOOP-1081-Y20-1.patch, 
> HADOOP-1081-Y20-2.patch, HDFS-1081-trunk.patch, HDFS-1081-trunk.patch, 
> results.xlsx
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on in Y20. 
> This JIRA is for tracking and correcting it on both Y20 and trunk.




[jira] Resolved: (HDFS-1201) Support for using different Kerberos keys for Namenode and datanode.

2010-07-19 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das resolved HDFS-1201.
---

 Assignee: Kan Zhang  (was: Jitendra Nath Pandey)
Fix Version/s: 0.22.0
   Resolution: Fixed

I just committed this. Thanks, Kan & Jitendra!

>  Support for using different Kerberos keys for Namenode and datanode.
> -
>
> Key: HDFS-1201
> URL: https://issues.apache.org/jira/browse/HDFS-1201
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jitendra Nath Pandey
>Assignee: Kan Zhang
> Fix For: 0.22.0
>
> Attachments: h6632-06.patch
>
>
> This jira covers the HDFS changes needed to support different Kerberos keys 
> for Namenode and datanode. It corresponds to the changes in HADOOP-6632.




[jira] Commented: (HDFS-1081) Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

2010-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890122#action_12890122
 ] 

Hadoop QA commented on HDFS-1081:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449889/HDFS-1081-trunk.patch
  against trunk revision 965621.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/console

This message is automatically generated.

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> 
>
> Key: HDFS-1081
> URL: https://issues.apache.org/jira/browse/HDFS-1081
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: bm1081.scala, HADOOP-1081-Y20-1.patch, 
> HADOOP-1081-Y20-2.patch, HDFS-1081-trunk.patch, HDFS-1081-trunk.patch, 
> results.xlsx
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on in Y20. 
> This JIRA is for tracking and correcting it on both Y20 and trunk.




[jira] Commented: (HDFS-1307) Add start time, end time and total time taken for FSCK to FSCK report

2010-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890124#action_12890124
 ] 

Hadoop QA commented on HDFS-1307:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449874/HDFS-1307.3.patch
  against trunk revision 965621.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/218/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/218/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/218/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/218/console

This message is automatically generated.

> Add start time, end time and total time taken for FSCK to FSCK report
> -
>
> Key: HDFS-1307
> URL: https://issues.apache.org/jira/browse/HDFS-1307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node, tools
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.22.0
>
> Attachments: HDFS-1307.1.patch, HDFS-1307.2.patch, HDFS-1307.3.patch, 
> HDFS-1307.patch, HDFS-1307.y20.patch
>
>
> FSCK is a long-running operation that makes the namenode very busy while it 
> runs. Adding the start time, end time, and total time taken helps determine 
> when FSCK was run and what its impact on the Namenode was.




[jira] Updated: (HDFS-1053) A client side mount table to give per-application/per-job file system view

2010-07-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-1053:
---

Attachment: (was: ViewFs (Hadoop-common 0.22.0-SNAPSHOT API).pdf)

> A client side mount table to give per-application/per-job file system view
> --
>
> Key: HDFS-1053
> URL: https://issues.apache.org/jira/browse/HDFS-1053
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.22.0
>Reporter: Sanjay Radia
>Assignee: Sanjay Radia
> Fix For: 0.22.0
>
> Attachments: viewfs1.patch
>
>
> This jira proposes a client side mount table to allow application-centric (or 
> job-centric) filesystem views. 
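A client-side mount table of this kind would typically be driven by configuration; a hypothetical fragment is sketched below. The property-name scheme is an illustrative assumption — the key names ViewFs ultimately adopts may differ:

```xml
<!-- Illustrative only: maps client-visible paths onto backing file systems,
     so one job-centric namespace can span multiple namenodes. -->
<property>
  <name>fs.viewfs.mounttable.default.link./user</name>
  <value>hdfs://nn1.example.com:8020/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.default.link./tmp</name>
  <value>hdfs://nn2.example.com:8020/tmp</value>
</property>
```

With such a table, a client resolving /user/alice would transparently talk to nn1 while /tmp traffic goes to nn2, without either application code or the namenodes changing.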




[jira] Updated: (HDFS-1053) A client side mount table to give per-application/per-job file system view

2010-07-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-1053:
---

Attachment: ViewFsJavaDoc.pdf

> A client side mount table to give per-application/per-job file system view
> --
>
> Key: HDFS-1053
> URL: https://issues.apache.org/jira/browse/HDFS-1053
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.22.0
>Reporter: Sanjay Radia
>Assignee: Sanjay Radia
> Fix For: 0.22.0
>
> Attachments: viewfs1.patch, ViewFsJavaDoc.pdf
>
>
> This jira proposes a client side mount table to allow application-centric (or 
> job-centric) filesystem views. 




[jira] Updated: (HDFS-1053) A client side mount table to give per-application/per-job file system view

2010-07-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-1053:
---

Attachment: (was: ViewFsJavaDoc.pdf)

> A client side mount table to give per-application/per-job file system view
> --
>
> Key: HDFS-1053
> URL: https://issues.apache.org/jira/browse/HDFS-1053
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.22.0
>Reporter: Sanjay Radia
>Assignee: Sanjay Radia
> Fix For: 0.22.0
>
> Attachments: ViewFs (Hadoop-common 0.22.0-SNAPSHOT API).pdf, 
> viewfs1.patch
>
>
> This jira proposes a client side mount table to allow application-centric (or 
> job-centric) filesystem views. 




[jira] Updated: (HDFS-1053) A client side mount table to give per-application/per-job file system view

2010-07-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-1053:
---

Attachment: ViewFs (Hadoop-common 0.22.0-SNAPSHOT API).pdf

> A client side mount table to give per-application/per-job file system view
> --
>
> Key: HDFS-1053
> URL: https://issues.apache.org/jira/browse/HDFS-1053
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.22.0
>Reporter: Sanjay Radia
>Assignee: Sanjay Radia
> Fix For: 0.22.0
>
> Attachments: ViewFs (Hadoop-common 0.22.0-SNAPSHOT API).pdf, 
> viewfs1.patch
>
>
> This jira proposes a client side mount table to allow application-centric (or 
> job-centric) filesystem views. 




[jira] Commented: (HDFS-1085) hftp read failing silently

2010-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890130#action_12890130
 ] 

Hudson commented on HDFS-1085:
--

Integrated in Hadoop-Hdfs-trunk-Commit #346 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/346/])


> hftp read failing silently
> ---
>
> Key: HDFS-1085
> URL: https://issues.apache.org/jira/browse/HDFS-1085
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Reporter: Koji Noguchi
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.22.0
>
> Attachments: h1085_20100713.patch, h1085_20100716b_y0.20.1xx.patch, 
> h1085_20100716b_y0.20.1xx_with_test.patch, h1085_20100716c.patch, 
> h1085_20100716c_y0.20.1xx.patch, h1085_20100716d.patch, 
> h1085_20100716d_y0.20.1xx.patch, h1085_20100716d_y0.20.1xx_test.patch
>
>
> When performing a massive distcp through hftp, we saw many tasks fail with 
> {quote}
> 2010-04-06 17:56:43,005 INFO org.apache.hadoop.tools.DistCp: FAIL 
> 2010/0/part-00032 : java.io.IOException: File size not matched: copied 
> 193855488 bytes (184.9m) to tmpfile 
> (=hdfs://somehost.com:8020/somepath/part-00032)
> but expected 1710327403 bytes (1.6g) from 
> hftp://someotherhost/somepath/part-00032
> at 
> org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:435)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:543)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:310)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.Child.main(Child.java:159)
> {quote}
> This means that the read itself didn't fail, but the resulting file was 
> somehow smaller.




[jira] Commented: (HDFS-1201) Support for using different Kerberos keys for Namenode and datanode.

2010-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890131#action_12890131
 ] 

Hudson commented on HDFS-1201:
--

Integrated in Hadoop-Hdfs-trunk-Commit #346 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/346/])
HDFS-1201. The HDFS component for HADOOP-6632. Contributed by Kan Zhang & 
Jitendra Pandey.


>  Support for using different Kerberos keys for Namenode and datanode.
> -
>
> Key: HDFS-1201
> URL: https://issues.apache.org/jira/browse/HDFS-1201
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jitendra Nath Pandey
>Assignee: Kan Zhang
> Fix For: 0.22.0
>
> Attachments: h6632-06.patch
>
>
> This jira covers the HDFS changes needed to support different Kerberos keys 
> for Namenode and datanode. It corresponds to the changes in HADOOP-6632.




[jira] Commented: (HDFS-1308) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem (part of MR-1718)

2010-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890179#action_12890179
 ] 

Hadoop QA commented on HDFS-1308:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449898/HDFS-1308.patch
  against trunk revision 965697.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/440/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/440/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/440/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/440/console

This message is automatically generated.

>  job conf key for the services name of DelegationToken for HFTP url is 
> constructed incorrectly in HFTPFileSystem (part of MR-1718)
> --
>
> Key: HDFS-1308
> URL: https://issues.apache.org/jira/browse/HDFS-1308
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1308.patch
>
>
> change HFTP init code that checks for existing delegation tokens
