[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062253#comment-13062253
 ] 

Hudson commented on HDFS-941:
-

Integrated in Hadoop-Hdfs-22-branch #70 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-22-branch/70/])


 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Fix For: 0.22.0

 Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
 HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
 HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-07-01 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058240#comment-13058240
 ] 

Arun C Murthy commented on HDFS-941:


bq. TestDFSIO depends on MR scheduling

I've run into issues with this too - in the past, changes to the 
JT/JIP/scheduler would cause DFS I/O performance to 'regress'!

We need to re-work TestDFSIO. One way would be to do 'scheduling' in the 
input-format of the test, similar to what we did with TeraSort. Even better, 
stop using an MR job.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052160#comment-13052160
 ] 

Konstantin Shvachko commented on HDFS-941:
--

I ran some tests myself over the weekend. The results are good. I am getting 
throughput around 75-78 MB/sec on reads with a small (< 2) std. deviation in 
both cases.
So I am +1 now on this patch.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052166#comment-13052166
 ] 

Konstantin Shvachko commented on HDFS-941:
--

Answers to some issues raised here:

Stack: RM says what's in a release and no one else.

We can still talk about the technical merits of the implementation, can't we?

Todd: nrFiles = nrNodes means full locality, right?

No. In DFSIO there is no locality, since the files that DFSIO reads/writes are 
not the input of the MR job. Their names are. The reason here is to make sure 
the job completes in one wave of mappers, and to minimize contention on the 
drives between tasks.

I was trying to avoid making this issue yet another discussion about DFSIO, 
because the objective here is to verify that the patch does not introduce a 
regression in performance for sequential IOs. If the benchmark I proposed 
doesn't work for you guys, you can propose a different one.

Dhruba, Todd, Nicholas: TestDFSIO exhibits very high variance, and its results 
are dependent on mapreduce's scheduling.

DFSIO does not depend on MR scheduling. It depends on the OS memory cache. 
Cluster nodes these days run with 16 or 32 GB of RAM, so a 10GB file can be 
almost entirely cached by the OS. When you repeatedly run DFSIO you are not 
measuring cold IO, but RAM access and communication, and the high variation is 
explained by the fact that some data is cached and some is not.
For example, DFSIO -write is usually very stable with std. dev < 1, because it 
deals with cold writes.
For DFSIO -read you need to choose a file size larger than your RAM. With 
sequential reads the OS cache works as an LRU, so if your file is larger than 
RAM, the cache will have forgotten blocks from the head of the file by the time 
you get to reading the tail. And when you start reading the file again, the 
cache will release its oldest pages, which correspond to higher offsets in the 
file. So it is going to be a cold read.
I had to go to 100GB files, which brought std. dev to < 2, and the variation in 
throughput was around 3%.
Alternatively, you can clean the Linux cache on all DataNodes.
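A minimal sketch of that last step, assuming passwordless ssh (and sudo) to each 
DataNode and a hypothetical datanodes.txt host list; both are assumptions, not 
part of this issue:
{code}
# Rough sketch: drop the Linux page cache on every DataNode before a cold-read run.
# datanodes.txt is a hypothetical file listing DataNode hostnames, one per line.
for host in $(cat datanodes.txt); do
  ssh "$host" 'sync; sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"'
done
{code}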
 
Nicholas: it is hard to explain what the Throughput and Average IO rate 
really mean.

[This 
post|http://old.nabble.com/Re%3A-TestDFSIO-delivers-bad-values-of-%22throughput%22-and-%22average-IO-rate%22-p21322404.html]
 has the definitions.

Nicholas, I agree with you that the results you are posting don't make sense. 
The point, though, is not to screw up the benchmark, but to find the conditions 
under which it reliably measures what you need.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050944#comment-13050944
 ] 

Konstantin Shvachko commented on HDFS-941:
--

150 MB/sec throughput can happen if your data.dir is on a filer, which is your 
home directory or /tmp. This also explains the ridiculous standard deviation, 
because it competed with Nicholas running ant test in his home dir, which is on 
the same filer. Set data.dir to crawlspace3 and you will start getting 
reasonable numbers.
What is the cluster size?





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-17 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051007#comment-13051007
 ] 

Kihwal Lee commented on HDFS-941:
-

Filer was not used. Cluster has 5 DNs with a separate NN. 





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051048#comment-13051048
 ] 

Hudson commented on HDFS-941:
-

Integrated in Hadoop-Hdfs-trunk #699 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/699/])






[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-17 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051216#comment-13051216
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-941:
-

Hi Konstantin, don't you agree that the reported numbers for Throughput and 
Average IO rate do not make much sense?  At least the definitions of these 
two numbers are not clear.  Recall that we got 1 GB/sec in the past.

{noformat}
10/08/07 00:19:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
10/08/07 00:19:55 INFO fs.TestDFSIO:            Date & time: Sat Aug 07 00:19:55 UTC 2010
10/08/07 00:19:55 INFO fs.TestDFSIO:        Number of files: 2
10/08/07 00:19:55 INFO fs.TestDFSIO: Total MBytes processed: 2048
10/08/07 00:19:55 INFO fs.TestDFSIO:      Throughput mb/sec: 1096.3597430406853
10/08/07 00:19:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 1143.6881103515625
10/08/07 00:19:55 INFO fs.TestDFSIO:  IO rate std deviation: 232.655606509863
10/08/07 00:19:55 INFO fs.TestDFSIO:     Test exec time sec: 28.354
{noformat}





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050609#comment-13050609
 ] 

Konstantin Shvachko commented on HDFS-941:
--

bq. I think it is reasonable to run tests against the latest patch and make 
sure there is no regression in performance.

This is exactly what I asked, that is, to run DFSIO on a 5-node cluster with 
and without the *new* patch.
Here is the command I propose to run for 5 nodes (should have nrFiles = 
nrNodes).
{code}
TestDFSIO -read -fileSize 10GB -nrFiles 5
{code}
You can run -write first to generate the data.
I think this will be representative enough.
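A rough sketch of the full write-then-read sequence, assuming the benchmark is 
launched through the usual hadoop jar entry point (the test jar name is a 
placeholder and varies by build):
{code}
# Hypothetical invocation; hadoop-*-test.jar stands in for the build's test jar.
# The write pass generates the 5 x 10GB files, the read pass is the measurement.
hadoop jar hadoop-*-test.jar TestDFSIO -write -fileSize 10GB -nrFiles 5
hadoop jar hadoop-*-test.jar TestDFSIO -read -fileSize 10GB -nrFiles 5
{code}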

bq. it failed to reject the null hypothesis

Great analysis, Todd, I am truly impressed. Does everything run on one node? Is 
there any inter-DN communication then? Also, with a 128 MB file everything is 
in RAM, not sure what it measures.

bq. Uncommitting now may do more harm than good.

If my concerns can be addressed without uncommitting, then I can hold on to 
that. Please confirm somebody is doing it.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050621#comment-13050621
 ] 

Todd Lipcon commented on HDFS-941:
--

Konstantin:
Yes, everything runs on one node. It tests the localhost path, which is 
identical to what would be tested with your proposed benchmark (nrFiles = 
nrNodes means full locality, right?).

bq. Also with 128 MB file everything is in RAM, not sure what it measures

It measures the overhead of DFS rather than the cost of IO. Having it *not* be 
in RAM makes for a worse test since differences in CPU overhead are lost in the 
noise of the slow disks.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-16 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050622#comment-13050622
 ] 

Kihwal Lee commented on HDFS-941:
-

bq. Also with 128 MB file everything is in RAM, not sure what it measures.

If cold reads are performed, the disk I/O will be the bottleneck, and that can 
bury whatever overhead the patch might have introduced in the connection 
handling under noise. Since the patch didn't change the rest of the serving 
code, the ideal way of measuring its overhead would be having the DN do 
something like null ops. That effectively puts a magnifying glass on the 
area where the change has been made.  In a normal setup, the next best thing is 
probably what Todd did.

In any case, I will run DFSIO as you suggested.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050623#comment-13050623
 ] 

Todd Lipcon commented on HDFS-941:
--

Another thing to note is that TestDFSIO itself is a pretty flawed test. It 
exhibits very high variance, and its results are very much dependent on 
mapreduce's scheduling. For example, dropping the MR heartbeat interval from 3 
seconds to 0.3 seconds improved "DFS IO performance" by nearly 2x in some tests 
I ran a few months ago.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-16 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050726#comment-13050726
 ] 

dhruba borthakur commented on HDFS-941:
---

My experience with TestDFSIO has been that the variance of its results is high 
(especially due to map-reduce software scheduling), and it could never capture 
(at least, for me) small differences in the performance of DFS.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-16 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050798#comment-13050798
 ] 

Kihwal Lee commented on HDFS-941:
-

The following are 4 consecutive samples taken from the middle of a larger set. 
This is the test Konstantin suggested.  The std dev of the I/O rate seems too 
high, and so does the variation in run times.  This is probably not the best 
way to measure small performance differences, as others have pointed out.

{noformat} 

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 20:29:41 UTC 2011
   Number of files: 5
Total MBytes processed: 51200.0
 Throughput mb/sec: 100.75824515346937
Average IO rate mb/sec: 136.13864135742188
 IO rate std deviation: 92.17360497645333
Test exec time sec: 179.953

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 20:31:23 UTC 2011
   Number of files: 5
Total MBytes processed: 51200.0
 Throughput mb/sec: 150.92337396277026
Average IO rate mb/sec: 197.9733428955078
 IO rate std deviation: 106.59864139156599
Test exec time sec: 99.805

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 20:33:20 UTC 2011
   Number of files: 5
Total MBytes processed: 51200.0
 Throughput mb/sec: 115.66831207852795
Average IO rate mb/sec: 145.11795043945312
 IO rate std deviation: 90.42587602009961
Test exec time sec: 115.77

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 20:36:31 UTC 2011
   Number of files: 5
Total MBytes processed: 51200.0
 Throughput mb/sec: 91.04763462868748
Average IO rate mb/sec: 127.12406921386719
 IO rate std deviation: 97.86844611649816
Test exec time sec: 189.954
{noformat} 

I ran shorter (64KB) reads so that the variance is smaller and the proportion 
of overhead is larger. For larger reads, the overhead becomes less noticeable.

{noformat}
=== BEFORE ===

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:00:03 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 24.328426438934947
Average IO rate mb/sec: 24.558759689331055
 IO rate std deviation: 2.474296728169802
Test exec time sec: 8.444

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:00:13 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 23.374370500153187
Average IO rate mb/sec: 23.41034698486328
 IO rate std deviation: 0.9176091691810716
Test exec time sec: 8.41

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:00:23 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 24.83526865641276
Average IO rate mb/sec: 24.873613357543945
 IO rate std deviation: 0.9842580011607321
Test exec time sec: 8.424

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:00:33 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 24.57923495892397
Average IO rate mb/sec: 24.62860679626465
 IO rate std deviation: 1.1144092332035256
Test exec time sec: 8.41


=== AFTER ===

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:07:34 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 23.961666241363066
Average IO rate mb/sec: 23.970088958740234
 IO rate std deviation: 0.4478642432612885
Test exec time sec: 8.378

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:07:44 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 24.57923495892397
Average IO rate mb/sec: 24.58832550048828
 IO rate std deviation: 0.4712211529700926
Test exec time sec: 8.394

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:07:53 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 22.92486337515024
Average IO rate mb/sec: 22.95939064025879
 IO rate std deviation: 0.8841870285378609
Test exec time sec: 8.388

----- TestDFSIO ----- : read
   Date & time: Thu Jun 16 23:08:03 UTC 2011
   Number of files: 5
Total MBytes processed: 4.7683716
 Throughput mb/sec: 24.204931888483504
Average IO rate mb/sec: 24.234447479248047
 IO rate std deviation: 0.8576845331358649
Test exec time sec: 8.382

{noformat}

I didn't try to do any statistical analysis on it.  If somebody wishes to, I 
can provide a larger set of data.


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050845#comment-13050845
 ] 

Konstantin Shvachko commented on HDFS-941:
--

Kihwal, thanks for doing this.
For the first set of results, is it with or without the patch? Should there be 
BEFORE and AFTER sections, as in the second set?





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049863#comment-13049863
 ] 

stack commented on HDFS-941:


I reran tests, same three failed.  I backed out my patch and the same three 
failed.  So, this patch does not seem to be responsible for these test failures 
on my machine.

I'm +1 on commit.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050181#comment-13050181
 ] 

Konstantin Shvachko commented on HDFS-941:
--

-1 on committing this without proof of no-degradation to sequential IOs.
I should have done it before, but I thought my message was clear.
Let me know if you want me to uncommit before benchmarks are provided.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050197#comment-13050197
 ] 

stack commented on HDFS-941:


@Konstantin

Convention is that the RM says what's in a release and no one else.  See his +1 
above.

bq. ...proof of no-degradation to sequential ios.

What would this test look like?  Perf tests done above showed only minor 
differences (...well within the standard deviation, as per Todd).

And if this patch can only be committed pending perf evaluation, why single 
this patch out and not require it of all commits to HDFS?





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050219#comment-13050219
 ] 

Kihwal Lee commented on HDFS-941:
-

Perhaps it's confusing because this jira is seen as random vs. sequential 
reads. But in fact this jira is really about improving short reads, and the 
solution is to reduce the overhead of connection setup, which is present in 
both short and long reads. It is by no means favoring random or short reads. In 
fact, if the client does typical sequential reads multiple times from the same 
DN, this patch will help them too. The gain will be bigger if the files are 
smaller.

Sure, there is a one-time overhead of a cache lookup (cache size: 16), but this 
can be ignored when the read size is sufficiently big. This cache management 
overhead should show up, in theory, for very small, cold (connection-wise) 
accesses. So far I have only seen gains. There might be some special corner 
cases where this patch actually makes reads slower, but again I don't believe 
they are typical use cases.

Having said that, I think it is reasonable to run tests against the latest 
patch and make sure there is no regression in performance. Uncommitting now may 
do more harm than good. Let's see the numbers first and decide what to do.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050231#comment-13050231
 ] 

Todd Lipcon commented on HDFS-941:
--

Ran the following benchmark to compare 0.22 before vs. after the application of 
HDFS-941:
- inserted a 128M file into HDFS
- read it 50 times using hadoop fs -cat /file > /dev/null and the unix time 
utility (a rough sketch of this loop follows the list)
- recompiled with the patch reverted, restarted NN/DN
- ran same test
- recompiled with the patch included, restarted NN/DN
- ran same test
- recompiled with patch reverted
- ran same test
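A minimal sketch of the timed-read loop described above; /file and 
read-times.log are placeholder names:
{code}
# Rough sketch: 50 timed client reads of the same 128M file.
# The shell's time builtin writes to stderr, which is appended to the log
# for the later t-test.
for i in $(seq 1 50); do
  { time hadoop fs -cat /file > /dev/null ; } 2>> read-times.log
done
{code}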

This resulted in 100 samples for each setup, 50 from each run. The following is 
the output of a t-test for the important variables:


> t.test(d.22$wall, d.22.with.941$wall)

Welch Two Sample t-test

data:  d.22$wall and d.22.with.941$wall 
t = -0.4932, df = 174.594, p-value = 0.6225
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.011002972  0.006602972 
sample estimates:
mean of x mean of y 
   1.1937    1.1959 

> t.test(d.22$user, d.22.with.941$user)

Welch Two Sample t-test

data:  d.22$user and d.22.with.941$user 
t = -1.5212, df = 197.463, p-value = 0.1298
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.032378364  0.004178364 
sample estimates:
mean of x mean of y 
   1.3335    1.3476 

That is to say, it failed to reject the null hypothesis... in less stat-heavy 
terms, there's no statistical evidence that this patch makes the test any 
slower.
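For anyone repeating this, a Welch t-test like the one above can be run 
straight from the shell once the timings are collected into two plain-text 
files (the file names below are placeholders):
{code}
# Hypothetical: wall_22.txt and wall_22_941.txt each hold one wall-clock time per line.
# R's t.test() defaults to the Welch two-sample test used above.
Rscript -e 'a <- scan("wall_22.txt"); b <- scan("wall_22_941.txt"); print(t.test(a, b))'
{code}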





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049278#comment-13049278
 ] 

Todd Lipcon commented on HDFS-941:
--

Looks good to me. How'd the test run go?





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049456#comment-13049456
 ] 

stack commented on HDFS-941:


Bit odd.  These failed when I ran all tests:

{code}

[junit] Running org.apache.hadoop.hdfs.TestFileAppend4
[junit] Tests run: 2, Failures: 0, Errors: 2, Time elapsed: 60.251 sec

[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 104.115 sec
[junit] Test org.apache.hadoop.hdfs.TestLargeBlock FAILED


[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.022 sec
[junit] Test org.apache.hadoop.hdfs.TestWriteConfigurationToDFS FAILED

{code}

I reran them all, and only TestLargeBlock fails when I run the tests 
individually.  If I back out the patch, TestLargeBlock still fails against a 
clean 0.22 checkout.

Commit, I'd say?






[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049457#comment-13049457
 ] 

stack commented on HDFS-941:


Or, hang on...(240 minutes) and let me rerun these tests and see if 
TestFileAppend4 and/or TestWriteConfigurationToDFS fail again.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-14 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049500#comment-13049500
 ] 

Konstantin Shvachko commented on HDFS-941:
--

Could anybody please run DFSIO to make sure there is no degradation in 
sequential IOs?





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049515#comment-13049515
 ] 

Todd Lipcon commented on HDFS-941:
--

Cos: do you have any reason to believe there would be? I believe in 
benchmarking, but unless there's some reasoning behind the idea, it can take a 
lot of time that's better spent in other places (e.g. optimizing sequential IO 
:)).

If I recall correctly, early versions of this patch were indeed benchmarked for 
sequential IO, where we saw no difference.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049526#comment-13049526
 ] 

Todd Lipcon commented on HDFS-941:
--

oops, sorry Konstantin - didn't mean to call you Cos. But my comment stands :)





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-14 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049537#comment-13049537
 ] 

Konstantin Shvachko commented on HDFS-941:
--

Yes, in the previous 
[comment|https://issues.apache.org/jira/browse/HDFS-941?focusedCommentId=12862854page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12862854]
 there was some degradation in throughput for sequential IO. I just want to 
make sure there is no degradation for the primary use case with this patch.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048579#comment-13048579
 ] 

Kihwal Lee commented on HDFS-941:
-

HDFS-2071 was filed. 





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048688#comment-13048688
 ] 

Todd Lipcon commented on HDFS-941:
--

Hey Stack. I just looked over your patch for 0.22. The only thing I noticed is 
that it no longer calls verifiedByClient() -- this is a change that happened 
in trunk with HDFS-1655. Are we OK with removing this from 0.22?





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048721#comment-13048721
 ] 

stack commented on HDFS-941:


I should put it back.  Give me a sec...



 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048837#comment-13048837
 ] 

Todd Lipcon commented on HDFS-941:
--

Hey Stack. I still don't think this is quite right -- it will now call 
verifiedByClient() if the client read the entire byterange, even if the 
byterange didn't cover the whole block. I think we need {{if 
(datanode.blockScanner != null && blockSender.isBlockReadFully())}}. Also, can 
you add back TestDataXceiver? I think that test case would catch this bug.
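
Roughly, the guard I have in mind (a sketch using the names already in the patch, 
not the exact diff):

{code}
// Sketch only: call verifiedByClient() when the client acked the checksum
// AND the whole block (not just the requested byte range) was read.
if (stat == CHECKSUM_OK) {
  if (datanode.blockScanner != null && blockSender.isBlockReadFully()) {
    datanode.blockScanner.verifiedByClient(block);
  }
}
{code}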

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048875#comment-13048875
 ] 

Todd Lipcon commented on HDFS-941:
--

Yea, I think we should add back the blockReadFully variable (in addition to 
keeping the new sentEntireByteRange variable and its getter).

Looks like there's a new getFileBlocks() method which can be used after 
writeFile() to get the block locations, so we can keep that test around?

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048979#comment-13048979
 ] 

stack commented on HDFS-941:


I put back TestDataXceiver.  It does this:

{code}
-List<LocatedBlock> blkList = util.writeFile(TEST_FILE, FILE_SIZE_K);
+// Create file.
+util.writeFile(TEST_FILE, FILE_SIZE_K);
+// Now get its blocks.
+List<LocatedBlock> blkList = util.getFileBlocks(TEST_FILE, FILE_SIZE_K);
{code}


rather than change the writeFile signature (writeFile is used in a few other 
places so the change would ripple).


I also added back BlockSender.isBlockReadFully so the checks before we call 
verifiedByClient are as they were before this patch was applied:

{code}
-    if (DataTransferProtocol.Status.read(in) == CHECKSUM_OK) {
-      if (blockSender.isBlockReadFully() && datanode.blockScanner != null) {
-        datanode.blockScanner.verifiedByClient(block);
+      if (blockSender.didSendEntireByteRange()) {
+        // If we sent the entire range, then we should expect the client
+        // to respond with a Status enum.
+        try {
+          DataTransferProtocol.Status stat =
+              DataTransferProtocol.Status.read(in);
+          if (stat == null) {
+            LOG.warn("Client " + s.getInetAddress() + "did not send a valid status " +
+                "code after reading. Will close connection.");
+            IOUtils.closeStream(out);
+          } else if (stat == CHECKSUM_OK) {
+            if (blockSender.isBlockReadFully() && datanode.blockScanner != null) {
+              datanode.blockScanner.verifiedByClient(block);
+            }
          }
{code}

I ran the bundled tests and they pass.  Am currently running all.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
 HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
 HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-10 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047039#comment-13047039
 ] 

Nigel Daley commented on HDFS-941:
--

+1 for 0.22.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047051#comment-13047051
 ] 

Todd Lipcon commented on HDFS-941:
--

Cool, I will review and check in Stack's backport tomorrow.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-10 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047763#comment-13047763
 ] 

Kihwal Lee commented on HDFS-941:
-

One thing I noticed is that Socket.isConnected() cannot be used to check the 
connection status in this case. It returns false until the connection is made 
and then stays true after that; it will never return false once the initial 
connection has been successfully made. Socket.isClosed() or SocketChannel.isOpen() 
should be used instead, assuming someone handles the SocketException and calls 
Socket.close() or SocketChannel.close(). It seems the op handlers in 
DataXceiver are diligently using IOUtils.closeStream(), which will invoke 
SocketChannel.close().
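
To illustrate the point, a standalone java.net snippet (not HDFS code; the local 
listener is only there so the connect succeeds):

{code}
import java.net.ServerSocket;
import java.net.Socket;

public class IsConnectedDemo {
  public static void main(String[] args) throws Exception {
    ServerSocket server = new ServerSocket(0);               // throwaway local listener
    Socket s = new Socket("localhost", server.getLocalPort());
    System.out.println(s.isConnected());  // true once the connect succeeds
    System.out.println(s.isClosed());     // false while open
    s.close();
    System.out.println(s.isConnected());  // still true -- it never flips back
    System.out.println(s.isClosed());     // true -- this is the flag to test
    server.close();
  }
}
{code}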


{code}
- } while (s.isConnected() && socketKeepaliveTimeout > 0);
+ } while (s.isConnected() && !s.isClosed() && socketKeepaliveTimeout > 0);
{code}


Sorry for spotting this late. I just realized it while looking at HDFS-2054.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047784#comment-13047784
 ] 

Todd Lipcon commented on HDFS-941:
--

Hey Kihwal. Nice find. Mind filing a new JIRA for this? I think it should be a 
minor thing, since the next time around the loop it will just hit the IOE trying 
to read the next operation anyway, right?

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046375#comment-13046375
 ] 

Todd Lipcon commented on HDFS-941:
--

Stack seems to have turned up some kind of out-of-sync issue between client 
and server, where the client tries to do another request while the server is 
still expecting a status message. So, no commit tomorrow :(
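
For anyone trying to follow the failure mode: the intended handshake is that the 
client, after consuming the entire requested byte range, writes a Status back on 
the same socket, and the DN reads that status before looping for the next op. The 
out-of-sync case is the client issuing its next request while the DN is still 
waiting for that status. A very rough sketch of the intent, with made-up helper 
names (sendStatus, socketCache.put) -- the real logic lives in BlockReader and 
DataXceiver:

{code}
// Client side, after reading the last packet of the requested byte range:
sendStatus(out, DataTransferProtocol.Status.CHECKSUM_OK);  // stream is in a clean state
socketCache.put(dnAddr, sock);                              // only now is reuse safe

// Server side (DataXceiver), after the block has been sent:
if (blockSender.didSendEntireByteRange()) {
  DataTransferProtocol.Status stat = DataTransferProtocol.Status.read(in);
  // ... then loop back and wait for the next op on the same socket
} else {
  IOUtils.closeStream(out);  // stream not in a well-defined state; do not reuse
}
{code}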

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046374#comment-13046374
 ] 

stack commented on HDFS-941:


Dang.  Did more testing (w/ Todd's help).  I backported his patch to 0.22 so I 
could run my loadings.  I see this every so often in dn logs 'Got error for 
OP_READ_BLOCK' (perhaps once every ten minutes per server).  The other side of 
the connection will print 'Client /10.4.9.34did not send a valid status code 
after reading. Will close connection' (I'll see this latter message much more 
frequently than the former but it seems fine -- we are just closing the 
connection and moving on w/ no repercussions client-side).

Here is more context.

In the datanode log (Look for 'Client /10.4.9.34did not...'): 

{code}
2011-06-08 23:39:45,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-1043418802690508828_7206 of size 16207176 from /10.4.9.34:57333
2011-06-08 23:39:45,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_-1043418802690508828_7206 terminating
2011-06-08 23:39:45,960 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_5716868613634466961_7207 src: /10.4.14.34:39560 dest: /10.4.9.34:10010
2011-06-08 23:39:46,301 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_5716868613634466961_7207 of size 29893370 from /10.4.14.34:39560
2011-06-08 23:39:46,301 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_5716868613634466961_7207 terminating
2011-06-08 23:39:46,326 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-7242346463849737969_7208 src: /10.4.14.34:39564 dest: /10.4.9.34:10010
2011-06-08 23:39:46,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:46,435 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:46,435 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:46,435 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:47,837 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-7242346463849737969_7208 of size 67108864 from /10.4.14.34:39564
2011-06-08 23:39:47,837 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-7242346463849737969_7208 terminating
2011-06-08 23:39:47,855 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_7820819556875770048_7208 src: /10.4.14.34:39596 dest: /10.4.9.34:10010
2011-06-08 23:39:49,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_7820819556875770048_7208 of size 67108864 from /10.4.14.34:39596
2011-06-08 23:39:49,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_7820819556875770048_7208 terminating
{code}

In the regionserver log (the client):


{code}
2011-06-08 23:39:45,777 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 4 file(s) in values of usertable,user617882364,1307559813504.e4a9ed69f909762ddba8027cb6438575.; new storefile name=hdfs://sv4borg227:1/hbase/usertable/e4a9ed69f909762ddba8027cb6438575/values/6552772398789018757, size=143.5m; total size for store is 488.4m
2011-06-08 23:39:45,777 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=usertable,user617882364,1307559813504.e4a9ed69f909762ddba8027cb6438575., storeName=values, fileCount=4, fileSize=175.5m, priority=2, date=Wed Jun 08 23:39:41 PDT 2011; duration=3sec
2011-06-08 23:39:45,777 DEBUG org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: CompactSplitThread Status: compaction_queue=(0:0), split_queue=0
2011-06-08 23:39:46,436 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.4.9.34:10010 for file /hbase/usertable/e4a9ed69f909762ddba8027cb6438575/values/5422279471660943029 for block blk_1325488162553537841_6905:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.4.9.34:57345, remote=/10.4.9.34:10010, for file /hbase/usertable/e4a9ed69f909762ddba8027cb6438575/values/5422279471660943029, for block 1325488162553537841_6905
        at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:437)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:727)
        at 

[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046376#comment-13046376
 ] 

Kihwal Lee commented on HDFS-941:
-

My test is still running on trunk, but so far I only see "did not send a valid 
status code after reading. Will close connection" on special occasions. In my 
case it's during task init (the random readers are map tasks in my test), with the 
number of messages exactly matching the number of tasks running on the DN. 
Afterwards I don't see them.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046384#comment-13046384
 ] 

stack commented on HDFS-941:


@Kihwal You are on TRUNK and not 0.22? (I wonder if my backport messed up 
something -- Todd doesn't think so but...)

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046383#comment-13046383
 ] 

stack commented on HDFS-941:


@Kihwal Are you doing any writing at the same time? (I was).

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046387#comment-13046387
 ] 

Kihwal Lee commented on HDFS-941:
-

It's read-only and yes it's against TRUNK. I put 200 X 170MB files across 8 
DNs, dfs.replication=1. There are 200 random readers who are randomly reading 
from all 200 files. The locality was intentionally reduced to test the socket 
caching.  I will try a R/W test once this one is done.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046422#comment-13046422
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481896/hdfs-941.txt
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/748//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/748//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/748//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046526#comment-13046526
 ] 

Kihwal Lee commented on HDFS-941:
-

Good catch and fix!  I took a close look at the open connections each reader has 
and sometimes saw more than one connection to the same DN. I will see if that is 
fixed with Todd's fix. Otherwise I will look further to determine if it is 
an issue.

The test I did was primarily for exercising the socket cache itself.  To make 
it more interesting, the socket cache size was lowered to 3 and dfs.replication 
to 1.  I used the random read test (work in progress) in HDFS-236 on a cluster 
with 8 data nodes.  200 X 170MB files were created.  200 readers (25 on each 
DN) read 200 files randomly 64K at a time, jumping among files, for about 6 
hours last night. Each reader keeps a cached DFSInputStream to all 200 files during 
its lifetime. I checked the client/server logs afterward.

** I saw 25 of the "did not send a valid status code after reading. Will close 
connection" warnings around task initialization (the readers are map tasks) 
on each data node. They all look local, so they are likely accessing the job 
conf/jar files that are replicated and available on all eight data nodes, 
unlike regular data files, or accessing the local DN for some other reason during 
this time period. Need to check whether this needs to be fixed. 
 
** While running, there were 3 ESTABLISHED connections per process and some 
number of sockets in TIME_WAIT at all times. It means the socket cache is not 
leaking anything, clients are not denied new connections, and eviction is 
working.

** The only thing I find a bit odd is the symptom I mentioned above: duplicate 
connections in the socket cache. I will try to reproduce it with Todd's latest fix.


 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046583#comment-13046583
 ] 

Kihwal Lee commented on HDFS-941:
-

Regarding duplicate connections, it makes sense because the input stream cache 
is per file, and it is quite possible that a client reads blocks belonging to 
two files that are on the same DN within the window of 3 reads.

I will look at the one happening during task initialization. Maybe those clients 
just stop reading in the middle of the stream by design. Since one message will 
show up for every new map task, how about changing the message to DEBUG after we 
are done with testing?
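
E.g., something along these lines once testing is done (sketch only; the message 
text is the one from the current patch):

{code}
// Sketch: demote the per-connection notice to DEBUG after the testing phase.
if (LOG.isDebugEnabled()) {
  LOG.debug("Client " + s.getInetAddress() + " did not send a valid status "
      + "code after reading. Will close connection.");
}
{code}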

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046643#comment-13046643
 ] 

Kihwal Lee commented on HDFS-941:
-

I am retesting with Todd's patch and I don't see the messages anymore. Instead, 
I see more of the BlockSender.sendChunks() exception "java.io.IOException: Broken 
pipe" from DNs. 

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046650#comment-13046650
 ] 

stack commented on HDFS-941:


@Kihwal I see lots of those sendChunks exceptions too but I don't think they are 
related.  Testing the latest addition to the patch...

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046661#comment-13046661
 ] 

Kihwal Lee commented on HDFS-941:
-

OK, I see it's from BlockSender.java:407. It really shouldn't say ERROR since 
clients can close connections at any time, but I agree that this needs to be 
addressed as separate work.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046700#comment-13046700
 ] 

stack commented on HDFS-941:


+1 on commit for latest version of patch.

I've been running over the last few hours.  I no longer see "Client 
/10.4.9.34did not send a valid status code after reading" (fix the space on 
commit) nor do I see the "Got error for OP_READ_BLOCK" exceptions.  I still have 
the BlockSender.sendChunks exceptions but they are something else (that we need 
to fix).

Nice test you have over there Kihwal!

My test was a 5-node cluster running hbase on a 451-patched 0.22.  The loading 
was random reads running in MR and then another random-read test being done via 
a bunch of clients.  The cache was disabled, so we went to the FS for all data.  
I also had random writing going on concurrently.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046706#comment-13046706
 ] 

Todd Lipcon commented on HDFS-941:
--

Regarding duplicate connections: also keep in mind that the caching only 
applies at the read side. So, assuming there's some output as well, there will 
be a socket for each of those streams.

I agree we should fix the sendChunks error messages separately. I think JD 
might have filed a JIRA about this a few weeks ago. I'll see if I can dig it up.

Kihwal: are you +1 on commit now as well?

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046716#comment-13046716
 ] 

Kihwal Lee commented on HDFS-941:
-

They were pure readers and didn't write/report anything until the end.  I just 
filed HDFS-2054 for the error message. If you find the other JIRA that was 
already filed, please dupe one to the other.

+1 for commit.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046722#comment-13046722
 ] 

Todd Lipcon commented on HDFS-941:
--

Committed to trunk.

I'm 50/50 on whether this should go into the 0.22 branch as well. Like Stack 
said, it's a nice carrot to help convince HBase users to try out 0.22. But, 
it's purely an optimization and on the riskier side as far as these things go. 
I guess I'll ping Nigel?

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046724#comment-13046724
 ] 

Todd Lipcon commented on HDFS-941:
--

Also, big thanks to: bc for authoring the majority of the patch and test cases, 
Sam Rash for reviews, and Stack and Kihwal for both code review and cluster 
testing. Great team effort spanning 4 companies!

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046738#comment-13046738
 ] 

stack commented on HDFS-941:


Todd, I'll buy you a beer to go 51/49 in favor of 0.22 commit.  If Nigel wants 
me to make a case, I could do it here or in another issue?


 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, HDFS-941-1.patch, HDFS-941-2.patch, 
 HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, 
 HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046739#comment-13046739
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481962/941.22.txt
  against trunk revision 1134031.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/754//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, HDFS-941-1.patch, HDFS-941-2.patch, 
 HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, 
 HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046744#comment-13046744
 ] 

Eli Collins commented on HDFS-941:
--

Make that two beers (52/48?). I reviewed an earlier version of this patch but 
if Nigel is game I think it's suitable for 22 as well.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046756#comment-13046756
 ] 

stack commented on HDFS-941:


Yeah, my 0.22 version fails against trunk (trunk already has guava, etc.)

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046755#comment-13046755
 ] 

stack commented on HDFS-941:


So, that would leave 48 beers that I need to buy (And Nigel probably wants two) 
-- I can get a keg?

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
 HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
 HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046148#comment-13046148
 ] 

stack commented on HDFS-941:


+1 on commit.  Have run this patch first with a light random-read loading 
overnight and then over this morning with a 'heavy' random read + write loading 
on a 5-node cluster.  Discernible perf improvement (caching is involved so it's 
hard to say for sure, but I see a 20% improvement for pure random reads).

@Kihwal Fair enough.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046151#comment-13046151
 ] 

stack commented on HDFS-941:


Oh, just to say that I don't see hdfs-level complaints on the server or client 
side, and that I tested on patched 0.22 hadoop.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046155#comment-13046155
 ] 

stack commented on HDFS-941:


This patch should be applied to hadoop 0.22.  It'd be an incentive for hbase 
users to upgrade to hadoop 0.22.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046159#comment-13046159
 ] 

Todd Lipcon commented on HDFS-941:
--

Stack, thanks a million for the cluster testing and review!! I will get to your 
review feedback later this afternoon and post a final patch.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046162#comment-13046162
 ] 

stack commented on HDFS-941:


On occasion I see these new additions to the datanode log:

{code}
2011-06-08 12:37:20,478 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
Client did not send a valid status code after reading. Will close connection.
2011-06-08 12:37:20,480 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
Client did not send a valid status code after reading. Will close connection.
2011-06-08 12:37:20,482 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
Client did not send a valid status code after reading. Will close connection.
2011-06-08 12:37:20,483 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
Client did not send a valid status code after reading. Will close connection.
{code}

Should these be logged as DEBUG and not ERROR?

I see this too; I don't think it's related:

{code}
2011-06-08 12:40:09,642 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_-2049668997072761677_6556 src: /10.4.9.34:36343 dest: 
/10.4.9.34:10010
2011-06-08 12:40:09,661 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
BlockSender.sendChunks() exception: java.io.IOException: Connection reset by 
peer
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:204)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:392)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:481)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.opReadBlock(DataXceiver.java:237)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opReadBlock(DataTransferProtocol.java:356)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:328)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:169)
at java.lang.Thread.run(Thread.java:662)
{code}

The odd thing is that this is a machine talking to itself.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046175#comment-13046175
 ] 

Todd Lipcon commented on HDFS-941:
--

Hey Stack, are you sure you got the latest patch applied? The "did not send a 
valid status code" message was changed to a WARN in the latest patch, and I also 
addressed a bug that caused it to happen more often than it used to.

I agree that the warning in sendChunks is unrelated - I've seen that in trunk 
for a while before this patch.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046182#comment-13046182
 ] 

stack commented on HDFS-941:


OK. Looks like I was running the just-previous version.  Let me redo the 
loadings.  On the IOE in sendChunks, this is in 0.22.  Should I file an issue for it?

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046183#comment-13046183
 ] 

stack commented on HDFS-941:


Or, hang on, let me first verify that 0.22 shows this without the 941 patch.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046212#comment-13046212
 ] 

Kihwal Lee commented on HDFS-941:
-

I will try putting some load in a cluster with this patch + trunk.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046234#comment-13046234
 ] 

stack commented on HDFS-941:


New patch looks good (nice comment on why NODELAY).  Let me test it.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046311#comment-13046311
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481868/hdfs-941.txt
  against trunk revision 1133476.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/745//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/745//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/745//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-08 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046330#comment-13046330
 ] 

Todd Lipcon commented on HDFS-941:
--

I'd like to commit this tomorrow so long as Stack and Kihwal's testing works 
out. Woo! :)

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, 
 hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045432#comment-13045432
 ] 

Kihwal Lee commented on HDFS-941:
-

 You think 16 a good number for the socket cache (doesn't seem easily 
 changeable)?

If the client's working set of datanodes over the past several seconds is 
bigger than the cache, it means lower locality. If a lot of clients are doing 
that, each data node is likely to see less data locality, making the page cache 
less effective. This can make more reads cold, and the gain from caching 
connections will start to diminish. Is 16 a good number? IMO, it may actually 
be too big for typical use cases, but it is small enough to not cause trouble.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045033#comment-13045033
 ] 

Hadoop QA commented on HDFS-941:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481594/hdfs-941.txt
  against trunk revision 1132698.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/720//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/720//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/720//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045278#comment-13045278
 ] 

stack commented on HDFS-941:


I took a look at the patch.  It looks good to me; minor comments below.  Meantime 
I've patched it into a hadoop 0.22 build and am running a load on it overnight to 
see if I can find problems.

What is this about?

+<dependency org="com.google.collections" name="google-collections" 
rev="${google-collections.version}" conf="common->default"/>

When I go to the google-collections home page it says:

{code}
This library was renamed to Guava!
What you see here is ancient and unmaintained. Do not use it.
{code}

Nice doc. changes in BlockReader.

If you make another version of this patch, change the mentions of getEOS in 
comments to be 'eos' to match the change of variable name.

When you create a socket inside getBlockReader, you've added this:

{code}
+sock.setTcpNoDelay(true);
{code}

to the socket config before connect.  Is that intentional?  (This is new with 
this patch. Also, the old code used to set the timeout after making the 
connection -- which seems off... in your patch you set the timeout and then connect.)
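
For reference, a minimal sketch of that ordering (configure first, then 
connect); the names here are placeholders, not the patch's actual code:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class SocketSetupSketch {
  // Illustrative only: Nagle disabled and the read timeout set before connect().
  static Socket connectToDatanode(InetSocketAddress dnAddr,
                                  int connectTimeoutMs,
                                  int readTimeoutMs) throws IOException {
    Socket sock = new Socket();
    sock.setTcpNoDelay(true);               // disable Nagle up front
    sock.setSoTimeout(readTimeoutMs);       // read timeout before connecting...
    sock.connect(dnAddr, connectTimeoutMs); // ...then connect
    return sock;
  }
}
{code}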

Do you think 16 is a good number for the socket cache (it doesn't seem easily 
changeable)?

Nice cleanup of description in DataNode.java

One note is that this patch looks 'safe': we default to closing the connection 
if anything untoward happens, which is just the behavior the DN had before this patch.

TestParallelRead is sweet.






 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044698#comment-13044698
 ] 

Todd Lipcon commented on HDFS-941:
--

oops, the last line of my benchmark results got truncated. It should read:

*without patch*:
11/06/05 20:32:54 INFO hdfs.TestParallelRead: === Report: 4 threads read 
2619994 KB (across 1 file(s)) in 25.762s; average 101699.94565639313 KB/s
11/06/05 20:33:34 INFO hdfs.TestParallelRead: === Report: 16 threads read 
10470506 KB (across 1 file(s)) in 40.583s; average 258002.26695907154 KB/s
11/06/05 20:34:00 INFO hdfs.TestParallelRead: === Report: 8 threads read 
5232371 KB (across 2 file(s)) in 25.484s; average 205319.8477476063 KB/s


*with patch*:
11/06/05 20:35:45 INFO hdfs.TestParallelRead: === Report: 4 threads read 
2626843 KB (across 1 file(s)) in 10.208s; average 257331.7985893417 KB/s
11/06/05 20:36:13 INFO hdfs.TestParallelRead: === Report: 16 threads read 
10492178 KB (across 1 file(s)) in 27.046s; average 387938.25334615103 KB/s
11/06/05 20:36:25 INFO hdfs.TestParallelRead: === Report: 8 threads read 
5236253 KB (across 2 file(s)) in 12.447s; average 420683.93990519806 KB/s


 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044715#comment-13044715
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481534/hdfs-941.txt
  against trunk revision 1131331.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
  org.apache.hadoop.hdfs.TestDFSClientRetries

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/711//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/711//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/711//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs-941.txt, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-05-11 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032194#comment-13032194
 ] 

Jason Rutherglen commented on HDFS-941:
---

I'm seeing many errors trying to apply 
http://issues.apache.org/jira/secure/attachment/12476027/HDFS-941-6.patch to 
https://svn.apache.org/repos/asf/hadoop/hdfs/trunk

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023101#comment-13023101
 ] 

Todd Lipcon commented on HDFS-941:
--

Looks pretty good, and I looped TestFileConcurrentReader for half an hour or so 
with no failures.

A few small comments:
- google-collections is deprecated in favor of its new name, guava - we 
should depend on the newest release.
- in TestParallelRead, you have a few cases of assert() where you should 
probably be using assertEquals() in case the unit tests run without -ea. 
assertEquals() will also give a nicer error message.
- in SocketCache.evict(), you are calling {{multimap.remove}} while iterating 
over the same map's entries. This seems likely to throw 
ConcurrentModificationException. Better to use {{multimap.iterator()}} and call 
{{it.remove()}} (see the sketch below this list). This also makes me notice that 
you only ever call {{evict()}} with an argument of 1, so maybe you should just 
rename it to {{evictOne()}}.
- If you have multiple DFSClients in a JVM with different socketTimeout 
settings, I think this will currently end up leaking timeouts between them. 
Perhaps after successfully getting a socket from socketCache, you need to call 
{{sock.setSoTimeout}} based on the current instance of {{dfsClient}}?
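
A minimal sketch of the iterator-based eviction suggested above, assuming a 
Guava {{Multimap}} backs the cache (class and field names are illustrative, 
not the patch's actual code):

{code}
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.Iterator;
import java.util.Map;

import com.google.common.collect.LinkedListMultimap;
import com.google.common.collect.Multimap;

class SocketCacheSketch {
  private final Multimap<SocketAddress, Socket> cache = LinkedListMultimap.create();

  // Evict a single entry through the multimap's own iterator, so the removal
  // cannot invalidate the iteration (no ConcurrentModificationException).
  synchronized void evictOne() {
    Iterator<Map.Entry<SocketAddress, Socket>> it = cache.entries().iterator();
    if (!it.hasNext()) {
      return;
    }
    Socket victim = it.next().getValue();
    it.remove();                 // remove via the iterator, not cache.remove(...)
    try {
      victim.close();
    } catch (IOException ignored) {
      // best effort: a dead socket is fine to drop silently
    }
  }
}
{code}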

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021837#comment-13021837
 ] 

Todd Lipcon commented on HDFS-941:
--

TestFileConcurrentReader has been failing intermittently a lot for a while - 
it's likely this isn't related to the patch. But worth a quick look at least to 
see if this patch changes the intermittent failure to a reproducible one.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-19 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021862#comment-13021862
 ] 

sam rash commented on HDFS-941:
---

The last failure I saw with this test was basically unrelated to the test 
itself -- it was a socket leak in the datanode, I think with RPCs. 

I glanced at the first test failure output and found a similar error:


2011-04-11 21:29:36,962 INFO  datanode.DataNode 
(DataXceiver.java:opWriteBlock(458)) - writeBlock blk_-6878114854540472276_1001 
received exception java.io.FileNotFoundException: 
/grid/0/hudson/hudson-slave/workspace/PreCommit-HDFS-Build/trunk/build/test/data/dfs/data/data1/current/rbw/blk_-6878114854540472276_1001.meta
 (Too many open files)


Note that this test implicitly finds any socket/fd leaks because it 
opens/closes files repeatedly.

If you can check into this, that'd be great.  I'll have some more time later 
this week to help more.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-11 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018635#comment-13018635
 ] 

bc Wong commented on HDFS-941:
--

I'll take a look at the TestFileConcurrentReader failure.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018471#comment-13018471
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12476021/HDFS-941-6.patch
  against trunk revision 1091131.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/340//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018472#comment-13018472
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-941:
-

Hi bc, it seems that Jenkins (previously Hudson) sometimes does not pick up 
patches.  I have just [submitted this 
manually|https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/340/].

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-11 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018485#comment-13018485
 ] 

bc Wong commented on HDFS-941:
--

Thanks Nicholas! I generated the wrong patch format, unfortunately. Could you 
help me submit it to Jenkins again?

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018570#comment-13018570
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-941:
-

You are welcome.  I have just [started 
it|https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/343/].

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018589#comment-13018589
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12476027/HDFS-941-6.patch
  against trunk revision 1091131.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
  org.apache.hadoop.hdfs.TestFileConcurrentReader

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/343//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/343//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/343//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.patch, 
 HDFS-941-6.patch, HDFS-941-6.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-03-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010814#comment-13010814
 ] 

Kihwal Lee commented on HDFS-941:
-

+1 The patch looks good. I was unsure about the new dependency on Guava, but 
apparently people have already agreed on adding it to hadoop-common, so I guess 
it's not an issue.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-03-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010879#comment-13010879
 ] 

stack commented on HDFS-941:


+1 on commit. The patch looks great, though it's a bit hard to read because it's 
mostly white-space changes.  I like the tests.  I'm good w/ adding guava.

If there's a v6, here are a few minor comments:

Javadoc on BlockReader is not properly formatted (it will show as a mess after 
html'ing) -- same for the class comment on DN.

gotEOS is an odd name for a boolean; wouldn't eos be better?

Hard-codings like this, +final int MAX_RETRIES = 3;, should instead be read 
from config, even if not declared in hdfs-default.xml?  Same for 
DN_KEEPALIVE_TIMEOUT.  (See the sketch at the end of this comment.)

Why would we retry a socket that is throwing an IOE?  Why not close it and move on 
with a new socket?

Is SocketCache missing a copyright notice?

Is this the right thing to do?

{code}
+SocketAddress remoteAddr = sock.getRemoteSocketAddress();
+if (remoteAddr == null) {
+  return;
+}
{code}

The socket is not cached because it does not have a remote address.  Why does 
it not have a remote address?  Is there something wrong w/ the socket?  Should 
we throw an exception, or close and throw away the socket?

There is a tab at line #1242 in the patch:

{code}+ // restore normal timeout{code}
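
On the config point above, a minimal sketch of what reading those values from a 
Configuration could look like; the key names are hypothetical, not ones defined 
in hdfs-default.xml:

{code}
import org.apache.hadoop.conf.Configuration;

class RetryConfigSketch {
  // Hypothetical keys, for illustration only.
  static final String MAX_RETRIES_KEY = "dfs.client.xceiver.max.retries";
  static final String KEEPALIVE_KEY = "dfs.datanode.xceiver.keepalive.timeout.ms";

  static int maxRetries(Configuration conf) {
    return conf.getInt(MAX_RETRIES_KEY, 3);   // fall back to the current hard-coded 3
  }

  static int keepaliveTimeoutMs(Configuration conf) {
    return conf.getInt(KEEPALIVE_KEY, 1000);  // placeholder default
  }
}
{code}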






 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-03-21 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009167#comment-13009167
 ] 

Kihwal Lee commented on HDFS-941:
-

Nice work! I performed a basic test and got results comparable to the ones from 
your previous patch. I will review the patch in depth soon. 

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12996836#comment-12996836
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443322/HDFS-941-4.patch
  against trunk revision 1072023.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/199//console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-05-12 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866496#action_12866496
 ] 

Todd Lipcon commented on HDFS-941:
--

I ran some benchmarks again tonight using YCSB.

I loaded 1M rows into an HBase table (untimed) on my test cluster. The cluster 
is running a 5-node HDFS, but I only ran one HBase region server, so that I 
could reliably have the same region deployment between test runs. The data fits 
entirely within the buffer cache, so we're just benchmarking DFS overhead and 
not actual seek time.

I ran benchmarks with:
{code}
java -cp build/ycsb.jar:src/com/yahoo/ycsb/db/hbaselib/*:$HBASE_CONF_DIR 
com.yahoo.ycsb.Client  -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=$[1000*1000] -p 
operationcount=$[1000*1000]
{code}
from one of the nodes in the cluster (not the same one as ran the region server)

I ran the benchmark twice without the patch and twice with, alternating builds 
and restarting DFS and HBase each time, to make sure I wasn't getting any 
variability due to caching, etc.

Results follow:

== 941-bench-1.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 118197
[OVERALL],Throughput(ops/sec), 8460.451618907417
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.701651
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1352
[READ], 95thPercentileLatency(ms), 11
[READ], 99thPercentileLatency(ms), 15

== 941-bench-2.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 124005
[OVERALL],Throughput(ops/sec), 8064.190960041934
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.940652
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1337
[READ], 95thPercentileLatency(ms), 12
[READ], 99thPercentileLatency(ms), 16

== normal-bench-1.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 182316
[OVERALL],Throughput(ops/sec), 5484.982118958293
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.267306
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1327
[READ], 95thPercentileLatency(ms), 17
[READ], 99thPercentileLatency(ms), 26

== normal-bench-2.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 190053
[OVERALL],Throughput(ops/sec), 5261.690160113231
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.577673
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1525
[READ], 95thPercentileLatency(ms), 15
[READ], 99thPercentileLatency(ms), 21

In other words, this patch cuts average latency by about 35%, with similar 
gains on the high-percentile latencies. The reads/sec number improved by a 
little over 50%.

This is without any tuning of the keepalive or the socket cache size - I 
imagine even more improvement could be made with a bit more tuning, etc.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-05-12 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866519#action_12866519
 ] 

Todd Lipcon commented on HDFS-941:
--

I'd like to hold off on this just a bit longer yet - I'm seeing this 
sporadically in my testing:

Caused by: java.lang.IndexOutOfBoundsException
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:151)
at 
org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1155)
at 
org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
at 
org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1441)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1913)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2035)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)

But the above benchmarks do show that the idea has a lot of promise! (and the 
above trace may in fact be an HBase bug)

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-05-12 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1284#action_1284
 ] 

sam rash commented on HDFS-941:
---

todd: wow, those benchmarks do look impressive!  do we have any idea if 
standard sequential access gets any benefit?


bc: my point about the cache is that you don't have to hard-code it as a static 
member of ReaderSocketCache. I don't think it needs to be more generic--it can 
be a socket cache. I do think it can be decoupled from BlockReader by getting 
rid of the owner field.

Why does a 'cache' create sockets? You can avoid the whole owner problem if 
you simply let the client ask for a socket, and if there is none, create its 
own, use it, and put it in the cache when it's done with it (ie, it's still usable). 
This should greatly reduce complexity (no need for free + used maps, an owner, 
etc). It seems like this is mixing up the responsibilities of a socket factory 
and a socket cache (possibly why it seems complex to me).

{code}
boolean reusable() {
  return ((owner == null || owner.hasConsumedAll()) &&
      sock.isConnected() &&
      !sock.isInputShutdown() &&
      !sock.isOutputShutdown());
}
{code}

If you can make this change, reusable() only has to check the socket itself.
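
To make that concrete, here is a rough sketch of the pattern I have in mind 
(purely illustrative -- the class and method names below are mine, not from the 
patch):

{code}
// Illustrative sketch only: the cache never creates sockets. Callers ask for a
// cached socket, fall back to creating their own, and give it back only if it
// is still usable when they are done.
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

class SimpleSocketCache {
  private final Map<InetSocketAddress, Queue<Socket>> free =
      new HashMap<InetSocketAddress, Queue<Socket>>();

  /** A cached socket for addr, or null if none is available. */
  synchronized Socket get(InetSocketAddress addr) {
    Queue<Socket> q = free.get(addr);
    return (q == null || q.isEmpty()) ? null : q.poll();
  }

  /** Return a socket; keep it only if it is still usable. */
  synchronized void put(InetSocketAddress addr, Socket sock) {
    if (sock.isConnected() && !sock.isInputShutdown() && !sock.isOutputShutdown()) {
      Queue<Socket> q = free.get(addr);
      if (q == null) {
        q = new LinkedList<Socket>();
        free.put(addr, q);
      }
      q.offer(sock);
    } else {
      try { sock.close(); } catch (java.io.IOException ignored) {}  // never cache a dead socket
    }
  }
}
{code}

The caller just does get-or-create ({{Socket s = cache.get(addr); if (s == null) 
s = ...}}) and calls {{put()}} in a finally block, so there is no owner tracking 
and no separate used map.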

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-05-12 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866680#action_12866680
 ] 

sam rash commented on HDFS-941:
---

two other comments:

1. the number of sockets per address is limited, but not the number of 
addresses. In practice this may not be a problem, but the cache can in theory 
grow very large.
2. the usedmap seems like a good place for a memory/object leak: I can take a 
socket and never return it (which again is why I vote for getting rid of this data 
structure entirely--as far as a cache is concerned, an entry that someone else 
owns shouldn't even be there). Otherwise, you've got to periodically clean this 
map up as well. It seems like it's only used for stats, which I think you can do 
without actually keeping a hash of used sockets (a counter sketch follows below).
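
For the stats part, a couple of counters would do -- a purely hypothetical 
sketch (names are mine, not from the patch):

{code}
// Hypothetical: track checkouts and reuses without keeping a map of used sockets.
import java.util.concurrent.atomic.AtomicInteger;

class SocketCacheStats {
  private final AtomicInteger outstanding = new AtomicInteger(0);  // handed out, not yet returned
  private final AtomicInteger reuses = new AtomicInteger(0);       // checkouts served from the cache

  void onCheckout(boolean cameFromCache) {
    outstanding.incrementAndGet();
    if (cameFromCache) {
      reuses.incrementAndGet();
    }
  }

  void onReturn() {
    outstanding.decrementAndGet();
  }

  int getOutstanding() { return outstanding.get(); }
  int getReuses() { return reuses.get(); }
}
{code}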

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-05-11 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866456#action_12866456
 ] 

sam rash commented on HDFS-941:
---

+1 for the idea of caching sockets, but I have some questions/concerns about 
the implementation.
some comments:

1. avoid making the cache implementation tied to the class ReaderSocketCache. 
Don't make the cache a static member of the same class. Let the cache be an 
instantiable object. Let DFSClient store the cache either as an instance or 
static var (don't force everything to use the same cache instance--better for 
testing and stubbing out as well)
2. a lot of the logic around re-using is complicated--I think this could be 
simplified
a. not clear why sockets are always in the cache even if not usable: I 
would think adding only when usable and removing when used would be cleaner
b. if we can keep the cache clean, no need for lazy removal of unusable 
sockets
3. shouldn't there be a cap on the # of sockets there can be in the cache? 
Again, it should only hold usable ones, but a max # put into the cache makes 
sense. If we have a flurry of reads using tons of sockets to several DNs, 
there is no need to keep 100s or more sockets in a cache (a rough sketch of 
what I mean follows after this list)
4. general concern about potential socket leaks
5. seems like this needs more thought on the effects of synchronization: the 
freemap has to be traversed every time to get a socket inside a sync block. 
See above; we can avoid lazy removal by not putting unusable sockets in the 
cache (unusable either because they are in use or not usable at all)
6. do we have real performance benchmarks from actual clusters that show a 
significant benefit? As noted above, the change is fairly complex (caching is 
in fact hard :) and if we don't see a substantial performance improvement, the 
risk of bugs may outweigh the benefit
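
To make 2a/3/5 concrete, here is the kind of capped, usable-only cache I mean 
(illustrative only -- none of these names come from the patch):

{code}
// Illustrative sketch: the cache holds nothing but free, usable sockets (at most
// one per datanode address, for simplicity), capped at MAX_ENTRIES with LRU
// eviction, so checkout is a plain remove() and there is no lazy cleanup.
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedSocketCache {
  private static final int MAX_ENTRIES = 16;   // arbitrary cap for the sketch

  // access-ordered LinkedHashMap evicts (and closes) the least-recently-used socket
  private final LinkedHashMap<InetSocketAddress, Socket> free =
      new LinkedHashMap<InetSocketAddress, Socket>(MAX_ENTRIES, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<InetSocketAddress, Socket> eldest) {
          if (size() > MAX_ENTRIES) {
            closeQuietly(eldest.getValue());
            return true;
          }
          return false;
        }
      };

  synchronized Socket take(InetSocketAddress addr) {
    return free.remove(addr);                  // removed on checkout: only free sockets live here
  }

  synchronized void give(InetSocketAddress addr, Socket sock) {
    if (sock.isConnected() && !sock.isInputShutdown() && !sock.isOutputShutdown()) {
      Socket prev = free.put(addr, sock);
      if (prev != null && prev != sock) {
        closeQuietly(prev);                    // displaced socket must not leak
      }
    } else {
      closeQuietly(sock);
    }
  }

  private static void closeQuietly(Socket s) {
    try { s.close(); } catch (java.io.IOException ignored) {}
  }
}
{code}

Because only free sockets ever live in the map, a checkout can never leak a 
cache entry, and the cap bounds the total number of idle connections across all 
datanodes.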

that's my 2c anyway

-sr

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-04-30 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862854#action_12862854
 ] 

bc Wong commented on HDFS-941:
--

The variance is large on the tests. But they show that the patch isn't slower 
than trunk. Tests executed on a 5 node cluster:

* TestDFSIO -read -fileSize 512 -bufferSize 4096 -nrFiles 10

||-||trunk||patched||
|Num trials|6|5|
|Throughput (MB/s)|92|93|
|Avg IO (MB/s)|150|134|
|Std dev|122|77|

* TestDFSIO -read -fileSize 512 -bufferSize 4096 -nrFiles 20

||-||trunk||patched||
|Num trials|5|5|
|Throughput (MB/s)|78|83|
|Avg IO (MB/s)|114|121|
|Std dev|75|76|

* Distributed {{bin/hadoop fs -cat /benchmarks/TestDFSIO/io_data/test_io_$i > 
/dev/null}}, for i in [0,9]

||-||trunk||patched||
|Num trials|5|5|
|Avg time (sec)|47.8|48.0|
|Std dev|4.2|3.6|

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-04-08 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855205#action_12855205
 ] 

bc Wong commented on HDFS-941:
--

I replaced the size-of-one cache with a more generic cache, which is also a 
global shared cache. There is a new TestParallelRead, which tests concurrent 
use of a DFSInputStream by multiple readers. There's a clear speed difference 
with vs. without the patch. Each thread does 1024 reads.

Trunk:
{noformat}
Report: 4 threads read 236953 KB (across 1 file(s)) in 5.879s; average 
40304.98384078925 KB/s
Report: 4 threads read 238873 KB (across 1 file(s)) in 5.063s; average 
47180.13035749556 KB/s
Report: 4 threads read 236068 KB (across 1 file(s)) in 5.93s; average 
39809.10623946037 KB/s
Report: 16 threads read 942666 KB (across 1 file(s)) in 13.524s; average 
69703.19432120674 KB/s
Report: 16 threads read 947015 KB (across 1 file(s)) in 13.401s; average 
70667.48750093277 KB/s
Report: 16 threads read 948768 KB (across 1 file(s)) in 12.932s; average 
73365.91401175379 KB/s
Report: 8 threads read 469529 KB (across 2 file(s)) in 5.436s; average 
86373.98822663723 KB/s
Report: 8 threads read 455428 KB (across 2 file(s)) in 5.363s; average 
84920.38038411336 KB/s
Report: 8 threads read 469005 KB (across 2 file(s)) in 5.713s; average 
82094.34622790127 KB/s
{noformat}

Patched:
{noformat}
Report: 4 threads read 236845 KB (across 1 file(s)) in 3.612s; average 
65571.70542635658 KB/s
Report: 4 threads read 238803 KB (across 1 file(s)) in 4.371s; average 
54633.49347975291 KB/s
Report: 4 threads read 240241 KB (across 1 file(s)) in 4.395s; average 
54662.34357224119 KB/s
Report: 16 threads read 938652 KB (across 1 file(s)) in 9.044s; average 
103787.26227333037 KB/s
Report: 16 threads read 943999 KB (across 1 file(s)) in 8.59s; average 
109895.11059371362 KB/s
Report: 16 threads read 938546 KB (across 1 file(s)) in 9.081s; average 
103352.71445876005 KB/s
Report: 8 threads read 478534 KB (across 2 file(s)) in 3.376s; average 
141745.85308056872 KB/s
Report: 8 threads read 467412 KB (across 2 file(s)) in 3.623s; average 
129012.42064587357 KB/s
Report: 8 threads read 475349 KB (across 2 file(s)) in 3.49s; average 
136203.15186246418 KB/s
{noformat}
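
For reference, the harness is shaped roughly like this (a schematic sketch 
only, not the actual TestParallelRead source):

{code}
// Schematic sketch: N threads hammer the same open stream with positional reads
// and the aggregate KB/s is reported at the end, one line per run.
import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.fs.FSDataInputStream;

class ParallelReadSketch {
  static long readLoop(FSDataInputStream in, long fileLen, int nReads) throws IOException {
    Random rnd = new Random();
    byte[] buf = new byte[64 * 1024];
    long bytes = 0;
    for (int i = 0; i < nReads; i++) {
      long pos = (long) (rnd.nextDouble() * (fileLen - buf.length));
      in.readFully(pos, buf, 0, buf.length);   // positional read; does not move the stream
      bytes += buf.length;
    }
    return bytes;
  }

  static void run(final FSDataInputStream in, final long fileLen,
                  int nThreads, final int nReads) throws InterruptedException {
    final long[] totals = new long[nThreads];
    Thread[] workers = new Thread[nThreads];
    long start = System.currentTimeMillis();
    for (int t = 0; t < nThreads; t++) {
      final int id = t;
      workers[t] = new Thread() {
        public void run() {
          try {
            totals[id] = readLoop(in, fileLen, nReads);
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
      };
      workers[t].start();
    }
    long total = 0;
    for (int t = 0; t < nThreads; t++) {
      workers[t].join();
      total += totals[t];
    }
    double secs = (System.currentTimeMillis() - start) / 1000.0;
    System.out.println("Report: " + nThreads + " threads read " + (total / 1024)
        + " KB in " + secs + "s; average " + (total / 1024 / secs) + " KB/s");
  }
}
{code}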

bq. The edits to the docs in DataNode.java are good - if possible they should 
probably move into HDFS-1001 though, no?
The addition to the docs doesn't apply to HDFS-1001, in which the DataXceiver 
still actively closes all sockets after each use.

Todd, the new patch addresses the rest of your comments.


 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-03-19 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847244#action_12847244
 ] 

bc Wong commented on HDFS-941:
--

Thanks for the review, Todd. I'll add more tests, and look into making a cache 
of size > 1.

bq. I think there is a concurrency issue here. Namely, the positional read API 
calls through into fetchBlockByteRange, which will use the existing cached 
socket, regardless of other concurrent operations. So we may end up with 
multiple block readers on the same socket and everything will fall apart.

That should be fine. Each {{SocketCacheEntry}} has a unique {{Socket}}, owned 
by its {{BlockReader}}. One of the reuse conditions is that the {{BlockReader}} 
has finished reading on that {{Socket}} ({{hasConsumedAll()}}). Note that we do 
not reuse the {{BlockReader}} itself. So at that point, it should be safe to take 
the {{Socket}} away from its previous owner and give it to a new {{BlockReader}}.

I'll add tests for this though.
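
As a toy model of the handoff (illustrative only, not the patch's classes):

{code}
// Toy model: a socket's cache entry may be reassigned to a new reader only once
// the previous reader has consumed everything it asked for, so two readers can
// never interleave on one stream.
class ToyBlockReader {
  private long bytesLeft;
  ToyBlockReader(long bytesToRead) { this.bytesLeft = bytesToRead; }
  void read(long n) { bytesLeft -= n; }
  boolean hasConsumedAll() { return bytesLeft <= 0; }
}

class ToySocketCacheEntry {
  private ToyBlockReader owner;        // null => nobody is using the socket

  /** Hand the socket to a new reader; refuse if the old one is still mid-stream. */
  synchronized boolean tryReassign(ToyBlockReader newOwner) {
    if (owner != null && !owner.hasConsumedAll()) {
      return false;                    // caller opens a fresh connection instead
    }
    owner = newOwner;
    return true;
  }
}
{code}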

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-03-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847117#action_12847117
 ] 

Todd Lipcon commented on HDFS-941:
--

Style notes:

- in BlockReader:
{code}
+  LOG.warn("Could not write to datanode " + sock.getInetAddress() +
+   ": " + e.getMessage());
{code}
should be more specific - like "Could not write read result status code" - and 
also indicate in the warning somehow that this is not a critical problem. 
Perhaps info level is better? (In my experience, if people see WARN they think 
something is seriously wrong.)

- please move the inner SocketCacheEntry class down lower in DFSInputStream
- in SocketCacheEntry.setOwner, can you use IOUtils.closeStream to close 
reader? Similarly in SocketCacheEntry.close
- We expect the following may happen reasonably often, right?
{code}
+// Our socket is no good.
+DFSClient.LOG.warn("Error making BlockReader. Closing stale " + 
entry.sock.toString());
{code}
I think this should probably be debug level.

- The edits to the docs in DataNode.java are good - if possible they should 
probably move into HDFS-1001 though, no?

- the do { ... } while () loop is a bit hard to follow in DataXceiver. Would it 
be possible to rearrange the code a bit to be more linear? (eg setting 
DN_KEEPALIVE_TIMEOUT right before the read at the beginning of the loop if 
workDone > 0 would be easier to follow in my opinion)

- In DataXceiver:
{code}
+  } catch (IOException ioe) {
+LOG.error("Error reading client status response. Will close 
connection. Err: " + ioe);
{code}
Doesn't this yield error messages on every incomplete client read? Since the 
response is optional, this seems more like a DEBUG.

Bigger stuff:

- I think there is a concurrency issue here. Namely, the positional read API 
calls through into fetchBlockByteRange, which will use the existing cached 
socket, regardless of other concurrent operations. So we may end up with 
multiple block readers on the same socket and everything will fall apart.

Can you add a test case which tests concurrent use of a DFSInputStream? Maybe a 
few threads doing random positional reads while another thread does seeks and 
sequential reads?
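
Something along these lines, say (a hypothetical sketch, not an actual test; 
cluster setup and data generation are omitted, and {{expected}} holds the 
file's known contents):

{code}
// Hypothetical test sketch: several threads do random positional reads on one
// shared stream while the main thread seeks and reads sequentially, and every
// read is checked against the expected file contents.
import java.util.Random;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ConcurrentReadSketch {
  static void run(FileSystem fs, Path path, final byte[] expected) throws Exception {
    final FSDataInputStream in = fs.open(path);
    final AtomicReference<Throwable> failure = new AtomicReference<Throwable>();

    Thread[] preaders = new Thread[4];
    for (int t = 0; t < preaders.length; t++) {
      preaders[t] = new Thread() {
        public void run() {
          try {
            Random rnd = new Random();
            byte[] buf = new byte[4096];
            for (int i = 0; i < 1000; i++) {
              int pos = rnd.nextInt(expected.length - buf.length);
              in.readFully(pos, buf, 0, buf.length);           // positional read
              for (int j = 0; j < buf.length; j++) {
                if (buf[j] != expected[pos + j]) {
                  throw new AssertionError("pread mismatch at " + (pos + j));
                }
              }
            }
          } catch (Throwable e) {
            failure.compareAndSet(null, e);
          }
        }
      };
      preaders[t].start();
    }

    // Main thread: seeks plus stateful sequential reads on the same stream.
    Random rnd = new Random();
    byte[] buf = new byte[8192];
    for (int i = 0; i < 200 && failure.get() == null; i++) {
      int pos = rnd.nextInt(expected.length - buf.length);
      in.seek(pos);
      in.readFully(buf, 0, buf.length);
      for (int j = 0; j < buf.length; j++) {
        if (buf[j] != expected[pos + j]) {
          throw new AssertionError("seek/read mismatch at " + (pos + j));
        }
      }
    }

    for (Thread t : preaders) {
      t.join();
    }
    in.close();
    if (failure.get() != null) {
      throw new AssertionError(failure.get());
    }
  }
}
{code}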

- Regarding the cache size of one - I don't think this is quite true. For a use 
case like HBase, the region server is continually slamming the local datanode 
with random read requests from several client threads. Is the idea that such an 
application should be using multiple DFSInputStreams to read the same file and 
handle the multithreading itself?

- In DataXceiver, SocketException is caught and ignored while sending a block 
("// It's ok for remote side to close the connection anytime."). I think there 
are other SocketException types (eg timeout) that could throw here aside from a 
connection close, so in that case we need to IOUtils.closeStream(out), I 
believe. A test case for this could be to open a BlockReader, read some bytes, 
then stop reading so that the other side's BlockSender generates a timeout.
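
i.e. roughly this shape (my own illustration, not the patch's code):

{code}
// Illustration only: a timeout (or any other socket-level failure) while sending
// leaves the connection in an unknown state, so the reply stream gets closed and
// the connection is never offered for reuse.
import java.io.Closeable;
import java.io.IOException;
import java.net.SocketException;
import java.net.SocketTimeoutException;

class ReplyStreamPolicy {
  static void onSendError(IOException e, Closeable replyOut) {
    if (e instanceof SocketTimeoutException) {
      // read/write timeout: the peer may still be half-way through the block
    } else if (e instanceof SocketException) {
      // remote side closed the connection at some arbitrary point
    }
    // either way, close (what IOUtils.closeStream(out) would do)
    try {
      if (replyOut != null) {
        replyOut.close();
      }
    } catch (IOException ignored) {
      // stream already broken
    }
  }
}
{code}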


- Not sure about this removal in the finally clause of opWriteBlock:
{code}
-  IOUtils.closeStream(replyOut);
{code}
(a) We still need to close in the case of a downstream-generated exception. 
Otherwise we'll read the next data bytes from the writer as an operation and 
have undefined results.
(b) To keep this patch less dangerous, maybe we should not add the reuse 
feature for operations other than read? Read's the only operation where we 
expect a lot of very short requests coming in - not much benefit for writes, 
etc, plus they're more complicated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-03-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846499#action_12846499
 ] 

Hadoop QA commented on HDFS-941:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438934/HDFS-941-1.patch
  against trunk revision 923467.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/130/console

This message is automatically generated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


