[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062253#comment-13062253 ]

Hudson commented on HDFS-941:
-----------------------------

Integrated in Hadoop-Hdfs-22-branch #70 (See [https://builds.apache.org/job/Hadoop-Hdfs-22-branch/70/])

Datanode xceiver protocol should allow reuse of a connection
------------------------------------------------------------

                Key: HDFS-941
                URL: https://issues.apache.org/jira/browse/HDFS-941
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: data-node, hdfs client
   Affects Versions: 0.22.0
           Reporter: Todd Lipcon
           Assignee: bc Wong
            Fix For: 0.22.0
        Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png

Right now each connection into the datanode xceiver only processes one operation. In the case that an operation leaves the stream in a well-defined state (e.g. a client reads to the end of a block successfully), the same connection could be reused for a second operation. This should improve random read performance significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
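The reuse rule the description proposes (reuse a cached connection only when the previous operation left the stream in a well-defined state) can be sketched as a toy model. The class and method names below are illustrative only, not the actual DFSClient/DataXceiver API:

```python
class CachedConnection:
    """Toy model of the reuse rule discussed in this issue: a connection may
    be reused only when the previous operation ended at a clean boundary
    (e.g. the client read to the end of a block successfully)."""

    def __init__(self):
        self.connects = 0      # how many fresh connections were opened
        self._sock = None
        self._clean = False    # did the last op leave the stream well-defined?

    def _connect(self):
        self.connects += 1
        self._sock = object()  # stand-in for a real socket

    def read_block(self, read_to_end):
        # Reuse the cached socket only if the last read finished cleanly;
        # otherwise the stream state is unknown and we must reconnect.
        if self._sock is None or not self._clean:
            self._connect()
        self._clean = read_to_end
        return "data"

conn = CachedConnection()
conn.read_block(read_to_end=True)   # fresh connection
conn.read_block(read_to_end=True)   # reused
conn.read_block(read_to_end=False)  # reused, but ends mid-stream
conn.read_block(read_to_end=True)   # stream state unknown: must reconnect
print(conn.connects)                # 2 connections serve 4 reads
```

Under the old one-operation-per-connection behavior the same four reads would open four connections; the saving is what drives the random-read improvement.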
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058240#comment-13058240 ]

Arun C Murthy commented on HDFS-941:
------------------------------------

bq. TestDFSIO depends on MR scheduling

I've run into issues with this too - in the past, changes to the JT/JIP/scheduler would cause DFS I/O performance to 'regress'! We need to rework TestDFSIO. One way would be to do 'scheduling' in the input format of the test, similar to what we did with TeraSort. Even better, stop using an MR job.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052160#comment-13052160 ]

Konstantin Shvachko commented on HDFS-941:
------------------------------------------

I ran some tests myself over the weekend. The results are good. I am getting throughput around 75-78 MB/sec on reads, with small (< 2) std. deviation in both cases. So I am +1 now on this patch.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052166#comment-13052166 ]

Konstantin Shvachko commented on HDFS-941:
------------------------------------------

Answers to some issues raised here:

Stack
bq. RM says whats in a release and no one else.
We can still talk about the technical merits of the implementation, can't we?

Todd
bq. nrFiles = nrNodes means full locality, right?
No. In DFSIO there is no locality, since the files that DFSIO reads/writes are not the input of the MR job; their names are. The reason here is to make sure the job completes in one wave of mappers, and to minimize contention on the drives between tasks.
I was trying to avoid making this issue yet another discussion about DFSIO, because the objective here is to verify that the patch does not introduce a performance regression for sequential IOs. If the benchmark I proposed doesn't work for you guys, you can propose a different one.

Dhruba, Todd, Nicholas
bq. TestDFSIO exhibits very high variance, and its results are dependent on mapreduce's scheduling.
DFSIO does not depend on the MR scheduling. It depends on the OS memory cache. Cluster nodes these days run with 16, 32 GB RAM, so a 10GB file can be almost entirely cached by the OS. When you repeatedly run DFSIO you are not measuring cold IO, but RAM access and communication. The high variation is explained by the fact that some data is cached and some is not. For example, DFSIO -write is usually very stable, with std. dev < 1, because it deals with cold writes. For DFSIO -read you need to choose a file size larger than your RAM. With sequential reads the OS cache works as an LRU, so if your file is larger than RAM, the cache will have forgotten blocks from the head of the file by the time you get to reading the tail. And when you start reading the file again, the cache will release the oldest pages, which correspond to the higher offsets in the file. So it is going to be a cold read.
I had to go to 100GB files, which brought std. dev to < 2, and the variation in throughput was around 3%. Alternatively, you can clean the Linux cache on all DataNodes.

Nicholas
bq. it is hard to explain what the Throughput and Average IO rate really mean.
[This post|http://old.nabble.com/Re%3A-TestDFSIO-delivers-bad-values-of-%22throughput%22-and-%22average-IO-rate%22-p21322404.html] has the definitions.
Nicholas, I agree with you that the results you are posting don't make sense. The point, though, is not to scrap the benchmark, but to find the conditions under which it reliably measures what you need.
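The LRU behavior described above (sequential re-reads of a file larger than RAM always miss, while a file smaller than RAM is served entirely from cache) can be sketched with a toy page-cache model; the page counts are arbitrary illustration, not real cache sizes:

```python
from collections import OrderedDict

def sequential_hits(file_pages, cache_pages, passes=2):
    """Count cache hits when a file is read sequentially several times
    through an LRU page cache (toy model of the OS cache behaviour)."""
    cache = OrderedDict()
    hits = 0
    for _ in range(passes):
        for page in range(file_pages):
            if page in cache:
                hits += 1
                cache.move_to_end(page)        # mark as most recently used
            else:
                if len(cache) >= cache_pages:
                    cache.popitem(last=False)  # evict least recently used
                cache[page] = True
    return hits

# File smaller than cache: the second pass is served entirely from RAM.
print(sequential_hits(file_pages=8, cache_pages=16))   # 8 hits
# File larger than cache: the head has been evicted by the time we re-read it,
# so every access is a cold read, on every pass.
print(sequential_hits(file_pages=32, cache_pages=16))  # 0 hits
```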
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050944#comment-13050944 ]

Konstantin Shvachko commented on HDFS-941:
------------------------------------------

150 MB/sec throughput can happen if your data.dir is on a filer, such as your home directory or /tmp. This would also explain the ridiculous standard deviation, because it competed with Nicholas running ant test in his home dir, which is on the same filer. Set data.dir to crawlspace3 and you will start getting reasonable numbers. What is the cluster size?
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051007#comment-13051007 ]

Kihwal Lee commented on HDFS-941:
---------------------------------

Filer was not used. The cluster has 5 DNs with a separate NN.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051048#comment-13051048 ]

Hudson commented on HDFS-941:
-----------------------------

Integrated in Hadoop-Hdfs-trunk #699 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/699/])
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051216#comment-13051216 ]

Tsz Wo (Nicholas), SZE commented on HDFS-941:
---------------------------------------------

Hi Konstantin, don't you agree that the resulting numbers for Throughput and Average IO rate do not make much sense? At least the definitions of these two numbers are not clear. Recall that we got > 1 GB/sec in the past.
{noformat}
10/08/07 00:19:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
10/08/07 00:19:55 INFO fs.TestDFSIO:            Date & time: Sat Aug 07 00:19:55 UTC 2010
10/08/07 00:19:55 INFO fs.TestDFSIO:        Number of files: 2
10/08/07 00:19:55 INFO fs.TestDFSIO: Total MBytes processed: 2048
10/08/07 00:19:55 INFO fs.TestDFSIO:      Throughput mb/sec: 1096.3597430406853
10/08/07 00:19:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 1143.6881103515625
10/08/07 00:19:55 INFO fs.TestDFSIO:  IO rate std deviation: 232.655606509863
10/08/07 00:19:55 INFO fs.TestDFSIO:     Test exec time sec: 28.354
{noformat}
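Assuming Throughput is total MB divided by the sum of per-task times while Average IO rate is the plain mean of per-task rates (which is how TestDFSIO computes them), a toy example shows how the two metrics can diverge when some tasks are served from cache; the numbers below are hypothetical:

```python
# Per-task (size_mb, seconds) pairs -- hypothetical numbers for illustration:
# one task served from the OS cache (fast), one from disk (slow).
tasks = [(1024, 5.0), (1024, 20.0)]

# "Throughput" aggregates all tasks: total bytes over total task time.
throughput = sum(mb for mb, _ in tasks) / sum(s for _, s in tasks)

# "Average IO rate" is the plain mean of each task's own rate, so a single
# cached (fast) task pulls it far above the aggregate Throughput.
avg_io_rate = sum(mb / s for mb, s in tasks) / len(tasks)

print(round(throughput, 1))   # 81.9 MB/sec
print(round(avg_io_rate, 1))  # 128.0 MB/sec
```

This gap between the two numbers, amplified by caching, is one way readings like the > 1 GB/sec above can appear on hardware that cannot sustain that rate from disk.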
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050609#comment-13050609 ]

Konstantin Shvachko commented on HDFS-941:
------------------------------------------

I think it is reasonable to run tests against the latest patch and make sure there is no regression in performance. This is exactly what I asked: to run DFSIO on a 5-node cluster with and without the *new* patch. Here is the command I propose to run for 5 nodes (we should have nrFiles = nrNodes):
{code}
TestDFSIO -read -fileSize 10GB -nrFiles 5
{code}
You can run -write first to generate the data. I think this will be representative enough.

bq. it failed to reject the null hypothesis
Great analysis Todd, I am truly impressed. Does everything run on one node? Is there any inter-DN communication then? Also, with a 128 MB file everything is in RAM; I am not sure what it measures.

Uncommitting now may do more harm than good. If my concerns can be addressed without uncommitting, then I can hold off on that. Please confirm somebody is doing it.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050621#comment-13050621 ]

Todd Lipcon commented on HDFS-941:
----------------------------------

Konstantin: Yes, everything runs on one node. It tests the localhost path, which is identical to what would be tested with your proposed benchmark (nrFiles = nrNodes means full locality, right?).

bq. Also with 128 MB file everything is in RAM, not sure what it measures

It measures the overhead of DFS rather than the cost of IO. Having it *not* be in RAM makes for a worse test, since differences in CPU overhead are lost in the noise of the slow disks.
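The argument above can be put in toy numbers: a fixed per-operation overhead is easy to see against a fast in-RAM read and nearly invisible against a slow cold-disk read. All of the latencies below are hypothetical:

```python
# Hypothetical latencies (milliseconds) for one read operation.
overhead_ms  = 1.0    # fixed DFS per-op overhead (what the patch touches)
ram_read_ms  = 4.0    # data served from OS cache
disk_read_ms = 100.0  # cold read from spinning disk

# Overhead as a share of total operation time in each regime.
print(round(100 * overhead_ms / (ram_read_ms + overhead_ms)))   # 20 (% of an in-RAM op)
print(round(100 * overhead_ms / (disk_read_ms + overhead_ms)))  # 1  (% of a cold-disk op)
```

With cold disks the 1 ms overhead is about 1% of each operation, well inside run-to-run noise; served from RAM it is 20%, so a regression in the connection-handling path would actually move the measurement.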
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050622#comment-13050622 ]

Kihwal Lee commented on HDFS-941:
---------------------------------

bq. Also with 128 MB file everything is in RAM, not sure what it measures.

If cold reads are performed, disk I/O will be the bottleneck, and that can bury whatever overhead the patch might have introduced in the connection handling under noise. Since the patch didn't change the rest of the serving code, the ideal way of measuring its overhead would be having the DN do something like null ops; that effectively puts a magnifying glass on the area where the change has been made. In a normal setup, the next best thing is probably what Todd did. In any case, I will run DFSIO as you suggested.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050623#comment-13050623 ]

Todd Lipcon commented on HDFS-941:
----------------------------------

Another thing to note is that TestDFSIO itself is a pretty flawed test. It exhibits very high variance, and its results are very much dependent on mapreduce's scheduling. For example, dropping the MR heartbeat interval from 3 seconds to 0.3 seconds improved DFS IO performance by nearly 2x in some tests I ran a few months ago.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050726#comment-13050726 ]

dhruba borthakur commented on HDFS-941:
---------------------------------------

My experience with TestDFSIO has been that the variance of its results is high (especially due to map-reduce scheduling), and it could never capture (at least, for me) small differences in DFS performance.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050798#comment-13050798 ]

Kihwal Lee commented on HDFS-941:
---------------------------------

The following is 4 consecutive samples taken out of the middle of a larger set. This is the test Konstantin suggested. The std dev of the I/O rate seems too high, and so does the variation in run times. This is probably not the best way to measure small performance differences, as others have pointed out.
{noformat}
----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 20:29:41 UTC 2011
       Number of files: 5
Total MBytes processed: 51200.0
     Throughput mb/sec: 100.75824515346937
Average IO rate mb/sec: 136.13864135742188
 IO rate std deviation: 92.17360497645333
    Test exec time sec: 179.953

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 20:31:23 UTC 2011
       Number of files: 5
Total MBytes processed: 51200.0
     Throughput mb/sec: 150.92337396277026
Average IO rate mb/sec: 197.9733428955078
 IO rate std deviation: 106.59864139156599
    Test exec time sec: 99.805

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 20:33:20 UTC 2011
       Number of files: 5
Total MBytes processed: 51200.0
     Throughput mb/sec: 115.66831207852795
Average IO rate mb/sec: 145.11795043945312
 IO rate std deviation: 90.42587602009961
    Test exec time sec: 115.77

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 20:36:31 UTC 2011
       Number of files: 5
Total MBytes processed: 51200.0
     Throughput mb/sec: 91.04763462868748
Average IO rate mb/sec: 127.12406921386719
 IO rate std deviation: 97.86844611649816
    Test exec time sec: 189.954
{noformat}
I ran shorter (64KB) reads so that the variances are smaller and the proportion of overhead is larger. For larger reads, the overhead becomes less noticeable.
{noformat}
=== BEFORE ===
----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:00:03 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 24.328426438934947
Average IO rate mb/sec: 24.558759689331055
 IO rate std deviation: 2.474296728169802
    Test exec time sec: 8.444

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:00:13 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 23.374370500153187
Average IO rate mb/sec: 23.41034698486328
 IO rate std deviation: 0.9176091691810716
    Test exec time sec: 8.41

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:00:23 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 24.83526865641276
Average IO rate mb/sec: 24.873613357543945
 IO rate std deviation: 0.9842580011607321
    Test exec time sec: 8.424

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:00:33 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 24.57923495892397
Average IO rate mb/sec: 24.62860679626465
 IO rate std deviation: 1.1144092332035256
    Test exec time sec: 8.41

=== AFTER ===
----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:07:34 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 23.961666241363066
Average IO rate mb/sec: 23.970088958740234
 IO rate std deviation: 0.4478642432612885
    Test exec time sec: 8.378

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:07:44 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 24.57923495892397
Average IO rate mb/sec: 24.58832550048828
 IO rate std deviation: 0.4712211529700926
    Test exec time sec: 8.394

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:07:53 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 22.92486337515024
Average IO rate mb/sec: 22.95939064025879
 IO rate std deviation: 0.8841870285378609
    Test exec time sec: 8.388

----- TestDFSIO ----- : read
           Date & time: Thu Jun 16 23:08:03 UTC 2011
       Number of files: 5
Total MBytes processed: 4.7683716
     Throughput mb/sec: 24.204931888483504
Average IO rate mb/sec: 24.234447479248047
 IO rate std deviation: 0.8576845331358649
    Test exec time sec: 8.382
{noformat}
I didn't try to do any statistical analysis on it. If somebody wishes to, I can provide a larger set of data.
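For what it's worth, basic statistics over the four 10GB-read Throughput samples in the first set above quantify the spread Kihwal describes (a quick sketch, not the fuller analysis he offers data for):

```python
import statistics

# Throughput samples (MB/sec) from the four 10GB-read runs quoted above.
runs = [100.75824515346937, 150.92337396277026,
        115.66831207852795, 91.04763462868748]

mean = statistics.mean(runs)
sd = statistics.stdev(runs)   # sample standard deviation
cv = sd / mean                # coefficient of variation

print(round(mean, 1))         # 114.6 MB/sec
print(round(sd, 1))           # 26.2 MB/sec
print(round(100 * cv))        # ~23% run-to-run spread
```

A run-to-run spread of roughly a quarter of the mean makes it hard for this setup to resolve a regression of a few percent, which supports the point about small performance differences.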
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050845#comment-13050845 ]

Konstantin Shvachko commented on HDFS-941:
------------------------------------------

Kihwal, thanks for doing this. For the first set of results, is it with or without the patch? Should there be BEFORE and AFTER sections, as in the second set?
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049863#comment-13049863 ]

stack commented on HDFS-941:
----------------------------

I reran the tests; the same three failed. I backed out my patch and the same three failed. So this patch does not seem to be responsible for these test failures on my machine. I'm +1 on commit.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050181#comment-13050181 ] Konstantin Shvachko commented on HDFS-941: -- -1 on committing this without proof of no degradation to sequential ios. Should have done it before, but thought my message was clear. Let me know if you want me to uncommit before benchmarks are provided.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050197#comment-13050197 ] stack commented on HDFS-941: @Konstantin Convention is that the RM says what's in a release and no one else. See his +1 above. bq. ...proof of no-degradation to sequential ios. What would this test look like? Perf tests done above showed only minor differences (...well within the standard deviation, as per Todd). And if this patch can only be committed pending perf evaluation, why single it out and not require the same of all commits to hdfs?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050219#comment-13050219 ] Kihwal Lee commented on HDFS-941: - Perhaps it's confusing because this Jira is seen as Random vs. Sequential read. But in fact this jira is really about improving short reads, and the solution is to reduce the overhead of connection setup, which is present in both short and long reads. It is by no means favoring random or short reads. In fact, if the client does typical sequential reads multiple times from the same dn, this patch will help them too. The gain will be bigger if the files are smaller. Sure, there is a one-time overhead of a cache lookup (cache size: 16), but this can be ignored when the read size is sufficiently big. This cache management overhead should show up, in theory, for very small cold (connection-wise) accesses. So far I have only seen gains, but there might be some special chronic cases where this patch actually makes reads slower. But again, I don't believe they are typical use cases. Having said that, I think it is reasonable to run tests against the latest patch and make sure there is no regression in performance. Uncommitting now may do more harm than good. Let's see the numbers first and decide what to do.
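The cache Kihwal mentions (size 16) can be pictured as a small LRU map from datanode address to an open connection: a hit skips TCP connection setup, and the eldest entry is evicted when the cap is exceeded. A minimal sketch with hypothetical class and method names (the actual patch's data structure may differ), using strings to stand in for cached sockets:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded, LRU connection cache keyed by
// datanode address. Not the patch's real code; names are illustrative.
class PeerCache {
    private final LinkedHashMap<String, String> cache;

    PeerCache(final int capacity) {
        // accessOrder=true gives LRU iteration order; removeEldestEntry
        // evicts the least recently used entry once we exceed capacity.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // A hit removes the entry: a socket cannot be shared, so the caller
    // takes ownership of the cached connection.
    String get(String dnAddress) {
        return cache.remove(dnAddress);
    }

    void put(String dnAddress, String connection) {
        cache.put(dnAddress, connection);
    }

    int size() {
        return cache.size();
    }
}
```

The lookup is O(1), which is why the per-read overhead is negligible next to a full TCP handshake.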
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050231#comment-13050231 ] Todd Lipcon commented on HDFS-941: -- Ran the following benchmark to compare 0.22 before vs after the application of HDFS-941:
- inserted a 128M file into HDFS
- read it 50 times using {{hadoop fs -cat /file > /dev/null}} and the unix time utility
- recompiled with the patch reverted, restarted NN/DN
- ran same test
- recompiled with the patch included, restarted NN/DN
- ran same test
- recompiled with patch reverted
- ran same test

This resulted in 100 samples for each setup, 50 from each run. The following is the output of a t-test for the important variables:
{code}
> t.test(d.22$wall, d.22.with.941$wall)

	Welch Two Sample t-test

data:  d.22$wall and d.22.with.941$wall
t = -0.4932, df = 174.594, p-value = 0.6225
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.011002972  0.006602972
sample estimates:
mean of x mean of y
   1.1937    1.1959

> t.test(d.22$user, d.22.with.941$user)

	Welch Two Sample t-test

data:  d.22$user and d.22.with.941$user
t = -1.5212, df = 197.463, p-value = 0.1298
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.032378364  0.004178364
sample estimates:
mean of x mean of y
   1.3335    1.3476
{code}
That is to say, it failed to reject the null hypothesis... in less stat-heavy terms, there's no statistical evidence that this patch makes the test any slower.
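The t.test output above can be reproduced without R: the Welch statistic is just the mean difference over the combined standard error, with the Welch-Satterthwaite formula for degrees of freedom. A self-contained sketch (illustrative only, not part of the patch):

```java
// Welch's two-sample t statistic and Welch-Satterthwaite degrees of
// freedom, as computed by R's t.test with var.equal = FALSE.
class WelchT {
    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] x) {
        double m = mean(x), s = 0;
        for (double v : x) s += (v - m) * (v - m);
        return s / (x.length - 1);
    }

    // t = (mean(a) - mean(b)) / sqrt(s_a^2/n_a + s_b^2/n_b)
    static double tStat(double[] a, double[] b) {
        return (mean(a) - mean(b))
                / Math.sqrt(variance(a) / a.length + variance(b) / b.length);
    }

    // Welch-Satterthwaite approximation of degrees of freedom.
    static double welchDf(double[] a, double[] b) {
        double va = variance(a) / a.length, vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }
}
```

For the wall-clock samples above, |t| = 0.49 at roughly 175 degrees of freedom is nowhere near significance, which is the basis of the "failed to reject the null" conclusion.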
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049278#comment-13049278 ] Todd Lipcon commented on HDFS-941: -- Looks good to me. How'd the test run go?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049456#comment-13049456 ] stack commented on HDFS-941: Bit odd. These failed when I ran all tests:
{code}
[junit] Running org.apache.hadoop.hdfs.TestFileAppend4
[junit] Tests run: 2, Failures: 0, Errors: 2, Time elapsed: 60.251 sec
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 104.115 sec
[junit] Test org.apache.hadoop.hdfs.TestLargeBlock FAILED
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.022 sec
[junit] Test org.apache.hadoop.hdfs.TestWriteConfigurationToDFS FAILED
{code}
I reran all and only TestLargeBlock fails when I run tests singularly. If I back out the patch, TestLargeBlock fails against a clean 0.22 checkout. Commit I'd say?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049457#comment-13049457 ] stack commented on HDFS-941: Or, hang on...(240 minutes) and let me rerun these tests and see if TestFileAppend4 and/or TestWriteConfigurationToDFS fail again.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049500#comment-13049500 ] Konstantin Shvachko commented on HDFS-941: -- Could anybody please run DFSIO to make sure there is no degradation in sequential ios.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049515#comment-13049515 ] Todd Lipcon commented on HDFS-941: -- Cos: do you have any reason to believe there would be? I believe in benchmarking, but unless there's some reasoning behind the idea, it can take a lot of time that's better spent on other places (eg optimizing sequential IO :) ) If I recall correctly, early versions of this patch were indeed benchmarked for sequential IO, where we saw no difference.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049526#comment-13049526 ] Todd Lipcon commented on HDFS-941: -- oops, sorry Konstantin - didn't mean to call you Cos. But my comment stands :)
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049537#comment-13049537 ] Konstantin Shvachko commented on HDFS-941: -- Yes, in the previous [comment|https://issues.apache.org/jira/browse/HDFS-941?focusedCommentId=12862854page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12862854] there has been some degradation in throughput for sequential io. I just want to make sure there is no degradation for the primary use case with this patch.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048579#comment-13048579 ] Kihwal Lee commented on HDFS-941: - HDFS-2071 was filed.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048688#comment-13048688 ] Todd Lipcon commented on HDFS-941: -- Hey Stack. I just looked over your patch for 0.22. The only thing I noticed is that it no longer calls verifiedByClient() -- this is a change that happened in trunk with HDFS-1655. Are we OK with removing this from 0.22?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048721#comment-13048721 ] stack commented on HDFS-941: I should put it back. Give me a sec...
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048837#comment-13048837 ] Todd Lipcon commented on HDFS-941: -- Hey Stack. I still don't think this is quite right -- it will now call verifiedByClient() if the client read the entire byterange, even if the byterange didn't cover the whole block. I think we need {{if (datanode.blockScanner != null && blockSender.isBlockReadFully())}}. Also, can you add back TestDataXceiver? I think that test case would catch this bug.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048875#comment-13048875 ] Todd Lipcon commented on HDFS-941: -- Yea, I think we should add back the blockReadFully variable (in addition to keeping the new sentEntireByteRange variable and its getter). Looks like there's a new getFileBlocks() method which can be used after writeFile() to get the block location, and then keep that test around?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048979#comment-13048979 ] stack commented on HDFS-941: I put back TestDataXceiver. It does this:
{code}
-    List<LocatedBlock> blkList = util.writeFile(TEST_FILE, FILE_SIZE_K);
+    // Create file.
+    util.writeFile(TEST_FILE, FILE_SIZE_K);
+    // Now get its blocks.
+    List<LocatedBlock> blkList = util.getFileBlocks(TEST_FILE, FILE_SIZE_K);
{code}
rather than change the writeFile signature (writeFile is used in a few other places so the change would ripple). I also added back BlockSender.isBlockReadFully so the checks before we call verifiedByClient are as they were before this patch application:
{code}
-        if (DataTransferProtocol.Status.read(in) == CHECKSUM_OK) {
-          if (blockSender.isBlockReadFully() && datanode.blockScanner != null) {
-            datanode.blockScanner.verifiedByClient(block);
+          if (blockSender.didSendEntireByteRange()) {
+            // If we sent the entire range, then we should expect the client
+            // to respond with a Status enum.
+            try {
+              DataTransferProtocol.Status stat = DataTransferProtocol.Status.read(in);
+              if (stat == null) {
+                LOG.warn("Client " + s.getInetAddress() + " did not send a valid status " +
+                    "code after reading. Will close connection.");
+                IOUtils.closeStream(out);
+              } else if (stat == CHECKSUM_OK) {
+                if (blockSender.isBlockReadFully() && datanode.blockScanner != null) {
+                  datanode.blockScanner.verifiedByClient(block);
+                }
+              }
{code}
I ran the bundled tests and they pass. Am currently running all.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047039#comment-13047039 ] Nigel Daley commented on HDFS-941: -- +1 for 0.22.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047051#comment-13047051 ] Todd Lipcon commented on HDFS-941: -- Cool, I will review and check in Stack's backport tomorrow.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047763#comment-13047763 ] Kihwal Lee commented on HDFS-941: - One thing I noticed: Socket.isConnected() cannot be used for checking the connection status in this case. It returns false until the connection is made, then stays true after that; it will never return false after the initial connection has successfully been made. Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming someone is handling SocketException and calls Socket.close() or SocketChannel.close(). It seems the op handlers in DataXceiver are diligently using IOUtils.closeStream(), which will invoke SocketChannel.close().
{code}
- } while (s.isConnected() && socketKeepaliveTimeout > 0);
+ } while (s.isConnected() && !s.isClosed() && socketKeepaliveTimeout > 0);
{code}
Sorry for spotting this late. I just realized it while looking at HDFS-2054.
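Kihwal's point about Socket.isConnected() is easy to verify in isolation. The following standalone sketch (my illustration, not HDFS code) connects a socket over loopback, closes it, and shows that isConnected() keeps reporting true while only isClosed() reflects the closed state:

```java
import java.net.ServerSocket;
import java.net.Socket;

public class SocketStateDemo {
    public static void main(String[] args) throws Exception {
        // Listen on an ephemeral loopback port and connect to it.
        ServerSocket server = new ServerSocket(0);
        Socket s = new Socket("127.0.0.1", server.getLocalPort());
        System.out.println("before close: connected=" + s.isConnected()
            + " closed=" + s.isClosed());
        s.close();
        // isConnected() never reverts to false after a successful connect;
        // only isClosed() tells us the socket is no longer usable.
        System.out.println("after close: connected=" + s.isConnected()
            + " closed=" + s.isClosed());
        server.close();
    }
}
```

This is why the keepalive loop condition needs the extra !s.isClosed() check on top of isConnected().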
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047784#comment-13047784 ] Todd Lipcon commented on HDFS-941: -- Hey Kihwal. Nice find. Mind filing a new JIRA for this? I think it should be a minor thing, since the next time around the loop it will just get the IOE trying to read the next operation anyway, right?
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046375#comment-13046375 ] Todd Lipcon commented on HDFS-941: -- Stack seems to have turned up some kind of out of sync issue between client and server, where the client tries to do another request when the server is still expecting a status message. So, no commit tomorrow :(
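The failure mode Todd describes comes down to both ends agreeing on when the stream is "clean" before the next op. Here is a minimal, self-contained sketch of that handshake over loopback; the op and status byte values are invented for illustration and are not the real DataTransferProtocol constants:

```java
import java.io.*;
import java.net.*;

public class ReuseDemo {
    static final byte OP_READ = 81;   // hypothetical op code
    static final byte STATUS_OK = 0;  // hypothetical status code

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);
        Thread dn = new Thread(() -> {
            // "Datanode" side: keep serving ops on one connection, but only
            // reuse it after the client acknowledges a clean read.
            try (Socket s = server.accept()) {
                DataInputStream in = new DataInputStream(s.getInputStream());
                DataOutputStream out = new DataOutputStream(s.getOutputStream());
                while (true) {
                    int op = in.read();
                    if (op != OP_READ) break;           // EOF or unknown op: close
                    out.writeUTF("block-data");         // serve the read
                    out.flush();
                    if (in.read() != STATUS_OK) break;  // no valid status: close
                }
            } catch (IOException ignored) {}
        });
        dn.start();

        // "Client" side: two ops over a single connection.
        try (Socket s = new Socket("127.0.0.1", server.getLocalPort())) {
            DataOutputStream out = new DataOutputStream(s.getOutputStream());
            DataInputStream in = new DataInputStream(s.getInputStream());
            for (int i = 0; i < 2; i++) {
                out.write(OP_READ);
                out.flush();
                String data = in.readUTF();
                out.write(STATUS_OK);                   // tell DN the stream is clean
                out.flush();
                System.out.println("op " + i + " got " + data);
            }
        }
        dn.join();
        server.close();
    }
}
```

If the client skipped the status byte and sent its next op immediately, the server would read an op byte where it expects a status and close the connection — the "did not send a valid status code" path seen in the logs below.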
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046374#comment-13046374 ] stack commented on HDFS-941: Dang. Did more testing (w/ Todd's help). I backported his patch to 0.22 so I could run my loadings. I see this every so often in DN logs: 'Got error for OP_READ_BLOCK' (perhaps once every ten minutes per server). The other side of the connection will print 'Client /10.4.9.34did not send a valid status code after reading. Will close connection'. (I'll see this latter message much more frequently than the former, but it seems fine -- we are just closing the connection and moving on w/ no repercussions client-side.) Here is more context. In the datanode log (look for 'Client /10.4.9.34did not...'):
{code}
2011-06-08 23:39:45,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-1043418802690508828_7206 of size 16207176 from /10.4.9.34:57333
2011-06-08 23:39:45,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_-1043418802690508828_7206 terminating
2011-06-08 23:39:45,960 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_5716868613634466961_7207 src: /10.4.14.34:39560 dest: /10.4.9.34:10010
2011-06-08 23:39:46,301 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_5716868613634466961_7207 of size 29893370 from /10.4.14.34:39560
2011-06-08 23:39:46,301 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_5716868613634466961_7207 terminating
2011-06-08 23:39:46,326 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-7242346463849737969_7208 src: /10.4.14.34:39564 dest: /10.4.9.34:10010
2011-06-08 23:39:46,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:46,435 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:46,435 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:46,435 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Client /10.4.9.34did not send a valid status code after reading. Will close connection.
2011-06-08 23:39:47,837 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-7242346463849737969_7208 of size 67108864 from /10.4.14.34:39564
2011-06-08 23:39:47,837 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-7242346463849737969_7208 terminating
2011-06-08 23:39:47,855 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_7820819556875770048_7208 src: /10.4.14.34:39596 dest: /10.4.9.34:10010
2011-06-08 23:39:49,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_7820819556875770048_7208 of size 67108864 from /10.4.14.34:39596
2011-06-08 23:39:49,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_7820819556875770048_7208 terminating
{code}
In the regionserver log (the client):
{code}
2011-06-08 23:39:45,777 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 4 file(s) in values of usertable,user617882364,1307559813504.e4a9ed69f909762ddba8027cb6438575.; new storefile name=hdfs://sv4borg227:1/hbase/usertable/e4a9ed69f909762ddba8027cb6438575/values/6552772398789018757, size=143.5m; total size for store is 488.4m
2011-06-08 23:39:45,777 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=usertable,user617882364,1307559813504.e4a9ed69f909762ddba8027cb6438575., storeName=values, fileCount=4, fileSize=175.5m, priority=2, date=Wed Jun 08 23:39:41 PDT 2011; duration=3sec
2011-06-08 23:39:45,777 DEBUG org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: CompactSplitThread Status: compaction_queue=(0:0), split_queue=0
2011-06-08 23:39:46,436 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.4.9.34:10010 for file /hbase/usertable/e4a9ed69f909762ddba8027cb6438575/values/5422279471660943029 for block blk_1325488162553537841_6905:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.4.9.34:57345, remote=/10.4.9.34:10010, for file /hbase/usertable/e4a9ed69f909762ddba8027cb6438575/values/5422279471660943029, for block 1325488162553537841_6905
 at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:437)
 at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:727)
 at
{code}
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046376#comment-13046376 ] Kihwal Lee commented on HDFS-941: - My test is still running on trunk, but so far I only see "did not send a valid status code after reading. Will close connection" on special occasions. In my case it's during task init (the random readers are map tasks in my test), with the number of messages exactly matching the number of tasks running on the DN. Afterwards I don't see them.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046384#comment-13046384 ] stack commented on HDFS-941: @Kihwal You are on TRUNK and not 0.22? (I wonder if my backport messed up something -- Todd doesn't think so, but...)
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046383#comment-13046383 ] stack commented on HDFS-941: @Kihwal Are you doing any writing at the same time? (I was).
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046387#comment-13046387 ] Kihwal Lee commented on HDFS-941: - It's read-only and yes, it's against TRUNK. I put 200 x 170MB files across 8 DNs, dfs.replication=1. There are 200 random readers randomly reading from all 200 files. The locality was intentionally reduced to test the socket caching. I will try an R/W test once this one is done.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046422#comment-13046422 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481896/hdfs-941.txt against trunk revision 1133476.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 18 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI, org.apache.hadoop.hdfs.TestHDFSTrash
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/748//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/748//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/748//console
This message is automatically generated.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046526#comment-13046526 ] Kihwal Lee commented on HDFS-941: - Good catch and fix! I took a close look at the open connections each reader has and sometimes saw more than one connection to the same DN. I will see if that is fixed by Todd's fix; otherwise I will look further to determine if it is an issue. The test I did was primarily for exercising the socket cache itself. To make it more interesting, the socket cache size was lowered to 3 and dfs.replication to 1. I used the random read test (work in progress) in HDFS-236 on a cluster with 8 data nodes. 200 x 170MB files were created. 200 readers (25 on each DN) read the 200 files randomly, 64K at a time, jumping among files, for about 6 hours last night. Each reader caches DFSInputStreams to all 200 files during its lifetime. I checked the client/server logs afterward.
** I saw 25 of the "did not send a valid status code after reading. Will close connection" warnings around task initialization (readers are map tasks) on each data node. They all look local, so they are likely accessing the job conf/jar files that are replicated and available on all eight data nodes, unlike regular data files, or accessing the local DN for some other reason during this time period. Need to check whether this needs to be fixed.
** While running, there were 3 ESTABLISHED connections per process and some number of sockets in TIME_WAIT at all times. It means the socket cache is not leaking anything, clients are not denied new connections, and eviction is working.
** The only thing I find a bit odd is the symptom I mentioned above: duplicate connections in the socket cache. I will try to reproduce with Todd's latest fix.
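Kihwal's observation -- a steady 3 ESTABLISHED connections per process while evicted sockets cycle through TIME_WAIT -- is what you'd expect from a capacity-bounded LRU cache that closes whatever it evicts. A toy sketch of that behavior (hypothetical names and structure; not the actual DFSClient socket cache, and deliberately much simpler):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU socket cache keyed by datanode address. Eviction closes the
// least-recently-used socket, so the number of cached live connections
// never exceeds the configured capacity.
class LruSocketCache {
    private final LinkedHashMap<SocketAddress, Socket> cache;

    LruSocketCache(final int capacity) {
        cache = new LinkedHashMap<SocketAddress, Socket>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<SocketAddress, Socket> eldest) {
                if (size() > capacity) {
                    try { eldest.getValue().close(); } catch (IOException ignored) {}
                    return true;  // evict (and close) the LRU socket
                }
                return false;
            }
        };
    }

    // Caller takes the socket out for exclusive use; may return null.
    synchronized Socket take(SocketAddress addr) {
        return cache.remove(addr);
    }

    // Return a reusable socket; this toy keeps one socket per address.
    synchronized void put(SocketAddress addr, Socket s) {
        Socket displaced = cache.put(addr, s);
        if (displaced != null) {
            try { displaced.close(); } catch (IOException ignored) {}
        }
    }

    synchronized int size() { return cache.size(); }
}

public class CacheDemo {
    public static void main(String[] args) throws Exception {
        LruSocketCache c = new LruSocketCache(3);
        Socket first = new Socket();  // unconnected placeholder sockets
        c.put(new InetSocketAddress("10.0.0.1", 1004), first);
        c.put(new InetSocketAddress("10.0.0.2", 1004), new Socket());
        c.put(new InetSocketAddress("10.0.0.3", 1004), new Socket());
        c.put(new InetSocketAddress("10.0.0.4", 1004), new Socket()); // evicts first
        System.out.println("size=" + c.size() + " evictedClosed=" + first.isClosed());
    }
}
```

Note this toy keeps only one socket per address; the duplicate connections Kihwal saw are consistent with streams taking sockets out for exclusive use (or opening fresh ones) and returning them later, so more than one connection to the same DN can exist outside the cache at once.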
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046583#comment-13046583 ] Kihwal Lee commented on HDFS-941: - Regarding duplicate connections: it makes sense, because the input stream cache is per file and it is quite possible that the clients read blocks belonging to two files that are on the same DN within the window of 3 reads. I will look at the one happening during task initialization. Maybe they just stop reading in the middle of the stream by design. Since one message will show up for every new map task, how about changing the message to DEBUG after we are done with testing?
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046643#comment-13046643 ] Kihwal Lee commented on HDFS-941: - I am retesting with Todd's patch and I don't see the messages anymore. Instead, I see more of the "BlockSender.sendChunks() exception: java.io.IOException: Broken pipe" messages from DNs.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046650#comment-13046650 ] stack commented on HDFS-941: @Kihwal I see lots of those sendChunks exceptions too, but I don't think they're related. Testing latest addition to patch...
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046661#comment-13046661 ] Kihwal Lee commented on HDFS-941: - OK, I see it's from BlockSender.java:407. It really shouldn't say ERROR, since clients can close connections at any time, but I agree that this needs to be addressed as separate work.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046700#comment-13046700 ] stack commented on HDFS-941: +1 on commit for the latest version of the patch. I've been running it over the last few hours. I no longer see "Client /10.4.9.34did not send a valid status code after reading" (fix the space on commit), nor do I see the "Got error for OP_READ_BLOCK" exceptions. I still have the BlockSender.sendChunks exceptions, but they are something else (that we need to fix). Nice test you have over there, Kihwal! My test was a 5 node cluster running hbase on a 451 patched 0.22. The loading was random reads running in MR, plus another random-read test being done via a bunch of clients. Cache was disabled, so we went to the FS for all data. I also had random writing going on concurrently.
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046706#comment-13046706 ] Todd Lipcon commented on HDFS-941: -- Regarding duplicate connections: also keep in mind that the caching only applies on the read side. So, assuming there's some output as well, there will be a socket for each of those streams. I agree we should fix the sendChunks error messages separately. I think JD might have filed a JIRA about this a few weeks ago. I'll see if I can dig it up. Kihwal: are you +1 on commit now as well?
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046716#comment-13046716 ] Kihwal Lee commented on HDFS-941: - They were pure readers and didn't write/report anything until the end. I just filed HDFS-2054 for the error message. If you find the other JIRA that was already filed, please dupe one to the other. +1 for commit.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046722#comment-13046722 ] Todd Lipcon commented on HDFS-941: -- Committed to trunk. I'm 50/50 on whether this should go into the 0.22 branch as well. Like Stack said, it's a nice carrot to help convince HBase users to try out 0.22. But, it's purely an optimization and on the riskier side as far as these things go. I guess I'll ping Nigel?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046724#comment-13046724 ] Todd Lipcon commented on HDFS-941: -- Also, big thanks to: bc for authoring the majority of the patch and test cases, Sam Rash for reviews, and Stack and Kihwal for both code review and cluster testing. Great team effort spanning 4 companies!
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046738#comment-13046738 ] stack commented on HDFS-941: Todd, I'll buy you a beer to go 51/49 in favor of 0.22 commit. If Nigel wants me to make a case, I could do it here or in another issue?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046739#comment-13046739 ] Hadoop QA commented on HDFS-941:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481962/941.22.txt against trunk revision 1134031.
    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 21 new or modified tests.
    -1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/754//console
This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046744#comment-13046744 ] Eli Collins commented on HDFS-941: -- Make that two beers (52/48?). I reviewed an earlier version of this patch, but if Nigel is game I think it's suitable for 22 as well.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046756#comment-13046756 ] stack commented on HDFS-941: Yeah, my 0.22 version fails against trunk (trunk already has guava, etc.)
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046755#comment-13046755 ] stack commented on HDFS-941: So, that would leave 48 beers that I need to buy (And Nigel probably wants two) -- I can get a keg?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046148#comment-13046148 ] stack commented on HDFS-941: +1 on commit. Have run this patch first with a light random read loading overnight, and then over this morning with a 'heavy' random read + write loading on a 5-node cluster. Discernible perf improvement (caching is involved so it's hard to say for sure, but I see a 20% improvement with just random reads). @Kihwal Fair enough.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046151#comment-13046151 ] stack commented on HDFS-941: Oh, just to say that I don't see hdfs-level complaints on the server or client side, and that I tested on patched 0.22 hadoop.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046155#comment-13046155 ] stack commented on HDFS-941: This patch should be applied to hadoop 0.22. It'd be an incentive for hbase users to upgrade to hadoop 0.22.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046159#comment-13046159 ] Todd Lipcon commented on HDFS-941: -- Stack, thanks a million for the cluster testing and review!! I will get to your review feedback later this afternoon and post a final patch.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046162#comment-13046162 ] stack commented on HDFS-941: On occasion I see these new additions to the datanode log:
{code}
2011-06-08 12:37:20,478 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Client did not send a valid status code after reading. Will close connection.
2011-06-08 12:37:20,480 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Client did not send a valid status code after reading. Will close connection.
2011-06-08 12:37:20,482 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Client did not send a valid status code after reading. Will close connection.
2011-06-08 12:37:20,483 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Client did not send a valid status code after reading. Will close connection.
{code}
Should these be logged as DEBUG and not ERROR? I see this too, but I don't think it's related:
{code}
2011-06-08 12:40:09,642 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2049668997072761677_6556 src: /10.4.9.34:36343 dest: /10.4.9.34:10010
2011-06-08 12:40:09,661 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() exception: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:204)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:392)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:481)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opReadBlock(DataXceiver.java:237)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opReadBlock(DataTransferProtocol.java:356)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:328)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:169)
    at java.lang.Thread.run(Thread.java:662)
{code}
What's odd is that this is a machine talking to itself.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046175#comment-13046175 ] Todd Lipcon commented on HDFS-941: -- Hey Stack, are you sure you got the latest patch applied? The "did not send a valid status code" bit was changed to a WARN in the latest patch, and I also addressed a bug that would cause it to happen more often than it used to. I agree that the warning in sendChunks is unrelated - I've seen that in trunk for a while before this patch.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046182#comment-13046182 ] stack commented on HDFS-941: OK. Looks like I was running the just-previous one. Let me redo the loadings. On the IOE in sendChunks, this is in 0.22. Should I make an issue for it?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046183#comment-13046183 ] stack commented on HDFS-941: Or, hang on, let me indeed verify 0.22 has this minus the 941 patch.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046212#comment-13046212 ] Kihwal Lee commented on HDFS-941: - I will try putting some load on a cluster with this patch + trunk.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046234#comment-13046234 ] stack commented on HDFS-941: New patch looks good (nice comment on why NODELAY). Let me test it.
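The NODELAY comment being praised concerns disabling Nagle's algorithm on the connection. A minimal illustration (not the actual DFSClient code) of the setting and the reason for it on a reused xceiver connection:

```java
import java.net.Socket;
import java.net.SocketException;

// Illustration (not the actual DFSClient code): a reused xceiver connection
// wants TCP_NODELAY because the client sends tiny messages (e.g. the
// end-of-read status byte), and Nagle's algorithm would otherwise hold
// them back waiting to coalesce more data, adding latency per operation.
public class NoDelaySketch {
    public static Socket newXceiverSocket() throws SocketException {
        Socket s = new Socket();    // unconnected; connect() happens elsewhere
        s.setTcpNoDelay(true);      // flush small writes immediately
        return s;
    }
}
```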
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046311#comment-13046311 ] Hadoop QA commented on HDFS-941:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481868/hdfs-941.txt against trunk revision 1133476.
    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 18 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI
    +1 contrib tests. The patch passed contrib unit tests.
    +1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/745//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/745//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/745//console
This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046330#comment-13046330 ] Todd Lipcon commented on HDFS-941: -- I'd like to commit this tomorrow so long as Stack and Kihwal's testing works out. Woo! :)
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045432#comment-13045432 ] Kihwal Lee commented on HDFS-941: - Do you think 16 is a good number for the socket cache (it doesn't seem easily changeable)? If the client's working set of data nodes over the past several seconds is bigger than that, it means lower locality. If a lot of clients are doing this, each data node is likely to see less data locality, making the page cache less effective. This can make more reads cold, and the gain from caching connections will start to diminish. Is 16 a good number? IMO, it may actually be too big for typical use cases, but it is small enough to not cause trouble.
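The fixed-size socket cache Kihwal is asking about can be sketched as an LRU map keyed by datanode address, where checking a socket out removes it from the cache and an insert past capacity evicts (and would close) the least-recently-used entry. The class and method names here are illustrative assumptions, not the actual HDFS-941 implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a client-side socket cache with a fixed capacity
// (the thread's "16"). Keys would be datanode addresses, values open sockets.
public class PeerCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> cache;
    private V lastEvicted; // a real cache would close() this socket on eviction

    public PeerCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry sees the LRU entry when capacity is exceeded.
        this.cache = new LinkedHashMap<K, V>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > PeerCache.this.capacity) {
                    lastEvicted = eldest.getValue();
                    return true;
                }
                return false;
            }
        };
    }

    public void put(K addr, V sock) { cache.put(addr, sock); }
    // Checkout removes the entry: a cached socket carries stream state,
    // so it must not be handed to two readers at once.
    public V get(K addr) { return cache.remove(addr); }
    public int size() { return cache.size(); }
    public V lastEvicted() { return lastEvicted; }
}
```

With a small capacity like 16, a client whose recent working set of datanodes is larger simply churns the cache, which is the diminishing-returns case described above.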
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045033#comment-13045033 ] Hadoop QA commented on HDFS-941: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481594/hdfs-941.txt against trunk revision 1132698.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 18 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/720//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/720//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/720//console
This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045278#comment-13045278 ] stack commented on HDFS-941: I took a look at the patch. It looks good to me. Minor comments below. Meantime I've patched it into a Hadoop 0.22 build and am running a load on it overnight to see if it can find problems.
What is this about?
+dependency org=com.google.collections name=google-collections rev=${google-collections.version} conf=common-default/
When I go to the google-collections home page it says:
{code}
This library was renamed to Guava! What you see here is ancient and unmaintained. Do not use it.
{code}
Nice doc changes in BlockReader. If you make another version of this patch, change the mentions of getEOS in comments to 'eos' to match the renamed variable.
When you create a socket inside getBlockReader, you've added this:
{code}
+sock.setTcpNoDelay(true);
{code}
to the socket config before connect. Is that intentional? (This is new with this patch. Also, the old code set the timeout after making the connection -- which seems off; in your patch you set the timeout, then connect.)
You think 16 is a good number for the socket cache (doesn't seem easily changeable)?
Nice cleanup of the description in DataNode.java. One note is that this patch looks 'safe'; we default to closing the connection if anything untoward happens, which should be exactly the behavior the DN had before this patch. TestParallelRead is sweet.
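The ordering stack asks about -- configure the socket, then connect -- can be sketched as follows. This is a hypothetical helper, not the patch's code, and the names and timeout value are placeholders:

```java
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper (not the patch's code): apply socket options before
// the socket is used, mirroring the order reviewed above -- TCP_NODELAY
// and the read timeout are set first, and only then does the caller connect.
public class SocketSetup {
    public static Socket configure(Socket sock, int timeoutMs)
            throws SocketException {
        sock.setTcpNoDelay(true);      // avoid Nagle delays on small reads
        sock.setSoTimeout(timeoutMs);  // bound reads before any I/O starts
        return sock;                   // caller then calls sock.connect(...)
    }
}
```

Setting the read timeout before connecting (rather than after, as the old code did) means even the first read on the new connection is bounded.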
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044698#comment-13044698 ] Todd Lipcon commented on HDFS-941: -- oops, the last line of my benchmark results got truncated. It should read:
*without patch*:
11/06/05 20:32:54 INFO hdfs.TestParallelRead: === Report: 4 threads read 2619994 KB (across 1 file(s)) in 25.762s; average 101699.94565639313 KB/s
11/06/05 20:33:34 INFO hdfs.TestParallelRead: === Report: 16 threads read 10470506 KB (across 1 file(s)) in 40.583s; average 258002.26695907154 KB/s
11/06/05 20:34:00 INFO hdfs.TestParallelRead: === Report: 8 threads read 5232371 KB (across 2 file(s)) in 25.484s; average 205319.8477476063 KB/s
*with patch*:
11/06/05 20:35:45 INFO hdfs.TestParallelRead: === Report: 4 threads read 2626843 KB (across 1 file(s)) in 10.208s; average 257331.7985893417 KB/s
11/06/05 20:36:13 INFO hdfs.TestParallelRead: === Report: 16 threads read 10492178 KB (across 1 file(s)) in 27.046s; average 387938.25334615103 KB/s
11/06/05 20:36:25 INFO hdfs.TestParallelRead: === Report: 8 threads read 5236253 KB (across 2 file(s)) in 12.447s; average 420683.93990519806 KB/s
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044715#comment-13044715 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481534/hdfs-941.txt against trunk revision 1131331.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.TestDFSClientRetries
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/711//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/711//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/711//console
This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032194#comment-13032194 ] Jason Rutherglen commented on HDFS-941: --- I'm seeing many errors trying to apply http://issues.apache.org/jira/secure/attachment/12476027/HDFS-941-6.patch to https://svn.apache.org/repos/asf/hadoop/hdfs/trunk
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023101#comment-13023101 ] Todd Lipcon commented on HDFS-941: -- Looks pretty good, and I looped TestFileConcurrentReader for half an hour or so with no failures. A few small comments:
- google-collections is deprecated in favor of the new name guava - we should depend on the newest
- in TestParallelRead, you have a few cases of assert() where you should probably be using assertEquals() in case unit tests run without -ea. assertEquals() will also give a nicer error message
- in SocketCache.evict(), you are calling {{multimap.remove}} while iterating over the same map's entries. This seems likely to throw ConcurrentModificationException. Better to use {{multimap.iterator()}} and call {{it.remove()}}. This makes me notice that you only ever call {{evict()}} with an argument of 1, so maybe you should just rename it to {{evictOne()}}
- If you have multiple DFSClients in a JVM with different socketTimeout settings, I think this will currently end up leaking timeouts between them. Perhaps after successfully getting a socket from socketCache, you need to call {{sock.setSoTimeout}} based on the current instance of {{dfsClient}}?
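The iterator-based eviction suggested in the review above can be sketched like this; a plain LinkedHashMap stands in for the patch's Guava multimap, and the method name evictOne() follows the renaming suggestion (both are stand-ins, not the patch's actual code):

```java
import java.util.Iterator;
import java.util.Map;

// Sketch of the iterator-based eviction suggested above: when evicting
// while walking the cache's entries, remove through the Iterator itself.
// Calling the map's own remove() mid-iteration is what risks
// ConcurrentModificationException. The map type is a stand-in for the
// patch's Guava multimap.
public class EvictOneSketch {
    /** Removes and returns the first (oldest) value, or null if empty. */
    public static <K, V> V evictOne(Map<K, V> cache) {
        Iterator<Map.Entry<K, V>> it = cache.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        V oldest = it.next().getValue();
        it.remove();   // safe: removal goes through the iterator
        return oldest; // a real socket cache would close this socket here
    }
}
```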
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021837#comment-13021837 ] Todd Lipcon commented on HDFS-941: -- TestFileConcurrentReader has been failing intermittently a lot for a while - it's likely this isn't related to the patch. But worth a quick look at least to see if this patch changes the intermittent failure to a reproducible one.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021862#comment-13021862 ] sam rash commented on HDFS-941: --- The last failure I saw with this test was basically unrelated to the test itself -- it was a socket leak in the datanode, I think with RPCs. I glanced at the first test failure output and found a similar error:
2011-04-11 21:29:36,962 INFO datanode.DataNode (DataXceiver.java:opWriteBlock(458)) - writeBlock blk_-6878114854540472276_1001 received exception java.io.FileNotFoundException: /grid/0/hudson/hudson-slave/workspace/PreCommit-HDFS-Build/trunk/build/test/data/dfs/data/data1/current/rbw/blk_-6878114854540472276_1001.meta (Too many open files)
Note that this test implicitly finds any socket/fd leaks because it opens and closes files repeatedly. If you can check into this, that'd be great. I'll have some more time later this week to help more.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018635#comment-13018635 ] bc Wong commented on HDFS-941: -- I'll take a look at the TestFileConcurrentReader failure.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018471#comment-13018471 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12476021/HDFS-941-6.patch against trunk revision 1091131. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/340//console This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018472#comment-13018472 ] Tsz Wo (Nicholas), SZE commented on HDFS-941: - Hi bc, seems that Jenkins (previously Hudson) sometimes does not pick up patches. I have just [submitted this manually|https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/340/].
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018485#comment-13018485 ] bc Wong commented on HDFS-941: -- Thanks Nicholas! I generated the wrong patch format, unfortunately. Could you help me submit it to Jenkins again?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018570#comment-13018570 ] Tsz Wo (Nicholas), SZE commented on HDFS-941: - You are welcome. I have just [started it|https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/343/].
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018589#comment-13018589 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12476027/HDFS-941-6.patch against trunk revision 1091131.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.TestFileConcurrentReader
-1 contrib tests. The patch failed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/343//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/343//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/343//console
This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010814#comment-13010814 ] Kihwal Lee commented on HDFS-941: - +1 The patch looks good. I was unsure about the new dependency on Guava, but apparently people have already agreed on adding it to hadoop-common, so I guess it's not an issue.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010879#comment-13010879 ] stack commented on HDFS-941: +1 on commit. Patch looks great, though a bit hard to read because it's mostly white-space changes. I like the tests. I'm good w/ adding guava. If there's a v6, here are a few minor comments:
- The Javadoc on BlockReader is not properly formatted (will show as a mess after html'ing) -- same for the class comment on DN.
- gotEOS is an odd name for a boolean; wouldn't eos be better?
- Hard-codings like +final int MAX_RETRIES = 3; should instead come from config, even if not declared in hdfs-default.xml. Same for DN_KEEPALIVE_TIMEOUT.
- Why would we retry a socket that is throwing an IOE? Why not close it and move on with a new socket?
- Is SocketCache missing a copyright notice?
- Is this the right thing to do?
{code}
+SocketAddress remoteAddr = sock.getRemoteSocketAddress();
+if (remoteAddr == null) {
+  return;
+}
{code}
The socket is not cached because it does not have a remote address. Why does it not have a remote address? Is there something wrong w/ the socket? Should we throw an exception, or close and throw away the socket?
- There is a tab at #1242 in the patch:
{code}+ // restore normal timeout{code}
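The remote-address guard questioned in the review above can be sketched as a cache-admission check. This is an assumption-laden illustration (the class and method names are invented), showing the "default to closing" behavior noted earlier:

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;

// Hypothetical sketch (names invented): only a connected, open socket with
// a known remote address is worth caching, since the remote address is what
// the cache would be keyed on. Anything else is closed rather than reused,
// matching the "default to closing the connection" behavior noted above.
public class CachePolicy {
    public static boolean cacheable(Socket sock) {
        if (sock == null || sock.isClosed() || !sock.isConnected()) {
            return false;
        }
        SocketAddress remote = sock.getRemoteSocketAddress();
        return remote != null;  // no address => nothing to key the cache on
    }

    public static void giveBack(Socket sock) throws IOException {
        if (!cacheable(sock)) {
            if (sock != null) {
                sock.close();   // default to closing, never reuse a dubious socket
            }
            return;
        }
        // ... put into the per-address cache here ...
    }
}
```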
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009167#comment-13009167 ] Kihwal Lee commented on HDFS-941: - Nice work! I performed a basic test and got results comparable to the one from your previous patch. I will review the patch in depth soon.
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12996836#comment-12996836 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443322/HDFS-941-4.patch against trunk revision 1072023. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/199//console This message is automatically generated.
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866496#action_12866496 ] Todd Lipcon commented on HDFS-941: -- I ran some benchmarks again tonight using YCSB. I loaded 1M rows into an HBase table (untimed) on my test cluster. The cluster is running a 5-node HDFS, but I only ran one HBase region server, so that I could reliably have the same region deployment between test runs. The data fits entirely within the buffer cache, so we're just benchmarking DFS overhead and not actual seek time. I ran benchmarks with: {code} java -cp build/ycsb.jar:src/com/yahoo/ycsb/db/hbaselib/*:$HBASE_CONF_DIR com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=$[1000*1000] -p operationcount=$[1000*1000] {code} from one of the nodes in the cluster (not the same one as ran the region server) I ran the benchmark twice without the patch and twice with, alternating builds and restarting DFS and HBase each time, to make sure I wasn't getting any variability due to caching, etc. 
Results follow:

== 941-bench-1.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL], RunTime(ms), 118197
[OVERALL], Throughput(ops/sec), 8460.451618907417
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.701651
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1352
[READ], 95thPercentileLatency(ms), 11
[READ], 99thPercentileLatency(ms), 15

== 941-bench-2.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL], RunTime(ms), 124005
[OVERALL], Throughput(ops/sec), 8064.190960041934
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.940652
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1337
[READ], 95thPercentileLatency(ms), 12
[READ], 99thPercentileLatency(ms), 16

== normal-bench-1.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL], RunTime(ms), 182316
[OVERALL], Throughput(ops/sec), 5484.982118958293
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.267306
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1327
[READ], 95thPercentileLatency(ms), 17
[READ], 99thPercentileLatency(ms), 26

== normal-bench-2.txt ==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL], RunTime(ms), 190053
[OVERALL], Throughput(ops/sec), 5261.690160113231
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.577673
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1525
[READ], 95thPercentileLatency(ms), 15
[READ], 99thPercentileLatency(ms), 21

In other words, this patch speeds up average latency by nearly 40%, with similar gains on the high-percentile latencies.
The reads/sec number improved by about 35%. This is without any tuning of the keepalive or the socket cache size - I imagine even more improvement could be made with a bit more tuning, etc.
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866519#action_12866519 ] Todd Lipcon commented on HDFS-941:

I'd like to hold off on this just a bit longer - I'm seeing this sporadically in my testing:

{noformat}
Caused by: java.lang.IndexOutOfBoundsException
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:151)
	at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1155)
	at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
	at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1441)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1913)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2035)
	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
{noformat}

But the above benchmarks do show that the idea has a lot of promise! (And the above trace may in fact be an HBase bug.)
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1284#action_1284 ] sam rash commented on HDFS-941:

todd: wow, those benchmarks do look impressive! do we have any idea if standard sequential access gets any benefit?

bc: my point about the cache is that you don't have to hard-code it as a static member of ReaderSocketCache. I don't think it needs to be more generic - it can be a socket cache. I do think it can be decoupled from BlockReader by getting rid of the owner. Why does a 'cache' create sockets? You can avoid the whole owner problem if you simply let the client ask for a socket; if there is none, it creates its own, uses it, and puts it in the cache when it's done with it (ie, when it's usable). This should greatly reduce complexity (no need for free + used maps kept separately, no owner, etc). It seems like this is mixing up the responsibilities of a socket factory and a socket cache (possibly why it seems complex to me).

{code}
boolean reusable() {
  return (owner == null || owner.hasConsumedAll())
      && sock.isConnected()
      && !sock.isInputShutdown()
      && !sock.isOutputShutdown();
}
{code}

will only need to check the socket if you can make this change
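A minimal sketch of the cache shape sam describes - callers connect their own socket on a miss and return it only once it is reusable, so the cache needs no owner tracking and no used-map. All names here (SocketCacheSketch, selfCheck, the per-address cap) are illustrative, not from any HDFS-941 patch:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: the cache never creates sockets and never
// tracks in-use entries. Callers connect their own socket on a miss and
// put it back once the current operation has been fully consumed.
public class SocketCacheSketch {
    private final Map<SocketAddress, Deque<Socket>> free = new HashMap<>();
    private final int maxPerAddress;  // cap suggested in the review; value arbitrary

    public SocketCacheSketch(int maxPerAddress) {
        this.maxPerAddress = maxPerAddress;
    }

    // Take a cached socket for this address, or null (caller then connects its own).
    public synchronized Socket get(SocketAddress addr) {
        Deque<Socket> q = free.get(addr);
        return (q == null || q.isEmpty()) ? null : q.pop();
    }

    // Return a socket after use; drop it if it is not reusable or over the cap.
    public synchronized void put(SocketAddress addr, Socket sock) {
        boolean reusable = sock.isConnected() && !sock.isClosed()
                && !sock.isInputShutdown() && !sock.isOutputShutdown();
        Deque<Socket> q = free.computeIfAbsent(addr, k -> new ArrayDeque<>());
        if (!reusable || q.size() >= maxPerAddress) {
            try { sock.close(); } catch (IOException ignored) { }
            return;
        }
        q.push(sock);
    }

    public synchronized int size(SocketAddress addr) {
        Deque<Socket> q = free.get(addr);
        return q == null ? 0 : q.size();
    }

    // Exercises the lifecycle once against a local listener; true on success.
    public static boolean selfCheck() {
        try (ServerSocket srv = new ServerSocket(0)) {
            SocketAddress addr = new InetSocketAddress("127.0.0.1", srv.getLocalPort());
            SocketCacheSketch cache = new SocketCacheSketch(2);
            if (cache.get(addr) != null) return false;  // cold cache misses
            Socket s = new Socket();
            s.connect(addr);
            srv.accept();
            cache.put(addr, s);                         // connected and open: cached
            if (cache.size(addr) != 1) return false;
            if (cache.get(addr) != s) return false;     // handed to the next caller
            cache.put(addr, new Socket());              // never connected: dropped
            boolean ok = cache.size(addr) == 0;
            s.close();
            return ok;
        } catch (IOException e) {
            return false;
        }
    }
}
```

selfCheck() walks the lifecycle once: a miss on a cold cache, a put of a connected socket, a handoff to the next caller, and a drop of an unusable socket.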
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866680#action_12866680 ] sam rash commented on HDFS-941:

two other comments:

1. the number of sockets per address is limited, but not the number of addresses. This may not be a problem in practice, but the cache can in theory grow very large.
2. the usedmap seems like a likely place for a memory/object leak: I can take a socket and never return it (which is why, again, I vote for getting rid of this data structure entirely - as far as a cache is concerned, an entry that someone else owns shouldn't even be there). Otherwise, you've got to periodically clean this map up as well. It seems like it's only used for stats, which I think you can do without actually keeping a hash of used sockets.
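One way to address point 1 without a separate cleanup pass is to bound the total entry count with LRU eviction; LinkedHashMap's access-order mode makes this a few lines. This is a generic illustration (the class name and capacity are invented here, not code from the patch), and a real socket cache would also have to close the evicted socket:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU bound: once size() exceeds capacity, the least-recently-used
// entry is evicted on the next insert. A real socket cache would close the
// evicted socket inside removeEldestEntry before returning true.
public class BoundedLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedLru(int capacity) {
        super(16, 0.75f, true);  // accessOrder=true: iteration order is LRU-first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With capacity 2, inserting a third entry silently drops the least recently touched one, so the map can never grow past its cap no matter how many datanode addresses are seen.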
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866456#action_12866456 ] sam rash commented on HDFS-941:

+1 for the idea of caching sockets, but I have some questions/concerns about the implementation. some comments:

1. avoid tying the cache implementation to the class ReaderSocketCache. Don't make the cache a static member of the same class. Let the cache be an instantiable object, and let DFSClient store it either as an instance or a static var (don't force everything to use the same cache instance - better for testing and stubbing out as well).
2. a lot of the logic around re-use is complicated - I think this could be simplified:
   a. it's not clear why sockets are always in the cache even if not usable: I would think adding them only when usable and removing them when used would be cleaner.
   b. if we can keep the cache clean, there's no need for lazy removal of unusable sockets.
3. shouldn't there be a cap on the number of sockets in the cache? Again, it should hold only usable ones, but a max number put into the cache makes sense. If we have a flurry of reads using tons of sockets to several DNs, there's no need to keep 100s or more sockets in a cache.
4. general concern about potential socket leaks.
5. this needs more thought on the effects of synchronization: the freemap has to be traversed every time to get a socket, inside a sync block. See above - we can avoid the lazy removal by not putting unusable sockets in the cache (unusable either because they are in use or not usable at all).
6. do we have real performance benchmarks from actual clusters that show a significant benefit?

as noted above, the change is fairly complex (caching is in fact hard :) and if we don't see a substantial performance improvement, the risk of bugs may outweigh the benefit. that's my 2c anyway -sr
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862854#action_12862854 ] bc Wong commented on HDFS-941:

The variance is large on the tests, but they show that the patch isn't slower than trunk. Tests executed on a 5-node cluster:

* TestDFSIO -read -fileSize 512 -bufferSize 4096 -nrFiles 10
||-||trunk||patched||
|Num trials|6|5|
|Throughput (MB/s)|92|93|
|Avg IO (MB/s)|150|134|
|Std dev|122|77|

* TestDFSIO -read -fileSize 512 -bufferSize 4096 -nrFiles 20
||-||trunk||patched||
|Num trials|5|5|
|Throughput (MB/s)|78|83|
|Avg IO (MB/s)|114|121|
|Std dev|75|76|

* Distributed {{bin/hadoop fs -cat /benchmarks/TestDFSIO/io_data/test_io_$i > /dev/null}}, for i in [0,9]
||-||trunk||patched||
|Num trials|5|5|
|Avg time (sec)|47.8|48.0|
|Std dev|4.2|3.6|
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855205#action_12855205 ] bc Wong commented on HDFS-941:

I replaced the size-of-one cache with a more generic cache, which is also a global shared cache. There is a new TestParallelRead, which tests the use of a DFSInputStream with concurrent readers. There's a clear speed difference with vs. without the patch. Each thread does 1024 reads.

Trunk:
{noformat}
Report: 4 threads read 236953 KB (across 1 file(s)) in 5.879s; average 40304.98384078925 KB/s
Report: 4 threads read 238873 KB (across 1 file(s)) in 5.063s; average 47180.13035749556 KB/s
Report: 4 threads read 236068 KB (across 1 file(s)) in 5.93s; average 39809.10623946037 KB/s
Report: 16 threads read 942666 KB (across 1 file(s)) in 13.524s; average 69703.19432120674 KB/s
Report: 16 threads read 947015 KB (across 1 file(s)) in 13.401s; average 70667.48750093277 KB/s
Report: 16 threads read 948768 KB (across 1 file(s)) in 12.932s; average 73365.91401175379 KB/s
Report: 8 threads read 469529 KB (across 2 file(s)) in 5.436s; average 86373.98822663723 KB/s
Report: 8 threads read 455428 KB (across 2 file(s)) in 5.363s; average 84920.38038411336 KB/s
Report: 8 threads read 469005 KB (across 2 file(s)) in 5.713s; average 82094.34622790127 KB/s
{noformat}

Patched:
{noformat}
Report: 4 threads read 236845 KB (across 1 file(s)) in 3.612s; average 65571.70542635658 KB/s
Report: 4 threads read 238803 KB (across 1 file(s)) in 4.371s; average 54633.49347975291 KB/s
Report: 4 threads read 240241 KB (across 1 file(s)) in 4.395s; average 54662.34357224119 KB/s
Report: 16 threads read 938652 KB (across 1 file(s)) in 9.044s; average 103787.26227333037 KB/s
Report: 16 threads read 943999 KB (across 1 file(s)) in 8.59s; average 109895.11059371362 KB/s
Report: 16 threads read 938546 KB (across 1 file(s)) in 9.081s; average 103352.71445876005 KB/s
Report: 8 threads read 478534 KB (across 2 file(s)) in 3.376s; average 141745.85308056872 KB/s
Report: 8 threads read 467412 KB (across 2 file(s)) in 3.623s; average 129012.42064587357 KB/s
Report: 8 threads read 475349 KB (across 2 file(s)) in 3.49s; average 136203.15186246418 KB/s
{noformat}

bq. The edits to the docs in DataNode.java are good - if possible they should probably move into HDFS-1001 though, no?

The addition to the docs doesn't apply to HDFS-1001, in which the DataXceiver still actively closes all sockets after each use. Todd, the new patch addresses the rest of your comments.
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847244#action_12847244 ] bc Wong commented on HDFS-941:

Thanks for the review, Todd. I'll add more tests, and look into making a cache of size 1.

bq. I think there is a concurrency issue here. Namely, the positional read API calls through into fetchBlockByteRange, which will use the existing cached socket, regardless of other concurrent operations. So we may end up with multiple block readers on the same socket and everything will fall apart.

That should be fine. Each {{SocketCacheEntry}} has a unique {{Socket}}, owned by its {{BlockReader}}. One of the reuse conditions is that the {{BlockReader}} has finished reading on that {{Socket}} ({{hasConsumedAll()}}). Note that we do not reuse a {{BlockReader}}. So at this point, it should be safe to take the {{Socket}} away from its previous owner and give it to a new {{BlockReader}}. I'll add tests for this though.
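The handoff rule described here can be reduced to a toy model. The classes and fields below are illustrative stand-ins for the patch's real BlockReader/SocketCacheEntry, showing only the invariant that a socket changes owner only after its current owner has consumed its whole request:

```java
// Toy model of the handoff rule: an entry may be given to a new reader only
// once its current owner reports hasConsumedAll(). Field names (expected,
// consumed) are invented for illustration.
public class HandoffSketch {
    public static class Reader {
        final long expected;   // bytes the reader asked the datanode for
        long consumed;         // bytes it has read so far

        public Reader(long expected) { this.expected = expected; }
        public void read(long n) { consumed += n; }
        public boolean hasConsumedAll() { return consumed >= expected; }
    }

    public static class Entry {
        Reader owner;  // null until first use

        boolean reusable() {   // guard checked before any handoff
            return owner == null || owner.hasConsumedAll();
        }

        // Hand the (conceptual) socket to a new reader, or refuse.
        public boolean setOwner(Reader next) {
            if (!reusable()) return false;  // previous op still mid-stream
            owner = next;
            return true;
        }
    }
}
```

While a reader is mid-stream the entry refuses a new owner; once the reader has drained its full request, the handoff succeeds and the stream position is well defined for the next operation.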
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847117#action_12847117 ] Todd Lipcon commented on HDFS-941:

Style notes:

- in BlockReader:
{code}
+          LOG.warn("Could not write to datanode " + sock.getInetAddress() +
+                   ": " + e.getMessage());
{code}
should be more specific - like "Could not write read result status code" - and also indicate in the warning somehow that this is not a critical problem. Perhaps info level is better? (in my experience, if people see WARN they think something is seriously wrong)
- please move the inner SocketCacheEntry class down lower in DFSInputStream
- in SocketCacheEntry.setOwner, can you use IOUtils.closeStream to close reader? Similarly in SocketCacheEntry.close
- We expect the following may happen reasonably often, right?
{code}
+        // Our socket is no good.
+        DFSClient.LOG.warn("Error making BlockReader. Closing stale " + entry.sock.toString());
{code}
I think this should probably be debug level.
- The edits to the docs in DataNode.java are good - if possible they should probably move into HDFS-1001 though, no?
- the do { ... } while () loop is a bit hard to follow in DataXceiver. Would it be possible to rearrange the code a bit to be more linear? (eg setting DN_KEEPALIVE_TIMEOUT right before the read at the beginning of the loop if workDone > 0 would be easier to follow in my opinion)
- In DataXceiver:
{code}
+      } catch (IOException ioe) {
+        LOG.error("Error reading client status response. Will close connection. Err: " + ioe);
{code}
Doesn't this yield error messages on every incomplete client read? Since the response is optional, this seems more like a DEBUG.

Bigger stuff:

- I think there is a concurrency issue here. Namely, the positional read API calls through into fetchBlockByteRange, which will use the existing cached socket, regardless of other concurrent operations. So we may end up with multiple block readers on the same socket and everything will fall apart. Can you add a test case which tests concurrent use of a DFSInputStream? Maybe a few threads doing random positional reads while another thread does seeks and sequential reads?
- Regarding the cache size of one - I don't think this is quite true. For a use case like HBase, the region server is continually slamming the local datanode with random read requests from several client threads. Is the idea that such an application should be using multiple DFSInputStreams to read the same file and handle the multithreading itself?
- In DataXceiver, SocketException is caught and ignored while sending a block ("Its ok for remote side to close the connection anytime"). I think there are other SocketException types (eg timeout) that could throw here aside from a connection close, so in that case we need to IOUtils.closeStream(out), I believe. A test case for this could be to open a BlockReader, read some bytes, then stop reading so that the other side's BlockSender generates a timeout.
- Not sure about this removal in the finally clause of opWriteBlock:
{code}
-      IOUtils.closeStream(replyOut);
{code}
(a) We still need to close in the case of a downstream-generated exception. Otherwise we'll read the next data bytes from the writer as an operation and have undefined results.
(b) To keep this patch less dangerous, maybe we should not add the reuse feature for operations other than read? Read's the only operation where we expect a lot of very short requests coming in - not much benefit for writes, etc., plus they're more complicated.
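The shape of the concurrency test asked for above can be sketched against a plain local file standing in for a DFSInputStream, so it runs anywhere: several threads issue random positional reads through one shared channel while another thread reads sequentially, and every byte is checked against a known pattern. Names and sizes here are invented for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the test shape only: FileChannel stands in for DFSInputStream
// (its positional read(ByteBuffer, long) is safe for concurrent use).
public class ParallelReadSketch {
    static final int SIZE = 1 << 16;

    static byte expected(long pos) { return (byte) (pos * 31); }  // known pattern

    public static boolean run() {
        try {
            Path p = Files.createTempFile("pread", ".dat");
            byte[] data = new byte[SIZE];
            for (int i = 0; i < SIZE; i++) data[i] = expected(i);
            Files.write(p, data);

            AtomicBoolean ok = new AtomicBoolean(true);
            try (FileChannel shared = FileChannel.open(p, StandardOpenOption.READ)) {
                List<Thread> threads = new ArrayList<>();
                for (int t = 0; t < 4; t++) {           // random positional readers
                    threads.add(new Thread(() -> {
                        Random rnd = new Random();
                        ByteBuffer b = ByteBuffer.allocate(1);
                        try {
                            for (int i = 0; i < 2000 && ok.get(); i++) {
                                long pos = rnd.nextInt(SIZE);
                                b.clear();
                                shared.read(b, pos);    // thread-safe positional read
                                if (b.get(0) != expected(pos)) ok.set(false);
                            }
                        } catch (Exception e) { ok.set(false); }
                    }));
                }
                threads.add(new Thread(() -> {          // one sequential reader
                    ByteBuffer b = ByteBuffer.allocate(4096);
                    try (FileChannel seq = FileChannel.open(p, StandardOpenOption.READ)) {
                        long pos = 0;
                        int n;
                        while ((n = seq.read(b)) > 0 && ok.get()) {
                            for (int i = 0; i < n; i++)
                                if (b.get(i) != expected(pos + i)) ok.set(false);
                            pos += n;
                            b.clear();
                        }
                    } catch (Exception e) { ok.set(false); }
                }));
                for (Thread th : threads) th.start();
                for (Thread th : threads) th.join();
            }
            Files.delete(p);
            return ok.get();
        } catch (Exception e) {
            return false;
        }
    }
}
```

A real TestParallelRead would do the same over a MiniDFSCluster with DFSInputStream.read(position, buf, off, len) in the positional threads; the pass criterion is simply that no thread ever observes a byte that disagrees with the written pattern.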
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846499#action_12846499 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12438934/HDFS-941-1.patch against trunk revision 923467. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/130/console This message is automatically generated.