I completely agree with Ryan. Most of the measurements in HDFS-347 are point comparisons: data rate over socket, single-threaded sequential read from datanode, single-threaded random read from datanode, etc. These measurements are good, but when you run the entire HBase system at load, you definitely see a 3X performance improvement when reading data locally (instead of going through the datanode).
-dhruba

On Fri, Jun 3, 2011 at 11:08 AM, Ryan Rawson <ryano...@gmail.com> wrote:

> Could you explain your HDFS-347 comment more? I don't think people
> suggested that the socket itself was the primary issue, but dealing
> with the datanode and the socket and everything was really slow. It's
> hard to separate concerns and test only 1 thing at a time - for
> example you said 'local socket comm isn't the problem', but there is no
> way to build a test that uses a local socket but not the datanode.
>
> The basic fact is that datanode adds a lot of overhead, and under high
> concurrency that overhead grows.
>
> On Fri, Jun 3, 2011 at 7:07 AM, Kihwal Lee <kih...@yahoo-inc.com> wrote:
> > HDFS-941
> > The trunk has moved on so the patch won't apply. There have been
> > significant changes in HDFS lately, so it will require more than a
> > simple rebase/merge. If the original assignee is busy, I am willing
> > to help.
> >
> > HDFS-347
> > The analysis is pointing out that local socket communication is
> > actually not the problem. The initial assumption of local socket
> > being slow should be ignored and the design should be revisited.
> >
> > I agree that improving local pread performance is critical. Based on
> > my experiments, HDFS-941 helps a lot and the communication channel
> > was no longer the bottleneck.
> >
> > Kihwal
> >
> > On 6/2/11 4:00 PM, "Doug Meil" <doug.m...@explorysmedical.com> wrote:
> >
> > Hi folks, I was wondering if there was any movement on any of these
> > HDFS tickets for HBase. The umbrella ticket is HDFS-1599, but the
> > last comment from stack back in Feb highlighted interest in several
> > tickets:
> >
> > 1) HDFS-918 (use single selector)
> >    a. Last comment Jan 2011
> >
> > 2) HDFS-941 (reuse of connection)
> >    a. Patch available as of April 2011
> >    b. But ticket still unresolved.
> >
> > 3) HDFS-347 (local reads)
> >    a. Discussion seemed to end in March 2011 with a huge comment
> >       saying that there was no performance benefit.
> >    b. I'm working my way through this comment/report, but
> >       intuitively it seems like it would be a good idea since, as
> >       the other comments in the ticket stated, the RS reads locally
> >       just about every time.
> >
> > Doug Meil
> > Chief Software Architect, Explorys
> > doug.m...@explorys.com

--
Connect to me at http://www.facebook.com/dhruba