Agreed, I think we'll get more bang for our buck by finishing up (reviving) patches like HDFS-941 or HDFS-347. Unfortunately, performance doesn't seem to be the highest priority among our customers, so it's tough to find much time to work on these things until we really get stability up to par.
-Todd

On Sat, Aug 28, 2010 at 3:36 PM, Jay Booth <[email protected]> wrote:
> I don't think async is a magic bullet for its own sake; we've all
> seen those papers that show good performance from blocking
> implementations. In particular, I don't think async is worth a whole
> lot on the client side of a service, which HBase is to HDFS.
>
> What about an HDFS call for localize(Path) which attempts to replicate
> the blocks for a file to the local datanode (if any) in a background
> thread? If RegionServers called that function for their files every
> so often, you'd eliminate a lot of bandwidth constraints, although the
> latency of establishing a local socket for every read is still there.
>
> On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon <[email protected]> wrote:
> > On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson <[email protected]> wrote:
> >
> >> One thought I had was if we have the writable code, surely just
> >> putting a different transport around it wouldn't be THAT bad right :-)
> >>
> >> Of course writables are really tied to that DataInputStream or
> >> whatever, so we'd have to work on that. Benoit said something about
> >> writables needing to do blocking reads and that causing issues, but
> >> there was a netty3 thing specifically designed to handle that by
> >> throwing and retrying the op later when there was more data.
> >>
> > The data transfer protocol actually doesn't do anything with Writables -
> > it's all hand-coded bytes going over the transport.
> >
> > I have some code floating around somewhere for translating between
> > blocking IO and Netty - not sure where, though :)
> >
> > -Todd
> >
> >> On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon <[email protected]> wrote:
> >> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
> >> >
> >> >> a production server should be CPU bound, with memory caching etc. Our
> >> >> prod systems do see a reasonable load, and jstack always shows some
> >> >> kind of wait generally...
> >> >>
> >> >> but we need more IO pushdown into HDFS. For example, if we are loading
> >> >> regions, why not do N at the same time? That figure N is probably
> >> >> more dependent on how many disks/node you have than anything else
> >> >> really.
> >> >>
> >> >> For simple reads (eg: hfile) would it really be that hard to retrofit
> >> >> some kind of async netty based API on top of the existing DFSClient
> >> >> logic?
> >> >>
> >> > Would probably be a duplication rather than a retrofit, but it's probably
> >> > doable -- the protocol is pretty simple for reads, and failure/retry is
> >> > much less complicated compared to writes (though still pretty complicated).
> >> >
> >> >> -ryan
> >> >>
> >> >> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
> >> >> > Depending on the workload, parallelism doesn't seem to matter much.
> >> >> > On my 8-core Nehalem test cluster with 12 disks each, I'm always
> >> >> > network bound far before I'm CPU bound for most benchmarks, i.e.
> >> >> > jstacks show threads mostly waiting for IO to happen, not blocked
> >> >> > on locks.
> >> >> >
> >> >> > Is that not the case for your production boxes?
> >> >> >
> >> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <[email protected]> wrote:
> >> >> >
> >> >> >> bigtable was written for 1 core machines, with ~ 100 regions per box.
> >> >> >> Thanks to CMS we generally can't run on < 4 cores, and at this point
> >> >> >> 16 core machines (with HTT) is becoming pretty standard.
> >> >> >>
> >> >> >> The question is, how do we leverage the ever-increasing sizes of
> >> >> >> machines and differentiate ourselves from bigtable? What did google
> >> >> >> do (if anything) to adapt to the 16 core machines? We should be able
> >> >> >> to do quite a bit on a 20 or 40 node cluster.
> >> >> >>
> >> >> >> more thread parallelism?

--
Todd Lipcon
Software Engineer, Cloudera
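Jay's localize(Path) call quoted above doesn't exist in HDFS; as a sketch of the idea, here's a toy model (class and method names are hypothetical, and the replica map stands in for what the namenode would report) that computes which blocks of a file still need a replica pulled to the local datanode. A real implementation would then schedule background block transfers for that work list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of a hypothetical localize(Path) helper: given replica
 * locations for each block of a file, find the blocks with no copy
 * on the local datanode.
 */
public class BlockLocalizer {
    private final String localNode;

    public BlockLocalizer(String localNode) {
        this.localNode = localNode;
    }

    /** Returns the block IDs that have no replica on the local node. */
    public List<String> blocksToLocalize(Map<String, Set<String>> blockLocations) {
        List<String> needed = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : blockLocations.entrySet()) {
            if (!e.getValue().contains(localNode)) {
                needed.add(e.getKey());
            }
        }
        return needed;
    }
}
```

A RegionServer-side loop could call this periodically for each store file, as Jay suggests, and only issue copies for the blocks it returns.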
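The "netty3 thing" Ryan mentions is the ReplayingDecoder pattern: decode code is written as if every read will succeed, and when the buffer runs short a signal aborts the attempt so the same bytes are replayed once more data arrives. A plain-Java sketch of the trick, with no Netty dependency (the NeedMoreData class is a stand-in for Netty's internal replay signal):

```java
import java.nio.ByteBuffer;

/**
 * Sketch of the ReplayingDecoder trick: attempt to decode a
 * length-prefixed frame, and bail out with a retryable signal
 * if the buffer doesn't yet hold a complete frame.
 */
public class FrameDecoder {
    /** Signal that the buffer doesn't yet hold a full frame. */
    public static class NeedMoreData extends RuntimeException {}

    /**
     * Decodes one length-prefixed frame from buf. On success the buffer
     * position advances past the frame; on NeedMoreData the position is
     * reset so the same bytes are replayed on the next attempt.
     */
    public static byte[] decode(ByteBuffer buf) {
        buf.mark();
        try {
            if (buf.remaining() < 4) throw new NeedMoreData();
            int len = buf.getInt();
            if (buf.remaining() < len) throw new NeedMoreData();
            byte[] frame = new byte[len];
            buf.get(frame);
            return frame;
        } catch (NeedMoreData e) {
            buf.reset();   // replay from the marked position next time
            throw e;
        }
    }
}
```

This is what lets Writable-style sequential decode logic sit on a non-blocking transport: the caller catches the signal, accumulates more bytes, and calls decode again.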
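Ryan's point about opening N regions at once, with N tied to disks per node, boils down to a bounded worker pool. A minimal sketch under that assumption (the class name and the string "open" stand-in for the real HFile-reading work are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch of bounded-parallel region opening: the pool size is chosen
 * from the disk count so the spindles, not a single loader thread,
 * set the pace.
 */
public class ParallelRegionOpener {
    public static List<String> openAll(List<String> regions, int disksPerNode)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(disksPerNode);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String region : regions) {
                // Stand-in for the real region-open work (HFile reads etc.).
                pending.add(pool.submit(() -> "opened:" + region));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());   // wait for every open to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Sizing the pool larger than the disk count mostly adds seek contention for spinning disks, which is why N per node, rather than a global constant, is the knob Ryan is pointing at.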
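Todd's "duplication rather than a retrofit" answer suggests what an async read API layered over blocking client logic might look like. Everything below is a stand-in, not the real DFSClient: a byte array plays the remote block, and a worker thread does the blocking read before invoking the caller's callback.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/**
 * Hypothetical shape for an async positional-read API: the caller
 * hands over a callback and never blocks; a worker thread performs
 * the (blocking) read and delivers the result.
 */
public class AsyncReader {
    private final byte[] blockData;          // stand-in for a remote block
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public AsyncReader(byte[] blockData) {
        this.blockData = blockData;
    }

    /** Reads len bytes at offset and invokes the callback off-thread. */
    public void pread(long offset, int len, Consumer<byte[]> callback) {
        worker.execute(() -> {
            byte[] out = new byte[len];
            System.arraycopy(blockData, (int) offset, out, 0, len);
            callback.accept(out);
        });
    }

    public void close() {
        worker.shutdown();
    }
}
```

For simple HFile reads this callback shape is enough; as Todd notes in the thread, the hard part a real implementation would duplicate is the failure/retry handling, which this sketch omits entirely.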
