One thought I had: if we already have the Writable code, surely just putting a different transport around it wouldn't be THAT bad, right? :-)
Of course Writables are really tied to that DataInputStream or whatever, so we'd have to work on that. Benoit said something about Writables needing to do blocking reads and that causing issues, but there was a netty3 mechanism specifically designed to handle that by throwing and retrying the op later when there was more data.

On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon <[email protected]> wrote:
> On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>
>> a production server should be CPU bound, with memory caching etc. Our
>> prod systems do see a reasonable load, and jstack always shows some
>> kind of wait generally...
>>
>> but we need more IO pushdown into HDFS. For example if we are loading
>> regions, why not do N at the same time? That figure N is probably
>> more dependent on how many disks/node you have than anything else
>> really.
>>
>> For simple reads (eg: hfile) would it really be that hard to retrofit
>> some kind of async netty based API on top of the existing DFSClient
>> logic?
>>
>
> Would probably be a duplication rather than a retrofit, but it's probably
> doable -- the protocol is pretty simple for reads, and failure/retry is much
> less complicated compared to writes (though still pretty complicated)
>
>> -ryan
>>
>> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
>> > Depending on the workload, parallelism doesn't seem to matter much. On my
>> > 8-core Nehalem test cluster with 12 disks each, I'm always network bound far
>> > before I'm CPU bound for most benchmarks. ie jstacks show threads mostly
>> > waiting for IO to happen, not blocked on locks.
>> >
>> > Is that not the case for your production boxes?
>> >
>> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> bigtable was written for 1 core machines, with ~ 100 regions per box.
>> >> Thanks to CMS we generally can't run on < 4 cores, and at this point
>> >> 16 core machines (with HTT) are becoming pretty standard.
>> >>
>> >> The question is, how do we leverage the ever-increasing sizes of
>> >> machines and differentiate ourselves from bigtable? What did Google
>> >> do (if anything) to adapt to the 16 core machines? We should be able
>> >> to do quite a bit on a 20 or 40 node cluster.
>> >>
>> >> more thread parallelism?
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
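The netty3 trick mentioned up top (throw when the buffer runs dry, rewind, and retry the same decode once more bytes have arrived) can be sketched in plain Java without Netty itself. A minimal sketch, assuming a length-prefixed record format; the class and method names here are illustrative, not Netty's actual ReplayingDecoder API:

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Sketch of the "throw and retry later" decode pattern: read
// optimistically, and if the buffer underflows, rewind to the mark and
// signal the caller to retry once more data has arrived.
class ReplayDecodeSketch {

    // Record format (assumed for illustration): 4-byte length prefix,
    // then that many payload bytes.
    static byte[] tryDecode(ByteBuffer buf) {
        buf.mark();                      // remember where this attempt started
        try {
            int len = buf.getInt();      // may throw if < 4 bytes remain
            byte[] payload = new byte[len];
            buf.get(payload);            // may throw if payload incomplete
            return payload;              // a complete record was available
        } catch (BufferUnderflowException needMoreData) {
            buf.reset();                 // rewind; nothing was consumed
            return null;                 // "not enough data yet, retry later"
        }
    }
}
```

Because the failed attempt consumes nothing, a non-blocking read loop can simply call `tryDecode` again after the next chunk arrives, which is exactly what lets a Writable-style "blocking" read pattern sit on top of an async transport.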

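The "why not load N regions at the same time" idea from the thread could look something like the following. This is a hypothetical sketch, not HBase code: `openRegion`/`openAll` are made-up names, and sizing the pool by disks per node is just the heuristic Ryan suggested:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: open regions N at a time, with N tied to the number of data
// disks on the node (the thread's guess at the right bound for IO
// pushdown). openRegion is a stand-in for the real HDFS-heavy work.
class ParallelRegionOpen {

    static String openRegion(String name) {
        // placeholder for reading HFiles, replaying logs, etc.
        return name + ":open";
    }

    static List<String> openAll(List<String> regions, int disksPerNode) {
        ExecutorService pool = Executors.newFixedThreadPool(disksPerNode);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String r : regions) {
                futures.add(pool.submit(() -> openRegion(r)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());   // preserves submission order
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Threads blocked on HDFS reads mostly just wait on IO, so a small fixed pool like this buys concurrency without the full async-DFSClient rewrite discussed above.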