> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) keep pushing for HDFS improvements. Anything else is coding
> around the bigger I/O issue.
The Facebook code drop, not the 0.20-append branch with its clean history
but rather the hairball without (shame), has an HDFS patched with the same
approach as Ryan's HDFS-347, but in addition it also checksums the blocks
and caches NameNode metadata. I might swap out Ryan's HDFS-347 patch
locally with an extraction of these changes. I've also been considering
backporting the (stale) HADOOP-4801/HADOOP-6311 approach. Jason, it looks
like you've recently updated those issues?

Best regards,

  - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)

----- Original Message -----
> From: Doug Meil <doug.m...@explorysmedical.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Cc:
> Sent: Saturday, July 9, 2011 6:04 PM
> Subject: Re: Converting byte[] to ByteBuffer
>
> re: "If a variant of hdfs-347 was committed,"
>
> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) keep pushing for HDFS improvements. Anything else is coding
> around the bigger I/O issue.
>
> On 7/9/11 6:13 PM, "Ryan Rawson" <ryano...@gmail.com> wrote:
>
>> I think my general point is we could hack up the hbase source, add
>> refcounting, circumvent the gc, etc., or we could demand more from
>> the dfs.
>>
>> If a variant of hdfs-347 was committed, reads could come from the
>> Linux buffer cache and life would be good.
>>
>> The choice isn't fast hbase vs slow hbase, there are elements of bugs
>> there as well.
>>
>> On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mcsri...@gmail.com> wrote:
>>
>>> On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen
>>> <jason.rutherg...@gmail.com> wrote:
>>>
>>>> There are a couple of things here: one is direct byte buffers to
>>>> put the blocks outside of heap, the other is mmap'ing the blocks
>>>> directly from the underlying HDFS file.
>>>>
>>>> I think they both make sense. And I'm not sure MapR's solution will
>>>> be that much better if the latter is implemented in HBase.
>>>
>>> There are some major issues with mmap'ing the local hdfs file (the
>>> "block") directly:
>>> (a) no checksums to detect data corruption from bad disks
>>> (b) when a disk does fail, the dfs could start reading from an
>>> alternate replica ... but that option is lost when mmap'ing, and the
>>> RS will crash immediately
>>> (c) security is completely lost, but that is minor given hbase's
>>> current status
>>>
>>> For those hbase deployments that don't care about the absence of (a)
>>> and (b), especially (b), it's definitely a viable option that gives
>>> good perf.
>>>
>>> At MapR, we did consider a similar direct-access capability and
>>> rejected it due to the above concerns.
>>>
>>>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>
>>>>> The overhead in a byte buffer is the extra integers to keep track
>>>>> of the mark, position, limit.
>>>>>
>>>>> I am not sure that putting the block cache into heap is the way to
>>>>> go. Getting faster local dfs reads is important, and if you run
>>>>> hbase on top of MapR, these things are taken care of for you.
>>>>>
>>>>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen"
>>>>> <jason.rutherg...@gmail.com> wrote:
>>>>>
>>>>>> Also, it's for a good cause: moving the blocks out of main heap
>>>>>> using direct byte buffers or some other more native-like facility
>>>>>> (if DBBs don't work).
>>>>>>
>>>>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>>>
>>>>>>> Where? Everywhere? An array is 24 bytes, a bb is 56 bytes. Also
>>>>>>> the API is... annoying.
>>>>>>>
>>>>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen"
>>>>>>> <jason.rutherg...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Is there an open issue for this? How hard will this be? :)
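
[Editor's note: the direct-byte-buffer idea Jason raises above, putting
block bytes outside the GC-managed heap via java.nio, can be sketched as
below. The class name, block size, and payload are illustrative only;
this is not HBase's actual block cache code.]

```java
import java.nio.ByteBuffer;

public class OffHeapBlockSketch {
    public static void main(String[] args) {
        // Allocate a 64 KB "block" outside the Java heap; the bytes are
        // invisible to the garbage collector, which is the GC-pressure
        // motivation discussed in the thread.
        ByteBuffer block = ByteBuffer.allocateDirect(64 * 1024);

        // Copy some block bytes in...
        byte[] payload = "hfile-block-bytes".getBytes();
        block.put(payload);

        // ...and read them back out. Note the bookkeeping a ByteBuffer
        // carries (mark/position/limit) on top of a bare byte[], which
        // is the per-object overhead Ryan points at.
        block.flip();
        byte[] out = new byte[block.remaining()];
        block.get(out);

        System.out.println(new String(out));   // prints "hfile-block-bytes"
        System.out.println(block.isDirect());  // prints "true"
    }
}
```

[The trade-off the thread circles is visible even in this sketch: a
direct buffer dodges GC, but every read back into Java code still
requires a copy into a byte[], and none of this addresses checksumming
or replica failover, the concerns Srivas raises about mmap.]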