Andrew, I fully agree. I opened HDFS-2004 to this end; however, it was (oddly) shot down. I think HBase's usage of HDFS diverges from the traditional MapReduce usage. MapR addresses these issues, as does some of the Facebook-related work.
I think HBase should work at a lower level than the traditional HDFS APIs, thus the only patches required for HDFS are ones that make it more malleable for the requirements of HBase.

> Ryan's HDFS-347 but in addition it also checksums the blocks and caches
> NameNode metadata

Sounds good, I'm interested in checking that out.

On Sun, Jul 10, 2011 at 9:25 AM, Andrew Purtell <apurt...@apache.org> wrote:
>> I agree with what Ryan is saying here, and I'd like to second (third?
>> fourth?) keep pushing for HDFS improvements. Anything else is coding
>> around the bigger I/O issue.
>
> The Facebook code drop, not the 0.20-append branch with its clean history
> but rather the hairball without (shame), has a HDFS patched with the same
> approach as Ryan's HDFS-347 but in addition it also checksums the blocks
> and caches NameNode metadata. I might swap out Ryan's HDFS-347 patch
> locally with an extraction of these changes.
>
> I've also been considering back porting the (stale) HADOOP-4801/HADOOP-6311
> approach. Jason, it looks like you've recently updated those issues?
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
> ----- Original Message -----
>> From: Doug Meil <doug.m...@explorysmedical.com>
>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
>> Cc:
>> Sent: Saturday, July 9, 2011 6:04 PM
>> Subject: Re: Converting byte[] to ByteBuffer
>>
>> re: "If a variant of hdfs-347 was committed,"
>>
>> I agree with what Ryan is saying here, and I'd like to second (third?
>> fourth?) keep pushing for HDFS improvements. Anything else is coding
>> around the bigger I/O issue.
>>
>> On 7/9/11 6:13 PM, "Ryan Rawson" <ryano...@gmail.com> wrote:
>>
>>> I think my general point is we could hack up the hbase source, add
>>> refcounting, circumvent the gc, etc or we could demand more from the dfs.
>>>
>>> If a variant of hdfs-347 was committed, reads could come from the Linux
>>> buffer cache and life would be good.
>>>
>>> The choice isn't fast hbase vs slow hbase, there are elements of bugs
>>> there as well.
>>>
>>> On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mcsri...@gmail.com> wrote:
>>>> On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen
>>>> <jason.rutherg...@gmail.com> wrote:
>>>>
>>>>> There are couple of things here, one is direct byte buffers to put the
>>>>> blocks outside of heap, the other is MMap'ing the blocks directly from
>>>>> the underlying HDFS file.
>>>>>
>>>>> I think they both make sense. And I'm not sure MapR's solution will
>>>>> be that much better if the latter is implemented in HBase.
>>>>
>>>> There're some major issues with mmap'ing the local hdfs file (the
>>>> "block") directly:
>>>> (a) no checksums to detect data corruption from bad disks
>>>> (b) when a disk does fail, the dfs could start reading from an alternate
>>>> replica ... but that option is lost when mmap'ing and the RS will crash
>>>> immediately
>>>> (c) security is completely lost, but that is minor given hbase's current
>>>> status
>>>>
>>>> For those hbase deployments that don't care about the absence of (a) and
>>>> (b), especially (b), it's definitely a viable option that gives good
>>>> perf.
>>>>
>>>> At MapR, we did consider similar direct-access capability and rejected
>>>> it due to the above concerns.
>>>>
>>>>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>> > The overhead in a byte buffer is the extra integers to keep track of
>>>>> > the mark, position, limit.
>>>>> >
>>>>> > I am not sure that putting the block cache in to heap is the way to
>>>>> > go. Getting faster local dfs reads is important, and if you run hbase
>>>>> > on top of Mapr, these things are taken care of for you.
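As a side note on point (a) above: a rough sketch of what mmap'ing a local block file amounts to, and why it forfeits checksum verification unless the reader re-implements it. The class and method names here are hypothetical and purely illustrative; real HDFS keeps per-chunk CRC32 checksums in a separate .meta file, which this sketch stands in for with a whole-file CRC32.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.CRC32;

// Sketch: mapping a local block file read-only. A raw mmap bypasses the
// DataNode read path entirely, so the usual CRC verification never runs --
// silent disk corruption goes undetected unless the reader checksums the
// bytes itself, as checksum() does below.
public class MmapBlockRead {

    // Map the whole file into the process address space; subsequent reads
    // are served from the Linux buffer cache. The mapping stays valid after
    // the channel is closed.
    static MappedByteBuffer mapBlock(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Hypothetical stand-in for the verification the DFS client would do.
    static long checksum(MappedByteBuffer buf) {
        CRC32 crc = new CRC32();
        byte[] chunk = new byte[64 * 1024];
        buf.rewind();
        while (buf.hasRemaining()) {
            int n = Math.min(chunk.length, buf.remaining());
            buf.get(chunk, 0, n);
            crc.update(chunk, 0, n);
        }
        return crc.getValue();
    }
}
```

Note this also illustrates point (b): the buffer is bound to one local replica, so a disk failure mid-read surfaces as a SIGBUS-style fault rather than a transparent failover to another replica.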
>>>>> > On Jul 8, 2011 6:20 PM, "Jason Rutherglen"
>>>>> > <jason.rutherg...@gmail.com> wrote:
>>>>> >> Also, it's for a good cause, moving the blocks out of main heap
>>>>> >> using direct byte buffers or some other more native-like facility
>>>>> >> (if DBB's don't work).
>>>>> >>
>>>>> >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com>
>>>>> >> wrote:
>>>>> >>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the
>>>>> >>> API is...annoying.
>>>>> >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen"
>>>>> >>> <jason.rutherg...@gmail.com> wrote:
>>>>> >>>> Is there an open issue for this? How hard will this be? :)
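To make the off-heap idea in the quoted thread concrete: a direct ByteBuffer keeps its payload outside the Java heap, which is why it's attractive for a block cache despite the per-instance bookkeeping (mark, position, limit, capacity) Ryan objects to. A minimal sketch, assuming nothing about HBase's actual block cache API; the class and method names are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an off-heap block cache: block bytes live in direct
// ByteBuffers (native memory outside the GC-managed heap), so cached data
// neither inflates GC pause times nor counts against -Xmx.
public class OffHeapBlockCache {
    private final Map<String, ByteBuffer> blocks = new ConcurrentHashMap<>();

    // Copy the block into a freshly allocated direct buffer.
    public void cacheBlock(String name, byte[] block) {
        ByteBuffer direct = ByteBuffer.allocateDirect(block.length);
        direct.put(block);
        direct.flip();
        blocks.put(name, direct);
    }

    // Hand each caller a read-only duplicate: independent position/limit/
    // mark state over the same off-heap memory, no copy.
    public ByteBuffer getBlock(String name) {
        ByteBuffer cached = blocks.get(name);
        return cached == null ? null : cached.asReadOnlyBuffer();
    }
}
```

The duplicate-per-reader pattern sidesteps part of the "annoying API" complaint, since concurrent readers can move their positions freely without coordinating, though every cached block still pays the extra ByteBuffer header fields that a bare byte[] avoids.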