Andrew,

I fully agree.  I opened HDFS-2004 to this end; however, it was (oddly)
shot down.  I think HBase's usage of HDFS is divergent from the
traditional MapReduce usage.  MapR addresses these issues, as does some
of the Facebook-related work.

I think HBase should work at a lower level than the traditional HDFS
APIs, so the only patches required in HDFS would be ones that make it
more malleable to the requirements of HBase.
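
To make that concrete, here is roughly the kind of lower-level local read I
have in mind -- a hypothetical sketch using plain java.nio against a local
block replica file, not any actual HDFS or HBase API; the class name and the
demo file are invented for illustration:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LocalBlockReader {
    // Hypothetical: mmap a local block replica file and read from it
    // directly, bypassing the DataNode streaming protocol entirely.
    public static MappedByteBuffer mapBlock(String blockPath, long offset,
                                            long length) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(blockPath, "r");
             FileChannel ch = raf.getChannel()) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo against a scratch file standing in for a block replica.
        File f = File.createTempFile("blk_", ".dat");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write("hello block".getBytes("US-ASCII"));
        }
        MappedByteBuffer buf = mapBlock(f.getPath(), 0, f.length());
        byte[] dst = new byte[5];
        buf.get(dst);
        System.out.println(new String(dst, "US-ASCII")); // prints "hello"
        f.delete();
    }
}
```

This is the page-cache path HDFS-347 is after; the obvious caveats (no
checksums, no failover to another replica) are exactly the ones raised in
the quoted thread below.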

> Ryan's HDFS-347 but in addition it also checksums the blocks and caches 
> NameNode metadata

Sounds good, I'm interested in checking that out.
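
If I do extract those changes, the checksum side might look loosely like
this -- a hypothetical sketch, not the Facebook code; `verifiedSlice` and its
shape are my own invention, using stock java.util.zip.CRC32:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksummedRead {
    // Hypothetical sketch: verify a block's bytes against a stored CRC32
    // before handing them to the reader, which is the guarantee a local
    // short-circuit read path has to preserve.
    public static ByteBuffer verifiedSlice(ByteBuffer block, long expectedCrc) {
        CRC32 crc = new CRC32();
        ByteBuffer dup = block.duplicate(); // leave caller's position alone
        byte[] chunk = new byte[Math.min(8192, Math.max(1, dup.remaining()))];
        while (dup.hasRemaining()) {
            int n = Math.min(chunk.length, dup.remaining());
            dup.get(chunk, 0, n);
            crc.update(chunk, 0, n);
        }
        if (crc.getValue() != expectedCrc) {
            throw new IllegalStateException("checksum mismatch: corrupt replica?");
        }
        return block.asReadOnlyBuffer();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "block-bytes".getBytes("US-ASCII");
        CRC32 c = new CRC32();
        c.update(data);
        ByteBuffer ok = verifiedSlice(ByteBuffer.wrap(data), c.getValue());
        System.out.println("verified " + ok.remaining() + " bytes");
    }
}
```

On a mismatch the real code would presumably fall back to a remote replica
rather than just throw, but that is the part mmap alone cannot give you.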

On Sun, Jul 10, 2011 at 9:25 AM, Andrew Purtell <apurt...@apache.org> wrote:
>> I agree with what Ryan is saying here, and I'd like to second (third?
>> fourth?) keep pushing for HDFS improvements.  Anything else is coding
>> around the bigger I/O issue.
>
>
> The Facebook code drop, not the 0.20-append branch with its clean history but 
> rather the hairball without (shame), has an HDFS patched with the same 
> approach as Ryan's HDFS-347, but in addition it checksums the blocks and 
> caches NameNode metadata. I might swap out Ryan's HDFS-347 patch locally with 
> an extraction of these changes.
>
> I've also been considering backporting the (stale) HADOOP-4801/HADOOP-6311 
> approach. Jason, it looks like you've recently updated those issues?
>
> Best regards,
>
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
> Tom White)
>
>
> ----- Original Message -----
>> From: Doug Meil <doug.m...@explorysmedical.com>
>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
>> Cc:
>> Sent: Saturday, July 9, 2011 6:04 PM
>> Subject: Re: Converting byte[] to ByteBuffer
>>
>>
>> re:  "If a variant of hdfs-347 was committed,"
>>
>> I agree with what Ryan is saying here, and I'd like to second (third?
>> fourth?) keep pushing for HDFS improvements.  Anything else is coding
>> around the bigger I/O issue.
>>
>>
>>
>> On 7/9/11 6:13 PM, "Ryan Rawson" <ryano...@gmail.com> wrote:
>>
>>> I think my general point is we could hack up the hbase source, add
>>> refcounting, circumvent the gc, etc., or we could demand more from the dfs.
>>>
>>> If a variant of hdfs-347 was committed, reads could come from the Linux
>>> buffer cache and life would be good.
>>>
>>> The choice isn't fast hbase vs slow hbase; there are elements of bugs
>>> there as well.
>>> On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mcsri...@gmail.com> wrote:
>>>>  On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>>>>
>>>>>  There are a couple of things here: one is direct byte buffers to put the
>>>>>  blocks outside of the heap, the other is mmap'ing the blocks directly from
>>>>>  the underlying HDFS file.
>>>>
>>>>
>>>>>  I think they both make sense. And I'm not sure MapR's solution will
>>>>>  be that much better if the latter is implemented in HBase.
>>>>>
>>>>
>>>>  There're some major issues with mmap'ing the local hdfs file (the "block")
>>>>  directly:
>>>>  (a) no checksums to detect data corruption from bad disks
>>>>  (b) when a disk does fail, the dfs could start reading from an alternate
>>>>  replica ... but that option is lost when mmap'ing and the RS will crash
>>>>  immediately
>>>>  (c) security is completely lost, but that is minor given hbase's current
>>>>  status
>>>>
>>>>  For those hbase deployments that don't care about the absence of (a) and
>>>>  (b), especially (b), it's definitely a viable option that gives good perf.
>>>>
>>>>  At MapR, we did consider a similar direct-access capability and rejected it
>>>>  due to the above concerns.
>>>>
>>>>
>>>>
>>>>>
>>>>>  On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>>  > The overhead in a byte buffer is the extra integers to keep track of the
>>>>>  > mark, position, limit.
>>>>>  >
>>>>>  > I am not sure that putting the block cache into heap is the way to go.
>>>>>  > Getting faster local dfs reads is important, and if you run hbase on top
>>>>>  > of MapR, these things are taken care of for you.
>>>>>  > On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
>>>>>  > wrote:
>>>>>  >> Also, it's for a good cause: moving the blocks out of the main heap
>>>>>  >> using direct byte buffers or some other more native-like facility (if
>>>>>  >> DBBs don't work).
>>>>>  >>
>>>>>  >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>>  >>> Where? Everywhere? An array is 24 bytes, a bb is 56 bytes. Also the
>>>>>  >>> API is... annoying.
>>>>>  >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
>>>>>  >>> wrote:
>>>>>  >>>> Is there an open issue for this? How hard will this be? :)
>>>>>  >>>
>>>>>  >
>>>>>
>>
>
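
For anyone following along, the "direct byte buffers to put the blocks
outside of heap" idea in the thread above can be sketched roughly as
follows -- purely illustrative, not HBase's actual block cache; the class
and method names are made up:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class OffHeapBlockCache {
    // Illustrative only: cache block bytes in direct (off-heap) ByteBuffers
    // so the cached data itself is invisible to the garbage collector; only
    // the small buffer objects and map entries live on the Java heap.
    private final Map<String, ByteBuffer> cache = new HashMap<>();

    public void put(String blockName, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length); // off-heap
        buf.put(data);
        buf.flip();
        cache.put(blockName, buf);
    }

    public byte[] get(String blockName) {
        ByteBuffer buf = cache.get(blockName);
        if (buf == null) return null;
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate so the cached position is untouched
        return out;
    }

    public static void main(String[] args) {
        OffHeapBlockCache cache = new OffHeapBlockCache();
        cache.put("blk_1", "some block bytes".getBytes());
        System.out.println(new String(cache.get("blk_1")));
    }
}
```

The copy-out on get() is the price of keeping data off-heap with plain
ByteBuffers, which is part of why the thread keeps circling back to pushing
the problem down into the dfs instead.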
