> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) the call to keep pushing for HDFS improvements.  Anything else
> is coding around the bigger I/O issue.


The Facebook code drop, not the 0.20-append branch with its clean history 
but rather the hairball without one (shame), has an HDFS patched with the 
same approach as Ryan's HDFS-347; in addition, it also checksums the blocks 
and caches NameNode metadata. I might swap out Ryan's HDFS-347 patch locally 
for an extraction of these changes.
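
Roughly, that approach boils down to reading the block file straight off the 
local disk while still verifying a checksum before handing the bytes to the 
cache. A minimal sketch, assuming a plain local file and a caller-supplied 
CRC32 (illustration only, not the actual Facebook or HDFS-347 code; real HDFS 
keeps per-chunk checksums in a separate .meta file):

    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.zip.CRC32;

    public class LocalBlockReadSketch {

      /** Read a block replica off the local disk and verify it before caching it. */
      static byte[] readAndVerify(File blockFile, long expectedCrc) throws IOException {
        byte[] data = new byte[(int) blockFile.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(blockFile));
        try {
          in.readFully(data);               // bypasses the DataNode, reads the local fs directly
        } finally {
          in.close();
        }
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);   // the extra verification step on top of a bare local read
        if (crc.getValue() != expectedCrc) {
          throw new IOException("checksum mismatch on " + blockFile);
        }
        return data;                        // repeat reads of a hot block come out of the OS page cache
      }
    }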

I've also been considering backporting the (stale) HADOOP-4801/HADOOP-6311 
approach. Jason, it looks like you've recently updated those issues?
 
Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


----- Original Message -----
> From: Doug Meil <doug.m...@explorysmedical.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Cc: 
> Sent: Saturday, July 9, 2011 6:04 PM
> Subject: Re: Converting byte[] to ByteBuffer
> 
> 
> re:  "If a variant of hdfs-347 was committed,"
> 
> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) the call to keep pushing for HDFS improvements.  Anything else
> is coding around the bigger I/O issue.
> 
> 
> 
> On 7/9/11 6:13 PM, "Ryan Rawson" <ryano...@gmail.com> wrote:
> 
>> I think my general point is we could hack up the hbase source, add
>> refcounting, circumvent the gc, etc., or we could demand more from the dfs.
>> 
>> If a variant of hdfs-347 was committed, reads could come from the Linux
>> buffer cache and life would be good.
>> 
>> The choice isn't fast hbase vs slow hbase; there are elements of bugs
>> there as well.
>> On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mcsri...@gmail.com> wrote:
>>>  On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen
>>>  <jason.rutherg...@gmail.com> wrote:
>>> 
>>>>  There are a couple of things here: one is direct byte buffers to put
>>>>  the blocks outside of the heap, the other is MMap'ing the blocks
>>>>  directly from the underlying HDFS file.
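
For what it's worth, both of those options come down to plain java.nio. A 
rough sketch (the local block path is hypothetical, and this is not what an 
actual HBase/HDFS patch would look like):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class OffHeapBlockSketch {

      /** Option 1: copy a block into memory allocated outside the Java heap. */
      static ByteBuffer copyOffHeap(byte[] block) {
        ByteBuffer direct = ByteBuffer.allocateDirect(block.length);  // backing memory is off-heap
        direct.put(block);
        direct.flip();
        return direct;
      }

      /** Option 2: mmap the local block file and let the kernel page it in and out. */
      static MappedByteBuffer mapBlock(String localBlockPath) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(localBlockPath, "r");
        try {
          FileChannel ch = raf.getChannel();
          return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } finally {
          raf.close();  // the mapping stays valid after the underlying file is closed
        }
      }
    }
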
>>> 
>>> 
>>>>  I think they both make sense. And I'm not sure MapR's solution will
>>>>  be that much better if the latter is implemented in HBase.
>>>> 
>>> 
>>>  There're some major issues with mmap'ing the local hdfs file (the
>>>  "block") directly:
>>>  (a) no checksums to detect data corruption from bad disks
>>>  (b) when a disk does fail, the dfs could start reading from an
>>>  alternate replica ... but that option is lost when mmap'ing and the
>>>  RS will crash immediately
>>>  (c) security is completely lost, but that is minor given hbase's
>>>  current status
>>> 
>>>  For those hbase deployments that don't care about the absence of (a)
>>>  and (b), especially (b), it's definitely a viable option that gives
>>>  good perf.
>>> 
>>>  At MapR, we did consider similar direct-access capability and
>>>  rejected it due to the above concerns.
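
To make (a) and (b) concrete: once the DFS client is bypassed, the checksum 
check and the replica failover have to live in the reader itself. A minimal 
sketch of that shape; both block sources below are placeholders invented for 
illustration, not real HDFS calls:

    import java.io.IOException;

    public class MmapReadFallbackSketch {

      /** Placeholder for "some way to get the block bytes"; not an HDFS API. */
      interface BlockSource {
        byte[] read() throws IOException;   // throws on a bad disk or a checksum mismatch
      }

      static byte[] readBlock(BlockSource localMmap, BlockSource dfsClient) throws IOException {
        try {
          return localMmap.read();          // fast path: mmap'ed local replica, checksummed by us (a)
        } catch (IOException badLocalReplica) {
          // (b) the DFS client normally fails over to another replica for us;
          // a direct reader has to do that explicitly or take the RS down with it.
          return dfsClient.read();          // slow path: go back through the DFS client
        }
      }
    }
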
>>> 
>>> 
>>> 
>>>> 
>>>>  On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>  > The overhead in a byte buffer is the extra integers to keep track
>>>>  > of the mark, position, limit.
>>>>  >
>>>>  > I am not sure that putting the block cache in to heap is the way
>>>>  > to go.
>>>>  > Getting faster local dfs reads is important, and if you run hbase
>>>>  > on top of Mapr, these things are taken care of for you.
>>>>  > On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
>>>>  > wrote:
>>>>  >> Also, it's for a good cause, moving the blocks out of main heap
>>>>  >> using direct byte buffers or some other more native-like facility
>>>>  >> (if DBB's don't work).
>>>>  >>
>>>>  >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com>
>>>>  >> wrote:
>>>>  >>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also
>>>>  >>> the API is...annoying.
>>>>  >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen"
>>>>  >>> <jason.rutherg...@gmail.com> wrote:
>>>>  >>>> Is there an open issue for this? How hard will this be? :)
>>>>  >>>
>>>>  >
>>>> 
>
