On Wed, Dec 15, 2010 at 12:27 PM, Vladimir Rodionov <vrodio...@carrieriq.com> wrote:
> Why don't you use off-heap memory for this purpose? If it's the block
> cache (all blocks are of equal size), the alloc/free algorithm is pretty
> simple -- you do not have to re-implement malloc in Java.
The block cache unfortunately isn't all equal-sized: if you have a single
cell larger than the hfile block size, the block expands to fit it. That
said, we could use a fairly simple slab allocator. The bigger difficulty is
reference counting/tracking -- the hfile blocks are zero-copied all the way
out to the RPC implementation, so tracking references is not
straightforward.

-Todd

> I think something like an open-source version of Terracotta BigMemory
> would be a good candidate for an Apache project. I see at least several
> large Hadoop processes -- HBase, HDFS DataNodes, TaskTrackers, and the
> NameNode -- that suffer a lot from GC timeouts.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________________________________________
> From: Ryan Rawson [ryano...@gmail.com]
> Sent: Wednesday, December 15, 2010 11:52 AM
> To: dev@hbase.apache.org
> Subject: Re: Hypertable claiming upto >900% random-read throughput vs HBase
>
> The malloc point was that we have to contend with Xmx and GC, which makes
> it harder for us to maximally use all the available RAM for block cache
> in the regionserver -- which you may or may not want to do for other
> reasons. At least with Xmx you can plan and control your deployments, and
> you won't suffer heap growth due to heap fragmentation.
>
> -ryan
>
> On Wed, Dec 15, 2010 at 11:49 AM, Todd Lipcon <t...@cloudera.com> wrote:
>> On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma
>> <gaurav.gs.sha...@gmail.com> wrote:
>>> Thanks Ryan and Ted. I also think if they were using tcmalloc, it
>>> would have given them a further advantage, but as you said, not much
>>> is known about the test source code.
>>
>> I think Hypertable does use tcmalloc or jemalloc (forget which).
>>
>> You may be interested in this thread from back in August:
>> http://search-hadoop.com/m/pG6SM1xSP7r/hypertable&subj=Re+Finding+on+HBase+Hypertable+comparison
>>
>> -Todd
>>
>>> On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>
>>>> So if that is the case, I'm not sure how that is a fair test. One
>>>> system reads from RAM, the other from disk. The results are as
>>>> expected.
>>>>
>>>> Why not test one system with SSDs and the other without?
>>>>
>>>> It's really hard to avoid an apples/oranges comparison. Even if you
>>>> run the same workloads on two diverse systems, you are not testing
>>>> code quality, you are testing overall systems and other issues.
>>>>
>>>> As G1 GC improves, I expect our ability to use larger and larger
>>>> heaps will blunt the advantage of a C++ program using malloc.
>>>>
>>>> -ryan
>>>>
>>>> On Wed, Dec 15, 2010 at 11:15 AM, Ted Dunning <tdunn...@maprtech.com>
>>>> wrote:
>>>> > From the small comments I have heard, the RAM-versus-disk
>>>> > difference is mostly what they were testing.
>>>> >
>>>> > On Wed, Dec 15, 2010 at 11:11 AM, Ryan Rawson <ryano...@gmail.com>
>>>> > wrote:
>>>> >
>>>> >> We don't have the test source code, so it isn't very objective.
>>>> >> However, I believe there are two things that help them:
>>>> >> - They are able to harness larger amounts of RAM, so they are
>>>> >> really just testing that vs HBase
>>>> >>
>>>> >
>>>>
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>

--
Todd Lipcon
Software Engineer, Cloudera
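To make the slab-allocator idea from the top of the thread concrete: below is
a minimal Java sketch of fixed-size off-heap blocks with explicit reference
counting. This is not HBase code -- all class and method names are
illustrative -- and it sidesteps the variable-size-block problem Todd raises
by assuming every block fits the slab's block size.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a slab allocator: one direct (off-heap) ByteBuffer is carved
// into fixed-size blocks, so cached data stays out of the GC'd heap.
// Each block carries a reference count; since blocks are zero-copied out
// to readers, a block returns to the free list only when the last holder
// releases it.
public class SlabAllocator {
    public final class Block {
        final ByteBuffer buf;                       // slice of the slab
        final AtomicInteger refs = new AtomicInteger(0);

        private Block(ByteBuffer buf) { this.buf = buf; }

        public void retain()  { refs.incrementAndGet(); }

        public void release() {
            if (refs.decrementAndGet() == 0) {      // last reference gone
                buf.clear();
                recycle(this);
            }
        }
    }

    private final Deque<Block> freeList = new ArrayDeque<>();

    public SlabAllocator(int slabBytes, int blockBytes) {
        ByteBuffer slab = ByteBuffer.allocateDirect(slabBytes);
        for (int off = 0; off + blockBytes <= slabBytes; off += blockBytes) {
            slab.limit(off + blockBytes);           // window the slab ...
            slab.position(off);
            freeList.push(new Block(slab.slice())); // ... and slice a block
        }
    }

    // Hands out a block with an initial reference count of 1.
    public synchronized Block allocate() {
        Block b = freeList.poll();
        if (b == null) throw new IllegalStateException("slab exhausted");
        b.refs.set(1);
        return b;
    }

    private synchronized void recycle(Block b) { freeList.push(b); }
}
```

The hard part Todd describes is exactly the retain()/release() discipline:
every consumer along the zero-copy path out to the RPC layer has to pair
them correctly, and a single leaked reference pins a block forever.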