a production server should be CPU bound, with memory caching etc. Our
prod systems do see a reasonable load, and jstack generally shows some
kind of wait...

but we need more IO pushdown into HDFS. For example, if we are loading
regions, why not do N at the same time? That figure N is probably more
dependent on how many disks/node you have than anything else. For
simple reads (e.g. hfile), would it really be that hard to retrofit
some kind of async netty-based API on top of the existing DFSClient
logic?

-ryan

On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
> Depending on the workload, parallelism doesn't seem to matter much. On my
> 8-core Nehalem test cluster with 12 disks each, I'm always network bound far
> before I'm CPU bound for most benchmarks. ie jstacks show threads mostly
> waiting for IO to happen, not blocked on locks.
>
> Is that not the case for your production boxes?
>
> On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <[email protected]> wrote:
>
>> bigtable was written for 1 core machines, with ~100 regions per box.
>> Thanks to CMS we generally can't run on < 4 cores, and at this point
>> 16 core machines (with HTT) are becoming pretty standard.
>>
>> The question is, how do we leverage the ever-increasing sizes of
>> machines and differentiate ourselves from bigtable? What did google
>> do (if anything) to adapt to 16 core machines? We should be able
>> to do quite a bit on a 20 or 40 node cluster.
>>
>> more thread parallelism?
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
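The "open N regions at once" idea above could be sketched with a fixed
thread pool sized to the disk count. This is only a sketch under
assumptions: ParallelRegionOpen, openAll, and the region names are
hypothetical, and the real open work is a placeholder rather than
HBase's actual region-open or DFSClient code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRegionOpen {
    // Open all regions using N worker threads, where N ~ disks per node.
    // Returns the number of regions successfully opened.
    static int openAll(List<String> regions, int disksPerNode)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(disksPerNode);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String region : regions) {
                futures.add(pool.submit(() -> {
                    // Placeholder for the real HDFS read / region-open work.
                    return region;
                }));
            }
            int opened = 0;
            for (Future<String> f : futures) {
                f.get(); // propagates any open failure as ExecutionException
                opened++;
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> regions = List.of("r1", "r2", "r3", "r4", "r5", "r6");
        // e.g. a 4-disk node gets a 4-thread pool
        System.out.println(openAll(regions, 4)); // prints 6
    }
}
```

Sizing the pool to the disk count keeps each spindle busy without
oversubscribing seeks; a truly async netty-style API would avoid the
blocking f.get() entirely, but the thread-pool version is the smaller
retrofit.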
