Ryan, it makes sense, so the order randomization, or in other words the load balancing, probably has to be handled at the block/region level rather than at the single-row level to reduce the number of RPC calls.
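
To make that concrete, here is a minimal sketch of one way to spread writes at the region level: prefix each row key with a small salt bucket so that rows adjacent in time land in different regions instead of one hot region. Everything here is illustrative rather than anything from this thread: the "metrics" table name, the "d" column family and the bucket count are made up, and it assumes the plain HBase Java client (HTable/Put).

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SaltedWrite {
    // roughly the number of regions/servers you want writes spread across
    static final int BUCKETS = 16;

    // prepend a one-byte bucket id to the timestamp key; consecutive
    // timestamps then map to different buckets, hence different regions
    static byte[] saltedKey(long ts) {
      byte bucket = (byte) (ts % BUCKETS);
      return Bytes.add(new byte[] { bucket }, Bytes.toBytes(ts));
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "metrics");   // hypothetical table
      long ts = System.currentTimeMillis();
      Put put = new Put(saltedKey(ts));
      put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes("reading"));
      table.put(put);
      table.close();
    }
  }

The trade-off is exactly what Ryan points out below: a time-range read now has to touch every bucket, so it costs BUCKETS scans instead of one.
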
so then the question is: does HBase load balancing work as described in the
original BigTable paper ("Each tablet is assigned to one tablet server at a
time", I guess with some kind of round-robin partitioning), or does it
preserve data locality by storing each region's data (a collection of HFiles,
if I understand correctly) as a contiguous sequence of data blocks on the HDFS
datanodes? I'm looking at the documentation but I don't see it specifically
addressed there.

Thanks
Alex

On Sat, Apr 24, 2010 at 5:37 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> While that sounds right, the issue is the overhead of multiple RPC calls.
> If your data was spread out, so would be your RPC pattern.
>
> The advantage of HBase is that you can have multiple concurrent scans on
> different servers and they won't share resources. Thus you can scale.
>
> Underlying HBase is HDFS, which allows us to use more disk spindles on both
> local and remote machines. This also allows a single machine to scale well,
> especially when you use 4 or more disks.
>
> On Apr 24, 2010 2:28 PM, "alex kamil" <alex.ka...@gmail.com> wrote:
>
> Ryan,
>
> wouldn't storing time series data in chronological order be sub-optimal for
> sequential scans and range queries?
> Let's say there is a large chunk of data (e.g. 10M rows) representing 1 hr
> of recordings, stored in multiple regions on a single node/regionserver.
> Then if we run a range query for that time period, we will not utilize the
> entire cluster and will be largely IO bound and limited by a single node's
> read throughput.
> I'm thinking of randomizing the input sequence order during insertion to
> improve access time.
>
> thanks
> Alex
>
>
>
> On Sat, Apr 24, 2010 at 4:45 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >
> > Hey,
> >
> > So in my cas...
> >
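
(For completeness, the read side under the same made-up bucketing scheme from the sketch above: a time-range query turns into one scan per bucket, and because the buckets live in different regions the scans can be served by different region servers, which is the multi-server scaling Ryan describes. Shown sequentially for brevity; a thread pool would issue them in parallel.)

  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SaltedScan {
    // scan [startTs, endTs) in every salt bucket; BUCKETS and the key layout
    // are the ones assumed in the write-side sketch earlier in this mail
    static void scanRange(HTable table, long startTs, long endTs) throws Exception {
      for (int b = 0; b < SaltedWrite.BUCKETS; b++) {
        byte[] prefix = new byte[] { (byte) b };
        Scan scan = new Scan(Bytes.add(prefix, Bytes.toBytes(startTs)),
                             Bytes.add(prefix, Bytes.toBytes(endTs)));
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
          // merge/aggregate r with the results from the other buckets ...
        }
        scanner.close();
      }
    }
  }
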