Re: Modeling column families

Ryan Rawson Sat, 24 Apr 2010 14:38:00 -0700

While that sounds right, the issue is the overhead of multiple RPC calls. If
your data were spread out, your RPC pattern would be spread out too.
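
As a rough illustration of that trade-off (a sketch only, not HBase client
code): the helper names, the 4-bucket salt, and the key layout below are all
assumptions made up for this example. With chronological keys a time range
maps to one contiguous scan; with randomized (salted) keys the same range
needs one scan per bucket, which is where the extra RPC calls come from.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical number of salt buckets, chosen for illustration


def chrono_key(ts: int) -> str:
    """Row key in plain chronological order: a zero-padded timestamp."""
    return f"{ts:013d}"


def salted_key(ts: int, buckets: int = NUM_BUCKETS) -> str:
    """Row key with a hash-derived bucket prefix, which scatters
    neighbouring timestamps across the key space."""
    bucket = zlib.crc32(str(ts).encode()) % buckets
    return f"{bucket:02d}-{ts:013d}"


def chrono_scan_ranges(start: int, stop: int) -> list[tuple[str, str]]:
    """A time range is one contiguous key range, so one scan."""
    return [(chrono_key(start), chrono_key(stop))]


def salted_scan_ranges(
    start: int, stop: int, buckets: int = NUM_BUCKETS
) -> list[tuple[str, str]]:
    """The same time range needs one key range (one scan) per bucket."""
    return [
        (f"{b:02d}-{start:013d}", f"{b:02d}-{stop:013d}")
        for b in range(buckets)
    ]


if __name__ == "__main__":
    hour_ms = 60 * 60 * 1000
    print(len(chrono_scan_ranges(0, hour_ms)), "scan with chronological keys")
    print(len(salted_scan_ranges(0, hour_ms)), "scans with salted keys")
```

The salted layout spreads the reads, but the client then has to issue and
merge one scan per bucket, so the number of round trips grows with the
bucket count.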

The advantage of HBase is that you can run multiple concurrent scans on
different servers and they won't share resources. That is how you scale.

Underlying HBase is HDFS, which allows us to use more disk spindles on both
local and remote machines. This also allows a single machine to scale well,
especially when you use 4 or more disks.

On Apr 24, 2010 2:28 PM, "alex kamil" <alex.ka...@gmail.com> wrote:

Ryan,

Wouldn't storing time series data in chronological order be sub-optimal for
sequential scans and range queries?
Let's say there is a large chunk of data (e.g. 10M rows) representing 1 hr of
recordings, stored in multiple regions on a single node/regionserver.
Then if we run a range query for that time period, we will not utilize the
entire cluster and will be largely IO bound, limited by a single node's
read throughput.
I'm thinking of randomizing the input sequence order during insertion to
improve access time.

thanks
Alex

On Sat, Apr 24, 2010 at 4:45 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
> Hey,
>
> So in my cas...
