While that sounds right, the issue is the overhead of multiple RPC calls. If your data were spread out, your RPC pattern would be spread out too.
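To make the trade-off concrete, here is a minimal sketch in plain Python (no HBase client involved). It compares the key ranges a reader has to scan for one time window when row keys are chronological versus when they carry a randomizing prefix. The bucket count, key layout, and function names are all assumptions made for illustration and are not from this thread. Each returned range stands for a separate scan, and each scan needs at least one RPC round trip.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def ordered_key(ts: int) -> str:
    """Chronological row key: zero-padded timestamp, so keys sort by time."""
    return f"{ts:013d}"


def salted_key(ts: int, num_buckets: int = NUM_BUCKETS) -> str:
    """Row key with a deterministic bucket prefix that scatters adjacent timestamps."""
    bucket = zlib.crc32(str(ts).encode()) % num_buckets
    return f"{bucket:02d}-{ts:013d}"


def scan_ranges(start_ts: int, stop_ts: int, num_buckets: int = 0):
    """Return the (start_key, stop_key) ranges needed to read [start_ts, stop_ts).

    num_buckets == 0 means chronological keys: one contiguous range.
    Otherwise one range per bucket is needed, since rows for the same
    time window are scattered across every prefix.
    """
    if num_buckets == 0:
        return [(ordered_key(start_ts), ordered_key(stop_ts))]
    return [
        (f"{b:02d}-{start_ts:013d}", f"{b:02d}-{stop_ts:013d}")
        for b in range(num_buckets)
    ]


if __name__ == "__main__":
    print(len(scan_ranges(0, 3600)))               # scans with chronological keys
    print(len(scan_ranges(0, 3600, NUM_BUCKETS)))  # scans with prefixed keys
```

With chronological keys the hour is one contiguous range; with prefixed keys the same hour needs one range per bucket, and the client has to merge the results itself.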
The advantage of HBase is that you can have multiple concurrent scans on different servers and they won't share resources. Thus you can scale. Underlying HBase is HDFS, which allows us to use more disk spindles on both local and remote machines. This also allows a single machine to scale well, especially when you use 4 or more disks.

On Apr 24, 2010 2:28 PM, "alex kamil" <alex.ka...@gmail.com> wrote:

Ryan,

Wouldn't storing time series data in chronological order be sub-optimal for sequential scans and range queries?

Let's say there is a large chunk of data (e.g. 10M rows) representing 1hr of recordings, stored in multiple regions on a single node/regionserver. If we run a range query for that time period, we will not utilize the entire cluster and will be largely IO bound, limited by a single node's read throughput.

I'm thinking of randomizing the input sequence order during insertion to improve access time.

thanks
Alex

On Sat, Apr 24, 2010 at 4:45 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
> Hey,
>
> So in my cas...