Re: Improving HBase scanner

Ryan Rawson Wed, 05 May 2010 01:30:24 -0700

Because of the internals it isn't that easy. Additional versions
of a specific row/column are stored next to each other, and the table
as a whole is sorted by row key. Thus, to get every cell that matches a
specific timestamp, we'd have to inspect every row.
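To make the layout concrete, here is a minimal Python sketch. It uses a simplified in-memory model rather than HBase's real storage code. Cells are kept sorted by row, then column, then newest timestamp first. A time-range filter (half-open, [min_ts, max_ts)) has to walk every cell, because matching timestamps can sit under any row. The function names and the tuple format are hypothetical.

```python
def make_table(cells):
    """Sort (row, column, timestamp, value) tuples the way HBase orders cells:
    by row, then column, then newest timestamp first (simplified model)."""
    return sorted(cells, key=lambda c: (c[0], c[1], -c[2]))


def scan_time_range(table, min_ts, max_ts):
    """Return (matches, cells_inspected) for cells with min_ts <= ts < max_ts.

    The table is ordered by row first, so cells in the time range can appear
    under any row; the only option is to check every cell.
    """
    matches = []
    inspected = 0
    for row, col, ts, val in table:
        inspected += 1
        if min_ts <= ts < max_ts:
            matches.append((row, col, ts, val))
    return matches, inspected


table = make_table([
    ("row2", "cf:a", 150, "z"),
    ("row1", "cf:a", 100, "x"),
    ("row3", "cf:b", 300, "w"),
    ("row1", "cf:a", 200, "y"),
])
matches, inspected = scan_time_range(table, 100, 200)
print(matches)    # [('row1', 'cf:a', 100, 'x'), ('row2', 'cf:a', 150, 'z')]
print(inspected)  # 4
```

Only two cells match, but all four were read. On a real table that is the full scan being discussed.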

Ultimately there is no free lunch in any database system - if you want
fast access, data must be co-located (this is true in RDBMS as well),
either via the primary key or via an index. Indexes let you create a
different co-location of data (or at least of data location pointers),
and in the end that might be what you want.
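A minimal sketch of such an index, again using a hypothetical in-memory model. A sorted list of (timestamp, row) pairs co-locates entries by time. A binary search therefore finds the start of the range, and only the matching entries are read. In HBase this would typically be a separate index table whose row key starts with the timestamp. The names here are illustrative only.

```python
from bisect import bisect_left


def build_time_index(cells):
    """Hypothetical secondary index: sorted (timestamp, row) pairs, so entries
    are co-located by time instead of by row."""
    return sorted({(ts, row) for row, _col, ts, _val in cells})


def rows_in_time_range(index, min_ts, max_ts):
    """Rows with at least one cell in [min_ts, max_ts), in timestamp order."""
    # (min_ts,) sorts before every (min_ts, row) pair, so this finds the first
    # index entry whose timestamp is >= min_ts.
    i = bisect_left(index, (min_ts,))
    rows = []
    while i < len(index) and index[i][0] < max_ts:
        row = index[i][1]
        if row not in rows:  # a row may have several versions in range
            rows.append(row)
        i += 1
    return rows


cells = [
    ("row1", "cf:a", 100, "x"),
    ("row1", "cf:a", 200, "y"),
    ("row2", "cf:a", 150, "z"),
    ("row3", "cf:b", 300, "w"),
]
index = build_time_index(cells)
print(rows_in_time_range(index, 100, 200))  # ['row1', 'row2']
```

The returned rows act as the "data location pointers". A follow-up get per row fetches the actual cells.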

The only downside is that, by default, HBase doesn't create or
maintain indexes for you. Some people are happily using the
Indexed/Transactional contrib, and others like the Indexed HBase
contrib. The key difference is that the former uses a transactional
extension to keep secondary index tables up to date, while the latter
uses in-memory indexes that are built at run time to speed up access.

On Wed, May 5, 2010 at 1:24 AM, TuX RaceR <tuxrace...@gmail.com> wrote:
> Seraph Imalia wrote:
>>
>> Hi Ryan,
>>
>> Thanks for your response - I am also working on this project.
>>
>> I was hoping that HBase perhaps treated the time range differently, which
>> would prevent a full table scan. I suppose our only next option is to
>> implement indexing?
>
> Yes, I would say so, unless a time-based key can naturally identify a
> record, or you will always retrieve your records using time queries.
> In that case you could create a key that is a concatenation of a
> timestamp and your old SQL uid.
>
> cheers
> TuX
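TuX's composite-key suggestion can be sketched as follows. The sketch assumes millisecond timestamps that are non-negative and fit in 8 bytes. HBase sorts row keys lexicographically as bytes. Encoding the timestamp as a fixed-width big-endian prefix therefore makes key order match time order. A time query then becomes a start row (inclusive) and a stop row (exclusive), much like the row range of an HBase Scan. The helper names are hypothetical, and the sorted Python list stands in for the table.

```python
import struct
from bisect import bisect_left


def composite_key(ts_millis, uid):
    """Hypothetical row key: 8-byte big-endian timestamp, then the old SQL uid."""
    return struct.pack(">Q", ts_millis) + uid.encode("utf-8")


def decode_key(key):
    """Split a composite key back into (timestamp, uid)."""
    (ts_millis,) = struct.unpack(">Q", key[:8])
    return ts_millis, key[8:].decode("utf-8")


def range_scan(sorted_keys, min_ts, max_ts):
    """Keys whose timestamp is in [min_ts, max_ts).

    Every key for a given timestamp starts with the same 8-byte prefix, so the
    matching keys form one contiguous run between start_row (inclusive) and
    stop_row (exclusive); no other keys are visited.
    """
    start_row = struct.pack(">Q", min_ts)
    stop_row = struct.pack(">Q", max_ts)
    i = bisect_left(sorted_keys, start_row)
    result = []
    while i < len(sorted_keys) and sorted_keys[i] < stop_row:
        result.append(sorted_keys[i])
        i += 1
    return result


keys = sorted(composite_key(ts, uid) for ts, uid in
              [(1000, "42"), (2000, "7"), (1500, "99"), (3000, "1")])
print([decode_key(k) for k in range_scan(keys, 1000, 2000)])
# [(1000, '42'), (1500, '99')]
```

One trade-off is worth knowing. With a timestamp prefix, new writes all land at the end of the key space, so a single region tends to take the whole write load.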
