I'm not 100% certain yet, but from what I am told, an extra table does get created to store the index. Will know more after writing the code.
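
For readers following along, here is a rough sketch of what "an extra table to store the index" could look like. It is not HBase code and not necessarily how any HBase indexing package lays out its data: plain Python dicts stand in for tables, and the key layout (indexed value, a separator, then the main row key), the `SEP` constant, and the function names are all assumptions made for illustration.

```python
# Minimal in-memory sketch of a secondary index kept in a separate table.
# Plain dicts stand in for tables; no HBase API is used here.

SEP = "\x00"  # hypothetical separator between indexed value and row key


def index_key(value, row_key):
    """Index-table key: indexed value first, so keys sort by that value."""
    return f"{value}{SEP}{row_key}"


def build_index(main_table, field):
    """One-off pass over the main table (the slow step on a large table)."""
    index = {}
    for row_key, row in main_table.items():
        if field in row:
            index[index_key(row[field], row_key)] = row_key
    return index


def lookup(index, value):
    """Return main-table row keys whose indexed field equals `value`."""
    prefix = f"{value}{SEP}"
    return [rk for k, rk in sorted(index.items()) if k.startswith(prefix)]


# Hypothetical main table keyed by GUID-like row keys.
main_table = {
    "guid-1": {"A": "red", "B": "10"},
    "guid-2": {"A": "blue", "B": "10"},
    "guid-3": {"A": "red", "B": "20"},
}

index_on_a = build_index(main_table, "A")
print(lookup(index_on_a, "red"))
```

In this sketch each indexed field gets its own index structure, so indexing on A and B means calling `build_index` twice. Whether the real implementation uses one table per index is exactly the open question in this thread.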

On 05 May 2010, at 5:11 PM, Kevin Apte wrote:

If you add secondary indexing, where does the index get stored? Is there a separate set of files for every index? For example, if I index on fields A,
B, C and D, will there be a separate set of files for each of the 4 indices?

Kevin



On Wed, May 5, 2010 at 8:31 PM, Seraph Imalia <ser...@eisp.co.za> wrote:

Yeah, that is exactly why we are using GUID for the row key :)

Michelan is busy writing code to add secondary indexing. The table is about 200 GB, so it's going to take a while to run, but it looks like the
only option we have.


On 05 May 2010, at 10:29 AM, TuX RaceR wrote:

Also be aware that with a time-based key you will probably create 'hot
spots', i.e. each node in turn will take all the load at write time,
and possibly at read time too, if you query only recent data.
But I do not see any way to avoid that, as you do need a scanner.
cheers
TuX
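
To make the hot-spot point concrete, here is a simplified simulation. It assumes a fixed set of four regions divided at hypothetical split points (`SPLITS`), which is a simplification, since a real cluster's regions are not fixed like this. It only shows that keys arriving in sorted time order all land in the same key range, while GUID keys spread out.

```python
import bisect
import uuid
from collections import Counter

# Hypothetical split points dividing a hex key space into four regions.
SPLITS = ["4", "8", "c"]


def region_for(key):
    """Index of the region whose key range contains `key`."""
    return bisect.bisect_right(SPLITS, key)


def load_per_region(keys):
    """Number of writes each region would receive for the given keys."""
    counts = Counter(region_for(k) for k in keys)
    return [counts.get(i, 0) for i in range(len(SPLITS) + 1)]


# Time-based keys: consecutive millisecond timestamps, zero-padded hex
# so that lexicographic order matches time order.
start_ms = 1273000000000  # arbitrary starting timestamp
time_keys = [format(start_ms + i, "016x") for i in range(1000)]

# GUID keys: random, so the leading characters are spread over the key space.
guid_keys = [uuid.uuid4().hex for _ in range(1000)]

print("time-based:", load_per_region(time_keys))
print("guid:      ", load_per_region(guid_keys))
```

The time-based keys all fall into a single region in this model, whereas the GUID keys are split roughly evenly. That even spread is also why GUID keys alone cannot serve a time-range scan.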


TuX RaceR wrote:

Seraph Imalia wrote:

Hi Ryan,

Thanks for your response - I am also working on this project.

I was hoping that HBase perhaps treated the time range differently, in a way that would prevent a full table scan. I suppose our only remaining option is to
implement indexing?


Yes, I would say so, unless a time-based key can naturally identify a record, or you will always retrieve your records using time queries. In that case you could create a key that is the concatenation of a timestamp and
your old SQL uid.

cheers
TuX
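
A small sketch of the timestamp-plus-uid key idea, for illustration only. It assumes the SQL uid is a non-negative integer and that timestamps are in milliseconds. The field widths, the `-` separator, and the function names are hypothetical. A sorted Python list stands in for the table's sorted row keys, and `scan` mimics a start-row/stop-row range scan without using any HBase API.

```python
import bisect


def make_key(ts_millis, uid):
    """Composite row key: zero-padded timestamp, then zero-padded uid.

    Zero-padding makes lexicographic order match numeric (time) order.
    """
    return f"{ts_millis:013d}-{uid:010d}"


def scan(sorted_keys, start_ms, stop_ms):
    """Return keys with start_ms <= timestamp < stop_ms.

    Only the slice between the two bounds is touched, not the whole list.
    """
    lo = bisect.bisect_left(sorted_keys, f"{start_ms:013d}")
    hi = bisect.bisect_left(sorted_keys, f"{stop_ms:013d}")
    return sorted_keys[lo:hi]


# Hypothetical (timestamp, uid) pairs; two records share a timestamp,
# and the uid suffix keeps their keys distinct.
rows = [
    (1273046400000, 42),
    (1273046400000, 7),
    (1273050000000, 99),
    (1273132800000, 3),
]
keys = sorted(make_key(ts, uid) for ts, uid in rows)

in_range = scan(keys, 1273046400000, 1273100000000)
print(in_range)
```

Because the timestamp leads the key, a time-range query becomes a contiguous slice of the sorted keys. The cost is the write hot-spotting mentioned elsewhere in this thread.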








