Re: Improving HBase scanner

Seraph Imalia Wed, 05 May 2010 09:24:21 -0700

I'm not 100% certain yet, but from what I am told, an extra table does get created to store the index. Will know more after writing the code.
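
To make the "extra table" idea concrete, here is a minimal sketch that uses plain Python dicts to stand in for the main table and the index table. It is not the HBase API and may not match what the indexing code ends up doing; the field name `created` and the helper names `put` and `find_by_time` are assumptions made up for illustration.

```python
# Hypothetical stand-ins for two tables: the real data and its index.
main_table = {}   # row key (GUID) -> record
index_table = {}  # index key (timestamp + row key) -> row key in main_table


def put(row_key, record):
    """Write the record, then write a matching entry to the index table."""
    main_table[row_key] = record
    # Zero-padded timestamp first, so index keys sort chronologically as text.
    index_key = f"{record['created']:013d}-{row_key}"
    index_table[index_key] = row_key


def find_by_time(start, stop):
    """Return records with start <= created < stop by walking the index only."""
    lo, hi = f"{start:013d}", f"{stop:013d}"
    hits = []
    for index_key in sorted(index_table):  # a real sorted table would seek to lo
        if lo <= index_key < hi:
            hits.append(main_table[index_table[index_key]])
    return hits
```

The point of the sketch is only that a time query touches the index entries for the range and then fetches the matching rows by key, instead of reading every row of the main table.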


On 05 May 2010, at 5:11 PM, Kevin Apte wrote:

If you add secondary indexing, where does the index get stored? Is there a separate set of files for every index? For example, if I index on fields A,
B, C and D, will there be a separate set of files for each of the 4 indices?

Kevin
On Wed, May 5, 2010 at 8:31 PM, Seraph Imalia <ser...@eisp.co.za> wrote:
Yeah, that is exactly why we are using GUID for the row key :)
Michelan is busy writing code to add secondary indexing. The table is about 200 GB, so it's going to take a while to run, but it looks like the
only option we have.


On 05 May 2010, at 10:29 AM, TuX RaceR wrote:
Also be aware that with a time-based key you will probably create 'hot
spots', i.e. the nodes will take all the load one after the other at write
time, and possibly at read time too, if you query only recent data.
But I do not see any way to avoid that, as you do need a scanner,
cheers
TuX
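
The hot-spot effect can be illustrated with a small sketch. HBase keeps rows sorted by key and each region serves one contiguous key range, so keys that start with the current time all fall into the same range. The split points and helper names below are hypothetical, chosen only for the illustration; a real table's region boundaries will differ.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical region split points (13-digit millisecond timestamps as bytes).
SPLITS = [b"1273000000000", b"1273050000000", b"1273100000000"]


def region_for(row_key: bytes) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect_right(SPLITS, row_key)


def write_distribution(keys):
    """Count how many writes land in each region."""
    return Counter(region_for(k) for k in keys)


# 1000 writes arriving one millisecond apart, keyed by time.
recent_keys = [f"{1273100000000 + i:013d}".encode("ascii") for i in range(1000)]
distribution = write_distribution(recent_keys)
```

In this sketch every one of the 1000 writes lands in the last region, which is the "one node at a time takes all the load" pattern described above.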


TuX RaceR wrote:
Seraph Imalia wrote:
Hi Ryan,

Thanks for your response - I am also working on this project.
I was hoping that HBase perhaps treated the time range differently, which would prevent a full table scan. I suppose our only next option is to
implement indexing?
Yes, I would say so, except if a time-based key can naturally identify a record, or if you will always retrieve your records using time queries. In that case you could create a key which is a concatenation of a timestamp and
your old SQL uid.

cheers
TuX
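
A minimal sketch of the timestamp-plus-uid key suggested above, assuming a zero-padded millisecond timestamp as the prefix so that byte order matches time order. The function names, the `-` separator and the 13-digit width are assumptions for illustration, not anything prescribed by HBase.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(ts: datetime) -> int:
    """Whole milliseconds since the Unix epoch, using integer arithmetic."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def make_row_key(ts: datetime, uid: str) -> bytes:
    """Row key = fixed-width timestamp, a separator, then the old SQL uid."""
    return f"{to_millis(ts):013d}-{uid}".encode("ascii")


def time_range_bounds(start: datetime, stop: datetime):
    """Start (inclusive) and stop (exclusive) keys for scanning a time range."""
    return (f"{to_millis(start):013d}".encode("ascii"),
            f"{to_millis(stop):013d}".encode("ascii"))
```

With keys built this way, a time query becomes a scan between the two bounds rather than a full table scan, at the cost of the hot-spot behaviour mentioned elsewhere in this thread.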
