Re: Improving HBase scanner

Kevin Apte Wed, 05 May 2010 08:12:28 -0700

If you add secondary indexing, where does the index get stored?  Is there a
separate set of files for every index? For example, if I index on fields A,
B, C and D, will there be a separate set of files for each of the 4 indices?


Kevin



On Wed, May 5, 2010 at 8:31 PM, Seraph Imalia <ser...@eisp.co.za> wrote:

> Yeah, that is exactly why we are using GUID for the row key :)
>
> Michelan is busy writing code to add secondary indexing - the table is
> about 200 Gigs big so it's gonna take a while to run, but it looks like the
> only option we have.
>
>
> On 05 May 2010, at 10:29 AM, TuX RaceR wrote:
>
>> Also be aware that with a time-based key you will probably create 'hot
>> spots', i.e. the nodes will take all the load one after the other at write
>> time, and possibly at read time too, if you query only recent data.
>> But I do not see any way to avoid that, as you do need a scanner,
>> cheers
>> TuX
>>
>>
>> TuX RaceR wrote:
>>
>>> Seraph Imalia wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> Thanks for your response - I am also working on this project.
>>>>
>>>> I was hoping that HBase perhaps treated the time range in a way that
>>>> would avoid a full table scan.  I suppose our only remaining option is
>>>> to implement indexing?
>>>>
>>>
>>> Yes, I would say so, unless a time-based key can naturally identify a
>>> record, or you will always retrieve your records using time queries.
>>> In that case you could create a key which is a concatenation of a
>>> timestamp and your old SQL uid,
>>>
>>> cheers
>>> TuX
>>>
>>>
>>>
>>>
>>
>
>
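
To make the composite-key suggestion quoted above concrete, here is a rough
sketch of how a "timestamp + uid" row key could be built, and how a time query
then becomes a start/stop row range instead of a full table scan. It is only an
illustration: the function names, the 8-byte big-endian timestamp layout and
the dict standing in for a sorted table are all assumptions, not HBase client
code and not anything taken from the project discussed in this thread.

```python
import struct


def make_row_key(timestamp_ms, uid):
    """Hypothetical helper: fixed-width timestamp followed by the old uid."""
    # 8-byte big-endian, so byte-wise (lexicographic) order matches time order.
    return struct.pack(">Q", timestamp_ms) + uid.encode("utf-8")


def time_range_bounds(start_ms, stop_ms):
    """Start (inclusive) and stop (exclusive) row keys for [start_ms, stop_ms)."""
    return struct.pack(">Q", start_ms), struct.pack(">Q", stop_ms)


def scan(table, start_row, stop_row):
    """Stand-in for a range scan: a dict plays the role of the sorted table."""
    return [(k, v) for k, v in sorted(table.items()) if start_row <= k < stop_row]


# Toy data: three records keyed by (timestamp, uid).
table = {
    make_row_key(1000, "uid-a"): "first",
    make_row_key(2000, "uid-b"): "second",
    make_row_key(3000, "uid-c"): "third",
}

start_row, stop_row = time_range_bounds(1500, 3000)
for key, value in scan(table, start_row, stop_row):
    print(key.hex(), value)
```

Because the timestamp leads the key, rows written at about the same time sort
next to each other. That is what makes the range scan cheap, and it is also the
reason for the 'hot spot' effect mentioned above.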
