Re: Indexed Table in Hbase

bharath vissapragada Mon, 17 Aug 2009 10:15:25 -0700

Generally one may expect that, apart from the rowkey, other columns can have
repeated values, and that is the case with my application.
In the API, there seems to be no function that handles that case.


If anyone else knows more about it or has faced the same situation, kindly reply.

Thanks.


On Mon, Aug 17, 2009 at 10:30 PM, Jonathan Gray <jl...@streamy.com> wrote:

> I'm actually unsure about that.  Look at the code or experiment.
>
> Seems to me that there would be a uniqueness requirement, otherwise what do
> you expect the behavior to be?  A get can only return a single row, so
> multiple index hits don't really make sense.
>
> Clint?  You out there? :)
>
> JG
>
>
> bharath vissapragada wrote:
>
>> I got it ... I think this is definitely useful in my app because I am
>> performing a full table scan every time to select the rowkeys based on
>> some column values.
>>
>> BUT ..
>>
>> we can have more than one rowkey for the same column value. Can you please
>> tell me how they are stored?
>>
>> Thanks in advance
>>
>> On Mon, Aug 17, 2009 at 9:27 PM, Jonathan Gray <jl...@streamy.com> wrote:
>>
>>> It's not an actual hash or btree index, but rather secondary indexes in
>>> HBase are implemented by creating an additional HBase table.
>>>
>>> If I have a table "users" (row key is userid) with family "data" and column
>>> "email", and I want to index the value in that column...
>>>
>>> I can create a table "users_email" where the row key is the email address
>>> (value from the column in "users" table) and a single column that contains
>>> the userid.
>>>
>>> Doing an "index lookup" would mean doing a get on "users_email" and then
>>> using that userid to do a lookup on the "users" table.
>>>
>>> IndexedTable does this transparently, but still does require two queries.
>>> So it's slower than a single query, but certainly faster than a full table
>>> scan.
>>>
>>> If you need hash-level performance on the index lookup, there are lots of
>>> solutions outside of HBase that would work... In-memory Java HashMap, Tokyo
>>> Cabinet on-disk HashMaps, BerkeleyDB, etc... If you need full-text indexing,
>>> you can use Lucene or the like.
>>>
>>> Make sense?
>>>
>>> JG
>>>
>>>
>>> bharath vissapragada wrote:
>>>
>>>> But I have read somewhere that secondary indexes are somewhat slow compared
>>>> to normal HBase tables. Does that affect the performance?
>>>>
>>>> Also, do you know the type of index created on the column (I mean hash type
>>>> or btree, etc.)?
>>>>
>>>> On Mon, Aug 17, 2009 at 8:30 PM, Kirill Shabunov <e2...@yahoo.com>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> As far as I understand, you are talking about the secondary indexes. Yes,
>>>>> they can be used to quickly get the rowkey by a value in the indexed
>>>>> column.
>>>>>
>>>>> --Kirill
>>>>>
>>>>>
>>>>> bharath vissapragada wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have gone through the IndexedTableAdmin classes in the HBase 0.19.3 API.
>>>>>> I have seen some methods used to create an indexed table (on some column).
>>>>>> I have some doubts regarding the same ...
>>>>>>
>>>>>> 1) Are these somewhat similar to hash indexes (in an RDBMS), where I can
>>>>>> easily look up a column value and find its corresponding rowkey(s)?
>>>>>> 2) Can I expect any performance gain when I use IndexedTable to search for
>>>>>> a particular column value, instead of scanning an entire normal HTable?
>>>>>>
>>>>>> Kindly clarify my doubts
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>
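
The two-table lookup described in the quoted messages above ("users" plus a
"users_email" index table) can be sketched roughly as follows. This is only
an in-memory illustration, not the IndexedTable implementation: the class and
method names are hypothetical, plain Java maps stand in for the two HBase
tables, and the composite index row key (email + separator + userid) is an
assumption made here as one possible way to keep several rowkeys for the same
column value. The thread itself does not settle how IndexedTable handles
duplicates. It also assumes email values never contain the separator
character.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SecondaryIndexSketch {
    // Stand-in for the "users" table: row key (userid) -> value of data:email.
    static final Map<String, String> users = new HashMap<>();

    // Stand-in for the "users_email" index table, kept sorted by row key.
    // Assumption: index row key = email + SEP + userid, value = userid,
    // so two users with the same email get two distinct index rows.
    static final TreeMap<String, String> usersEmail = new TreeMap<>();

    static final char SEP = '\u0000';

    // Write the data row and keep the index row in sync with it.
    static void put(String userid, String email) {
        String oldEmail = users.put(userid, email);
        if (oldEmail != null) {
            usersEmail.remove(oldEmail + SEP + userid); // drop the stale index row
        }
        usersEmail.put(email + SEP + userid, userid);
    }

    // "Index lookup": first query the index table, then the main table.
    static List<String> lookupByEmail(String email) {
        List<String> userids = new ArrayList<>();
        String start = email + SEP;               // first possible key for this email
        String stop = email + (char) (SEP + 1);   // just past the last such key
        // Query 1: range scan over the index rows that share this email prefix.
        for (String userid : usersEmail.subMap(start, stop).values()) {
            // Query 2: fetch the row from "users" and confirm it still matches.
            if (email.equals(users.get(userid))) {
                userids.add(userid);
            }
        }
        return userids;
    }

    public static void main(String[] args) {
        put("u1", "a@example.com");
        put("u2", "b@example.com");
        put("u3", "a@example.com");
        System.out.println(lookupByEmail("a@example.com")); // prints [u1, u3]
    }
}
```

With a unique column value the range scan returns at most one index row, which
matches the single get on "users_email" described above. With repeated values
it returns one index row per matching rowkey, at the cost of a short scan
instead of a single get.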
