> However, I'd only recommend using a secondary index as a last resort.
> First I'd try doing everything I can to work with the index I get for
> free: the row key. It sounds like you have done this already...
--
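"Working with the row key" here means bounding scans by key range instead of reaching for an index. A minimal sketch of that, assuming the user + date composite key described in the reply below and a purely hypothetical "games" table with an "info" family (all names illustrative, not from the thread):

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanByUser {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable("games");  // hypothetical table name
        // Row keys sort as user + date, so one user's January 2010 rows form a
        // contiguous range that a bounded scan covers without any extra index.
        Scan scan = new Scan(Bytes.toBytes("user123" + "20100101"),
                             Bytes.toBytes("user123" + "20100201"));
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            byte[] gameId = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("gameid"));
            // ... use gameId and the other columns ...
          }
        } finally {
          scanner.close();
        }
      }
    }

A scan like this answers "all games for a user over a date range", but not the "by gameid" question raised below, which is what pushes the discussion toward secondary indexes.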
The only reason why this is important to me is because of the following:

1. I am storing, at a minimum, one year's worth of data (small rows -- 10 billion of them).
2. The row key is user + date (columns: gameid, opponent, etc.).
3. Queries may be something like "give me the details for a particular gameid".
4. To do step 3 I am assuming I need something like a secondary index -- otherwise, given my row key, how else can I do it?

-----Original Message-----
From: Daniel Washusen [mailto:[email protected]]
Sent: Sunday, January 24, 2010 3:16 AM
To: [email protected]
Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845

Well, it CAN be a RAM hog ;-). It depends what you're indexing. Each
unique value in the indexed column resides in memory. If you index a
column that contains 1 million random 1KB values then the index will
require at least 1GB of memory. Also it *can* slow down writes,
especially when bulk loading sequential keys. On the up side, it can
make scans dramatically faster.

However, I'd only recommend using a secondary index as a last resort.
First I'd try doing everything I can to work with the index I get for
free: the row key. It sounds like you have done this already...

Cheers,
Dan

On 24/01/2010, at 7:02 AM, Stack <[email protected]> wrote:

> On Sat, Jan 23, 2010 at 2:52 AM, Sriram Muthuswamy Chittathoor
> <[email protected]> wrote:
>> Thanks all. I messed it up when I was trying to upgrade to 0.20.3.
>> I deleted the data directory and formatted it, thinking it would
>> reset the whole cluster.
>>
>> I started fresh by deleting the data directory on all the nodes and
>> then everything worked. I was also able to create the indexed
>> table using the 0.20.3 patch. Let me run some tests on a few
>> million rows and see how it holds up.
>>
>> BTW -- what would be the right way when I move versions? Do I
>> run migrate scripts to migrate the data to newer versions?
>>
> Just install the new binaries everywhere and restart, or perform a
> rolling restart -- see http://wiki.apache.org/hadoop/Hbase/RollingRestart --
> if you would avoid taking down your cluster during the upgrade.
>
> You'll be flagged on start if you need to run a migration, but the
> general rule is that there should never be a need for a migration
> between patch releases, e.g. between 0.20.2 and 0.20.3. There may be
> a need for migrations when moving between minor numbers, e.g. from
> 0.19 to 0.20.
>
> Let us know how IHBase (indexed hbase) works out for you. It's a RAM
> hog, but the speed improvement finding matching cells can be startling.
>
> St.Ack
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
>> Sent: Saturday, January 23, 2010 5:00 AM
>> To: [email protected]
>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
>>
>> Check your master log. Something is seriously off if you do not have
>> a reachable .META. table.
>> St.Ack
>>
>> On Fri, Jan 22, 2010 at 1:09 PM, Sriram Muthuswamy Chittathoor
>> <[email protected]> wrote:
>>> I applied the hbase-0.20.3 version / hadoop 0.20.1, but after starting
>>> hbase I keep getting the error below when I go to the hbase shell:
>>>
>>> [ppo...@karisimbivir1 hbase-0.20.3]$ ./bin/hbase shell
>>> HBase Shell; enter 'help<RETURN>' for list of supported commands.
>>> Version: 0.20.3, r900041, Sat Jan 16 17:20:21 PST 2010
>>> hbase(main):001:0> list
>>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>>> Trying to contact region server null for region , row '', but failed
>>> after 7 attempts.
>>> Exceptions:
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>>
>>> Also when I try to create a table programmatically I get this:
>>>
>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Attempting connection to
>>> server localhost/127.0.0.1:2181
>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Priming connection to
>>> java.nio.channels.SocketChannel[connected local=/127.0.0.1:43775
>>> remote=localhost/127.0.0.1:2181]
>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Server connection successful
>>> Exception in thread "main" org.apache.hadoop.hbase.TableNotFoundException: .META.
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:684)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:638)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:128)
>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:106)
>>>     at test.CreateTable.main(CreateTable.java:36)
>>>
>>> Any clues?
>>>
>>> -----Original Message-----
>>> From: Dan Washusen [mailto:[email protected]]
>>> Sent: Friday, January 22, 2010 4:53 AM
>>> To: [email protected]
>>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
>>>
>>> If you want to give the "indexed" contrib package a try you'll need
>>> to do the following:
>>>
>>> 1. Include the contrib jars (export HBASE_CLASSPATH=(`find
>>>    /path/to/hbase/hbase-0.20.3/contrib/indexed -name '*jar' | tr -s "\n" ":"`)
>>> 2. Set the 'hbase.hregion.impl' property to
>>>    'org.apache.hadoop.hbase.regionserver.IdxRegion' in your hbase-site.xml
>>>
>>> Once you've done that you can create a table with an index using:
>>>
>>>> // define which qualifiers need an index (choosing the correct type)
>>>> IdxColumnDescriptor columnDescriptor = new IdxColumnDescriptor("columnFamily");
>>>> columnDescriptor.addIndexDescriptor(
>>>>   new IdxIndexDescriptor("qualifier", IdxQualifierType.BYTE_ARRAY)
>>>> );
>>>>
>>>> HTableDescriptor tableDescriptor = new HTableDescriptor("table");
>>>> tableDescriptor.addFamily(columnDescriptor);
>>>
>>> Then when you want to perform a scan with an index hint:
>>>
>>>> Scan scan = new IdxScan(
>>>>   new Comparison("columnFamily", "qualifier",
>>>>     Comparison.Operator.EQ, Bytes.toBytes("foo"))
>>>> );
>>>
>>> You have to keep in mind that the index hint is only a hint. It
>>> guarantees that your scan will get all rows that match the hint, but
>>> you'll more than likely receive rows that don't. For this reason I'd
>>> suggest that you also include a filter along with the scan:
>>>
>>>> Scan scan = new IdxScan(
>>>>   new Comparison("columnFamily", "qualifier",
>>>>     Comparison.Operator.EQ, Bytes.toBytes("foo"))
>>>> );
>>>> scan.setFilter(
>>>>   new SingleColumnValueFilter(
>>>>     "columnFamily", "qualifier", CompareFilter.CompareOp.EQUAL,
>>>>     new BinaryComparator("foo")
>>>>   )
>>>> );
>>>
>>> Cheers,
>>> Dan
>>>
>>> 2010/1/22 stack <[email protected]>
>>>
>>>> http://people.apache.org/~jdcryans/hbase-0.20.3-candidate-2/
>>>>
>>>> There is a bit of documentation if you look at the javadoc for the
>>>> 'indexed' contrib (this is what hbase-2073 is called on commit).
>>>>
>>>> St.Ack
>>>>
>>>> P.S. We had a thread going named "HBase bulk load". You got all the
>>>> answers you need on that one?
>>>>
>>>> On Thu, Jan 21, 2010 at 11:19 AM, Sriram Muthuswamy Chittathoor
>>>> <[email protected]> wrote:
>>>>>
>>>>> Great. Can I migrate to 0.20.3RC2 easily? I am on 0.20.2. Can you
>>>>> pass me the link?
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of stack
>>>>> Sent: Friday, January 22, 2010 12:42 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
>>>>>
>>>>> IIRC, hbase-1845 was a sketch only and not yet complete. It's
>>>>> probably rotted since, anyways.
>>>>>
>>>>> Have you looked at hbase-2037, since committed and available in
>>>>> 0.20.3RC2? Would this help you with your original problem?
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Thu, Jan 21, 2010 at 9:10 AM, Sriram Muthuswamy Chittathoor <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I tried applying the patch to the hbase source code (hbase 0.20.2)
>>>>>> and I get the errors below. Do you know if this needs to be applied
>>>>>> to a specific hbase version? Is there a version which works with
>>>>>> 0.20.2 or later?
>>>>>> Basically the HRegionServer and HTable patching fails.
>>>>>>
>>>>>> Thanks for the help
>>>>>>
>>>>>> patch -p0 -i batch.patch
>>>>>>
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Get.java
>>>>>> Hunk #1 succeeded at 61 (offset 2 lines).
>>>>>> Hunk #2 succeeded at 347 (offset 31 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnection.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
>>>>>> Hunk #3 succeeded at 1244 (offset 6 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HTable.java
>>>>>> Hunk #2 succeeded at 73 (offset 8 lines).
>>>>>> Hunk #4 FAILED at 405.
>>>>>> Hunk #5 succeeded at 671 with fuzz 2 (offset 26 lines).
>>>>>> 1 out of 5 hunks FAILED -- saving rejects to file
>>>>>> src/java/org/apache/hadoop/hbase/client/HTable.java.rej
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Multi.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiCallable.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiResult.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Row.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
>>>>>> Hunk #2 succeeded at 156 with fuzz 1 (offset 3 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
>>>>>> Hunk #2 succeeded at 247 (offset 2 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
>>>>>> Hunk #1 succeeded at 78 (offset -1 lines).
>>>>>> Hunk #2 FAILED at 2515.
>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file
>>>>>> src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
>>>>>> patching file src/test/org/apache/hadoop/hbase/client/TestHTable.java
>>>>>> Hunk #2 FAILED at 333.
>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file
>>>>>> src/test/org/apache/hadoop/hbase/client/TestHTable.java.rej
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Marc Limotte [mailto:[email protected]]
>>>>>> Sent: Tuesday, January 19, 2010 10:26 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Support for MultiGet / SQL In clause
>>>>>>
>>>>>> Sriram,
>>>>>>
>>>>>> Would a secondary index help you:
>>>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
>>>>>>
>>>>>> The index is stored in a separate table, but the index is managed
>>>>>> for you.
>>>>>>
>>>>>> I don't think you can do an arbitrary "in" query, though. If the
>>>>>> keys that you want to include in the "in" are reasonably close
>>>>>> neighbors, you could do a scan and skip the ones that are
>>>>>> uninteresting. You could also try a batch Get by applying a
>>>>>> separate patch; see http://issues.apache.org/jira/browse/HBASE-1845.
>>>>>>
>>>>>> Marc Limotte
>>>>>>
>>>>>> On Tue, Jan 19, 2010 at 8:45 AM, Sriram Muthuswamy Chittathoor <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Is there any support for this? I want to do the following:
>>>>>>>
>>>>>>> 1. Create a second table to maintain a mapping between the
>>>>>>> secondary column and the row ids of the primary table.
>>>>>>>
>>>>>>> 2. Use this second table to get the row ids to look up from the
>>>>>>> primary table using a SQL IN-like clause.
>>>>>>>
>>>>>>> Basically I am doing this to speed up querying by non-row-key
>>>>>>> columns.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Sriram C
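For reference, the two-table scheme described in the original question above can be sketched as follows. The table names ("games" for the primary table, "games_by_gameid" for the hand-rolled index) and the layout of the index rows (one qualifier per matching primary row key) are assumptions for illustration only, and since a batch get is exactly the gap HBASE-1845 was sketching a fix for, the final lookups here are issued one Get at a time:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LookupByGameId {
      public static void main(String[] args) throws IOException {
        // Index table (hypothetical): row key = gameid, one qualifier per
        // primary-table row key that contains this game.
        HTable index = new HTable("games_by_gameid");
        Result mapping = index.get(new Get(Bytes.toBytes("game42")));

        List<byte[]> primaryKeys = new ArrayList<byte[]>();
        if (!mapping.isEmpty()) {
          for (KeyValue kv : mapping.raw()) {
            primaryKeys.add(kv.getQualifier());
          }
        }

        // Primary table: emulate "IN (...)" with one Get per matching row key.
        HTable games = new HTable("games");
        for (byte[] rowKey : primaryKeys) {
          Result game = games.get(new Get(rowKey));
          // ... use game ...
        }
      }
    }

The application has to maintain the index rows itself on every write (a Put to "games" paired with a Put to "games_by_gameid"); the tableindexed contrib Marc mentions manages that bookkeeping for you.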
