> However, I'd only recommend using a secondary index as a last resort.
> First I'd try doing everything I can to work with the index I get for
> free: the row key. It sounds like you have done this already...
--
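"Working with the row key" here means bounding scans by key range instead of reaching for an index. A minimal sketch of that, assuming the user + date composite key described in the reply below and a purely hypothetical "games" table with an "info" family (all names illustrative, not from the thread):

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanByUser {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable("games");  // hypothetical table name
        // Row keys sort as user + date, so one user's January 2010 rows form a
        // contiguous range that a bounded scan covers without any extra index.
        Scan scan = new Scan(Bytes.toBytes("user123" + "20100101"),
                             Bytes.toBytes("user123" + "20100201"));
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            byte[] gameId = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("gameid"));
            // ... use gameId and the other columns ...
          }
        } finally {
          scanner.close();
        }
      }
    }

A scan like this answers "all games for a user over a date range", but not the "by gameid" question raised below, which is what pushes the discussion toward secondary indexes.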
The only reason why this is important to me is because of the following:

1. I am storing, at a minimum, one year's worth of data (small rows -- 10 billion of them).
2. The row key is user + date (columns: gameid, opponent, etc.).
3. Queries may be something like "give me the details for a particular gameid".
4. To do step 3 I am assuming I need something like a secondary index -- otherwise, given my row key, how else can I do it?

-----Original Message-----
From: Daniel Washusen [mailto:[email protected]]
Sent: Sunday, January 24, 2010 3:16 AM
To: [email protected]
Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845

Well, it CAN be a RAM hog ;-). It depends what you're indexing. Each
unique value in the indexed column resides in memory. If you index a
column that contains 1 million random 1KB values then the index will
require at least 1GB of memory. Also it *can* slow down writes,
especially when bulk loading sequential keys. On the up side, it can
make scans dramatically faster.

However, I'd only recommend using a secondary index as a last resort.
First I'd try doing everything I can to work with the index I get for
free: the row key. It sounds like you have done this already...

Cheers,
Dan

On 24/01/2010, at 7:02 AM, Stack <[email protected]> wrote:

> On Sat, Jan 23, 2010 at 2:52 AM, Sriram Muthuswamy Chittathoor
> <[email protected]> wrote:
>> Thanks all. I messed it up when I was trying to upgrade to 0.20.3.
>> I deleted the data directory and formatted it, thinking it would
>> reset the whole cluster.
>>
>> I started fresh by deleting the data directory on all the nodes and
>> then everything worked. I was also able to create the indexed
>> table using the 0.20.3 patch. Let me run some tests on a few
>> million rows and see how it holds up.
>>
>> BTW -- what would be the right way when I move versions? Do I
>> run migrate scripts to migrate the data to newer versions?
>>
> Just install the new binaries everywhere and restart, or perform a
> rolling restart -- see http://wiki.apache.org/hadoop/Hbase/RollingRestart --
> if you would avoid taking down your cluster during the upgrade.
>
> You'll be flagged on start if you need to run a migration, but the
> general rule is that there should never be a need for a migration
> between patch releases, e.g. between 0.20.2 and 0.20.3. There may be
> a need for migrations when moving between minor numbers, e.g. from
> 0.19 to 0.20.
>
> Let us know how IHBase (indexed hbase) works out for you. It's a RAM
> hog, but the speed improvement finding matching cells can be startling.
>
> St.Ack
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
>> Sent: Saturday, January 23, 2010 5:00 AM
>> To: [email protected]
>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
>>
>> Check your master log. Something is seriously off if you do not have
>> a reachable .META. table.
>> St.Ack
>>
>> On Fri, Jan 22, 2010 at 1:09 PM, Sriram Muthuswamy Chittathoor
>> <[email protected]> wrote:
>>> I applied the hbase-0.20.3 version / hadoop 0.20.1, but after starting
>>> hbase I keep getting the error below when I go to the hbase shell:
>>>
>>> [ppo...@karisimbivir1 hbase-0.20.3]$ ./bin/hbase shell
>>> HBase Shell; enter 'help<RETURN>' for list of supported commands.
>>> Version: 0.20.3, r900041, Sat Jan 16 17:20:21 PST 2010
>>> hbase(main):001:0> list
>>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>>> Trying to contact region server null for region , row '', but failed
>>> after 7 attempts.
>>> Exceptions:
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
>>>
>>> Also when I try to create a table programmatically I get this:
>>>
>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Attempting connection to
>>> server localhost/127.0.0.1:2181
>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Priming connection to
>>> java.nio.channels.SocketChannel[connected local=/127.0.0.1:43775
>>> remote=localhost/127.0.0.1:2181]
>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Server connection successful
>>> Exception in thread "main" org.apache.hadoop.hbase.TableNotFoundException: .META.
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:684)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:638)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:128)
>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:106)
>>>     at test.CreateTable.main(CreateTable.java:36)
>>>
>>> Any clues?
>>>
>>> -----Original Message-----
>>> From: Dan Washusen [mailto:[email protected]]
>>> Sent: Friday, January 22, 2010 4:53 AM
>>> To: [email protected]
>>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
>>>
>>> If you want to give the "indexed" contrib package a try you'll need
>>> to do the following:
>>>
>>> 1. Include the contrib jars (export HBASE_CLASSPATH=(`find
>>>    /path/to/hbase/hbase-0.20.3/contrib/indexed -name '*jar' | tr -s "\n" ":"`)
>>> 2. Set the 'hbase.hregion.impl' property to
>>>    'org.apache.hadoop.hbase.regionserver.IdxRegion' in your hbase-site.xml
>>>
>>> Once you've done that you can create a table with an index using:
>>>
>>>> // define which qualifiers need an index (choosing the correct type)
>>>> IdxColumnDescriptor columnDescriptor = new IdxColumnDescriptor("columnFamily");
>>>> columnDescriptor.addIndexDescriptor(
>>>>   new IdxIndexDescriptor("qualifier", IdxQualifierType.BYTE_ARRAY)
>>>> );
>>>>
>>>> HTableDescriptor tableDescriptor = new HTableDescriptor("table");
>>>> tableDescriptor.addFamily(columnDescriptor);
>>>
>>> Then when you want to perform a scan with an index hint:
>>>
>>>> Scan scan = new IdxScan(
>>>>   new Comparison("columnFamily", "qualifier",
>>>>     Comparison.Operator.EQ, Bytes.toBytes("foo"))
>>>> );
>>>
>>> You have to keep in mind that the index hint is only a hint. It
>>> guarantees that your scan will get all rows that match the hint, but
>>> you'll more than likely receive rows that don't. For this reason I'd
>>> suggest that you also include a filter along with the scan:
>>>
>>>> Scan scan = new IdxScan(
>>>>   new Comparison("columnFamily", "qualifier",
>>>>     Comparison.Operator.EQ, Bytes.toBytes("foo"))
>>>> );
>>>> scan.setFilter(
>>>>   new SingleColumnValueFilter(
>>>>     "columnFamily", "qualifier", CompareFilter.CompareOp.EQUAL,
>>>>     new BinaryComparator("foo")
>>>>   )
>>>> );
>>>
>>> Cheers,
>>> Dan
>>>
>>> 2010/1/22 stack <[email protected]>
>>>
>>>> http://people.apache.org/~jdcryans/hbase-0.20.3-candidate-2/
>>>>
>>>> There is a bit of documentation if you look at the javadoc for the
>>>> 'indexed' contrib (this is what hbase-2073 is called on commit).
>>>>
>>>> St.Ack
>>>>
>>>> P.S. We had a thread going named "HBase bulk load". You got all the
>>>> answers you need on that one?
>>>>
>>>> On Thu, Jan 21, 2010 at 11:19 AM, Sriram Muthuswamy Chittathoor
>>>> <[email protected]> wrote:
>>>>>
>>>>> Great. Can I migrate to 0.20.3RC2 easily? I am on 0.20.2. Can you
>>>>> pass me the link?
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of stack
>>>>> Sent: Friday, January 22, 2010 12:42 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
>>>>>
>>>>> IIRC, hbase-1845 was a sketch only and not yet complete. It's
>>>>> probably rotted since, anyways.
>>>>>
>>>>> Have you looked at hbase-2037, since committed and available in
>>>>> 0.20.3RC2? Would this help you with your original problem?
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Thu, Jan 21, 2010 at 9:10 AM, Sriram Muthuswamy Chittathoor <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I tried applying the patch to the hbase source code (hbase 0.20.2)
>>>>>> and I get the errors below. Do you know if this needs to be applied
>>>>>> to a specific hbase version? Is there a version which works with
>>>>>> 0.20.2 or later?
>>>>>> Basically the HRegionServer and HTable patching fails.
>>>>>>
>>>>>> Thanks for the help
>>>>>>
>>>>>> patch -p0 -i batch.patch
>>>>>>
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Get.java
>>>>>> Hunk #1 succeeded at 61 (offset 2 lines).
>>>>>> Hunk #2 succeeded at 347 (offset 31 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnection.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
>>>>>> Hunk #3 succeeded at 1244 (offset 6 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HTable.java
>>>>>> Hunk #2 succeeded at 73 (offset 8 lines).
>>>>>> Hunk #4 FAILED at 405.
>>>>>> Hunk #5 succeeded at 671 with fuzz 2 (offset 26 lines).
>>>>>> 1 out of 5 hunks FAILED -- saving rejects to file
>>>>>> src/java/org/apache/hadoop/hbase/client/HTable.java.rej
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Multi.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiCallable.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiResult.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Row.java
>>>>>> patching file src/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
>>>>>> Hunk #2 succeeded at 156 with fuzz 1 (offset 3 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
>>>>>> Hunk #2 succeeded at 247 (offset 2 lines).
>>>>>> patching file src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
>>>>>> Hunk #1 succeeded at 78 (offset -1 lines).
>>>>>> Hunk #2 FAILED at 2515.
>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file
>>>>>> src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
>>>>>> patching file src/test/org/apache/hadoop/hbase/client/TestHTable.java
>>>>>> Hunk #2 FAILED at 333.
>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file
>>>>>> src/test/org/apache/hadoop/hbase/client/TestHTable.java.rej
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Marc Limotte [mailto:[email protected]]
>>>>>> Sent: Tuesday, January 19, 2010 10:26 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Support for MultiGet / SQL In clause
>>>>>>
>>>>>> Sriram,
>>>>>>
>>>>>> Would a secondary index help you:
>>>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
>>>>>>
>>>>>> The index is stored in a separate table, but the index is managed
>>>>>> for you.
>>>>>>
>>>>>> I don't think you can do an arbitrary "in" query, though. If the
>>>>>> keys that you want to include in the "in" are reasonably close
>>>>>> neighbors, you could do a scan and skip the ones that are
>>>>>> uninteresting. You could also try a batch Get by applying a
>>>>>> separate patch; see http://issues.apache.org/jira/browse/HBASE-1845.
>>>>>>
>>>>>> Marc Limotte
>>>>>>
>>>>>> On Tue, Jan 19, 2010 at 8:45 AM, Sriram Muthuswamy Chittathoor <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Is there any support for this? I want to do the following:
>>>>>>>
>>>>>>> 1. Create a second table to maintain a mapping between the
>>>>>>> secondary column and the row ids of the primary table.
>>>>>>>
>>>>>>> 2. Use this second table to get the row ids to look up from the
>>>>>>> primary table using a SQL IN-like clause.
>>>>>>>
>>>>>>> Basically I am doing this to speed up querying by non-row-key
>>>>>>> columns.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Sriram C
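For reference, the two-table scheme described in the original question above can be sketched as follows. The table names ("games" for the primary table, "games_by_gameid" for the hand-rolled index) and the layout of the index rows (one qualifier per matching primary row key) are assumptions for illustration only, and since a batch get is exactly the gap HBASE-1845 was sketching a fix for, the final lookups here are issued one Get at a time:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LookupByGameId {
      public static void main(String[] args) throws IOException {
        // Index table (hypothetical): row key = gameid, one qualifier per
        // primary-table row key that contains this game.
        HTable index = new HTable("games_by_gameid");
        Result mapping = index.get(new Get(Bytes.toBytes("game42")));

        List<byte[]> primaryKeys = new ArrayList<byte[]>();
        if (!mapping.isEmpty()) {
          for (KeyValue kv : mapping.raw()) {
            primaryKeys.add(kv.getQualifier());
          }
        }

        // Primary table: emulate "IN (...)" with one Get per matching row key.
        HTable games = new HTable("games");
        for (byte[] rowKey : primaryKeys) {
          Result game = games.get(new Get(rowKey));
          // ... use game ...
        }
      }
    }

The application has to maintain the index rows itself on every write (a Put to "games" paired with a Put to "games_by_gameid"); the tableindexed contrib Marc mentions manages that bookkeeping for you.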
