Yes, that's roughly what I was thinking...

2010/1/24 Sriram Muthuswamy Chittathoor <[email protected]>
> So on the reporting tables I will have to store by the keys I want to
> look up by, for example:
>
> 1. One reporting table by gameid
>
> 2. Another one by some other column like tournamentid
>
> So basically create a reporting table based on how I want to query, and
> this reporting table will be queried by its rowKey (which is native) and
> the column values will be what I want
>
> Etc. Is that right ?
>
> -----Original Message-----
> From: Daniel Washusen [mailto:[email protected]]
> Sent: Sunday, January 24, 2010 2:00 PM
> To: [email protected]
> Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> HBASE-1845
>
> Sounds like it's some sort of reporting system. Have you considered
> duplicating data into reporting tables?
>
> Write all the game details into the main table then map reduce into
> your reporting tables...
>
> On 24/01/2010, at 7:07 PM, "Sriram Muthuswamy Chittathoor"
> <[email protected]> wrote:
>
> > However, I'd only recommend using secondary index as a last resort.
> > First I'd try doing everything I can to work with the index I get for
> > free. The row key. It sounds like you have done this already...
> > --
> >
> > The only reason why this is important to me is because of the
> > following
> >
> > 1. I am storing at a minimum 1 year's worth of data (small rows -- 10
> > billion)
> >
> > 2. Row key is user + date (columns -- gameid, opponent etc)
> >
> > 3. Queries may be something like give me details for a particular
> > "gameid"
> >
> > 4. To do step 3 I am assuming I need something like a secondary
> > index, or else given my row key how else can I do it
> >
> > -----Original Message-----
> > From: Daniel Washusen [mailto:[email protected]]
> > Sent: Sunday, January 24, 2010 3:16 AM
> > To: [email protected]
> > Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> > HBASE-1845
> >
> > Well, it CAN be a RAM hog ;-). It depends what you're indexing.
> > Each unique value in the indexed column resides in memory. If you index a
> > column that contains 1 million random 1KB values then the index will
> > require at least 1GB of memory. Also it *can* slow down writes,
> > especially when bulk loading sequential keys.
> >
> > On the up side, it can make scans dramatically faster.
> >
> > However, I'd only recommend using secondary index as a last resort.
> > First I'd try doing everything I can to work with the index I get for
> > free. The row key. It sounds like you have done this already...
> >
> > Cheers,
> > Dan
> >
> > On 24/01/2010, at 7:02 AM, Stack <[email protected]> wrote:
> >
> >> On Sat, Jan 23, 2010 at 2:52 AM, Sriram Muthuswamy Chittathoor
> >> <[email protected]> wrote:
> >>> Thanks all. I messed it up when I was trying to upgrade to
> >>> 0.20.3. I deleted the data directory and formatted it thinking it
> >>> will reset the whole cluster.
> >>>
> >>> I started fresh by deleting the data directory on all the nodes and
> >>> then everything worked. I was also able to create the indexed
> >>> table using the 0.20.3 patch. Let me run some tests on a few
> >>> million rows and see how it holds up.
> >>>
> >>> BTW -- what would be the right way when I moved versions. Do I
> >>> run migrate scripts to migrate the data to newer versions ?
> >>>
> >> Just install the new binaries everywhere and restart, or perform a
> >> rolling restart -- see
> >> http://wiki.apache.org/hadoop/Hbase/RollingRestart -- if you would
> >> avoid taking down your cluster during the upgrade.
> >>
> >> You'll be flagged on start if you need to run a migration but general
> >> rule is that there (should) never be need of a migration between
> >> patch releases: e.g. between 0.20.2 to 0.20.3. There may be need of
> >> migrations moving between minor numbers; e.g. from 0.19 to 0.20.
> >>
> >> Let us know how IHBase works out for you (indexed hbase). It's a RAM
> >> hog but the speed improvement finding matching cells can be
> >> startling.
> >>
> >> St.Ack
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:[email protected]] On Behalf Of
> >>> Stack
> >>> Sent: Saturday, January 23, 2010 5:00 AM
> >>> To: [email protected]
> >>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> >>> HBASE-1845
> >>>
> >>> Check your master log. Something is seriously off if you do not have
> >>> a reachable .META. table.
> >>> St.Ack
> >>>
> >>> On Fri, Jan 22, 2010 at 1:09 PM, Sriram Muthuswamy Chittathoor
> >>> <[email protected]> wrote:
> >>>> I applied the hbase-0.20.3 version / hadoop 0.20.1. But after
> >>>> starting hbase I keep getting the error below when I go to the
> >>>> hbase shell
> >>>>
> >>>> [ppo...@karisimbivir1 hbase-0.20.3]$ ./bin/hbase shell
> >>>> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> >>>> Version: 0.20.3, r900041, Sat Jan 16 17:20:21 PST 2010
> >>>> hbase(main):001:0> list
> >>>> NativeException:
> >>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> >>>> contact region server null for region , row '', but failed after 7
> >>>> attempts.
> >>>> Exceptions:
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>>
> >>>> Also when I try to create a table programmatically I get this --
> >>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Attempting connection to
> >>>> server localhost/127.0.0.1:2181
> >>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Priming connection to
> >>>> java.nio.channels.SocketChannel[connected local=/127.0.0.1:43775
> >>>> remote=localhost/127.0.0.1:2181]
> >>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Server connection
> >>>> successful
> >>>> Exception in thread "main"
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:684)
> >>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
> >>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
> >>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
> >>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:638)
> >>>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
> >>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:128)
> >>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:106)
> >>>>     at test.CreateTable.main(CreateTable.java:36)
> >>>>
> >>>> Any clues ?
> >>>>
> >>>> -----Original Message-----
> >>>> From: Dan Washusen [mailto:[email protected]]
> >>>> Sent: Friday, January 22, 2010 4:53 AM
> >>>> To: [email protected]
> >>>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> >>>> HBASE-1845
> >>>>
> >>>> If you want to give the "indexed" contrib package a try you'll need
> >>>> to do the following:
> >>>>
> >>>> 1. Include the contrib jars (export HBASE_CLASSPATH=`find
> >>>> /path/to/hbase/hbase-0.20.3/contrib/indexed -name '*jar' | tr -s
> >>>> "\n" ":"`)
> >>>> 2. Set the 'hbase.hregion.impl' property to
> >>>> 'org.apache.hadoop.hbase.regionserver.IdxRegion' in your
> >>>> hbase-site.xml
> >>>>
> >>>> Once you've done that you can create a table with an index using:
> >>>>
> >>>>> // define which qualifiers need an index (choosing the correct type)
> >>>>> IdxColumnDescriptor columnDescriptor = new
> >>>>>     IdxColumnDescriptor("columnFamily");
> >>>>> columnDescriptor.addIndexDescriptor(
> >>>>>     new IdxIndexDescriptor("qualifier", IdxQualifierType.BYTE_ARRAY)
> >>>>> );
> >>>>>
> >>>>> HTableDescriptor tableDescriptor = new HTableDescriptor("table");
> >>>>> tableDescriptor.addFamily(columnDescriptor);
> >>>>>
> >>>>
> >>>> Then when you want to perform a scan with an index hint:
> >>>>
> >>>>> Scan scan = new IdxScan(
> >>>>>     new Comparison("columnFamily", "qualifier",
> >>>>>         Comparison.Operator.EQ, Bytes.toBytes("foo"))
> >>>>> );
> >>>>>
> >>>>
> >>>> You have to keep in mind that the index hint is only a hint. It
> >>>> guarantees that your scan will get all rows that match the hint but
> >>>> you'll more than likely receive rows that don't. For this reason I'd
> >>>> suggest that you also include a filter along with the scan:
> >>>>
> >>>>> Scan scan = new IdxScan(
> >>>>>     new Comparison("columnFamily", "qualifier",
> >>>>>         Comparison.Operator.EQ, Bytes.toBytes("foo"))
> >>>>> );
> >>>>> // the filter drops any non-matching rows the index hint lets through
> >>>>> scan.setFilter(
> >>>>>     new SingleColumnValueFilter(
> >>>>>         Bytes.toBytes("columnFamily"), Bytes.toBytes("qualifier"),
> >>>>>         CompareFilter.CompareOp.EQUAL,
> >>>>>         new BinaryComparator(Bytes.toBytes("foo"))
> >>>>>     )
> >>>>> );
> >>>>>
> >>>>
> >>>> Cheers,
> >>>> Dan
> >>>>
> >>>>
> >>>> 2010/1/22 stack <[email protected]>
> >>>>
> >>>>> http://people.apache.org/~jdcryans/hbase-0.20.3-candidate-2/
> >>>>>
> >>>>> There is a bit of documentation if you look at javadoc for the
> >>>>> 'indexed' contrib (This is what hbase-2073 is called on commit).
> >>>>>
> >>>>> St.Ack
> >>>>>
> >>>>> P.S. We had a thread going named "HBase bulk load". You got all
> >>>>> the answers you need on that one?
> >>>>>
> >>>>> On Thu, Jan 21, 2010 at 11:19 AM, Sriram Muthuswamy Chittathoor
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> Great. Can I migrate to 0.20.3RC2 easily. I am on 0.20.2. Can u
> >>>>>> pass me the link
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: [email protected] [mailto:[email protected]] On Behalf
> >>>>>> Of stack
> >>>>>> Sent: Friday, January 22, 2010 12:42 AM
> >>>>>> To: [email protected]
> >>>>>> Subject: Re: Support for MultiGet / SQL In clause -- error in
> >>>>>> patch HBASE-1845
> >>>>>>
> >>>>>> IIRC, hbase-1845 was a sketch only and not yet complete. It's
> >>>>>> probably rotted since anyways.
> >>>>>>
> >>>>>> Have you looked at hbase-2037 since committed and available in
> >>>>>> 0.20.3RC2.
> >>>>>> Would this help you with your original problem?
> >>>>>>
> >>>>>> St.Ack
> >>>>>>
> >>>>>> On Thu, Jan 21, 2010 at 9:10 AM, Sriram Muthuswamy Chittathoor <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> I tried applying the patch to the hbase source code hbase 0.20.2
> >>>>>>> and I get the errors below. Do you know if this needs to be
> >>>>>>> applied to a specific hbase version. Is there a version which
> >>>>>>> works with 0.20.2 or later ??
> >>>>>>> Basically HRegionServer and HTable patching fails.
> >>>>>>>
> >>>>>>> Thanks for the help
> >>>>>>>
> >>>>>>> patch -p0 -i batch.patch
> >>>>>>>
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Get.java
> >>>>>>> Hunk #1 succeeded at 61 (offset 2 lines).
> >>>>>>> Hunk #2 succeeded at 347 (offset 31 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnection.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> >>>>>>> Hunk #3 succeeded at 1244 (offset 6 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HTable.java
> >>>>>>> Hunk #2 succeeded at 73 (offset 8 lines).
> >>>>>>> Hunk #4 FAILED at 405.
> >>>>>>> Hunk #5 succeeded at 671 with fuzz 2 (offset 26 lines).
> >>>>>>> 1 out of 5 hunks FAILED -- saving rejects to file src/java/org/apache/hadoop/hbase/client/HTable.java.rej
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Multi.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiCallable.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiResult.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Row.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
> >>>>>>> Hunk #2 succeeded at 156 with fuzz 1 (offset 3 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
> >>>>>>> Hunk #2 succeeded at 247 (offset 2 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> >>>>>>> Hunk #1 succeeded at 78 (offset -1 lines).
> >>>>>>> Hunk #2 FAILED at 2515.
> >>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
> >>>>>>> patching file src/test/org/apache/hadoop/hbase/client/TestHTable.java
> >>>>>>> Hunk #2 FAILED at 333.
> >>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file src/test/org/apache/hadoop/hbase/client/TestHTable.java.rej
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Marc Limotte [mailto:[email protected]]
> >>>>>>> Sent: Tuesday, January 19, 2010 10:26 PM
> >>>>>>> To: [email protected]
> >>>>>>> Subject: Re: Support for MultiGet / SQL In clause
> >>>>>>>
> >>>>>>> Sriram,
> >>>>>>>
> >>>>>>> Would a secondary index help you:
> >>>>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
> >>>>>>> .
> >>>>>>>
> >>>>>>> The index is stored in a separate table, but the index is managed
> >>>>>>> for you.
> >>>>>>>
> >>>>>>> I don't think you can do an arbitrary "in" query, though. If the
> >>>>>>> keys that you want to include in the "in" are reasonably close
> >>>>>>> neighbors, you could do a scan and skip ones that are
> >>>>>>> uninteresting. You could also try a batch Get by applying a
> >>>>>>> separate patch, see
> >>>>>>> http://issues.apache.org/jira/browse/HBASE-1845.
> >>>>>>>
> >>>>>>> Marc Limotte
> >>>>>>>
> >>>>>>> On Tue, Jan 19, 2010 at 8:45 AM, Sriram Muthuswamy Chittathoor <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>>> Is there any support for this. I want to do this
> >>>>>>>>
> >>>>>>>> 1. Create a second table to maintain mapping between secondary
> >>>>>>>> column and the rowid's of the primary table
> >>>>>>>>
> >>>>>>>> 2. Use this second table to get the rowid's to lookup from the
> >>>>>>>> primary table using a SQL In like clause ---
> >>>>>>>>
> >>>>>>>> Basically I am doing this to speed up querying by Non-row key
> >>>>>>>> columns.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> Sriram C
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> This email is sent for and on behalf of Ivy Comptech Private
> >>>>>>>> Limited. Ivy Comptech Private Limited is a limited liability
> >>>>>>>> company.
> >>>>>>>>
> >>>>>>>> This email and any attachments are confidential, and may be
> >>>>>>>> legally privileged and protected by copyright. If you are not the
> >>>>>>>> intended recipient dissemination or copying of this email is
> >>>>>>>> prohibited. If you have received this in error, please notify the
> >>>>>>>> sender by replying by email and then delete the email completely
> >>>>>>>> from your system.
> >>>>>>>> Any views or opinions are solely those of the sender. This
> >>>>>>>> communication is not intended to form a binding contract on
> >>>>>>>> behalf of Ivy Comptech Private Limited unless expressly indicated
> >>>>>>>> to the contrary and properly authorised.
> >>>>>>>> Any actions taken on the basis of this email are at the
> >>>>>>>> recipient's own risk.
> >>>>>>>>
> >>>>>>>> Registered office:
> >>>>>>>> Ivy Comptech Private Limited, Cyber Spazio, Road No. 2, Banjara
> >>>>>>>> Hills, Hyderabad 500 033, Andhra Pradesh, India. Registered
> >>>>>>>> number: 37994.
> >>>>>>>> Registered in India. A list of members' names is available for
> >>>>>>>> inspection at the registered office.
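
For anyone reading this thread later, here is a minimal sketch of the reporting-table layout agreed on at the top. It is not HBase client code: two in-memory TreeMaps stand in for the main table and the gameid reporting table (both keep keys sorted, as HBase does with row keys), and a plain loop stands in for the MapReduce job. The class name, the "|" key separator, the column names and the sample rows are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a main table plus one reporting table (hypothetical names). */
class ReportingTableSketch {
    /** Column values stored with each row; the columns are illustrative only. */
    static final class GameRow {
        final String gameId;
        final String opponent;

        GameRow(String gameId, String opponent) {
            this.gameId = gameId;
            this.opponent = opponent;
        }
    }

    // Main table: row key is user + date.
    final TreeMap<String, GameRow> mainTable = new TreeMap<>();
    // Reporting table: row key is gameid + main row key, so one game's rows are adjacent.
    final TreeMap<String, GameRow> byGameId = new TreeMap<>();

    void putGame(String user, String date, String gameId, String opponent) {
        mainTable.put(user + "|" + date, new GameRow(gameId, opponent));
    }

    /** Stand-in for the MapReduce job: re-key every main-table row by gameid. */
    void rebuildReportingTable() {
        byGameId.clear();
        for (Map.Entry<String, GameRow> e : mainTable.entrySet()) {
            byGameId.put(e.getValue().gameId + "|" + e.getKey(), e.getValue());
        }
    }

    /** Prefix scan over the reporting table; returns the main-table row keys for one game. */
    List<String> mainKeysForGame(String gameId) {
        String prefix = gameId + "|";
        List<String> keys = new ArrayList<>();
        for (String reportKey : byGameId.tailMap(prefix, true).keySet()) {
            if (!reportKey.startsWith(prefix)) {
                break; // left the contiguous block of rows for this game
            }
            keys.add(reportKey.substring(prefix.length()));
        }
        return keys;
    }

    public static void main(String[] args) {
        ReportingTableSketch tables = new ReportingTableSketch();
        tables.putGame("alice", "2010-01-23", "g42", "bob");
        tables.putGame("bob", "2010-01-23", "g42", "alice");
        tables.putGame("alice", "2010-01-24", "g43", "carol");
        tables.rebuildReportingTable();
        // prints [alice|2010-01-23, bob|2010-01-23]
        System.out.println(tables.mainKeysForGame("g42"));
    }
}
```

The lookup by gameid becomes a short range scan on the reporting table's row key, so no secondary index is involved. A second reporting table keyed by tournamentid would follow the same pattern with a different key prefix.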
