Hi,

You could start by looking at org.apache.hadoop.hbase.client.tableindexed.IndexedTableAdmin, IndexSpecification, and org.apache.hadoop.hbase.HTableDescriptor.addIndex(IndexSpecification index).
I don't think you can currently use the shell to create index specifications. Grab the source and have a look at TestIndexedTable for an example of how to create the index.

Cheers,
John

On 13/05/09 6:52 AM, "Jason Buberel" <[email protected]> wrote:

Yesterday I completed a basic investigative setup that involved installing and deploying:

Hadoop v0.19.1
HBase v0.19.1

Both were deployed in a pseudo-cluster configuration (a cluster with one node). Using the HBase shell, I created a simple table to hold real estate data (address, city, state, zip, beds, baths, sqft, etc.):

> create 'listing_entry', \
>   {NAME => 'mls_d'}, \
>   {NAME => 'address_1'}, \
>   {NAME => 'address_2'}, \
>   {NAME => 'city'}, \
>   {NAME => 'state'}, \
>   {NAME => 'zip'}, \
>   {NAME => 'date'}, \
>   {NAME => 'beds'}, \
>   {NAME => 'baths'}, \
>   {NAME => 'sqft'}, \
>   {NAME => 'lot'}, \
>   {NAME => 'year_built'}

I then wrote a short program to generate ~10M sample rows, each with a randomly chosen zip value between 10000 and 99999 plus a city name randomly selected from a list of 10 values ('SUNNYVALE', 'CUPERTINO', 'MOUNTAIN VIEW', 'PALO ALTO', etc.).

Next, I put together a simple query application that would search the 10M rows, looking for entries that matched by city, zip, or both. The code for the zip code search was simple:

> HBaseConfiguration config = new HBaseConfiguration();
> HTable table = new HTable(config, "listing_entry");
>
> RowFilterInterface filter = new ColumnValueFilter(Bytes.toBytes("zip:"),
>     ColumnValueFilter.CompareOp.EQUAL,
>     Bytes.toBytes("94086"));
> Scanner search = table.getScanner(new String[]{"address_1:", "city:", "zip:"},
>     "", Long.MAX_VALUE, filter);
> for (RowResult result : search) {
>     System.out.println(" " + result.get(Bytes.toBytes("address_1:")) + "/" +
>         result.get(Bytes.toBytes("city:")) + "/" +
>         result.get(Bytes.toBytes("zip:")));
> }
> search.close();

When this was executed against a sample database of 10K rows, the query took about 15 seconds. So far, so good.
But when that was expanded to the full sample data set of 10M rows, the request timed out.

From there, I went searching for information on how to create and then make use of indexes, which led me to the JavaDoc for IndexedTable. After reading through the JavaDocs on that class, it looked as though it is intended to be used to read data from indexed tables. Poking around the HBase shell help info, I didn't see any information specific to creating indexes or indexed tables. I also looked through the example code, but didn't find any information on indexing.

Is there some other bit of example code or documentation I can read through that would help me figure out how to make my table with 10M rows queryable with reasonable response times? Or am I going about this all wrong, trying to wedge my structured-query-like brain into an orthogonal solution space?

Thanks for any pointers...

jason

--
Jason L. Buberel
[email protected]
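To picture why the filtered scan above gets slow at 10M rows and what a secondary index changes, here is a rough sketch in plain Java. It is only a conceptual model, not the HBase tableindexed API: the class ZipIndexSketch and its methods are made up for illustration, and the "table" is just an in-memory sorted map from row key to zip value. The filtered scan corresponds to scanByZip, which has to visit every row; the index corresponds to lookupByZip, which goes straight to the matching row keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory model of a table plus a secondary index on "zip".
// Not HBase code; names are assumptions chosen for this illustration.
public class ZipIndexSketch {
    // Base "table": row key -> zip value, kept sorted by row key.
    private final Map<String, String> table = new TreeMap<>();
    // Secondary index: zip value -> row keys that currently hold that zip.
    private final Map<String, List<String>> zipIndex = new TreeMap<>();

    // Write a row and keep the index in step with the table.
    public void put(String rowKey, String zip) {
        String oldZip = table.put(rowKey, zip);
        if (oldZip != null) {
            // The row existed already: drop its stale index entry.
            zipIndex.get(oldZip).remove(rowKey);
        }
        zipIndex.computeIfAbsent(zip, k -> new ArrayList<>()).add(rowKey);
    }

    // Filtered full scan: touches every row, so cost grows with table size.
    public List<String> scanByZip(String zip) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (row.getValue().equals(zip)) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    // Index lookup: one map lookup, then only the matching row keys.
    public List<String> lookupByZip(String zip) {
        return new ArrayList<>(zipIndex.getOrDefault(zip, Collections.emptyList()));
    }

    public static void main(String[] args) {
        ZipIndexSketch listings = new ZipIndexSketch();
        listings.put("row-1", "94086");
        listings.put("row-2", "95014");
        listings.put("row-3", "94086");
        System.out.println("scan:  " + listings.scanByZip("94086"));
        System.out.println("index: " + listings.lookupByZip("94086"));
    }
}
```

The trade-off is the same one an indexed table makes: every write does a little extra work to maintain the index, and in return a lookup by zip no longer depends on the total number of rows.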
