Hey guys, I hate to ruin it for you, but Google search does not use Bigtable at query time. If you would like an example of a good, robust search and indexing system, have a look at the Lucene library, at Solr (which is built on Lucene), and at Katta, another system built on Lucene.
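To make the difference concrete, here is a minimal sketch of how a Lucene-style inverted index can answer the `site:` narrowing asked about in the quoted message: each (field, term) pair maps to a set of document ids, and a narrowed query is just an intersection of two of those sets. This is a toy illustration only; the class name `TinyInvertedIndex`, its methods, and the sample documents are made up here, and it is not a description of how Google, Lucene, or IHBase is actually implemented.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy inverted index (hypothetical): maps (field, term) -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, site, text):
        # Index the host name as a single term in a 'site' field.
        self.postings[("site", site)].add(doc_id)
        # Index each lowercased, whitespace-separated word in a 'text' field.
        for word in text.lower().split():
            self.postings[("text", word)].add(doc_id)

    def search(self, word, site=None):
        # Start from the posting set for the word (copied so we can mutate it).
        hits = set(self.postings.get(("text", word.lower()), set()))
        if site is not None:
            # Narrowing is a set intersection with the 'site' postings.
            hits &= self.postings.get(("site", site), set())
        return sorted(hits)


# Made-up sample documents, for illustration only.
index = TinyInvertedIndex()
index.add(1, "hadoop.apache.org", "HBase is the Hadoop database")
index.add(2, "example.org", "notes on hbase and bigtable")
index.add(3, "hadoop.apache.org", "MapReduce tutorial")

all_hits = index.search("hbase")
narrowed = index.search("hbase", site="hadoop.apache.org")
print(all_hits, narrowed)
```

The point of the sketch is that the narrowed query needs no scan over row keys: both lookups are direct, and the cost is dominated by intersecting the two posting sets.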
-ryan

On Sat, Mar 20, 2010 at 3:13 PM, TuX RaceR <tuxrace...@gmail.com> wrote:
> Hello Hbase user List!
>
> The feature provided by IHbase is very appealing. It seems to correspond to
> a use case very common in applications (at least in mine ;) )
>
> Dan Washusen wrote:
>>
>> Not at the moment. It currently keeps a copy of each unique indexed
>> value and each row key in memory...
>>
>
> Is there a more robust indexing on the roadmap?
> HBase if I understand well proposes an opensource version of Google
> Bigtable.
> To me the most striking difference between Hbase and Bigtable is for
> narrowing searches; the example below shows what I mean by narrowing:
>
> If in Google you search for the word
>
> hbase:
>
> (i.e using:
> http://www.google.com/search?q=hbase
> )
> you get a fast answer
> (typically: Results *1* - *10* of about *249,000* for *hbase*. (*0.17*
> seconds))
>
> Now if you search all pages coming for the hadoop.apache.org host name (or
> base URL), that is with the query:
>
> hbase +site:hadoop.apache.org
>
> (i.e using the URL:
> http://www.google.com/search?q=hbase+%2Bsite%3Ahadoop.apache.org
> )
> you get a pretty fast answer to:
> (typically: Results *1* - *10* of about *2,510* from *hadoop.apache.org* for
> *hbase*. (*0.12* seconds) )
>
> It seems to me that the second search uses a secondary index on a column
> named 'site' to scan the 'hbase' based keys. Obviously Google found a good
> way to implement this (good= fast and scalable)
> Is this Google second indexing documented somewhere? Is that implemented
> using something like IHbase or more something like THbase, or something
> else?
> Also, why IHbase stays in the 'contrib' tree? Is that because the code is
> not at the same level as the main hbase code (not as tested, not as robust,
> etc...)?
>
> Thanks
> TuX