Hey,

You have to be clear about what HBase does and does not do.  HBase is just
not a relational database - its "weakness" is its strength.

In general, you can only access rows by key, or by scanning in key order.
Keys are stored lexicographically sorted, though, so rows that share a key
prefix sit next to each other.  There aren't declarative secondary indexes
(minus the Lucene thing, but that isn't an index).  You have to put all
these pieces together to build a system.

But you get scalability and reasonable performance, and in 0.20 you will
get really good performance (hopefully fast enough to serve websites).

In general you need to make sure your row key sorts data in the order you
want to query by.  You can do something like this:

<user> <Long.MAX_VALUE - System.currentTimeMillis()> <event id>

to store each user's events in reverse chronological order (newest first).
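
Here is a rough sketch of building that key as bytes, in plain Java with no
HBase dependency.  The class name, the 0x00 separator after the user, and
the event id being a long are all assumptions I made for illustration; the
only part taken from the layout above is the Long.MAX_VALUE - timestamp
trick.  Row keys compare as unsigned bytes, so a big-endian non-negative
long sorts in numeric order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class EventRowKey {
    // Builds <user> 0x00 <Long.MAX_VALUE - timestampMillis> <eventId>.
    // Assumes user names never contain a 0x00 byte (hypothetical convention).
    static byte[] rowKey(String user, long timestampMillis, long eventId) {
        byte[] userBytes = user.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(userBytes.length + 1 + 8 + 8);
        buf.put(userBytes);
        buf.put((byte) 0);                              // separator after the user
        buf.putLong(Long.MAX_VALUE - timestampMillis);  // newer events get smaller values
        buf.putLong(eventId);
        return buf.array();
    }

    // Lexicographic comparison of unsigned bytes, the order row keys sort in.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("alice", 1000L, 1L);
        byte[] newer = rowKey("alice", 2000L, 2L);
        // A negative result means the newer event sorts first.
        System.out.println(compareKeys(newer, older));
    }
}
```

A scan that starts at <user> 0x00 then walks that user's events newest
first, and stops once the user prefix changes.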

If you want another access method, you need to run a MapReduce job and
build a secondary index yourself.
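
To give an idea of what "build a secondary index" means, here is a toy
sketch in plain Java.  A TreeMap stands in for a sorted table, and the loop
stands in for the map step of the job; the class, method names, and the
<author> 0x00 <primary key> index layout are made up for illustration and
are not an HBase API.  It also assumes plain ASCII keys, where Java string
order matches byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SecondaryIndexSketch {
    static final char SEP = '\u0000';  // hypothetical separator, assumed absent from values

    // For each primary row (rowKey -> author), emit an index row keyed by
    // <author> SEP <rowKey> whose value points back at the primary row.
    static TreeMap<String, String> buildIndex(Map<String, String> primaryRows) {
        TreeMap<String, String> index = new TreeMap<>();
        for (Map.Entry<String, String> row : primaryRows.entrySet()) {
            index.put(row.getValue() + SEP + row.getKey(), row.getKey());
        }
        return index;
    }

    // Prefix scan: every index key that starts with <author> SEP, in key order.
    static List<String> lookup(TreeMap<String, String> index, String author) {
        String start = author + SEP;               // inclusive
        String stop = author + (char) (SEP + 1);   // exclusive
        return new ArrayList<>(index.subMap(start, stop).values());
    }

    public static void main(String[] args) {
        Map<String, String> primary = new TreeMap<>();
        primary.put("post1", "ann");
        primary.put("post2", "bob");
        primary.put("post3", "ann");
        System.out.println(lookup(buildIndex(primary), "ann"));
    }
}
```

The point is that the index is just more rows whose keys sort in the order
of the second access path, and you have to keep it up to date yourself.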

I don't know if this exactly answers your question, but hopefully it gives
you more of an idea of what HBase does and does not do.

-ryan

On Wed, Feb 25, 2009 at 9:02 PM, Bradford Stephens <
[email protected]> wrote:

> Greetings,
>
> I'm in charge of the data analysis and collection platform at my company,
> and we're basing a large part of our core analysis platform on Hadoop,
> Nutch, and Lucene -- it's a delight to use. However, we're going to be
> wanting some on-demand "web-scale" business intelligence, and I'm wondering
> if HBase is the right solution -- my research hasn't given me any
> conclusions.
>
> Our data set is pretty simple -- a bunch of XML documents which have been
> parsed from HTML pages, and some associated data (Author Name, Post Date,
> Influence, etc). What we would like to be able to do is have our end users
> do real-time (< 10 seconds) OLAP-type analysis on this, and have it
> presented on a webpage. For example, queries like ("All authors for the
> past two weeks who have used these keywords in the post bodies and what
> their influence score is"). I imagine we'll have several terabytes of data
> to go through, and we won't be able to do much pre-population of results.
>
> Is HBase low-latency enough that we can scale-out to solve these sorts of
> problems?
>
> Cheers,
> Bradford
>
