The same fella did the keynote at ApacheCon EU on a similar topic. Then
he talked mostly of Sherpa/PNUTS Yahoo tech. In that presentation we
got no mention. There the comparison, strangely, was to CouchDB and
perhaps Cassandra (iirc).
So, a mention is an improvement (do you think the kick up the behind I
rendered him after his Amsterdam talk could have had anything to do
with it?).
Can we write him to find out more about how the evaluation was done?
Should we try and get into VLDB next year?
Good stuff Andy. Anything else interesting at the conference?
Stack
On Aug 25, 2009, at 6:17 AM, Andrew Purtell <apurt...@apache.org> wrote:
In this keynote address here at VLDB 2009 (http://vldb2009.org/?q=node/22)
Raghu Ramakrishnan, Yahoo! Research's Chief Scientist, made
prominent mention of HBase, much to my surprise (and later chagrin).
This happened near the end of the talk when a number of the new
elastic/scalable/"nosql" storage systems were discussed to make
concrete some of the architectural and data model points made
earlier. The alternatives considered were Yahoo's PNUTS, sharded
MySQL, HBase, and Cassandra. I don't know what version of HBase was
used exactly but unfortunately the message was "not ready yet".
Perhaps it was a configuration or provisioning issue but HBase did
not really survive the evaluation, leading to short hyperbolic
performance curves terminating on the far left of the various
graphs. This was quite disappointing to see as the other
alternatives were apparently successfully tested on what can be
presumed to be the same resources. It stands to reason there is
opportunity for HBase to improve here if only we know what that is.
It was also a little disappointing that, judging from a mailing
list search, these issues were never brought to either hbase-dev@
or hbase-users@; only a minor question relating to the REST
interface was raised. Perhaps the community could have identified a specific
configuration problem, recommended a correction for a deployment/
provisioning error, or resolved a bug. To future evaluators of
HBase, on behalf of the community I humbly request that you share
your results, good or bad, so we can take the feedback, or the bug
reports and their artifacts (logs, etc.) and improve our software.
At least, the story has already changed from what was presented
today -- for example, the multimaster architecture of 0.20 was not
presented, rather the older one (circa 0.19); and JG's/Ryan's
performance test results for 0.20 stand as a contradiction. We
should look into opportunities to produce a peer reviewed positive
contribution. I think we have opportunities to take some novel
approaches in the system itself and/or produce a novel vertical
contribution and 0.20 is a good substrate for that.
Though this was unfortunately a missed opportunity for a good
showing for HBase in particular, the keynote in general was a well
formulated introduction of the emerging area of "cloud scale"
storage / "nosql" systems to the largest elite gathering of database
and data processing researchers in the world. The presentation was
importantly also a call for participation in the future development
and directions of the new and growing "nosql" constellation. Such
participation, whether it is specific involvement with the HBase
project or not, would be and is most welcome, as the problem of
serving data at very large scale under "cloud" constraints is an
area of both significant challenge and significant promise. HBase,
like other projects in this area, is in an early stage of
development. These projects cover the use cases of their creators
but are not yet answers to the larger set of problems -- that space
is untapped and only waiting for creativity and effort. I think I
can speak for HBase in particular: we welcome this and would be
pleased to assist at every opportunity.
- Andy