In his keynote address here at VLDB 2009 (http://vldb2009.org/?q=node/22),
Raghu Ramakrishnan, Yahoo! Research's Chief Scientist, made prominent mention
of HBase, much to my surprise (and later chagrin). This happened near the end
of the talk when a number of the new elastic/scalable/"nosql" storage systems 
were discussed to make concrete some of the architectural and data model points 
made earlier. The alternatives considered were Yahoo's PNUTS, sharded MySQL, 
HBase, and Cassandra. I don't know exactly what version of HBase was used, but
unfortunately the message was "not ready yet". Perhaps it was a configuration
or provisioning issue, but HBase did not really survive the evaluation, which
produced short, hyperbolic performance curves terminating at the far left of
the various graphs. This was quite disappointing to see, since the other
alternatives were apparently tested successfully on what we can presume were
the same resources. It stands to reason that there is an opportunity for HBase
to improve here, if only we knew what it is. It was also a little disappointing
that, judging from a search of the mailing list archives, these issues were
never brought to either hbase-dev@ or hbase-users@; the only related message
was a minor question about the REST interface. Perhaps the community could
have identified a specific configuration problem, recommended a correction for
a deployment or provisioning error, or resolved a bug. To future evaluators of
HBase: on behalf of the community, I humbly request that you share your
results, good or bad, so that we can use the feedback, the bug reports, and
their artifacts (logs, etc.) to improve our software.

At least the story has already changed from what was presented today. For
example, the talk described the older architecture (circa 0.19) rather than the
multimaster architecture of 0.20, and JG's and Ryan's performance test results
for 0.20 contradict what was shown. We should look into opportunities to
produce a positive, peer-reviewed contribution. I think we have opportunities
to take some novel approaches in the system itself and/or produce a novel
vertical contribution, and 0.20 is a good substrate for that.

Though this was unfortunately a missed opportunity for a good showing by HBase
in particular, the keynote in general was a well-formulated introduction to the
emerging area of "cloud scale" storage / "nosql" systems for the largest elite
gathering of database and data processing researchers in the world.
Importantly, the presentation was also a call for participation in the future
development and direction of the new and growing "nosql" constellation. Such
participation, whether it involves the HBase project specifically or not, is
and would be most welcome, because serving data at very large scale under
"cloud" constraints is an area of both significant challenge and significant
promise. HBase, like other projects in this area, is at an early stage of
development. These projects cover the use cases of their creators, but they
are not yet answers to the larger set of problems; that space is untapped and
only waiting for creativity and effort. I think I can speak for HBase in
particular: we welcome this and would be pleased to assist at every
opportunity.

    - Andy
