Re: HBase mention in VLDB keynote

Schubert Zhang Tue, 25 Aug 2009 11:27:23 -0700

Hi Andy,

Even though the current HBase is not yet ready for production, we know its
data model and architecture are ready to be tested and evaluated.


Regarding "...and JG's/Ryan's performance test results for 0.20 stand as a
contradiction.": can you provide more references, such as a URL or link to
these results?

Regarding Hive, it is really a good design, especially its abstraction that
maps SQL onto a MapReduce workflow. Hive has been a real success inside
Facebook: the report says 29% of Facebook employees use Hive, and 51% of
those users are from outside engineering. That is probably because SQL is
easier to learn than other languages such as Pig Latin. In fact, Pig is now
adding metadata and SQL features, which Hive already provides. But Hive is
still not very flexible about using data stores other than HDFS files. We are
doing an interesting thing: making Hive able to use HBase as its data store.
Now we can use Hive's SQL to query/mapreduce data stored in HBase, and we can
also query/scan data directly from HBase.

I believe HBase can be a data store that works as a storage adapter layer
above HDFS. It is not a database; it is just a data storage adapter system
above HDFS, with a distributed b-tree clustered index. BigTable is designed
to provide easier ways to store small data objects and to provide random
access, since GFS is designed for
sequential-access/batch-processing/large-data storage and is not appropriate
for storing small data objects or for random access.

I also believe HBase can be a data store that makes MapReduce over HBase
possible. If we review the Bigtable paper, especially section 8, we can
find it is widely used for mapreduce analysis/summary in many Google
applications.


In the recent ACM Queue interview with Sean Quinlan, Google's GFS lead, we
can find that Google's new GFS integrates some of Bigtable's data models.
http://queue.acm.org/detail.cfm?id=1594206


Schubert

On Wed, Aug 26, 2009 at 12:36 AM, Bradford Stephens <
bradfordsteph...@gmail.com> wrote:

> Interesting. I need to see what sort of eval was going on for that
> presentation...
>
> He probably forgot to tweak GC :)
>
> On Tue, Aug 25, 2009 at 9:32 AM, Andrew Purtell <apurt...@apache.org>
> wrote:
>
> > > Can we write him to figure more on how evaluation was done?
> >
> >
> > This was one interaction with that group, maybe the only other one aside
> > from a question about sizing memstore:
> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-07/msg00552.html
> > Now I wonder if the eval was done via the REST gateway... A followup might
> > be useful. If I run into someone from Yahoo Research here I'll ask.
> > Otherwise we should try mailing them, yes.
> >
> > > Should we try and get into VLDB next year?
> >
> > We can certainly submit a candidate paper given a novel contribution of
> > some kind which moves the state of the art forward. There are other
> venues
> > besides VLDB also we can consider. Regardless, I think one of us should
> > attend VLDB every year.
> >
> > > Any thing else interesting at the conference?
> >
> > Yes.
> >
> > ETH Zurich presented a system which tailors consistency to the needs of
> > various data items -- "consistency rationing in the cloud: pay only when
> it
> > matters" -- choosing eventual (session) consistency or pessimistic 2PC on
> > demand according to a cost model, with good results. Made me think of
> > possibilities with THBase. Also, I watched a demo of HIVE, something I
> > hadn't seen to date. Their query planner and mapreduce scheduler are
> > interesting in concept and in detail. We're looking at Cascading for batch
> > analytics on top of HBase instead, but knowing more about alternatives is
> > always good.
> >
> > The Hadoop-y track is really tomorrow.
> >
> > Outside of direct relevance to things HBase I attended talks on aspects
> of
> > data fusion, ETL, and complex event processing / stream processing,
> wearing
> > my TM hat. Lots of good stuff here.
> >
> >   - Andy
> >
> >
> >
> > ________________________________
> > From: Stack <saint....@gmail.com>
> > To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> > Sent: Tuesday, August 25, 2009 4:47:57 PM
> > Subject: Re: HBase mention in VLDB keynote
> >
> > The same fella did keynote at apachecon eu on a similar topic.  Then he
> > talked mostly of Sherpa/pnuts yahoo tech.   In that presentation we got
> no
> > mention.  There the comparison strangely was to couchdb and perhaps
> > Cassandra (iirc).
> >
> > So, mention is an improvement (do you think the kick up the behind I
> > rendered him after his amsterdam talk could have had anything to do with
> > it?).
> >
> > Can we write him to figure more on how evaluation was done?
> >
> > Should we try and get into vldb next year?
> >
> > Good stuff Andy.  Any thing else interesting at the conference?
> >
> > Stack
> >
> >
> >
> > On Aug 25, 2009, at 6:17 AM, Andrew Purtell <apurt...@apache.org> wrote:
> >
> > > In this keynote address here at VLDB 2009 (
> > http://vldb2009.org/?q=node/22) Raghu Ramakrishnan, Yahoo! Research's
> > Chief Scientist, made prominent mention of HBase, much to my surprise
> (and
> > later chagrin). This happened near the end of the talk when a number of
> the
> > new elastic/scalable/"nosql" storage systems were discussed to make
> concrete
> > some of the architectural and data model points made earlier. The
> > alternatives considered were Yahoo's PNUTS, sharded MySQL, HBase, and
> > Cassandra. I don't know what version of HBase was used exactly but
> > unfortunately the message was "not ready yet". Perhaps it was a
> > configuration or provisioning issue but HBase did not really survive the
> > evaluation, leading to short hyperbolic performance curves terminating on
> > the far left of the various graphs. This was quite disappointing to see
> as
> > the other alternatives were apparently successfully tested on what can be
> > presumed to be the same resources. It stands to reason there is
> > opportunity for HBase to improve here if only we know what that is. It
> > was also a little disappointing that it appears through a mailing list
> > search that these issues were not brought to either hbase-dev@ or
> > hbase-users@, only a minor question relating to the REST interface.
> > Perhaps the community could have identified a specific configuration
> > problem, recommended a correction for a deployment/provisioning error, or
> > resolved a bug. To future evaluators of HBase, on behalf of the community I
> > humbly request that you share your results, good or bad, so we can take the
> > feedback, or the bug reports and their artifacts (logs, etc.) and improve
> > our software.
> > >
> > > At least, the story has already changed from what was presented today
> --
> > for example, the multimaster architecture of 0.20 was not presented,
> rather
> > the older one (circa 0.19); and JG's/Ryan's performance test results for
> > 0.20 stand as a contradiction. We should look into opportunities to
> produce
> > a peer reviewed positive contribution. I think we have opportunities to
> take
> > some novel approaches in the system itself and/or produce a novel
> vertical
> > contribution and 0.20 is a good substrate for that.
> > >
> > > Though this was unfortunately a missed opportunity for a good showing
> for
> > HBase in particular, the keynote in general was a well formulated
> > introduction of the emerging area of "cloud scale" storage / "nosql"
> systems
> > to the largest elite gathering of database and data processing
> researchers
> > in the world. The presentation was importantly also a call for
> participation
> > in the future development and directions of the new and growing "nosql"
> > constellation. Such participation, whether it is specific involvement
> with
> > the HBase project or not, would be and is most welcome as the problems of
> > serving data at very large scale under "cloud" constraints is an area of
> > both significant challenge and significant promise. HBase like other
> > projects in this area are in an early stage of development. They cover
> the
> > use cases of their creators but, as answers to the larger set of
> problems,
> > they are not -- that space is untapped and only waiting for creativity
> and
> > effort. I
> >  think I can speak for HBase in particular, we welcome this and would be
> > pleased to assist at every opportunity.
> > >
> > >    - Andy
> > >
> > >
> >
> >
> >
> >
> >
>
>
>
> --
> http://www.roadtofailure.com -- The Fringes of Scalability, Social Media,
> and Computer Science
>
