Re: HBase mention in VLDB keynote

Schubert Zhang Tue, 25 Aug 2009 14:02:28 -0700

@stack
We know about HIVE-705 and are already in good communication with the
contributor, since we are all Chinese. :-)
In fact, some code from the patch is used and tested in our project. But we
need a more flexible data store schema to resolve engineering problems,
especially performance and practicality.


@andy
Do Ryan's results differ from JG's?

On Wed, Aug 26, 2009 at 2:50 AM, Andrew Purtell <apurt...@apache.org> wrote:

> Hi Schubert,
>
>
> > Regarding "...and JG's/Ryan's performance test results for 0.20 stand as a
> > contradiction": can you provide more references, such as a URL or link to
> > these contradicting results?
>
> For JG: http://www.docstoc.com/docs/7493304/HBase-Goes-Realtime
>
> I'm sure you have seen this already.
>
> Ryan has posted some information on the list now and again.
>
> Also I think your work with performance evaluation is very important
> feedback and data points. Thanks for that.
>
> > We are doing an interesting thing: making Hive able to use HBase as its
> > data store. Now we can use Hive's SQL to query/mapreduce data stored in
> > HBase, and we can also directly query/scan data from HBase.
>
> That sounds REALLY interesting!
>
>   - Andy
>
>
>
>
> ________________________________
> From: Schubert Zhang <zson...@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Tuesday, August 25, 2009 8:26:50 PM
> Subject: Re: HBase mention in VLDB keynote
>
> hi andy,
>
> Even though the current HBase is not yet ready for production, we know it is
> really testable and evaluable for its data model and architecture.
>
> Regarding "...and JG's/Ryan's performance test results for 0.20 stand as a
> contradiction": can you provide more references, such as a URL or link to
> these contradicting results?
>
> Regarding Hive, it's really a good design, especially its abstraction of
> MapReduce workflows matched to SQL. Hive has been a success inside
> Facebook; the report says 29% of Facebook employees use Hive, and 51% of
> those users are from outside engineering. This is probably because SQL is
> easier to learn than other languages such as Pig Latin. In fact, Pig is now
> adding metadata and SQL features, which Hive already provides. But Hive is
> still not very flexible about using data stores other than HDFS files. We are
> doing an interesting thing: making Hive able to use HBase as its data store.
> Now we can use Hive's SQL to query/mapreduce data stored in HBase, and we
> can also directly query/scan data from HBase.
>
> I believe HBase can be a data store that works as a storage adapter layer
> above HDFS. It is not a database; it is just a data storage adapter system
> above HDFS, with a distributed B-tree clustered index. Bigtable is designed
> to provide easier ways to store small data objects and to provide random
> access, since GFS is designed for
> sequential-access/batch-processing/large-data storage and is not
> well suited to storing small data objects or to random access.
>
> I also believe HBase can be a data store that makes MapReduce over HBase
> possible. If we review the Bigtable paper, especially section 8, we can
> see that it is widely used for MapReduce analysis/summarization in many
> Google applications.
>
>
> In the recent ACM Queue interview with Sean Quinlan, Google's GFS leader, we
> can see that Google's new GFS integrates some data models from Bigtable.
> http://queue.acm.org/detail.cfm?id=1594206
>
>
> Schubert
>
> On Wed, Aug 26, 2009 at 12:36 AM, Bradford Stephens <
> bradfordsteph...@gmail.com> wrote:
>
> > Interesting. I need to see what sort of eval was going on for that
> > presentation...
> >
> > He probably forgot to tweak GC :)
> >
> > On Tue, Aug 25, 2009 at 9:32 AM, Andrew Purtell <apurt...@apache.org>
> > wrote:
> >
> > > > Can we write him to figure more on how evaluation was done?
> > >
> > >
> > > > This was one interaction with that group, maybe the only other one aside
> > > > from a question about sizing memstore:
> > > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-07/msg00552.html
> > > > Now I wonder if the eval was done via the REST gateway... A followup might
> > > > be useful. If I run into someone from Yahoo Research here I'll ask.
> > > > Otherwise we should try mailing them, yes.
> > >
> > > > Should we try and get into VLDB next year?
> > >
> > > > We can certainly submit a candidate paper given a novel contribution of
> > > > some kind which moves the state of the art forward. There are other venues
> > > > besides VLDB that we can also consider. Regardless, I think one of us should
> > > > attend VLDB every year.
> > >
> > > > Any thing else interesting at the conference?
> > >
> > > Yes.
> > >
> > > > ETH Zurich presented a system which tailors consistency to the needs of
> > > > various data items -- "consistency rationing in the cloud: pay only when it
> > > > matters" -- choosing eventual (session) consistency or pessimistic 2PC on
> > > > demand according to a cost model, with good results. Made me think of
> > > > possibilities with THBase. Also, I watched a demo of HIVE, something I
> > > > hadn't seen to date. Their query planner and mapreduce scheduler are
> > > > interesting in concept and in detail. We're looking at Cascading for batch
> > > > analytics on top of HBase instead, but knowing more about alternatives is
> > > > always good.
> > >
> > > The Hadoop-y track is really tomorrow.
> > >
> > > > Outside of direct relevance to things HBase I attended talks on aspects of
> > > > data fusion, ETL, and complex event processing / stream processing, wearing
> > > > my TM hat. Lots of good stuff here.
> > >
> > >   - Andy
> > >
> > >
> > >
> > > ________________________________
> > > From: Stack <saint....@gmail.com>
> > > To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> > > Sent: Tuesday, August 25, 2009 4:47:57 PM
> > > Subject: Re: HBase mention in VLDB keynote
> > >
> > > > The same fella did keynote at apachecon eu on a similar topic.  Then he
> > > > talked mostly of Sherpa/pnuts yahoo tech.   In that presentation we got no
> > > > mention.  There the comparison strangely was to couchdb and perhaps
> > > > Cassandra (iirc).
> > > >
> > > > So, mention is an improvement (do you think the kick up the behind I
> > > > rendered him after his amsterdam talk could have had anything to do with
> > > > it?).
> > >
> > > Can we write him to figure more on how evaluation was done?
> > >
> > > Should we try and get into vldb next year?
> > >
> > > Good stuff Andy.  Any thing else interesting at the conference?
> > >
> > > Stack
> > >
> > >
> > >
> > > > On Aug 25, 2009, at 6:17 AM, Andrew Purtell <apurt...@apache.org> wrote:
> > >
> > > > > In this keynote address here at VLDB 2009
> > > > > (http://vldb2009.org/?q=node/22) Raghu Ramakrishnan, Yahoo! Research's
> > > > > Chief Scientist, made prominent mention of HBase, much to my surprise (and
> > > > > later chagrin). This happened near the end of the talk when a number of the
> > > > > new elastic/scalable/"nosql" storage systems were discussed to make concrete
> > > > > some of the architectural and data model points made earlier. The
> > > > > alternatives considered were Yahoo's PNUTS, sharded MySQL, HBase, and
> > > > > Cassandra. I don't know what version of HBase was used exactly but
> > > > > unfortunately the message was "not ready yet". Perhaps it was a
> > > > > configuration or provisioning issue but HBase did not really survive the
> > > > > evaluation, leading to short hyperbolic performance curves terminating on
> > > > > the far left of the various graphs. This was quite disappointing to see as
> > > > > the other alternatives were apparently successfully tested on what can be
> > > > > presumed to be the same resources. It stands to reason there is opportunity
> > > > > for HBase to improve here if only we know what that is. It
> > > > > was also a little disappointing that it appears through a mailing list
> > > > > search that these issues were not brought to either hbase-dev@ or
> > > > > hbase-users@, only a minor question relating to the REST interface.
> > > > > Perhaps the community could have identified a specific configuration
> > > > > problem, recommended a correction for a deployment/provisioning error, or
> > > > > resolved a bug. To future evaluators of HBase, on behalf of the community I
> > > > > humbly request that you share your results, good or bad, so we can take the
> > > > > feedback, or the bug reports and their artifacts (logs, etc.) and improve
> > > > > our software.
> > > >
> > > > > At least, the story has already changed from what was presented today --
> > > > > for example, the multimaster architecture of 0.20 was not presented, rather
> > > > > the older one (circa 0.19); and JG's/Ryan's performance test results for
> > > > > 0.20 stand as a contradiction. We should look into opportunities to produce
> > > > > a peer reviewed positive contribution. I think we have opportunities to take
> > > > > some novel approaches in the system itself and/or produce a novel vertical
> > > > > contribution and 0.20 is a good substrate for that.
> > > >
> > > > > Though this was unfortunately a missed opportunity for a good showing for
> > > > > HBase in particular, the keynote in general was a well formulated
> > > > > introduction of the emerging area of "cloud scale" storage / "nosql" systems
> > > > > to the largest elite gathering of database and data processing researchers
> > > > > in the world. The presentation was importantly also a call for participation
> > > > > in the future development and directions of the new and growing "nosql"
> > > > > constellation. Such participation, whether it is specific involvement with
> > > > > the HBase project or not, would be and is most welcome as the problem of
> > > > > serving data at very large scale under "cloud" constraints is an area of
> > > > > both significant challenge and significant promise. HBase, like other
> > > > > projects in this area, is in an early stage of development. They cover the
> > > > > use cases of their creators but, as answers to the larger set of problems,
> > > > > they are not -- that space is untapped and only waiting for creativity and
> > > > > effort. I think I can speak for HBase in particular: we welcome this and
> > > > > would be pleased to assist at every opportunity.
> > > >
> > > >    - Andy
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > http://www.roadtofailure.com -- The Fringes of Scalability, Social Media,
> > and Computer Science
> >
>
>
>
>
>
