@stack We know about HIVE-705 and are already in good contact with the contributor, since we are all Chinese. :-) In fact, some code from the patch is used and tested in our project. But we need a more flexible data store schema to resolve engineering problems, especially around performance and practicality.
@andy Do Ryan's results differ from JG's?

On Wed, Aug 26, 2009 at 2:50 AM, Andrew Purtell <apurt...@apache.org> wrote:

> Hi Schubert,
>
> > Regarding "...and JG's/Ryan's performance test results for 0.20 stand as a contradiction." Can you provide more references, such as a URL/link for this contradiction?
>
> For JG: http://www.docstoc.com/docs/7493304/HBase-Goes-Realtime
>
> I'm sure you have seen this already.
>
> Ryan has posted some information on the list now and again.
>
> Also I think your work with performance evaluation is very important feedback and data points. Thanks for that.
>
> > We are doing an interesting thing to make Hive able to use HBase as its data store. Now we can use Hive's SQL to query/mapreduce data stored in HBase, and also we can directly query/scan data from HBase.
>
> That sounds REALLY interesting!
>
> - Andy
>
> ________________________________
> From: Schubert Zhang <zson...@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Tuesday, August 25, 2009 8:26:50 PM
> Subject: Re: HBase mention in VLDB keynote
>
> hi andy,
>
> Even though current HBase is not yet ready for production, we know it is really testable and can be evaluated for its data model and architecture.
>
> Regarding "...and JG's/Ryan's performance test results for 0.20 stand as a contradiction." Can you provide more references, such as a URL/link for this contradiction?
>
> Regarding Hive, it's really a good design, especially its abstraction of the MapReduce workflow matched to SQL. Hive has been a good success inside Facebook; the report says 29% of Facebook employees use Hive, and 51% of those users are from outside engineering. That is probably because SQL is easier to learn than other languages such as Pig Latin. In fact, Pig is now adding metadata and SQL features, which Hive already provides. But Hive is still not very flexible about using data stores other than HDFS files.
> We are doing an interesting thing to make Hive able to use HBase as its data store. Now we can use Hive's SQL to query/mapreduce data stored in HBase, and also we can directly query/scan data from HBase.
>
> I believe HBase can be a data store that works as a storage adapter layer above HDFS. It is not a database, it is just a data storage adapter system above HDFS, with a distributed b-tree clustered index. BigTable is designed to provide easier-to-use ways to store small data objects and provide random access, since GFS is designed for sequential-access/batch-processing/large-data storage and is not appropriate for storing small data objects or for random access.
>
> I also believe HBase can be a data store that makes MapReduce over HBase possible. If we review the Bigtable paper, especially section 8, we can find it is widely used for MapReduce analysis/summary in many Google applications.
>
> In the recent ACM Queue interview with Sean Quinlan, Google's GFS lead, we can find that Google's new GFS integrates some data models of Bigtable.
> http://queue.acm.org/detail.cfm?id=1594206
>
> Schubert
>
> On Wed, Aug 26, 2009 at 12:36 AM, Bradford Stephens <bradfordsteph...@gmail.com> wrote:
>
> > Interesting. I need to see what sort of eval was going on for that presentation...
> >
> > He probably forgot to tweak GC :)
> >
> > On Tue, Aug 25, 2009 at 9:32 AM, Andrew Purtell <apurt...@apache.org> wrote:
> >
> > > > Can we write him to figure more on how evaluation was done?
> > >
> > > This was one interaction with that group, maybe the only other one aside from a question about sizing memstore:
> > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-07/msg00552.html
> > > Now I wonder if the eval was done via the REST gateway... A followup might be useful. If I run into someone from Yahoo Research here I'll ask. Otherwise we should try mailing them, yes.
> > >
> > > > Should we try and get into VLDB next year?
> > >
> > > We can certainly submit a candidate paper given a novel contribution of some kind which moves the state of the art forward. There are other venues besides VLDB we can also consider. Regardless, I think one of us should attend VLDB every year.
> > >
> > > > Any thing else interesting at the conference?
> > >
> > > Yes.
> > >
> > > ETH Zurich presented a system which tailors consistency to the needs of various data items -- "consistency rationing in the cloud: pay only when it matters" -- choosing eventual (session) consistency or pessimistic 2PC on demand according to a cost model, with good results. Made me think of possibilities with THBase. Also, I watched a demo of HIVE, something I hadn't seen to date. Their query planner and mapreduce scheduler is interesting in concept and in detail. We're looking at Cascading for batch analytics on top of HBase instead, but knowing more about alternatives is always good.
> > >
> > > The Hadoop-y track is really tomorrow.
> > >
> > > Outside of direct relevance to things HBase I attended talks on aspects of data fusion, ETL, and complex event processing / stream processing, wearing my TM hat. Lots of good stuff here.
> > >
> > > - Andy
> > >
> > > ________________________________
> > > From: Stack <saint....@gmail.com>
> > > To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> > > Sent: Tuesday, August 25, 2009 4:47:57 PM
> > > Subject: Re: HBase mention in VLDB keynote
> > >
> > > The same fella did keynote at apachecon eu on a similar topic. Then he talked mostly of Sherpa/pnuts yahoo tech. In that presentation we got no mention. There the comparison strangely was to couchdb and perhaps Cassandra (iirc).
> > >
> > > So, mention is an improvement (do you think the kick up the behind I rendered him after his amsterdam talk could have had anything to do with it?).
> > >
> > > Can we write him to figure more on how evaluation was done?
> > >
> > > Should we try and get into vldb next year?
> > >
> > > Good stuff Andy. Anything else interesting at the conference?
> > >
> > > Stack
> > >
> > > On Aug 25, 2009, at 6:17 AM, Andrew Purtell <apurt...@apache.org> wrote:
> > >
> > > > In this keynote address here at VLDB 2009 (http://vldb2009.org/?q=node/22) Raghu Ramakrishnan, Yahoo! Research's Chief Scientist, made prominent mention of HBase, much to my surprise (and later chagrin). This happened near the end of the talk when a number of the new elastic/scalable/"nosql" storage systems were discussed to make concrete some of the architectural and data model points made earlier. The alternatives considered were Yahoo's PNUTS, sharded MySQL, HBase, and Cassandra. I don't know what version of HBase was used exactly but unfortunately the message was "not ready yet". Perhaps it was a configuration or provisioning issue but HBase did not really survive the evaluation, leading to short hyperbolic performance curves terminating on the far left of the various graphs. This was quite disappointing to see as the other alternatives were apparently successfully tested on what can be presumed to be the same resources. It stands to reason there is opportunity for HBase to improve here if only we know what that is. It was also a little disappointing that it appears through a mailing list search that these issues were not brought to either hbase-dev@ or hbase-users@, only a minor question relating to the REST interface.
> > > > Perhaps the community could have identified a specific configuration problem, recommended a correction for a deployment/provisioning error, or resolved a bug. To future evaluators of HBase, on behalf of the community I humbly request that you share your results, good or bad, so we can take the feedback, or the bug reports and their artifacts (logs, etc.) and improve our software.
> > > >
> > > > At least, the story has already changed from what was presented today -- for example, the multimaster architecture of 0.20 was not presented, rather the older one (circa 0.19); and JG's/Ryan's performance test results for 0.20 stand as a contradiction. We should look into opportunities to produce a peer reviewed positive contribution. I think we have opportunities to take some novel approaches in the system itself and/or produce a novel vertical contribution and 0.20 is a good substrate for that.
> > > >
> > > > Though this was unfortunately a missed opportunity for a good showing for HBase in particular, the keynote in general was a well formulated introduction of the emerging area of "cloud scale" storage / "nosql" systems to the largest elite gathering of database and data processing researchers in the world. The presentation was importantly also a call for participation in the future development and directions of the new and growing "nosql" constellation. Such participation, whether it is specific involvement with the HBase project or not, would be and is most welcome as the problems of serving data at very large scale under "cloud" constraints is an area of both significant challenge and significant promise. HBase, like other projects in this area, is in an early stage of development.
> > > > They cover the use cases of their creators but, as answers to the larger set of problems, they are not -- that space is untapped and only waiting for creativity and effort. I think I can speak for HBase in particular, we welcome this and would be pleased to assist at every opportunity.
> > > >
> > > > - Andy
> >
> > --
> > http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science