Hi

On Thu, Sep 16, 2010 at 4:43 PM, Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov> wrote:
> Hi Enis,
>
> Thanks. Let’s leave the VOTE open until Friday evening Pacific time to give folks time to look at the proposal. Technically, though, we haven’t done a VOTE thread on this yet, and I saw that some folks just asked Kitty to explicitly post a [VOTE] thread, so we may want to do that here.
>

Sorry, I wasn't implying that the vote is over, just extending a friendly welcome to the newcomers. I agree that a [VOTE] thread might be necessary if that is the consensus.

>
> What do folks think? If so, then I’ll call a VOTE thread around Friday evening and leave it open for 72 hours.
>
> Cheers,
> Chris
>
>
> On 9/16/10 5:57 AM, "Enis Soztutar" <enis.soz.nu...@gmail.com> wrote:
>
> Thanks to everyone who has shown interest in the project.
>
> I think Chris added everyone who has indicated their support as committers. On behalf of the initial committers, welcome on board :)
>
> Just FYI, in case you want to start contributing right away, we use GitHub as the collaboration tool until we can establish the infrastructure at Apache.
>
> Cheers,
> Enis
>
> On Tue, Sep 14, 2010 at 11:51 PM, Tom White <tomwh...@apache.org> wrote:
>
> > +1 Sounds very interesting. I'd be happy to help out as a mentor.
> >
> > Cheers,
> > Tom
> >
> > On Mon, Sep 13, 2010 at 6:10 AM, Enis Soztutar <enis.soz.nu...@gmail.com> wrote:
> > > Hi all,
> > >
> > > We would like to announce the proposal for Gora, an ORM for column stores, for Apache Incubation. We believe that Gora can find a nice home at Apache.
> > >
> > > The wiki page for the proposal can be found at http://wiki.apache.org/incubator/GoraProposal
> > >
> > > The proposal is as below.
> > >
> > >
> > > = Gora Proposal for Apache Incubation =
> > >
> > > == Abstract ==
> > > Gora is an ORM framework for column stores such as Apache HBase and Apache Cassandra, with a specific focus on Hadoop.
> > >
> > > == Proposal ==
> > > Although there are various excellent ORM frameworks for relational databases, data modeling in NoSQL data stores differs profoundly from that of their relational cousins. Moreover, data-model-agnostic frameworks such as JDO are not sufficient for use cases where one needs the full power of the data models in column stores. Gora fills this gap by giving the user an easy-to-use ORM framework with data-store-specific mappings and built-in Apache Hadoop support.
> > >
> > > The overall goal for Gora is to become the standard data representation and persistence framework for big data. The roadmap of Gora can be grouped as follows:
> > >
> > > * Data Persistence: Persisting objects to column stores such as HBase, Cassandra and Hypertable; key-value stores such as Voldemort, Redis, etc.; SQL databases such as MySQL and HSQLDB; and flat files in the local file system or Hadoop HDFS.
> > > * Data Access: An easy-to-use, Java-friendly common API for accessing the data regardless of its location.
> > > * Indexing: Persisting objects to Lucene and Solr indexes, and accessing/querying the data with the Gora API.
> > > * Analysis: Accessing and analyzing the data through adapters for Apache Pig, Apache Hive and Cascading.
> > > * MapReduce support: Out-of-the-box and extensive MapReduce (Apache Hadoop) support for data in the data store.
> > >
> > > == Background ==
> > > ORM stands for Object-Relational Mapping. It is a technology that abstracts the persistence layer (mostly relational databases) so that plain domain-level objects can be used without the cumbersome effort of saving/loading the data to and from the database.
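As a rough illustration of the ORM idea described in the Background section, the following Java sketch saves and loads a plain domain object through a store interface, with the field-to-column mapping kept inside the backend. Everything here is hypothetical: `ObjectStore`, `WebPage` and `InMemoryColumnStore` are names invented for this example and are not Gora's actual API, and the in-memory map merely stands in for a column store such as HBase.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical backend-agnostic store interface (NOT Gora's actual API). */
interface ObjectStore<K, T> {
    void put(K key, T value);
    Optional<T> get(K key);
    void delete(K key);
}

/** A plain domain object; the caller never serializes it by hand. */
final class WebPage {
    final String url;
    final String title;
    final int fetchStatus;

    WebPage(String url, String title, int fetchStatus) {
        this.url = url;
        this.title = title;
        this.fetchStatus = fetchStatus;
    }
}

/**
 * Toy stand-in for a column store: each row key maps to cells addressed by
 * "family:qualifier". The field-to-column mapping lives in the backend, so a
 * different backend could lay out the same object differently.
 */
final class InMemoryColumnStore implements ObjectStore<String, WebPage> {
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    @Override
    public void put(String key, WebPage page) {
        Map<String, String> cells = new HashMap<>();
        cells.put("content:url", page.url);
        cells.put("content:title", page.title);
        cells.put("fetch:status", Integer.toString(page.fetchStatus));
        rows.put(key, cells);
    }

    @Override
    public Optional<WebPage> get(String key) {
        Map<String, String> cells = rows.get(key);
        if (cells == null) {
            return Optional.empty();
        }
        // Rebuild the domain object from its cells.
        return Optional.of(new WebPage(
                cells.get("content:url"),
                cells.get("content:title"),
                Integer.parseInt(cells.get("fetch:status"))));
    }

    @Override
    public void delete(String key) {
        rows.remove(key);
    }

    /** Read-only view of the raw cells, so the mapping can be inspected. */
    Map<String, String> rawRow(String key) {
        return Collections.unmodifiableMap(rows.getOrDefault(key, Collections.emptyMap()));
    }
}

public class Main {
    public static void main(String[] args) {
        ObjectStore<String, WebPage> store = new InMemoryColumnStore();
        store.put("http://www.apache.org/", new WebPage("http://www.apache.org/", "Apache", 200));

        WebPage page = store.get("http://www.apache.org/").orElseThrow(IllegalStateException::new);
        System.out.println(page.title + " " + page.fetchStatus); // prints "Apache 200"

        store.delete("http://www.apache.org/");
        System.out.println(store.get("http://www.apache.org/").isPresent()); // prints "false"
    }
}
```

The caller only handles `WebPage` objects; the choice of `family:qualifier` cells is made by the backend, loosely mirroring the proposal's point that object-to-data-store mappings are backend specific.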
> > > Gora differs from current solutions in that:
> > > * Gora is specifically focused on NoSQL data stores, but also has limited support for SQL databases.
> > > * The main use case for Gora is to access/analyze big data using Hadoop.
> > > * Gora uses Avro for bean definition, not byte code enhancement or annotations.
> > > * Object-to-data-store mappings are backend specific, so that the full data model can be utilized.
> > > * Gora is simple since it ignores complex SQL mappings.
> > > * Gora will support persistence, indexing and analysis of data, using Pig, Lucene, Hive, etc.
> > >
> > > == Rationale ==
> > > ORM frameworks are nothing new. But with the explosion of data generated in terabytes and even petabytes, NoSQL data stores are gaining ever-increasing popularity. Coupled with the limited support for the already-proven Apache Hadoop in current ORM frameworks, this created the need for a new project.
> > >
> > > Gora is currently hosted at GitHub. However, Gora has ties to the ASF in many ways. As detailed in the Proposal section, Gora will be a high-level client for many Apache projects and subprojects, including Hadoop (Common, HDFS, and MapReduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive. Gora already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started its life inside the Apache Nutch project, and Nutch trunk now uses Gora as a library. Furthermore, the initial set of committers are all ASF members. Therefore, we think that Apache will be an excellent home for Gora.
> > >
> > > == Initial Goals ==
> > > Initial goals for Gora can be summarized as:
> > > * Iron out the remaining issues with HBase, Cassandra and SQL support.
> > > * Make the first release before the end of the year.
> > > * Improve documentation.
> > > * Support for Cascading.
> > >
> > > == Current Status ==
> > > === Meritocracy ===
> > > Current commit rights belong to the initial list of committers, four of whom are also ASF members. All the developers have extensive experience with Apache projects. We honor the meritocracy policy of the ASF.
> > >
> > > === Community ===
> > > Gora’s community mostly overlaps with those of Nutch, Hadoop, HBase, Avro and Cassandra. We have a small community for now (5 initial committers, 18 people tracking the project at GitHub), but have been piggybacking on the Nutch community for a while. If Gora is accepted into the Apache Incubator, we expect more traction. Moreover, with the increasing popularity of NoSQL databases, we expect more users.
> > >
> > > === Core Developers ===
> > > Gora started as a code base inside Apache Nutch, initially written by Doğacan Güney. Enis Söztutar then refactored and re-architected the project out of Nutch. Later, Julien Nioche, Andrzej Bialecki and Doğacan ported Nutch to use the newly formed project, and Sertan Alkan joined after that. Doğacan and Julien are Nutch PMC members, and Andrzej is the Nutch PMC chair. Enis is an Apache Hadoop PMC member.
> > >
> > > === Alignment ===
> > > As discussed in the second paragraph of the Rationale section, all of the current developers are Apache people, and four of them are PMC members, which shows that we have some experience with the Apache way. Moreover, Gora is tightly related to many Apache projects: Nutch, Hadoop, HBase, Cassandra, Avro, Pig, Hive and Lucene, to name a few. Gora started its life inside Nutch, and Nutch trunk now uses Gora to persist web crawl data to HBase, Cassandra and MySQL, which makes Gora a critical component of Nutch.
> > >
> > > == Known Risks ==
> > > === Orphaned Products ===
> > > Most of the development depends on Enis and Doğacan for now. Both of them intend to continue Gora development. However, we also acknowledge that more core developers are needed for the project to be truly successful. The general strategy for acquiring more developers will be to acquire more users and encourage them to be active in the community and develop patches. Moreover, the next release of Nutch, planned before the end of 2010, has extensive Gora support. We expect more interest from the Nutch community, and we will continue to announce Gora news on the Hadoop, HBase and Cassandra mailing lists.
> > >
> > > === Inexperience with Open Source ===
> > > We believe that all of the developers have extensive open source experience. Four of the initial committers are Apache members. The codebase has also been open source since April 2010. We also have some documentation, wiki pages, an issue tracker and a dev mailing list.
> > >
> > > === Homogeneous Developers ===
> > > We have a semi-distributed development environment: Doğacan, Enis and Sertan share the same office, but Andrzej and Julien are independent. As we acquire more developers, we expect development to become more heterogeneous.
> > >
> > > === Reliance on Salaried Developers ===
> > > Gora development has been supported by the ant.com search engine as contract work. It is expected that this contract will continue in the future. However, even without sponsors, we are committed to continuing Gora development, since we believe in the technology it brings and in its vital role in Nutch and in our other closed-source projects.
> > >
> > > === Relationships with Other Apache Products ===
> > > Gora will be tightly related to many Apache projects:
> > >
> > > * Nutch: Apache Nutch was home to Gora’s initial code base. Nutch trunk now uses Gora as a library. The next release of Nutch, planned before the end of 2010, will use Gora’s first release.
> > > * Hadoop: Gora has extensive support for Hadoop MapReduce. Gora defines all the necessary data structures for working with Hadoop. Data stored in column-oriented data stores can be analyzed with Gora using Hadoop.
> > > * Avro: Gora uses and extends Avro. Data beans in Gora are defined using Avro schemas and compiled into Java code with an extended version of the Avro compiler. Avro is also used for data serialization.
> > > * HBase: Gora supports HBase as a persistence backend.
> > > * Cassandra: Gora supports Cassandra as a persistence backend.
> > > * Lucene/Solr: Gora intends to support Lucene/Solr as a persistence and indexing backend.
> > > * Pig: Gora intends to support Pig for data analysis.
> > > * Hive: Gora intends to support Hive for data analysis.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > > Gora is a natural fit for Apache due to its current committers and dependent projects.
> > >
> > > == Documentation ==
> > > * The project is currently hosted at http://github.com/enis/gora/.
> > > * Wiki pages can be found at http://wiki.github.com/enis/gora/.
> > > * The list of issues can be found at http://github.com/enis/gora/issues/.
> > > * Current mailing list web address: http://groups.google.com/group/gora-dev.
> > > * Current mailing list email address: gora-...@googlegroups.com.
> > >
> > > == Initial Source ==
> > > The initial source was developed as a patch to the Apache Nutch project. But the storage abstraction layer was orthogonal to the web crawler, and we decided to extract it into a separate project with much wider goals. Thus Gora, as a project, was born. The initial code was developed by Enis and Doğacan with ant.com’s sponsorship.
> > >
> > > The code can be found at http://github.com/enis/gora/.
> > >
> > > == External Dependencies ==
> > > External dependencies, excluding Apache projects, are as follows:
> > > * JDOM - http://jdom.org/ - Apache-style license
> > > * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic License, LGPL. SQL Builder is intended to be removed from the source for technical reasons anyway.
> > > * HSQLDB - http://hsqldb.org/ - BSD-style license
> > > * JUnit - http://junit.org - Common Public License 1.0
> > > * SLF4J - http://www.slf4j.org/ - MIT License
> > > * Google Guava Libraries - http://code.google.com/p/guava-libraries/ - Apache License 2.0
> > >
> > >
> > > == Required Resources ==
> > >
> > > === Mailing Lists ===
> > >
> > > * gora-private (with moderated subscriptions)
> > > * gora-dev
> > > * gora-commits
> > >
> > > === Subversion Directory ===
> > >
> > > * http://svn.apache.org/repos/asf/incubator/gora
> > >
> > > === Issue Tracking ===
> > > * JIRA (GORA)
> > >
> > > === Other Resources ===
> > > We need a wiki at http://wiki.apache.org. We currently have a wiki at GitHub; since there are not many pages there, we can move them manually to the wiki at wiki.apache.org.
> > >
> > > == Initial Committers ==
> > >
> > > Name               Email                         Affiliation     Timezone (UTC)
> > > Enis Söztutar      enis [at] apache.org          Konneka         +3
> > > Doğacan Güney      dogacan [at] apache.org       Konneka         +3
> > > Sertan Alkan       sertanalkan [at] gmail.com    Konneka         +3
> > > Julien Nioche      jnioche [at] apache.org       DigitalPebble   +1
> > > Andrzej Bialecki   ab [at] apache.org            Sigram
> > >
> > >
> > > === Affiliations ===
> > > All of the parties are affiliated with open source consulting shops. Most of the development was sponsored by ant.com; however, we expect that the amount of volunteer work will increase and that more developers will come on board.
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > > * Chris Mattmann (mattmann AT apache DOT org)
> > >
> > > === Nominated Mentors ===
> > > * Chris Mattmann (mattmann AT apache DOT org)
> > > * Andrzej Bialecki (ab AT apache DOT org)
> > >
> > > === Sponsoring Entity ===
> > > Apache Incubator. Successful graduation can result in Gora becoming either a TLP or a subproject of Hadoop, since most of the community is projected to overlap.
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.mattm...@jpl.nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>