Re: [PROPOSAL] Gora to enter Incubator

Enis Soztutar Thu, 16 Sep 2010 06:56:24 -0700

Hi

On Thu, Sep 16, 2010 at 4:43 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:


> Hi Enis,
>
> Thanks. Let’s leave the VOTE open until Friday evening Pacific time to give
> folks time to look at the proposal. Technically we didn’t do a VOTE thread
> on this yet though and I saw that some folks just asked Kitty to explicitly
> post a [VOTE] thread, so we may want to do that here.
>

Sorry, I wasn't implying that the vote is over, just a friendly welcome to
new comers. I agree that a [VOTE] thread might be necessary if that is the
concensus.


>
> What do folks think? If so, then I’ll call a VOTE thread around Friday
> evening, and leave it open for 72 hours.
>
> Cheers,
> Chris
>
>
> On 9/16/10 5:57 AM, "Enis Soztutar" <enis.soz.nu...@gmail.com> wrote:
>
> Thanks to everyone who have shown interest in the project.
>
>
> I think Chris added all, who has indicated their support as committers. On
> behalf of the initial comitters, welcome on board : )
>
> Just FYI in case you want to start contributing right away, we use github
> as
> the collaboration tool until we can establish the
> infrastructure at Apache.
>
> Cheers,
> Enis
>
> On Tue, Sep 14, 2010 at 11:51 PM, Tom White <tomwh...@apache.org> wrote:
>
> > +1 Sounds very interesting. I'd be happy to help out as a mentor.
> >
> > Cheers,
> > Tom
> >
> > On Mon, Sep 13, 2010 at 6:10 AM, Enis Soztutar <enis.soz.nu...@gmail.com
> >
> > wrote:
> > > Hi all,
> > >
> > > We would like to announce the Proposal for Gora, an ORM for Colum
> Stores,
> > > for the Apache Incubation. We believe that Gora can find a nice home at
> > > Apache.
> > >
> > > Wiki of the proposal can be found at
> > > http://wiki.apache.org/incubator/GoraProposal
> > >
> > > The proposal is as below.
> > >
> > >
> > > = Gora Proposal for Apache Incubation =
> > >
> > > == Abstract ==
> > > Gora is an ORM framework for column stores such as Apache HBase and
> > Apache
> > > Cassandra with a specific focus on Hadoop.
> > >
> > > == Proposal ==
> > > Although there are various excellent ORM frameworks for relational
> > > databases, data modeling in NoSQL data stores differ profoundly from
> > their
> > > relational cousins. Moreover, data-model agnostic frameworks such as
> JDO
> > are
> > > not sufficient for use cases, where one needs to use the full power of
> > the
> > > data models in column stores. Gora fills this gap by giving the user an
> > > easy-to-use ORM framework with data store specific mappings and built
> in
> > > Apache Hadoop support.
> > >
> > > The overall goal for Gora is to become the standard data representation
> > and
> > > persistence framework for big data. The roadmap of Gora can be grouped
> as
> > > follows.
> > >
> > >  * Data Persistence : Persisting objects to Column stores such as
> HBase,
> > > Cassandra, Hypertable; key-value stores such as Voldermort, Redis, etc;
> > SQL
> > > databases, such as MySQL, HSQLDB, flat files in local file system of
> > Hadoop
> > > HDFS.
> > >  * Data Access : An easy to use Java-friendly common API for accessing
> > the
> > > data regardless of its location.
> > >  * Indexing : Persisting objects to Lucene and Solr indexes,
> > > accessing/querying the data with Gora API.
> > >  * Analysis : Accesing the data and making analysis through adapters
> for
> > > Apache Pig, Apache Hive and Cascading
> > >  * MapReduce support : Out-of-the-box and extensive MapReduce (Apache
> > > Hadoop) support for data in the data store.
> > >
> > > == Background ==
> > > ORM stands for Object Relation Mapping. It is a technology which
> abstacts
> > > the persistency layer
> > > (mostly Relational Databases) so that plain domain level objects can be
> > > used, without the cumbersome effort to save/load the data to and from
> the
> > > database. Gora differs from current solutions in that:
> > >  * Gora is specially focussed at NoSQL data stores, but also has
> limited
> > > support for SQL databases
> > >  * The main use case for Gora is to access/analyze big data using
> Hadoop.
> > >  * Gora uses Avro for bean definition, not byte code enhancement or
> > > annotations
> > >  * Object-to-data store mappings are backend specific, so that full
> data
> > > model can be utilized.
> > >  * Gora is simple since it ignores complex SQL mappings
> > >  * Gora will support persistence, indexing and anaysis of data, using
> > Pig,
> > > Lucene, Hive, etc
> > >
> > > == Rationale ==
> > > ORM frameworks are nothing new. But with the explosion of data
> generated
> > in
> > > Terabytes and even Petabytes, NoSQL data stores are gaining
> > ever-increasing
> > > popularity. Coupled with limited support to already-proven Apache
> Hadoop
> > > support in current ORM frameworks, there was a need for a new project.
> > >
> > > Gora is currently hosted at Github. However, Gora has ties to ASF in
> many
> > > ways. As detailed in the proposal section, Gora will be a high level
> > client
> > > for many Apache projects and subprojects including Hadoop(common, hdfs,
> > and
> > > mapreduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive. Gora
> > > already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started
> > its
> > > life inside Apache Nutch project, and now Nutch trunk uses Gora as a
> > > library. Even more, the initial set of committers are all ASF members.
> > > Therefore, we think that Apache will be an excellent home for Gora.
> > >
> > > == Initial Goals ==
> > > Initial goals for Gora can be summarized as:
> > >  * Iron out the remaining issues with HBase, Cassandra and SQL support.
> > >  * Make the first release before the end of the year.
> > >  * Improve documentation
> > >  * Support for Cascading
> > >
> > > == Current Status ==
> > > === Meritocracy ===
> > > Current commit rights belong to the initial list of committers four of
> > who
> > > are also ASF members. All the developers have extensive experience with
> > > Apache projects. We honor the meritocracy policy of ASF foundation.
> > >
> > > === Community ===
> > > Gora’s community mostly overlap with that of Nutch, Hadoop, HBase, Avro
> > and
> > > Cassandra. We
> > > have a small community for now (5 initial committers, 18 people
> tracking
> > the
> > > project at Github), but have been piggybacking the Nutch community for
> a
> > > while. If Gora is accepted to Apache Incubator, we expect more
> traction.
> > > Moreover, with the increasing popularity of NoSQL databases, we expect
> > more
> > > users.
> > >
> > > === Core Developers ===
> > > Gora was started by the initial code base inside Apache Nutch by
> Doğacan
> > > Güney. Then Enis Söztutar has refactored and re-architected the project
> > out
> > > of Nutch. Later Julien Nioche, Andrzej Bialecki and Doğacan has ported
> > Nutch
> > > to use the newly formed project. Later, Sertan Alkan has joined.
> Doğacan
> > and
> > > Julien are Nutch PMC members, Andrzej is the Nutch PMC chair. Enis is
> an
> > > Apache Hadoop PMC member.
> > >
> > > === Alignment ===
> > > As discusssed in the second paragraph of Rationale Section, all of the
> > > current developers are Apache people, and four of them are PMC members,
> > > which shows that we have some experience with the Apache way. Moreover,
> > Gora
> > > is tightly related with lots of Apache projects, Nutch, Hadoop, HBase,
> > > Cassandra, Avro, Pig, Hive, Lucene to name a few. Gora has started its
> > life
> > > inside Nutch, and now nutch trunk uses Gora to persist web crawl data
> to
> > > HBase, Cassandra and MySQL, which means that Gora is a very critical
> > > component in Nutch.
> > >
> > > == Known Risks ==
> > > === Orphaned Products ===
> > > Most of the development depends on Enis and Doğacan for now. Both of
> them
> > > intent to continue Gora development. However, we also acknowledge that
> > more
> > > core developers are needed for the project to be truly successful. The
> > > general strategy to acquire more developers will be to acquire more
> > users,
> > > and encourage users to be active in the community and develop patches.
> > > Moreover, the next release of Nutch planned before the end of 2010 has
> > > extensive Gora support. We expect more interest from Nutch community,
> and
> > we
> > > will continue to announce Gora notifications at Hadoop,HBase and
> > Cassandra
> > > mailing lists.
> > >
> > > === Inexperience with Open Source ===
> > > We believe that all of the developers have extensive open source
> > experience.
> > > Four of the initial committers are apache members. The codebase is also
> > open
> > > source since April 2010. We also have some documentation, wiki pages,
> > issue
> > > tracker and dev mailing list.
> > >
> > > === Homogeneous Developers ===
> > > We have a semi-distributed development environment where Doğacan, Enis
> > and
> > > Sertan share the same office, but Andrzej and Julien are independent.
> > With
> > > the aim of acquiring more developers, we expect more heterogeneous
> > > development.
> > >
> > > === Reliance on Salaried Developers ===
> > > Gora development have been supported by [[ant.com]]  search engine as
> > > contract work. It is expected that this contract will continue in the
> > > future. However, even without sponsors, we are commited to continue on
> > Gora
> > > development, since we believe in the technology it brings and it’s
> vital
> > > role in Nutch, and our other closed sourced projects.
> > >
> > > === Relationships with Other Apache Products ===
> > > Gora will be tightly related to lots of Apache projects:
> > >
> > >  * Nutch : Apache nutch was to home to Gora’s initial code base. Now,
> > Nutch
> > > trunk uses Gora as a library. The next relase of Nutch, planned before
> > the
> > > end of 2010 will be using Gora’s first release.
> > >  * Hadoop : Gora has extensive support for Hadoop MapReduce Gora
> defines
> > all
> > > the necessary data structures for working with Hadoop .Data stored in
> > column
> > > oriented data stores can be analyzed  with Gora using Hadoop.
> > >  * Avro : Gora uses and extends Avro. Data beans in Gora are defined
> > using
> > > Avro schemas ,and compiled into Java code with the extended version of
> > the
> > > Avro compiler. Avro is also used in data serialization.
> > >  * HBase : Gora supports HBase as a persistency backend.
> > >  * Cassandra : Gora support Cassandra as a persistency backend.
> > >  * Lucene/Solr : Gora intends to support Lucene/Solr as a persistency
> and
> > > indexing backend.
> > >  * Pig : Gora intends to support Pig for data anaysis
> > >  * Hive :  Gora intends to support Hive for data analysis
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > > Gora is a natural fit for Apache due to it's current commiters and
> > depending
> > > projects.
> > >
> > > == Documentation ==
> > >  * The project is currently hosted at http://github.com/enis/gora/.
> > >  * Wiki pages can be found at http://wiki.github.com/enis/gora/.
> > >  * List of issues can be found at  http://github.com/enis/gora/issues/
> .
> > >  * Current web address: http://groups.google.com/group/gora-dev.
> > >  * Current email address: gora-...@googlegroups.com.
> > >
> > > == Initial Source ==
> > > The initial source was developed as a patch to the Apache Nutch
> project.
> > But
> > > the storage abstraction layer was orthogonal to the web crawler, and we
> > > decided to extract it to a separate project with much wider goals. Thus
> > > Gora, as a project, was born. The initial code is developed by Enis and
> > > Dogacan with ant.com’s sponsorship.
> > >
> > > The code can be found at http://github.com/enis/gora/.
> > >
> > > == External Dependencies ==
> > > External dependencies excluding Apache projects are as follows
> > >  * JDOM - http://jdom.org/ -  Apache-style license
> > >  * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
> > > License, LGPL. SQL Builder is intended to be removed from the source
> due
> > to
> > > technical reasons anyway.
> > >  * HSQLDB - http://hsqldb.org/ - BSD-style license
> > >  * JUnit - http://junit.org - Common Public License 1.0
> > >  * SLF4J - http://www.slf4j.org/ - MIT License
> > >  * Google Guava Libraries - http://code.google.com/p/guava-libraries/-
> > > Apache License 2.0
> > >
> > >
> > > == Required Resources ==
> > >
> > > === Mailing Lists ===
> > >
> > >  * gora-private (with moderated subscriptions)
> > >  * gora-dev
> > >  * gora-commits
> > >
> > > === Subversion Directory ===
> > >
> > >  * [[http://svn.apache.org/repos/asf/incubator/gora]]
> > >
> > > === Issue Tracking ===
> > >  * JIRA (GORA)
> > >
> > > === Other Resources ===
> > > We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
> > > Github, Since there is not a lot of pages there, we can manually move
> the
> > > pages to the wiki at wiki.apache.org.
> > >
> > > == Initial Committers ==
> > >
> > > Name                   email
> > > Affiliation        Timezone
> > > Enis Söztutar       enis [at] apache.org           Konneka         +3
> > > Doğacan Güney  dogacan [at] apache.org    Konneka         +3
> > > Sertan Alkan       sertanalkan [at] gmail.com Konneka         +3
> > > Julien Nioche       jnioche [at] apache.org      DigitalPebble  +1
> > > Andrzej Bialecki   ab [at] apache.org             Sigram
> > >
> > >
> > > === Affiliations ===
> > > All of the parties are affiliated with open source consulting shops.
> Most
> > of
> > > the development was sponsored by ant.com, however we expect that the
> > amount
> > > of volunteer work will increase, and more developers will come on
> board.
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > >  * Chris Mattmann (mattmann AT apache DOT org)
> > >
> > > === Nominated Mentors ===
> > >  * Chris Mattmann (mattmann AT apache DOT org)
> > >  * Andrzej Bialecki (ab AT apache DOT org )
> > >
> > > === Sponsoring Entity ===
> > > Apache Incubator. Successful graduation can result in either being a
> TLP,
> > or
> > > a subproject of
> > > Hadoop, since most of the community is projected to overlap.
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.mattm...@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/<http://sunset.usc.edu/%7Emattmann/>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>

Re: [PROPOSAL] Gora to enter Incubator

Reply via email to