I have worked closely with Sebastian Schelter for some time and have had a
number of contacts with the rest of the TU team and, to a lesser degree, with
the UCI and UCSD academic collaborators. I think that Stratosphere would
make an excellent project.

Moreover, I think that the academic tradition that Stratosphere springs
from is a short walk from the Apache way, so incubation should be relatively
painless.



On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <ga...@hortonworks.com> wrote:

> I would like to propose Stratosphere as an Apache Incubator project.  I
> have posted the proposal to
> https://wiki.apache.org/incubator/StratosphereProposal and posted the
> text of the proposal below.
>
> Alan.
>
> = Stratosphere =
>
> == Abstract ==
> Stratosphere is an open source system for parallel data analysis.
> Stratosphere deeply integrates MapReduce and database technologies to
> provide expressive and optimizable programming interfaces and at the same
> time efficient and scalable execution.
>
> == Proposal ==
> Stratosphere is an open source system for expressive, declarative, fast,
> and efficient data analysis. Stratosphere combines the scalability and
> programming flexibility of distributed MapReduce-like platforms with the
> efficiency, out-of-core execution, and query optimization capabilities
> found in parallel databases.
>
> == Background ==
> There is currently a need for general-purpose cluster computing platforms
> that are compatible with the Hadoop ecosystem, are more efficient, easier
> to use, and can support more applications than Hadoop MapReduce, but are
> not restricted to a specific data model and language (such as the
> relational model and a variant of SQL). Stratosphere fulfils these needs.
>
> Stratosphere exposes expressive APIs in Java and Scala (conceptually
> similar to Spark, Cascading, Scalding) that allow arbitrary user-defined
> functions in the same language and data model that the program is written
> in. Stratosphere programs pass through a cost-based optimizer that finds
> the best execution path for these programs depending on the data and
> cluster characteristics. The design and implementation of Stratosphere are
> based on research that generalizes query optimizers in relational
> databases. Stratosphere has a distributed runtime that is architected upon
> the principles of parallel databases, providing true pipelining (a basis
> for stream processing) and efficient out-of-core algorithms for grouping,
> sorting, joining, and aggregating data. Stratosphere provides first-class
> support for iterative algorithms via a built-in iterate operator, covering
> Machine Learning and graph analysis use cases. It achieves performance
> similar to Apache Giraph without being a specialized graph processing
> system.
>
> Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and
> some minor ones.
>
> == Rationale ==
> Stratosphere started out in 2008 as a research project by the Technical
> University of Berlin, the Humboldt University of Berlin, and the Hasso
> Plattner Institute, and has received subsequent funding from the German
> Research Council, the European Institute of Innovation and Technology, the
> European Commission, and industry.
>
> The traction of Stratosphere has by far exceeded our initial expectations,
> and we are therefore seeking an organizational long-term home for
> Stratosphere beyond the University walls that will house and further
> encourage contributors from companies and other organizations that are
> interested in Stratosphere. We believe that the Apache Software Foundation
> is the ideal home for Stratosphere. Stratosphere integrates with several
> existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is
> familiar with the Apache processes and fully subscribes to the Apache
> mission. One of the proposing members is a long-time Apache contributor and
> PMC member.
>
> == Initial Goals ==
>  * Move the existing codebase to Apache
>  * Integrate with the Apache development process
>  * Ensure all dependencies are compliant with Apache License version 2.0
>  * Incremental development and releases per Apache guidelines
>
>
> == Current Status ==
> === Meritocracy ===
> Stratosphere has operated on meritocratic principles from the start. The
> initial project proposal submitted to the German Research Council
> in 2008 stated that all code developed in the project would be released as
> open source under the Apache 2 license. Currently, all
> discussions pertaining to Stratosphere development are public on
> [[https://github.com/stratosphere/stratosphere|GitHub]] and our
> [[https://groups.google.com/forum/#!forum/stratosphere-dev|mailing list]].
> The current incubation proposal includes the major code contributors to
> Stratosphere. Several additional people have worked on the Stratosphere
> codebase for research prototypes and industry use cases and would be
> interested in becoming committers. We are starting with a small committer
> group and we plan to add additional committers following an open
> merit-based decision process during the incubation phase.
>
> === Community ===
> Currently, the core of Stratosphere is developed at TU Berlin, mainly by
> the committers listed in this proposal. Additional people from several
> Universities and companies in Europe are working with Stratosphere and are
> interested in becoming committers to the project.
>
> Over the years, Stratosphere has been adopted as a platform for research
> and teaching in several Universities (TU Berlin, HU Berlin, HPI, RWTH,
> Inria, KTH, U. Trento, UCSD, and others), and it is currently witnessing
> its first industrial installations. We are seeing rapidly growing
> interest in Stratosphere from both startups and large companies, as well as a
> growing community (our first
> [[http://stratosphere.eu/events/2013/summit.html|Stratosphere Summit]] in
> November 2013 attracted over 80 participants). Stratosphere was recently
> accepted as a mentoring organization in Google Summer of Code 2014.
>
> We believe that acceptance in the Apache Software Foundation will
> consolidate the current community under one organizational umbrella, and
> most importantly accelerate the growth of the community.
>
> === Core developers ===
> The core developers of the system are Stephan Ewen, Fabian Hueske, Daniel
> Warneke, Robert Metzger, Ufuk Celebi, and Aljoscha Krettek, who are all
> committers in the current proposal.
>
> === Alignment ===
> Stratosphere is compatible with, and related to, several Apache projects.
> Stratosphere re-uses parts of Apache Hadoop, in particular HDFS and YARN,
> as well as Apache HBase and Apache Avro. Stratosphere is a very good
> compilation target for query languages such as Apache Hive and Apache Pig.
>
> == Known Risks ==
> === Orphaned Products ===
> There is strong interest in Stratosphere by several companies and
> organizations, and there is currently a long-term commitment to fund
> salaried developers for Stratosphere by public and private organizations in
> Europe.
>
> === Inexperience with Open Source ===
> Sebastian Schelter is a committer and PMC member of Apache Mahout and
> Apache Giraph, member of the Apache Software Foundation, member of the
> Incubator PMC and project mentor for Apache Drill. Sebastian, along with
> our mentors, will guide the rest of the committers, who have experience
> with releasing software as open source but little experience in
> participating in an open source project besides Stratosphere itself.
>
> In mid-2013 Stratosphere transitioned from an “open source project with
> publicly accessible source code” to an open source project that puts the
> community first. We moved from a University-hosted git repository to
> GitHub, where we discuss all issues publicly. This also includes release
> planning (via GitHub’s milestone feature) and code reviews. We also moved
> our build system to the publicly available Travis-CI. The mailing lists are
> hosted with Google Groups, and we use the public Maven repository
> infrastructure of Sonatype. The source code of the www.stratosphere.eu
> website is publicly available and is meant to be changed by external
> contributors (for example for documentation purposes).
>
> === Homogeneous Developers ===
> Most committers in this proposal belong to the same institution (TU
> Berlin). The engagement of these committers goes well beyond the necessary
> development to support research, and all committers work on Stratosphere in
> their free time. Several people from other institutions are working on and
> are familiar with the Stratosphere codebase. We will work to attract them
> as future committers during the incubation phase, following a merit-based
> approach.
>
> === Reliance on Salaried Developers ===
> Currently, Stratosphere receives support from salaried developers, in
> particular from graduate students at TU Berlin who are funded by the
> German Research Council, the European Institute of Innovation and
> Technology, and the European Commission. These students work on
> Stratosphere in their free time in addition to their employment.
>
> We expect that Stratosphere development will occur on both salaried and
> volunteer time. We will recruit additional committers, including
> non-salaried developers, and we will work to ensure that the project will
> move forward independently of salaried developers.
>
> === Relationship with Other Apache Products ===
> Stratosphere interfaces with several existing Apache projects: Apache
> HBase for storage, Apache Hadoop (HDFS for storage, YARN for resource
> management, and a generic wrapper for Hadoop MapReduce input formats),
> and Apache Avro for serialization. Stratosphere
> uses Apache Maven and Apache Commons libraries internally. Stratosphere can
> be a great compilation target for Apache Pig and Apache Hive, although such
> functionality is not yet implemented.
>
> Stratosphere is also related to several projects that are undergoing
> incubation in the Apache Incubator, such as Tez and Drill, and to Spark
> (graduated). While all these projects target sufficiently different spaces
> and have different architectures, it would be interesting to explore code
> reuse possibilities. For example, we are currently basing our design for
> compiling SQL to Stratosphere on the Optiq library, also used by Apache
> Drill.
>
> === An Excessive Fascination with the Apache Brand ===
> We believe that the Apache brand will help us attract contributors to
> Stratosphere, by giving us a well-defined, transparent development process
> under a known brand. At the same time, Stratosphere already has a healthy
> community and current funding guarantees the further codebase development
> and growth of the project for the next 3-5 years. The reason for this
> proposal is not to gain publicity, but to further strengthen the longevity
> of the project as explained in the Rationale section.
>
> == Documentation ==
>  * [[https://stratosphere.eu|Project website]]
>  * [[http://stratosphere.eu/docs/0.4/|Documentation]]
>  * [[https://github.com/stratosphere/stratosphere|Codebase]]
>  * [[https://groups.google.com/forum/#!forum/stratosphere-dev|Mailing list]]
>
> == Initial Source ==
> Stratosphere is hosted on
> [[https://github.com/stratosphere/stratosphere|GitHub]]. This is the
> codebase that we will migrate to the Apache Foundation. The code was
> previously hosted on TU Berlin’s own git infrastructure. It has always
> been Apache 2.0 licensed.
>
> === Source and Intellectual Property Submission Plan ===
> All initial and past committers will sign a CLA with the ASF while the
> incubator proposal for Stratosphere is being discussed. All organizations
> that have employed Stratosphere contributors in the past will sign an SGA.
> Current contributors will sign a CCLA. All major contributors are still
> active in the project.
>
> === External Dependencies ===
> All critical dependencies are, to the extent of our knowledge, from other
> Apache projects. These include Apache Hadoop (for YARN and HDFS) and some
> libraries (log4j, commons codec, junit and more). Our web frontend uses
> some MIT-licensed JavaScript libraries.
>
> == Required Resources ==
>
> === Mailing list ===
> We will migrate our mailing lists to the following:
>  * us...@stratosphere.incubator.apache.org
>  * d...@stratosphere.incubator.apache.org
>  * priv...@stratosphere.incubator.apache.org
>  * comm...@stratosphere.incubator.apache.org
>
> === Source control ===
> We would like to use Git for source control and enable GitHub mirroring
> functionality, where code reviews on GitHub are automatically
> forwarded to the developer mailing list. (See also:
> [[https://blogs.apache.org/infra/entry/improved_integration_between_apache_and]])
>
>
> === Issue tracking ===
> We are currently using GitHub for issue tracking. We request an
> Apache-hosted JIRA, and we will import existing issues there.
>
>
> == Initial committers ==
>  * Stephan Ewen - stephan.e...@tu-berlin.de
>  * Fabian Hueske - fabian.hue...@tu-berlin.de
>  * Daniel Warneke - warn...@posteo.de
>  * Robert Metzger - metrob...@gmail.com
>  * Ufuk Celebi - u.cel...@fu-berlin.de
>  * Aljoscha Krettek - aljoscha.kret...@gmail.com
>  * Kostas Tzoumas - kostas.tzou...@tu-berlin.de
>  * Sebastian Schelter  - s...@apache.org
>
> === Affiliations ===
>  * Stephan Ewen (TU Berlin)
>  * Fabian Hueske (TU Berlin)
>  * Daniel Warneke (Amadeus IT Group)
>  * Robert Metzger (TU Berlin)
>  * Ufuk Celebi (FU Berlin)
>  * Aljoscha Krettek (TU Berlin)
>  * Kostas Tzoumas (TU Berlin)
>  * Sebastian Schelter (TU Berlin)
>
> == Sponsors ==
> === Champion ===
> Alan Gates (ga...@apache.org)
>
> === Nominated Mentors ===
>  * Sean Owen (sro...@apache.org) (Note: Sean is an Apache member but not
> currently on the IPMC; he will need to request IPMC membership)
>  * Ted Dunning (tdunn...@apache.org)
>  * Owen O'Malley (omal...@apache.org)
>
> === Sponsoring Entity ===
> The Apache Incubator
>
>
