Thanks Alan, I believe Sebastian had added me to the list.

- Henry

On Mon, Apr 7, 2014 at 7:59 AM, Alan Gates <ga...@hortonworks.com> wrote:
> Henry, definitely glad to have you on board. Ashutosh Chauhan (new to the
> IPMC) has also expressed his interest to me offline in being a mentor. I’ll
> add both of you to the proposal.
>
> Alan.
>
> On Apr 6, 2014, at 10:34 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> The proposal looks great and I would love to help by signing up as a
>> Mentor if you guys still have space for one.
>>
>>
>> - Henry
>>
>>
>> On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <ga...@hortonworks.com> wrote:
>>> I would like to propose Stratosphere as an Apache Incubator project. I
>>> have posted the proposal to
>>> https://wiki.apache.org/incubator/StratosphereProposal and posted the text
>>> of the proposal below.
>>>
>>> Alan.
>>>
>>> = Stratosphere =
>>>
>>> == Abstract ==
>>> Stratosphere is an open source system for parallel data analysis.
>>> Stratosphere deeply integrates MapReduce and database technologies to
>>> provide expressive and optimizable programming interfaces and at the same
>>> time efficient and scalable execution.
>>>
>>> == Proposal ==
>>> Stratosphere is an open source system for expressive, declarative, fast,
>>> and efficient data analysis. Stratosphere combines the scalability and
>>> programming flexibility of distributed MapReduce-like platforms with the
>>> efficiency, out-of-core execution, and query optimization capabilities
>>> found in parallel databases.
>>>
>>> == Background ==
>>> There is currently a need for general-purpose cluster computing platforms
>>> that are compatible with the Hadoop ecosystem, are more efficient, easier
>>> to use, and can support more applications than Hadoop MapReduce, but are
>>> not restricted to a specific data model and language (such as the
>>> relational model and a variant of SQL). Stratosphere fulfils these needs.
>>>
>>> Stratosphere exposes expressive APIs in Java and Scala (conceptually
>>> similar to Spark, Cascading, Scalding) that allow arbitrary user-defined
>>> functions in the same language and data model that the program is written
>>> in. Stratosphere programs pass through a cost-based optimizer that finds
>>> the best execution path for these programs depending on the data and
>>> cluster characteristics. The design and implementation of Stratosphere are
>>> based on research that generalizes query optimizers in relational
>>> databases. Stratosphere has a distributed runtime that is architected upon
>>> the principles of parallel databases, providing true pipelining (a basis
>>> for stream processing) and efficient out-of-core algorithms for grouping,
>>> sorting, joining, and aggregating data. Stratosphere provides first-class
>>> support for iterative algorithms via a built-in iterate operator, covering
>>> Machine Learning and graph analysis use cases. It achieves performance
>>> similar to Apache Giraph without being a specialized graph processing
>>> system.
>>>
>>> Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and some
>>> minor ones.
>>>
>>> == Rationale ==
>>> Stratosphere started out in 2008 as a research project by the Technical
>>> University of Berlin, the Humboldt University of Berlin, and the Hasso
>>> Plattner Institute, and has received subsequent funding from the German
>>> Research Council, the European Institute of Innovation and Technology, the
>>> European Commission, and industry.
>>>
>>> The traction of Stratosphere has by far exceeded our initial expectations,
>>> and we are therefore seeking an organizational long-term home for
>>> Stratosphere beyond the University walls that will house and further
>>> encourage contributors from companies and other organizations that are
>>> interested in Stratosphere. We believe that the Apache Software Foundation
>>> is the ideal home for Stratosphere. Stratosphere integrates with several
>>> existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is
>>> familiar with the Apache processes and fully subscribes to the Apache
>>> mission. One of the proposing members is a long-time Apache contributor and
>>> PMC member.
>>>
>>> == Initial Goals ==
>>> * Move the existing codebase to Apache
>>> * Integrate with the Apache development process
>>> * Ensure all dependencies are compliant with Apache License version 2.0
>>> * Incremental development and releases per Apache guidelines
>>>
>>>
>>> == Current Status ==
>>> === Meritocracy ===
>>> Stratosphere has operated on meritocratic principles from the get-go. The
>>> initial project proposal submitted to the German Research Council
>>> in 2008 stated that all code developed in the project would be released as
>>> open source under the Apache 2 license. Currently, all the
>>> discussions pertaining to Stratosphere development are public on
>>> [[https://github.com/stratosphere/stratosphere|GitHub]] and our
>>> [[https://groups.google.com/forum/#!forum/stratosphere-dev|mailing list]].
>>> The current incubation proposal includes the major code contributors to
>>> Stratosphere. Several additional people have worked on the Stratosphere
>>> codebase for research prototypes and industry use cases and would be
>>> interested in becoming committers. We are starting with a small committer
>>> group and we plan to add additional committers following an open
>>> merit-based decision process during the incubation phase.
>>>
>>> === Community ===
>>> Currently, the core of Stratosphere is developed at TU Berlin, mainly by
>>> the committers listed in this proposal. Additional people from several
>>> Universities and companies in Europe are working with Stratosphere and are
>>> interested in becoming committers to the project.
>>>
>>> Over the years, Stratosphere has been adopted as a platform for research
>>> and teaching in several Universities (TU Berlin, HU Berlin, HPI, RWTH,
>>> Inria, KTH, U. Trento, UCSD, and others), and it is currently witnessing
>>> its first industrial installations. We are seeing a rapidly growing
>>> interest in Stratosphere by both startups and large companies, as well as a
>>> growing community (our first
>>> [[http://stratosphere.eu/events/2013/summit.html|Stratosphere Summit]] in
>>> November 2013 attracted over 80 participants). Stratosphere was recently
>>> accepted as a mentoring organization in Google Summer of Code 2014.
>>>
>>> We believe that acceptance in the Apache Software Foundation will
>>> consolidate the current community under one organizational umbrella, and
>>> most importantly accelerate the growth of the community.
>>>
>>> === Core developers ===
>>> The core developers of the system are Stephan Ewen, Fabian Hueske, Daniel
>>> Warneke, Robert Metzger, Ufuk Celebi, and Aljoscha Krettek, who are all
>>> committers in the current proposal.
>>>
>>> === Alignment ===
>>> Stratosphere is compatible with, and related to, several Apache projects.
>>> Stratosphere re-uses parts of Apache Hadoop, in particular HDFS and YARN,
>>> as well as Apache HBase and Apache Avro. Stratosphere is a very good
>>> compilation target for query languages such as Apache Hive and Apache Pig.
>>>
>>> == Known Risks ==
>>> === Orphaned Products ===
>>> There is strong interest in Stratosphere by several companies and
>>> organizations, and there is currently a long-term commitment to fund
>>> salaried developers for Stratosphere by public and private organizations in
>>> Europe.
>>>
>>> === Inexperience with Open Source ===
>>> Sebastian Schelter is a committer and PMC member of Apache Mahout and
>>> Apache Giraph, member of the Apache Software Foundation, member of the
>>> Incubator PMC and project mentor for Apache Drill. Sebastian, along with
>>> our mentors, will guide the rest of the committers, who have experience
>>> with releasing software as open source but little experience in
>>> participating in an open source project besides Stratosphere itself.
>>>
>>> In mid-2013 Stratosphere transitioned from an “open source project with
>>> publicly accessible source code” to an open source project that puts the
>>> community first. We moved from a University-hosted git repository to
>>> GitHub, where we discuss all issues publicly. This also includes release
>>> planning (via GitHub’s milestone feature) and code reviews. We also moved
>>> our build system to the publicly available Travis-CI. The mailing lists are
>>> hosted with Google Groups, and we use the public Maven repository
>>> infrastructure of Sonatype. The source code of the www.stratosphere.eu
>>> website is publicly available and is meant to be changed by external
>>> contributors (for example for documentation purposes).
>>>
>>> === Homogeneous Developers ===
>>> Most committers in this proposal belong to the same institution (TU
>>> Berlin). The engagement of these committers goes well beyond the necessary
>>> development to support research, and all committers work on Stratosphere in
>>> their free time. Several people from other institutions are working on and
>>> are familiar with the Stratosphere codebase. We will work to attract them
>>> as future committers during the incubation phase, following a merit-based
>>> approach.
>>>
>>> === Reliance on Salaried Developers ===
>>> Currently, Stratosphere receives support from salaried developers, in
>>> particular from graduate students at TU Berlin who are funded by the
>>> German Research Council, the European Institute of Innovation and
>>> Technology, and the European Commission. These students work in their free
>>> time on Stratosphere in addition to their employment.
>>>
>>> We expect that Stratosphere development will occur on both salaried and
>>> volunteer time. We will recruit additional committers, including
>>> non-salaried developers, and we will work to ensure that the project will
>>> move forward independently of salaried developers.
>>>
>>> === Relationship with Other Apache Products ===
>>> Stratosphere interfaces with several existing Apache projects: Apache HBase
>>> for storage, Apache Hadoop (HDFS for storage, YARN for resource management,
>>> and a generic wrapper for Hadoop MapReduce input formats), and Apache Avro
>>> (for serialization). Stratosphere uses Apache
>>> Maven and Apache Commons libraries internally. Stratosphere can be a great
>>> compilation target for Apache Pig and Apache Hive, although such
>>> functionality is not yet implemented.
>>>
>>> Stratosphere is also related to several projects undergoing incubation in
>>> the Apache Incubator, such as Tez, Drill, and Spark (graduated).
>>> While all these projects target sufficiently different spaces and have
>>> different architectures, it would be interesting to explore code reuse
>>> possibilities. For example, we are currently basing our design for
>>> compiling SQL to Stratosphere on the Optiq library, also used by Apache
>>> Drill.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>> We believe that the Apache brand will help us attract contributors to
>>> Stratosphere, by giving us a well-defined, transparent development process
>>> under a known brand. At the same time, Stratosphere already has a healthy
>>> community, and current funding guarantees the further codebase development
>>> and growth of the project for the next 3-5 years. The reason for this
>>> proposal is not to gain publicity, but to further strengthen the longevity
>>> of the project as explained in the Rationale section.
>>>
>>> == Documentation ==
>>> * [[https://stratosphere.eu|Project website]]
>>> * [[http://stratosphere.eu/docs/0.4/|Documentation]]
>>> * [[https://github.com/stratosphere/stratosphere|Codebase]]
>>> * [[https://groups.google.com/forum/#!forum/stratosphere-dev|Mailing list]]
>>>
>>> == Initial Source ==
>>> Stratosphere is hosted on
>>> [[https://github.com/stratosphere/stratosphere|GitHub]]. This is the
>>> codebase that we will migrate to the Apache Software Foundation. The code
>>> was previously hosted on TU Berlin’s own git infrastructure. It has always
>>> been Apache 2.0 licensed.
>>>
>>> === Source and Intellectual Property Submission Plan ===
>>> All initial and past committers will sign a CLA with the ASF while the
>>> incubator proposal for Stratosphere is being discussed. All organizations
>>> that have employed Stratosphere contributors in the past will sign an SGA.
>>> Current contributors will sign a CCLA. All major contributors are still
>>> active in the project.
>>>
>>> === External Dependencies ===
>>> All critical dependencies are, to the extent of our knowledge, from other
>>> Apache projects. These include Apache Hadoop (for YARN and HDFS) and some
>>> libraries (log4j, commons codec, junit and more). Our web frontend uses
>>> some MIT-licensed JavaScript libraries.
>>>
>>> == Required Resources ==
>>>
>>> === Mailing list ===
>>> We will migrate our mailing lists to the following:
>>> * us...@stratosphere.incubator.apache.org
>>> * d...@stratosphere.incubator.apache.org
>>> * priv...@stratosphere.incubator.apache.org
>>> * comm...@stratosphere.incubator.apache.org
>>>
>>> === Source control ===
>>> We would like to use Git for source control and enable GitHub mirroring
>>> functionality, where code reviews on GitHub are automatically
>>> forwarded to the developer mailing list. (See also:
>>> [[https://blogs.apache.org/infra/entry/improved_integration_between_apache_and]])
>>>
>>>
>>> === Issue tracking ===
>>> We are currently using GitHub for issue tracking. We request an
>>> Apache-hosted JIRA, and we will import existing issues there.
>>>
>>>
>>> == Initial committers ==
>>> * Stephan Ewen - stephan.e...@tu-berlin.de
>>> * Fabian Hueske - fabian.hue...@tu-berlin.de
>>> * Daniel Warneke - warn...@posteo.de
>>> * Robert Metzger - metrob...@gmail.com
>>> * Ufuk Celebi - u.cel...@fu-berlin.de
>>> * Aljoscha Krettek - aljoscha.kret...@gmail.com
>>> * Kostas Tzoumas - kostas.tzou...@tu-berlin.de
>>> * Sebastian Schelter - s...@apache.org
>>>
>>> === Affiliations ===
>>> * Stephan Ewen (TU Berlin)
>>> * Fabian Hueske (TU Berlin)
>>> * Daniel Warneke (Amadeus IT Group)
>>> * Robert Metzger (TU Berlin)
>>> * Ufuk Celebi (FU Berlin)
>>> * Aljoscha Krettek (TU Berlin)
>>> * Kostas Tzoumas (TU Berlin)
>>> * Sebastian Schelter (TU Berlin)
>>>
>>> == Sponsors ==
>>> === Champion ===
>>> Alan Gates (ga...@apache.org)
>>>
>>> === Nominated Mentors ===
>>> * Sean Owen (sro...@apache.org) (Note: Sean is an Apache member but not
>>>   currently on the IPMC; he will need to request IPMC membership)
>>> * Ted Dunning (tdunn...@apache.org)
>>> * Owen O'Malley (omal...@apache.org)
>>>
>>> === Sponsoring Entity ===
>>> The Apache Incubator
>>>
>>>
>>> --
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity to
>>> which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org