Thanks to all who voted. Obviously, I'm +1 (binding) on the proposal. With 14 +1s (10 binding), the vote passes.
I'll start the work to get the podling started.

thanks,
Arun

On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:

> Hi Folks,
>
> Thanks for participating in the discussion. I'd like to call a VOTE for
> acceptance of Apache Tez into the Incubator. I'll let the vote run through
> this weekend (Sun 2/24 6pm PST).
>
> [ ] +1 Accept Apache Tez into the Incubator
> [ ] +0 Don't care.
> [ ] -1 Don't accept Apache Tez into the Incubator because...
>
> Full proposal is pasted at the bottom of this email, and the corresponding
> wiki is http://wiki.apache.org/incubator/TezProposal.
>
> Only VOTEs from Incubator PMC members are binding, but all are welcome to
> express their thoughts.
>
> Here's my +1 (binding).
>
> thanks,
> Arun
>
> PS: From the initial discussion, the only changes are that I've added one new
> mentor and 2 new committers. All the new additions come from the non-major
> employer while we continue to strive to further diversify during the
> incubation. Thanks.
>
> ----
>
> = Tez =
>
> == Abstract ==
> Tez is an effort to develop a generic application framework which can be used
> to process arbitrarily complex data-processing tasks and also a re-usable set
> of data-processing primitives which can be used by other projects.
>
> == Proposal ==
> Tez is a proposal to develop a generic application which can be used to
> process complex data-processing task DAGs and runs natively on Apache Hadoop
> YARN. YARN is a generic resource-management system on which applications
> like MapReduce already exist. MapReduce is a specific, and constrained,
> DAG - which is not optimal for several frameworks like Apache Hive and
> Apache Pig. Furthermore, we propose to develop a re-usable set of libraries
> of data-processing primitives such as sorting, merging, data-shuffling,
> intermediate data management etc. which are necessary for Tez and which we
> envision can be used directly by other projects.
>
> == Background ==
> Apache Hadoop MapReduce has emerged as the assembly-language on which other
> frameworks like Apache Pig and Apache Hive have been built. However, it has
> been well accepted that MapReduce produces very constrained task DAGs for each
> job, which results in Apache Pig and Apache Hive requiring multiple MapReduce
> jobs for several queries. By providing a more expressive DAG of tasks for a
> job, Tez attempts to provide significantly enhanced data-processing
> capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
>
> == Rationale ==
> There is an important gap that Tez fills in the Apache Hadoop ecosystem by
> allowing for more expressive task DAGs for data-processing applications such
> as Apache Pig, Apache Hive, Cascading etc.
>
> With the emergence of Apache Hadoop YARN, there is a strong need for a
> common DAG application which can then be shared by Apache Pig, Apache Hive,
> Cascading etc.
>
> == Initial Goals ==
> The initial goals for this project are to specify the detailed requirements
> and architecture, and then develop the initial implementation including the
> DAG ApplicationMaster to run natively inside Apache Hadoop YARN.
>
> == Current Status ==
> Significant work has been completed to identify the initial requirements and
> define the overall system architecture. There is a patch available in the
> internal Hortonworks git repository which can act as the initial seed.
>
> === Meritocracy ===
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those that contribute.
>
> === Community ===
> The need for a generic DAG application for data processing in open source
> is tremendous, so there is a potential for a very large community. We believe
> that Tez's extensible architecture will further encourage community
> participation. Also, related Apache projects (e.g., Pig, Hive) have very
> large and active communities, and we expect that over time Tez will also
> attract a large community.
>
> === Core Developers ===
> The developers on the initial committers list include people very experienced
> in the Apache Hadoop ecosystem:
>
> * Alan Gates <gates at apache dot org>
> * Arun C Murthy <acmurthy at apache dot org>
> * Ashutosh Chauhan <hashutosh at apache dot org>
> * Bikas Saha <bikas at apache dot org>
> * Chris Douglas <cdouglas at apache dot org>
> * Daryn Sharp <daryn at apache dot org>
> * Devaraj Das <ddas at apache dot org>
> * Gopal Vijayaraghavan <gopal at hortonworks dot com>
> * Gunther Hagleitner <ghagleitner at hortonworks dot com>
> * Hitesh Shah <hitesh at apache dot org>
> * Jason Lowe <jlowe at apache dot org>
> * Jean Xu <jeanxu at facebook dot com>
> * Jitendra Pandey <jitendra at apache dot org>
> * Julien Le Dem <julien at apache dot org>
> * Kevin Wilfong <kevinwilfong at apache dot org>
> * Mike Liddell <mike dot lidell at microsoft dot com>
> * Namit Jain <namit at apache dot org>
> * Nathan Roberts <nroberts at yahoo dash inc dot com>
> * Owen O'Malley <omalley at apache dot org>
> * Robert Evans <bobby at apache dot org>
> * Siddharth Seth <sseth at apache dot org>
> * Tom White <tomwhite at apache dot org>
> * Thomas Graves <tgraves at apache dot org>
> * Vikram Dixit <vikram at apache dot org>
> * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
> * William Graham <billgraham at apache dot org>
>
> We realize that though we have significant employer diversity already,
> additional diversity is always better, and we will work
> aggressively to recruit developers from additional companies.
>
> === Alignment ===
> The initial committers strongly believe that a standard task DAG
> application on Apache Hadoop YARN will gain broader adoption as an open
> source, community-driven project, where the community can contribute not
> only to the core components, but also to a growing collection of
> applications which will be built on top of Tez. Our hope is that the Apache
> Hive, Apache Pig, Cascading and other communities will find tremendous value
> in Tez and will adopt it en masse.
>
> == Known Risks ==
>
> === Orphaned Products ===
> The contributors are leading users and vendors in the Apache Hadoop
> ecosystem, with significant open source experience, so the risk of being
> orphaned is relatively low. The project could be at risk if vendors decided
> to change their strategies in the market. In such an event, the current
> committers plan to continue working on the project on their own time, though
> the progress will likely be slower. We plan to mitigate this risk by
> recruiting additional committers.
>
> === Inexperience with Open Source ===
> The initial committers include veteran Apache members (Committers, PMC members
> and Apache Members) and other developers who have varying degrees of
> experience with open source projects. All have been involved with source
> code that has been released under an open source license, and several also
> have experience developing code with an open source development process.
>
> === Homogenous Developers ===
> The initial committers are employed by a number of companies, including
> Cloudera, Facebook, Hortonworks, Microsoft, Twitter and Yahoo. We are
> committed to recruiting additional committers from other companies based on
> their contributions to the project even though we do have significant
> diversity already.
>
> === Reliance on Salaried Developers ===
> It is expected that Tez development will occur on both salaried time and on
> volunteer time, after hours. The majority of initial committers are paid by
> their employer to contribute to this project. However, they are all
> passionate about the project, and we are confident that the project will
> continue even if no salaried developers contribute to the project. We are
> committed to recruiting additional committers including non-salaried
> developers.
>
> === Relationships with Other Apache Products ===
> As mentioned in the Alignment section, Tez is closely integrated with Hadoop,
> Hive and Pig in numerous ways. We look forward to collaborating with
> those communities, as well as other Apache communities.
>
> === An Excessive Fascination with the Apache Brand ===
> Tez solves a real need for generic task DAG management in the Apache Hadoop
> ecosystem, something which has been addressed in a very ad hoc manner so far
> by multiple Apache projects. Our rationale for developing Tez as an Apache
> project is detailed in the Rationale section. We believe that the Apache
> brand and community process will help us attract more contributors to this
> project, and help establish ubiquitous APIs.
>
> == Documentation ==
> http://wiki.apache.org/incubator/TezProposal
>
> == Initial Source ==
> Available as a patch.
>
> == Cryptography ==
> Tez will eventually support encryption on the wire. This is not one of the
> initial goals, and we do not expect Tez to be a controlled export item due
> to the use of encryption.
>
> == Required Resources ==
>
> === Mailing List ===
> * tez-private
> * tez-dev
> * tez-user
>
> === Subversion Directory ===
> Git is the preferred source control system: git://git.apache.org/tez
>
> === Issue Tracking ===
>
> JIRA Tez (TEZ)
>
> == Initial Committers ==
> * Alan Gates <gates at apache dot org>
> * Arun C Murthy <acmurthy at apache dot org>
> * Ashutosh Chauhan <hashutosh at apache dot org>
> * Bikas Saha <bikas at apache dot org>
> * Chris Douglas <cdouglas at apache dot org>
> * Daryn Sharp <daryn at apache dot org>
> * Devaraj Das <ddas at apache dot org>
> * Gopal Vijayaraghavan <gopal at hortonworks dot com>
> * Gunther Hagleitner <ghagleitner at hortonworks dot com>
> * Hitesh Shah <hitesh at apache dot org>
> * Jason Lowe <jlowe at apache dot org>
> * Jean Xu <jeanxu at facebook dot com>
> * Jitendra Pandey <jitendra at apache dot org>
> * Julien Le Dem <julien at apache dot org>
> * Kevin Wilfong <kevinwilfong at apache dot org>
> * Mike Liddell <mike dot lidell at microsoft dot com>
> * Namit Jain <namit at apache dot org>
> * Nathan Roberts <nroberts at yahoo dash inc dot com>
> * Owen O'Malley <omalley at apache dot org>
> * Robert Evans <bobby at apache dot org>
> * Siddharth Seth <sseth at apache dot org>
> * Tom White <tomwhite at apache dot org>
> * Thomas Graves <tgraves at apache dot org>
> * Vikram Dixit <vikram at apache dot org>
> * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
> * William Graham <billgraham at apache dot org>
>
> == Affiliations ==
> The initial committers are employees of Cloudera, Facebook, Hortonworks,
> Microsoft, Twitter and Yahoo Inc.
>
> * Alan Gates - Hortonworks
> * Arun C Murthy - Hortonworks
> * Ashutosh Chauhan - Hortonworks
> * Bikas Saha - Hortonworks
> * Chris Douglas - Microsoft
> * Daryn Sharp - Yahoo
> * Devaraj Das - Hortonworks
> * Gopal Vijayaraghavan - Hortonworks
> * Gunther Hagleitner - Hortonworks
> * Hitesh Shah - Hortonworks
> * Jason Lowe - Yahoo
> * Jean Xu - Facebook
> * Jitendra Pandey - Hortonworks
> * Julien Le Dem - Twitter
> * Kevin Wilfong - Facebook
> * Mike Liddell - Microsoft
> * Namit Jain - Facebook
> * Nathan Roberts - Yahoo
> * Owen O'Malley - Hortonworks
> * Robert Evans - Yahoo
> * Siddharth Seth - Hortonworks
> * Tom White - Cloudera
> * Thomas Graves - Yahoo
> * Vikram Dixit - Hortonworks
> * Vinod Kumar Vavilapalli - Hortonworks
> * William Graham - Twitter
>
> The nominated mentors are employees of Hortonworks, LinkedIn,
> NASA JPL and Microsoft.
>
> * Alan Gates - Hortonworks
> * Arun C Murthy - Hortonworks
> * Chris Douglas - Microsoft
> * Chris Mattmann - NASA JPL
> * Jakob Homan - LinkedIn
> * Owen O'Malley - Hortonworks
>
> == Sponsors ==
>
> === Champion ===
> Arun C Murthy <acmurthy at apache dot org>
>
> === Nominated Mentors ===
> * Alan Gates <gates at apache dot org> - Architect at Hortonworks.
> Committer for Pig.
> * Arun C Murthy <acmurthy at apache dot org> - Architect at Hortonworks.
> Committer for Hadoop.
> * Chris Douglas <cdouglas at apache dot org> - Sr. Research Engineer at
> Microsoft. Committer for Hadoop.
> * Chris Mattmann <mattmann at apache dot org> - Sr. Computer Scientist, NASA
> JPL. Committer for Nutch, OODT and Tika.
> * Jakob Homan <jghoman at apache dot org> - Sr. Software Engineer,
> LinkedIn. Committer for Hadoop, Kafka, Giraph.
> * Owen O'Malley <omalley at apache dot org> - Architect at Hortonworks.
> Committer for Hadoop, Ambari.
>
> === Sponsoring Entity ===
> Incubator
>

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/