Same here. Happy to see this come to the ASF! On Tue, Nov 17, 2015 at 1:18 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote: > Awesome! Glad to see this coming to the ASF :-) > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Chris Mattmann, Ph.D. > Chief Architect > Instrument Software and Science Data Systems Section (398) > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > Office: 168-519, Mailstop: 168-527 > Email: chris.a.mattm...@nasa.gov > WWW: http://sunset.usc.edu/~mattmann/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > -----Original Message----- > From: Henry Robinson <he...@cloudera.com> > Reply-To: "general@incubator.apache.org" <general@incubator.apache.org> > Date: Tuesday, November 17, 2015 at 1:49 PM > To: "general@incubator.apache.org" <general@incubator.apache.org> > Subject: [DISCUSS] Impala incubator proposal > >>Hi all - >> >>We'd like to start a discussion regarding a proposal to submit Impala to >>the Apache Incubator. >> >>The proposal text is available on the Wiki here: >>https://wiki.apache.org/incubator/ImpalaProposal >> >>and pasted below for convenience. >> >>I'm excited to make this proposal, and look forward to the community's >>input! >> >>Best, >>Henry >> >> >>= Abstract = >>Impala is a high-performance C++ and Java SQL query engine for data stored >>in Apache Hadoop-based clusters. >> >>= Proposal = >> >>We propose to contribute the Impala codebase and associated artifacts >>(e.g. >>documentation, web-site content etc.) to the Apache Software Foundation >>with the intent of forming a productive, meritocratic and open community >>around Impala’s continued development, according to the ‘Apache Way’. >> >>Cloudera owns several trademarks regarding Impala, and proposes to >>transfer >>ownership of those trademarks in full to the ASF. >> >>= Background = >>Engineers at Cloudera developed Impala and released it as an >>Apache-licensed open-source project in Fall 2012. Impala was written as a >>brand-new, modern C++ SQL engine targeted from the start for data stored >>in >>Apache Hadoop clusters. >> >>Impala’s most important benefit to users is high-performance, making it >>extremely appropriate for common enterprise analytic and business >>intelligence workloads. This is achieved by a number of software >>techniques, including: native support for data stored in HDFS and related >>filesystems, just-in-time compilation and optimization of individual query >>plans, high-performance C++ codebase and massively-parallel distributed >>architecture. In benchmarks, Impala is routinely amongst the very highest >>performing SQL query engines. >> >>= Rationale = >> >>Despite the exciting innovation in the so-called ‘big-data’ space, SQL >>remains by far the most common interface for interacting with data in both >>traditional warehouses and modern ‘big-data’ clusters. There is clearly a >>need, as evidenced by the eager adoption of Impala and other SQL engines >>in >>enterprise contexts, for a query engine that offers the familiar SQL >>interface, but that has been specifically designed to operate in massive, >>distributed clusters rather than in traditional, fixed-hardware, >>warehouse-specific deployments. Impala is one such query engine. >> >>We believe that the ASF is the right venue to foster an open-source >>community around Impala’s development. We expect that Impala will benefit >>from more productive collaboration with related Apache projects, and under >>the auspices of the ASF will attract talented contributors who will push >>Impala’s development forward at pace. >> >>We believe that the timing is right for Impala’s development to move >>wholesale to the ASF: Impala is well-established, has been Apache-licensed >>open-source for more than three years, and the core project is relatively >>stable. We are excited to see where an ASF-based community can take Impala >>from this strong starting point. >> >>= Initial Goals = >>Our initial goals are as follows: >> >>* Establish ASF-compatible engineering practices and workflows >>* Refactor and publish existing internal build scripts and test >>infrastructure, in order to make them usable by any community member. >>* Transfer source code, documentation and associated artifacts to the ASF. >>* Grow the user and developer communities >> >>= Current Status = >> >>Impala is developed as an Apache-licensed open-source project. The source >>code is available at http://github.com/cloudera/Impala, and developer >>documentation is at https://github.com/cloudera/Impala/wiki. The majority >>of commits to the project have come from Cloudera-employed developers, but >>we have accepted some contributions from individuals from other >>organizations. >> >>All code reviews are done via a public instance of the Gerrit review tool >>at http://gerrit.cloudera.org:8080/, and discussed on a public mailing >>list. All patches must be reviewed before they are accepted into the >>codebase, via a voting mechanism that is similar to that used on Apache >>projects such as Hadoop and HBase. >> >>Before a patch is committed, it must pass a suite of pre-commit tests. >>These tests are currently run on Cloudera’s internal infrastructure. One >>of >>our initial goals will be to work with the ASF Infrastructure team to find >>a way to run these tests in an acceptable way on publicly accessible >>machines. >> >>Issues are tracked in JIRA at https://issues.cloudera.org/projects/IMPALA, >>in a way that is extremely similar to existing practices at other ASF >>projects. >> >>= Meritocracy = >> >>We understand the central importance of meritocracy to the Apache Way. We >>will work to establish a welcoming, fair and meritocratic community, in >>part by expanding the set of committers on the project. Although Impala’s >>committer list will initially be dominated by members of the Impala >>engineering team at Cloudera, we look forward to growing a rich user and >>developer community. >> >>= Community = >>Impala has a strong user community (see >>https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user), and a >>growing developer community (see >>https://groups.google.com/a/cloudera.org/forum/#!forum/impala-dev). We >>wish >>to attract more developers to the project, and we believe that the ASF’s >>open and meritocratic philosophy will help us with this. We note the >>success of other, similar projects already part of the ASF. >> >>= Core Developers = >>Most - but not all - of Impala’s core developers are not currently >>affiliated with the ASF, and will require new ICLAs. >> >>= Alignment = >>Impala is related to several other Apache projects: >> >>* Data that is read by Impala is very often stored in Apache Hadoop >>clusters powered by the HDFS filesystem. >>* Impala can also read data stored in Apache HBase >>* Metadata for databases, tables and so on is read by Impala from Apache >>Hive. >>* The preferred data format for HDFS-based tables is Apache Parquet, and >>Apache Avro is also a supported data format. >>* Impala is closely integrated with Kudu, which is also being proposed to >>the Incubator. >>* Impala uses Apache Thrift as its RPC and serialization framework of >>choice. >> >>= Known Risks = >> >>== Orphaned Products == >>Impala is used by most of Cloudera’s customers, and Cloudera remains >>committed to developing and supporting the project. Cloudera has a strong >>track record in standing behind projects that were contributed to the ASF >>by its employees, including Apache Flume, Apache Sqoop, and others. Other >>companies both ship and support Impala, lending credence to the idea that >>Impala is not at risk of being suddenly orphaned. >> >>== Inexperience with Open Source == >>Although all committers on the initial list have significant experience >>with at least one open-source project - namely Impala - fewer have much >>experience with ASF-based software projects as contributors and community >>members. However, with the guidance of our mentors, committers who do have >>ASF experience, and time to learn during Incubation, we are confident that >>the project can be run in accordance with Apache principles on an ongoing >>basis. >> >>== Homogeneous Developers == >> >>The initial committers are employees of Cloudera. >> >>The project has received some contributions from developers outside of >>Cloudera, from individuals belonging to organizations such as Intel and >>Google, from hobbyists and from students using Impala to advance their >>understanding of distributed databases. The project attracted an active >>user community as well. We hope to continue to encourage contributions >>from >>these developers and community members and grow them into committers after >>they have had time to continue their contributions. >> >>== Reliance on Salaried Developers == >> >>Many of Impala’s initial set of committers work full-time on Impala, and >>are paid to do so. However, as mentioned elsewhere, we anticipate growth >>in >>the developer community which we hope will include hobbyists and academics >>who have an interested in distributed data systems. >> >>== An Excessive Fascination with the Apache Brand == >>Although we hope that Impala benefits from the Apache Brand, any reflected >>goodwill to Cloudera as the contributing entity is not the goal of >>establishing Impala as an Apache project. We will work with the Incubator >>PMC and the PRC to ensure that the Apache Brand is respected. >> >>= Documentation = >>Impala: A Modern, Open-Source SQL Engine for Hadoop ( >>http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf) >> >>Impala’s developer wiki (https://github.com/cloudera/Impala/wiki) >> >>Impala’s auto-generated API documentation ( >>http://impala.io/doc/html/index.html) >> >>= Initial Source = >>Impala’s initial source contribution will come from >>http://github.com/cloudera/Impala/. >> >>= External Dependencies = >> >>Impala depends upon a number of third-party libraries, which we list >>below. >>We intend to compile a LICENSE.txt file in the very short term (see >>https://issues.cloudera.org/browse/IMPALA-2670). >> >>* Google gflags (BSD) >>* Google glog (BSD) >>* Apache Thrift (Apache Software License v2.0) >>* Apache Commons (Apache Software License v2.0) >>* Apache Thrift (Apache Software License v2.0) >>* Apache Hadoop (Apache Software License v2.0) >>* Apache HBase (Apache Software License v2.0) >>* Apache Hive (Apache Software License v2.0) >>* Boost (Boost Software License) >>* OpenLdap (OpenLDAP Software License) >>* rapidjson (MIT) >>* Google RE2 (BSD-style) >>* lz4 (BSD) >>* snappy (BSD) >>* cyrus-sasl (CMU License) >>* Apache Avro (Apache Software License v2.0) >>* Cloudera squeasel (Apache Software License v2.0) >>* Apache htrace (Incubating) (Apache Software License v2.0) >>* Apache Sentry (Incubating) (Apache Software License v2.0) >>* Apache Shiro (Apache Software License v2.0) >>* Twitter Bootstrap (Apache Software License v2.0) >>* d3 (BSD) >>* LLVM (BSD-like) >> >>Build and test dependencies: >> >>* ant (Apache Software License v2.0) >>* maven (Apache Software License v2.0) >>* cmake (BSD) >>* clang (BSD) >>* Google gtest (Apache Software License v2.0) >> >>= Required Resources = >> >>We request that following resources be created for the project to use: >> >>== Mailing lists == >> >>* priv...@impala.incubator.apache.org (moderated subscriptions) >>* comm...@impala.incubator.apache.org >>* d...@impala.incubator.apache.org >>* iss...@impala.incubator.apache.org >>* u...@impala.incubator.apache.org >> >>== Git repository == >>https://git.apache.org/impala.git >> >>== JIRA instance == >>JIRA project IMPALA (IMPALA or IMP) >> >>== Other Resources == >>We hope to continue using Gerrit for our code review and commit workflow. >>We are involved with discussions that the Kudu team at Cloudera have been >>having with Jake Farrell to start discussions on how Gerrit can fit into >>the ASF. We know that several other ASF projects or podlings are also >>interested in Gerrit. >> >>If the Infrastructure team does not have the bandwidth to support gerrit, >>we will continue to support our own instance of gerrit for Impala, and >>make >>the necessary integrations such that commits are properly authenticated >>and >>maintain sufficient provenance to uphold the ASF standards (e.g. via the >>solution adopted by the AsterixDB podling). >> >>= Initial Committers = >> >>* Tim Armstrong >>* Alex Behm >>* Taras Bobrovytsky >>* Casey Ching >>* Martin Grund >>* Daniel Hecht >>* Michael Ho >>* Matthew Jacobs >>* Ishaan Joshi >>* Marcel Kornacker >>* Sailesh Mukil >>* Henry Robinson >>* John Russell >>* Dimitris Tsirogiannis >>* Skye Wanderman-Milne >>* Juan Yu >> >>== Affiliations == >>All: Cloudera Inc. >> >>= Sponsors = >> >>== Champion == >>Tom White >> >>== Nominated Mentors == >>Tom White >>Todd Lipcon >>Carl Steinbach >> >>= Sponsoring Entity = >>We ask that the Incubator PMC sponsor this proposal. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org
--------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org