+1 (binding) > On Mar 22, 2016, at 3:00 PM, Chris Douglas <cdoug...@apache.org> wrote: > > +1 (binding) -C > > On Tue, Mar 22, 2016 at 2:01 PM, Roman Shaposhnik <r...@apache.org> wrote: >> Hi! >> >> Quickstep proposal was made available for discussion last week >> https://wiki.apache.org/incubator/QuickstepProposal >> and the feedback so far seems to be positive. >> >> Please vote to accept Quickstep into the Apache Incubator. >> The vote will be open until Mon 3/28 noon PST. >> >> [ ] +1 Accept Quickstep into the Apache Incubator >> [ ] +0 Abstain >> [ ] -1 Don't accept Quickstep into the Apache Incubator because ... >> >> == Abstract == >> >> Quickstep is a high-performance database engine. It is designed to (1) >> convert data to insights at bare-metal speed, (2) support multiple >> query surfaces including SQL (the first (and current) version only >> supports SQL), and (3) deliver bare-metal performance on any hardware >> (including running on a laptop, running on a high-end (single node) >> server, and running on a distributed cluster). Since its inception, >> the project has been planned to deliver a high-performance single node >> system first, followed by a distributed system. >> >> Quickstep is composed of several different modules that handle >> different concerns of a database system. The main modules are: >> * Utility - Reusable general-purpose code that is used by many other >> modules. >> * Threading - Provides a cross-platform abstraction for threads and >> synchronization primitives that abstract the underlying OS threading >> features. >> * Types - The core type system used across all of Quickstep. Handles >> details of how SQL types are stored, parsed, serialized & >> deserialized, and converted. Also includes basic containers for typed >> values (tuples and column-vectors) and low-level operations that apply >> to typed values (e.g. basic arithmetic and comparisons). 
>> * Catalog - Tracks database schema as well as physical storage >> information for relations (e.g. which physical blocks store a >> relation's data, and any physical partitioning and placement >> information). >> * Storage - Physically stores relational data in self-contained, >> self-describing blocks, both in-memory and on persistent storage (disk >> or a distributed filesystem). Also includes some heavyweight run-time >> data structures used in query processing (e.g. hash tables for join >> and aggregation). Includes a buffer manager component for managing >> memory use and a file manager component that handles data persistence. >> * Compression - Implements ordered dictionary compression. Several >> storage formats in the Storage module are capable of storing >> compressed column data and evaluating some expressions directly on >> compressed data without decompressing. The common code supporting >> compression is in this module. >> * Expressions - Builds on the simple operations provided by the >> Types module to support arbitrarily complex expressions over data, >> including scalar expressions, predicates, and aggregate functions with >> and without grouping. >> * Relational Operators - This module provides the building blocks >> for queries in Quickstep. A query is represented as a directed acyclic >> graph of relational operators, each of which is responsible for >> applying some relational-algebraic operation(s) to transform its >> input. Operators generate individual self-contained "work orders" that >> can be executed independently. Most operators are parallelism-friendly >> and generate one work-order per storage block of input. >> * Query Execution - Handles the actual scheduling and execution of >> work from a query at runtime. The central class is the Foreman, an >> independent thread with a global view of the query plan and progress. 
>> The Foreman dispatches work-orders to stateless Worker threads and >> monitors their progress, and also coordinates streaming of partial >> results between producers and consumers in a query plan DAG to >> maximize parallelism. This module also includes the QueryContext >> class, which holds global shared state for an individual query and is >> designed to support easy serialization/deserialization for distributed >> execution. >> * Parser - A simple SQL lexer and parser that parses SQL syntax into >> an abstract syntax tree for consumption by the Query Optimizer. >> * Query Optimizer - Takes the abstract syntax tree generated by the >> parser and transforms it into a runnable query-plan DAG for the Query >> Execution module. The Query Optimizer is responsible for resolving >> references to relations and attributes in the query, checking it for >> semantic correctness, and applying optimizations (e.g. filter >> pushdown, column pruning, join ordering) as part of the transformation >> process. >> * Command-Line Interface - An interactive SQL shell interface to Quickstep. >> >> Quickstep is implemented in C++ and does not require many external >> libraries to run. Quickstep is currently an open source project >> licensed under the Apache License Version 2.0 and governed by a group >> of engineers at Pivotal. >> >> Quickstep began in 2011 as a research project in the Computer Sciences >> Department at the University of Wisconsin >> https://quickstep.cs.wisc.edu/ and the copyrights underlying the >> project were transferred to a company called Quickstep Technologies, >> which was acquired by Pivotal in 2015. >> >> == Proposal == >> The goal of this proposal is to bring an already existing open source >> project into the Apache Software Foundation (ASF) family thus >> leveraging a very successful “Apache Way” governance model in order to >> increase community participation and diversity. 
We hope that it will >> allow us to build a vibrant, diverse and self-governed open source >> community around the technology. Pivotal has agreed to transfer the >> brand name "Quickstep" to ASF and will stop using Quickstep to refer >> to this software if the project gets accepted into the ASF Incubator >> under the name of "Apache Quickstep (incubating)". Pivotal may market >> and sell products that include Apache Quickstep (incubating) under a >> different brand name, but no determination has been made regarding >> that. While Quickstep is our primary choice for a name of the project, >> in anticipation of any potential issues with PODLINGNAMESEARCH we have >> come up with two alternative names: (1) Bolero or (2) Hustle. >> >> Pivotal is submitting this proposal to transfer the Quickstep source >> code and associated artifacts (documentation, web site content, wiki, >> etc.) from its current Github location to the ASF Incubator under the >> Apache License, Version 2.0 and is asking the Incubator PMC to >> establish an open source community. >> >> == Background == >> >> Quickstep is a next-generation relational data processing kernel >> currently being developed as a collaboration between the academic >> community and Pivotal. Quickstep aims to deliver efficient and >> sustainable data processing performance on current and future hardware >> by using a hardware-software co-design philosophy. >> >> For the hardware available today, this means effectively exploiting >> large main memories, fast on-die CPU caches, highly parallel >> multi-core CPUs, and NVRAM storage technologies. >> >> For the hardware available in the future, the project aims to >> co-design hardware and software primitives that will allow data >> processing kernels to work on increasing amounts of data economically >> -- both from the raw performance perspective, and from the perspective >> of the energy consumed by data processing kernels. 
>> >> == Rationale == >> >> In the past decade, ASF has established itself as one of the >> quintessential sources of innovation in data management and data >> processing frameworks. At the same time, there is a clear need for a >> modern, flexible framework capable of exploiting the hardware >> characteristics of today and making it available as a set of building >> blocks to as wide a community of developers as possible. We strongly >> believe that Quickstep technology can benefit a broader ecosystem of >> database developers and researchers but this "world domination" needs >> to be achieved through a vibrant, diverse, self-governed community >> collectively innovating around a single codebase while at the same >> time cross-pollinating with various other data management communities. >> ASF is the ideal place to meet those ambitious goals. We also believe >> that our experience bringing various Pivotal data products into the ASF >> family - including Apache Geode (incubating), Apache HAWQ (incubating) >> and Apache MADlib (incubating) - can be leveraged to make the Quickstep >> transition a success, thus improving the chances of it becoming a >> truly vibrant Apache community. >> >> == Initial Goals == >> >> Our initial goals are to bring Quickstep into ASF, transition internal >> engineering processes into the open, and foster a collaborative >> development model according to the "Apache Way." Pivotal and its >> academic partners plan to develop new functionality in an open, >> community-driven way. To get there, the existing internal build, test >> and release processes will be refactored to support open development. >> >> == Current Status == >> >> Currently, the project code base is licensed under the Apache License >> v.2 and is available in a GitHub repository >> https://github.com/pivotalsoftware/quickstep . The documentation and >> wiki pages are available at the same repository. 
Throughout its history >> Quickstep was developed in a hybrid closed/open source mode but it >> has its roots in open source database management communities. The >> internal engineering practices adopted by the development team lend >> themselves well to an open, collaborative and meritocratic >> environment. >> >> The Quickstep team has always focused on building a robust end user >> community of researchers. The existing documentation along with >> various publications are expected to facilitate conversations between >> our existing users so as to transform them into an active community of >> Quickstep members, stakeholders and developers. >> >> == Meritocracy == >> >> Our proposed list of initial committers includes the current Quickstep >> R&D team and several existing academic partners. This group will form >> a base for the broader community we will invite to collaborate on the >> codebase. We intend to radically expand the initial developer and user >> community by running the project in accordance with the "Apache Way". >> Users and new contributors will be treated with respect and welcomed. >> By participating in the community and providing quality >> patches/support that move the project forward, contributors will earn >> merit. They also will be encouraged to provide non-code contributions >> (documentation, events, community management, etc.) and will gain >> merit for doing so. Those with a proven support and quality track >> record will be encouraged to become committers. >> >> == Community == >> >> If Quickstep is accepted for incubation, the primary initial goal will >> be transitioning the core community towards embracing the Apache Way >> of project governance. We would solicit major existing contributors to >> become committers on the project from the start. >> >> == Core Developers == >> A small percentage of Quickstep core developers are skilled in working >> as part of openly governed Apache communities (mainly around the >> Hadoop ecosystem). 
That said, most of the core developers are >> currently NOT affiliated with the ASF and would require new ICLAs >> before committing to the project. >> >> == Alignment == >> The following existing ASF projects can be considered when reviewing >> the Quickstep proposal: >> * Apache Hive: Potential alignment here is to consider a version of >> Hive that runs on the Quickstep executor. >> * Apache HAWQ (incubating): Potential alignment here is to consider >> exchanging ideas and/or code for execution across both systems. >> * Apache YARN: Work has started on a distributed version of >> Quickstep, and its current path is to run as a YARN application. >> * Apache Mesos: Potential alignment here is for Quickstep to run in >> Apache Mesos. >> >> == Known Risks == >> Development has been done mostly by a tightly knit group of University >> of Wisconsin researchers and later was sponsored mostly by a single >> company (Pivotal) thus far and coordinated mainly by the core >> Quickstep team. The Quickstep team now spans Pivotal and the >> University of Wisconsin. >> >> For the project to fully transition to the Apache Way governance >> model, development must shift towards the meritocracy-centric model of >> growing a community of contributors balanced with the needs for >> extreme stability and core implementation coherency. The tools and >> development practices in place for the Quickstep product are >> compatible with the ASF infrastructure and thus we do not anticipate >> any on-boarding pains. >> >> The project went through a very thorough vetting as part of Pivotal >> open sourcing it under the Apache License v. 2.0 only a few months >> ago. This gives us reasonable confidence to conclude that the code >> base is clean and free from IP complications. 
>> Orphaned products >> Pivotal is fully committed to maintaining its position as one of the >> leading providers of database management and data processing solutions >> and the corresponding Pivotal commercial product will continue to be >> developed around the Quickstep project. >> >> Moreover, Pivotal has a vested interest in making Quickstep successful >> by driving its close integration with both existing projects >> contributed to open source by Pivotal including Apache HAWQ >> (incubating) and Greenplum Database, and sister ASF projects. We >> expect this to further reduce the risk of orphaning the product. >> >> == Inexperience with Open Source == >> Pivotal has embraced open source software since its formation by >> employing contributors/committers and by shepherding open source >> projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals >> working at Pivotal have experience with the formation of vibrant >> communities around open technologies with the Cloud Foundry >> Foundation, and continuing with the creation of a community around >> Apache Geode (incubating), Apache HAWQ (incubating) and Apache MADlib >> (incubating). Although some of the initial committers have not had the >> experience of developing entirely open source, community-driven >> projects, we expect to bring to bear the open development practices >> that have proven successful on longstanding Pivotal open source >> projects to the Quickstep community. Additionally, several ASF >> veterans have agreed to mentor the project and are listed in this >> proposal. The project will rely on their collective guidance and >> wisdom to quickly transition the entire team of initial committers >> towards practicing the Apache Way. >> >> == Homogeneous Developers == >> While many of the initial committers are employed by Pivotal or at the >> University of Wisconsin, we have already seen a healthy level of >> interest from existing customers and partners. 
We intend to convert >> that interest directly into participation and will be investing in >> activities to recruit additional committers from other companies. >> >> == Reliance on Salaried Developers == >> Many of the contributors are paid to work in the Big Data and data >> processing space and nearly all are committed to a career in that >> space. While they might wander from their current employers, they are >> unlikely to venture far from their core expertise and thus will >> continue to be engaged with the project regardless of their current >> employers. >> >> == Relationships with Other Apache Products == >> As mentioned in the Alignment section, Quickstep may consider various >> degrees of integration and code exchange with Apache Hive, Apache HAWQ >> (incubating), Apache YARN and Apache Mesos. >> >> == An Excessive Fascination with the Apache Brand == >> While we intend to leverage the Apache ‘branding’ when talking to >> other projects as testament of our project’s ‘neutrality’, we have no >> plans for making use of Apache brand in press releases nor posting >> billboards advertising acceptance of Quickstep into Apache Incubator. >> >> == Documentation == >> The documentation is currently available at http://quickstep.cs.wisc.edu/ >> >> == Initial Source == >> Initial source code is currently licensed under Apache License v.2 and >> is available at https://github.com/pivotalsoftware/quickstep. >> >> == Source and Intellectual Property Submission Plan == >> As soon as Quickstep is approved to join the Incubator, the source >> code will be transitioned via an exhibit to Pivotal's current Software >> Grant Agreement onto ASF infrastructure. We know of no legal >> encumbrances inhibiting the transfer of source code to the ASF. 
>> >> == External Dependencies == >> >> Runtime dependencies: >> * farmhash: https://github.com/google/farmhash [License: MIT] >> * gflags: https://github.com/gflags/gflags [License: BSD] >> * glog: https://github.com/google/glog [License: BSD] >> * gperftools: https://github.com/gperftools/gperftools [License: BSD] >> * linenoise: https://github.com/antirez/linenoise [License: BSD 2-Clause] >> * protobuf: https://github.com/google/protobuf [License: BSD] >> >> Build only dependencies: >> * cmake: https://cmake.org/ [License: BSD] >> * bison: https://www.gnu.org/software/bison/ [License: GPL with >> exception for generated parsers] >> * flex: http://flex.sourceforge.net [License: BSD] >> >> Test only dependencies: >> * benchmark: https://github.com/google/benchmark [License: Apache 2.0] >> * cpplint: https://github.com/google/styleguide [License: BSD] >> * gtest: https://github.com/google/googletest [License: BSD] >> * iwyu: http://include-what-you-use.org/ [License: UIUC BSD-Like] >> >> Cryptography: N/A >> >> == Required Resources == >> >> === Mailing lists === >> * priv...@quickstep.incubator.apache.org (moderated subscriptions) >> * comm...@quickstep.incubator.apache.org >> * d...@quickstep.incubator.apache.org >> * iss...@quickstep.incubator.apache.org >> * u...@quickstep.incubator.apache.org >> >> === Git Repository === >> https://git-wip-us.apache.org/repos/asf/incubator-quickstep.git >> >> === Issue Tracking === >> >> JIRA Project QUICKSTEP (QUICKSTEP) >> >> === Other Resources === >> Means of setting up regular builds for Quickstep on builds.apache.org >> will require integration with Docker support. >> >> == Initial Committers == >> * Jignesh M. 
Patel >> * Harshad Deshmukh >> * Jianqiao Zhu >> * Zuyu Zhang >> * Marc Spehlmann >> * Saket Saurabh >> * Hakan Memisoglu >> * Rogers Jeffrey Leo John >> * Adalbert Gerald Soosai Raj >> * Udip Pant >> * Siddharth Suresh >> * Rathijit Sen >> * Craig Chasseur >> * Qiang Zeng >> * Shoban Chandrabose >> * Navneet Potti >> * Yinan Li >> * Sangmin Shin >> * James Paton >> * Shixuan Fan >> * Roman Shaposhnik >> * Konstantin Boudnik >> * Julian Hyde >> * Dhruba Borthakur >> >> == Affiliations == >> * Pivotal: Jignesh M. Patel, Zuyu Zhang, Roman Shaposhnik >> * Google: Craig Chasseur >> * Facebook: James Paton, Dhruba Borthakur >> * Pinterest: Sangmin Shin >> * Microsoft: Yinan Li >> * Hortonworks: Julian Hyde >> * Memcore: Konstantin Boudnik >> * University of Wisconsin (and supported in part by Pivotal): Everyone else >> >> == Sponsors == >> >> === Champion === >> Roman Shaposhnik >> >> === Nominated Mentors === >> The initial mentors are listed below: >> * Konstantin Boudnik - Apache Member, Memcore >> * Roman Shaposhnik - Apache Member, Pivotal >> * Julian Hyde, IPMC Member, Hortonworks >> >> === Sponsoring Entity === >> We would like to propose Apache incubator to sponsor this project. >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org >> For additional commands, e-mail: general-h...@incubator.apache.org >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org >