+1 (non-binding)

On Tue, Nov 24, 2015 at 11:40 AM, Jarek Jarcec Cecho <jar...@apache.org> wrote:
>> [X] +1, accept Kudu into the Incubator
>
> (binding)
>
> Jarcec
>
>> On Nov 24, 2015, at 11:32 AM, Todd Lipcon <t...@apache.org> wrote:
>>
>> Hi all,
>>
>> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
>> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
>> pasted below and also available on the wiki at:
>> https://wiki.apache.org/incubator/KuduProposal
>>
>> The proposal is unchanged since the original version, except for the
>> addition of Carl Steinbach as a Mentor.
>>
>> Please cast your votes:
>>
>> [] +1, accept Kudu into the Incubator
>> [] +/-0, positive/negative non-counted expression of feelings
>> [] -1, do not accept Kudu into the incubator (please state reasoning)
>>
>> Given the US holiday this week, I imagine many folks are traveling or
>> otherwise offline. So, let's run the vote for a full week rather than the
>> traditional 72 hours. Unless the IPMC objects to the extended voting
>> period, the vote will close on Tues, Dec 1st at noon PST.
>>
>> Thanks
>> -Todd
>> -----
>>
>> = Kudu Proposal =
>>
>> == Abstract ==
>>
>> Kudu is a distributed columnar storage engine built for the Apache Hadoop
>> ecosystem.
>>
>> == Proposal ==
>>
>> Kudu is an open source storage engine for structured data which supports
>> low-latency random access together with efficient analytical access
>> patterns. Kudu distributes data using horizontal partitioning and
>> replicates each partition using Raft consensus, providing low
>> mean-time-to-recovery and low tail latencies. Kudu is designed within the
>> context of the Apache Hadoop ecosystem and supports many integrations with
>> other data analytics projects both inside and outside of the Apache
>> Software Foundation.
>>
>> We propose to incubate Kudu as a project of the Apache Software Foundation.
>>
>> == Background ==
>>
>> In recent years, explosive growth in the amount of data being generated and
>> captured by enterprises has resulted in the rapid adoption of open source
>> technology which is able to store massive data sets at scale and at low
>> cost. In particular, the Apache Hadoop ecosystem has become a focal point
>> for such “big data” workloads, because many traditional open source
>> database systems have lagged in offering a scalable alternative.
>>
>> Structured storage in the Hadoop ecosystem has typically been achieved in
>> two ways: for static data sets, data is typically stored on Apache HDFS
>> using binary data formats such as Apache Avro or Apache Parquet. However,
>> neither HDFS nor these formats has any provision for updating individual
>> records, or for efficient random access. Mutable data sets are typically
>> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
>> These systems allow for low-latency record-level reads and writes, but lag
>> far behind the static file formats in terms of sequential read throughput
>> for applications such as SQL-based analytics or machine learning.
>>
>> Kudu is a new storage system designed and implemented from the ground up to
>> fill this gap between high-throughput sequential-access storage systems
>> such as HDFS and low-latency random-access systems such as HBase or
>> Cassandra. While these existing systems continue to hold advantages in some
>> situations, Kudu offers a “happy medium” alternative that can dramatically
>> simplify the architecture of many common workloads. In particular, Kudu
>> offers a simple API for row-level inserts, updates, and deletes, while
>> providing table scans at throughputs similar to Parquet, a commonly-used
>> columnar format for static data.
>>
>> More information on Kudu can be found at the existing open source project
>> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
>> http://getkudu.io/kudu.pdf from which the above was excerpted.
>>
>> == Rationale ==
>>
>> As described above, Kudu fills an important gap in the open source storage
>> ecosystem. After our initial open source project release in September 2015,
>> we have seen a great amount of interest across a diverse set of users and
>> companies. We believe that, as a storage system, it is critical to build an
>> equally diverse set of contributors in the development community. Our
>> experiences as committers and PMC members on other Apache projects have
>> taught us the value of diverse communities in ensuring both longevity and
>> high quality for such foundational systems.
>>
>> == Initial Goals ==
>>
>> * Move the existing codebase, website, documentation, and mailing lists to
>> Apache-hosted infrastructure
>> * Work with the infrastructure team to implement and approve our code
>> review, build, and testing workflows in the context of the ASF
>> * Incremental development and releases per Apache guidelines
>>
>> == Current Status ==
>>
>> ==== Releases ====
>>
>> Kudu has undergone one public release, tagged here
>> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>>
>> This initial release was not performed in the typical ASF fashion -- no
>> source tarball was released, but rather only convenience binaries made
>> available in Cloudera’s repositories. We will adopt the ASF source release
>> process upon joining the incubator.
>>
>> ==== Source ====
>>
>> Kudu’s source is currently hosted on GitHub at
>> https://github.com/cloudera/kudu
>>
>> This repository will be transitioned to Apache’s git hosting during
>> incubation.
>>
>> ==== Code review ====
>>
>> Kudu’s code reviews are currently public and hosted on Gerrit at
>> http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu
>>
>> The Kudu developer community is very happy with Gerrit and hopes to work
>> with the Apache Infrastructure team to figure out how we can continue to
>> use Gerrit within ASF policies.
>>
>> ==== Issue tracking ====
>>
>> Kudu’s bug and feature tracking is hosted on JIRA at:
>> https://issues.cloudera.org/projects/KUDU/summary
>>
>> This JIRA instance contains bugs and development discussion dating back 2
>> years prior to Kudu’s open source release and will provide an initial seed
>> for the ASF JIRA.
>>
>> ==== Community discussion ====
>>
>> Kudu has several public discussion forums, linked here:
>> http://getkudu.io/community.html
>>
>> ==== Build Infrastructure ====
>>
>> The Kudu Gerrit instance is configured to only allow patches to be
>> committed after running them through an extensive set of pre-commit tests
>> and code lints. The project currently makes use of elastic public cloud
>> resources to perform these tests. Until this point, these resources have
>> been internal to Cloudera, though we are currently investing in moving to a
>> publicly accessible infrastructure.
>>
>> ==== Development practices ====
>>
>> Given that Kudu is a persistent storage engine, the community has a high
>> quality bar for contributions to its core. We have a firm belief that high
>> quality is achieved through automation, not manual inspection, and hence
>> put a focus on thorough testing and build infrastructure to ensure that
>> bar. The development community also practices review-then-commit for all
>> changes to ensure that changes are accompanied by appropriate tests, are
>> well commented, etc.
>>
>> Rather than seeing these practices as barriers to contribution, we believe
>> that a fully automated and standardized review and testing practice makes
>> it easier for new contributors to have patches accepted. Any new developer
>> may post a patch to Gerrit using the same workflow as a seasoned
>> contributor, and the same suite of tests will be automatically run. If the
>> tests pass, a committer can quickly review and commit the contribution from
>> their web browser.
>>
>> === Meritocracy ===
>>
>> We believe strongly in meritocracy in electing committers and PMC members.
>> We believe that contributions can come in forms other than just code: for
>> example, one of our initial proposed committers has contributed solely in
>> the area of project documentation. We will encourage contributions and
>> participation of all types, and ensure that contributors are appropriately
>> recognized.
>>
>> === Community ===
>>
>> Though Kudu is relatively new as an open source project, it has already
>> seen promising growth in its community across several organizations:
>>
>> * '''Cloudera''' is the original development sponsor for Kudu.
>> * '''Xiaomi''' has been helping to develop and optimize Kudu for a new
>> production use case, contributing code, benchmarks, feedback, and
>> conference talks.
>> * '''Intel''' has contributed optimizations related to their hardware
>> technologies.
>> * '''Dropbox''' has been experimenting with Kudu for a machine monitoring
>> use case, and has been contributing bug reports and product feedback.
>> * '''Dremio''' is working on integration with Apache Drill and exploring
>> using Kudu in a production use case.
>> * Several community-built Docker images, tutorials, and blog posts have
>> sprouted up since Kudu’s release.
>>
>> By bringing Kudu to Apache, we hope to encourage further contribution from
>> the above organizations as well as to engage new users and contributors in
>> the community.
>>
>> === Core Developers ===
>>
>> Kudu was initially developed as a project at Cloudera. Most of the
>> contributions to date have been by developers employed by Cloudera.
>>
>> Many of the developers are committers or PMC members on other Apache
>> projects.
>>
>> === Alignment ===
>>
>> As a project in the big data ecosystem, Kudu is aligned with several other
>> ASF projects. Kudu includes input/output format integration with Apache
>> Hadoop, and this integration can also provide a bridge to Apache Spark. We
>> are planning to integrate with Apache Hive in the near future. We also
>> integrate closely with Cloudera Impala, which is also currently being
>> proposed for incubation. We have also scheduled a hackathon with the Apache
>> Drill team to work on integration with that query engine.
>>
>> == Known Risks ==
>>
>> === Orphaned Products ===
>>
>> The risk of Kudu being abandoned is low. Cloudera has invested a great deal
>> in the initial development of the project, and intends to grow its
>> investment over time as Kudu becomes a product adopted by its customer
>> base. Several other organizations are also experimenting with Kudu for
>> production use cases which would live for many years.
>>
>> === Inexperience with Open Source ===
>>
>> Kudu has been released in the open for less than two months.
>> However, from our very first public announcement we have been committed
>> to open-source style development:
>>
>> * our code reviews are fully public and documented on a mailing list
>> * our daily development chatter is in a public chat room
>> * we send out weekly “community status” reports highlighting news and
>> contributions
>> * we published our entire JIRA history and discuss bugs in the open
>> * we published our entire Git commit history, going back three years (no
>> squashing)
>>
>> Several of the initial committers are experienced open source developers,
>> several being committers and/or PMC members on other ASF projects (Hadoop,
>> HBase, Thrift, Flume, et al). Those who are not ASF committers have
>> experience on non-ASF open source projects (Kiji, open-vm-tools, et al).
>>
>> === Homogenous Developers ===
>>
>> The initial committers are employees or former employees of Cloudera.
>> However, the committers are spread across multiple offices (Palo Alto, San
>> Francisco, Melbourne), so the team is familiar with working in a
>> distributed environment across varied time zones.
>>
>> The project has received some contributions from developers outside of
>> Cloudera, and is starting to attract a ''user'' community as well. We hope
>> to continue to encourage contributions from these developers and community
>> members and grow them into committers after they have had time to continue
>> their contributions.
>>
>> === Reliance on Salaried Developers ===
>>
>> As mentioned above, the majority of development up to this point has been
>> sponsored by Cloudera. We have seen several community users participate in
>> discussions who are hobbyists interested in distributed systems and
>> databases, and hope that they will continue their participation in the
>> project going forward.
>>
>> === Relationships with Other Apache Products ===
>>
>> Kudu is currently related to the following other Apache projects:
>>
>> * Hadoop: Kudu provides MapReduce input/output formats for integration
>> * Spark: Kudu integrates with Spark via the above-mentioned input formats,
>> and work is progressing on support for Spark Data Frames and Spark SQL.
>>
>> The Kudu team has reached out to several other Apache projects to start
>> discussing integrations, including Flume, Kafka, Hive, and Drill.
>>
>> Kudu integrates with Impala, which is also being proposed for incubation.
>>
>> Kudu is already collaborating on ValueVector, a proposed TLP spinning out
>> from the Apache Drill community.
>>
>> We look forward to continuing to integrate and collaborate with these
>> communities.
>>
>> === An Excessive Fascination with the Apache Brand ===
>>
>> Many of the initial committers are already experienced Apache committers,
>> and understand the true value provided by the Apache Way and the principles
>> of the ASF. We believe that this development and contribution model is
>> especially appropriate for storage products, where Apache’s
>> community-over-code philosophy ensures long term viability and
>> consensus-based participation.
>>
>> == Documentation ==
>>
>> * Documentation is written in AsciiDoc and committed in the Kudu source
>> repository:
>>
>> * https://github.com/cloudera/kudu/tree/master/docs
>>
>> * The Kudu web site is version-controlled on the ‘gh-pages’ branch of the
>> above repository.
>>
>> * A LaTeX whitepaper is also published, and the source is available within
>> the same repository.
>> * APIs are documented within the source code as JavaDoc or C++-style
>> documentation comments.
>> * Many design documents are stored within the source code repository as
>> text files next to the code being documented.
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> The Kudu codebase and web site are currently hosted on GitHub and will be
>> transitioned to the ASF repositories during incubation. Kudu is already
>> licensed under the Apache 2.0 license.
>>
>> Some portions of the code are imported from other open source projects
>> under the Apache 2.0, BSD, or MIT licenses, with copyrights held by authors
>> other than the initial committers. These copyright notices are maintained
>> in those files as well as a top-level NOTICE.txt file. We believe this to
>> be permissible under the license terms and ASF policies, and have confirmed
>> this via a recent thread on general@incubator.apache.org.
>>
>> The “Kudu” name is not a registered trademark, though before the initial
>> release of the project, we performed a trademark search and Cloudera’s
>> legal counsel deemed it acceptable in the context of a data storage engine.
>> There exists an unrelated open source project by the same name related to
>> deployments on Microsoft’s Azure cloud service. We have been in contact
>> with legal counsel from Microsoft and have obtained their approval for the
>> use of the Kudu name.
>>
>> Cloudera currently owns several domain names related to Kudu (getkudu.io,
>> kududb.io, et al) which will be transferred to the ASF and redirected to
>> the official page during incubation.
>>
>> Portions of Kudu are protected by pending or published patents owned by
>> Cloudera. Given the protections already granted by the Apache License, we
>> do not anticipate any explicit licensing or transfer of this intellectual
>> property.
>>
>> == External Dependencies ==
>>
>> The full set of dependencies and licenses is listed in
>> https://github.com/cloudera/kudu/blob/master/LICENSE.txt
>> and summarized here:
>>
>> * '''Twitter Bootstrap''': Apache 2.0
>> * '''d3''': BSD 3-clause
>> * '''epoch JS library''': MIT
>> * '''lz4''': BSD 2-clause
>> * '''gflags''': BSD 3-clause
>> * '''glog''': BSD 3-clause
>> * '''gperftools''': BSD 3-clause
>> * '''libev''': BSD 2-clause
>> * '''squeasel''': MIT license
>> * '''protobuf''': BSD 3-clause
>> * '''rapidjson''': MIT
>> * '''snappy''': BSD 3-clause
>> * '''trace-viewer''': BSD 3-clause
>> * '''zlib''': zlib license
>> * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike)
>> * '''bitshuffle''': MIT
>> * '''boost''': Boost license
>> * '''curl''': MIT
>> * '''libunwind''': MIT
>> * '''nvml''': BSD 3-clause
>> * '''cyrus-sasl''': Cyrus SASL license (BSD-alike)
>> * '''openssl''': OpenSSL License (BSD-alike)
>>
>> * '''Guava''': Apache 2.0
>> * '''StumbleUpon Async''': BSD
>> * '''Apache Hadoop''': Apache 2.0
>> * '''Apache log4j''': Apache 2.0
>> * '''Netty''': Apache 2.0
>> * '''slf4j''': MIT
>> * '''Apache Commons''': Apache 2.0
>> * '''murmur''': Apache 2.0
>>
>> '''Build/test-only dependencies''':
>>
>> * '''CMake''': BSD 3-clause
>> * '''gcovr''': BSD 3-clause
>> * '''gmock''': BSD 3-clause
>> * '''Apache Maven''': Apache 2.0
>> * '''JUnit''': EPL
>> * '''Mockito''': MIT
>>
>> == Cryptography ==
>>
>> Kudu does not currently include any cryptography-related code.
>>
>> == Required Resources ==
>>
>> === Mailing lists ===
>>
>> * priv...@kudu.incubator.apache.org (PMC)
>> * comm...@kudu.incubator.apache.org (git push emails)
>> * iss...@kudu.incubator.apache.org (JIRA issue feed)
>> * d...@kudu.incubator.apache.org (Gerrit code reviews plus dev discussion)
>> * u...@kudu.incubator.apache.org (User questions)
>>
>> === Repository ===
>>
>> * git://git.apache.org/kudu
>>
>> === Gerrit ===
>>
>> We hope to continue using Gerrit for our code review and commit workflow.
>> The Kudu team has already been in contact with Jake Farrell to start
>> discussions on how Gerrit can fit into the ASF. We know that several other
>> ASF projects and podlings are also interested in Gerrit.
>>
>> If the Infrastructure team does not have the bandwidth to support Gerrit,
>> we will continue to support our own instance of Gerrit for Kudu, and make
>> the necessary integrations such that commits are properly authenticated and
>> maintain sufficient provenance to uphold the ASF standards (e.g. via the
>> solution adopted by the AsterixDB podling).
>>
>> == Issue Tracking ==
>>
>> We would like to import our current JIRA project into the ASF JIRA, such
>> that our historical commit messages and code comments continue to reference
>> the appropriate bug numbers.
>>
>> == Initial Committers ==
>>
>> * Adar Dembo a...@cloudera.com
>> * Alex Feinberg a...@strlen.net
>> * Andrew Wang w...@apache.org
>> * Dan Burkert d...@cloudera.com
>> * David Alves dral...@apache.org
>> * Jean-Daniel Cryans jdcry...@apache.org
>> * Mike Percy mpe...@apache.org
>> * Misty Stanley-Jones mi...@apache.org
>> * Todd Lipcon t...@apache.org
>>
>> The initial list of committers was seeded by listing those contributors who
>> have contributed 20 or more patches in the last 12 months, indicating that
>> they are active and have achieved merit through participation on the
>> project.
>> We chose not to include other contributors who either have not yet
>> contributed a significant number of patches, or whose contributions are far
>> in the past and we don’t expect to be active within the ASF.
>>
>> == Affiliations ==
>>
>> * Adar Dembo - Cloudera
>> * Alex Feinberg - Forward Networks
>> * Andrew Wang - Cloudera
>> * Dan Burkert - Cloudera
>> * David Alves - Cloudera
>> * Jean-Daniel Cryans - Cloudera
>> * Mike Percy - Cloudera
>> * Misty Stanley-Jones - Cloudera
>> * Todd Lipcon - Cloudera
>>
>> == Sponsors ==
>>
>> === Champion ===
>>
>> * Todd Lipcon
>>
>> === Nominated Mentors ===
>>
>> * Jake Farrell - ASF Member and Infra team member, Acquia
>> * Brock Noland - ASF Member, StreamSets
>> * Michael Stack - ASF Member, Cloudera
>> * Jarek Jarcec Cecho - ASF Member, Cloudera
>> * Chris Mattmann - ASF Member, NASA JPL and USC
>> * Julien Le Dem - Incubator PMC, Dremio
>> * Carl Steinbach - ASF Member, LinkedIn
>>
>> === Sponsoring Entity ===
>>
>> The Apache Incubator
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal