+1 (binding) Thank you, Atri.
--Chris Nauroth

On 10/9/15, 8:55 AM, "Atri Sharma" <[email protected]> wrote:

>Hi all,
>
>Following the discussion about Concerted I would like to call a vote for
>accepting Concerted as a new incubator project.
>
>The proposal text is included below, and available on the wiki:
>
>https://wiki.apache.org/incubator/ConcertedProposal
>
>The vote is open for 72 hours:
>
>[ ] +1 accept Concerted in the Incubator
>[ ] ±0
>[ ] -1 (please give reason)
>
>Regards,
>
>Atri
>
>= Abstract =
>
>Concerted is an in memory write less read more engine that aims to provide
>extreme read performance with a very high degree of concurrency and
>scalability, with a focus on minimizing its own resource footprint.
>
>= Proposal =
>Concerted is built on the principle that a new type of workload is
>dominating the scene and now needs to be supported. These are the large
>data set analytical workloads being analyzed or used on large clusters or
>high power machines. Large analytical workloads depend on the ability to
>query large data sets efficiently and with high concurrency while
>maintaining semantics such as immediate consistency. An in memory engine
>designed to support extreme read queries while providing support for
>aggregation through various features (such as multidimensional
>representation of tuples) will accelerate many use cases around large
>scale analytics.
>
>Concerted believes that the best understanding of a user application lies
>with the user application developer. The need for massive read scaling
>should be on demand and should be flexible to the level that the user can
>decide which representation and access of data suits his/her current
>requirements. Hence, Concerted is not built in a traditional client/server
>model. Concerted provides users with an API which can be used to load,
>read, update and delete data. The user chooses which data structure to use
>for his/her current requirements.
>All API access is covered by Concerted's internal systems, such as the
>lock manager, transaction manager and cache manager, which ensure that
>reads scale to a high level in every API call.
>
>Concerted is a Do It Yourself in memory platform for making in memory
>supporting engines. The use case we think of is supporting big data
>warehouses like Hive, but there are endless use cases for a custom, highly
>scalable in memory platform.
>
>The goal of this proposal is to leverage an existing code base available
>on GitHub and licensed under the Apache License 2.0 to build a community
>around the project. Currently the community consists of existing hackers
>of Concerted, people who have been following and associated with the
>project for a while, and database experts who are excited about building a
>project like this. We are hoping that entering Apache will help us attract
>more contributors as well as connect with existing big data projects like
>Apache Hive, Apache HAWQ, Apache Storm, Apache Tajo, Apache Spark and
>Apache Geode, to leverage their community base while assisting in their
>use cases with Concerted. We had a discussion with the founders of Apache
>Tajo and they showed interest in using Concerted for some of their use
>cases.
>= Background =
>Relational databases were built with the cost of physical memory in mind.
>That cost is no longer very relevant and physical memory is now available
>on demand. Another driving factor behind Concerted is the paradigm shift
>that has come with big data. Disk IO speeds are more of a bottleneck than
>ever before. Combining the read dominance of analytical workloads with the
>speed of in memory structures, Concerted fits the current scene. Also,
>supporting OLAP workloads with in memory support for faster
>read constant queries and joins will be useful.
>
>= Rationale =
>As explained above, large analytical workloads need a lightweight in
>memory engine which supports massive read concurrency, ground level
>support for aggregations and analytics, extreme scalability and high read
>performance, along with the engine being very light itself. Concerted aims
>to meet these needs. Concerted is designed and built with three goals as
>objectives:
>
>
>Performance
> To provide high performance access to data from a large number of rows,
> Concerted uses efficient representation and in memory indexing of data
> coupled with high performance transactions, custom transactions,
> lightweight locking and lockless techniques, and an intelligent locking
> manager.
>
>Scalability
> Concerted is built with extreme concurrency and scalability in mind.
>
>Efficiency
> Concerted aims to give expected performance under a vast variety of
> workloads and aims to have as low a footprint as possible.
>
>= Initial Goals =
>The initial goal is to leverage an existing code base and invest in
>building a community around the project. We anticipate a lot of initial
>restructuring of the existing code so that it becomes easier to include
>new contributors and minimize ramp up time. We plan to approach this
>refactoring in a fully transparent, community-driven way, thus starting to
>practice the "Apache Way" governance model from the get go.
>
>Various contributors are getting individual changes into branches in the
>GitHub repository, and our initial major goal will be to merge all those
>changes into the master repository.
>
>= Current Status =
>Concerted is currently under restructuring to suit the needs of an open
>source project. Current source is available at
>https://github.com/atris/Concerted (please note that the updated codebase
>is not yet present on GitHub). Concerted is currently licensed under the
>Apache License 2.0. Most of the code base is implemented in C and C++ and
>has external dependencies listed later.
>
>== Meritocracy ==
>
>We plan to drive the technical roadmap and implementation in a fully
>transparent, community-driven way, soliciting feedback from all of the
>community members and building a consensus-driven approach to evolving the
>code base and the community itself. Users and new contributors will be
>treated with respect and welcomed. By participating in the community and
>providing quality patches/support that move the project forward,
>contributors will earn merit. They will also be encouraged to provide
>non-code contributions (documentation, events, community management, etc.)
>and will gain merit for doing so. Those with a proven support and quality
>track record will be encouraged to become committers.
>
>== Community ==
>In memory is the new cutting edge thing and a new community around
>performance oriented systems and enhancing relational database performance
>by having complete in memory OLTP engines will greatly benefit
>performance.
>So we expect data warehousing projects and communities as well as projects
>and companies looking for high performance OLTP performance. In addition,
>Ingenium Data Systems is building products around Concerted and will have
>salaried developers contribute to the project as part of their job
>responsibilities.
>
>== Core Developers ==
>Core developers are a diverse group of developers, many of whom are very
>experienced in open source and the Apache Hadoop ecosystem. Specifically,
>Atri is an Apache Apex committer, and Atri and Pavel are major
>contributors to the PostgreSQL project. Atri is also a committer for other
>open source projects.
>
> * Amrish <amrishs AT ingeniumsys DOT com>
> * Nupur S <nupurs AT ingeniumsys DOT com>
> * Pavel Stehule <pavel DOT stehule AT gmail.com>
> * Atri Sharma <atri AT apache DOT org>
> * Nishith Singhal <nishsinghal AT gmail DOT com>
> * Michael Down <michael AT dowuk DOT com>
> * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
> * Wang Albert <albertwang87 AT gmail DOT com>
> * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
> * Kris Popat <krispopat AT apache DOT org>
> * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>
>== Alignment ==
>Concerted will be helpful to systems like Tajo, which can benefit from in
>memory structures optimized for heavy reads and joins (dimension tables).
>In addition, Concerted will benefit projects looking for an in memory
>relational database as a metadata store, which is the case for most of the
>Apache Big Data projects. We expect Apache HAWQ (incubating), Apache Hive,
>Apache Storm and Apache Tajo to be utilizing Concerted as a supporting
>engine. For example, a data warehouse built on HAWQ, Hive or Tajo can
>utilize Concerted as an in memory engine for querying and joining
>dimensional tables.
>
>= Known Risks =
>
>== Orphaned Products ==
>Most of the code is developed by a small group of core developers and this
>may be a risk for an orphaned product. However, the code base is simple
>compared to other open source projects, and the interest level in
>Concerted has risen exponentially over the years, with many computer
>professionals expressing interest in the project and doing some use cases
>of the same. Specifically, there were some projects done around Concerted
>at JIIT, Noida (an engineering school), and Wang is a student at Lehigh
>University who has been following Concerted's progress over many years.
>The core developers are aligned with this project, and since the code base
>is simple, future committers will have a quick ramp up and the risk shall
>be mitigated.
>Besides, Ingenium Data Systems is launching a product based on Concerted
>and will have all its salaried developers contribute to Concerted as a
>part of their job functions.
>
>== Inexperience with Open Source ==
>Most of the initial committers have experience working on open source
>projects. In particular, Atri is an active member of many open source
>projects.
>
>== Homogeneous Developers ==
>Although the initial core developers were based out of India, the
>community now consists of computer professionals from various parts of the
>world, hence diversity should not be an issue. In addition, we will be
>documenting internals of the project in public facing documents, which
>shall allow more contributors to join in.
>
>== Reliance on Salaried Developers ==
>It is expected that Concerted development will occur on both salaried time
>and on volunteer time. Nupur and Amrish belong to Ingenium and are
>committed to building this project along with their team. Atri, as the
>originator of this project, will be actively working on the project and is
>now pushing Concerted into major data warehousing projects, since he is
>involved in the architecture of data platforms. Developers are expected to
>be contributing in their volunteer time. In addition, we will be working
>with various open source projects which will benefit from Concerted and
>will be involving those communities in Concerted's development as well.
>For example, Apache Tajo has shown interest and will be supporting
>development of the project.
>
>== Relationships with Other Apache Products ==
>Concerted has some overlapping function with Apache Geode (incubating).
>However, Geode is an in memory key value store whereas Concerted is a
>write less read many engine. Concerted will complement Geode and increase
>the use cases Geode can support with Concerted's help.
>
>A major objective for Concerted is supporting OLAP workloads and data
>warehouses with in memory performance and highly performant reads and
>joins.
>Concerted will be collaborating with many open source projects such as
>Apache HAWQ (incubating), Apache Hive, Apache Tajo etc. to support their
>OLAP workloads, hence enabling them to support a larger set of use cases
>with a better throughput. For example, a star schema in Hive will benefit
>from having dimension tables in Concerted with highly efficient and
>scalable reads, and joins will be very fast. The same holds for similar
>workloads in Tajo.
>
>Concerted will fit in many other use cases in the Apache spectrum as well.
>For example, Concerted can be used with Apache Geode for in memory
>aggregation indexing. Concerted can also be used with Apache Flink for
>streaming real time data into memory, performing in memory aggregation and
>then performing batch processing for efficiency.
>
>
>== An Excessive Fascination with the Apache Brand ==
>We believe that the "Apache Way" governance model will provide additional
>help to us in finding contributors and growing the community. The
>community and development process will make this project more stable and
>help establish ubiquitous APIs. In addition, Concerted is looking to
>support multiple Apache projects in their use cases and accelerate their
>performance while soliciting their support in development of the project.
>We will not be using the Apache brand for excessive branding or with any
>commercial aspects of Concerted. The Apache brand will primarily be used
>for community building.
>
>= Documentation =
>Public documents are currently in development and will be published soon.
>
>= Initial Source =
>The initial source is written in C++ and is heavily in development. It
>will be restructured and released publicly.
>We understand that there might be concerns around the GitHub source being
>developed by only a single person and development not happening after
>2013. The source on GitHub is only the source initially developed as an
>independent project, hence the limitation.
>However, because the project has been present on GitHub for a while now,
>it has attracted attention and people have been using and developing it
>locally. For example, Ingenium Data Systems took an interest in the
>project, developed it locally and used it in an upcoming product they are
>going to release soon. The project now wants to accumulate all independent
>development efforts and help attract people to grow the community and
>project. We are currently in the process of updating the GitHub repository
>and making branches for all local development efforts.
>
>= Source and Intellectual Property Submission Plan =
>
>We intend the entire code base to be licensed under the Apache License,
>Version 2.0.
>
>= External Dependencies =
>Currently, Concerted only depends on the g++ compiler and pthreads.
>pthreads will be replaced by Boost in the next release.
>
>= Cryptography =
>
>N/A
>
>= Required Resources =
>== Mailing List ==
> * [email protected] (moderated subscriptions)
> * [email protected]
> * [email protected]
> * [email protected]
>
>== Git Repository ==
>
>https://git-wip-us.apache.org/repos/asf/incubator-concerted.git
>
>== Issue Tracking ==
>Jira Concerted (CONCERTED)
>
>== Other Resources ==
> * Continuous Integration
>   * Jenkins
> * Wiki
>   * cwiki.apache.org/confluence/display/CONCERTED
>
>= Initial Committers =
> * Roman Shaposhnik <rvs AT apache DOT org>
> * Daniel Dai <daijy AT apache DOT org>
> * Jake Farrell <jfarrell AT apache DOT org>
> * Lars Hofhansl <larsh AT apache DOT org>
> * Julian Hyde <jhyde AT apache DOT org>
> * Chris Nauroth <cnauroth AT hortonworks DOT com>
> * Pavel Stehule <pavel DOT stehule AT gmail.com>
> * Amrish <amrishs AT ingeniumsys DOT com>
> * Nupur S <nupurs AT ingeniumsys DOT com>
> * Atri Sharma <atri AT apache DOT org>
> * Nishith Singhal <nishsinghal AT gmail DOT com>
> * Michael Down <michael AT dowuk DOT com>
> * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
> * Wang Albert <albertwang87 AT gmail DOT com>
> * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
> * Kris Popat <krispopat AT apache DOT org>
> * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>
>= Affiliations =
> * Roman Shaposhnik (Pivotal)
> * Daniel Dai (HortonWorks)
> * Jake Farrell (Acquia)
> * Lars Hofhansl (Salesforce)
> * Julian Hyde (HortonWorks)
> * Chris Nauroth (HortonWorks)
> * Pavel Stehule (GoodData)
> * Amrish (Ingenium Data Systems)
> * Nupur S (Ingenium Data Systems)
> * Atri Sharma (Barclays)
> * Nishith Singhal (Wipro)
> * Michael Down (Barclays)
> * Vijayakumar Ramdoss (EMC)
> * Wang Albert (Lehigh University)
> * Hans-Jurgen Schonig (CyberTec)
> * Kris Popat (CETIS LLP)
> * Ayrton Gomesz (IQLabs)
>
>The nominated mentors are employees of HortonWorks, Acquia, and
>Salesforce.
>
> * Daniel Dai (HortonWorks)
> * Jake Farrell (Acquia)
> * Lars Hofhansl (Salesforce)
> * Julian Hyde (HortonWorks)
> * Chris Nauroth (HortonWorks)
>
>= Sponsors =
>
>== Champion ==
>
> * Roman Shaposhnik (rvs AT apache DOT org)
>
>== Nominated Mentors ==
>
> * Daniel Dai <daijy AT apache DOT org>
> * Jake Farrell <jfarrell AT apache DOT org>
> * Lars Hofhansl <larsh AT apache DOT org>
> * Julian Hyde <jhyde AT apache DOT org>
> * Chris Nauroth <cnauroth AT hortonworks DOT com>
>
>== Sponsoring Entity ==
>Apache Incubator
