I’m not sure this needs to be resolved before the podling can be accepted into the Incubator.
Regards,
Alan

> On Oct 9, 2015, at 2:01 PM, Julian Hyde <jh...@apache.org> wrote:
>
> I have agreed to be a mentor to Concerted and I think it is an
> interesting idea. I am inclined to vote for it entering the incubator.
>
> However since the project has not released any source code yet, there
> are a couple of questions I'd like to get answered for the record:
>
> 1. How many lines of existing code are there? What is their approximate age?
>
> 2. Concerted is in C/C++ but you mention interfacing with JVM-based
> products like Hive. How would you interface with other languages? Is
> it a goal of the project to create APIs to other languages such as
> Java? Would access from those languages be as efficient as native
> access?
>
> I apologize that I didn't bring these up in the discussion thread.
>
> Julian
>
>
> On Fri, Oct 9, 2015 at 11:53 AM, Ayrton Gomesz <com.ayr...@gmail.com> wrote:
>> +1
>> @henry.saputra thanks man
>> On Oct 9, 2015 5:50 PM, "Henry Saputra" <henry.sapu...@gmail.com> wrote:
>>
>>> +1 (binding)
>>> Good luck guys!
>>>
>>> On Fri, Oct 9, 2015 at 8:55 AM, Atri Sharma <a...@apache.org> wrote:
>>>> Hi all,
>>>>
>>>> Following the discussion about Concerted I would like to call a vote for
>>>> accepting Concerted as a new incubator project.
>>>>
>>>> The proposal text is included below, and available on the wiki:
>>>>
>>>> https://wiki.apache.org/incubator/ConcertedProposal
>>>>
>>>> The vote is open for 72 hours:
>>>>
>>>> [ ] +1 accept Concerted in the Incubator
>>>> [ ] ±0
>>>> [ ] -1 (please give reason)
>>>>
>>>> Regards,
>>>>
>>>> Atri
>>>>
>>>> = Abstract =
>>>>
>>>> Concerted is an in memory write less read more engine aimed to provide
>>>> extreme read performance with very high degree of concurrency and
>>>> scalability and focus on minimizing own resource footprint.
>>>>
>>>> = Proposal =
>>>> Concerted is built on the principle that a new type of workload is
>>>> dominating the scene and is now needed to be supported.
>>>> These are the large
>>>> data set analytical workloads being analyzed or used on large clusters or
>>>> high power machines. Large analytical workloads depend on the ability to
>>>> query large data sets efficiently and in high concurrency while maintaining
>>>> semantics such as immediate consistency. An in memory engine designed to
>>>> support extreme read queries while providing support for aggregation
>>>> through various features (such as multidimensional representation of
>>>> tuples) will accelerate many usecases around large scale analytics.
>>>>
>>>> Concerted believes that best understanding of user application lies with
>>>> user application developer. The need for massive read scaling should be on
>>>> demand and should be flexible to the level that user can decide as to which
>>>> representation and access of data suits his/her current requirements.
>>>> Hence, Concerted is not built in a traditional client/server model.
>>>> Concerted provides users with an API which can be used to load, read,
>>>> update and delete data. User chooses which data structure has to be used
>>>> for his current requirements. All API access is covered by Concerted's
>>>> internal systems like lock manager, transaction manager and cache manager
>>>> which ensure that reads scale to high level in every API call.
>>>>
>>>> Concerted is a Do It Yourself in memory platform for making in memory
>>>> supporting engines. The use case we think of is supporting big data
>>>> warehouses like Hive, but there are endless use cases for a custom, highly
>>>> scalable in memory platform.
>>>>
>>>> The goal of this proposal is to leverage an existing code base available on
>>>> Github and licensed under the Apache License 2.0 to build a community
>>>> around the project.
>>>> Currently the community consists of existing hackers of
>>>> Concerted as well as people who have been following and associated with the
>>>> project since a while as well as database experts who are excited about
>>>> building a project like this. We are hoping that entering into Apache would
>>>> help us attract more contributors as well as connect with existing big data
>>>> projects like Apache Hive, Apache HAWQ, Apache Storm, Apache Tajo, Apache
>>>> Spark, Apache Geode to leverage their community base while assisting in
>>>> their use cases with Concerted. We had a discussion with founders of Apache
>>>> Tajo and they showed interest in using Concerted for some of their use
>>>> cases.
>>>> = Background =
>>>> Relational databases were built with the cost of physical memory in mind.
>>>> The cost is no longer very relevant and physical memory is now available on
>>>> demand. Another driving factor behind Concerted is that there is a paradigm
>>>> shift with big data coming into picture. Disk IO speeds are more of a
>>>> bottleneck than ever before. Combining the read dominance of analytical
>>>> workload with the speed of in memory structures, Concerted fits the current
>>>> scene. Also, supporting OLAP workloads with in memory support for faster
>>>> read constant queries and joins will be useful.
>>>>
>>>> = Rationale =
>>>> As explained above, large analytical workloads need an in memory
>>>> lightweight engine which supports massive read concurrency, ground level
>>>> support for aggregations and analytics, extreme scalability and high read
>>>> performance, along with the engine being very light itself. Concerted aims
>>>> to solve these needs.
>>>> Concerted is designed and built with three goals as
>>>> objectives:
>>>>
>>>>
>>>> Performance
>>>> To provide high performance access to data from a large number of rows,
>>>> Concerted uses efficient representation and in memory indexing of data
>>>> coupled with high performance transactions, custom transactions and
>>>> lightweight locking and lockless techniques and an intelligent locking
>>>> manager.
>>>>
>>>> Scalability
>>>> Concerted is built with extreme concurrency and scalability in mind.
>>>>
>>>> Efficiency
>>>> Concerted aims to give expected performance under vast variety of
>>>> workloads and aims to have as low footprint as possible.
>>>>
>>>> = Initial Goals =
>>>> The initial goal is to leverage an existing code base and invest in
>>>> building a community around the project. We anticipate a lot of initial
>>>> restructuring of the existing code so that it becomes easier to include new
>>>> contributors and minimize ramp up time. We plan to approach this
>>>> refactoring in a fully transparent, community-driven way thus starting to
>>>> practice the "Apache Way" governance model from the get go.
>>>>
>>>> Various contributors are getting individual changes into branches in github
>>>> repository and our initial major goal will be to merge in all those changes
>>>> in master repository.
>>>>
>>>> = Current Status =
>>>> Concerted is currently under restructuring to suit the needs of an open
>>>> source project. Current source is available at
>>>> https://github.com/atris/Concerted (Please note that updated codebase is
>>>> not yet present on github) Concerted is currently being licensed under
>>>> Apache License 2.0. Most of the code base is implemented in C and C++ and
>>>> has external dependencies listed later.
>>>>
>>>> == Meritocracy ==
>>>>
>>>> We plan to drive the technical roadmap and implementation in a fully
>>>> transparent, community-driven way soliciting feedback from all of the
>>>> community members and building a consensus-driven approach to evolving the
>>>> code base and the community itself. Users and new contributors will be
>>>> treated with respect and welcomed. By participating in the community and
>>>> providing quality patches/support that move the project forward,
>>>> contributors will earn merit. They also will be encouraged to provide
>>>> non-code contributions (documentation, events, community management, etc.)
>>>> and will gain merit for doing so. Those with a proven support and quality
>>>> track record will be encouraged to become committers.
>>>>
>>>> == Community ==
>>>> In memory is the new cutting edge thing and a new community around
>>>> performance oriented systems and enhancing relational database performance
>>>> by having complete in memory OLTP engines will greatly benefit performance.
>>>> So we expect data warehousing projects and communities as well as projects
>>>> and companies looking for high performance OLTP performance. In addition,
>>>> Ingenium Data Systems is building products around Concerted and will have
>>>> salaried developers contribute to the project as part of job responsibility.
>>>>
>>>> == Core Developers ==
>>>> Core developers are a diverse group of developers, many of which are very
>>>> experienced in open source and the Apache Hadoop ecosystem. Specifically,
>>>> Atri is an Apache Apex committer and Atri and Pavel are major contributors
>>>> to PostgreSQL project. Atri is also committer for other open source projects.
>>>>
>>>> * Amrish <amrishs AT ingeniumsys DOT com>
>>>> * Nupur S <nupurs AT ingeniumsys DOT com>
>>>> * Pavel Stehule <pavel DOT stehule AT gmail.com>
>>>> * Atri Sharma <atri AT apache DOT org>
>>>> * Nishith Singhal <nishsinghal AT gmail DOT com>
>>>> * Michael Down <michael AT dowuk DOT com>
>>>> * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>>>> * Wang Albert <albertwang87 AT gmail DOT com>
>>>> * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>>>> * Kris Popat <krispopat AT apache DOT org>
>>>> * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>>>>
>>>> == Alignment ==
>>>> Concerted will be helpful to systems like Tajo which can benefit with in
>>>> memory structures optimized for heavy reads and joins (dimension tables).
>>>> In addition Concerted will benefit projects looking for in memory
>>>> relational database as a metadata store, which is the case for most of the
>>>> Apache Big Data projects. We expect Apache HAWQ (incubating), Apache Hive,
>>>> Apache Storm, Apache Tajo to be utilizing Concerted as a supporting engine.
>>>> For eg, a data warehouse built on HAWQ, Hive or Tajo can utilize Concerted
>>>> as an in memory engine for querying and joining dimensional tables.
>>>>
>>>> = Known Risks =
>>>>
>>>> == Orphaned Products ==
>>>> Most of the code is developed by a small group of core developers and this
>>>> may be a risk for orphaned product. However, the code base is simple as
>>>> compared to other open source projects and the interest level in Concerted
>>>> has risen exponentially over the years with many computer professionals
>>>> expressing interest in the project and doing some use cases of the
>>>> same. Specifically, there were some projects done around Concerted in JIIT,
>>>> Noida (an engineering school) and Wang is a student in Lehigh University
>>>> who has been following Concerted's progress over many years.
>>>> The core
>>>> developers are aligned with this project and since the code base is simple,
>>>> future committers will have a quick ramp up and the risk shall be
>>>> mitigated. Besides, Ingenium Data Systems is launching a product based on
>>>> Concerted and will be having all its salaried developers contribute to
>>>> Concerted as a part of their job functions.
>>>>
>>>> == Inexperience with Open Source ==
>>>> Most of the initial committers have experience working on open source
>>>> projects. In particular, Atri is an active member of many open source
>>>> projects.
>>>>
>>>> == Homogeneous Developers ==
>>>> Although initial core developers were based out of India, community now
>>>> consists of computer professionals from various parts of the world hence
>>>> diversity should not be an issue. In addition, we will be documenting
>>>> internals of the project in public facing documents and it shall allow more
>>>> contributors to join in.
>>>>
>>>> == Reliance on Salaried Developers ==
>>>> It is expected that Concerted development will occur on both salaried time
>>>> and on volunteer time. Nupur and Amrish belong to Ingenium and are
>>>> committed to building this project along with their team. Atri, as the
>>>> originator of this project, will be actively working on the project and is
>>>> now pushing Concerted into major data warehousing projects, since he is
>>>> involved in architecture of data platforms. Developers are expected to be
>>>> contributing in their volunteer time. In addition, we will be working with
>>>> various open source projects which will be benefited by Concerted and will
>>>> be involving those communities into Concerted's development as well. For
>>>> eg, Apache Tajo has shown interest and will be supporting development of
>>>> the project.
>>>>
>>>> == Relationships with Other Apache Products ==
>>>> Concerted has some overlapping function with Apache Geode(Incubating).
>>>> However, Geode is an in memory key value store whereas Concerted is a write
>>>> less read many engine. Concerted will complement Geode and increase the use
>>>> cases Geode can support with Concerted's help.
>>>>
>>>> A major objective for Concerted is supporting OLAP workloads and data
>>>> warehouses with in memory performance and highly performant reads and
>>>> joins. Concerted will be collaborating with many open source projects such
>>>> as Apache HAWQ (incubating), Apache Hive, Apache Tajo etc to support their
>>>> OLAP workloads hence enabling them to support larger set of usecases with a
>>>> better throughput. For eg, a star schema in Hive will benefit from having
>>>> dimension tables in Concerted with highly efficient and scalable reads and
>>>> joins will be very fast. Similar workload for Tajo.
>>>>
>>>> Concerted will fit in many other use cases in Apache spectrum as well. For
>>>> eg, Concerted can be used with Apache Geode for in memory aggregation
>>>> indexing. Concerted can also be used with Apache Flink for streaming real
>>>> time data into in memory, perform in memory aggregation and then performing
>>>> batch processing for efficiency.
>>>>
>>>>
>>>> == An Excessive Fascination with the Apache Brand ==
>>>> We believe that the "Apache Way" governance model will provide additional
>>>> help to us in finding contributors and growing the community. The community
>>>> and development process will make this project more stable and help
>>>> establish ubiquitous APIs. In addition, Concerted is looking to support
>>>> multiple Apache projects in their use cases and accelerate their
>>>> performance while soliciting their support in development of the project.
>>>> We will not be using Apache brand for excessive branding or with any
>>>> commercial aspects of Concerted. Apache brand will primarily be used for
>>>> community building.
>>>>
>>>> = Documentation =
>>>> Public documents are currently in development and will be published soon.
>>>>
>>>> = Initial Source =
>>>> The initial source is written in C++ and is heavily in development. It will
>>>> be restructured and released publicly.
>>>> We understand that there might be concerns around github source being
>>>> developed by only a single person and development not happening after 2013.
>>>> The source on github is only the source initially developed as an
>>>> independent project hence the limitation. However, due to reason that
>>>> project has been present on github for a while now, it has attracted
>>>> attention and people have been using and developing it locally. For eg,
>>>> Ingenium Data System took an interest in the project and locally developed
>>>> it and used it in an upcoming product they are going to release soon. The
>>>> project now wants to accumulate all independent development efforts and
>>>> help attract people to grow the community and project. We are currently in
>>>> process of updating github repository and making branches for all local
>>>> development efforts.
>>>>
>>>> = Source and Intellectual Property Submission Plan =
>>>>
>>>> We intend the entire code base to be licensed under the Apache License,
>>>> Version 2.0.
>>>>
>>>> = External Dependencies =
>>>> Currently, Concerted only depends on g++ compiler and pthreads. pthreads
>>>> will be replaced by Boost in next release.
>>>>
>>>> = Cryptography =
>>>>
>>>> N/A
>>>>
>>>> = Required Resources =
>>>> == Mailing List ==
>>>> *priv...@concerted.incubator.apache.org (moderated subscriptions)
>>>> *comm...@concerted.incubator.apache.org
>>>> *d...@concerted.incubator.apache.org
>>>> *iss...@concerted.incubator.apache.org
>>>>
>>>> == Git Repository ==
>>>>
>>>> https://git-wip-us.apache.org/repos/asf/incubator-concerted.git
>>>>
>>>> == Issue Tracking ==
>>>> Jira Concerted (CONCERTED)
>>>>
>>>> == Other Resources ==
>>>> * Continuous Integration
>>>> * Jenkins
>>>> * Wiki
>>>> * cwiki.apache.org/confluence/display/CONCERTED
>>>>
>>>> = Initial Committers =
>>>> * Roman Shaposhnik <rvs AT apache DOT org>
>>>> * Daniel Dai <daijy AT apache DOT org>
>>>> * Jake Farrell <jfarrell AT apache DOT org>
>>>> * Lars Hofhansl <larsh AT apache DOT org>
>>>> * Julian Hyde <jhyde AT apache DOT org>
>>>> * Chris Nauroth <cnauroth AT hortonworks DOT com>
>>>> * Pavel Stehule <pavel DOT stehule AT gmail.com>
>>>> * Amrish <amrishs AT ingeniumsys DOT com>
>>>> * Nupur S <nupurs AT ingeniumsys DOT com>
>>>> * Atri Sharma <atri AT apache DOT org>
>>>> * Nishith Singhal <nishsinghal AT gmail DOT com>
>>>> * Michael Down <michael AT dowuk DOT com>
>>>> * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>>>> * Wang Albert <albertwang87 AT gmail DOT com>
>>>> * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>>>> * Kris Popat <krispopat AT apache DOT org>
>>>> * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>>>>
>>>> = Affiliations =
>>>> * Roman Shaposhnik (Pivotal)
>>>> * Daniel Dai (HortonWorks)
>>>> * Jake Farrell (Acquia)
>>>> * Lars Hofhansl (Salesforce)
>>>> * Julian Hyde (HortonWorks)
>>>> * Chris Nauroth (HortonWorks)
>>>> * Pavel Stehule (GoodData)
>>>> * Amrish (Ingenium Data Systems)
>>>> * Nupur S (Ingenium Data Systems)
>>>> * Atri Sharma (Barclays)
>>>> * Nishith Singhal (Wipro)
>>>> * Michael Down (Barclays)
>>>> * Vijayakumar Ramdoss (EMC)
>>>> * Wang Albert (Lehigh University)
>>>> * Hans-Jurgen Schonig (CyberTec)
>>>> * Kris Popat (CETIS LLP)
>>>> * Ayrton Gomesz (IQLabs)
>>>>
>>>> The nominated mentors are employees of HortonWorks, Acquia, and Salesforce.
>>>>
>>>> * Daniel Dai (HortonWorks)
>>>> * Jake Farrell (Acquia)
>>>> * Lars Hofhansl (Salesforce)
>>>> * Julian Hyde (HortonWorks)
>>>> * Chris Nauroth (HortonWorks)
>>>>
>>>> = Sponsors =
>>>>
>>>> == Champion ==
>>>>
>>>> * Roman Shaposhnik (rvs AT apache DOT org)
>>>>
>>>> == Nominated Mentors ==
>>>>
>>>> * Daniel Dai <daijy AT apache DOT org>
>>>> * Jake Farrell <jfarrell AT apache DOT org>
>>>> * Lars Hofhansl <larsh AT apache DOT org>
>>>> * Julian Hyde <jhyde AT apache DOT org>
>>>> * Chris Nauroth <cnauroth AT hortonworks DOT com>
>>>>
>>>> == Sponsoring Entity ==
>>>> Apache Incubator
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org