COOL proposal

qingpeng Fri, 08 Apr 2022 10:55:11 -0700

Dear Apache Incubator Community,

We propose to contribute COOL as an Apache Incubator project.

COOL is a cohort OLAP system specialized for cohort analysis with extremely low latency. The vision of COOL is to address the inefficiency of underlying database systems in processing cohort analysis (cohort queries), which is an emerging and widely used analysis pattern in various areas. By utilizing COOL, we can process complicated cohort queries with flexible definitions of cohorts and events in near real time.

We need Champions and Mentors to help guide us in the development of this project. Please feel free to contact our team if you are interested.


Thanks a lot.

Best Regards,
Team of COOL



# COOL proposal

## Abstract

COOL is an online cohort analytical processing system specialized for cohort analysis with extremely low latency.



## Proposal

The vision of COOL is to address the inefficiency of underlying database systems in processing cohort analysis (cohort queries), which is an emerging and widely used analysis pattern in various areas. In COOL, cohort query processing is facilitated by specialized operators that involve only two fast scans on sophisticated storage to achieve real-time responses.

COOL has been designed to provide user-friendly querying primitives to address the pain point of writing complex and lengthy queries for cohort analysis using SQL-like languages. Specifically, at least five SQL queries are needed for a conventional OLAP database system to perform cohort analysis in a non-intrusive manner.

We submit this proposal to donate the COOL system, its related code, and artifacts (documentation, website application, wiki, etc.) to the Apache Software Foundation Incubator. We are confident that COOL will further promote the diversity of the Apache community and that Apache can provide COOL with a better environment to build its community, making it a useful and efficient tool for large-scale cohort analysis.



## Background

Cohort analysis (https://en.wikipedia.org/wiki/Cohort_analysis for quick reference) is a method of analyzing metrics across different groups (i.e., cohorts) that share common characteristics in the accumulated data. These characteristics play a critical role in user profiling and the decision-making process in data-driven organizations.

For example, cohort analysis is useful for customer retention analysis and for measuring the effectiveness of a promotional event. By observing the growth of users during a user acquisition campaign, or player progression in online gaming, we can evaluate how different groups of players evolve as time progresses. The efficiency of cohort query processing is vital in such scenarios, as analysts may have to work out strategies promptly for the online service.
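To make the retention example concrete, the following is a minimal sketch of what a cohort analysis computes, written in plain Python. It is not COOL's implementation or its query format. The event log, the weekly cohort granularity, and the names `EVENTS`, `week_start`, and `retention_by_cohort` are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical activity log of (user_id, activity_date) pairs.
# This is illustrative data, not data from any COOL deployment.
EVENTS = [
    ("u1", date(2022, 1, 3)),
    ("u1", date(2022, 1, 11)),
    ("u2", date(2022, 1, 4)),
    ("u2", date(2022, 1, 19)),
    ("u3", date(2022, 1, 10)),
    ("u3", date(2022, 1, 18)),
]


def week_start(day):
    """Return the Monday of the week that contains `day`."""
    return day - timedelta(days=day.weekday())


def retention_by_cohort(events):
    """Return {cohort_week: {age_in_weeks: number_of_active_users}}.

    A user's cohort is the week of their first recorded activity.
    The age of an activity is the number of whole weeks between
    the cohort week and the week of that activity.
    """
    # Step 1: find each user's first activity date (the "birth" event).
    first_seen = {}
    for user, day in events:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    # Step 2: collect the distinct active users per (cohort, age) cell.
    active = defaultdict(lambda: defaultdict(set))
    for user, day in events:
        cohort = week_start(first_seen[user])
        age = (week_start(day) - cohort).days // 7
        active[cohort][age].add(user)

    # Step 3: turn the user sets into counts, ordered by cohort and age.
    return {
        cohort: {age: len(users) for age, users in sorted(ages.items())}
        for cohort, ages in sorted(active.items())
    }


if __name__ == "__main__":
    for cohort, ages in retention_by_cohort(EVENTS).items():
        print(cohort, ages)
```

With this sample log, the cohort of the week starting 2022-01-03 has two active users at age 0 and one each at ages 1 and 2, while the cohort of the week starting 2022-01-10 has one active user at ages 0 and 1. Even this small sketch needs a birth-selection step, an age computation, and an aggregation, which suggests why the same analysis takes several statements in SQL.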

Another example of cohort analysis is the side-effect evaluation of a clinical trial, in which clinicians want to monitor and determine the effectiveness of new medicines among different patient groups. Unlike analysts for online services, clinicians can wait for a much longer duration (months or even years) to study the effectiveness of treatments. However, it is difficult for any clinician to construct complex cohort queries (using SQL) to conduct cohort analysis.

With the target of providing near real-time cohort analysis responses, COOL was initiated as a research project around 2016. It has been used for various real-world applications, such as sales of online game gadgets/equipment, and sales of virtual assets and gear in online games. The COOL system has been designed as a very efficient cohort analytical processing system with a fast response time and flexible definitions of cohorts and events. It is at least one order of magnitude faster than cohort processing using a conventional database engine.

For ease of use, COOL accepts a single self-defined query in JSON format, rather than multiple complex SQL statements.



## Rationale

There is a strong and growing need to support cohort analysis efficiently and effectively, and COOL meets this need well. COOL answers cohort queries in real time, at least one order of magnitude faster than traditional OLAP systems. Meanwhile, COOL accepts a single self-defined query in JSON format, rather than multiple complex SQL statements. Besides, COOL can also integrate data from different data sources.



## Initial Goals

The initial goal is to move the existing codebase to the Apache Software Foundation and improve it with the standard Apache development process. We plan for incremental development in the following directions: more storage connectors, more file format parsers, a feasible caching mechanism, and utilizing COOL's cohort results to facilitate building machine learning models. All these will be released in stages with the community following the Apache process.



## Current Status

COOL was started as a research project in the database system lab of NUS around 2016. All the code is made available under the Apache License V2, and the related artifacts can be found on GitHub.

The introduction website of the COOL system: http://13.212.103.48:3001/

• The GitHub repository for the source code of the COOL system: https://github.com/COOL-cohort/COOL
• The GitHub repository for the source code of the COOL website: https://github.com/COOL-cohort/COOL-site
• The GitHub repository for the source code of the COOL webapp: https://github.com/COOL-cohort/COOL-webapp


### Meritocracy

The project was originally created by David Jiang, Qingchao Cai, and Zhongle Xie. The project now has committers and users from different organizations in both Singapore and China. All committers joined the project by submitting code that fixes bugs or provides new features. If the proposal is accepted, we will work to select PPMC members for the project and continue to operate in the Apache way.


### Community

Although we are in the early stage of building a well-organized community, the need for cohort analysis is growing, especially as part of deep customer relationship management (CRM) and medical cohort analysis. Therefore, COOL should be able to attract more contributors to join our community and improve its codebase. Besides, we also have many experienced developers who have participated in building Apache SINGA and other open-source projects, and we are capable of organizing a well-developed community for COOL.


### Core Developers

Thus far, the core developers of COOL are experienced researchers and engineers primarily from the National University of Singapore and Zhejiang University. Some of them have participated in Apache SINGA and have adequate open-source experience.


### Alignment

Apache Incubator would be a perfect fit for the project for the following reasons:
1. COOL enriches the ecosystem of OLAP systems for underlying Apache Projects since there is no specialized cohort analytical system in the current project list.
2. The developer team of COOL is familiar with the Apache process and way. The lab has already contributed Apache SINGA, a Top-Level Project, to the foundation and a few members from Apache SINGA have joined the COOL team.
3. Joining Apache can help attract and coordinate development efforts from companies.
4. COOL can naturally connect with Apache projects like HDFS and ZooKeeper.




## Known Risks

Currently, the development team members are mostly from universities and research institutions. To fully become an "Apache-style" project, the project needs to embrace more developers from the industry or the community.


### Project Name

The name (i.e., COOL) is short and easy to remember, and to the best of our knowledge there are no similar names or projects that may cause a conflict. Hence, we believe the name COOL should be suitable for this project.


### Orphaned products

We believe that the COOL system will draw more attention from users in the industry and attract more developers to contribute to both the codebase and community because COOL can not only conduct cohort analysis with extremely low latency but also simplify the cohort queries without defining complex joint expressions.

We have already developed a website application to help prospective users conduct cohort analysis with the COOL system.

In practice, we have also deployed the COOL system in the National University Health System to assist clinicians in analyzing insightful patterns among COVID-19 patients from cohort results. Meanwhile, the team has cooperated with a few companies in building their user cohort analysis applications.

We plan to improve the COOL system from different aspects, such as more storage connectors, more file format parsers, a feasible caching mechanism, and utilizing COOL's cohort results to facilitate building machine learning models.


### Inexperience with Open Source

Our initial committers include several experienced developers who have participated in the Apache SINGA project. In fact, some of them are core contributors and members of the PPMC of that project. Hence, we have the experience to grow the community and maintain participation.


### Length of Incubation

We have made preliminary plans for improving COOL from different aspects and are devoted to realizing them. Besides, our committers are experienced in developing open-source projects and have participated in growing a well-organized community. Hence, we believe all these steps are realizable.


### Homogenous Developers

The current core developers are mainly researchers from the National University of Singapore and Zhejiang University. We also have a small number of developers from ByteDance and other enterprises. We do want to build a well-organized community and encourage developers to join and promote the development of our COOL system.


### Reliance on Salaried Developers

Most of the developers work for research labs and universities or are studying for their doctorates. They build the COOL system while conducting their research on cohort analysis and cohort-based neural network models. The COOL system will be a powerful tool to facilitate advanced cohort analytics in the commercial world and in scientific research that makes use of cohort analysis (e.g., reactions to drugs and treatments).


### Relationships with Other Apache Products

COOL naturally connects with Apache projects like HDFS and ZooKeeper. Besides, COOL supports Parquet files as a method to load data from other systems into COOL and export data for other downstream analysis tasks. Support for Apache Avro and Apache Arrow is also on our schedule.


### An Excessive Fascination with the Apache Brand

Without a doubt, we appreciate the reputation of the Apache brand, which will help to attract contributors and users. We also appreciate the Apache development process. We believe that COOL, as a specialized OLAP system for cohort analysis, can promote the diversification of the Apache community.




## Documentation

The introduction to the COOL system can be found at: http://13.212.103.48:3001/



## Initial Source

The codebase of the COOL system is written in Java and relies on Maven to compile and build the COOL engine. Besides, we have also prepared website applications and interesting use cases to demonstrate how to leverage COOL. More details can be found on the introduction webpage or in the Git repositories.


### Source and Intellectual Property Submission Plan

Once COOL is accepted and sponsored by Apache, we can transfer all source code and copyrights to the Apache Software Foundation.


### External Dependencies
All dependencies of the COOL system comply with the Apache License V2.

### Cryptography
Not applicable to COOL.


## Required Resources
### Mailing lists
We plan to use the following mailing lists:
• us...@cool.incubator.apache.org
• d...@cool.incubator.apache.org
• priv...@cool.incubator.apache.org
• comm...@cool.incubator.apache.org

### Subversion Directory
We prefer to continue using Git for version control of the COOL system.

### Git Repositories
• COOL system: https://github.com/COOL-cohort/COOL
• COOL website: https://github.com/COOL-cohort/COOL-site
• COOL webapp: https://github.com/COOL-cohort/COOL-webapp

### Issue Tracking
We would like to use JIRA to track issues.


## Initial Committers
• Beng Chin Ooi (oo...@comp.nus.edu.sg)
• Zhongle Xie (xi...@zju.edu.cn)
• Meihui Zhang (meihui_zh...@bit.edu.cn)
• Qingpeng Cai (qingp...@comp.nus.edu.sg)
• Naili Xing (dcsx...@nus.edu.sg)
• Guoyu Hu (guoyu...@u.nus.edu)
• Hongbin Ying (yinghong...@mzhtechnologies.com)
• Changshuo Liu (changs...@u.nus.edu)
• Fei Xiao (fxiao...@comp.nus.edu.sg)
• Yuncheng Wu (dcsw...@nus.edu.sg)
• Gang Chen (c...@zju.edu.cn)
• Pengyuan Shen (she...@mzhtechnologies.com)
• Chenghao Cai (chenghao....@nusri.cn)
• Ishant virendra Wankhede (ishant.virendra.wankh...@walmart.com)


## Affiliations
• Beng Chin Ooi, National University of Singapore
• Zhongle Xie, Zhejiang University
• Meihui Zhang, Beijing Institute of Technology
• Qingpeng Cai, National University of Singapore
• Naili Xing, National University of Singapore
• Hongbin Ying, MZH Technologies
• Guoyu Hu, National University of Singapore
• Changshuo Liu, National University of Singapore
• Fei Xiao, National University of Singapore
• Yuncheng Wu, National University of Singapore
• Gang Chen, Zhejiang University
• Pengyuan Shen, MZH Technologies
• Chenghao Cai, NUS AI Innovation and Commercialisation Centre
• Ishant virendra Wankhede, Walmart


## Sponsors
### Champion
TODO
### Nominated Mentors
TODO
### Sponsoring Entity
The Apache Incubator

