Dear Apache Incubator Community,

We propose to contribute COOL as an Apache Incubator project.

COOL is a cohort OLAP system specialized for cohort analysis with extremely low latency. The vision of COOL is to address the inefficiency of existing database systems in processing cohort queries, an emerging and widely used analysis pattern in various areas. With COOL, we can process complicated cohort queries with flexible definitions of cohorts and events in near real time.

We need Champions and Mentors to help guide us in the development of this project. Please feel free to contact our team if you are interested.

Thanks a lot.

Best Regards,
Team of COOL



# COOL proposal

## Abstract
COOL is an online cohort analytical processing system specialized for cohort analysis with extremely low latency.


## Proposal
The vision of COOL is to address the inefficiency of existing database systems in processing cohort queries, an emerging and widely used analysis pattern in various areas. In COOL, cohort query processing is carried out by specialized operators that need only two fast scans over a purpose-built storage layout to achieve real-time responses.

COOL has been designed to provide user-friendly querying primitives to address the pain point of writing complex and lengthy queries for cohort analysis using SQL-like languages. Specifically, at least five SQL queries are needed for a conventional OLAP database system to perform cohort analysis in a non-intrusive manner.

We submit this proposal to donate the COOL system, its related code, and artifacts (documentation, website application, wiki, etc.) to the Apache Software Foundation Incubator. We are confident that COOL will further promote the diversity of the Apache community, and that Apache can provide COOL with a better environment in which to build its community, making it a useful and efficient tool for large-scale cohort analysis.


## Background
Cohort analysis (see https://en.wikipedia.org/wiki/Cohort_analysis for a quick reference) is a method of analyzing metrics across different groups (i.e., cohorts) that share common characteristics in the accumulated data. These characteristics play a critical role in user profiling and in the decision-making process of data-driven organizations.

For example, cohort analysis is useful for analyzing customer retention and for evaluating the effectiveness of a promotional event. By observing user growth during a user acquisition campaign, or player progression in an online game, we can evaluate how different groups of users evolve as time progresses. The efficiency of cohort query processing is vital in such scenarios, as analysts may have to work out strategies promptly for the online service.
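
To make the retention example concrete, the following minimal Python sketch groups users into cohorts by the month of their first activity (the "birth" event) and counts how many users of each cohort are still active at each age, measured in weeks since birth. It is only an illustration of the analysis pattern, not COOL's implementation or query interface; the function name, the `(user_id, activity_date)` tuple layout, and the choice of months and weeks are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def retention_by_cohort(events):
    """Count active users per (birth month, weeks since birth).

    events: iterable of (user_id, activity_date) tuples (hypothetical layout).
    Returns {(year, month): {age_in_weeks: active_user_count}}.
    """
    events = list(events)

    # Birth event: a user's earliest activity date decides the cohort.
    birth = {}
    for user, day in events:
        if user not in birth or day < birth[user]:
            birth[user] = day

    # Collect the distinct users of each cohort that are active at each age.
    active = defaultdict(lambda: defaultdict(set))
    for user, day in events:
        cohort = (birth[user].year, birth[user].month)
        age = (day - birth[user]).days // 7
        active[cohort][age].add(user)

    return {
        cohort: {age: len(users) for age, users in sorted(ages.items())}
        for cohort, ages in sorted(active.items())
    }


# Made-up sample data, for illustration only.
sample_events = [
    ("u1", date(2023, 1, 2)),
    ("u1", date(2023, 1, 10)),
    ("u2", date(2023, 1, 5)),
    ("u3", date(2023, 2, 1)),
    ("u3", date(2023, 2, 9)),
]
for cohort, ages in retention_by_cohort(sample_events).items():
    print(cohort, ages)
```

Even this simplified version needs one pass to find each user's birth event and a second pass to aggregate by cohort and age, which hints at why the same analysis is cumbersome to express as a sequence of SQL statements.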

Another example of cohort analysis is the evaluation of side effects in a clinical trial, in which clinicians want to monitor and determine the effectiveness of new medicines among different patient groups. Unlike analysts for online services, clinicians can wait much longer (months or even years) to study the effectiveness of treatments. However, it is difficult for clinicians to construct the complex SQL queries that cohort analysis requires.

With the goal of providing near real-time responses to cohort queries, COOL was initiated as a research project around 2016. It has been used in various real-world applications, such as the sale of online game gadgets and equipment and the sale of virtual assets and gear in online games. The COOL system has been designed as a very efficient cohort analytical processing system with a fast response time and flexible definitions of cohorts and events. It is at least one order of magnitude faster than cohort processing using a conventional database engine.

For ease of use, COOL accepts a single self-defined query in JSON format, rather than multiple complex SQL statements.


## Rationale
There is a strong and growing need to support cohort analysis efficiently and effectively, and COOL meets this need well. Cohort queries in COOL are answered in real time, at least one order of magnitude faster than in traditional OLAP systems. Meanwhile, COOL accepts a single self-defined query in JSON format, rather than multiple complex SQL statements. In addition, COOL can integrate data from different data sources.


## Initial Goals
The initial goal is to move the existing codebase to the Apache Software Foundation and improve it with the standard Apache development process. We plan for incremental development in the following directions: more storage connectors, more file format parsers, a feasible caching mechanism, and the use of COOL's cohort results to facilitate building machine learning models. All of these will be released in stages with the community, following the Apache process.


## Current Status
COOL was started as a research project in the database systems lab of the National University of Singapore (NUS) around 2016. All the code is available under the Apache License V2, and the related artifacts can be found on GitHub.
• Introduction website of the COOL system: http://13.212.103.48:3001/
• Source code of the COOL system: https://github.com/COOL-cohort/COOL
• Source code of the COOL website: https://github.com/COOL-cohort/COOL-site
• Source code of the COOL webapp: https://github.com/COOL-cohort/COOL-webapp

### Meritocracy
The project was originally created by David Jiang, Qingchao Cai, and Zhongle Xie. It now has committers and users from different organizations in both Singapore and China. All committers joined the project by submitting code that fixes bugs or provides new features. If the proposal is accepted, we will work to select PPMC members for the project and continue to operate in the Apache way.

### Community
Although we are at an early stage of building a well-organized community, the need for cohort analysis is growing, especially in customer relationship management (CRM) and medical cohort analysis. We therefore expect COOL to attract more contributors who will join the community and improve its codebase. In addition, many of our experienced developers have participated in building Apache SINGA and other open-source projects, so we are capable of organizing a well-developed community for COOL.

### Core Developers
Thus far, the core developers of COOL are experienced researchers and engineers, primarily from the National University of Singapore and Zhejiang University. Some of them have participated in Apache SINGA and have adequate open-source experience.

### Alignment
Apache Incubator would be a perfect fit for the project for the following reasons:
1. COOL enriches the ecosystem of OLAP systems among Apache projects, since there is no specialized cohort analytical system in the current project list.
2. The developer team of COOL is familiar with the Apache process and the Apache way. The lab has already contributed Apache SINGA, a Top-Level Project, to the foundation, and a few members from Apache SINGA have joined the COOL team.
3. Joining Apache can help attract and coordinate development efforts from companies.
4. COOL can naturally connect with Apache projects like HDFS and ZooKeeper.



## Known Risks
Currently, the development team members are mostly from universities and research institutions. To fully become an "Apache-style" project, the project needs to embrace more developers from industry and the wider community.

### Project Name
The name COOL is short and easy to remember, and to the best of our knowledge there are no similarly named projects that may cause a conflict. Hence, we believe the name COOL is suitable for this project.

### Orphaned products
We believe that the COOL system will draw more attention from users in industry and attract more developers to contribute to both the codebase and the community, because COOL not only conducts cohort analysis with extremely low latency but also simplifies cohort queries by removing the need to define complex join expressions. We have already developed a website application that helps prospective users conduct cohort analysis with COOL. In practice, we have also deployed the COOL system at the National University Health System to assist clinicians in finding insightful patterns among COVID-19 patients from cohort results. Meanwhile, the team has cooperated with a few companies in building their user cohort analysis applications. We plan to improve the COOL system in several aspects, such as more storage connectors, more file format parsers, a feasible caching mechanism, and the use of COOL's cohort results to facilitate building machine learning models.

### Inexperience with Open Source
Our initial committers include several experienced developers who have participated in the Apache SINGA project. In fact, some of them are core contributors and members of its PPMC. Hence, we have the experience to grow the community and maintain participation.

### Length of Incubation
We have made preliminary plans for improving COOL in several aspects and are devoted to realizing them. In addition, our committers are experienced in developing open-source projects and have participated in growing a well-organized community. Hence, we believe all these steps are achievable.

### Homogenous Developers
The current core developers are mainly researchers from the National University of Singapore and Zhejiang University. We also have a small number of developers from ByteDance and other enterprises. We want to build a well-organized community and encourage more developers to join and advance the development of the COOL system.

### Reliance on Salaried Developers
Most of the developers work for research labs and universities or are studying for their doctorates. They build the COOL system while conducting their research on cohort analysis and cohort-based neural network models. The COOL system will be a powerful tool for advanced cohort analytics in the commercial world and in scientific research that relies on cohort analysis (e.g., reactions to drugs and treatments).

### Relationships with Other Apache Products
COOL connects naturally with Apache projects like HDFS and ZooKeeper. In addition, COOL supports Parquet files as a way to load data from other systems into COOL and to export data for downstream analysis tasks. Support for Apache Avro and Apache Arrow is also on our schedule.

### An Excessive Fascination with the Apache Brand
Without a doubt, we appreciate the reputation of the Apache brand, which will help to attract contributors and users. We also appreciate the Apache development process. We believe that COOL, as a specialized OLAP system for cohort analysis, can promote the diversification of the Apache community.



## Documentation
The introduction of the COOL system can be found at: http://13.212.103.48:3001/


## Initial Source
The codebase of the COOL system is written in Java and relies on Maven to compile and build the COOL engine. We have also prepared website applications and use cases that demonstrate how to leverage COOL. More details can be found on the introduction webpage or in the Git repositories.

### Source and Intellectual Property Submission Plan
Once COOL is accepted and sponsored by Apache, we can transfer all source code and copyrights to the Apache Software Foundation.

### External Dependencies
All dependencies of the COOL system comply with the Apache License V2.

### Cryptography
Not applicable to COOL.


## Required Resources
### Mailing lists
We plan to use the following mailing lists:
• us...@cool.incubator.apache.org
• d...@cool.incubator.apache.org
• priv...@cool.incubator.apache.org
• comm...@cool.incubator.apache.org

### Subversion Directory
We prefer to continue using Git for version control of the COOL system.

### Git Repositories
• COOL system: https://github.com/COOL-cohort/COOL
• COOL website: https://github.com/COOL-cohort/COOL-site
• COOL webapp: https://github.com/COOL-cohort/COOL-webapp

### Issue Tracking
We would like to use JIRA to track issues.


## Initial Committers
• Beng Chin Ooi (oo...@comp.nus.edu.sg)
• Zhongle Xie (xi...@zju.edu.cn)
• Meihui Zhang (meihui_zh...@bit.edu.cn)
• Qingpeng Cai (qingp...@comp.nus.edu.sg)
• Naili Xing (dcsx...@nus.edu.sg)
• Guoyu Hu (guoyu...@u.nus.edu)
• Hongbin Ying (yinghong...@mzhtechnologies.com)
• Changshuo Liu (changs...@u.nus.edu)
• Fei Xiao (fxiao...@comp.nus.edu.sg)
• Yuncheng Wu (dcsw...@nus.edu.sg)
• Gang Chen (c...@zju.edu.cn)
• Pengyuan Shen (she...@mzhtechnologies.com)
• Chenghao Cai (chenghao....@nusri.cn)
• Ishant virendra Wankhede (ishant.virendra.wankh...@walmart.com)


## Affiliations
• Beng Chin Ooi, National University of Singapore
• Zhongle Xie, Zhejiang University
• Meihui Zhang, Beijing Institute of Technology
• Qingpeng Cai, National University of Singapore
• Naili Xing, National University of Singapore
• Hongbin Ying, MZH Technologies
• Guoyu Hu, National University of Singapore
• Changshuo Liu, National University of Singapore
• Fei Xiao, National University of Singapore
• Yuncheng Wu, National University of Singapore
• Gang Chen, Zhejiang University
• Pengyuan Shen, MZH Technologies
• Chenghao Cai, NUS AI Innovation and Commercialisation Centre
• Ishant virendra Wankhede, Walmart


## Sponsors
### Champion
TODO
### Nominated Mentors
TODO
### Sponsoring Entity
The Apache Incubator
