[Discuss] Reposition Kylin as "Analytical Data warehouse for big data"

ShaoFeng Shi Sun, 12 Jan 2020 04:33:24 -0800

Hello, Kylin developers and users, HAPPY NEW YEAR 2020!

Last month we released Kylin 3.0, with the new real-time streaming
feature and a Lambda architecture. This lets our users run a single
system for both batch and real-time analytics, and query batch and
streaming data together.


If you look at Kylin's home page, its slogan is still the "OLAP Engine for
Big data", which was chosen 5 years ago when the project was born. Today,
Kylin's capabilities have been proven well beyond those of an "OLAP engine".
Last year I visited many Kylin users in China, the US, and Europe, and
collected many different scenarios:

1. eBay initiated the Kylin project to offload analytical workloads from
Teradata to Hadoop; Kylin serves the online queries with high performance
and high availability. Today, Kylin serves millions of queries every
day, most of them in under 1 second.
2. China Unionpay and CPIC use Kylin to replace IBM Cognos cubes. One Kylin
cube replaced more than 100 Cognos cubes, with better building performance
and query performance.
3. China Construction Bank uses Hadoop + Kylin to offload workloads from
Greenplum. Some systems have been migrated to Kylin successfully.
4. Yum (KFC) and several other users are using Kylin to replace Microsoft
SSAS.
5. Meituan, Ctrip, JD, Didi, Xiao Mi, Huawei, OLX group, autohome.com.cn,
Xactly, and many others are using Kylin as the platform of their DaaS (Data
as a Service), providing data service to their thousands of internal
analysts and tens of thousands of external tenants.

Now let's look at the definition of a data warehouse [1]:

"*A data warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision-making
process.*"

In Kylin, each model/cube is created for a certain subject
(subject-oriented); Kylin integrates well with Hive, Hadoop, Spark, Kafka,
and other systems (integrated); Kylin incrementally loads the data by time,
builds the cube, and saves the result as segments (partitions)
(time-variant); and the segments are non-volatile unless you refresh them.
During analysis (roll-up, drill-down, etc.), the data is always
consistent. Kylin provides a SQL interface and JDBC/ODBC/HTTP APIs so that
you can easily connect from BI/visualization tools such as Tableau.
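
To make the "time-variant" and "non-volatile" points concrete, here is a
minimal toy sketch in Python. It is not Kylin's implementation or API: the
names ToyCube, Segment, build and refresh are hypothetical, and the data is
made up. It only illustrates the idea described above: each build covers one
time range and produces an immutable, pre-aggregated segment; a segment
changes only when it is explicitly refreshed; and a query merges all
segments.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One pre-aggregated partition covering [start, end) (hypothetical)."""
    start: str   # inclusive ISO date, e.g. "2020-01-01"
    end: str     # exclusive ISO date
    totals: dict  # dimension value -> summed measure


def build_segment(start, end, rows):
    """Aggregate only the rows whose date falls inside [start, end)."""
    totals = defaultdict(int)
    for day, city, amount in rows:
        if start <= day < end:  # ISO dates compare correctly as strings
            totals[city] += amount
    return Segment(start, end, dict(totals))


class ToyCube:
    """Toy model of a cube stored as time-ranged segments."""

    def __init__(self):
        self.segments = {}  # (start, end) -> Segment

    def build(self, start, end, rows):
        """Incremental load: add a segment for a new time range."""
        key = (start, end)
        if key in self.segments:
            raise ValueError("segment exists; use refresh() to rebuild it")
        self.segments[key] = build_segment(start, end, rows)

    def refresh(self, start, end, rows):
        """The only way an existing segment changes: rebuild it."""
        key = (start, end)
        if key not in self.segments:
            raise KeyError("no such segment to refresh")
        self.segments[key] = build_segment(start, end, rows)

    def query(self):
        """Roll up the measure per dimension value across all segments."""
        merged = defaultdict(int)
        for seg in self.segments.values():
            for city, total in seg.totals.items():
                merged[city] += total
        return dict(merged)


# Usage with made-up rows of (date, city, amount):
rows = [("2020-01-05", "SH", 10), ("2020-01-20", "BJ", 5),
        ("2020-02-03", "SH", 7)]
cube = ToyCube()
cube.build("2020-01-01", "2020-02-01", rows)  # January segment
cube.build("2020-02-01", "2020-03-01", rows)  # February segment
print(cube.query())
```

In this sketch, building the February segment never touches the January
one, which is why earlier results stay stable while new data arrives.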

All in all, you can see that users are using Kylin not just as a SQL
engine, but also as an Analytical Data Warehouse for very large scale data
(PB scale). In the world of big data, Kylin is unique: its design is
elegant, and its architecture is scalable and pluggable. To give Kylin
more visibility and help more people discover it, I propose changing
Kylin's positioning/slogan from the "OLAP engine for big data" to
"Analytical Data warehouse for big data".

Please feel free to share your comments.

[1] https://www.1keydata.com/datawarehousing/data-warehouse-definition.html

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]
