Re: [DISCUSS] The future of Apache Kylin

ShaoFeng Shi Tue, 11 Jan 2022 01:30:55 -0800

+1

Kylin has been a multi-dimensional OLAP (MOLAP) engine from day one. But
because SQL is its main query language, users find it a little confusing to
differentiate Kylin from other technologies. Introducing the new semantic
layer will make Kylin a more complete solution.
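
To make the MOLAP idea concrete, here is a minimal sketch in Python. It assumes a toy in-memory fact table with hypothetical `region`/`month` dimensions and a `sales` measure. The sketch precomputes SUM(sales) for every subset of dimensions (every cuboid) once, then answers aggregate queries by lookup instead of rescanning the facts. It is only an illustration of the concept. It is not Kylin's API or build engine, which runs on Spark and stores precomputed data in Parquet. `build_cube` and `query` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact table: two dimensions (region, month), one measure (sales).
FACTS = [
    {"region": "East", "month": "2022-01", "sales": 100},
    {"region": "East", "month": "2022-02", "sales": 120},
    {"region": "West", "month": "2022-01", "sales": 80},
    {"region": "West", "month": "2022-02", "sales": 90},
]
DIMENSIONS = ("region", "month")
MEASURE = "sales"


def build_cube(facts, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in facts:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, dims, key=()):
    """Answer an aggregate query by lookup; dims must follow DIMENSIONS order."""
    return cube[tuple(dims)][tuple(key)]


cube = build_cube(FACTS, DIMENSIONS, MEASURE)
print(query(cube, ["region"], ["East"]))                      # 220
print(query(cube, []))                                        # 390
print(query(cube, ["region", "month"], ["West", "2022-02"]))  # 90
```

The scan happens once at build time, and every later query is a dictionary lookup. With n dimensions there are 2^n cuboids, which is why precomputation engines typically limit which cuboids they build. This sketch simply builds all of them.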


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]




On Tue, 11 Jan 2022 at 16:07, Yaqian Zhang <[email protected]> wrote:

> Cool!
> Looking forward to the new features of the next-generation Apache Kylin.
>
> On 11 Jan 2022, at 2:30 PM, Xiaoxiang Yu <[email protected]> wrote:
>
> Thanks, Yang. There are two new features that I am really looking forward
> to:
>
> 1. The new *SEMANTIC LAYER* will make Kylin accessible from Excel (via MDX)
> and more BI tools.
> 2. The new *flexible model* will let Kylin users modify a Model/Cube whose
> status is Ready (for example, adding/deleting dimensions/measures) without
> purging any useful cuboid/segment.
>
> --
> *Best wishes to you!*
> *From: Xiaoxiang Yu*
>
>
> At 2022-01-11 13:59:13, "Li Yang" <[email protected]> wrote:
> >Hi All
> >
> >Apache Kylin has been stable for quite a while, and it may be a good time
> >to think about its future. Below are thoughts from my team and me. I'd
> >love to hear yours as well. Ideas and comments are very welcome.  :-)
> >
> >*APACHE KYLIN TODAY*
> >
> >Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0
> >is a major version update after Kylin 3.x (HBase storage). Kylin 4.0
> >replaces HBase with Parquet as the storage engine to improve file scanning
> >performance. At the same time, Kylin 4.0 reimplements the build engine and
> >query engine on Spark, making it possible to separate computing from
> >storage and better adapting to the cloud-native trend. With these
> >comprehensively updated engines, Kylin 4.0 also supports a deployment mode
> >without a Hadoop dependency, which reduces deployment complexity. However,
> >Kylin still has a lot to improve: for example, the business semantic layer
> >needs to be strengthened, and modifying a model/cube is not flexible. With
> >these in mind, we are thinking of a few things to do:
> >
> >   - Multi-dimensional query ability friendly to non-technical users.
> >   The multi-dimensional model is the key that distinguishes Kylin from
> >   general OLAP engines. Its model concept, based on dimensions and
> >   measures, is friendlier to non-technical users and closer to the goal
> >   of the citizen analyst. A multi-dimensional query capability that
> >   non-technical users can use should be the new focus of Kylin's
> >   technology.
> >
> >
> >   - Native engine. Kylin's query engine still has much room for
> >   improvement in vectorized acceleration and CPU instruction-level
> >   optimization. The Spark community, which Kylin relies on, also has a
> >   strong demand for a native engine. We are optimistic that a native
> >   engine could improve Kylin's performance by at least three times, which
> >   makes it worth investing in.
> >
> >
> >   - More cloud-native capabilities. Kylin 4.0 has only completed the
> >   initial cloud deployment work, delivering rapid deployment and dynamic
> >   resource scaling on the cloud; many cloud-native capabilities remain to
> >   be developed.
> >
> >More explanations follow.
> >
> >*KYLIN AS A MULTI-DIMENSIONAL DATABASE*
> >
> >The core of Kylin is a multi-dimensional database, which is a special kind
> >of OLAP engine. Although Kylin has had relational database capabilities
> >since its birth and is often compared with other relational OLAP engines,
> >what really makes Kylin different is its multi-dimensional model and
> >multi-dimensional database capabilities. Considering the essence of Kylin
> >and its wide range of future business uses (not only technical uses),
> >positioning Kylin as a multi-dimensional database makes perfect sense.
> >With business semantics and precomputation technology, Apache Kylin helps
> >non-technical people understand and afford big data, and so realizes data
> >democratization.
> >
> >*THE SEMANTIC LAYER*
> >
> >The key difference between a multi-dimensional database and a relational
> >database is the ability to express business meaning. Although SQL is highly
> >expressive and is a basic skill of data analysts, SQL and relational
> >databases are still too difficult for non-technical users if we aim for
> >"everyone is a data analyst". From the perspective of non-technical users,
> >the data lake and data warehouse are like a dark room. They know there is a
> >lot of data, but they cannot clearly see, understand, or use it because
> >they don't understand database theory and SQL.
> >
> >How can we make the data lake (and data warehouse) clear to non-technical
> >users? This requires introducing a data model that is friendlier to them:
> >the multi-dimensional data model. While the relational model describes the
> >technical form of data, the multi-dimensional model describes its business
> >form. In a multi-dimensional database (MDB), measures correspond to
> >business indicators that everyone understands, and dimensions are the
> >perspectives from which these indicators are compared and observed.
> >Comparing a KPI with last month's, or comparing performance between
> >parallel business units, are concepts every non-technical user
> >understands. Mapping the relational model to the multi-dimensional model
> >essentially enriches the technical data with business semantics, forming a
> >business semantic layer that helps non-technical users understand, explore,
> >and use the data. To enhance Kylin's ability as a semantic layer,
> >supporting multi-dimensional query languages such as MDX and DAX is a key
> >item on the Kylin roadmap. MDX can turn the data model in Kylin into a
> >business-friendly language, give the data business value, and facilitate
> >multi-dimensional analysis on Kylin with BI tools such as Excel and
> >Tableau.
> >
> >*PRECOMPUTATION AND MODEL FLEXIBILITY*
> >
> >It is Kylin's unchanging mission to keep reducing the cost of a single
> >query through precomputation so that ordinary people can afford big data.
> >If the multi-dimensional model solves the problem of non-technical users
> >understanding data, then precomputation solves the problem of ordinary
> >people affording it. Both are necessary conditions for data
> >democratization. By computing once and reusing the result many times, the
> >data cost can be shared by many users, achieving a scale effect: the more
> >users, the cheaper. Precomputation is Kylin's traditional strength, but it
> >lacks flexibility when the precomputation model changes. To make changing
> >models in Kylin more flexible and to open up more room for optimization,
> >the Kylin community expects to propose a new metadata format, so that
> >precomputation becomes more flexible and can cope with table formats or
> >business requirements that may change at any time.
> >
> >*SUMMARY*
> >
> >To sum up, we would like to propose positioning Kylin as a
> >multi-dimensional database. Through the multi-dimensional model and
> >precomputation technology, ordinary people can understand and afford big
> >data, finally realizing the vision of data democratization. Meanwhile, for
> >today's users who use Kylin as an SQL acceleration layer, Kylin will
> >continue to enhance its SQL engine to ensure that precomputation can be
> >used by both the relational model and the multi-dimensional model. In the
> >figure below, we picture the future of Kylin. The newly added and modified
> >parts are roughly marked in blue and orange.
> >
> >*FURTHER READING*
> >
> >   - https://en.wikipedia.org/wiki/Data_model
> >   - https://en.wikipedia.org/wiki/Semantic_layer
> >   - https://en.wikipedia.org/wiki/Multidimensional_analysis
> >   - https://en.wikipedia.org/wiki/MultiDimensional_eXpressions
> >   - https://en.wikipedia.org/wiki/XML_for_Analysis
> >   - https://en.wikipedia.org/wiki/SIMD
> >   - https://en.wikipedia.org/wiki/Cloud_native_computing
> >   - https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/
> >
> >
> >Please share your ideas and comments.  :-)
> >
> >Cheers
> >Yang
>
>
>
