+1

Kylin has been a multi-dimensional OLAP (MOLAP) engine since day one, but
because SQL is its main query language, it is a little confusing for users
to differentiate it from other technologies. Introducing the new semantic
layer will make Kylin a more complete solution.
Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]


On Tue, Jan 11, 2022 at 16:07, Yaqian Zhang <[email protected]> wrote:

> Cool!
> Looking forward to the new features of the next generation of Apache Kylin.
>
> On Jan 11, 2022, at 2:30 PM, Xiaoxiang Yu <[email protected]> wrote:
>
> Thanks Yang, there are two new features that I am really looking forward
> to:
>
> 1. The new *SEMANTIC LAYER* will make Kylin accessible from Excel (via
> MDX) and more BI tools.
> 2. The new *flexible Model* will let Kylin users modify a Model/Cube
> whose status is Ready (for example, adding/deleting dimensions/measures)
> without purging any useful cuboid/segment.
>
> --
> *Best wishes to you!*
> *From: Xiaoxiang Yu*
>
>
> At 2022-01-11 13:59:13, "Li Yang" <[email protected]> wrote:
> >Hi All
> >
> >Apache Kylin has been stable for quite a while, and it may be a good time
> >to think about its future. Below are thoughts from my team and me. I would
> >love to hear yours as well. Ideas and comments are very welcome. :-)
> >
> >*APACHE KYLIN TODAY*
> >
> >Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0
> >is a major version update after Kylin 3.x (HBase storage). Kylin 4.0
> >replaces HBase with Parquet as the storage engine to improve file
> >scanning performance. At the same time, Kylin 4.0 reimplements the
> >Spark-based build engine and query engine, making it possible to separate
> >computing and storage and to better adapt to the cloud-native technology
> >trend. Kylin 4.0 comprehensively updated the build and query engines and
> >realized a deployment mode without Hadoop dependency, reducing the
> >complexity of deployment.
> >However, Kylin still has a lot to improve: the business semantic layer
> >needs to be strengthened, and modifying a model/cube is not flexible.
> >With these in mind, we are thinking of a few things to do:
> >
> > - Multi-dimensional query ability friendly to non-technical people.
> >   The multi-dimensional model is the key that distinguishes Kylin from
> >   general OLAP engines. A model concept based on dimensions and measures
> >   is friendlier to non-technical people and closer to the goal of the
> >   citizen analyst. A multi-dimensional query capability that
> >   non-technical people can use should be the new focus of Kylin
> >   technology.
> >
> >
> > - Native engine. Kylin's query engine still has much room for
> >   improvement in vectorized acceleration and CPU instruction-level
> >   optimization. The Spark community, which Kylin relies on, also has a
> >   strong demand for a native engine. We are optimistic that a native
> >   engine could improve Kylin's performance by at least three times,
> >   which makes it worth the investment.
> >
> >
> > - More cloud-native capabilities. Kylin 4.0 has only completed the
> >   initial cloud deployment work, delivering rapid deployment and dynamic
> >   resource scaling on the cloud, but there are still many cloud-native
> >   capabilities to be developed.
> >
> >More explanations follow.
> >
> >*KYLIN AS A MULTI-DIMENSIONAL DATABASE*
> >
> >At its core, Kylin is a multi-dimensional database, which is a special
> >kind of OLAP engine. Although Kylin has had relational database
> >capabilities since its birth and is often compared with other relational
> >OLAP engines, what really makes Kylin different is its multi-dimensional
> >model and multi-dimensional database capability. Considering the essence
> >of Kylin and its wide range of future business uses (not only technical
> >uses), positioning Kylin as a multi-dimensional database makes perfect
> >sense.
> >With business semantics and precomputation technology, Apache Kylin
> >helps non-technical people understand and afford big data, and realizes
> >data democratization.
> >
> >*THE SEMANTIC LAYER*
> >
> >The key difference between a multi-dimensional database and a relational
> >database is the ability to express business meaning. Although SQL is
> >highly expressive and is a basic skill of data analysts, SQL and RDBs are
> >still too difficult for non-technical people if we aim for "everyone is a
> >data analyst". From the perspective of non-technical people, the data
> >lake and data warehouse are like a dark room: they know that there is a
> >lot of data inside, but they cannot see, understand, or use it clearly
> >because they do not understand database theory and SQL.
> >
> >How can we make the data lake (and data warehouse) clear to non-technical
> >people? This requires introducing a data model that is friendlier to
> >them: the multi-dimensional data model. While the relational model
> >describes the technical form of data, the multi-dimensional model
> >describes its business form. In a multi-dimensional database (MDB),
> >measures correspond to business indicators that everyone understands,
> >and dimensions are the perspectives from which these indicators are
> >compared and observed. Comparing a KPI with last month and comparing
> >performance between parallel business units are concepts that every
> >non-technical person understands. Mapping the relational model to the
> >multi-dimensional model essentially adds business semantics on top of
> >technical data, forming a business semantic layer that helps
> >non-technical people understand, explore, and use the data. To enhance
> >Kylin's ability as a semantic layer, supporting multi-dimensional query
> >languages such as MDX and DAX is a key item on the Kylin roadmap.
> >MDX can translate the data model in Kylin into a business-friendly
> >language, give data business value, and enable multi-dimensional
> >analysis on Kylin with BI tools such as Excel and Tableau.
> >
> >*PRECOMPUTATION AND MODEL FLEXIBILITY*
> >
> >It is Kylin's unchanging mission to keep reducing the cost of a single
> >query through precomputation technology, so that ordinary people can
> >afford big data. If the multi-dimensional model solves the problem of
> >letting non-technical people understand data, then precomputation solves
> >the problem of letting ordinary people afford it. Both are necessary
> >conditions for data democratization. By computing once and reusing the
> >result many times, the cost of data is shared by many users, achieving
> >an economy of scale: the more users, the cheaper. Precomputation is
> >Kylin's traditional strength, but it lacks flexibility when the
> >precomputed model changes. To make Kylin's models more flexible to change
> >and to open up more room for optimization, the Kylin community expects
> >to propose a new metadata format in the future, making precomputation
> >flexible enough to cope with table formats or business requirements that
> >may change at any time.
> >
> >*SUMMARY*
> >
> >To sum up, we would like to position Kylin as a multi-dimensional
> >database. Through the multi-dimensional model and precomputation
> >technology, ordinary people can understand and afford big data, finally
> >realizing the vision of data democratization. Meanwhile, for today's
> >users who use Kylin as an SQL acceleration layer, Kylin will continue to
> >enhance its SQL engine to ensure that precomputation can serve both the
> >relational model and the multi-dimensional model. In the figure below,
> >we picture the future of Kylin. The newly added and modified parts are
> >roughly marked in blue and orange.
> >
> >*FURTHER READING*
> >
> > - https://en.wikipedia.org/wiki/Data_model
> > - https://en.wikipedia.org/wiki/Semantic_layer
> > - https://en.wikipedia.org/wiki/Multidimensional_analysis
> > - https://en.wikipedia.org/wiki/MultiDimensional_eXpressions
> > - https://en.wikipedia.org/wiki/XML_for_Analysis
> > - https://en.wikipedia.org/wiki/SIMD
> > - https://en.wikipedia.org/wiki/Cloud_native_computing
> > - https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/
> >
> >
> >Please share your ideas and comments. :-)
> >
> >Cheers
> >Yang
