Hi Chunen,

Thanks for the information. What is the detailed plan for releasing this feature to the community?

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

Xiaoxiang Yu <x...@apache.org> wrote on Mon, Jan 20, 2020 at 1:59 PM:

> Great news!
> I can foresee Kylin becoming more cloud-native once the Parquet storage
> matures. I hope the development team will share more details of its
> design.
>
> --
> Best wishes to you!
> From: Xiaoxiang Yu
>
> At 2020-01-19 22:22:30, "George Ni" <n...@apache.org> wrote:
> >Hi Kylin users & developers,
> >
> >By-layer Spark Cubing has been available in Apache Kylin since v2.0 to
> >achieve better performance, and it does run much faster than the MR
> >engine. HBase has been Kylin’s trusted storage engine since Kylin was
> >born, and it has proved successful at handling high-concurrency queries
> >over extremely large data sets with low latency. But HBase also has
> >limitations: filtering is not flexible, because we can only filter by
> >RowKey, and measures are usually stored together, which causes more data
> >to be scanned than requested.
> >
> >So, to optimize both Kylin's building strategy and its storage engine,
> >the Kyligence development team is introducing a new cube building engine
> >that uses Spark SQL to construct cuboids with a new strategy and stores
> >cube results in Parquet files. The building strategy allows Kylin to
> >build cuboids in a smarter way by choosing the optimal source cuboid and
> >building on it. Parquet, a columnar storage format available to any
> >project in the Hadoop ecosystem, will strengthen filtering with its
> >page-level column index and reduce I/O by storing measures in separate
> >columns. Also, by storing cuboids in Parquet instead of HBase, we can run
> >Kylin in a cloud-native way.
> >More information on design and technical details will come soon.
> >
> >Below is a comparison of build duration and result size between
> >By-layer Spark Cubing and the new cubing strategy.
> >
> >Environment
> >4-node Hadoop cluster
> >YARN has 400 GB RAM and 128 cores in total;
> >CDH 5.1, Apache Kylin 3.0.
> >
> >Spark
> >Spark 2.4.1-kylin-r17
> >
> >Test Data
> >SSB data
> >Cube: 15 dimensions, 3 measures (SUM)
> >
> >Test Scenarios
> >Build the cube at different source sizes: 30 million and 60 million
> >source rows. Compare the build time of Spark (by layer) + HBase with
> >that of Spark SQL + Parquet.
> >
> >Besides, we are trying to resolve several drawbacks of the current query
> >engine, which relies heavily on Apache Calcite, such as the performance
> >bottleneck in aggregating large query results, which currently can only
> >be done by a single worker. By embracing Spark SQL, this kind of
> >expensive computation can be distributed. Combined with the Parquet
> >format, plenty of filtering optimizations can be applied, which will
> >boost Kylin’s query performance significantly. These features will be
> >open-sourced, along with technical details, in the near future.
> >
> >   - https://issues.apache.org/jira/browse/KYLIN-4188
> >
> >--
> >---------------------
> >Best regards,
> >
> >Ni Chunen / George
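
For readers following the thread: the announcement says the new engine builds each cuboid from "the optimal cuboid source" but does not say how that source is chosen. The snippet below is only a rough sketch of one plausible reading, not the actual Kylin or Kyligence implementation. It assumes a cuboid can be built from any already-built cuboid that contains all of its dimensions, and that the cheapest source is the one with the fewest rows. The function name pick_source, the dimension names, and the row counts are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# A cuboid is identified here by the set of dimensions it groups by.
Cuboid = FrozenSet[str]


def pick_source(target: Cuboid, built: Dict[Cuboid, int]) -> Optional[Cuboid]:
    """Return the smallest already-built cuboid that can produce `target`.

    Assumption: a cuboid can serve as a source only if it contains every
    dimension of the target, so the target is a further GROUP BY over it.
    `built` maps each already-built cuboid to its (hypothetical) row count.
    """
    candidates = [c for c in built if target <= c]
    if not candidates:
        return None
    # Fewest rows first; break ties deterministically by dimension names.
    return min(candidates, key=lambda c: (built[c], sorted(c)))


# Hypothetical row counts for cuboids that have already been built.
built = {
    frozenset({"year", "city", "brand"}): 1_000_000,  # base cuboid
    frozenset({"year", "city"}): 40_000,
    frozenset({"year", "brand"}): 90_000,
}

source = pick_source(frozenset({"year"}), built)
print(sorted(source) if source is not None else None)
```

In this toy setup the {year} cuboid would be aggregated from {year, city} rather than from the much larger base cuboid, which is the kind of saving the quoted message seems to describe. A real engine would need a better cost estimate than a raw row count.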