I agree, one storage for next-gen Kylin is good enough. But I would like to keep the interface in line with today's best practices, so that people can easily extend it to other storage options.
Best Regards!
---------------------
Luke Han

On Sat, Feb 1, 2020 at 9:13 PM ShaoFeng Shi <shaofeng...@apache.org> wrote:

> In my opinion, it is very hard to maintain the HBase storage and the Parquet
> storage together. Once the Parquet storage is stable enough, Kylin 4.0 no
> longer needs to depend on HBase.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
> nichunen <n...@apache.org> wrote on Thu, Jan 30, 2020 at 11:04 PM:
>
> > Hi Shaofeng,
> >
> > For your questions:
> >
> > 1) When the Parquet storage is released (say in Kylin 4.0), will the HBase
> > storage still be kept (co-exist), or totally be replaced?
> >
> > I think we will keep an active branch with releases for the HBase storage;
> > it won’t be totally replaced in the near future.
> >
> > 2) Is there a migration tool for migrating HBase cubes to the new storage?
> >
> > The tool is in the development plan. What’s more, the metadata will be
> > compatible.
> >
> > Best regards,
> >
> > Ni Chunen / George
> >
> > On 2020/1/21, 4:10 AM, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:
> >
> > Chun en,
> >
> > Thanks for the info. I think we need to discuss more in the community, for
> > example:
> >
> > 1) When the Parquet storage is released (say in Kylin 4.0), will the HBase
> > storage still be kept (co-exist), or totally be replaced?
> > 2) Is there a migration tool for migrating HBase cubes to the new storage?
> >
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
> > nichunen <n...@apache.org> wrote on Mon, Jan 20, 2020 at 9:38 PM:
> >
> > Hi Shaofeng,
> >
> > Below is our plan for this project; any suggestions are very welcome.
> >
> > 1. In mid-February 2020, open-source the prototype code of this feature
> > to the branch "kylin-on-parquet-v2". Cubes can be built with the new build
> > engine and stored in Parquet format.
> >
> > 2. In late April 2020, the query module for the new storage type is
> > scheduled to be ready; a happy path for cube creation, building and query
> > will be available then.
> >
> > 3. In May or June 2020, a Beta version (Kylin 4.0?) will be released.
> >
> > Best regards,
> >
> > Ni Chunen / George
> >
> > On 01/20/2020 16:00, ShaoFeng Shi <shaofeng...@apache.org> wrote:
> >
> > Hi, Chun en,
> >
> > Thanks for the information. What's the detailed release plan of this
> > feature to the community?
> >
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
> > Xiaoxiang Yu <x...@apache.org> wrote on Mon, Jan 20, 2020 at 1:59 PM:
> >
> > Great news!
> > I can foresee Kylin becoming more cloud-native once the Parquet storage
> > matures. I hope the developer team will share more details of its design.
> >
> > --
> >
> > Best wishes to you!
> > From: Xiaoxiang Yu
> >
> > At 2020-01-19 22:22:30, "George Ni" <n...@apache.org> wrote:
> >
> > Hi Kylin users & developers,
> >
> > By-layer Spark Cubing has been part of Apache Kylin since v2.0 to achieve
> > better performance, and it does run much faster than the MR engine. HBase
> > has also been Kylin’s trusted storage engine since Kylin was born, and it
> > has proved successful at handling highly concurrent queries over extremely
> > large data sets with low latency. But HBase also has limitations: filtering
> > is not flexible, since we can only filter by RowKey, and measures are
> > usually stored together, which causes more data to be scanned than
> > requested.
> >
> > So, in order to optimize Kylin in both building strategy and storage
> > engine, the development team at Kyligence is introducing a new cube
> > building engine which uses Spark SQL to construct cuboids with a new
> > strategy and stores cube results in Parquet files. The building strategy
> > allows Kylin to build cuboids in a smarter way by choosing and building on
> > the optimal cuboid source. And Parquet, a columnar storage format available
> > to any project in the Hadoop ecosystem, will power the filtering ability
> > with the page-level column index and reduce I/O by saving measures in
> > different columns. Also, by storing cuboids in Parquet instead of HBase, we
> > can use Kylin in a cloud-native way. More information on design and
> > technical details will come soon.
> >
> > Below is the comparison in building duration and size of results between
> > By-layer Spark Cubing and the new cubing strategy.
> >
> > Environment
> >
> > 4-node Hadoop cluster
> >
> > YARN has 400GB RAM and 128 cores in total;
> >
> > CDH 5.1, Apache Kylin 3.0.
> >
> > Spark
> >
> > Spark 2.4.1-kylin-r17
> >
> > Test Data
> >
> > SSB data
> >
> > Cube: 15 dimensions, 3 measures (SUM)
> >
> > Test Scenarios
> >
> > Build the cube at different source size levels: 30 million and 60 million
> > source rows. Compare the build time of Spark (by layer) + HBase with that
> > of Spark SQL + Parquet.
> >
> > Besides, we attempt to resolve many drawbacks in the current query engine,
> > which relies heavily on Apache Calcite, such as the performance bottleneck
> > in aggregating large query results, which currently can only be done by a
> > single worker. By embracing Spark SQL, this kind of expensive computation
> > can be done in a distributed manner. Combined with the Parquet format,
> > plenty of filtering optimizations can also be applied, which will boost
> > Kylin’s query performance significantly. The features will be open-sourced
> > along with technical details in the near future.
> >
> > - https://issues.apache.org/jira/browse/KYLIN-4188
> >
> > --
> >
> > ---------------------
> >
> > Best regards,
> >
> > Ni Chunen / George
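
The announcement says the new engine builds each cuboid from the "optimal cuboid source" but does not spell out the rule. The Python sketch below is only an illustration under an assumed rule: build a cuboid from the already-built cuboid with the fewest rows whose dimensions are a proper superset of the target's. The function name pick_source_cuboid, the dimension names and the row counts are hypothetical and are not taken from Kylin's code.

```python
from typing import Dict, FrozenSet, Optional


def pick_source_cuboid(
    target: FrozenSet[str],
    built: Dict[FrozenSet[str], int],
) -> Optional[FrozenSet[str]]:
    """Return the cheapest already-built cuboid that can produce `target`.

    Assumed rule (not confirmed by the thread): a cuboid can serve as a
    source if its dimensions are a proper superset of the target's, and
    the candidate with the fewest rows is the cheapest to aggregate from.
    """
    # `target < dims` is a proper-subset test on frozensets.
    candidates = [dims for dims in built if target < dims]
    if not candidates:
        return None
    # Break ties on the sorted dimension names so the choice is deterministic.
    return min(candidates, key=lambda dims: (built[dims], sorted(dims)))


# Hypothetical cuboids (dimension set -> row count), made up for illustration.
built = {
    frozenset({"year", "nation", "brand"}): 1_000_000,  # base cuboid
    frozenset({"year", "nation"}): 5_000,
    frozenset({"year", "brand"}): 40_000,
}

source = pick_source_cuboid(frozenset({"year"}), built)
print(sorted(source) if source else None)
```

With these made-up numbers, the cuboid on "year" would be aggregated from the 5,000-row ("year", "nation") cuboid rather than from the 1,000,000-row base cuboid, which is the kind of saving a by-layer strategy can miss if it always builds from a fixed parent.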