[
https://issues.apache.org/jira/browse/KYLIN-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871600#comment-17871600
]
ASF subversion and git services commented on KYLIN-5948:
--------------------------------------------------------
Commit a0e06936a5171735d163b29c6bfb0eff0a7f90f6 in kylin's branch
refs/heads/kylin5 from lionelcao
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a0e06936a5 ]
KYLIN-5948 Support internal table & explain query plan
1. Internal table loading
2. support gluten query
3. Add internal table APIs
4. Snapshot/InternalTable matching does not rely on model
5. Support derived dimension query for Snapshot/InternalTable
6. Support query internal table directly
7. Support partition for internal table
8. Support explain query plan
---------
Co-authored-by: Zhiting Guo <[email protected]>
Co-authored-by: Pengfei Zhan <[email protected]>
Co-authored-by: huangsheng <[email protected]>
Co-authored-by: Zhong.Zhu <[email protected]>
Co-authored-by: Zhimin Wu <[email protected]>
> Support internal table
> ----------------------
>
> Key: KYLIN-5948
> URL: https://issues.apache.org/jira/browse/KYLIN-5948
> Project: Kylin
> Issue Type: New Feature
> Components: Job Engine
> Reporter: Shuai Li
> Assignee: Cao, Lionel
> Priority: Major
> Attachments: image-2024-08-07-15-31-59-473.png
>
>
> *01 Background*
> To improve the performance of detail-data and ad-hoc queries, this issue
> introduces the internal table feature.
> An internal table manages the user's data directly in internal storage, where
> Kylin actively controls the storage format and data organization specifically
> to improve query performance.
> *02 Dev Design*
> What needs to be done is as follows:
> *1. Define the internal table metadata*
> {code:java}
> protected String project;
> protected final DatabaseDesc database;
> protected String identity;
> protected String name;
> private Map<String, String> tblProperties;
> private StorageType storageType;
> private String location;
> public enum StorageType {
>     parquet,   // only for dev/UT
>     gluten,    // ClickHouse MergeTree (default)
>     deltalake, // future
>     iceberg    // future
> }
> {code}
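For illustration only, the fields above could be gathered into a standalone class roughly as follows. This is a minimal sketch, not the actual Kylin class: the name InternalTableSketch is hypothetical, the database is held as a plain String instead of a DatabaseDesc, and the "DATABASE.TABLE" identity format is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone sketch of the metadata fields; not the actual Kylin class.
class InternalTableSketch {
    // Same constants as the StorageType enum in the issue description.
    enum StorageType { parquet, gluten, deltalake, iceberg }

    private final String project;
    private final String database; // assumption: plain name instead of DatabaseDesc
    private final String name;
    private final Map<String, String> tblProperties = new LinkedHashMap<>();
    private StorageType storageType = StorageType.gluten; // marked as the default in the issue
    private String location;

    InternalTableSketch(String project, String database, String name) {
        this.project = project;
        this.database = database;
        this.name = name;
    }

    // Assumption: the identity is the database name and table name joined by a dot.
    String getIdentity() {
        return database + "." + name;
    }

    String getProject() { return project; }
    Map<String, String> getTblProperties() { return tblProperties; }
    StorageType getStorageType() { return storageType; }
    void setStorageType(StorageType storageType) { this.storageType = storageType; }
    String getLocation() { return location; }
    void setLocation(String location) { this.location = location; }
}
```

Table properties such as primaryKey or orderByKey would then be plain entries in the tblProperties map.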
> *2. Implement the internal table catalog*
> We implement a Kylin internal table catalog that extends Spark's
> TableCatalog; only the loadTable function needs to be implemented. The
> KyinternalCatalog reads table metadata from the Kylin metadb and exposes it to
> Spark as a ClickhouseTableV2 (from Apache Gluten).
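The "only loadTable is implemented" idea can be sketched in plain Java without a Spark dependency. Everything here is a hypothetical stand-in: CatalogSketch is not Spark's TableCatalog, TableHandle is not ClickhouseTableV2, and MetaStoreSketch merely plays the role of the Kylin metadb.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical catalog contract; NOT Spark's TableCatalog interface.
interface CatalogSketch {
    TableHandle loadTable(String database, String table);

    // Operations the catalog does not implement simply reject the call.
    default TableHandle createTable(String database, String table) {
        throw new UnsupportedOperationException("createTable is not supported");
    }

    default boolean dropTable(String database, String table) {
        throw new UnsupportedOperationException("dropTable is not supported");
    }
}

// Hypothetical handle that the catalog exposes to the query engine.
final class TableHandle {
    final String identity;
    final String location;

    TableHandle(String identity, String location) {
        this.identity = identity;
        this.location = location;
    }
}

// Hypothetical in-memory stand-in for the metadata store.
final class MetaStoreSketch {
    private final Map<String, String> locations = new HashMap<>();

    void put(String identity, String location) {
        locations.put(identity, location);
    }

    Optional<String> findLocation(String identity) {
        return Optional.ofNullable(locations.get(identity));
    }
}

// Only loadTable is implemented: look up the metadata and wrap it in a handle.
final class InternalCatalogSketch implements CatalogSketch {
    private final MetaStoreSketch store;

    InternalCatalogSketch(MetaStoreSketch store) {
        this.store = store;
    }

    @Override
    public TableHandle loadTable(String database, String table) {
        String identity = database + "." + table;
        String location = store.findLocation(identity)
                .orElseThrow(() -> new IllegalArgumentException("Table not found: " + identity));
        return new TableHandle(identity, location);
    }
}
```

Keeping the catalog read-only fits the design described in the issue, where table management goes through Kylin rather than through DDL statements.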
>
> *3. Implement create table, update table, delete table, truncate table
> functions, etc.*
> Table management operations are implemented in the web UI/open API (TODO);
> DDL statements are not supported yet.
>
> *4. Implement loading data into internal tables*
> Support full load and incremental load.
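The issue does not define the two load modes, so the following sketch assumes the common semantics: a full load replaces all existing data, and an incremental load appends only the new batch. LoadSketch is a hypothetical in-memory stand-in, not Kylin's loading code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the rows stored in an internal table.
class LoadSketch {
    private final List<String> rows = new ArrayList<>();

    // Full load (assumed semantics): discard existing data, then load the whole source.
    void fullLoad(List<String> source) {
        rows.clear();
        rows.addAll(source);
    }

    // Incremental load (assumed semantics): keep existing data and append the new batch.
    void incrementalLoad(List<String> batch) {
        rows.addAll(batch);
    }

    // Returns a copy so callers cannot modify the stored rows.
    List<String> rows() {
        return new ArrayList<>(rows);
    }
}
```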
> *5. Support partition and bucket features*
> *6. Support configuring table properties such as primaryKey and orderByKey*
> *7. Support gluten-mergetree as the default storage type*
> *Query process with an internal table:*
> !image-2024-08-07-15-31-59-473.png|width=321,height=268!
>
>
> *03 Roadmap TODOs*
> 1. Support cache pre-loading
> 2. Support more partition types