[
https://issues.apache.org/jira/browse/KYLIN-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyang closed KYLIN-5948.
-------------------------
> Support internal table
> ----------------------
>
> Key: KYLIN-5948
> URL: https://issues.apache.org/jira/browse/KYLIN-5948
> Project: Kylin
> Issue Type: New Feature
> Components: Job Engine, Query Engine
> Affects Versions: 5.0.0
> Reporter: Shuai Li
> Assignee: Cao, Lionel
> Priority: Major
> Fix For: 5.0.0
>
> Attachments: image-2024-08-07-15-31-59-473.png
>
>
> *01 Background*
> To improve the performance of detail-data and ad-hoc queries, the internal
> table feature is introduced.
> An internal table manages the user's data directly in internal storage, where
> Kylin actively controls the storage format and data organization specifically
> to improve query performance.
> *02 Dev Design*
> What needs to be done is as follows:
> *1. Define the internal table metadata*
> {code:java}
> protected String project;
> protected final DatabaseDesc database;
> protected String identity;
> protected String name;
> private Map<String, String> tblProperties;
> private StorageType storageType;
> private String location;
>
> public enum StorageType {
>     parquet,   // only for dev/UT
>     gluten,    // ClickHouse MergeTree (default)
>     deltalake, // future
>     iceberg    // future
> }
> {code}
> *2. Implement the internal table catalog*
> We implement a Kylin internal table catalog that extends Spark's
> TableCatalog; only the loadTable function needs to be implemented. The
> KyinternalCatalog gets table metadata from the Kylin metadb and exposes it to
> Spark as a ClickhouseTableV2 (from Apache Gluten).
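>
> As a rough illustration of the lookup described in step 2, the sketch below
> models only the loadTable path in plain Java. It does not use the real Spark
> TableCatalog or Gluten ClickhouseTableV2 types: MetaStore, TableMeta and
> LoadedTable are hypothetical stand-ins, and the "DB.TABLE" identity format
> and the sample location are assumptions made for the example.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> // Plain-Java sketch of step 2; none of these types are Spark, Gluten or Kylin classes.
> final class InternalCatalogSketch {
>
>     // Hypothetical slice of the internal table metadata from step 1.
>     static final class TableMeta {
>         final String database;
>         final String name;
>         final String storageType;
>         final String location;
>
>         TableMeta(String database, String name, String storageType, String location) {
>             this.database = database;
>             this.name = name;
>             this.storageType = storageType;
>             this.location = location;
>         }
>
>         // Assumed identity format: "DB.TABLE".
>         String identity() {
>             return database + "." + name;
>         }
>     }
>
>     // Hypothetical stand-in for the Kylin metadata store, keyed by identity.
>     static final class MetaStore {
>         private final Map<String, TableMeta> tables = new HashMap<>();
>
>         void save(TableMeta meta) {
>             tables.put(meta.identity(), meta);
>         }
>
>         TableMeta find(String database, String name) {
>             return tables.get(database + "." + name);
>         }
>     }
>
>     // Hypothetical stand-in for the table object handed to the query engine.
>     static final class LoadedTable {
>         final String identity;
>         final String format;
>         final String path;
>
>         LoadedTable(String identity, String format, String path) {
>             this.identity = identity;
>             this.format = format;
>             this.path = path;
>         }
>     }
>
>     private final MetaStore metaStore;
>
>     InternalCatalogSketch(MetaStore metaStore) {
>         this.metaStore = metaStore;
>     }
>
>     // The single lookup the catalog provides: resolve metadata, then wrap it for the engine.
>     LoadedTable loadTable(String database, String name) {
>         TableMeta meta = metaStore.find(database, name);
>         if (meta == null) {
>             throw new IllegalArgumentException("No internal table " + database + "." + name);
>         }
>         return new LoadedTable(meta.identity(), meta.storageType, meta.location);
>     }
>
>     public static void main(String[] args) {
>         MetaStore store = new MetaStore();
>         store.save(new TableMeta("SALES", "ORDERS", "gluten", "/tmp/internal/orders"));
>         LoadedTable table = new InternalCatalogSketch(store).loadTable("SALES", "ORDERS");
>         System.out.println(table.identity + " stored as " + table.format + " at " + table.path);
>     }
> }
> {code}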
>
> *3. Implement the create table, update table, delete table, truncate table
> functions, etc.*
> Table management operations are implemented in the web UI/open API (TODO);
> DDL statements are not supported yet.
>
> *4. Implement loading data into internal tables*
> Support full load and incremental load.
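> The difference between the two load modes can be sketched as follows. This is
> an in-memory illustration only, assuming that a full load replaces the table
> contents and an incremental load appends to them; LoadModeSketch and its
> methods are hypothetical names and not part of Kylin.
> {code:java}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.Collections;
> import java.util.List;
>
> // In-memory sketch of step 4; a real load would write to table storage instead.
> final class LoadModeSketch {
>
>     enum LoadMode { FULL, INCREMENTAL }
>
>     private final List<String> rows = new ArrayList<>();
>
>     // FULL drops the existing rows first; INCREMENTAL keeps them and appends the batch.
>     void load(List<String> batch, LoadMode mode) {
>         if (mode == LoadMode.FULL) {
>             rows.clear();
>         }
>         rows.addAll(batch);
>     }
>
>     List<String> rows() {
>         return Collections.unmodifiableList(rows);
>     }
>
>     public static void main(String[] args) {
>         LoadModeSketch table = new LoadModeSketch();
>         table.load(Arrays.asList("r1", "r2"), LoadMode.FULL);
>         table.load(Arrays.asList("r3"), LoadMode.INCREMENTAL);
>         System.out.println(table.rows()); // three rows: the first batch plus the appended one
>         table.load(Arrays.asList("r4"), LoadMode.FULL);
>         System.out.println(table.rows()); // one row: the full load replaced everything
>     }
> }
> {code}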
> *5. Support partition and bucket features*
> *6. Support configuring table properties such as primaryKey and orderByKey*
> *7. Support gluten-mergetree as the default storage type*
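> To make items 5 and 6 more concrete, here is a minimal sketch of how such
> settings could be read from the tblProperties map defined in step 1. The
> comma-separated value format, the sample column names, and the hash-based
> bucket assignment are all assumptions for illustration; they are not taken
> from the Kylin or Gluten implementation.
> {code:java}
> import java.util.Arrays;
> import java.util.Collections;
> import java.util.LinkedHashMap;
> import java.util.List;
> import java.util.Map;
>
> // Sketch only: property value format and bucketing rule are assumed, not Kylin's.
> final class TablePropertiesSketch {
>
>     // Reads a property such as primaryKey or orderByKey as a list of column names,
>     // assuming the value is a comma-separated string.
>     static List<String> columns(Map<String, String> tblProperties, String key) {
>         String raw = tblProperties.getOrDefault(key, "").trim();
>         if (raw.isEmpty()) {
>             return Collections.emptyList();
>         }
>         return Arrays.asList(raw.split("\\s*,\\s*"));
>     }
>
>     // Generic hash bucketing: equal values always map to the same bucket in [0, bucketNum).
>     static int bucketOf(String bucketValue, int bucketNum) {
>         return Math.floorMod(bucketValue.hashCode(), bucketNum);
>     }
>
>     public static void main(String[] args) {
>         Map<String, String> tblProperties = new LinkedHashMap<>();
>         tblProperties.put("primaryKey", "ORDER_ID");
>         tblProperties.put("orderByKey", "ORDER_ID, ORDER_DATE");
>
>         System.out.println(columns(tblProperties, "primaryKey"));
>         System.out.println(columns(tblProperties, "orderByKey"));
>         System.out.println(bucketOf("customer-42", 8));
>     }
> }
> {code}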
> *Query process with internal tables:*
> !image-2024-08-07-15-31-59-473.png|width=321,height=268!
>
>
> *03 Roadmap TODOs*
> 1. Support cache pre-loading
> 2. Support more partition types