[ 
https://issues.apache.org/jira/browse/KYLIN-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang closed KYLIN-5948.
-------------------------

> Support internal table
> ----------------------
>
>                 Key: KYLIN-5948
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5948
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Job Engine, Query Engine
>    Affects Versions: 5.0.0
>            Reporter: Shuai Li
>            Assignee: Cao, Lionel
>            Priority: Major
>             Fix For: 5.0.0
>
>         Attachments: image-2024-08-07-15-31-59-473.png
>
>
> *01 Background*
> The internal table feature is designed to improve the performance of 
> detail-data and ad-hoc queries.
> An internal table manages the user's data directly in Kylin's internal 
> storage, where Kylin actively controls the storage format and data 
> organization in order to improve query performance.
> *02 Dev Design*
> What needs to be done is as follows:
> *1. Define the internal table metadata*
> {code:java}
> protected String project;
> protected final DatabaseDesc database;
> protected String identity;
> protected String name;
> private Map<String, String> tblProperties;
> private StorageType storageType;
> private String location;
> public enum StorageType {
>     parquet,   // only for dev/UT
>     gluten,    // ClickHouse MergeTree (default)
>     deltalake, // future
>     iceberg    // future
> }
> {code}
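
The field list above can be read as a small metadata holder. The following is a minimal sketch only, not Kylin's actual class: the class name `InternalTableSketch` is hypothetical, `DatabaseDesc` is replaced by a plain database name, and the `<database>.<name>` form of the identity is an assumption that the description does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the metadata above; not Kylin's actual class.
class InternalTableSketch {
    enum StorageType { parquet, gluten, deltalake, iceberg }

    private final String project;
    private final String database; // plain name standing in for DatabaseDesc
    private final String name;
    private final Map<String, String> tblProperties = new LinkedHashMap<>();
    private StorageType storageType = StorageType.gluten; // default per the design
    private String location;

    InternalTableSketch(String project, String database, String name) {
        this.project = project;
        this.database = database;
        this.name = name;
    }

    // Assumption: identity is "<database>.<name>"; the source does not define it.
    String getIdentity() { return database + "." + name; }

    String getProject() { return project; }
    StorageType getStorageType() { return storageType; }
    void setStorageType(StorageType storageType) { this.storageType = storageType; }
    String getLocation() { return location; }
    void setLocation(String location) { this.location = location; }
    Map<String, String> getTblProperties() { return tblProperties; }

    public static void main(String[] args) {
        InternalTableSketch table = new InternalTableSketch("demo_project", "SALES", "ORDERS");
        // Property keys taken from item 6 of the design; the values are made up.
        table.getTblProperties().put("primaryKey", "ORDER_ID");
        table.getTblProperties().put("orderByKey", "ORDER_DATE");
        System.out.println(table.getIdentity() + " stored as " + table.getStorageType());
    }
}
```

Defaulting `storageType` to `gluten` mirrors the "(default)" note in the enum; everything else here is illustrative.
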
> *2. Implement the internal table catalog*
> We implement a Kylin internal table catalog that extends Spark's 
> TableCatalog; only the loadTable function needs to be implemented. The 
> KyinternalCatalog reads table metadata from the Kylin metadata DB and 
> exposes it to Spark as a ClickhouseTableV2 (from Apache Gluten). 
>  
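
The lookup that loadTable performs can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the real catalog: `CatalogSketch`, `TableMeta`, and `TableHandle` are hypothetical stand-ins (no Spark, Gluten, or Kylin types are used), an in-memory map plays the role of the metadata DB, and a missing table is reported with `NoSuchElementException` purely for the sake of the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: none of these types are Spark, Gluten, or Kylin classes.
class CatalogSketch {
    // Stand-in for one internal table entry in the metadata store.
    static final class TableMeta {
        final String identity;
        final String location;
        TableMeta(String identity, String location) {
            this.identity = identity;
            this.location = location;
        }
    }

    // Stand-in for the table object handed back to the query engine.
    static final class TableHandle {
        final String identity;
        final String location;
        TableHandle(String identity, String location) {
            this.identity = identity;
            this.location = location;
        }
    }

    // In-memory map standing in for the metadata DB.
    private final Map<String, TableMeta> metadataStore = new HashMap<>();

    void register(TableMeta meta) {
        metadataStore.put(meta.identity, meta);
    }

    // Resolve "<database>.<name>" in the store and wrap the result for the engine.
    TableHandle loadTable(String database, String name) {
        String identity = database + "." + name;
        TableMeta meta = metadataStore.get(identity);
        if (meta == null) {
            throw new NoSuchElementException("Table not found: " + identity);
        }
        return new TableHandle(meta.identity, meta.location);
    }

    public static void main(String[] args) {
        CatalogSketch catalog = new CatalogSketch();
        catalog.register(new TableMeta("SALES.ORDERS", "/tmp/internal/SALES/ORDERS"));
        System.out.println(catalog.loadTable("SALES", "ORDERS").location);
    }
}
```

The point of the sketch is the shape of the flow: the catalog owns no data itself and only translates stored metadata into an object the engine can scan.
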
> *3. Implement the create table, update table, delete table, and truncate 
> table functions, etc.*
> Table management operations are implemented in the web UI/open API (TODO); 
> DDL statements are not supported yet.
>  
> *4. Implement loading data into internal tables*
> Both full load and incremental load are supported.
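
The difference between the two load modes can be illustrated with a toy model. This sketch assumes the common meaning of the terms, which the description does not spell out: a full load replaces the table's existing data, while an incremental load adds to it. `LoadSketch` and `LoadMode` are hypothetical names, and a list of strings stands in for the stored rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of full vs. incremental load; not Kylin's job engine.
class LoadSketch {
    enum LoadMode { FULL, INCREMENTAL }

    private final List<String> rows = new ArrayList<>();

    // Assumption: FULL discards existing rows first; INCREMENTAL keeps them.
    void load(List<String> batch, LoadMode mode) {
        if (mode == LoadMode.FULL) {
            rows.clear();
        }
        rows.addAll(batch);
    }

    // Return a copy so callers cannot mutate the stored rows.
    List<String> rows() {
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        LoadSketch table = new LoadSketch();
        table.load(Arrays.asList("a", "b"), LoadMode.FULL);
        table.load(Arrays.asList("c"), LoadMode.INCREMENTAL);
        System.out.println(table.rows());
    }
}
```
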
> *5. Support the partition and bucket features*
> *6. Support configuring table properties such as primaryKey and orderByKey*
> *7. Support gluten-mergetree as the default storage type*
> *Query process with an internal table:* 
> !image-2024-08-07-15-31-59-473.png|width=321,height=268!
>  
>  
> *03 Roadmap TODOs*
> 1. Support cache pre-loading
> 2. Support more partition types



--
This message was sent by Atlassian Jira
(v8.20.10#820010)