[jira] [Updated] (KYLIN-5948) Support internal table

Cao, Lionel (Jira) Wed, 07 Aug 2024 00:59:26 -0700


     [ 
https://issues.apache.org/jira/browse/KYLIN-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Cao, Lionel updated KYLIN-5948:
-------------------------------
    Description: 
*01 Background*
The internal table feature is designed to improve the performance of 
detail-data and ad-hoc queries.
An internal table manages the user's data directly in internal storage, where 
Kylin controls the storage format and data organization specifically to 
improve query performance.

*02 Dev Design*
The following work is required:
*1. Define the internal table metadata*
{code:java}
protected String project;
protected final DatabaseDesc database;
protected String identity;
protected String name;

private Map<String, String> tblProperties;
private StorageType storageType;
private String location;

public enum StorageType {
    parquet,   // only for dev/UT
    gluten,    // ClickHouse MergeTree (default)
    deltalake, // future
    iceberg    // future
}
{code}
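
As a rough illustration of how these fields could fit together, here is a minimal, self-contained sketch. The class name InternalTableMetaSketch, the constructor, the use of a plain String in place of DatabaseDesc, and the "DATABASE.TABLE" identity format are all assumptions made for this example; the ticket lists only the fields and the enum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper class: the ticket lists the fields but not the enclosing type.
class InternalTableMetaSketch {
    enum StorageType { parquet, gluten, deltalake, iceberg }

    final String project;
    final String database; // stands in for the ticket's DatabaseDesc
    final String name;
    final Map<String, String> tblProperties = new LinkedHashMap<>();
    final StorageType storageType;
    final String location;

    InternalTableMetaSketch(String project, String database, String name,
                            StorageType storageType, String location) {
        this.project = project;
        this.database = database;
        this.name = name;
        // Fall back to gluten, which the ticket marks as the default storage type.
        this.storageType = storageType == null ? StorageType.gluten : storageType;
        this.location = location;
    }

    // Assumed identity format: "DATABASE.TABLE".
    String identity() {
        return database + "." + name;
    }
}
```

Passing null as the storage type yields gluten, which matches the default noted in the enum comment.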
*2. Implement the internal table catalog*

We implement a Kylin internal table catalog that extends Spark's TableCatalog; 
only the loadTable function needs to be implemented. The KyinternalCatalog 
gets table metadata from the Kylin metadb and exposes it to Spark as a 
ClickhouseTableV2 (from Apache Gluten).
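
The "only loadTable" idea can be sketched without Spark on the classpath. In the example below, CatalogSketch is a simplified stand-in for Spark's TableCatalog, TableSketch is a stand-in for ClickhouseTableV2, and the in-memory map is a stand-in for the Kylin metadb. All of these names and signatures are hypothetical and do not match the real Spark or Gluten APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Spark's TableCatalog (not the real interface).
interface CatalogSketch {
    TableSketch loadTable(String database, String name);

    // Management operations are rejected by default, since DDL is not supported yet.
    default void createTable(String database, String name) {
        throw new UnsupportedOperationException("createTable is not supported");
    }

    default boolean dropTable(String database, String name) {
        throw new UnsupportedOperationException("dropTable is not supported");
    }
}

// Stand-in for the table object handed to Spark (ClickhouseTableV2 in the ticket).
final class TableSketch {
    final String identity;
    final String location;

    TableSketch(String identity, String location) {
        this.identity = identity;
        this.location = location;
    }
}

class InternalCatalogSketch implements CatalogSketch {
    // Hypothetical in-memory replacement for the Kylin metadb: identity -> location.
    private final Map<String, String> metadb = new HashMap<>();

    void register(String database, String name, String location) {
        metadb.put(database + "." + name, location);
    }

    @Override
    public TableSketch loadTable(String database, String name) {
        String identity = database + "." + name;
        String location = metadb.get(identity);
        if (location == null) {
            throw new IllegalArgumentException("Table not found: " + identity);
        }
        return new TableSketch(identity, location);
    }
}
```

Only loadTable carries real logic here; every other operation falls through to the default methods and fails fast.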

 

*3. Implement create table, update table, delete table, truncate table, and related functions*

Table management operations are implemented in the web UI/open API (TODO); 
DDL statements are not supported yet.

 
*4. Implement loading data into internal tables*

Both full load and incremental load are supported.
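
The ticket does not spell out the semantics of the two load modes. The sketch below assumes one plausible reading: a full load replaces all stored data, while an incremental load rewrites only the partitions whose value falls in a half-open range [start, end). The class LoadSketch and its methods are hypothetical and are not Kylin APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LoadSketch {
    // Partition value -> rows stored for that partition.
    private final Map<String, List<String>> partitions = new TreeMap<>();

    // Full load (assumed semantics): discard existing data, then copy everything.
    void fullLoad(Map<String, List<String>> source) {
        partitions.clear();
        source.forEach((p, rows) -> partitions.put(p, new ArrayList<>(rows)));
    }

    // Incremental load (assumed semantics): rewrite only partitions in [start, end).
    void incrementalLoad(Map<String, List<String>> source, String start, String end) {
        source.forEach((p, rows) -> {
            if (p.compareTo(start) >= 0 && p.compareTo(end) < 0) {
                partitions.put(p, new ArrayList<>(rows));
            }
        });
    }

    int rowCount() {
        return partitions.values().stream().mapToInt(List::size).sum();
    }
}
```

With date-formatted partition values such as 2024-08-03, string comparison matches chronological order, so an incremental load over one day leaves the other partitions untouched.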

*5. Support partition and bucket features*
*6. Support configuring table properties such as primaryKey and orderByKey*
*7. Support gluten-mergetree as the default storage type*
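
As an illustration of item 6, the sketch below reads primaryKey and orderByKey from a properties map and checks the ClickHouse MergeTree rule that the primary key must be a prefix of the sorting key. The comma-separated value format and the TablePropertiesSketch helper are assumptions for this example; the ticket does not say how Kylin stores or validates these properties.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TablePropertiesSketch {
    // Parse an assumed comma-separated column list such as "id, ts".
    static List<String> columns(Map<String, String> props, String key) {
        String raw = props.getOrDefault(key, "").trim();
        if (raw.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    // MergeTree requires the primary key columns to be a prefix of the ORDER BY columns.
    static boolean primaryKeyIsPrefixOfOrderBy(Map<String, String> props) {
        List<String> pk = columns(props, "primaryKey");
        List<String> orderBy = columns(props, "orderByKey");
        return pk.size() <= orderBy.size()
                && orderBy.subList(0, pk.size()).equals(pk);
    }
}
```

A check like this could run when table properties are saved, so an invalid key combination is rejected before any data is loaded.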

*Query process with internal table:*

!image-2024-08-07-15-31-59-473.png|width=321,height=268!

*03 Roadmap TODOs*
1. Support cache pre-loading
2. Support more partition types



> Support internal table
> ----------------------
>
>                 Key: KYLIN-5948
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5948
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Job Engine
>            Reporter: Shuai Li
>            Assignee: Cao, Lionel
>            Priority: Major
>         Attachments: image-2024-08-07-15-31-59-473.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
