[jira] [Commented] (KYLIN-5996) Support preloading for internal table cache

Guoliang Sun (Jira) Tue, 18 Feb 2025 01:30:16 -0800


    [ https://issues.apache.org/jira/browse/KYLIN-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927992#comment-17927992 ]


Guoliang Sun commented on KYLIN-5996:
-------------------------------------

h1. Dev Design
h2. Gluten command for loading cache
h3. Syntax definition for loading internal tables
 - Starts with "CACHE DATA" to avoid conflicts with vanilla Spark's "CACHE TABLE".
 - META: If set, only the metadata (metadata.gluten and meta.bin) is cached. Currently unsupported.
 - ASYNC: If set, the command runs asynchronously and returns without waiting for the cache result.
 - SELECT selectedColumn ..: Specifies which columns' data to cache.
 - FROM ....: Specifies the target table to cache, or a Delta path.
 - AFTER: Optional. Specifies a time (or a date partition value); only the parts inserted after it are cached. The Delta metadata is filtered by this value to produce the list of parts to cache.

{quote}Two kinds of time value are supported:
1. TIMESTAMP: A timestamp string that filters on insertion time. Only data inserted after this time is cached.
2. Date partition column: A date partition value. Only data in partitions later than this value is cached.
{quote}
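
To make the AFTER semantics concrete, here is a minimal Java sketch of how the list of parts to cache could be derived from part metadata. The PartInfo record, its fields, and the AfterFilter helper are hypothetical names used only for illustration. The sketch assumes three things: each part carries its insertion time and date partition value, "later than" means strictly later, and the Delta metadata can be read into such a list.

{code:java}
import java.time.Instant;
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical view of one MergeTree part as recorded in the Delta metadata.
// Field names are illustrative assumptions, not Gluten's actual schema.
record PartInfo(String name, Instant insertTime, LocalDate datePartition) {}

final class AfterFilter {
    private AfterFilter() {}

    // TIMESTAMP form: keep parts inserted strictly after the given instant.
    static List<String> partsInsertedAfter(List<PartInfo> parts, Instant after) {
        return parts.stream()
                .filter(p -> p.insertTime().isAfter(after))
                .map(PartInfo::name)
                .collect(Collectors.toList());
    }

    // Date-partition form: keep parts whose partition value is strictly later than the given date.
    static List<String> partsInPartitionsAfter(List<PartInfo> parts, LocalDate after) {
        return parts.stream()
                .filter(p -> p.datePartition().isAfter(after))
                .map(PartInfo::name)
                .collect(Collectors.toList());
    }
}

class AfterFilterDemo {
    public static void main(String[] args) {
        List<PartInfo> parts = List.of(
                new PartInfo("part_a", Instant.parse("2025-02-01T00:00:00Z"), LocalDate.of(2025, 1, 31)),
                new PartInfo("part_b", Instant.parse("2025-02-10T00:00:00Z"), LocalDate.of(2025, 2, 9)));
        // Only part_b was inserted after 2025-02-05.
        System.out.println(AfterFilter.partsInsertedAfter(parts, Instant.parse("2025-02-05T00:00:00Z")));
        // Only part_b lies in a partition later than 2025-02-01.
        System.out.println(AfterFilter.partsInPartitionsAfter(parts, LocalDate.of(2025, 2, 1)));
    }
}
{code}

Filtering on metadata alone means no part data has to be read to decide what to cache. Only the selected parts are then loaded.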
h3. Syntax definition for loading index files
 - Starts with "CACHE FILES".
 - ASYNC: If set, the command runs asynchronously and returns without waiting for the cache result.
 - SELECT selectedColumn ..: Specifies which columns' data to cache.
 - FROM ....: Specifies the target as a Parquet path.

h3. Examples of loading cache

{code:sql}
-- Example of loading an internal table
CACHE DATA SELECT * FROM lineitem_mergetree_hdfs;
-- Example of loading an index
CACHE FILES SELECT * FROM 'hdfs://192.168.3.107:8020/tpch-data-sf10/lineitem';
{code}
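As a sketch of how the syntax elements fit together, the hypothetical Java helper below assembles both commands as strings and reproduces the two examples above. GlutenCacheSql and its methods are illustrative names, not part of Kylin or Gluten. The helper assumes that ASYNC directly follows DATA or FILES and that AFTER comes last, with its value passed through verbatim. META is left out because it is currently unsupported.

{code:java}
import java.util.List;
import java.util.Objects;

// Hypothetical helper that assembles the cache commands as SQL strings.
// Assumptions: ASYNC directly follows DATA/FILES, AFTER (if any) comes last and its
// value is passed through verbatim, and META is omitted because it is unsupported.
final class GlutenCacheSql {
    private GlutenCacheSql() {}

    // CACHE DATA [ASYNC] SELECT <columns> FROM <table or Delta path> [AFTER <value>]
    static String cacheData(boolean async, List<String> columns, String from, String after) {
        StringBuilder sql = new StringBuilder("CACHE DATA");
        if (async) {
            sql.append(" ASYNC");
        }
        sql.append(" SELECT ").append(selectList(columns));
        sql.append(" FROM ").append(Objects.requireNonNull(from, "from"));
        if (after != null) {
            sql.append(" AFTER ").append(after);
        }
        return sql.toString();
    }

    // CACHE FILES [ASYNC] SELECT <columns> FROM '<Parquet path>'
    static String cacheFiles(boolean async, List<String> columns, String parquetPath) {
        StringBuilder sql = new StringBuilder("CACHE FILES");
        if (async) {
            sql.append(" ASYNC");
        }
        sql.append(" SELECT ").append(selectList(columns));
        // The Parquet path is written as a single-quoted string literal, as in the examples.
        sql.append(" FROM '").append(Objects.requireNonNull(parquetPath, "parquetPath")).append('\'');
        return sql.toString();
    }

    // An empty column list selects every column, rendered as "*".
    private static String selectList(List<String> columns) {
        return columns.isEmpty() ? "*" : String.join(", ", columns);
    }
}

class GlutenCacheSqlDemo {
    public static void main(String[] args) {
        // Reproduces the two examples (without the trailing semicolons).
        System.out.println(GlutenCacheSql.cacheData(false, List.of(), "lineitem_mergetree_hdfs", null));
        System.out.println(GlutenCacheSql.cacheFiles(false, List.of(),
                "hdfs://192.168.3.107:8020/tpch-data-sf10/lineitem"));
    }
}
{code}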
h2. Preloading Cache Configuration

||Parameter||Default Value||Description||Effective Scope||
|*kylin.internal-table.preloaded-cache.enabled*|true|Whether to enable cache preloading for internal tables.|System Level \| Project Level \| Table Level|
|*kylin.index.preloaded-cache.enabled*|true|Whether to enable cache preloading for index files.|System Level \| Project Level \| Model Level|
|*kylin.cache.gluten-cache-concurrent-running-threshold*|20|Concurrency limit for API requests that execute the Gluten cache command.|System Level|
|*kylin.cache.gluten-cache-request-timeout*|1d|Timeout for forwarded REST requests that execute the Gluten cache command. With the default of `1d`, a timeout is returned if a forwarded request runs longer than one day.|System Level|
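
The Effective Scope column suggests that a table-level or model-level setting can override the project and system values. The concurrency threshold caps how many cache requests run at once. The Java sketch below illustrates both under two stated assumptions: the most specific scope that sets a key wins, and a request beyond the threshold is rejected rather than queued. PreloadCacheConfig and GlutenCacheLimiter are hypothetical names, not Kylin's actual classes.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical resolution of a preload switch across the documented scopes.
// Assumption: the most specific scope that sets the key wins
// (table/model, then project, then system), else the documented default.
final class PreloadCacheConfig {
    static final String INTERNAL_TABLE_KEY = "kylin.internal-table.preloaded-cache.enabled";
    static final String INDEX_KEY = "kylin.index.preloaded-cache.enabled";

    private PreloadCacheConfig() {}

    static boolean isEnabled(String key, Map<String, String> system, Map<String, String> project,
                             Map<String, String> tableOrModel, boolean defaultValue) {
        for (Map<String, String> scope : List.of(tableOrModel, project, system)) {
            String value = scope.get(key);
            if (value != null) {
                return Boolean.parseBoolean(value.trim());
            }
        }
        return defaultValue;
    }
}

// Hypothetical limiter for kylin.cache.gluten-cache-concurrent-running-threshold.
// Assumption: a request beyond the threshold is rejected immediately, not queued.
final class GlutenCacheLimiter {
    private final Semaphore permits;

    GlutenCacheLimiter(int threshold) {
        this.permits = new Semaphore(threshold);
    }

    // Returns true if the request may run; the caller must call finish() afterwards.
    boolean tryStart() {
        return permits.tryAcquire();
    }

    void finish() {
        permits.release();
    }
}

class PreloadCacheConfigDemo {
    public static void main(String[] args) {
        Map<String, String> system = Map.of(PreloadCacheConfig.INTERNAL_TABLE_KEY, "true");
        Map<String, String> project = Map.of(PreloadCacheConfig.INTERNAL_TABLE_KEY, "false");
        // The project-level "false" overrides the system-level "true"; the table sets nothing.
        System.out.println(PreloadCacheConfig.isEnabled(
                PreloadCacheConfig.INTERNAL_TABLE_KEY, system, project, Map.of(), true));

        GlutenCacheLimiter limiter = new GlutenCacheLimiter(20); // documented default threshold
        if (limiter.tryStart()) {
            try {
                // Hypothetical placeholder: forward the Gluten cache command here.
            } finally {
                limiter.finish();
            }
        }
    }
}
{code}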

> Support preloading for internal table cache
> -------------------------------------------
>
>                 Key: KYLIN-5996
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5996
>             Project: Kylin
>          Issue Type: New Feature
>    Affects Versions: 5.0.0
>            Reporter: Guoliang Sun
>            Priority: Major
>
> Kylin5 supports Gluten as the native engine, and its query performance 
> largely depends on loading data from remote storage into the local cache.  
> Gluten currently plans to support caching for MergeTree and Parquet formats.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
