[
https://issues.apache.org/jira/browse/KYLIN-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927992#comment-17927992
]
Guoliang Sun commented on KYLIN-5996:
-------------------------------------
h1. Dev Design
h2. Gluten command for loading cache
h3. Syntax definition for loading internal tables
- Starts with "CACHE DATA" to avoid conflicts with vanilla Spark's "CACHE
TABLE".
- META: If set, caches only the metadata.gluten and meta.bin data. Currently
unsupported.
- ASYNC: Runs the command asynchronously, so the caller does not wait for the
cache result.
- SELECT selectedColumn ..: Specifies which columns' data to cache.
- FROM ....: Specifies the target table to cache, or a delta path.
- AFTER: Optional. Specifies a time; only the parts inserted after that time
(or with later data partition values) are cached. The delta meta is filtered
by this time to produce the list of parts to cache.
{quote} Two kinds of time value are accepted:
1. TIMESTAMP: A timestamp string, compared against the insertion timestamp.
Only data inserted after this time is cached.
2. Date partition column: A date partition value. Only data in partitions
later than this value is cached.
{quote}
h3. Syntax definition for loading index
- Starts with "CACHE FILES".
- ASYNC: Runs the command asynchronously, so the caller does not wait for the
cache result.
- SELECT selectedColumn ..: Specifies which columns' data to cache.
- FROM ....: Specifies the target via a Parquet path.
h3. Examples of loading cache
{code:java}
# Example of loading an internal table
CACHE DATA SELECT * FROM lineitem_mergetree_hdfs;

# Example of loading an index
CACHE FILES SELECT * FROM 'hdfs://192.168.3.107:8020/tpch-data-sf10/lineitem';
{code}
h2. Preloading Cache Configuration
||Parameter||Default Value||Description||Effective Scope||
|*kylin.internal-table.preloaded-cache.enabled*|true|Whether to
enable preloading cache for internal tables.
Default value is `true`, indicating that preloading of internal table cache is
enabled by default.|System Level \| Project Level \| Table Level|
|*kylin.index.preloaded-cache.enabled*|true|Whether to enable preloading cache
for index files.
Default value is `true`, indicating that preloading of index file cache is
enabled by default.|System Level \| Project Level \| Model Level|
|*kylin.cache.gluten-cache-concurrent-running-threshold*|20|Number of
concurrent API requests allowed to execute the Gluten cache command.
Default value is `20`, indicating a default concurrency of 20.|System Level|
|*kylin.cache.gluten-cache-request-timeout*|1d|Timeout for forwarded REST
requests that run the Gluten cache command.
Default value is `1d`, meaning a timeout is returned if the forwarded request
runs for more than 1 day.|System Level|
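For reference, the defaults in the table above can be written out as a system-level configuration fragment. This is only a sketch: it assumes the parameters are set in kylin.properties like other kylin.* options, and the values shown are the documented defaults.
{code:none}
# Preload cache for internal tables (default: true)
kylin.internal-table.preloaded-cache.enabled=true
# Preload cache for index files (default: true)
kylin.index.preloaded-cache.enabled=true
# Concurrency of API calls that execute the Gluten cache command (default: 20)
kylin.cache.gluten-cache-concurrent-running-threshold=20
# Timeout for forwarded Gluten cache REST requests (default: 1d)
kylin.cache.gluten-cache-request-timeout=1d
{code}
Per the Effective Scope column, the two *.enabled switches can also be set at project level and at table or model level. The two kylin.cache.* parameters apply at system level only.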
> Support preloading for internal table cache
> -------------------------------------------
>
> Key: KYLIN-5996
> URL: https://issues.apache.org/jira/browse/KYLIN-5996
> Project: Kylin
> Issue Type: New Feature
> Affects Versions: 5.0.0
> Reporter: Guoliang Sun
> Priority: Major
>
> Kylin5 supports Gluten as the native engine, and its query performance
> largely depends on loading data from remote storage into the local cache.
> Gluten currently plans to support caching for MergeTree and Parquet formats.
>