[
https://issues.apache.org/jira/browse/KYLIN-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927980#comment-17927980
]
Guoliang Sun edited comment on KYLIN-5991 at 2/18/25 8:47 AM:
--------------------------------------------------------------
h3. Root Cause & Dev Design
# Creating an internal table did not handle the case where the tablePartition
field of the internal table is null, which caused the NPE.
# When a data source is loaded and simultaneously loaded as an internal
table, validate the configuration and throw the corresponding exception if
Gluten is not enabled.
# When submitting an incremental refresh task, validate whether the refresh
time range exceeds the existing time range.
# Due to the metadata structure upgrade in KE5, some older Tool classes are no
longer supported. During the upgrade script execution, some Tools cannot be
found, causing the upgrade to exit halfway and leaving subsequent processes
incomplete.
## Skip some upgrade Tool processes required for version 4.x
### UpdateUserAclTool
### UpgradeExcludedTableTool
# The logic of converting a regular table to an internal table essentially
involves saving empty data to the delta directory, which is divided into two
steps:
## Create delta_log.
## Save empty data.
## If the second step fails and the delta_log is not cleaned up promptly, the
schema recorded in the delta_log can become inconsistent when the table
structure, sort keys, or primary keys are changed later.
# Catch exceptions thrown during delta.save() and delete the delta_log
directory that was already created on HDFS.
# When a single-table model corresponding to an internal table exists, the
metrics and CC (Calculated Columns) on the model are treated as columns of the
table during execution plan generation. This causes queries on the internal
table data to throw "column does not exist" exceptions.
# Following the snapshot logic, populate the CC columns and metrics with the
constant 1 when querying internal table data.
# Add validation to check if the internal table exists.
# In the parameter validation phase, add logic to validate partition columns
when *date_partition_format* is not empty.
# In internal table operations, some error code messages do not comply with
the standards. The error codes xxxx have not been added to the corresponding
configuration files, so the correct error_code is not recognized when an
error occurs.
## Solution: Add the error codes xxxx to the corresponding configuration
files, *kylin_errorcode_conf_en.properties* and
{*}kylin_errorcode_conf_zh.properties{*}, so that the correct error codes are
returned.
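The cleanup described for the two-step conversion (create delta_log, then save empty data, and remove the delta_log if the save fails) could look roughly like the sketch below. It is an illustration only: the class and interface names are hypothetical, a local directory stands in for HDFS, and the EmptyDataWriter callback stands in for delta.save(). None of this is taken from the Kylin code base.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper, not a Kylin class. */
final class InternalTableCreator {

    /** Directory name as written in the comment above. */
    static final String LOG_DIR = "delta_log";

    /** Stand-in for the step that writes the empty data set (delta.save()). */
    interface EmptyDataWriter {
        void save(Path tableDir) throws IOException;
    }

    /**
     * Step 1: create the log directory. Step 2: save empty data.
     * If step 2 fails, the log directory is removed so that no stale
     * schema is left behind for a later creation attempt.
     */
    static void create(Path tableDir, EmptyDataWriter writer) throws IOException {
        Path logDir = tableDir.resolve(LOG_DIR);
        Files.createDirectories(logDir);
        try {
            writer.save(tableDir);
        } catch (IOException | RuntimeException e) {
            try {
                deleteRecursively(logDir);
            } catch (IOException | RuntimeException cleanup) {
                // Keep the original failure as the primary exception.
                e.addSuppressed(cleanup);
            }
            throw e;
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            });
        }
    }
}
```

Rethrowing the original exception keeps the failure visible to the caller, while the cleanup ensures a retry starts from an empty directory.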
> Multiple abnormal errors in internal tables
> -------------------------------------------
>
> Key: KYLIN-5991
> URL: https://issues.apache.org/jira/browse/KYLIN-5991
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0.0
> Reporter: Guoliang Sun
> Priority: Major
>
> # [Single-table Load/Refresh API] The API does not throw an error for an
> incremental load without partition columns.
> # Deleting a table that has no internal table fails.
> # [Single-table Refresh] Date out of range not validated
> ## When a time partition column exists and the selected date exceeds the
> range of existing data, the refresh request does not throw an error and loads
> data beyond the time range.
> # [KE5 Upgrade] KE5 upgraded using the upgrade process, but the old version
> directory was retained and not deleted.
> # [Create Internal Table] Inconsistent transaction: Delta log was
> successfully written, but the table metadata was not updated, causing
> subsequent creation of this internal table to fail.
> # When an internal table exists, the model cannot be hit by a query.
> ## Create a project and enable internal tables.
> ## Load the data source table.
> ## Create an internal table.
> ## Use this table for modeling and building.
> ## After the model is built, query the model. Expected to hit the model, but
> the query shows pushdown (and pushdown does not hit the internal table).
> # [Internal Table API] The URL parameters for table and database in the data
> clearing interface are not separated.
> ## The interface needs to be consistent with the other internal table APIs
> and take the parameters separately.
> # [Source Table - Reload] After adding or deleting columns in the source
> table, it is necessary to check whether the table is an internal table with
> data during loading.
> ## If it is an internal table with data, reloading is not allowed. The user
> should be prompted to clear the internal table data before reloading.
> # [Internal Table API] The delete table partition interface does not
> validate the 'partitions' parameter.
> # [Internal Table API] The interface for getting internal table partition
> details does not validate the database name and table name parameters.
> ## Passing non-existent values for the database name and table name results
> in a successful response.
> # [Internal Table API] When creating an internal table, the partition column
> and time partition format are not cross-validated.
> ## The request succeeds when no partition column is passed, but only the
> time partition format is provided.
> # Normalize the error codes of the internal table APIs
> ## The error message is unreasonable. When Accept-Language=cn is set, the
> error message is still in English.
> # Support Internal Table Open API
> ## Add a parameter to the table reload interface to support directly
> clearing data and reloading the table (the UI will prompt users to clear the
> data themselves before reloading).
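
Two of the missing checks listed above, the date-range check on refresh and the cross-validation of the partition column against the time partition format, can be sketched as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the ranges are treated as closed intervals of dates, and IllegalArgumentException stands in for whatever error-code exception Kylin actually raises.

```java
import java.time.LocalDate;

/** Hypothetical validator, not a Kylin class. */
final class InternalTableRequestValidator {

    /** A time partition format must not be supplied without a partition column. */
    static void checkPartitionArgs(String partitionColumn, String datePartitionFormat) {
        boolean hasColumn = partitionColumn != null && !partitionColumn.trim().isEmpty();
        boolean hasFormat = datePartitionFormat != null && !datePartitionFormat.trim().isEmpty();
        if (hasFormat && !hasColumn) {
            throw new IllegalArgumentException(
                    "date_partition_format requires a partition column");
        }
    }

    /**
     * The refresh range [start, end] must lie inside the existing data range
     * [dataStart, dataEnd]. Closed intervals are an assumption of this sketch.
     */
    static void checkRefreshRange(LocalDate start, LocalDate end,
                                  LocalDate dataStart, LocalDate dataEnd) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("refresh range end is before its start");
        }
        if (start.isBefore(dataStart) || end.isAfter(dataEnd)) {
            throw new IllegalArgumentException(
                    "refresh range exceeds the existing data range");
        }
    }
}
```

Running both checks during parameter validation, before any job is submitted, rejects the request instead of loading data beyond the existing range.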