[
https://issues.apache.org/jira/browse/KYLIN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832356#comment-17832356
]
pengfei.zhan commented on KYLIN-5767:
-------------------------------------
h1. Problem
With Snapshot Management turned on, the Model Snapshot Build task did not skip
the Build Snapshot step. Kylin obtains the table sampling information through
"tableManager.getTableExtIfExists(tableDesc)". If the table's sampling
information is empty or the number of sampled rows is 0, Kylin calculates the
total row count by issuing a "select *" query against the JDBC data source.
If the table in the customer environment is very large, this step makes the
build stage take too long. This typically happens when the dimension table is
large and the partition column is null.
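The decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Kylin patch: the class SnapshotRowCountSketch, the SampledStats type, and the methods needsRowCount and resolveTotalRows are hypothetical names, and the full-table scan is represented by a caller-supplied function rather than a real JDBC query.

```java
import java.util.Optional;
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these names are Kylin's real classes or methods.
final class SnapshotRowCountSketch {

    /** Stand-in for the sampling info that getTableExtIfExists(tableDesc) would return. */
    static final class SampledStats {
        final long totalRows;

        SampledStats(long totalRows) {
            this.totalRows = totalRows;
        }
    }

    /** True when sampling info is missing or reports 0 rows, i.e. a row count would be needed. */
    static boolean needsRowCount(Optional<SampledStats> stats) {
        return !stats.isPresent() || stats.get().totalRows == 0;
    }

    /**
     * Returns the total row count to use, or Optional.empty() when the count is skipped.
     * Assumption: with snapshot management enabled, the expensive full scan is skipped.
     */
    static Optional<Long> resolveTotalRows(boolean snapshotManagementEnabled,
                                           Optional<SampledStats> stats,
                                           LongSupplier fullScanCount) {
        if (!needsRowCount(stats)) {
            // Sampled row count is available, so no query is required.
            return Optional.of(stats.get().totalRows);
        }
        if (snapshotManagementEnabled) {
            // Avoid the "select *" style scan on the JDBC source for large tables.
            return Optional.empty();
        }
        // Fallback: compute the row count through the (possibly slow) scan.
        return Optional.of(fullScanCount.getAsLong());
    }
}
```

Passing the scan as a LongSupplier keeps the sketch free of any JDBC dependency and makes it easy to check that the scan is not triggered when snapshot management is enabled.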
> Calculating total rows abnormal when jdbc datasource is connected
> -----------------------------------------------------------------
>
> Key: KYLIN-5767
> URL: https://issues.apache.org/jira/browse/KYLIN-5767
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: 5.0-beta
> Reporter: pengfei.zhan
> Assignee: pengfei.zhan
> Priority: Major
> Fix For: 5.0.0
>
>
> {{When the JDBC data source is connected, the snapshot management function is
> enabled and the dimension table is not sampled, optimize the build logic to
> ensure that the job can be executed normally when the dimension table data
> volume is large}}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)