[ 
https://issues.apache.org/jira/browse/KYLIN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832356#comment-17832356
 ] 

pengfei.zhan commented on KYLIN-5767:
-------------------------------------

h1. Problem

With Snapshot Management turned on, the Model Snapshot Build task did not skip 
the Build Snapshot step. Kylin reads the table's sampling information through 
"tableManager.getTableExtIfExists(tableDesc)". If the table's sampling 
information is empty, or the number of sampled rows is 0, Kylin calculates the 
total row count itself by querying the JDBC data source with a "select *" 
statement. If the table in the customer environment is very large, this step 
of the build takes too much time. This usually happens when the dimension 
data is large and the partition column is null.
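
The decision described above can be sketched roughly as follows. This is only 
an illustration of the condition that triggers the full count, not Kylin's 
actual code: the class SnapshotRowCountSketch, the methods needsFullCount and 
resolveTotalRows, and the fullCountQuery supplier are all hypothetical names, 
and the supplier stands in for the "select *" based query against the JDBC 
data source.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the row-count decision; names are not taken from Kylin. */
public class SnapshotRowCountSketch {

    /**
     * A full count is needed when the table has no sampling information
     * (empty) or when sampling reported 0 rows.
     */
    static boolean needsFullCount(OptionalLong sampledRows) {
        return !sampledRows.isPresent() || sampledRows.getAsLong() == 0L;
    }

    /**
     * Returns the sampled row count when it is usable; otherwise falls back
     * to fullCountQuery, which stands in for the expensive JDBC query.
     */
    static long resolveTotalRows(OptionalLong sampledRows, LongSupplier fullCountQuery) {
        if (needsFullCount(sampledRows)) {
            return fullCountQuery.getAsLong();
        }
        return sampledRows.getAsLong();
    }

    public static void main(String[] args) {
        // Assumed stand-in for the slow scan of a very large table.
        LongSupplier fullCountQuery = () -> {
            System.out.println("running full count against the JDBC source");
            return 5_000_000_000L;
        };
        System.out.println(resolveTotalRows(OptionalLong.empty(), fullCountQuery));
        System.out.println(resolveTotalRows(OptionalLong.of(0L), fullCountQuery));
        System.out.println(resolveTotalRows(OptionalLong.of(1200L), fullCountQuery));
    }
}
```

In this sketch the expensive query runs only for the first two calls, which 
matches the unsampled dimension table case reported here.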

> Calculating total rows abnormal when jdbc datasource is connected
> -----------------------------------------------------------------
>
>                 Key: KYLIN-5767
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5767
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: 5.0-beta
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Major
>             Fix For: 5.0.0
>
>
> {{When the JDBC data source is connected, the snapshot management function is 
> enabled and the dimension table is not sampled, optimize the build logic to 
> ensure that the job can be executed normally when the dimension table data 
> volume is large}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
