[ 
https://issues.apache.org/jira/browse/KYLIN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930194#comment-17930194
 ] 

Guoliang Sun commented on KYLIN-6057:
-------------------------------------

h3. Dev Design

- Determine the dictionary `build version` only after the distributed lock has 
been acquired.  
- When a single build task involves multiple dictionaries, give each dictionary 
its own `build version` instead of one shared `build version`. These versions 
are maintained in `buildParam` and passed to the subsequent `encodeColumn` 
operations.  
- After acquiring the distributed lock, the default `build version` remains as 
`System.currentTimeMillis()`.  
- Compare the current `build version` with the suffix of the latest 
`version_xxx` directory in the current dictionary directory. If the current 
`build version` is less than or equal to that suffix, set the current `build 
version` to `existing suffix + 1`.  
  - Example: Current `buildVersion` is `1725604515885`, and the dictionary 
directory contains `version_1725604550000`.  
    - Since `1725604515885 < 1725604550000`,  
    - Adjust the current `buildVersion` to `1725604550001`.  
- If the build task is paused and restarted after dictionary construction but 
before flat table encoding, the driver retrieves the actual latest dictionary 
`buildVersion` directly instead of resetting it to `-1`, so that Spark 
executors always use a fixed `buildVersion` during execution.  
- If the driver cannot retrieve the actual dictionary `buildVersion`, throw a 
`NoRetryException` to terminate the current build.
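
The version rules above can be sketched roughly as follows. This is a minimal 
illustration, not the actual patch: the class `BuildVersionSketch`, its method 
names, and the plain list of directory names are all hypothetical, and 
`IllegalStateException` stands in for Kylin's `NoRetryException`. It assumes 
the caller already holds the distributed lock.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper for illustration only; not Kylin's real API.
final class BuildVersionSketch {
    private static final String PREFIX = "version_";

    // Largest numeric suffix among "version_<millis>" directory names, if any.
    static OptionalLong latestVersion(List<String> dirNames) {
        return dirNames.stream()
                .filter(name -> name.startsWith(PREFIX))
                .map(name -> name.substring(PREFIX.length()))
                .filter(s -> !s.isEmpty() && s.chars().allMatch(Character::isDigit))
                .mapToLong(Long::parseLong)
                .max();
    }

    // Assumed to be called while holding the distributed lock.
    // candidate would normally be System.currentTimeMillis().
    static long resolveBuildVersion(long candidate, List<String> dirNames) {
        OptionalLong latest = latestVersion(dirNames);
        if (latest.isPresent() && candidate <= latest.getAsLong()) {
            // Clock skew: bump past the newest existing version.
            return latest.getAsLong() + 1;
        }
        return candidate;
    }

    // On a restarted task the driver reads the real latest version
    // instead of falling back to -1; failing here ends the build.
    static long requireLatestVersion(List<String> dirNames) {
        return latestVersion(dirNames).orElseThrow(
                // Stand-in for NoRetryException.
                () -> new IllegalStateException("no dictionary version found"));
    }
}
```

With the example values above, `resolveBuildVersion(1725604515885L, ...)` 
against a directory containing `version_1725604550000` yields 
`1725604550001`.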

> Incorrect Data in Global Dictionary Construction
> ------------------------------------------------
>
>                 Key: KYLIN-6057
>                 URL: https://issues.apache.org/jira/browse/KYLIN-6057
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>            Reporter: Guoliang Sun
>            Assignee: Guoliang Sun
>            Priority: Major
>             Fix For: 5.0.2
>
>
> Kylin queries return incorrect Count Distinct results.  
> h3. Root Cause
> During high-concurrency builds, slight differences in machine clocks may 
> cause different build tasks to use the same old version of a field for 
> construction, leading to overlapping values in the global dictionary.



