zhimin wu created KYLIN-5828:
--------------------------------
Summary: When multiple jobs build concurrently, the flat table may
use inconsistent global dictionary versions, resulting in incorrect count
distinct query results.
Key: KYLIN-5828
URL: https://issues.apache.org/jira/browse/KYLIN-5828
Project: Kylin
Issue Type: Bug
Components: Storage - Parquet
Reporter: zhimin wu
Assignee: zhimin wu
*Root Cause*
When multiple jobs build concurrently and share the same global dictionary,
nothing guarantees that the flat table encoding step uses a single, consistent
dictionary version. If another job expands the dictionary while encoding is in
progress, some flat table partitions may mistakenly read the partition files of
the new dictionary version. Because the new version distributes its entries
across partitions differently, those lookups cannot find the correct dictionary
content. The affected rows of the flat table encoding column are written as 0,
which ultimately produces an incorrect count distinct value.
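
The failure mode described above can be illustrated with a small toy model. This
is only a sketch under stated assumptions, not Kylin's actual code: the names
bucket_of, build_version, and encode_flat_table are hypothetical, raw values are
plain integers, "value % bucket_count" stands in for the real hash, and ID 0 is
assumed to mean "not found". Version v1 has 2 dictionary partitions; a concurrent
job expands the dictionary into v2 with 3 partitions.

```python
def bucket_of(value, bucket_count):
    # Stand-in for the real hash partitioning (assumption for this sketch).
    return value % bucket_count


def build_version(values, bucket_count, previous=None):
    """Build one dictionary version as a list of per-partition maps.

    IDs start at 1 so that 0 can represent a failed lookup.
    IDs already assigned in the previous version are preserved.
    """
    ids = {}
    if previous is not None:
        for bucket in previous:
            ids.update(bucket)
    next_id = max(ids.values(), default=0) + 1
    for v in values:
        if v not in ids:
            ids[v] = next_id
            next_id += 1
    buckets = [dict() for _ in range(bucket_count)]
    for v, dict_id in ids.items():
        buckets[bucket_of(v, bucket_count)][v] = dict_id
    return buckets


def encode_flat_table(values, versions_by_partition):
    """Encode rows that were distributed by len(versions_by_partition).

    Each flat table partition i reads dictionary partition file i of the
    version it was handed. A missing entry is encoded as 0.
    """
    partition_count = len(versions_by_partition)
    encoded = []
    for v in values:
        idx = bucket_of(v, partition_count)
        dictionary_partition = versions_by_partition[idx][idx]
        encoded.append(dictionary_partition.get(v, 0))
    return encoded


rows = list(range(20))                       # 20 distinct raw values
v1 = build_version(rows, bucket_count=2)     # version seen when the job started
v2 = build_version(list(range(20, 30)), bucket_count=3, previous=v1)  # expanded

consistent = encode_flat_table(rows, [v1, v1])  # every partition uses v1
mixed = encode_flat_table(rows, [v1, v2])       # partition 1 picks up v2 by mistake

print(len(set(consistent)))  # distinct encoded values with a consistent version
print(mixed.count(0))        # rows that were encoded as 0 in the mixed case
```

In this model the mixed run loses entries only in the partition that read the
newer version: its rows were distributed for 2 partitions, while the v2 files
were laid out for 3, so most lookups miss and collapse to 0.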