[ 
https://issues.apache.org/jira/browse/KYLIN-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928305#comment-17928305
 ] 

Guoliang Sun commented on KYLIN-6028:
-------------------------------------

h3. Root Cause

Within a transaction, metadata that will be modified is fetched fresh from the 
DB before the modification is applied. Metadata that will not be modified is 
read from the cached version in memory, similar to the "repeatable read" 
design in databases.  
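
The read rule above can be pictured with a minimal sketch. It is only an 
illustration under stated assumptions: `TxnReadPolicyDemo`, its `db` and 
`cache` maps, and the `forUpdate` flag are hypothetical names, not Kylin's 
actual classes or API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the read rule; none of these names come from Kylin.
class TxnReadPolicyDemo {
    // Stand-ins for the metadata DB and one node's in-memory cache,
    // each mapping a metadata path to a version number.
    final Map<String, Integer> db = new HashMap<>();
    final Map<String, Integer> cache = new HashMap<>();

    // Metadata the transaction will modify is read from the DB;
    // everything else is served from the (possibly stale) cache.
    Integer read(String path, boolean forUpdate) {
        return forUpdate ? db.get(path) : cache.get(path);
    }

    public static void main(String[] args) {
        TxnReadPolicyDemo store = new TxnReadPolicyDemo();
        store.db.put("dataflow1", 2);    // latest committed version
        store.cache.put("dataflow1", 1); // cache has not caught up yet

        System.out.println(store.read("dataflow1", true));  // 2, from the DB
        System.out.println(store.read("dataflow1", false)); // 1, from the cache
    }
}
```

The point of the sketch is that two reads in the same transaction can observe 
different versions, depending on whether the entry is about to be modified.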

During model refresh/build operations, a new segment is added and the 
`dataFlow` is updated. The latest value of `dataFlow` is therefore guaranteed 
to be fetched from the DB, but the `segments` held in memory may not reflect 
the latest state.  

For example, if `dataflow1` already contains `seg0`, and two concurrent 
requests come in:  
- Request 1 intends to add `seg1` and trigger a task. After processing, 
`dataflow1` will record the UUIDs of both `seg0` and `seg1`.  
- Request 2 intends to add `seg2` and trigger a task. Request 2 fetches the 
latest value of `dataflow1` from the DB, obtaining the updated `dataflow1` that 
includes the UUIDs of `seg0` and `seg1`.  
- Request 2 attempts to retrieve `seg0` and `seg1` from memory using their 
UUIDs. However, due to synchronization delays in `auditLog`, only `seg0` is 
retrieved, while `seg1` is ignored.  
- Request 2 adds `seg2`. After processing, it updates `dataFlow` to include 
only `seg0` and `seg2`.  
- Ultimately, two tasks are created to build `seg1` and `seg2`, but the model 
ends up containing only `seg0` and `seg2`; `seg1` is lost.  
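
The sequence above can be replayed in a small single-threaded simulation. 
This is a sketch under assumptions, not Kylin code: `LostSegmentDemo`, 
`dataflowInDb`, and `segmentCache` are hypothetical names, the dataflow is 
reduced to a list of segment UUIDs, and the `auditLog` delay is modeled simply 
by never adding `seg1` to Request 2's cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; class and variable names are hypothetical.
class LostSegmentDemo {
    static List<String> simulate() {
        // "DB" copy of dataflow1: the segment UUIDs it references.
        List<String> dataflowInDb = new ArrayList<>(List.of("seg0"));

        // In-memory segment cache seen by Request 2; it only knows seg0.
        Map<String, String> segmentCache = new HashMap<>();
        segmentCache.put("seg0", "segment-0-metadata");

        // Request 1 commits: the dataflow in the DB now references seg0 and seg1.
        // seg1 has not yet been synchronized into Request 2's cache.
        dataflowInDb.add("seg1");

        // Request 2 fetches the latest dataflow from the DB ...
        List<String> latestUuids = new ArrayList<>(dataflowInDb);

        // ... but resolves the segments from memory, ignoring unknown UUIDs.
        List<String> resolved = new ArrayList<>();
        for (String uuid : latestUuids) {
            if (segmentCache.containsKey(uuid)) {
                resolved.add(uuid);
            }
        }

        // Request 2 adds seg2 and writes the dataflow back to the DB.
        resolved.add("seg2");
        dataflowInDb = resolved;
        return dataflowInDb;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // [seg0, seg2]
    }
}
```

In this model the write-back by Request 2 overwrites the reference to `seg1`, 
even though Request 2 read the newest `dataflow1` from the DB.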

The critical issue in the above logic is that when an inconsistency between 
`dataflow` and `segment` metadata is detected, the missing segment (`seg1` in 
this example) is silently ignored, leading to metadata loss.

> Kylin5 encounters metadata anomalies when concurrently submitting 
> build/refresh tasks
> -------------------------------------------------------------------------------------
>
>                 Key: KYLIN-6028
>                 URL: https://issues.apache.org/jira/browse/KYLIN-6028
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>            Reporter: Guoliang Sun
>            Priority: Major
>
> In Kylin5, when two incremental build tasks with the same time range for the 
> same model are submitted concurrently, both requests succeed. However, only 
> one segment is created for the model, while two build tasks are created, 
> which is inconsistent with expectations.  
> Further verification shows that the same issue occurs when concurrently 
> refreshing the same segment.  
> Additional testing reveals that submitting build/refresh tasks concurrently 
> for a model may result in issues, regardless of whether these tasks conflict 
> or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
