[
https://issues.apache.org/jira/browse/KYLIN-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928305#comment-17928305
]
Guoliang Sun commented on KYLIN-6028:
-------------------------------------
h3. Root Cause
Within a transaction, metadata that is about to be modified is fetched fresh
from the DB before the modification is applied. Metadata that is only read is
served from the in-memory cache, similar to the "repeatable read" design in
databases.
During a model refresh or build, a new segment is added and the `dataFlow` is
updated. Because `dataFlow` is being modified, its latest value is guaranteed
to be fetched from the DB, but the `segments` held in memory do not
necessarily reflect the latest state.
For example, if `dataflow1` already contains `seg0`, and two concurrent
requests come in:
- Request 1 intends to add `seg1` and trigger a task. After processing,
`dataflow1` will record the UUIDs of both `seg0` and `seg1`.
- Request 2 intends to add `seg2` and trigger a task. Request 2 fetches the
latest value of `dataflow1` from the DB, obtaining the updated `dataflow1` that
includes the UUIDs of `seg0` and `seg1`.
- Request 2 attempts to retrieve `seg0` and `seg1` from memory using their
UUIDs. However, due to synchronization delays in `auditLog`, `seg1` is not yet
in memory, so only `seg0` is retrieved and `seg1` is ignored.
- Request 2 adds `seg2`. After processing, it updates `dataFlow` to include
only `seg0` and `seg2`.
- Ultimately, two tasks are created to build `seg1` and `seg2`, but of the two
new segments the model ends up containing only `seg2`.
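The interleaving above can be replayed deterministically in a few lines. This is only an illustrative sketch: `LostSegmentDemo`, `dbDataflow`, and `cachedSegments` are hypothetical names, not Kylin classes or fields, and plain collections stand in for the DB and the in-memory metadata cache.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the scenario; none of these names are Kylin APIs.
class LostSegmentDemo {
    /** Replays the interleaving and returns the segment UUIDs left in dataflow1. */
    static List<String> replay() {
        // Authoritative DB copy of dataflow1: the segment UUIDs it references.
        List<String> dbDataflow = new ArrayList<>(List.of("seg0"));
        // In-memory segment cache seen by Request 2.
        Set<String> cachedSegments = new HashSet<>(Set.of("seg0"));

        // Request 1 commits seg1; the audit log has not yet synchronized
        // seg1 into the cache above.
        dbDataflow.add("seg1");

        // Request 2 reads the latest dataflow from the DB ...
        List<String> view = new ArrayList<>(dbDataflow);
        // ... but resolves segments from the stale cache and drops the misses.
        view.removeIf(uuid -> !cachedSegments.contains(uuid));
        // Request 2 adds seg2 and writes the dataflow back.
        view.add("seg2");
        dbDataflow = view;
        return dbDataflow;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [seg0, seg2]
    }
}
```

The write-back by Request 2 overwrites the reference to `seg1`, even though the build task for `seg1` has already been created.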
The critical issue in the above logic is that when an inconsistency between
`dataflow` and `segment` metadata is detected, `seg1` is ignored, leading to
metadata loss.
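One way to make such an inconsistency visible is to fail the request when a referenced segment cannot be resolved, instead of skipping it. The sketch below is an assumption for illustration only: it is not necessarily the fix adopted for this issue, and `StrictSegmentResolver` and `resolve` are hypothetical names rather than Kylin APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Kylin.
class StrictSegmentResolver {
    /**
     * Resolves the UUIDs referenced by a dataflow against the in-memory
     * segment cache. A miss is treated as an error rather than ignored,
     * so the caller can retry after the cache has caught up.
     */
    static List<String> resolve(List<String> referencedUuids, Set<String> cachedSegments) {
        List<String> resolved = new ArrayList<>();
        for (String uuid : referencedUuids) {
            if (!cachedSegments.contains(uuid)) {
                throw new IllegalStateException(
                        "segment " + uuid + " is referenced by the dataflow but missing from memory");
            }
            resolved.add(uuid);
        }
        return resolved;
    }
}
```

With this check, Request 2 in the example would be rejected at the point where `seg1` cannot be found, rather than committing a `dataflow1` that no longer references it.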
> Kylin5 encounters metadata anomalies when concurrently submitting
> build/refresh tasks
> -------------------------------------------------------------------------------------
>
> Key: KYLIN-6028
> URL: https://issues.apache.org/jira/browse/KYLIN-6028
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0.0
> Reporter: Guoliang Sun
> Priority: Major
>
> In Kylin5, when two incremental build tasks with the same time range for the
> same model are submitted concurrently, both requests succeed. However, only
> one segment is created for the model, while two build tasks are created,
> which is inconsistent with expectations.
> Further verification shows that the same issue occurs when concurrently
> refreshing the same segment.
> Additional testing reveals that submitting build/refresh tasks concurrently
> for a model may result in issues, regardless of whether these tasks conflict
> or not.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)