Sergei Morozov created FLINK-37604:
--------------------------------------
Summary: Flink CDC pipeline composer doesn't assign UIDs to
operators
Key: FLINK-37604
URL: https://issues.apache.org/jira/browse/FLINK-37604
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Affects Versions: cdc-3.2.0
Reporter: Sergei Morozov
>From the
>[documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#assigning-operator-ids]:
{quote}It is *highly recommended* that you specify operator IDs via the
*{{uid(String)}}* method.
{quote}
The pipeline composer currently doesn't specify the UIDs.
Steps to reproduce:
# Enable debug logging on
{{{}org.apache.flink.streaming.api.graph.StreamGraphHasherV2{}}}.
# Start a CDC pipeline
# Observe messages like the following in the log
{noformat}
Generated hash 'cbc357ccb763df2852fee8c4fc7d55f2' for node 'Source: Value
Source-1' {id: 1, parallelism: 1, user function: }
Generated hash '78be0dd8677bc2711e2a56947a5ea048' for node 'PrePartition-3'
{id: 3, parallelism: 1, user function: }
Generated hash '0deb1b26a3d9eb3c8f0c11f7110b2903' for node 'PostPartition-5'
{id: 5, parallelism: 1, user function:
org.apache.flink.cdc.runtime.partitioning.PostPartitionProcessor}
Generated hash 'c63277377682ee6b89910f3fdc3b3a1e' for node 'Sink Writer:
Sink-6' {id: 6, parallelism: 1, user function: }
Generated hash '25f5e74a29ec629de3d4a0e9f84185e9' for node 'Sink Committer:
Sink-9' {id: 9, parallelism: 1, user function: }
{noformat}
It means that Flink generates these UIDs based on the job graph (the input and
output nodes of a given node). If the job graph changes (by adding more
operators), these hashes will be regenerated, and Flink will be unable to
restore the state of the affected operators.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)