[ 
https://issues.apache.org/jira/browse/FLINK-39718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ran Tao updated FLINK-39718:
----------------------------
    Description: 
When using a distributed pipeline source with Paimon sink, the job may fail if 
the target table does not already exist.

The failure happens in *DistributedPrePartitionOperator*. The previous 
*PaimonHashFunction* rebuilds the hash function at this stage and immediately 
accesses the external Paimon catalog to load the target table. If the table has 
not been created yet, *catalog.getTable(...)* throws TableNotExistException, 
and the job fails in the pre-partition stage.

This issue is usually not exposed in MySQL-CDC pipelines because MySQL-CDC uses 
regular topology. In that path, CreateTableEvent is handled by SchemaOperator 
first, and MetadataApplier creates the downstream table before records enter 
*RegularPrePartitionOperator*. As a result, the previous PaimonHashFunction can 
usually find the target table from the catalog.

For distributed pipeline sources, the execution order is different: 
pre-partitioning happens before the target table is created by 
*MetadataApplier* in the schema coordination phase.

 This issue is not specific to Kafka. It can affect any distributed pipeline 
source with the same execution order.

  was:
When using a distributed pipeline source with Paimon sink, the job may fail if 
the target table does not already exist.

The failure happens in *DistributedPrePartitionOperator*. The previous 
*PaimonHashFunction* rebuilds the hash function at this stage and immediately 
accesses the external Paimon catalog to load the target table. If
  the table has not been created yet, *catalog.getTable(...)* throws 
TableNotExistException, and the job fails in the pre-partition stage.

This issue is usually not exposed in MySQL-CDC pipelines because MySQL-CDC uses 
regular topology. In that path, CreateTableEvent is handled by SchemaOperator 
first, and MetadataApplier creates the downstream table before records enter 
*RegularPrePartitionOperator*. As a result, the previous PaimonHashFunction can 
usually find the target table from the catalog.

For distributed pipeline sources, the execution order is different: 
pre-partitioning happens before the target table is created by 
*MetadataApplier* in the schema coordination phase.

 This issue is not specific to Kafka. It can affect any distributed pipeline 
source with the same execution order.


> [pipeline][paimon] Paimon pipeline sink fails with distributed source when 
> target table does not exist
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39718
>                 URL: https://issues.apache.org/jira/browse/FLINK-39718
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>            Reporter: Ran Tao
>            Priority: Major
>
> When using a distributed pipeline source with Paimon sink, the job may fail 
> if the target table does not already exist.
> The failure happens in *DistributedPrePartitionOperator*. The previous 
> *PaimonHashFunction* rebuilds the hash function at this stage and immediately 
> accesses the external Paimon catalog to load the target table. If the table 
> has not been created yet, *catalog.getTable(...)* throws 
> TableNotExistException, and the job fails in the pre-partition stage.
> This issue is usually not exposed in MySQL-CDC pipelines because MySQL-CDC 
> uses regular topology. In that path, CreateTableEvent is handled by 
> SchemaOperator first, and MetadataApplier creates the downstream table before 
> records enter *RegularPrePartitionOperator*. As a result, the previous 
> PaimonHashFunction can usually find the target table from the catalog.
> For distributed pipeline sources, the execution order is different: 
> pre-partitioning happens before the target table is created by 
> *MetadataApplier* in the schema coordination phase.
>  This issue is not specific to Kafka. It can affect any distributed pipeline 
> source with the same execution order.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to