[I] [Bug] Flink CDC ingestion does not update iceberg metadata [paimon]

via GitHub Thu, 05 Jun 2025 04:26:06 -0700


0dunay0 opened a new issue, #5700:
URL: https://github.com/apache/paimon/issues/5700


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   
   ### Paimon version
   
   Master branch `b4be1f1bacf7af5c2111008cbfbacf439df0214e`
   
   ### Compute Engine
   
   Flink
   
   ### Minimal reproduce step
   
   Set up a Flink CDC ingestion job and evolve the source schema in a compatible way, e.g.
   `{field1: INT, field2: FLOAT} -> {field1: INT, field2: FLOAT, field3: STRING}`. The addition of `field3` takes effect in Paimon but not in the Iceberg metadata.
   
   After evolving the CDC source (in my case, the Avro schema of a Kafka topic) by adding `field3`, I published new messages. They are visible in the Paimon table but not in Iceberg. A new Iceberg metadata file is generated at the same time that data for the new Paimon schema is ingested (e.g. the metadata file version increases to v6), but its content is not updated: the schemas it lists still lack `field3`.
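
The mismatch can also be checked with a short script. The sketch below is a hypothetical diagnostic helper, not part of Paimon or Iceberg; `load_metadata`, `current_iceberg_fields` and `missing_in_iceberg` are illustrative names. It assumes the Paimon schema file is JSON with a `fields` array (as in the `schema-1` output in this issue) and that the Iceberg metadata file follows the Iceberg table spec, with a `current-schema-id` and a `schemas` list. The sample data mirrors this issue; the `current-schema-id` value of 2 is an assumption, since the issue's `jq` output does not show it.

```python
import json
from pathlib import Path


def load_metadata(path: Path) -> dict:
    """Parse a JSON metadata file (Paimon schema file or Iceberg vN.metadata.json)."""
    with path.open() as f:
        return json.load(f)


def current_iceberg_fields(metadata: dict) -> list[str]:
    """Return the field names of the schema that Iceberg marks as current."""
    current_id = metadata["current-schema-id"]
    for schema in metadata["schemas"]:
        if schema["schema-id"] == current_id:
            return [field["name"] for field in schema["fields"]]
    raise KeyError(f"current schema-id {current_id} not found in 'schemas'")


def missing_in_iceberg(paimon_schema: dict, iceberg_metadata: dict) -> list[str]:
    """Return Paimon field names absent from the current Iceberg schema, in Paimon order."""
    iceberg_names = set(current_iceberg_fields(iceberg_metadata))
    return [f["name"] for f in paimon_schema["fields"] if f["name"] not in iceberg_names]


if __name__ == "__main__":
    # Sample data mirroring schema-1 and v6.metadata.json from this issue.
    # ASSUMPTION: current-schema-id is 2; the issue's jq output does not show it.
    paimon_schema_1 = {
        "id": 1,
        "fields": [
            {"id": 0, "name": "field1", "type": "INT"},
            {"id": 1, "name": "field2", "type": "FLOAT"},
            {"id": 3, "name": "field3", "type": "STRING"},
        ],
    }
    old_fields = [
        {"id": 0, "name": "field1", "required": False, "type": "int"},
        {"id": 1, "name": "field2", "required": False, "type": "float"},
    ]
    iceberg_v6 = {
        "current-schema-id": 2,
        "schemas": [
            {"type": "struct", "schema-id": 0, "fields": old_fields},
            {"type": "struct", "schema-id": 2, "fields": old_fields},
        ],
    }
    print(missing_in_iceberg(paimon_schema_1, iceberg_v6))  # prints ['field3']
```

To run it against a real table, pass the latest `schema/schema-N` file and the latest `metadata/vN.metadata.json` file through `load_metadata` and hand the resulting dicts to `missing_in_iceberg`; an empty list means the current Iceberg schema contains every Paimon field name. The check compares names only, so it would not catch a type change.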
   
   ```
   # inspecting Paimon metadata
   ls ~/.paimon-data/warehouse/default.db/test_table/schema/
   schema-0 schema-1
   schema-0 schema-1
   
   cat ~/.paimon-data/warehouse/default.db/test_table/schema/schema-0 | jq '{id, fields}'
   {
     "id": 0,
     "fields": [
       {
         "id": 0,
         "name": "field1",
         "type": "INT"
       },
       {
         "id": 1,
         "name": "field2",
         "type": "FLOAT"
       }
     ]
   }
   
   cat ~/.paimon-data/warehouse/default.db/test_table/schema/schema-1 | jq '{id, fields}'
   {
     "id": 1,
     "fields": [
       {
         "id": 0,
         "name": "field1",
         "type": "INT"
       },
       {
         "id": 1,
         "name": "field2",
         "type": "FLOAT"
       },
       {
         "id": 3,
         "name": "field3",
         "type": "STRING"
       }
     ]
   }
   
   # inspecting Iceberg metadata
   cat ~/.paimon-data/warehouse/default.db/test_table/metadata/v6.metadata.json | jq '.schemas[]'
   
   {
     "type": "struct",
     "schema-id": 0,
     "fields": [
       {
         "id": 0,
         "name": "field1",
         "required": false,
         "type": "int",
         "doc": null
       },
       {
         "id": 1,
         "name": "field2",
         "required": false,
         "type": "float",
         "doc": null
       }
     ]
   }
   {
     "type": "struct",
     "schema-id": 2,
     "fields": [
       {
         "id": 0,
         "name": "field1",
         "required": false,
         "type": "int",
         "doc": null
       },
       {
         "id": 1,
         "name": "field2",
         "required": false,
         "type": "float",
         "doc": null
       }
     ]
   }
   ```
   
   ### What doesn't meet your expectations?
   
   I expect the behaviour to match that of the Flink SQL engine, i.e. Paimon and Iceberg metadata are kept in sync.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

