[PR] [FLINK-20454][formats] Add metadata support for Debezium Avro [flink]

via GitHub Sat, 20 Jun 2026 06:19:19 -0700


p-eye opened a new pull request, #28498:
URL: https://github.com/apache/flink/pull/28498


   ## What is the purpose of the change
   
   This pull request implements metadata reading support for the 
`debezium-avro-confluent` format, bringing feature parity with the 
`debezium-json` format. Users can now access Debezium metadata fields (such as 
source database, schema, table, and timestamps) when using Avro-encoded 
Debezium messages from Kafka.
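
For illustration, here is a sketch of how a user might declare these metadata columns on a Kafka table. It is not taken from the PR. The table name, column names, topic, and addresses are hypothetical. The `value.` key prefix, the column types, and the `value.debezium-avro-confluent.url` option are assumed to mirror the existing `debezium-json` and `debezium-avro-confluent` documentation. The helper only builds the DDL string, so it runs without Flink on the classpath.

```java
public class DebeziumAvroDdlSketch {

    // {column name, SQL type, format metadata key}; column names are hypothetical,
    // types are assumed to match the debezium-json format.
    static final String[][] METADATA_COLUMNS = {
        {"origin_ts", "TIMESTAMP_LTZ(3)", "ingestion-timestamp"},
        {"event_time", "TIMESTAMP_LTZ(3)", "source.timestamp"},
        {"origin_database", "STRING", "source.database"},
        {"origin_schema", "STRING", "source.schema"},
        {"origin_table", "STRING", "source.table"},
        {"origin_properties", "MAP<STRING, STRING>", "source.properties"},
    };

    public static String buildDdl() {
        StringBuilder ddl = new StringBuilder("CREATE TABLE orders_cdc (\n");
        for (String[] col : METADATA_COLUMNS) {
            // Assumption: value-format metadata is addressed with a 'value.' prefix.
            ddl.append("  ").append(col[0]).append(' ').append(col[1])
               .append(" METADATA FROM 'value.").append(col[2]).append("' VIRTUAL,\n");
        }
        ddl.append("  order_id BIGINT,\n")
           .append("  amount DECIMAL(10, 2)\n")
           .append(") WITH (\n")
           .append("  'connector' = 'kafka',\n")
           .append("  'topic' = 'orders',\n")
           .append("  'properties.bootstrap.servers' = 'localhost:9092',\n")
           .append("  'scan.startup.mode' = 'earliest-offset',\n")
           .append("  'value.format' = 'debezium-avro-confluent',\n")
           .append("  'value.debezium-avro-confluent.url' = 'http://localhost:8081'\n")
           .append(")");
        return ddl.toString();
    }

    public static void main(String[] args) {
        // In a Flink job this string would be passed to TableEnvironment.executeSql(...).
        System.out.println(buildDdl());
    }
}
```

Declaring the columns as `VIRTUAL` keeps them read-only, which matches the PR's scope of metadata reading.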
   
   ## Brief change log
   
   - Add `DebeziumAvroDecodingFormat` implementing `ProjectableDecodingFormat` 
with metadata support
   - Implement 6 metadata fields (`ingestion-timestamp`, `source.timestamp`, 
`source.database`, `source.schema`, `source.table`, `source.properties`)
   - Add metadata deserialization tests
   - Update documentation with Avro format examples
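
As a rough sketch of what metadata extraction involves, the following self-contained example pulls the six keys out of a Debezium envelope modelled as nested maps. It is not the PR's implementation and does not use Flink or Avro classes. The class and method names are hypothetical, and the mapping to envelope fields (`ts_ms`, `source.ts_ms`, `source.db`, `source.schema`, `source.table`) is assumed to match what `debezium-json` does. Only the requested keys are read, reflecting the note that extraction happens only when metadata columns are explicitly requested.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DebeziumMetadataSketch {

    /** Returns the values for the requested metadata keys, in request order. */
    public static Map<String, Object> extract(
            Map<String, Object> envelope, List<String> requestedKeys) {
        Map<String, Object> result = new LinkedHashMap<>();
        if (requestedKeys.isEmpty()) {
            return result; // no metadata columns requested: nothing to read
        }
        Map<?, ?> source = Collections.emptyMap();
        Object rawSource = envelope.get("source");
        if (rawSource instanceof Map) {
            source = (Map<?, ?>) rawSource;
        }
        for (String key : requestedKeys) {
            switch (key) {
                case "ingestion-timestamp":
                    result.put(key, envelope.get("ts_ms"));
                    break;
                case "source.timestamp":
                    result.put(key, source.get("ts_ms"));
                    break;
                case "source.database":
                    result.put(key, source.get("db"));
                    break;
                case "source.schema":
                    result.put(key, source.get("schema"));
                    break;
                case "source.table":
                    result.put(key, source.get("table"));
                    break;
                case "source.properties": {
                    // Assumption: exposed as a string-to-string map of the whole source struct.
                    Map<String, String> props = new LinkedHashMap<>();
                    for (Map.Entry<?, ?> e : source.entrySet()) {
                        props.put(String.valueOf(e.getKey()), String.valueOf(e.getValue()));
                    }
                    result.put(key, props);
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown metadata key: " + key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("db", "inventory");
        source.put("table", "orders");
        source.put("ts_ms", 1700000000000L);

        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("ts_ms", 1700000000500L);
        envelope.put("source", source);

        System.out.println(
                extract(envelope, List.of("source.database", "source.table")));
    }
}
```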
   
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   - Added `DebeziumAvroSerDeSchemaTest.testDeserializationWithMetadata()` that 
validates all 6 metadata fields (ingestion-timestamp, source.timestamp, 
source.database, source.schema, source.table, source.properties) are correctly 
extracted from Debezium Avro CDC messages.
   - Updated `DebeziumAvroFormatFactoryTest.testSeDeSchema()` and 
`testSeDeSchemaWithSchemaOption()` to verify the factory creates 
deserialization schemas with correct metadata support parameters
   - Existing tests (`testDeleteDataDeserialization()`, 
`testSeDeSchemaWithInvalidSchemaOption()`) pass without modification, ensuring 
backward compatibility
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): yes (adds 
metadata extraction to the deserialization path, but only when metadata columns 
are explicitly requested)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? docs
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   <!--
   If generative AI tooling has been used in the process of authoring this PR, 
please
   change the checkbox below to `[X]` followed by the name of the tool, and 
uncomment the
   "Generated-by" line. See the ASF Generative Tooling Guidance for details:
   https://www.apache.org/legal/generative-tooling.html
   
   You are responsible for the quality and correctness of every change in this 
PR
   regardless of the tooling used. Low-effort AI-generated PRs will be closed. 
See
   AGENTS.md for the full guidance.
   -->
   
   - [x] Yes (please specify the tool below)
   
   Generated-by: Claude Sonnet 4.5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
