tsreaper commented on pull request #18657: URL: https://github.com/apache/flink/pull/18657#issuecomment-1034694706
Hi @slinkydeveloper. Your observation is correct. Avro stores the schema in the file itself. `DataFileReader.openReader` calls `DataFileStream#initialize`, which reads the schema from the file header and sets it on the underlying datum reader as the writer schema. So we only need to give the reader our expected schema and leave the mapping between the two schemas to it. The schema in the header is mandatory for Avro files, so there is nothing to worry about. See the [avro format specification](https://avro.apache.org/docs/current/spec.html#Object+Container+Files). To quote from the specification:

> A file header consists of:
> * Four bytes, ASCII 'O', 'b', 'j', followed by 1.
> * file metadata, **including the schema.**
> * The 16-byte, randomly-generated sync marker for this file.
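
To illustrate the header layout from the specification, here is a minimal sketch that uses only the JDK, not Avro's own reader code. It parses the magic bytes, the metadata map (where the spec stores the schema under the `avro.schema` key), and the sync marker. The class `AvroHeaderSketch` and all of its methods are hypothetical names made up for this example, and the header it reads is built in memory with an all-zero sync marker rather than taken from a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Avro or Flink.
class AvroHeaderSketch {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Decodes one Avro long: variable-length bytes, zig-zag encoded.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Reads a length-prefixed byte sequence (Avro `bytes` and `string` share this layout).
    static byte[] readBytes(InputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // Reads the magic, the metadata map, and the 16-byte sync marker; returns the metadata.
    static Map<String, byte[]> readHeaderMetadata(InputStream in) throws IOException {
        byte[] magic = new byte[4];
        new DataInputStream(in).readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro container file");
        Map<String, byte[]> meta = new LinkedHashMap<>();
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {       // a negative count is followed by the block size in bytes
                count = -count;
                readLong(in);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        new DataInputStream(in).readFully(new byte[16]); // skip the sync marker
        return meta;
    }

    // Encodes one Avro long (zig-zag, then variable-length bytes).
    static void writeLong(ByteArrayOutputStream out, long v) {
        long z = (v << 1) ^ (v >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Builds an in-memory header whose only metadata entry is the schema.
    static byte[] sampleHeader(String schemaJson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        writeLong(out, 1);                                  // one metadata entry in this block
        writeBytes(out, "avro.schema".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, schemaJson.getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0);                                  // end of the metadata map
        out.write(new byte[16], 0, 16);                     // placeholder sync marker (random in real files)
        return out.toByteArray();
    }

    // Extracts the schema JSON stored in the header.
    static String schemaOf(byte[] header) {
        try {
            byte[] schema = readHeaderMetadata(new ByteArrayInputStream(header)).get("avro.schema");
            return new String(schema, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String schema = "{\"type\":\"record\",\"name\":\"User\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
        System.out.println(schemaOf(sampleHeader(schema)));
    }
}
```

In the real code path none of this is needed, since `DataFileStream#initialize` does the header parsing itself. The sketch only shows why the writer schema is always available to the reader.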