[GitHub] [flink] tsreaper commented on pull request #18657: [FLINK-26001][avro] Implement ProjectableDecodingFormat for avro BulkDecodingFormat

GitBox Thu, 10 Feb 2022 01:35:08 -0800


tsreaper commented on pull request #18657:
URL: https://github.com/apache/flink/pull/18657#issuecomment-1034694706



   Hi @slinkydeveloper.
   
   Your observation is correct. Avro stores the writer schema in the file 
itself. `DataFileReader.openReader` calls `DataFileStream#initialize`, which 
reads that schema from the file header and sets it on the underlying datum 
reader as the actual (writer) schema. So we only need to give the reader our 
expected schema and leave the mapping between the two schemas to Avro.
   
   The schema in the header is mandatory for Avro files, so there is nothing 
to worry about. See the [Avro format 
specification](https://avro.apache.org/docs/current/spec.html#Object+Container+Files).
 To quote from the specification:
   > A file header consists of:
   > * Four bytes, ASCII 'O', 'b', 'j', followed by 1.
   > * file metadata, **including the schema.**
   > * The 16-byte, randomly-generated sync marker for this file.
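
   To make the header layout quoted above concrete, here is a minimal sketch that pulls the writer schema out of an object container file header using only the JDK. It is an illustration, not what Avro or this PR does internally: the class `AvroHeaderPeek` and its methods are hypothetical names, and it assumes the whole file (or at least its header) is already in a byte array. It relies on the spec's encoding rules: the metadata is an Avro map of string keys to byte values, lengths are zigzag varints, and the schema lives under the `avro.schema` key.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads the header of an Avro object container file. */
public class AvroHeaderPeek {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};
    private static final int SYNC_SIZE = 16;

    /** Decodes one zigzag-encoded variable-length long, as used by Avro. */
    static long readVarLong(ByteBuffer buf) {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf.get();
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo zigzag
            }
        }
        throw new IllegalArgumentException("varint longer than 10 bytes");
    }

    /** Reads a length-prefixed byte sequence (Avro "bytes" or "string"). */
    static byte[] readBytes(ByteBuffer buf) {
        long len = readVarLong(buf);
        if (len < 0 || len > buf.remaining()) {
            throw new IllegalArgumentException("bad length: " + len);
        }
        byte[] out = new byte[(int) len];
        buf.get(out);
        return out;
    }

    /** Parses magic, metadata map and sync marker; returns the metadata. */
    static Map<String, byte[]> readHeaderMetadata(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        byte[] magic = new byte[MAGIC.length];
        buf.get(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IllegalArgumentException("not an Avro object container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // A map is a series of blocks; a block count of zero ends the map.
        for (long count = readVarLong(buf); count != 0; count = readVarLong(buf)) {
            if (count < 0) {
                count = -count;
                readVarLong(buf); // negative count is followed by the block's byte size
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(buf), StandardCharsets.UTF_8);
                meta.put(key, readBytes(buf));
            }
        }
        if (buf.remaining() < SYNC_SIZE) {
            throw new IllegalArgumentException("missing 16-byte sync marker");
        }
        return meta;
    }

    /** Returns the writer schema JSON stored under the "avro.schema" key. */
    static String readWriterSchema(byte[] file) {
        byte[] schema = readHeaderMetadata(file).get("avro.schema");
        if (schema == null) {
            throw new IllegalArgumentException("header has no avro.schema entry");
        }
        return new String(schema, StandardCharsets.UTF_8);
    }
}
```

   For a real file one could call `AvroHeaderPeek.readWriterSchema(Files.readAllBytes(path))`. In practice there is no reason to do this by hand, since `DataFileStream` already reads the header; the sketch only shows that the schema is always there to be read.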

