slinkydeveloper commented on pull request #18657: URL: https://github.com/apache/flink/pull/18657#issuecomment-1034623498
Hi @JingsongLi @tsreaper, I have a question about this PR.

I stumbled upon this and, without Avro knowledge, initially thought it was wrong: I expected the schema inference to break with the projected data type (as the git history shows, I changed the type here from the projected one to the non-projected one). After carefully studying the Avro codebase, though, I understood that `GenericDatumReader` is quite smart: it takes the schema we provide (inferred from the projected data type) and reconciles it with the one from the file header, generating a decoder that reads only the fields we need (this seems to be done by `GenericDatumReader#getResolver`).

Is this correct, or is there something else I'm missing? Could there be cases where the file header doesn't have the schema, so that `GenericDatumReader` relies only on the projected data type and reading breaks?

Also, since this behavior is far from obvious to someone without deep knowledge of Avro (it's not even in their [javadocs](https://avro.apache.org/docs/1.10.2/api/java/org/apache/avro/generic/GenericDatumReader.html)), could you push a hotfix commit explaining this behavior?
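
To make the behavior in question concrete, here is a toy sketch in plain Java. It is not Avro code and does not use the Avro API; `ProjectionSketch`, `write`, and `readProjected` are hypothetical names, and every field is assumed to be a string. It only models the idea described above: values are stored in writer-schema order without field names or tags, so the decoder needs the writer schema (the one from the file header) to walk the bytes, and uses the reader schema (the projected one) to decide which values to keep.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of writer/reader schema resolution; NOT Avro's implementation.
final class ProjectionSketch {

    // Writer side: values are laid out back to back in writer-schema order.
    // No field names or tags are stored, only length-prefixed values.
    static byte[] write(List<String> writerFields, Map<String, String> record) {
        ByteBuffer buf = ByteBuffer.allocate(1024); // large enough for this demo
        for (String field : writerFields) {
            byte[] value = record.get(field).getBytes(StandardCharsets.UTF_8);
            buf.putInt(value.length);
            buf.put(value);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Reader side: the writer schema drives decoding, the reader (projected)
    // schema decides which decoded values are kept and in which order.
    static Map<String, String> readProjected(
            List<String> writerFields, List<String> readerFields, byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        Map<String, String> decoded = new HashMap<>();
        for (String field : writerFields) {
            // Every value must be consumed, even unwanted ones, to stay aligned.
            byte[] value = new byte[buf.getInt()];
            buf.get(value);
            if (readerFields.contains(field)) {
                decoded.put(field, new String(value, StandardCharsets.UTF_8));
            }
        }
        // Emit fields in reader-schema order. A field missing from the writer
        // schema ends up null here; this toy has no notion of default values.
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : readerFields) {
            result.put(field, decoded.get(field));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> writerSchema = List.of("id", "name", "email");
        byte[] data = write(writerSchema,
                Map.of("id", "1", "name", "Ada", "email", "ada@example.org"));

        // Projected read: only two of the three fields, in a different order.
        System.out.println(readProjected(writerSchema, List.of("email", "id"), data));
        // prints {email=ada@example.org, id=1}
    }
}
```

In this model, decoding with only the projected field list would misalign on the first skipped value, which is the failure mode the question above is worried about. Whether Avro can ever hit that case depends on whether a data file can lack the writer schema in its header; the sketch itself makes no claim either way.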