slinkydeveloper edited a comment on pull request #18657:
URL: https://github.com/apache/flink/pull/18657#issuecomment-1034623498


   Hi @JingsongLi @tsreaper, I have a question about this PR.
   
   I stumbled upon this and, without Avro knowledge, I initially thought it was 
wrong: I expected schema inference to break with the projected data type (if 
you look at the git history, I changed the type here from the projected one to 
the non-projected one). But after carefully studying the Avro codebase, I 
understood that `GenericDatumReader` is quite smart: it takes the schema we 
provide (inferred from the projected data type) and resolves it against the 
one from the file header, generating a decoder that reads only the fields we 
need (this seems to be done by `GenericDatumReader#getResolver`). Is this 
correct? Or is there something else I'm missing? Could there be cases where the 
file header doesn't have the schema, so that `GenericDatumReader` relies only 
on the projected data type, hence breaking the reading?
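
   To make the behavior described above concrete, here is a minimal sketch 
outside of Flink, assuming plain Avro on the classpath. The class name 
`ProjectedAvroRead`, the `User` schema, and the field names are hypothetical 
and not taken from this PR. It writes a container file with two fields and 
reads it back with a `GenericDatumReader` that was given only a one-field 
(projected) schema. As far as I can tell, `DataFileStream` passes the header 
schema to the reader through `setSchema`, so the projected schema ends up as 
the expected (reader) schema and the header schema as the actual (writer) one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Hypothetical demo class; not part of Flink or Avro.
public class ProjectedAvroRead {

    // Full schema, stored in the file header by the writer.
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");

    // Projected schema: same record name, only the field we want to read.
    static final Schema PROJECTED = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"}]}");

    // Writes two records with the full schema into an in-memory container file.
    static byte[] writeFile() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(WRITER))) {
            writer.create(WRITER, out);
            for (long i = 1; i <= 2; i++) {
                GenericRecord record = new GenericData.Record(WRITER);
                record.put("id", i);
                record.put("name", "user-" + i);
                writer.append(record);
            }
        }
        return out.toByteArray();
    }

    // Reads the file back, handing the datum reader only the projected schema.
    static List<GenericRecord> readProjected(byte[] file) throws IOException {
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<>(PROJECTED);
        List<GenericRecord> records = new ArrayList<>();
        try (DataFileStream<GenericRecord> in =
                 new DataFileStream<>(new ByteArrayInputStream(file), datumReader)) {
            for (GenericRecord record : in) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        for (GenericRecord record : readProjected(writeFile())) {
            // Each record carries the projected schema, so only "id" is present.
            System.out.println(record.get("id") + " / fields: "
                + record.getSchema().getFields().size());
        }
    }
}
```

   Note that this sketch only covers the container file format, where the 
header always carries a schema. It does not answer the question above for raw 
Avro-encoded data without a header.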
   
   Also, since this behavior is far from obvious to someone who doesn't have 
deep knowledge of Avro (it's not even in their 
[javadocs](https://avro.apache.org/docs/1.10.2/api/java/org/apache/avro/generic/GenericDatumReader.html)),
 could you push a hotfix commit with a comment explaining this behavior?



