[ https://issues.apache.org/jira/browse/FLINK-29527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630132#comment-17630132 ]
Sun Shun commented on FLINK-29527:
----------------------------------

[~lirui] could you please take a look at this PR when you are free? Thanks.

> Make unknownFieldsIndices work for single ParquetReader
> -------------------------------------------------------
>
>                 Key: FLINK-29527
>                 URL: https://issues.apache.org/jira/browse/FLINK-29527
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.16.0
>            Reporter: Sun Shun
>            Assignee: Sun Shun
>            Priority: Major
>              Labels: pull-request-available
>
> Since the improvement in FLINK-23715, Flink uses a collection named
> `unknownFieldsIndices` to track nonexistent fields. The collection is kept
> inside the `ParquetVectorizedInputFormat` and applied to all Parquet files
> under the given path.
> However, some fields may be missing only from some of the historical
> Parquet files while existing in the latest ones. Based on
> `unknownFieldsIndices`, Flink will always skip these fields, even though
> they exist in the later Parquet files.
> As a result, the values of these fields become empty whenever the fields
> are missing from some historical Parquet files.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
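
The behavior described in the issue can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Flink's actual code: the class `UnknownFieldsSketch`, the method `read`, and the use of a `Map` to stand in for a Parquet file are all hypothetical and exist only for illustration. It contrasts a set of unknown field indices shared across all files with one rebuilt for each file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the reported problem; not Flink code. */
public class UnknownFieldsSketch {

    // Reads one projected row per "file" (a map from field name to value).
    // When sharedAcrossFiles is true, the set of unknown field indices
    // outlives each file, which models the behavior reported in the issue.
    static List<List<String>> read(
            String[] projected,
            List<Map<String, String>> files,
            boolean sharedAcrossFiles) {
        Set<Integer> unknown = new HashSet<>();
        List<List<String>> rows = new ArrayList<>();
        for (Map<String, String> file : files) {
            if (!sharedAcrossFiles) {
                unknown = new HashSet<>(); // recompute for every file
            }
            List<String> row = new ArrayList<>();
            for (int i = 0; i < projected.length; i++) {
                if (!file.containsKey(projected[i])) {
                    unknown.add(i);
                }
                // Fields marked unknown are skipped and read as null.
                row.add(unknown.contains(i) ? null : file.get(projected[i]));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] projected = {"id", "city"};
        List<Map<String, String>> files = Arrays.asList(
                Map.of("id", "1"),                  // historical file, no "city"
                Map.of("id", "2", "city", "Oslo")); // newer file has "city"
        System.out.println(read(projected, files, true));  // [[1, null], [2, null]]
        System.out.println(read(projected, files, false)); // [[1, null], [2, Oslo]]
    }
}
```

In this model, sharing the set loses the `city` value of the newer file, while tracking unknown fields per file keeps it, which matches the intent stated in the issue title.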