Rajesh Balamohan created HIVE-25827:
---------------------------------------

             Summary: Parquet file footer is read multiple times, when multiple splits are created in same file
                 Key: HIVE-25827
                 URL: https://issues.apache.org/jira/browse/HIVE-25827
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan
         Attachments: image-2021-12-21-03-19-38-577.png


With large files, it is possible that multiple splits are created in the same file. With the current codebase, "ParquetRecordReaderBase" ends up reading the file footer once for each split.

This can be optimized so that footer information is not read multiple times for the same file.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java#L160]

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L91]

!image-2021-12-21-03-19-38-577.png|width=1363,height=1256!
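
One possible shape for the optimization, as a minimal sketch rather than a proposed patch: keep a per-JVM map from file path to parsed footer, so that record readers created for different splits of the same file share one footer read. The names `FooterCache` and `FooterCacheDemo` are hypothetical and do not exist in Hive or Parquet. The loader function stands in for whatever actually reads the footer, and a plain `String` stands in for the parsed footer metadata. The sketch assumes the splits are processed in the same JVM and that the file does not change between reads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-JVM footer cache; not part of Hive or Parquet. */
final class FooterCache<F> {
    private final Map<String, F> footers = new ConcurrentHashMap<>();
    private final Function<String, F> loader;

    /** @param loader reads and parses the footer for a file path (assumed to be expensive) */
    FooterCache(Function<String, F> loader) {
        this.loader = loader;
    }

    /** Returns the footer for filePath, invoking the loader at most once per path. */
    F get(String filePath) {
        return footers.computeIfAbsent(filePath, loader);
    }
}

final class FooterCacheDemo {
    public static void main(String[] args) {
        AtomicInteger footerReads = new AtomicInteger();

        // Stand-in loader: counts how often a "footer read" actually happens.
        FooterCache<String> cache = new FooterCache<>(path -> {
            footerReads.incrementAndGet();
            return "footer-of-" + path;
        });

        // Three splits over the same (made-up) file each ask for the footer.
        for (int split = 0; split < 3; split++) {
            cache.get("/warehouse/t/part-00000.parquet");
        }

        System.out.println("footer reads: " + footerReads.get()); // prints "footer reads: 1"
    }
}
```

The map in this sketch is unbounded and keyed by path only. A real change would likely need eviction and some check that the file has not been rewritten (for example, by including length and modification time in the key).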