Rajesh Balamohan created HIVE-25827:
---------------------------------------

             Summary: Parquet file footer is read multiple times, when multiple 
splits are created in same file
                 Key: HIVE-25827
                 URL: https://issues.apache.org/jira/browse/HIVE-25827
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan
         Attachments: image-2021-12-21-03-19-38-577.png

With large files, multiple splits can be created in the same file. With the 
current codebase, "ParquetRecordReaderBase" ends up reading the file footer 
once for each split. 

This can be optimized so that the footer information is read only once per 
file and reused across the splits of that file.
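
One way to picture the optimization is a small per-file footer cache. The 
sketch below is only an illustration under stated assumptions: "FooterCache", 
its methods, and the loader function are hypothetical names and are not part 
of Hive or Parquet. The loader stands in for whatever call actually reads the 
footer, and the cache only helps when splits of the same file are processed 
in the same JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of Hive: keeps one footer object per file path.
final class FooterCache<F> {
  private final Map<String, F> footers = new ConcurrentHashMap<>();
  // Assumed to wrap the real (expensive) footer read for a given file path.
  private final Function<String, F> loader;

  FooterCache(Function<String, F> loader) {
    this.loader = loader;
  }

  // Returns the cached footer; the loader runs only on the first request per path.
  F get(String filePath) {
    return footers.computeIfAbsent(filePath, loader);
  }

  // Drops the entry, for example once all splits of the file are done.
  void invalidate(String filePath) {
    footers.remove(filePath);
  }
}
```

Each record reader for a split would call get() with its file path instead of 
reading the footer directly. A real implementation would likely also need to 
bound the cache size and key on file length or modification time so that a 
rewritten file is not served a stale footer.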

Relevant code:

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java#L160]

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L91]

!image-2021-12-21-03-19-38-577.png|width=1363,height=1256!

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
