Rajesh Balamohan created HIVE-26699:
---------------------------------------

             Summary: Iceberg: S3 fadvise can hurt JSON parsing significantly in DWX
                 Key: HIVE-26699
                 URL: https://issues.apache.org/jira/browse/HIVE-26699
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan


Hive reads JSON metadata (TableMetadataParser::read()) multiple times, e.g. 
during query compilation, AM split computation, stats computation, and 
commits.

With large JSON metadata files (which grow with multiple inserts), these reads 
take much longer on S3A when "fs.s3a.experimental.input.fadvise" is set to 
"random", on the order of 10x. To be on the safe side, it would be good to set 
this to "normal" in the configs when reading Iceberg tables.
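A minimal sketch of the proposed override, assuming the session settings can be modelled as a plain Map<String, String> so the example runs without Hadoop on the classpath. With a real Hadoop Configuration the equivalent call would be conf.set("fs.s3a.experimental.input.fadvise", "normal"). The helper name withIcebergReadOverrides is hypothetical, and this issue does not say where in Hive the override would be applied.

```java
import java.util.HashMap;
import java.util.Map;

public class IcebergReadConfig {
    // S3A input policy key named in this issue.
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";

    /**
     * Hypothetical helper: returns a copy of the session settings with the S3A
     * input policy forced to "normal" for Iceberg table reads. The input map is
     * not modified, so non-Iceberg reads keep whatever policy they had.
     */
    static Map<String, String> withIcebergReadOverrides(Map<String, String> sessionConf) {
        Map<String, String> conf = new HashMap<>(sessionConf);
        conf.put(FADVISE_KEY, "normal");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> session = new HashMap<>();
        session.put(FADVISE_KEY, "random"); // the setting reported to slow JSON metadata reads
        Map<String, String> icebergConf = withIcebergReadOverrides(session);
        System.out.println(session.get(FADVISE_KEY));     // random
        System.out.println(icebergConf.get(FADVISE_KEY)); // normal
    }
}
```

Copying rather than mutating the shared settings is one way to limit the change to Iceberg reads, as the issue suggests, without altering the policy for other file reads in the same session.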



--
This message was sent by Atlassian Jira
(v8.20.10#820010)