[ 
https://issues.apache.org/jira/browse/HIVE-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857664#comment-16857664
 ] 

Krisztian Kasa commented on HIVE-21815:
---------------------------------------

[~ashutoshc]
The ReaderImpl parses the stats in its constructor if no FileMetadata was 
passed with the ReaderOptions. It uses its wrapped OrcTail instance to do 
this:
https://github.com/apache/orc/blob/2aeefca937722ef0be05674ced6fd7acb6a50278/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L381

Later, as [~gopalv] pointed out, a new OrcTail instance is created using the 
ReaderImpl, and the orcTail.getStripeStatistics() method parses the stats 
again.

Changing the code as mentioned in the previous comment fixes the issue.
Alternatively, the ReaderImpl could parse the stats on demand, but this 
requires a code change in ORC. OrcTail could also parse the stats on demand.
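
To illustrate what "parse on demand" could look like, here is a minimal sketch. 
It is not the ORC code: Lazy, TailSketch and the semicolon-separated "stats" 
string are hypothetical stand-ins, and the real classes deal with protobuf 
messages. The point is only that the constructor does no parsing and the first 
getter call parses once and caches the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyStatsSketch {

    /** Memoizes an expensive computation so that it runs at most once. */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> parser;
        private T value;
        private boolean parsed;

        Lazy(Supplier<T> parser) {
            this.parser = parser;
        }

        @Override
        public synchronized T get() {
            if (!parsed) {
                value = parser.get();
                parsed = true;
            }
            return value;
        }
    }

    /** Hypothetical stand-in for a file tail that holds serialized stats. */
    static final class TailSketch {
        // Counts how often the (pretend) parse actually runs.
        final AtomicInteger parseCount = new AtomicInteger();
        private final Lazy<List<String>> stripeStats;

        TailSketch(String serializedStats) {
            // Nothing is parsed here; the work is deferred to the first get().
            this.stripeStats = new Lazy<>(() -> {
                parseCount.incrementAndGet();
                return Arrays.asList(serializedStats.split(";"));
            });
        }

        List<String> getStripeStatistics() {
            return stripeStats.get();
        }
    }

    public static void main(String[] args) {
        TailSketch tail = new TailSketch("stripe0;stripe1");
        tail.getStripeStatistics();
        tail.getStripeStatistics();
        System.out.println("parses: " + tail.parseCount.get()); // prints "parses: 1"
    }
}
```

With this shape, a caller that never asks for the stats pays nothing, and a 
caller that asks repeatedly pays once per instance. Two separate instances 
would still each parse once, so sharing the parsed result between the reader 
and the tail would still need a change like the one in the previous comment.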

> Stats in ORC file are parsed twice
> ----------------------------------
>
>                 Key: HIVE-21815
>                 URL: https://issues.apache.org/jira/browse/HIVE-21815
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>            Reporter: Gopal V
>            Priority: Major
>         Attachments: orc-tail-getproto.png, tez-am-2x-protobuf.svg
>
>
> ORC record reader unnecessarily parses stats twice
> {code}
>       if (orcTail == null) {
>         Reader orcReader = OrcFile.createReader(file.getPath(),
>             OrcFile.readerOptions(context.conf)
>                 .filesystem(fs)
>                 .maxLength(AcidUtils.getLogicalLength(fs, file)));
>         orcTail = new OrcTail(orcReader.getFileTail(),
>             orcReader.getSerializedFileFooter(),
>             file.getModificationTime());
>         if (context.cacheStripeDetails) {
>           context.footerCache.put(
>               new FooterCacheKey(fsFileId, file.getPath()), orcTail);
>         }
>       }
>       stripes = orcTail.getStripes();
>       stripeStats = orcTail.getStripeStatistics();
> {code}
> We go from Reader -> OrcTail -> StripeStatistics.
> stripeStats is read out of the orcTail and is already read inside 
> orcReader.getStripeStatistics().
> !orc-tail-getproto.png!
>  [^tez-am-2x-protobuf.svg] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
