[ https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276527#comment-14276527 ]
Rui Li commented on HIVE-9367:
------------------------------
I just verified that the patch here reduces the getSplits time from 1 s to
less than 200 ms. The test table consists of a single 100 GB sequence file.
> CombineFileInputFormatShim#getDirIndices is expensive
> -----------------------------------------------------
>
> Key: HIVE-9367
> URL: https://issues.apache.org/jira/browse/HIVE-9367
> Project: Hive
> Issue Type: Improvement
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
> Attachments: HIVE-9367.1.patch
>
>
> [~lirui] found that we spend quite a lot of time in
> CombineFileInputFormatShim#getDirIndices. I looked into it, and it seems to me
> that we should be able to remove this method completely if we enhance
> CombineFileInputFormatShim a little.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)