[ https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276435#comment-14276435 ]
Jimmy Xiang commented on HIVE-9367:
-----------------------------------
Since we already have the FileStatus, we don't need to go back to the NameNode (NN)
to fetch it again: a FileStatus already records whether the path is a file or a directory.
Previously, getDirIndices fetched the FileStatus again, which cost an extra NN call
for each file. This patch saves one FileStatus lookup per file.
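The saving can be illustrated with a small, self-contained Java sketch. It does not use Hive's or Hadoop's real classes: SimpleFileStatus and SimulatedNameNode are hypothetical stand-ins for Hadoop's FileStatus and the NameNode, and the method names are illustrative only. The first method mirrors the old pattern (one status lookup per path); the second reuses statuses that were already returned by an earlier listing, so it issues no extra lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not Hive code: SimpleFileStatus and SimulatedNameNode
 * are stand-ins for Hadoop's FileStatus and the NameNode.
 */
public class DirIndicesSketch {

    /** Stand-in for FileStatus: a path plus its directory flag. */
    static final class SimpleFileStatus {
        final String path;
        final boolean isDir;

        SimpleFileStatus(String path, boolean isDir) {
            this.path = path;
            this.isDir = isDir;
        }
    }

    /** Stand-in for the NameNode: counts every status lookup it serves. */
    static final class SimulatedNameNode {
        private final Map<String, Boolean> isDirByPath;
        int statusCalls = 0;

        SimulatedNameNode(Map<String, Boolean> isDirByPath) {
            this.isDirByPath = isDirByPath;
        }

        SimpleFileStatus getFileStatus(String path) {
            statusCalls++; // each call models one round trip to the NameNode
            Boolean isDir = isDirByPath.get(path);
            if (isDir == null) {
                throw new IllegalArgumentException("No such path: " + path);
            }
            return new SimpleFileStatus(path, isDir);
        }
    }

    /** Old pattern: only paths are known, so fetch each status again. */
    static Set<Integer> dirIndicesByLookup(List<String> paths, SimulatedNameNode nn) {
        Set<Integer> dirs = new LinkedHashSet<>();
        for (int i = 0; i < paths.size(); i++) {
            if (nn.getFileStatus(paths.get(i)).isDir) {
                dirs.add(i);
            }
        }
        return dirs;
    }

    /** New pattern: statuses are already in hand, so no extra lookups. */
    static Set<Integer> dirIndicesFromStatuses(List<SimpleFileStatus> statuses) {
        Set<Integer> dirs = new LinkedHashSet<>();
        for (int i = 0; i < statuses.size(); i++) {
            if (statuses.get(i).isDir) {
                dirs.add(i);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        Map<String, Boolean> layout = new HashMap<>();
        layout.put("/warehouse/t/part-0", false);
        layout.put("/warehouse/t/sub", true);
        layout.put("/warehouse/t/part-1", false);
        List<String> paths = List.of("/warehouse/t/part-0", "/warehouse/t/sub", "/warehouse/t/part-1");

        // Statuses as an earlier listing step would already have returned them.
        List<SimpleFileStatus> listed = new ArrayList<>();
        for (String p : paths) {
            listed.add(new SimpleFileStatus(p, layout.get(p)));
        }

        SimulatedNameNode nn = new SimulatedNameNode(layout);
        Set<Integer> before = dirIndicesByLookup(paths, nn);
        int callsBefore = nn.statusCalls;

        Set<Integer> after = dirIndicesFromStatuses(listed);
        int callsAfter = nn.statusCalls - callsBefore;

        System.out.println("old: " + before + ", extra calls = " + callsBefore);
        System.out.println("new: " + after + ", extra calls = " + callsAfter);
    }
}
```

Running main prints `old: [1], extra calls = 3` followed by `new: [1], extra calls = 0`: both approaches find the same directory index, but only the first pays one simulated NameNode round trip per path.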
> CombineFileInputFormatShim#getDirIndices is expensive
> -----------------------------------------------------
>
> Key: HIVE-9367
> URL: https://issues.apache.org/jira/browse/HIVE-9367
> Project: Hive
> Issue Type: Improvement
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
> Attachments: HIVE-9367.1.patch
>
>
> [~lirui] found that we spend quite a lot of time in
> CombineFileInputFormatShim#getDirIndices. I looked into it, and it seems to me
> we should be able to remove this method completely if we enhance
> CombineFileInputFormatShim a little.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)