[
https://issues.apache.org/jira/browse/ORC-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146260#comment-16146260
]
ASF GitHub Bot commented on ORC-162:
------------------------------------
Github user prasanthj commented on the issue:
https://github.com/apache/orc/pull/163
Hive creates empty files only for MR, to support bucketed joins. Tez no longer
creates empty bucket files. Hive currently discards empty files during
split generation. We can do something similar in ORC's version of OrcInputFormat
(or add an EmptyFilePathPattern to ignore 0-length files or files whose length
is <= MAGIC.length). Creating splits for empty files is useless anyway. As for
calling the Reader directly with the path of an empty file, we can treat it as
an empty file with the schema struct<>.
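
To make the length check concrete, here is a rough sketch of the filtering step. It is not the actual OrcInputFormat code: the class EmptyOrcFileFilter and its methods are hypothetical names, and it uses java.nio.file rather than the Hadoop FileSystem API so that it stands alone. The only ORC-specific fact it relies on is that the file magic is the 3-byte string "ORC".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of ORC or Hive.
final class EmptyOrcFileFilter {
    // ORC files start with the 3-byte magic string "ORC".
    static final int MAGIC_LENGTH = "ORC".length();

    private EmptyOrcFileFilter() {}

    // A file no longer than the magic cannot contain any ORC data.
    static boolean isEffectivelyEmpty(long fileLength) {
        return fileLength <= MAGIC_LENGTH;
    }

    // Returns the candidates worth generating splits for, in input order.
    static List<Path> splittable(List<Path> candidates) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path candidate : candidates) {
            if (!isEffectivelyEmpty(Files.size(candidate))) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

In a real input format the same predicate would presumably be applied to the file length reported by the file system while listing the input directory, before any split is created.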
> Handle 0 byte files as empty ORC files
> --------------------------------------
>
> Key: ORC-162
> URL: https://issues.apache.org/jira/browse/ORC-162
> Project: ORC
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
>
> Hive often creates empty files for empty buckets, which can introduce
> significant load on the HDFS cluster. Therefore, the Hive developers made the
> Hive OrcOutputFormat and OrcInputFormat treat 0-byte ORC files as a special
> case. We need to make the other readers treat them reasonably.