[jira] [Commented] (ORC-162) Handle 0 byte files as empty ORC files

ASF GitHub Bot (JIRA) Thu, 31 Aug 2017 16:03:15 -0700

    [ https://issues.apache.org/jira/browse/ORC-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149740#comment-16149740 ]


ASF GitHub Bot commented on ORC-162:
------------------------------------

Github user prasanthj commented on a diff in the pull request:

    https://github.com/apache/orc/pull/163#discussion_r136468354
  
    --- Diff: java/mapreduce/src/java/org/apache/orc/mapred/OrcInputFormat.java ---
    @@ -151,4 +153,26 @@ public static void setSearchArgument(Configuration conf,
         return new OrcMapredRecordReader<>(file, buildOptions(conf,
             file, split.getStart(), split.getLength()));
       }
    +
    +  /**
    +   * Filter out the 0 byte files, so that we don't generate splits for the
    +   * empty ORC files.
    +   * @param job the job configuration
    +   * @return a list of files that need to be read
    +   * @throws IOException
    +   */
    +  protected FileStatus[] listStatus(JobConf job) throws IOException {
    +    FileStatus[] result = super.listStatus(job);
    +    List<FileStatus> ok = new ArrayList<>(result.length);
    --- End diff --
    
    Makes sense. I just noticed that the filter gets applied after listStatus anyway.
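The quoted diff stops partway through listStatus. A minimal, self-contained sketch of the filtering it sets up follows, assuming the method keeps only entries whose length is non-zero. To run without Hadoop on the classpath, it uses a hypothetical FileEntry class in place of Hadoop's FileStatus (whose getLen() returns the file size in bytes); the class name ZeroByteSplitFilter is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the listStatus() filtering step discussed in the review.
 * FileEntry is a hypothetical stand-in for org.apache.hadoop.fs.FileStatus.
 */
public class ZeroByteSplitFilter {

    /** Hypothetical stand-in for FileStatus: only path and length matter here. */
    public static final class FileEntry {
        private final String path;
        private final long len;

        public FileEntry(String path, long len) {
            this.path = path;
            this.len = len;
        }

        public String getPath() { return path; }

        /** Mirrors FileStatus.getLen(): the file size in bytes. */
        public long getLen() { return len; }
    }

    /**
     * Drops 0-byte files so that no splits are generated for them.
     * In the real override, the input would come from super.listStatus(job).
     */
    public static FileEntry[] dropEmptyFiles(FileEntry[] listed) {
        List<FileEntry> ok = new ArrayList<>(listed.length);
        for (FileEntry stat : listed) {
            // A 0-byte file is treated as an empty ORC file: there is nothing to read.
            if (stat.getLen() != 0) {
                ok.add(stat);
            }
        }
        return ok.toArray(new FileEntry[0]);
    }

    public static void main(String[] args) {
        FileEntry[] listed = {
            new FileEntry("bucket_00000", 1234),
            new FileEntry("bucket_00001", 0),
            new FileEntry("bucket_00002", 98)
        };
        FileEntry[] kept = dropEmptyFiles(listed);
        // Prints only the paths of the non-empty files.
        System.out.println(Arrays.toString(
            Arrays.stream(kept).map(FileEntry::getPath).toArray()));
    }
}
```

Filtering at listing time means no splits are created for the empty files, so the record reader never has to open them.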


> Handle 0 byte files as empty ORC files
> --------------------------------------
>
>                 Key: ORC-162
>                 URL: https://issues.apache.org/jira/browse/ORC-162
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> Hive often creates empty files for empty buckets, which can introduce
> significant load on the HDFS cluster. Therefore, the Hive developers made
> Hive's OrcOutputFormat and OrcInputFormat special-case 0-byte ORC files.
> We need to make the other readers handle them sensibly as well.
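The issue asks that readers other than Hive's treat a 0-byte file as an empty ORC file rather than as a corrupt one. One way a reader could do that is to check the file length before parsing anything. The sketch below assumes only that a valid ORC file begins with the three-byte magic "ORC"; the names EmptyOrcCheck, Kind, and classify are hypothetical, and a real reader would go on to read the file tail (postscript and footer), which is omitted here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical sketch: decide how a reader should treat a file before
 * attempting to parse it as ORC.
 */
public class EmptyOrcCheck {

    /** EMPTY: 0-byte file; ORC: starts with the ORC magic; NOT_ORC: anything else. */
    public enum Kind { EMPTY, ORC, NOT_ORC }

    // ORC files begin with the 3-byte magic "ORC".
    private static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    public static Kind classify(Path file) throws IOException {
        long size = Files.size(file);
        if (size == 0) {
            // 0-byte file: treat as an ORC file with no rows instead of failing.
            return Kind.EMPTY;
        }
        if (size < MAGIC.length) {
            // Too short to hold even the header magic.
            return Kind.NOT_ORC;
        }
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes (Java 9+) blocks until the requested bytes are read or EOF.
            if (in.readNBytes(head, 0, head.length) != head.length) {
                return Kind.NOT_ORC;
            }
        }
        return Arrays.equals(head, MAGIC) ? Kind.ORC : Kind.NOT_ORC;
    }
}
```

A reader that gets EMPTY back could hand out a record reader that yields no rows, which is the behavior the issue title ("Handle 0 byte files as empty ORC files") implies.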



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
