[jira] Commented: (HIVE-74) Hive can use CombineFileInputFormat for when the input are many small files

Joydeep Sen Sarma (JIRA) Wed, 18 Feb 2009 11:12:27 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674742#action_12674742
 ]


Joydeep Sen Sarma commented on HIVE-74:
---------------------------------------

Is it possible to do this in a way that lets Hive continue to compile against 
Hadoop 0.17, 0.18, and 0.19? I think this is almost a hard requirement.

One possibility is to add a new version of HiveInputSplit that compiles only 
against 0.20, and to include it conditionally so that it is built only for 
0.20 and later. HiveInputFormat.java, for example, already has a conditional 
tag (//[exclude_0_19]) that controls conditional code inclusion, though I am 
not sure how this was implemented.
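As an illustration only: the thread does not say how the //[exclude_0_19] tag 
is processed, so the sketch below assumes a simple build-time filter in which 
any source line carrying the tag is dropped when compiling against Hadoop 0.19 
or earlier and kept for 0.20 and later. The class VersionTagFilter and its 
filter method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the real Hive build mechanism behind //[exclude_0_19]
// is not described in this thread; this only illustrates the general idea.
public class VersionTagFilter {
    // Assumed semantics: a source line carrying this marker is dropped
    // when compiling against Hadoop 0.19 or an earlier release.
    static final String EXCLUDE_0_19 = "//[exclude_0_19]";

    /** Returns the source lines to compile for a Hadoop minor version (e.g. 17, 19, 20). */
    public static List<String> filter(List<String> source, int hadoopMinor) {
        List<String> kept = new ArrayList<>();
        for (String line : source) {
            if (hadoopMinor <= 19 && line.contains(EXCLUDE_0_19)) {
                continue; // 0.20-only code: skip it for older Hadoop releases
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> src = List.of(
            "import org.apache.hadoop.mapred.InputSplit;",
            "import org.apache.hadoop.mapred.lib.CombineFileSplit; //[exclude_0_19]");
        System.out.println("lines kept for 0.19: " + filter(src, 19).size());
        System.out.println("lines kept for 0.20: " + filter(src, 20).size());
    }
}
```

Under this assumption, a 0.20-only HiveInputSplit variant could be tagged line 
by line and would simply drop out of builds against older Hadoop releases.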

But even this is less than ideal. How will we deploy this on 0.17 (with the 
CombineFileSplit and related patches applied), unless we are not using the 
open-source version directly?

> Hive can use CombineFileInputFormat for when the input are many small files
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-74
>                 URL: https://issues.apache.org/jira/browse/HIVE-74
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.2.0
>
>         Attachments: hiveCombineSplit.patch, hiveCombineSplit.patch
>
>
> There are cases when the input to a Hive job consists of thousands of small 
> files. In this case, there is one mapper per file. Most of the overhead of 
> spawning all these mappers can be avoided if Hive uses CombineFileInputFormat, 
> introduced in HADOOP-4565.
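A minimal sketch of the idea behind this issue, assuming a simple greedy 
packing policy: small files are grouped into combined splits up to a size cap, 
so that one mapper reads many files. The class SmallFilePacker and the 128 MB 
cap are hypothetical; the real CombineFileInputFormat from HADOOP-4565 also 
groups blocks by node and rack locality, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the idea behind CombineFileInputFormat:
// pack many small files into a few splits so each mapper reads several files.
// Unlike the real Hadoop class, this sketch ignores node/rack locality.
public class SmallFilePacker {

    /**
     * Greedily packs file sizes (in bytes) into groups whose total does not
     * exceed maxSplitBytes. A single file larger than the cap gets its own group.
     */
    public static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split if adding this file would overflow it.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current); // flush the last, possibly partial, split
        }
        return splits;
    }

    public static void main(String[] args) {
        // 10,000 files of 1 MB each, combined into splits of at most 128 MB.
        List<Long> files = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            files.add(1L << 20);
        }
        int mappers = pack(files, 128L << 20).size();
        System.out.println("mappers without combining: " + files.size());
        System.out.println("mappers with combining: " + mappers);
    }
}
```

For 10,000 files of 1 MB each and a 128 MB cap, the packer produces 79 splits 
(78 full splits of 128 files plus one split of 16), i.e. 79 mappers instead of 
10,000.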

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
