[ https://issues.apache.org/jira/browse/HIVE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806408#action_12806408 ]

Namit Jain commented on HIVE-1118:
----------------------------------

Won't it lead to a lot of small files (about 1MB each), assuming the reducer 
output keeps the same size as the input data?
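
To put a number on that concern, a quick Python sketch (illustrative only; the 
10GB partition size below is an assumption, not a figure from the issue), 
assuming each reducer writes one output file of roughly the size of the input 
it consumes:

MB = 1024 * 1024
GB = 1024 * MB

total_input = 10 * GB          # assumed partition size, illustrative only
bytes_per_reducer = 1 * MB     # the small per-reducer setting in question

files_out = -(-total_input // bytes_per_reducer)  # ceiling division: one output file per reducer
print(files_out)                                  # -> 10240 files of ~1MB each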

> Hive merge map files should have different bytes/mapper setting
> ---------------------------------------------------------------
>
>                 Key: HIVE-1118
>                 URL: https://issues.apache.org/jira/browse/HIVE-1118
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>
> Currently, by default, we get one reducer for each 1GB of input data.
> This is also true for the conditional merge job that runs if the average 
> file size is smaller than a threshold.
> This makes those jobs very slow, because each reducer needs to consume 1GB 
> of data.
> Alternatively, we can just use that threshold to determine the number of 
> reducers per job (or introduce a new parameter).
> Say the threshold is 1MB: then we only start the merge job if the average 
> file size is less than 1MB, and the eventual result file size will be 
> around 1MB (or another small number).
> This will remove the extreme cases where we have thousands of empty files, 
> while still keeping normal jobs fast enough.
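
As a sketch of the sizing logic being proposed, a minimal Python illustration 
(not Hive's actual code; the function and parameter names below are made up 
for this example):

MB = 1024 * 1024
GB = 1024 * MB

def plan_merge_reducers(total_bytes, num_files,
                        avg_size_threshold=1 * MB,    # the "1MB" threshold above
                        bytes_per_reducer=1 * GB):    # today's default sizing
    # Only run the conditional merge job when files are small on average.
    avg_file_size = total_bytes / max(num_files, 1)
    if avg_file_size >= avg_size_threshold:
        return None                                   # files are big enough; skip the merge
    current = max(1, -(-total_bytes // bytes_per_reducer))    # 1GB per reducer (slow)
    proposed = max(1, -(-total_bytes // avg_size_threshold))  # threshold per reducer (fast)
    return current, proposed

# 100,000 files of ~100KB each (~9.5GB total): today 10 reducers grind through
# ~1GB each; under the proposal, ~9,766 reducers each write a ~1MB file.
print(plan_merge_reducers(100_000 * 100 * 1024, 100_000))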

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
