[ https://issues.apache.org/jira/browse/HIVE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806408#action_12806408 ]

Namit Jain commented on HIVE-1118:
----------------------------------

Won't it lead to a lot of small files (about 1MB each), assuming the reducer 
output keeps the same size as the input data?
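
To put a number on that concern, a quick Python sketch (illustrative only; the 
10GB partition size below is an assumption, not a figure from the issue), 
assuming each reducer writes one output file of roughly the size of the input 
it consumes:

MB = 1024 * 1024
GB = 1024 * MB

total_input = 10 * GB          # assumed partition size, illustrative only
bytes_per_reducer = 1 * MB     # the small per-reducer setting in question

files_out = -(-total_input // bytes_per_reducer)  # ceiling division: one output file per reducer
print(files_out)                                  # -> 10240 files of ~1MB each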

> Hive merge map files should have different bytes/mapper setting
> ---------------------------------------------------------------
>
>                 Key: HIVE-1118
>                 URL: https://issues.apache.org/jira/browse/HIVE-1118
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>
> Currently, by default, we get one reducer for each 1GB of input data.
> This is also true for the conditional merge job that runs if the average 
> file size is smaller than a threshold.
> This makes those jobs very slow, because each reducer needs to consume 1GB 
> of data.
> Alternatively, we can just use that threshold to determine the number of 
> reducers per job (or introduce a new parameter).
> Say the threshold is 1MB: then we only start the merge job if the average 
> file size is less than 1MB, and the eventual result file size will be 
> around 1MB (or another small number).
> This will remove the extreme cases where we have thousands of empty files, 
> while still keeping normal jobs fast enough.
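
As a sketch of the sizing logic being proposed, a minimal Python illustration 
(not Hive's actual code; the function and parameter names below are made up 
for this example):

MB = 1024 * 1024
GB = 1024 * MB

def plan_merge_reducers(total_bytes, num_files,
                        avg_size_threshold=1 * MB,    # the "1MB" threshold above
                        bytes_per_reducer=1 * GB):    # today's default sizing
    # Only run the conditional merge job when files are small on average.
    avg_file_size = total_bytes / max(num_files, 1)
    if avg_file_size >= avg_size_threshold:
        return None                                   # files are big enough; skip the merge
    current = max(1, -(-total_bytes // bytes_per_reducer))    # 1GB per reducer (slow)
    proposed = max(1, -(-total_bytes // avg_size_threshold))  # threshold per reducer (fast)
    return current, proposed

# 100,000 files of ~100KB each (~9.5GB total): today 10 reducers grind through
# ~1GB each; under the proposal, ~9,766 reducers each write a ~1MB file.
print(plan_merge_reducers(100_000 * 100 * 1024, 100_000))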

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
