[ https://issues.apache.org/jira/browse/HIVE-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011302#comment-13011302 ]

Adam Kramer commented on HIVE-1199:
-----------------------------------

+1. This is an even bigger issue when automating jobs that require tuning
the amount of resources. I have a job right now that needs about 10x the number
of mappers to run smoothly, and I would like to pipeline it, but the data size
keeps growing. So if I configure the split sizes by hand, I have to base them on
today's size of the table. That should be handled by Hive.

Ideally, this would mean that the split sizes are generated or recomputed
dynamically. A single variable, mapred.map.tasks.approx, could be set or left
unset; when it is set, Hive could do some quick math based on the size of the
table and dynamically set its own mapred.max.split.size and
mapred.min.split.size to get approximately the desired number of mappers. It
doesn't have to be perfect in order to be useful!
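The "quick math" proposed above can be sketched as follows. This is a minimal illustration, not Hive code: approx_split_settings is a hypothetical helper, mapred.map.tasks.approx is only the name proposed in this comment (not an existing property), and the sketch assumes each mapper reads roughly one split of (table size / requested mappers) bytes, ignoring block boundaries, partitions and file counts.

```python
import math


def approx_split_settings(table_bytes: int, approx_mappers: int) -> dict[str, int]:
    """Derive split-size settings aiming for roughly `approx_mappers` mappers.

    Hypothetical helper: assumes each mapper reads about one split of
    table_bytes / approx_mappers bytes, ignoring block boundaries,
    partition boundaries and the number of files.
    """
    if table_bytes <= 0:
        raise ValueError("table_bytes must be positive")
    if approx_mappers <= 0:
        raise ValueError("approx_mappers must be positive")
    # Ceiling division so that approx_mappers splits cover the whole table.
    split_bytes = math.ceil(table_bytes / approx_mappers)
    # Pin min and max to the same target so the split size cannot drift
    # far from it in either direction.
    return {
        "mapred.max.split.size": split_bytes,
        "mapred.min.split.size": split_bytes,
    }


# Example: a 100 GiB table where roughly 400 mappers are wanted.
settings = approx_split_settings(100 * 1024**3, 400)
for key, value in settings.items():
    print(f"set {key}={value};")
```

Because the split size is recomputed from the current table size on every run, a pipelined job keeps roughly the same mapper count as the data grows, which is the behaviour the comment asks for.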

> configure total number of mappers
> ---------------------------------
>
>                 Key: HIVE-1199
>                 URL: https://issues.apache.org/jira/browse/HIVE-1199
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>
> For users, it might be very difficult to control the number of mappers. There
> are many parameters, which confuse users, and for CombineHiveInputFormat a
> different set of parameters is required to control the number of mappers.
> In general, users should have a way to specify the total number of mappers,
> and that number should be obeyed. This will be very difficult to guarantee,
> since the query might be reading from a large number of partitions, and a
> mapper can only span one partition.
> What if the number of mappers that the user wants is less than the total
> number of partitions?
> It would very much be a heuristic. A simple use case that Joy had is as
> follows:
> A query needs to be run on one table, which has a lot of small files. It
> would be easy for him to specify the total number of mappers rather than the
> various rack-local/node-local CombineFileInputFormat parameters.
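One possible heuristic for the partition problem described in the issue is sketched below, assuming, as the description states, that a mapper cannot span partitions. allocate_mappers is a hypothetical helper, not part of Hive: it splits the requested total across partitions in proportion to their size while giving every non-empty partition at least one mapper, so the resulting total can exceed the request.

```python
def allocate_mappers(partition_bytes: list[int], requested: int) -> list[int]:
    """Heuristically spread `requested` mappers across partitions.

    Hypothetical helper: assumes a mapper cannot span partitions, so every
    non-empty partition needs at least one mapper. The total can therefore
    exceed `requested` when there are more partitions than requested mappers,
    and rounding means it is only approximately equal otherwise.
    """
    if requested <= 0:
        raise ValueError("requested must be positive")
    total = sum(partition_bytes)
    if total == 0:
        return [0] * len(partition_bytes)
    counts = []
    for size in partition_bytes:
        if size == 0:
            counts.append(0)  # nothing to read, so no mapper is needed
        else:
            # Proportional share of the request, rounded, but never below one.
            counts.append(max(1, round(requested * size / total)))
    return counts


print(allocate_mappers([10, 30, 60], 20))    # → [2, 6, 12]
print(allocate_mappers([1, 1, 1, 1, 1], 2))  # → [1, 1, 1, 1, 1]
```

With five equally sized partitions and only two mappers requested, the sketch falls back to one mapper per partition, which is exactly the case the issue raises: the requested total cannot be honoured, so the heuristic returns the smallest count that still covers every partition.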

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
