I gave mapred.min.split.size=1000000000L, i.e. 1 GB, and each input file is 233 MB with a block size of 64 MB. With all these values, I thought my split size would take effect and 4 input files would be combined into one 1 GB input split, but somehow this does not happen and I get 10 mappers, each corresponding to one 233 MB file.
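For reference, a minimal sketch of that arithmetic, assuming the split formula from Hadoop 0.20.2's FileInputFormat quoted later in this thread (the class name and the map-task count below are illustrative, not from the thread): splits are computed one file at a time, so even with a 1 GB minimum split size a 233 MB file yields a single 233 MB split, hence 10 mappers.

    public class SplitSizeSketch {
        // splitSize = max(minSize, min(goalSize, blockSize)), as quoted below
        static long computeSplitSize(long goalSize, long minSize, long blockSize) {
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        public static void main(String[] args) {
            long fileSize  = 233L * 1024 * 1024;  // each input file: 233 MB
            long blockSize = 64L * 1024 * 1024;   // dfs.block.size: 64 MB
            long minSize   = 1000000000L;         // mapred.min.split.size: ~1 GB
            long totalSize = 10 * fileSize;       // 10 input files
            long goalSize  = totalSize / 1;       // totalSize / mapred.map.tasks (assume 1)

            long splitSize = computeSplitSize(goalSize, minSize, blockSize); // ~1 GB
            // FileInputFormat walks each file independently, so a split never
            // spans files: the only split of a 233 MB file is 233 MB.
            System.out.println("computed splitSize    = " + splitSize);
            System.out.println("actual per-file split = " + Math.min(splitSize, fileSize));
        }
    }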
On Wed, May 25, 2011 at 7:59 AM, Mapred Learn <mapred.le...@gmail.com> wrote:

> Thanks Juwei !
> I will go through this..
>
> Sent from my iPhone
>
> On May 25, 2011, at 7:51 AM, Juwei Shi <shiju...@gmail.com> wrote:
>
> The following are suitable for Hadoop 0.20.2.
>
> 2011/5/25 Juwei Shi <shiju...@gmail.com>
>
>> The input split size is determined by mapred.min.split.size,
>> dfs.block.size, and mapred.map.tasks:
>>
>>   goalSize  = totalSize / mapred.map.tasks
>>   minSize   = max(mapred.min.split.size, minSplitSize)
>>   splitSize = max(minSize, min(goalSize, dfs.block.size))
>>
>> minSplitSize is determined by each InputFormat, such as
>> SequenceFileInputFormat.
>>
>> You may want to refer to FileInputFormat.java for more details.
>>
>> 2011/5/25 Mapred Learn <mapred.le...@gmail.com>
>>
>>> Resending ====>
>>>
>>> > Hi,
>>> > I have a few input splits that are a few MB in size.
>>> > I want to submit 1 GB of input to every mapper. Does anyone know
>>> > how I can do that?
>>> > Currently each mapper gets one input split, which results in many
>>> > small map-output files.
>>> >
>>> > I tried setting -Dmapred.min.split.size=<number>, but it still
>>> > does not take effect.
>>> >
>>> > Thanks,
>>> > -JJ
>>
>> --
>> - Juwei Shi
>
> --
> - Juwei Shi (史巨伟)
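As a usage sketch, and assuming Hadoop 0.20.2's JobConf API, this is how the two job-side knobs in the formula above would be set (the property name and class exist in that release; the values are just the ones discussed in this thread). Note that these settings can only enlarge splits within a single file; they do not merge separate files:

    import org.apache.hadoop.mapred.JobConf;

    public class SplitConfigSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // minSize in the formula: raising it above dfs.block.size produces
            // larger splits, up to (but never beyond) the size of one file.
            conf.setLong("mapred.min.split.size", 1000000000L); // ~1 GB
            // goalSize = totalSize / mapred.map.tasks; fewer map tasks means a
            // larger goalSize (the count is only a hint to the framework).
            conf.setNumMapTasks(1);
            // effective splitSize = max(minSize, min(goalSize, dfs.block.size))
        }
    }

Because FileInputFormat never combines multiple files into one split, packing several small files into a ~1 GB split would need a combining input format (for example CombineFileInputFormat, available in 0.20.x) rather than these properties alone.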