I gave mapred.min.split.size=1000000000L, i.e. 1 GB, and each input file is 233 MB with a block size of 64 MB. With all these values, I thought my split size would take effect and 4 input files would be combined into one 1 GB input split, but somehow this does not happen and I get 10 mappers, each corresponding to one 233 MB file.
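For reference, a minimal sketch of that arithmetic, assuming the split formula from Hadoop 0.20.2's FileInputFormat quoted later in this thread (the class name and the map-task count below are illustrative, not from the thread): splits are computed one file at a time, so even with a 1 GB minimum split size a 233 MB file yields a single 233 MB split, hence 10 mappers.

    public class SplitSizeSketch {
        // splitSize = max(minSize, min(goalSize, blockSize)), as quoted below
        static long computeSplitSize(long goalSize, long minSize, long blockSize) {
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        public static void main(String[] args) {
            long fileSize  = 233L * 1024 * 1024;  // each input file: 233 MB
            long blockSize = 64L * 1024 * 1024;   // dfs.block.size: 64 MB
            long minSize   = 1000000000L;         // mapred.min.split.size: ~1 GB
            long totalSize = 10 * fileSize;       // 10 input files
            long goalSize  = totalSize / 1;       // totalSize / mapred.map.tasks (assume 1)

            long splitSize = computeSplitSize(goalSize, minSize, blockSize); // ~1 GB
            // FileInputFormat walks each file independently, so a split never
            // spans files: the only split of a 233 MB file is 233 MB.
            System.out.println("computed splitSize    = " + splitSize);
            System.out.println("actual per-file split = " + Math.min(splitSize, fileSize));
        }
    }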
On Wed, May 25, 2011 at 7:59 AM, Mapred Learn <mapred.le...@gmail.com> wrote:

> Thanks Juwei !
> I will go through this..
>
> Sent from my iPhone
>
> On May 25, 2011, at 7:51 AM, Juwei Shi <shiju...@gmail.com> wrote:
>
> The following are suitable for Hadoop 0.20.2.
>
> 2011/5/25 Juwei Shi <shiju...@gmail.com>
>
>> The input split size is determined by mapred.min.split.size,
>> dfs.block.size, and mapred.map.tasks:
>>
>>   goalSize  = totalSize / mapred.map.tasks
>>   minSize   = max(mapred.min.split.size, minSplitSize)
>>   splitSize = max(minSize, min(goalSize, dfs.block.size))
>>
>> minSplitSize is determined by each InputFormat, such as
>> SequenceFileInputFormat.
>>
>> You may want to refer to FileInputFormat.java for more details.
>>
>> 2011/5/25 Mapred Learn <mapred.le...@gmail.com>
>>
>>> Resending ====>
>>>
>>> > Hi,
>>> > I have a few input splits that are a few MB in size.
>>> > I want to submit 1 GB of input to every mapper. Does anyone know
>>> > how I can do that?
>>> > Currently each mapper gets one input split, which results in many
>>> > small map-output files.
>>> >
>>> > I tried setting -Dmapred.min.split.size=<number>, but it still
>>> > does not take effect.
>>> >
>>> > Thanks,
>>> > -JJ
>>
>> --
>> - Juwei Shi
>
> --
> - Juwei Shi (史巨伟)
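As a usage sketch, and assuming Hadoop 0.20.2's JobConf API, this is how the two job-side knobs in the formula above would be set (the property name and class exist in that release; the values are just the ones discussed in this thread). Note that these settings can only enlarge splits within a single file; they do not merge separate files:

    import org.apache.hadoop.mapred.JobConf;

    public class SplitConfigSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // minSize in the formula: raising it above dfs.block.size produces
            // larger splits, up to (but never beyond) the size of one file.
            conf.setLong("mapred.min.split.size", 1000000000L); // ~1 GB
            // goalSize = totalSize / mapred.map.tasks; fewer map tasks means a
            // larger goalSize (the count is only a hint to the framework).
            conf.setNumMapTasks(1);
            // effective splitSize = max(minSize, min(goalSize, dfs.block.size))
        }
    }

Because FileInputFormat never combines multiple files into one split, packing several small files into a ~1 GB split would need a combining input format (for example CombineFileInputFormat, available in 0.20.x) rather than these properties alone.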