Thanks All!

On 11 Jul 2012 19:07, "Bejoy KS" <[email protected]> wrote:
> Hi Manoj
>
> Block size is at the HDFS storage level, whereas split size is the amount
> of data processed by each mapper while running a MapReduce job (one split
> is the data processed by one mapper). One or more HDFS blocks can
> contribute to a split. Splits are determined by the InputFormat together
> with the min and max split size properties.
>
> As Arun mentioned, use CombineFileInputFormat and adjust the min and max
> split size properties to control/limit the number of mappers.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> From: Manoj Babu <[email protected]>
> Date: Wed, 11 Jul 2012 18:17:41 +0530
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: Re: Mapper basic question
>
> Hi Tariq / Arun,
>
> The no of blocks (splits) = total file size / HDFS block size * replication value.
> The no of splits is again nothing but the blocks here.
>
> Other than increasing the block size (input splits), is it possible to
> limit the no of mappers?
>
> Cheers!
> Manoj.
>
> On Wed, Jul 11, 2012 at 6:06 PM, Arun C Murthy <[email protected]> wrote:
>
>> Take a look at CombineFileInputFormat - this will create 'meta splits'
>> which include multiple small splits, thus reducing #maps which are run.
>>
>> Arun
>>
>> On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:
>>
>> Hi,
>>
>> The no of mappers depends on the no of blocks. Is it possible to limit
>> the no of mappers without increasing the HDFS block size?
>>
>> Thanks in advance.
>>
>> Cheers!
>> Manoj.
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
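To make the block-vs-split distinction in the thread concrete, here is a minimal, self-contained Java sketch of the split-sizing rule FileInputFormat applies: split size = max(minSize, min(maxSize, blockSize)). The file length, block size, and property values are made-up examples, and note that the replication factor plays no part in the split count; replicas are extra copies of the same block, not extra input.

```java
// Sketch of how FileInputFormat sizes splits for a single splittable file.
// The formula mirrors Hadoop's computeSplitSize(); all numbers are examples.
public class SplitSizeDemo {

    // split size = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of map tasks for one splittable file of the given length.
    static long numSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        long blockSize = 64L * 1024 * 1024;   // 64 MB HDFS block (the 1.x default)
        long minSize = 1L;                    // default min split size
        long maxSize = Long.MAX_VALUE;        // default max split size

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        System.out.println("default split size  = " + splitSize);            // 67108864
        System.out.println("maps for a 1 GB file = " + numSplits(oneGiB, splitSize)); // 16

        // Raising the minimum split size above the block size yields fewer,
        // larger splits - and therefore fewer mappers - without touching HDFS.
        long biggerMin = 256L * 1024 * 1024;  // 256 MB minimum split
        long bigger = computeSplitSize(blockSize, biggerMin, maxSize);
        System.out.println("split size, 256 MB min = " + bigger);            // 268435456
        System.out.println("maps for a 1 GB file   = " + numSplits(oneGiB, bigger)); // 4
    }
}
```

Replication never enters the arithmetic, which is why raising the replication value changes storage overhead but not the number of mappers.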

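The min/max split size knobs that Bejoy and Arun point at can be passed on the command line to any job whose driver uses GenericOptionsParser. A hedged sketch, assuming a Hadoop 1.x-era (2012) setup: the jar name, driver class, and paths are placeholders, and the property names shown are the old-API ones (newer releases use mapreduce.input.fileinputformat.split.minsize and .maxsize instead).

```shell
# Pack small files into larger combined splits so fewer mappers run.
# mapred.max.split.size caps each combined split (here 256 MB);
# mapred.min.split.size sets a floor (here 128 MB).
hadoop jar my-job.jar MyJobDriver \
    -D mapred.max.split.size=268435456 \
    -D mapred.min.split.size=134217728 \
    /input/path /output/path
```

The same limits can be set in the driver itself via the job Configuration; the -D form is just the quickest way to experiment with mapper counts per run.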