Actually, I just realized that numSplits can't be set definitively. Even if I write numSplits = 5, it's only a hint.
Then how come MultiFileInputFormat claims to use MultiFileSplit to hold one file per split? Or is that also just a hint?

Maha

On Dec 15, 2010, at 2:13 AM, maha wrote:

> Hi everyone,
>
> Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat which is
> supposed to put each file from the input directory in a SEPARATE split. So
> the number of Maps is equal to the number of input files. Yet, what I get is
> that each split contains multiple paths of input files, hence # of maps is <
> # of input files. Is it because "MultiFileInputFormat" is deprecated?
>
> In my implemented myMultiFileInputFormat I have only the following:
>
> public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
> JobConf job, Reporter reporter){
> return (new myRecordReader((MultiFileSplit) split));
> }
>
> Yet, in myRecordReader, for example one split has the following;
>
> " /tmp/input/file1:0+300
> /tmp/input/file2:0+199 "
>
> instead of each line in its own split.
>
> Why? Any clues?
>
> Thank you,
> Maha
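
To make the question above concrete, here is a toy model in plain Java (no Hadoop classes) of what might be going on. It assumes the input format packs files into at most numSplits groups of roughly equal total length, which would explain one split holding both file1 and file2. The names SplitSketch, groupBySize and onePerFile are made up for illustration; this is a sketch of the suspected behaviour, not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each "file" is just its length; a split is a list of file indices.
final class SplitSketch {

    // Packs files, in order, into groups of roughly equal total length.
    // Assumes numSplits >= 1 and positive file lengths; under those
    // assumptions it yields at most numSplits groups.
    static List<List<Integer>> groupBySize(long[] lengths, int numSplits) {
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        List<List<Integer>> splits = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long cumulative = 0;
        for (int i = 0; i < lengths.length; i++) {
            current.add(i);
            cumulative += lengths[i];
            // Close the group once the running total reaches the next
            // multiple of (total / numSplits); integer form avoids rounding.
            if (cumulative * numSplits >= total * (splits.size() + 1)) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            // Only reachable when the positive-length assumption is violated.
            splits.add(current);
        }
        return splits;
    }

    // The behaviour the original message expected: one file per split,
    // regardless of any numSplits hint.
    static List<List<Integer>> onePerFile(long[] lengths) {
        List<List<Integer>> splits = new ArrayList<>();
        for (int i = 0; i < lengths.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            splits.add(single);
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] lengths = {300L, 199L}; // file1 and file2 from the message above
        System.out.println(groupBySize(lengths, 1)); // prints [[0, 1]]
        System.out.println(groupBySize(lengths, 2)); // prints [[0], [1]]
        System.out.println(onePerFile(lengths));     // prints [[0], [1]]
    }
}
```

If this model is right, getting one map per file would mean building the splits myself (for instance by overriding getSplits in myMultiFileInputFormat and returning one MultiFileSplit per input path) rather than relying on numSplits.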