Actually, I just realized that numSplits can't be set definitively. Even 
if I write numSplits = 5, it's just a hint. 

Then how come MultiFileInputFormat claims to use MultiFileSplit to hold one 
file per split? Or is that also just a hint?
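The grouping behavior described in this thread can be modeled roughly as follows. This is a simplified, stdlib-only Java sketch, not the Hadoop 0.20.2 source. It assumes that getSplits treats numSplits as a target count and packs consecutive files into groups of roughly equal total length (the MultiFileInputFormat Javadoc describes the returned splits as having nearly equal content length). The class and method names (SplitGroupingModel, groupFiles) are hypothetical, and the file lengths are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of length-balanced file grouping; not Hadoop source.
public class SplitGroupingModel {

    /** Packs consecutive files into at most numSplits groups of roughly equal total length. */
    static List<List<String>> groupFiles(String[] names, long[] lengths, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        long total = 0;
        for (long len : lengths) total += len;
        double avgPerSplit = (double) total / numSplits;

        int next = 0;        // index of the first file not yet assigned
        long cumulative = 0; // total length already assigned to splits
        for (int s = 0; s < numSplits && next < names.length; s++) {
            List<String> split = new ArrayList<>();
            long goal = (long) ((s + 1) * avgPerSplit); // cumulative target for this split
            boolean last = (s == numSplits - 1);        // the last split takes whatever is left
            while (next < names.length) {
                split.add(names[next]);
                cumulative += lengths[next];
                next++;
                if (!last && cumulative >= goal) break;
            }
            splits.add(split);
        }
        return splits;
    }

    public static void main(String[] args) {
        String[] names = {"file1", "file2", "file3", "file4"};
        long[] lengths = {300, 199, 250, 120};
        // Requesting 2 splits packs four files into two groups.
        System.out.println(groupFiles(names, lengths, 2));            // [[file1, file2], [file3, file4]]
        // Requesting one split per file yields single-file groups for these lengths.
        System.out.println(groupFiles(names, lengths, names.length)); // [[file1], [file2], [file3], [file4]]
        // With skewed lengths, the same request can still collapse into one group.
        System.out.println(groupFiles(new String[]{"a", "b", "c"}, new long[]{10, 10, 1000}, 3)); // [[a, b, c]]
    }
}
```

Under this model, even setting numSplits equal to the number of files does not guarantee one file per split when file sizes are skewed, which is consistent with numSplits behaving as a hint rather than a hard count.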

Maha

On Dec 15, 2010, at 2:13 AM, maha wrote:

> Hi everyone,
> 
>  Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat, which is 
> supposed to put each file from the input directory in a SEPARATE split, so 
> the number of maps equals the number of input files. Yet what I get is 
> that each split contains multiple input file paths, hence # of maps < 
> # of input files. Is it because MultiFileInputFormat is deprecated?
> 
>  In my implementation, myMultiFileInputFormat, I have only the following:
> 
> public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>     JobConf job, Reporter reporter) {
>   return new myRecordReader((MultiFileSplit) split);
> }
> 
> Yet in myRecordReader, for example, one split contains the following:
> 
>  " /tmp/input/file1:0+300
>    /tmp/input/file2:0+199  "
> 
>  instead of each file being in its own split.
> 
>    Why? Any clues?
> 
>          Thank you,
>              Maha
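Because a MultiFileSplit can carry several paths (as the split listed in the quoted message shows), a record reader built on it needs to walk every path in the split, not just the first. The sketch below models that pattern with the Java standard library only. The class MultiPathLineReader and the in-memory Map standing in for the file system are hypothetical; a real Hadoop reader would typically open each path through a FileSystem instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stdlib-only model: one reader that walks every path carried by a split.
public class MultiPathLineReader {
    private final List<String> paths;           // paths carried by one split
    private final Map<String, String> contents; // in-memory stand-in for the file system
    private int current = -1;                   // index of the path being read
    private BufferedReader reader;              // reader for the current path

    MultiPathLineReader(List<String> paths, Map<String, String> contents) {
        this.paths = paths;
        this.contents = contents;
    }

    /** Returns the next line across all paths, or null once every path is exhausted. */
    String next() {
        try {
            while (true) {
                if (reader != null) {
                    String line = reader.readLine();
                    if (line != null) return line;
                    reader.close(); // current path exhausted, move on to the next one
                    reader = null;
                }
                current++;
                if (current >= paths.size()) return null;
                reader = new BufferedReader(new StringReader(contents.get(paths.get(current))));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> split = List.of("/tmp/input/file1", "/tmp/input/file2");
        Map<String, String> fs = Map.of(
            "/tmp/input/file1", "a\nb",
            "/tmp/input/file2", "c");
        MultiPathLineReader r = new MultiPathLineReader(split, fs);
        List<String> lines = new ArrayList<>();
        for (String line = r.next(); line != null; line = r.next()) lines.add(line);
        System.out.println(lines); // [a, b, c]
    }
}
```

Reading ends only after the last path is exhausted, so a single map task processes every file packed into its split.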
