Re: Deprecated ... damaged?
Hi Allen, and thanks for responding.

Your answer actually gave me another clue: I set numSplits = numFiles*100; in myInputFormat and it worked :D

Do you think there are any side effects of doing that?

Thank you,
Maha

On Dec 15, 2010, at 12:16 PM, Allen Wittenauer wrote:

> On Dec 15, 2010, at 2:13 AM, maha wrote:
>
>> Hi everyone,
>>
>> Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat which is
>> supposed to put each file from the input directory in a SEPARATE split.
>
> Is there some reason you don't just use normal InputFormat with an
> extremely high min.split.size?
Re: Deprecated ... damaged?
On Dec 15, 2010, at 2:13 AM, maha wrote:

> Hi everyone,
>
> Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat which is
> supposed to put each file from the input directory in a SEPARATE split.

Is there some reason you don't just use normal InputFormat with an extremely high min.split.size?
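The suggestion above can be illustrated with a small model of how a plain FileInputFormat in the old org.apache.hadoop.mapred API sizes its splits. This is only a sketch under stated assumptions: that the split size is max(minSize, min(goalSize, blockSize)), that the minimum comes from the mapred.min.split.size property, and that a FileInputFormat split never spans more than one file. The class and method names are hypothetical, and the remainder handling is simplified, so it is not the Hadoop source.

```java
// Hypothetical, simplified model of FileInputFormat split sizing (old mapred API).
// Assumption: splitSize = max(minSize, min(goalSize, blockSize)).
class SplitSizeModel {

    // goalSize is assumed to be totalInputBytes / numSplits (the numSplits hint).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits a single non-empty file is cut into for a given split size.
    // Written without (fileLength + splitSize - 1) so a huge splitSize cannot overflow.
    static long splitsForFile(long fileLength, long splitSize) {
        return fileLength / splitSize + (fileLength % splitSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // assumed HDFS block size
        long goalSize = (300 + 199) / 2;      // two files from the thread, numSplits hint of 2

        // Default-like minimum of 1 byte: the 300-byte file is cut in two.
        long small = computeSplitSize(goalSize, 1, blockSize);
        System.out.println(splitsForFile(300, small));  // prints 2

        // Extremely high minimum: every file fits in exactly one split.
        long huge = computeSplitSize(goalSize, Long.MAX_VALUE, blockSize);
        System.out.println(splitsForFile(300, huge));   // prints 1
        System.out.println(splitsForFile(199, huge));   // prints 1
    }
}
```

Under these assumptions, a very large minimum split size gives one split per file without any custom InputFormat, because the split size then exceeds every file length and files are never merged into a shared split.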
Re: Deprecated ... damaged?
Actually, I just realized that numSplits can't be set "definitively". Even if I write numSplits = 5, it's just a hint.

Then how come MultiFileInputFormat claims to use MultiFileSplit to contain one file per split? Or is that also just a hint?

Maha

On Dec 15, 2010, at 2:13 AM, maha wrote:

> Hi everyone,
>
> Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat which is
> supposed to put each file from the input directory in a SEPARATE split. So
> the number of Maps is equal to the number of input files. Yet, what I get is
> that each split contains multiple paths of input files, hence # of maps is <
> # of input files. Is it because "MultiFileInputFormat" is deprecated?
>
> In my implemented myMultiFileInputFormat I have only the following:
>
>     public RecordReader getRecordReader(InputSplit split,
>             JobConf job, Reporter reporter) {
>         return (new myRecordReader((MultiFileSplit) split));
>     }
>
> Yet, in myRecordReader, for example one split has the following;
>
> " /tmp/input/file1:0+300
>   /tmp/input/file2:0+199 "
>
> instead of each line in its own split.
>
> Why? Any clues?
>
> Thank you,
> Maha
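To make the "hint" behavior concrete, here is a minimal sketch of size-based grouping. It assumes MultiFileInputFormat packs files, in order, toward an average length of totalLength / numSplits per split. The class name, method name, and exact packing rule are hypothetical simplifications, not a copy of the Hadoop source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: pack file lengths, in order, into at most numSplits groups
// whose cumulative size tracks (splitIndex + 1) * averageLengthPerSplit.
class SplitHintModel {

    static List<List<Long>> group(long[] lengths, int numSplits) {
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        double avgPerSplit = (double) total / numSplits;

        List<List<Long>> splits = new ArrayList<>();
        long cumulative = 0;
        int next = 0;
        for (int s = 0; s < numSplits && next < lengths.length; s++) {
            long goal = (long) ((s + 1) * avgPerSplit);
            List<Long> current = new ArrayList<>();
            while (next < lengths.length) {
                current.add(lengths[next]);
                cumulative += lengths[next];
                next++;
                // Stop once this split reaches its goal; the last split takes the rest.
                if (cumulative >= goal && s < numSplits - 1) {
                    break;
                }
            }
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] lengths = {300, 199};  // file1 and file2 from the thread

        // A hint of 1 puts both files in the same split.
        System.out.println(group(lengths, 1));    // prints [[300, 199]]

        // A hint of numFiles * 100 makes the per-split goal tiny, so each file stands alone.
        System.out.println(group(lengths, 200));  // prints [[300], [199]]
    }
}
```

In this model the number of splits never exceeds the number of files, so an oversized hint only shrinks the per-split goal. Whether the real class behaves the same way for every mix of file sizes is worth checking against the 0.20.2 source.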
Deprecated ... damaged?
Hi everyone,

Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat which is supposed to put each file from the input directory in a SEPARATE split. So the number of Maps is equal to the number of input files. Yet, what I get is that each split contains multiple paths of input files, hence # of maps is < # of input files. Is it because "MultiFileInputFormat" is deprecated?

In my implemented myMultiFileInputFormat I have only the following:

    public RecordReader getRecordReader(InputSplit split,
            JobConf job, Reporter reporter) {
        return (new myRecordReader((MultiFileSplit) split));
    }

Yet, in myRecordReader, for example one split has the following;

" /tmp/input/file1:0+300
  /tmp/input/file2:0+199 "

instead of each line in its own split.

Why? Any clues?

Thank you,
Maha