It's addInputPath, which adds a Path object to the list of inputs.
So I would do the filtering first and then add the paths one by one in a loop.
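
A minimal sketch of that filter-then-add loop. It is written in plain Java with no Hadoop classes so that it runs on its own; the class name, method names, folder, and file names are all made up for illustration. In a real job, the loop in buildInputs is roughly where addInputPath would be called once per selected path.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the idea; all names here are hypothetical.
final class PrefixInputSelection {

    // Returns the file names that start with the given prefix, in listing order.
    static List<String> selectByPrefix(List<String> fileNames, String prefix) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith(prefix)) {
                selected.add(name);
            }
        }
        return selected;
    }

    // Builds the list of job inputs by adding one full path per selected file.
    // Assumption: in an actual job this is where each path would be handed
    // to addInputPath instead of being appended to a list of strings.
    static List<String> buildInputs(String folder, List<String> fileNames, String prefix) {
        List<String> inputs = new ArrayList<>();
        for (String name : selectByPrefix(fileNames, prefix)) {
            inputs.add(folder + "/" + name);
        }
        return inputs;
    }
}
```

Keeping the selection separate from the loop means the same prefix test could later be moved into an InputFormat or a PathFilter without touching the job setup.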

But I need an InputFormat anyway because I have my own RecordReader.
In the end I would have to put the same logic in a different place. From my
point of view it is better to put the filtering logic there,
because my InputFormat is also a RecordReader factory, and it will
instantiate a different RecordReader based on the filter.
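
Roughly what I mean by a factory that also filters, as a plain-Java sketch that runs without Hadoop. The interface, the two reader classes, and the factory below are hypothetical stand-ins, not the real InputFormat or RecordReader types; only the prefixes Elementary and Source come from my original question.

```java
// Hypothetical stand-in for a RecordReader; it only reports its kind.
interface SketchReader {
    String kind();
}

final class ElementaryReader implements SketchReader {
    public String kind() { return "elementary"; }
}

final class SourceReader implements SketchReader {
    public String kind() { return "source"; }
}

// Hypothetical stand-in for the InputFormat: the same prefix test decides
// both whether a file is accepted and which reader is instantiated for it.
final class ReaderFactorySketch {

    // True if the file belongs to one of the groups the job should process.
    static boolean accepts(String fileName) {
        return fileName.startsWith("Elementary") || fileName.startsWith("Source");
    }

    // Returns the reader matching the file's group, or fails for other files.
    static SketchReader readerFor(String fileName) {
        if (fileName.startsWith("Elementary")) {
            return new ElementaryReader();
        }
        if (fileName.startsWith("Source")) {
            return new SourceReader();
        }
        throw new IllegalArgumentException("No reader for file: " + fileName);
    }
}
```

With the filter and the reader selection in one place, adding a new file group only needs one more branch and one more reader class.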

cheers

On 14/04/2008, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>  You don't really need a custom input format, I don't think.
>
>  You should be able to just add multiple inputs, one at a time after
>  filtering them outside hadoop.
>
>
>  On 4/14/08 10:59 AM, "Alfonso Olias Sanz" <[EMAIL PROTECTED]>
>  wrote:
>
>
>  > ok thanks for the info :)
>  >
>  > On 11/04/2008, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>  >>
>  >>  On Apr 11, 2008, at 10:21 AM, Amar Kamat wrote:
>  >>
>  >>
>  >>> A simpler way is to use
>  >>> FileInputFormat.setInputPathFilter(JobConf, PathFilter).
>  >>> Look at org.apache.hadoop.fs.PathFilter for details on PathFilter
>  >>> interface.
>  >>>
>  >>
>  >>  +1, although FileInputFormat.setInputPathFilter is
>  >> available only in hadoop-0.17 and above... like Amar mentioned previously,
>  >> you'd have to have a custom InputFormat prior to hadoop-0.17.
>  >>
>  >>  Arun
>  >>
>  >>
>  >>
>  >>> Amar
>  >>> Alfonso Olias Sanz wrote:
>  >>>
>  >>>> Hi
>  >>>> I have a general-purpose input folder that is used as input in a
>  >>>> Map/Reduce task. That folder contains files grouped by names.
>  >>>>
>  >>>> I want to configure the JobConf so that I can filter the files that
>  >>>> have to be processed from that path (i.e. files whose names start with
>  >>>> Elementary, or Source, etc.), so the task function will only process
>  >>>> those files. So if the folder contains 1000 files and only 50 start
>  >>>> with Elementary, only those 50 will be processed by my task.
>  >>>>
>  >>>> Setting up different input folders, each containing one group of
>  >>>> files, would be an alternative, but I cannot do that.
>  >>>>
>  >>>>
>  >>>> Any idea?
>  >>>>
>  >>>> thanks
>  >>>>
>  >>>>
>  >>>
>  >>>
>  >>
>  >>
>
>
