NLineInputFormat is ideal for this purpose. Each split will be N lines of input (where N is configurable), so each mapper can retrieve N files for insertion into HDFS. You can set the number of reducers to zero, since the mappers write their output directly.
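Something along these lines should work as the driver (a sketch against the 0.19-era org.apache.hadoop.mapred API; the class names FtpFetchJob/FtpFetchMapper and the lines-per-map value of 10 are just placeholders, and the actual fetch logic is elided):

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class FtpFetchJob {

  /** Hypothetical mapper: each input value is one ftp url; fetch it into HDFS. */
  public static class FtpFetchMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text url,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // ... retrieve the url and copy the file into HDFS here ...
      out.collect(url, new Text("fetched"));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(FtpFetchJob.class);
    conf.setJobName("ftp-fetch");

    // Each mapper gets N lines of the url list as its split (N = 10 here).
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 10);

    // Map-only job: no reducers needed.
    conf.setNumReduceTasks(0);

    conf.setMapperClass(FtpFetchMapper.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0])); // file of ftp urls, one per line
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
```

Tune linespermap so the number of map tasks gives you good distribution across the cluster, as Jason suggests below.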
Tom

On Tue, Feb 3, 2009 at 4:23 AM, jason hadoop <jason.had...@gmail.com> wrote:
> If you have a large number of ftp urls spread across many sites, simply set
> that file to be your hadoop job input, and force the input split to be a
> size that gives you good distribution across your cluster.
>
>
> On Mon, Feb 2, 2009 at 3:23 PM, Steve Morin <steve.mo...@gmail.com> wrote:
>
>> Does any one have a good suggestion on how to submit a hadoop job that
>> will split the ftp retrieval of a number of files for insertion into
>> hdfs?  I have been searching google for suggestions on this matter.
>> Steve
>>