I have a very similar question: how do I recursively list all the files under a given directory so that every file gets processed by MapReduce? And if I just copy them to the output, say, is there any problem with dropping them all into the same output directory in HDFS? To use a bad example, Windows chokes when there are many files in one directory.

Thank you,
Mark
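P.S. For concreteness, this is roughly the kind of recursive listing I have in mind, assuming the org.apache.hadoop.fs.FileSystem API (just a sketch, not tested):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecursiveLister {

    // Walk the tree under 'dir' and collect every plain file.
    public static void listFiles(FileSystem fs, Path dir, List<Path> result)
            throws IOException {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDir()) {
                listFiles(fs, status.getPath(), result);   // recurse into subdirectory
            } else {
                result.add(status.getPath());              // plain file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        List<Path> files = new ArrayList<Path>();
        listFiles(fs, new Path(args[0]), files);
        for (Path p : files) {
            System.out.println(p);
        }
    }
}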
On Thu, Jan 22, 2009 at 8:28 AM, Zak, Richard [USA] <zak_rich...@bah.com> wrote:

> I am seeing the MultiFileInputFormat and the MultipleOutputFormat
> Input/Output formats for the Job configuration. How can I properly use
> them? I had previously used the default Input and Output Format types,
> which, for my PDF concatenation project, merely reduced Hadoop to a
> scheduler.
>
> The idea is, per directory, to concatenate all PDFs in said directory into
> one PDF, and for this I'm using iText.
>
> How can I use these Format types? What would my input into the mapper be,
> and what would my InputKeyValue and OutputKeyValue classes be?
> Thank you! I can't find documentation on these other than the Javadoc,
> which doesn't help much.
>
> Richard J. Zak
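For anyone following along, here is a rough sketch of the kind of per-directory output I imagine MultipleOutputFormat enabling, using its concrete subclass MultipleTextOutputFormat from org.apache.hadoop.mapred.lib. The class name PerDirectoryOutputFormat is made up for illustration, and real binary PDF output would presumably need a custom RecordWriter rather than a text-based one:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical sketch: name each output file after the directory the records
// came from, instead of the default part-NNNNN files.
public class PerDirectoryOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // 'key' is assumed to carry the source directory name;
        // 'name' is the default leaf name (e.g. part-00000), kept to avoid collisions.
        return key.toString() + "/" + name;
    }
}

Wiring it in would then just be conf.setOutputFormat(PerDirectoryOutputFormat.class) on the JobConf, if I understand the old mapred API correctly.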