Hadoop with many input/output files?

Zak, Richard [USA] Thu, 22 Jan 2009 06:35:17 -0800

I see that MultiFileInputFormat and MultipleOutputFormat are available as
input/output formats for the job configuration.  How can I use them
properly?  I had previously used the default input and output formats,
which, for my PDF concatenation project, merely reduced Hadoop to a
scheduler.
 
The idea is to concatenate, for each directory, all the PDFs in that
directory into a single PDF; I'm using iText for this.
 
How can I use these format types?  What would the input to my mapper be,
and what would my input and output key/value classes be?
Thank you!  I can't find any documentation on these other than the
Javadoc, which doesn't help much.
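One possible way to frame the key/value question is to key each PDF path by its parent directory, so that every file from one directory ends up in the same reduce call, where iText could merge them into one output named after that directory. The sketch below only simulates that map-and-group step in plain Java; it makes no Hadoop or iText calls, and the class and method names are hypothetical. In a real job the grouping would be done by Hadoop's shuffle, and the key and value would plausibly both be Text (directory, file path), but that is an assumption, not something the format classes require.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch: simulates the "map" step (key each PDF by
// its parent directory) plus the grouping Hadoop's shuffle would perform, so
// each resulting entry corresponds to one reduce call = one merged PDF.
final class GroupPdfsByDirectory {

    // key = parent directory, value = sorted list of PDF paths in that directory
    static Map<String, List<String>> groupByDirectory(List<String> paths) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String p : paths) {
            if (!p.toLowerCase().endsWith(".pdf")) {
                continue; // ignore anything that is not a PDF
            }
            Path parent = Paths.get(p).getParent();
            String dir = (parent == null) ? "" : parent.toString();
            groups.computeIfAbsent(dir, k -> new ArrayList<>()).add(p);
        }
        // Sort each directory's files so concatenation order is deterministic.
        for (List<String> files : groups.values()) {
            files.sort(null); // natural (lexicographic) String order
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "/docs/a/2.pdf", "/docs/a/1.pdf", "/docs/b/x.pdf", "/docs/b/notes.txt");
        // Each line stands for one reducer invocation that would merge its files.
        groupByDirectory(input).forEach((dir, files) ->
                System.out.println(dir + " -> " + files));
    }
}
```

With this keying, the output side only needs to name each merged file after its key (the directory), which is the kind of per-key file naming MultipleOutputFormat is meant to support.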
 
Richard J. Zak
