Once I run a map-reduce job, I get output files of the form
part-r-00000, part-r-00001, ...

In many cases the output is significantly smaller than the original input;
take the classic word count, for example.

In most cases I want to combine the output into a single file, which may well
live not on HDFS but on a more accessible file system.
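
To make the goal concrete, here is a minimal sketch of the consolidation I
have in mind. It assumes the part files have already been copied from HDFS
into a local directory, and that plain concatenation in file-name order is
acceptable. The function name merge_part_files and both path arguments are
made up for illustration; this is not an existing Hadoop API.

```python
from pathlib import Path
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate part-r-NNNNN files from output_dir into merged_path.

    Assumes output_dir is a local copy of the job's output directory and
    that merged_path lies outside the set of files matching part-r-*.
    Returns the number of part files that were merged.
    """
    # Sorting by name keeps the reducers' partition order (00000, 00001, ...).
    # Marker files such as _SUCCESS do not match the pattern and are skipped.
    parts = sorted(Path(output_dir).glob("part-r-*"))
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the bytes so large part files are not held in memory.
                shutil.copyfileobj(src, merged)
    return len(parts)
```

A call such as merge_part_files("wordcount-output", "wordcount.txt") would
then leave one ordinary file on the local file system; both names are
placeholders.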

Are there standard libraries or approaches for consolidating reducer
output?

A second map-reduce job that takes the output directory as its input is an OK
start, but it would need a single reducer that writes a real file rather than
ordinary reduce output.

Are there standard libraries or approaches for this?

-- 
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA
