Not with HDFS, since only one process may write to a single file (and it's
not visible until the file is closed). In fact, it's worse than that: even
the same process that's writing the file cannot see or read it until after
it's done.
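
For what it's worth, you can see that visibility rule with a few lines
against the FileSystem API. A minimal sketch (the path is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class VisibilityDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/visibility-demo.txt");  // hypothetical path

        FSDataOutputStream out = fs.create(p);
        out.writeBytes("hello\n");
        // Any reader opening p here -- including this same process --
        // still sees an empty file; the bytes only become visible
        // after close().
        out.close();
      }
    }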

If you have multiple reducers, you are simply out of luck and will have to
run a separate job to copy the data out.
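That copy step doesn't have to be a full map-reduce job, though. If a
single local file is all you need, FileUtil.copyMerge will concatenate the
part files for you. A rough sketch, with hypothetical paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeParts {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        // Concatenate part-r-00000, part-r-00001, ... from the job's
        // output directory into a single file on the local filesystem.
        FileUtil.copyMerge(hdfs, new Path("/user/you/wordcount-out"),
                           local, new Path("/tmp/wordcount.txt"),
                           false /* don't delete the source parts */,
                           conf, null);
      }
    }

The shell equivalent is:

    hadoop fs -getmerge /user/you/wordcount-out /tmp/wordcount.txt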


On Sat, Oct 23, 2010 at 3:08 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:

> Once I run a map-reduce job, I get output in the form of
> part-r-00000, part-r-00001, ...
>
> In many cases the output is significantly smaller than the original input -
> take the classic word count example.
>
> In most cases I want to combine the output into a single file that may well
> not live on HDFS but on a more accessible file system.
>
> Are there standard libraries or approaches for consolidating reducer
> output?
>
> A second map-reduce job taking the output directory as its input is an OK
> start, but it would need a single reducer that writes a real file rather
> than standard reduce output.
>
> Are there standard libraries or approaches to this?
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave Ne
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Institute for Systems Biology
> Seattle WA
>
