I've written an MR job with multiple outputs. The "normal" output goes to files named part-XXXXX, and my secondary output records go to files I've chosen to name "ExceptionDocuments" (which therefore come out as "ExceptionDocuments-m-XXXXX").
I'd like to pull merged copies of these files to my local filesystem: two separate merged files, one containing the "normal" output and one containing the ExceptionDocuments output. But since Hadoop writes both outputs to files in the same output directory, when I issue "hadoop dfs -getmerge" I get a single file containing both. To work around this, I have to move files around on HDFS so that the different outputs sit in different directories. Is this the best (or only) way to deal with this? It would be better if Hadoop offered the option of writing different named outputs to different output directories, or if getmerge let you specify a filename prefix for the files to be merged. Thanks!
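One alternative I've considered, in case it's useful to others: instead of rearranging files on HDFS, copy the whole job output directory down once (e.g. with "hadoop dfs -get") and do the prefix-filtered merge locally. This is just a sketch of that idea; merge_by_prefix is a hypothetical helper, not anything Hadoop provides:

```python
import os


def merge_by_prefix(src_dir, prefix, dest_path):
    """Concatenate every file in src_dir whose name starts with prefix
    into a single local file, in sorted (part-number) order.

    Hypothetical stand-in for a "getmerge with a filename prefix";
    assumes the HDFS output dir has already been copied locally.
    Returns the list of file names that were merged, in order.
    """
    names = sorted(n for n in os.listdir(src_dir) if n.startswith(prefix))
    with open(dest_path, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as src:
                out.write(src.read())
    return names
```

So after copying the output directory to, say, local_out, calling merge_by_prefix("local_out", "part-", "normal.merged") and merge_by_prefix("local_out", "ExceptionDocuments-", "exceptions.merged") would give the two merged files separately. The downside versus the HDFS-move approach is that everything gets pulled locally unmerged first.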