[
https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227701#comment-13227701
]
Jon Grasmeder commented on MRUNIT-13:
-------------------------------------
Jim - thanks for looking into this!
If it helps, here is the use case that I am working on: the reducer "bins"
input records into outputFileA, outputFileB or both. MultipleOutputs works
great for this, but I couldn't test using MRUnit. You probably already know
this, but the first issue is a ClassCastException in setup() when the
MockOutputCommitter is cast to a FileOutputCommitter. The second issue
is a NullPointerException in reduce() when trying to perform the write(Text,
Text, String) using the MultipleOutputs instance.
As for output, I was planning to open a DataInputStream to read the result
files written by MultipleOutputs. As you mentioned, it would be easier for the
user if you can return Strings.
The challenge is that each call to reduce() could 'write' multiple records to
several 'files'. (In my case, I only write a single record to each file, but
one can envision scenarios that require multiple writes per reduce() call.)
One solution may be to store (in JobConf) a pointer to a HashMap<String,
List>, where the String is the baseOutputPath (modified if needed by the
namedOutput parameter) and the List holds the key/value Pairs emitted by
write().
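A rough sketch of that idea in plain Java, just to make the data structure
concrete. This is not MRUnit or Hadoop code: the class name CapturedOutputs
and the method recordsFor() are made up for illustration, keys and values are
plain Strings rather than Text, and the JobConf wiring is left out entirely.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the files MultipleOutputs would write.
// Maps each baseOutputPath to the key/value pairs written to it, in order.
public class CapturedOutputs {

    private final Map<String, List<Map.Entry<String, String>>> records =
            new LinkedHashMap<>();

    // Mirrors the argument order of write(key, value, baseOutputPath),
    // but only records the pair instead of touching the file system.
    public void write(String key, String value, String baseOutputPath) {
        records.computeIfAbsent(baseOutputPath, path -> new ArrayList<>())
               .add(new SimpleImmutableEntry<>(key, value));
    }

    // Returns the pairs captured for one output path (empty if none).
    public List<Map.Entry<String, String>> recordsFor(String baseOutputPath) {
        return Collections.unmodifiableList(
                records.getOrDefault(
                        baseOutputPath,
                        Collections.<Map.Entry<String, String>>emptyList()));
    }

    public static void main(String[] args) {
        CapturedOutputs outputs = new CapturedOutputs();

        // One reduce() call that "bins" a record into both outputs.
        outputs.write("key1", "value1", "outputFileA");
        outputs.write("key1", "value1", "outputFileB");

        // A later reduce() call that writes to outputFileA only.
        outputs.write("key2", "value2", "outputFileA");

        System.out.println("outputFileA: " + outputs.recordsFor("outputFileA"));
        System.out.println("outputFileB: " + outputs.recordsFor("outputFileB"));
    }
}
```

A test could then assert on the list for each path instead of opening a
DataInputStream on the result files. Keeping a List per path preserves write
order and allows several writes to the same 'file' from one reduce() call.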
- Jon Grasmeder
> Add support for MultipleOutputs
> -------------------------------
>
> Key: MRUNIT-13
> URL: https://issues.apache.org/jira/browse/MRUNIT-13
> Project: MRUnit
> Issue Type: Sub-task
> Affects Versions: 0.5.0
> Reporter: E. Sammer
> Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.