[jira] [Commented] (MRUNIT-13) Add support for MultipleOutputs

Jon Grasmeder (Commented) (JIRA) Mon, 12 Mar 2012 10:18:01 -0700

    [ 
https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227701#comment-13227701
 ]


Jon Grasmeder commented on MRUNIT-13:
-------------------------------------

Jim - thanks for looking into this!  

If it helps, here is the use case that I am working on: the reducer "bins" 
input records into outputFileA, outputFileB or both.  MultipleOutputs works 
great for this, but I couldn't test it using MRUnit.  You probably already 
know this, but the first issue is a ClassCastException in setup() when the 
MockOutputCommitter is cast to a FileOutputCommitter.  The second issue is a 
NullPointerException in reduce() when trying to perform the write(Text, Text, 
String) using the MultipleOutputs instance.

As for output, I was planning to open a DataInputStream to read the result 
files written by MultipleOutputs.  As you mentioned, it would be easier for the 
user if you could return Strings.   

The challenge is that each call to reduce() could 'write' multiple records to 
several 'files'.  (In my case, I only write a single record to each file, but 
one can envision scenarios that require multiple writes per reduce() call.)  
One solution may be to store (in JobConf) a pointer to a 
HashMap<String, List>, where the String is the baseOutputPath (modified if 
needed by the namedOutput parameter) and the List is the set of key/value 
Pairs emitted by write(). 
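
To make the idea concrete, here is a rough, self-contained sketch of such a 
map-backed collector.  It is only an illustration, not MRUnit code: the names 
MultipleOutputsSketch, MockMultipleOutputs, Pair and getOutput are made up for 
this example, plain Strings stand in for Text, and the JobConf wiring is left 
out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultipleOutputsSketch {

    /** Hypothetical key/value holder; stands in for whatever pair type the framework uses. */
    static final class Pair<K, V> {
        final K key;
        final V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + value + ")";
        }
    }

    /** Hypothetical in-memory stand-in that records writes per baseOutputPath. */
    static final class MockMultipleOutputs<K, V> {
        // baseOutputPath -> records written to that 'file', in write order
        private final Map<String, List<Pair<K, V>>> outputs =
                new HashMap<String, List<Pair<K, V>>>();

        /** Mirrors the shape of write(key, value, baseOutputPath) but stores the record in memory. */
        void write(K key, V value, String baseOutputPath) {
            List<Pair<K, V>> records = outputs.get(baseOutputPath);
            if (records == null) {
                records = new ArrayList<Pair<K, V>>();
                outputs.put(baseOutputPath, records);
            }
            records.add(new Pair<K, V>(key, value));
        }

        /** Returns the records written to one path, or an empty list if nothing was written. */
        List<Pair<K, V>> getOutput(String baseOutputPath) {
            List<Pair<K, V>> records = outputs.get(baseOutputPath);
            if (records == null) {
                return Collections.<Pair<K, V>>emptyList();
            }
            return Collections.unmodifiableList(records);
        }
    }

    public static void main(String[] args) {
        MockMultipleOutputs<String, String> mos = new MockMultipleOutputs<String, String>();

        // One reduce() call may write to several 'files', and more than once per file.
        mos.write("k1", "v1", "outputFileA");
        mos.write("k1", "v1", "outputFileB");
        mos.write("k2", "v2", "outputFileA");

        System.out.println("outputFileA: " + mos.getOutput("outputFileA"));
        System.out.println("outputFileB: " + mos.getOutput("outputFileB"));
    }
}
```

A test could then assert on getOutput("outputFileA") directly instead of 
reading result files back through a DataInputStream.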

- Jon Grasmeder
                
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
