[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

Amareshwari Sriramadasu (JIRA) Fri, 07 Aug 2009 04:20:42 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740509#action_12740509 ]


Amareshwari Sriramadasu commented on MAPREDUCE-370:
---------------------------------------------------

bq. To achieve this, I think we could port MultipleOutputs, and change the semantics of getCollector() in the multi name case, so that the multi name is the full name of the output file. This method is typically invoked in the reduce() method, where the key and value are available, and can be used to form the name.
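A minimal, self-contained sketch of the quoted semantics, assuming a hypothetical in-memory `FullNameOutputs` class in place of the real collector (no Hadoop classes are used): the reducer builds the complete file name from the key and the name is used unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the collector: the name passed in is
// used verbatim as the output file name (no suffix, no named-output lookup).
class FullNameOutputs {
    // file name -> records written to it
    private final Map<String, List<String>> files = new TreeMap<>();

    void write(String fileName, String key, String value) {
        files.computeIfAbsent(fileName, n -> new ArrayList<>()).add(key + "\t" + value);
    }

    Map<String, List<String>> files() {
        return files;
    }
}

class FullNameReduceSketch {
    // Hypothetical reduce(): the key is available here, so the reducer itself
    // forms the complete output name (the "country-" prefix is illustrative only).
    static void reduce(String key, Iterable<String> values, FullNameOutputs out) {
        String fileName = "country-" + key;
        for (String value : values) {
            out.write(fileName, key, value);
        }
    }
}
```

Calling `reduce("uk", ...)` and then `reduce("us", ...)` leaves two entries, `country-uk` and `country-us`; the naming scheme is an assumption chosen for illustration.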
Tom, are you saying that we should not have a protected method, generateOutputName(), that could be overridden to provide this functionality? If so, we need a way to tell whether the name refers to a named output (I mean multiNamedOutputs) or is an arbitrary name, so that we know which output format to use for writing.
We should have something like:
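{code}
  // Write to outputPath using the OutputFormat configured for namedOutput.
  public <K,V> void write(String namedOutput, String outputPath, K key, V value)
          throws IOException, InterruptedException;
  // Write to an arbitrary outputPath; with no named output, presumably the
  // job's own OutputFormat is used.
  public <K,V> void write(String outputPath, K key, V value)
          throws IOException, InterruptedException;
{code}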
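The two proposed overloads can be sketched without Hadoop as follows. `MultipleOutputsSketch`, `RecordFormat`, `addNamedOutput` and the in-memory file map are hypothetical stand-ins for the real output formats and record writers, and the checked `IOException`/`InterruptedException` are dropped because nothing touches disk. It illustrates the point above: the named output selects the format, the path only selects the file, and the path-only overload is assumed to fall back to the job's own format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultipleOutputsSketch {
    // Hypothetical stand-in for an OutputFormat: it only renders one record.
    interface RecordFormat {
        String render(Object key, Object value);
    }

    private final RecordFormat defaultFormat;                          // the job's own format
    private final Map<String, RecordFormat> namedFormats = new HashMap<>();
    private final Map<String, List<String>> files = new HashMap<>();  // path -> lines

    MultipleOutputsSketch(RecordFormat defaultFormat) {
        this.defaultFormat = defaultFormat;
    }

    void addNamedOutput(String namedOutput, RecordFormat format) {
        namedFormats.put(namedOutput, format);
    }

    // namedOutput picks the format; outputPath only chooses the file.
    <K, V> void write(String namedOutput, String outputPath, K key, V value) {
        RecordFormat format = namedFormats.get(namedOutput);
        if (format == null) {
            throw new IllegalArgumentException("Undefined named output: " + namedOutput);
        }
        append(outputPath, format.render(key, value));
    }

    // No named output given: assume the job's default format is used.
    <K, V> void write(String outputPath, K key, V value) {
        append(outputPath, defaultFormat.render(key, value));
    }

    private void append(String path, String line) {
        files.computeIfAbsent(path, p -> new ArrayList<>()).add(line);
    }

    List<String> linesOf(String path) {
        return files.getOrDefault(path, new ArrayList<>());
    }
}
```

For example, with a tab-separated default format and a named output `csv` registered with a comma format, `write("csv", "stats/2009", "a", 1)` stores `a,1`, while `write("raw/2009", "b", 2)` stores `b` and `2` separated by a tab.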

bq. Applications that want to add a unique suffix can call FileOutputFormat#getUniqueFile() themselves.
This should be done by the framework to support counters, as explained earlier.
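One way to read the counters argument, sketched under assumptions: if the framework receives the base name before adding the unique suffix, it can count records per base name and then resolve the task-unique file name itself. `UniqueNameSketch` is hypothetical; its suffix only imitates the `part-r-00000` style of task output names and is not FileOutputFormat#getUniqueFile() itself.

```java
import java.util.HashMap;
import java.util.Map;

class UniqueNameSketch {
    private final boolean isMap;
    private final int partition;
    // Record counts keyed by the base name the user passed in (before any suffix).
    private final Map<String, Long> counters = new HashMap<>();

    UniqueNameSketch(boolean isMap, int partition) {
        this.isMap = isMap;
        this.partition = partition;
    }

    // Imitated suffix: task type ("m" or "r") plus a zero-padded partition number.
    String uniqueFile(String name, String extension) {
        return String.format("%s-%s-%05d%s", name, isMap ? "m" : "r", partition, extension);
    }

    // Hypothetical framework-side write: count under the base name, then
    // return the unique file the record would go to.
    String write(String baseName) {
        counters.merge(baseName, 1L, Long::sum);
        return uniqueFile(baseName, "");
    }

    long count(String baseName) {
        return counters.getOrDefault(baseName, 0L);
    }
}
```

On reduce partition 3, `uniqueFile("stats", "")` gives `stats-r-00003`, and two `write("stats")` calls leave a count of 2 under `stats`; if applications appended the suffix themselves, the framework would only ever see the suffixed names.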

> Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-370
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-370.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
