[jira] Commented: (MAPREDUCE-372) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.

Owen O'Malley (JIRA) Mon, 20 Jul 2009 23:15:39 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733494#action_12733494 ]


Owen O'Malley commented on MAPREDUCE-372:
-----------------------------------------

We really should make the context objects into interfaces. 

I agree that the new API makes this harder, because the run method means you 
have to support a pull model instead of a push model. The easiest way to do it 
would be to use blocking queues and run each stage of the pipeline in a 
separate thread. The first mapper would read from the RecordReader (via the 
"real" context) and write its outputs into a BlockingQueue. The next stage 
would pull from that BlockingQueue and write to the next BlockingQueue, and so 
on, until the last stage writes to the "real" context. Thus each thread sits 
in the "run" method of its own stage of the pipeline.

Issues include:
  1. Needing a thread per step.
  2. Needing to clone the keys and values between steps.
  3. Needing to figure out the size of the queues. Probably 1 to start with...
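
A minimal sketch of the queue-per-stage idea described above, in plain Java 
rather than against the Hadoop API. Everything named here is hypothetical: 
QueuePipelineSketch, stage, runPipeline and the EOF sentinel are made up for 
illustration, plain Strings stand in for keys and values, and a 
Function<String, String> stands in for a mapper's "run" loop. A feeder thread 
plays the role of the first mapper reading from the "real" context, and the 
calling thread plays the role of the last write to the "real" context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical illustration only; not part of Hadoop.
class QueuePipelineSketch {
    // End-of-input marker, compared by identity so no real record can equal it.
    private static final String EOF = new String("EOF");

    // One stage: pull from `in`, transform, push to `out`, and forward EOF when done.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        Function<String, String> fn) {
        return new Thread(() -> {
            try {
                while (true) {
                    String rec = in.take();
                    if (rec == EOF) {
                        out.put(EOF);
                        return;
                    }
                    // Strings are immutable, so no copy is made here. Mutable,
                    // reused key/value objects would need cloning (issue 2).
                    out.put(fn.apply(rec));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    static List<String> runPipeline(List<String> input, List<Function<String, String>> fns) {
        // Queue size 1, as suggested in issue 3.
        BlockingQueue<String> first = new ArrayBlockingQueue<>(1);
        BlockingQueue<String> current = first;
        List<Thread> threads = new ArrayList<>();
        for (Function<String, String> fn : fns) {
            BlockingQueue<String> next = new ArrayBlockingQueue<>(1);
            threads.add(stage(current, next, fn)); // one thread per stage (issue 1)
            current = next;
        }
        // Feeder thread: stands in for reading records from the "real" context.
        threads.add(new Thread(() -> {
            try {
                for (String rec : input) {
                    first.put(rec);
                }
                first.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        for (Thread t : threads) {
            t.start();
        }
        // The caller drains the last queue: stands in for writing to the "real" context.
        List<String> results = new ArrayList<>();
        try {
            for (String rec = current.take(); rec != EOF; rec = current.take()) {
                results.add(rec);
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Function<String, String>> fns = new ArrayList<>();
        fns.add(s -> s.toUpperCase());
        fns.add(s -> s + "!");
        System.out.println(runPipeline(Arrays.asList("a", "b", "c"), fns)); // prints [A!, B!, C!]
    }
}
```

Record order is preserved because each stage is a single thread reading from a 
FIFO queue. With a queue size of 1, every stage blocks until its neighbour has 
consumed the previous record, so the stages run almost in lockstep.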



> Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-372
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-372
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>


