[ https://issues.apache.org/jira/browse/PIG-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169583#comment-14169583 ]

Keren Ouaknine commented on PIG-4004:
-------------------------------------

Hi Daniel,

Sure. In the new API, the Context object gives access to the path of the
current input, so one MR job can read both inputs and a single reducer can
join them. With the old API, the benchmark instead runs one MR job to read
page_views and another to read power_users, neither of which has a reducer.
Both are then added as dependencies of a third MR job, which has an identity
mapper and a reducer that takes the two inputs (see L13.java for an example).

Thanks,
Keren
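To make the single-job approach concrete, here is a minimal stdlib-only Java sketch that simulates it in memory rather than calling Hadoop. Each map call tags its records with the input path it read. This stands in for what a real mapreduce-API mapper could recover via `((FileSplit) context.getInputSplit()).getPath()`. A sorted map plays the role of the shuffle, and the reducer pairs page_views records with power_users records that share a key. The class name `TaggedJoinSketch`, the paths `/data/page_views` and `/data/power_users`, and the `key<TAB>payload` record format are all hypothetical and are not taken from PigMix or L13.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, stdlib-only simulation of a one-job reduce-side join.
// No Hadoop classes are used; the input path is passed in explicitly where a
// real mapper would read it from its Context.
public class TaggedJoinSketch {

    // A map output value tagged with the input path it came from.
    static final class Tagged {
        final String source;
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // Map phase: split each "key<TAB>payload" line (assumed well formed),
    // tag the payload with its input path, and group by key as the shuffle would.
    static void map(String inputPath, List<String> lines,
                    Map<String, List<Tagged>> shuffle) {
        for (String line : lines) {
            String[] fields = line.split("\t", 2);
            shuffle.computeIfAbsent(fields[0], k -> new ArrayList<>())
                   .add(new Tagged(inputPath, fields[1]));
        }
    }

    // Reduce phase: separate the two sides by tag, then emit every
    // page_views x power_users pair for this key (an inner join).
    static void reduce(String key, List<Tagged> values, List<String> out) {
        List<String> views = new ArrayList<>();
        List<String> users = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.endsWith("page_views")) {
                views.add(t.payload);
            } else {
                users.add(t.payload);
            }
        }
        for (String v : views) {
            for (String u : users) {
                out.add(key + "\t" + v + "\t" + u);
            }
        }
    }

    // One "job": both inputs go through the same map phase, one reducer joins.
    public static List<String> join(List<String> pageViews, List<String> powerUsers) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>(); // keys sorted, like the shuffle
        map("/data/page_views", pageViews, shuffle);
        map("/data/power_users", powerUsers, shuffle);
        List<String> out = new ArrayList<>();
        shuffle.forEach((key, values) -> reduce(key, values, out));
        return out;
    }

    public static void main(String[] args) {
        List<String> joined = join(
                List.of("alice\tclick", "bob\tview", "alice\tview"),
                List.of("alice\t555-0100"));
        joined.forEach(System.out::println);
    }
}
```

Because the reducer tells the two inputs apart by their tag, there is no need for separate map-only read jobs or for materializing their outputs before a third job joins them. The chain collapses into one job. Keys present in only one input (like `bob` above) produce no output, so this behaves as an inner join.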


> Upgrade the Pigmix queries from the (old) mapred API to mapreduce
> -----------------------------------------------------------------
>
>                 Key: PIG-4004
>                 URL: https://issues.apache.org/jira/browse/PIG-4004
>             Project: Pig
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.12.1
>            Reporter: Keren Ouaknine
>             Fix For: 0.15.0
>
>         Attachments: PIG-4004.patch
>
>
> Until now, the Pigmix queries were written using the old mapred API.
> As a result, some queries were expressed as three chained MR jobs
> instead of one. I rewrote all the queries against the newer mapreduce API
> and optimized them along the way.
> This continues the work from PIG-3915.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
