[
https://issues.apache.org/jira/browse/PIG-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169583#comment-14169583
]
Keren Ouaknine commented on PIG-4004:
-------------------------------------
Hi Daniel,
Sure. In the new API, the Context object gives access to the path of the input
being read, so a single MR job can read both inputs and use a reducer to join
them. With the old API, by contrast, the benchmark runs one MR job to read
page_views and another to read power_users, neither of which has a reducer.
These two jobs are then added as dependencies of a third MR job, which has an
identity mapper and a reducer with two inputs (see L13.java as an example).
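
To make the single-job approach concrete, here is a minimal sketch in plain Java
with no Hadoop dependency. It only imitates the data flow: the map step tags each
record with the input it came from, and the reduce step joins the two sides per
key. In the real mapreduce API the path would typically be reached through
something like ((FileSplit) context.getInputSplit()).getPath(); whether the patch
does exactly that is an assumption. The class name, the sample paths, and the
"user,payload" line layout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SingleJobJoinSketch {

    /** A map output value: the record's payload plus a tag naming its source input. */
    static final class Tagged {
        final String source;   // "page_views" or "power_users"
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    /** Stand-in for inspecting the input path in the mapper (hypothetical path layout). */
    static String sourceOf(String inputPath) {
        return inputPath.contains("power_users") ? "power_users" : "page_views";
    }

    /** Reduce step: split one key's values by source tag, then emit the cross product. */
    static List<String> reduce(String user, List<Tagged> values) {
        List<String> views = new ArrayList<>();
        List<String> users = new ArrayList<>();
        for (Tagged t : values) {
            (t.source.equals("power_users") ? users : views).add(t.payload);
        }
        List<String> joined = new ArrayList<>();
        for (String v : views) {
            for (String u : users) {
                joined.add(user + "," + v + "," + u);
            }
        }
        return joined;
    }

    /** Runs map, shuffle, and reduce over both inputs in a single simulated job. */
    static List<String> run(Map<String, List<String>> linesByPath) {
        // Shuffle: group tagged values by join key (sorted, as in MR).
        Map<String, List<Tagged>> shuffled = new TreeMap<>();
        for (Map.Entry<String, List<String>> input : linesByPath.entrySet()) {
            String source = sourceOf(input.getKey());
            for (String line : input.getValue()) {
                // Assumed layout: "user,payload" with at least one comma.
                String[] fields = line.split(",", 2);
                shuffled.computeIfAbsent(fields[0], k -> new ArrayList<>())
                        .add(new Tagged(source, fields[1]));
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> group : shuffled.entrySet()) {
            output.addAll(reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("/data/page_views/part-0",
                List.of("alice,home", "bob,search", "alice,cart"));
        inputs.put("/data/power_users/part-0", List.of("alice,NY"));
        System.out.println(run(inputs));
    }
}
```

Because the tag travels with each value, the join needs no separate read-only
jobs feeding an identity mapper, which is the saving described above.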
Thanks,
Keren
> Upgrade the Pigmix queries from the (old) mapred API to mapreduce
> -----------------------------------------------------------------
>
> Key: PIG-4004
> URL: https://issues.apache.org/jira/browse/PIG-4004
> Project: Pig
> Issue Type: Bug
> Components: tools
> Affects Versions: 0.12.1
> Reporter: Keren Ouaknine
> Fix For: 0.15.0
>
> Attachments: PIG-4004.patch
>
>
> Until now, the Pigmix queries were written using the old mapred API.
> As a result, some queries were expressed with three concatenated MR jobs
> instead of one. I rewrote all the queries to match the newer mapreduce API
> and optimized them on the fly.
> This is a continuation of the work in PIG-3915.