[ https://issues.apache.org/jira/browse/PIG-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614041#action_12614041 ]
Olga Natkovich commented on PIG-318:
------------------------------------
Another idea we had is to implement our chains as Map->Map->Map
jobs and get rid of reducers altogether.
> Pipeline optimization for multiple reduces
> ------------------------------------------
>
> Key: PIG-318
> URL: https://issues.apache.org/jira/browse/PIG-318
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
>
> Any time we chain together M-R jobs, we are doing it because we need a separate
> reducer, as with group by followed by order by. We don't really need the
> maps. The ideal graph for us would be:
>
> M->R->SortShuffle->R->SortShuffle->R ...
>
> This would allow us to save a read from DFS and a write to the local disk, which
> could be fairly significant.
> Apparently a similar discussion took place on the hadoop mailing list several
> times and this request was turned down. The main reason is that, in their opinion,
> the cost of implementing something like that would outweigh the benefit.
> To make a persuasive case, we need to measure the overhead of the empty maps
> for "typical" queries.
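
To make the description concrete, here is a rough Python sketch that builds the
proposed graph for a chain of reduce stages and counts how many empty maps it
removes. The function and class names are hypothetical, not Pig or Hadoop APIs.
It assumes that every job after the first has an empty map, and that each
removed map saves exactly one DFS read and one local disk write, as the
description suggests. It counts passes over the data only. It says nothing about
bytes or time, so it is no substitute for the measurement asked for above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoSavings:
    """Passes over the data avoided by dropping the empty maps (assumed model)."""
    dfs_reads: int
    local_disk_writes: int


def proposed_graph(num_reduces: int) -> str:
    """Render the M->R->SortShuffle->R ... graph for a chain of reduce stages."""
    if num_reduces < 1:
        raise ValueError("num_reduces must be at least 1")
    # Only the first stage keeps its map; later stages are fed by a sort/shuffle.
    return "M->R" + "->SortShuffle->R" * (num_reduces - 1)


def io_savings(num_reduces: int) -> IoSavings:
    """Count saved passes, assuming one empty map per job after the first."""
    if num_reduces < 1:
        raise ValueError("num_reduces must be at least 1")
    removed_maps = num_reduces - 1
    # Assumption: each removed map saves one DFS read and one local disk write.
    return IoSavings(dfs_reads=removed_maps, local_disk_writes=removed_maps)


if __name__ == "__main__":
    # Example: three reduce stages chained one after another.
    print(proposed_graph(3))
    print(io_savings(3))
```

Under these assumptions the number of saved passes grows linearly with the
length of the chain, so longer chains are where measuring the empty maps would
matter most.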