[ https://issues.apache.org/jira/browse/PIG-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189478#comment-13189478 ]

Thejas M Nair commented on PIG-2423:
------------------------------------

For case 1, the query 1 that uses join might perform better in some cases.
This is because co-group does not currently use the combiner.
In query 1, the combiner would run in the map task to reduce its output size. If
there are only a few unique values for the keys, the map output would be very
small, and the IO between the map and reduce phases would go down drastically. The
output size of the first MR job would also be very small in such a
case. The IO savings are then likely to outweigh the cost of an extra MR job.
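
To make the IO argument above concrete, here is a toy Python model of a single map-reduce aggregation. It is a sketch, not Pig's implementation. It assumes an algebraic aggregate (a per-key SUM, chosen only for illustration) and hypothetical input of 4 map tasks with 250,000 rows each and only 3 distinct keys. The run without a combiner stands in for the co-group plan, where every map output record is shuffled; the run with a combiner stands in for query 1, where each map task emits one partial sum per distinct key. The cost of the extra MR job is not modeled.

```python
from collections import defaultdict


def run_job(map_splits, use_combiner):
    """Toy model of one MR job computing SUM per key (illustrative only).

    Returns (final result, number of records shuffled from maps to reducer).
    """
    shuffled = []  # (key, value) pairs sent across the network to the reducer
    for split in map_splits:
        if use_combiner:
            # Combiner: pre-aggregate inside the map task, one record per key.
            partial = defaultdict(int)
            for key, value in split:
                partial[key] += value
            shuffled.extend(partial.items())
        else:
            # No combiner: every map output record is shuffled as-is.
            shuffled.extend(split)

    # Reducer: merge all (partial) values for each key.
    result = defaultdict(int)
    for key, value in shuffled:
        result[key] += value
    return dict(result), len(shuffled)


# Hypothetical input: 4 map tasks, 250,000 rows each, only 3 distinct keys.
splits = [[(i % 3, 1) for i in range(250_000)] for _ in range(4)]

plain, plain_io = run_job(splits, use_combiner=False)
combined, combined_io = run_job(splits, use_combiner=True)
print(plain == combined, plain_io, combined_io)  # True 1000000 12
```

Both runs produce the same totals, but in this setup the shuffled record count drops from 1,000,000 to 12. With many distinct keys per map task the combiner would save much less, which is why the benefit depends on the aggregation output being small.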

So for case 1, I think it makes sense to add a clause: "Note that the use of
co-group prevents the combiner from being used in the current version of Pig. So if
the aggregation in query 1 will use the combiner (depending on the UDF interface) and
the output size of the aggregation is going to be very small, the benefit
of reduced IO from combiner use is likely to outweigh the cost of an
additional MR job."


                
> document use case where co-group is better choice than join 
> ------------------------------------------------------------
>
>                 Key: PIG-2423
>                 URL: https://issues.apache.org/jira/browse/PIG-2423
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Thejas M Nair
>             Fix For: 0.10
>
>
> Optimization rules 2 and 3 suggested in 
> https://issues.apache.org/jira/secure/attachment/12506841/pig_tpch.ppt 
> (PIG-2397) recommend the use of co-group instead of join in certain cases.
> These should be documented on the Pig performance page.

--
This message is automatically generated by JIRA.