[ https://issues.apache.org/jira/browse/PIG-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-750:
------------------------------

Release Note:
With the changes in this patch, queries that use algebraic functions within expressions will also use the combiner, as long as the bags produced by the group-by are used only as input to algebraic functions. If a bag is projected, or if a non-algebraic expression/UDF takes the bag as input, the combiner will not be used.

The combiner will be used for the following foreach statements (which follow a group):

describe B;
B: {group: int, A: {c1: int, c2: int, c3: int}}

1) foreach B generate SUM(A.c2) * AVG(A.c3), ...
2) foreach B generate 1 / SUM(A.c2)
3) foreach B generate EXP(AVG(A.c2))
4) foreach B generate group + SUM(A.c2)

The following statements will not use the combiner:

1) foreach B generate A.c2, ...
2) foreach B generate EXP(A.c2), SUM(A.c2) ...   (where EXP is a non-algebraic function)

For a nested foreach statement that contains a limit, order, or filter, the combiner is not used (as before).

This patch also fixes PIG-490: foreach statements that access elements of group also use the combiner, for example:

1) foreach B generate group.$0, group.$1, COUNT(A);
2) foreach B generate group.c1, group.c2, COUNT(A);

> Use combiner when algebraic UDFs are used in expressions
> --------------------------------------------------------
>
>                 Key: PIG-750
>                 URL: https://issues.apache.org/jira/browse/PIG-750
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Amir Youssefi
>            Assignee: Thejas M Nair
>            Priority: Minor
>         Attachments: PIG-750.1.patch
>
>
> Currently Pig uses the combiner only when all of a, b, c, ... are algebraic (e.g. SUM, AVG,
> etc.) in a foreach:
> foreach X generate a,b,c,...
> It would be a performance improvement if it also used the combiner when a mix of algebraic
> and non-algebraic functions is used.
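To illustrate why an expression such as SUM(A.c2) * AVG(A.c3) can still be combined, here is a minimal Python sketch of the general idea; it is not Pig's implementation. In Pig, an algebraic UDF implements the org.apache.pig.Algebraic interface, whose getInitial(), getIntermed() and getFinal() methods name the functions run roughly in the map, combine and reduce phases. The sketch models that split with plain functions; the helper names (initial, intermed, final, run), the choice of c1 as the group key, and the sample data are hypothetical. The point it shows: the non-algebraic part of the expression (here the multiplication) is applied only after the final step, so the partial aggregates can be merged by the combiner first.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int, int]      # (c1, c2, c3); c1 is assumed to be the group key
Partial = Tuple[int, int, int]  # (sum_c2, sum_c3, count)


def initial(row: Row) -> Partial:
    """Map side: turn one row into a partial aggregate for SUM(c2) and AVG(c3)."""
    _, c2, c3 = row
    return (c2, c3, 1)


def intermed(parts: Iterable[Partial]) -> Partial:
    """Combiner (also reused by final): merge partials; order and grouping do not matter."""
    s2 = s3 = n = 0
    for p2, p3, pn in parts:
        s2 += p2
        s3 += p3
        n += pn
    return (s2, s3, n)


def final(parts: Iterable[Partial]) -> float:
    """Reduce side: finish SUM(A.c2) and AVG(A.c3), then apply the surrounding expression."""
    s2, s3, n = intermed(parts)
    sum_c2 = s2
    avg_c3 = s3 / n
    return sum_c2 * avg_c3  # SUM(A.c2) * AVG(A.c3), evaluated only after aggregation


def run(map_splits: List[List[Row]]) -> Dict[int, float]:
    """Simulate map -> combine -> shuffle -> reduce for the expression above."""
    shuffled: Dict[int, List[Partial]] = {}
    for split in map_splits:
        # Combine locally within each split, so one small tuple per group leaves it.
        local: Dict[int, List[Partial]] = {}
        for row in split:
            local.setdefault(row[0], []).append(initial(row))
        for key, parts in local.items():
            shuffled.setdefault(key, []).append(intermed(parts))
    return {key: final(parts) for key, parts in shuffled.items()}
```

With splits = [[(1, 2, 10), (1, 4, 20), (2, 5, 1)], [(1, 6, 30), (2, 7, 3)]], run(splits) returns {1: 240.0, 2: 24.0}, the same as evaluating the expression over the complete bags. By contrast, a statement like "foreach B generate A.c2" needs every tuple of the bag at the reducer, so there is no partial result smaller than the bag itself to combine, which is consistent with the rule in the release note.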