[ https://issues.apache.org/jira/browse/PIG-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086546#comment-13086546 ]
Alan Gates commented on PIG-2224:
---------------------------------
We should not allow "cogroup A by x, B all" in any situation. The result is
not well defined. "group B all" means collect all records of B into one bag.
"group A by x" means collect the records for each unique value of x into a bag.
Only in the case where x has the same value for all of A would "cogroup A by x,
B all" make any sense. Since Pig cannot know a priori whether this holds, it
needs to fail this query in all cases.
So I agree we should fix this logic in the parser. But we should fix it so it
fails whether the ALL is first or second.
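
To illustrate why the result is not well defined, here is a minimal Python
sketch of the two grouping behaviours described above. It is not Pig code: the
helper names group_by and group_all, the sample records, and the use of the
string "all" as the single group key are assumptions made for illustration.

```python
from collections import defaultdict

def group_by(records, key_index):
    """Collect records into one bag per unique key value (like 'group A by x')."""
    bags = defaultdict(list)
    for rec in records:
        bags[rec[key_index]].append(rec)
    return dict(bags)

def group_all(records):
    """Collect every record into a single bag (like 'group B all').

    The key "all" is a placeholder chosen for this sketch.
    """
    return {"all": list(records)}

# Hypothetical sample data.
A = [(1, "p"), (2, "q"), (1, "r")]
B = [(10, "u"), (20, "v")]

a_groups = group_by(A, 0)  # one bag per distinct value of x
b_groups = group_all(B)    # exactly one bag

# A cogroup has to line up bags by key, but the two key sets have nothing
# in common unless A happens to contain a single value of x.
print(sorted(a_groups))  # keys coming from A
print(sorted(b_groups))  # the single key coming from B
```

With more than one distinct x in A there is no obvious way to pair the single
bag from B with the bags from A, which is why failing the query looks safer
than guessing.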
> Incorrect arity test in AstValidator.g with ALL and column-based grouping
> condition together in cogroup
> -------------------------------------------------------------------------------------------------------
>
> Key: PIG-2224
> URL: https://issues.apache.org/jira/browse/PIG-2224
> Project: Pig
> Issue Type: Bug
> Components: grunt
> Affects Versions: 0.9.0
> Environment: Suse Linux 9/MacOS(10.7)
> Reporter: JArod Wen
> Labels: ALL, arity, astvalidator, cogroup, grunt
> Fix For: 0.9.1
>
> Attachments: pig-2224.diff.patch, pig-2224.diff.patch
>
>
> When ALL and a column-based grouping condition are used together in COGROUP,
> the arity test in AstValidator.g (line 242) sets the arity incorrectly and
> causes an exception. For example, assume we have the following two relations:
> a = load 'A' as (col_a_0, col_a_1);
> b = load 'B' as (col_b_0, col_b_1);
> The following statement will throw an invalidation error:
> c = cogroup a by col_a_0, b ALL;
> This is because when processing a:col_a_0, the arity is set to 1; then when
> processing b:ALL, join_group_by_clause is null, so arity 0 is emitted for
> the second relation, and the arity test fails.
> Reversing the two relations works around this error:
> c = cogroup b ALL, a by col_a_0;
> However, this only works by luck: when processing b:ALL, since
> join_group_by_clause is null, arity is still 0; then when processing
> a:col_a_0, arity is initialized rather than compared, so no arity test is
> done in this case (and it passes).
> The root cause is that the arity test does not take the ALL keyword into
> account. I attached a patch that fixes this by separating the arity tests
> for join_group_by_clause and ALL. The patch has been tested locally and it
> works.
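
The order dependence described in the report can be reproduced with a small
Python model. This is a sketch based only on the description above, not the
actual AstValidator.g code: the function names, the ArityError class, and the
(alias, keys) representation with keys set to None for ALL are all
hypothetical. The second function models the behaviour requested in the
comment (reject a mix of ALL and BY in either order), not the attached patch.

```python
class ArityError(Exception):
    """Raised when the grouping keys of a cogroup do not line up."""

def buggy_arity_check(items):
    """Model of the reported check: 0 means both 'ALL' and 'not set yet'."""
    arity = 0
    for alias, keys in items:
        n = 0 if keys is None else len(keys)  # ALL contributes arity 0
        if arity == 0:
            arity = n  # treated as initialization, so nothing is compared
        elif arity != n:
            raise ArityError(f"arity mismatch at {alias}: {arity} != {n}")
    return arity

def strict_arity_check(items):
    """Reject any mix of ALL and BY, regardless of order."""
    uses_all = any(keys is None for _, keys in items)
    uses_by = any(keys is not None for _, keys in items)
    if uses_all and uses_by:
        raise ArityError("ALL cannot be combined with BY in a cogroup")
    arities = {len(keys) for _, keys in items if keys is not None}
    if len(arities) > 1:
        raise ArityError("grouping keys have different arities")
    return arities.pop() if arities else 0

by_first = [("a", ["col_a_0"]), ("b", None)]   # cogroup a by col_a_0, b ALL
all_first = [("b", None), ("a", ["col_a_0"])]  # cogroup b ALL, a by col_a_0

for check in (buggy_arity_check, strict_arity_check):
    for items in (by_first, all_first):
        try:
            check(items)
            outcome = "accepted"
        except ArityError as exc:
            outcome = f"rejected ({exc})"
        print(check.__name__, [alias for alias, _ in items], outcome)
```

In this model the first check rejects one ordering and accepts the other,
while the second rejects both. How a cogroup that uses ALL for every relation
should be treated is not covered by the report, so the sketch simply lets it
through.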