[ https://issues.apache.org/jira/browse/PIG-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
JArod Wen updated PIG-2224:
---------------------------
    Description:
When ALL and a column-based grouping condition are used together in COGROUP, the arity test in AstValidator.g (line 242) sets the arity incorrectly and causes an exception. For example, assume we have the following two relations:

a = load 'A' as (col_a_0, col_a_1);
b = load 'B' as (col_b_0, col_b_1);

The following statement throws a validation error:

c = cogroup a by col_a_0, b ALL;

This happens because the arity is set to 1 when a:col_a_0 is processed; then, when b:ALL is processed, join_group_by_clause is null, so arity 0 is emitted for the second relation and the arity test fails.

Reversing the two relations works around the error:

c = cogroup b ALL, a by col_a_0;

However, this only works by luck: when b:ALL is processed, join_group_by_clause is null, so the arity is still 0; then, when a:col_a_0 is processed, the arity is initialized rather than compared, so no arity test is done in this case (and it passes).

The root cause is that the arity test does not account for the ALL keyword. I attached a patch that fixes this by separating the arity tests for join_group_by_clause and ALL. The patch has been tested locally and it works.

{code}
Index: src/org/apache/pig/parser/AstValidator.g
===================================================================
--- src/org/apache/pig/parser/AstValidator.g (revision 1158481)
+++ src/org/apache/pig/parser/AstValidator.g (working copy)
@@ -242,7 +242,7 @@
 ;
 
 group_item
- : rel ( join_group_by_clause | ALL | ANY ) ( INNER | OUTER )?
+ : rel ( join_group_by_clause
       {
           if( $group_clause::arity == 0 ) {
               // For the first input
@@ -252,6 +252,19 @@
                   "The arity of the group by columns do not match." );
           }
       }
+     | ALL
+       {
+           if($group_clause::arity == 0){
+               $group_clause::arity = 1;
+           } else {
+               if($group_clause::arity != 1){
+                   throw new ParserValidationException( input, new SourceLocation( (PigParserNode)$group_item.start ),
+                       "The arity of the group by columns do not match." );
+               }
+           }
+       }
+     | ANY ) ( INNER | OUTER )?
+
 ;
 
 rel : alias { validateAliasRef( aliases, $alias.node, $alias.name ); }
{code}

> Incorrect arity test in AstValidator.g with ALL and column-based grouping condition together in cogroup
> -------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2224
>                 URL: https://issues.apache.org/jira/browse/PIG-2224
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.9.0
>         Environment: Suse Linux 9/MacOS(10.7)
>            Reporter: JArod Wen
>              Labels: ALL, arity, astvalidator, cogroup, grunt
>             Fix For: 0.9.1
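
The order dependence described in the report can be reproduced outside the parser. The following is only a minimal sketch of the check as the description explains it, not the actual AstValidator.g code: the class name, the method names, and the encoding of each cogroup input as a column count (with ALL as 0, standing in for the null join_group_by_clause) are assumptions made for illustration.

```java
public class ArityCheckSketch {
    // Hypothetical encoding: each cogroup input is its number of group-by
    // columns; ALL has no join_group_by_clause, so it is modelled as 0.
    static final int ALL = 0;

    /** The check as the report describes it before the patch. */
    static boolean passesBeforePatch(int... inputs) {
        int arity = 0;
        for (int n : inputs) {
            if (arity == 0) {
                arity = n;      // initialized; stays 0 if this input is ALL
            } else if (arity != n) {
                return false;   // "The arity of the group by columns do not match."
            }
        }
        return true;
    }

    /** The same check with ALL handled separately as arity 1, as in the patch. */
    static boolean passesAfterPatch(int... inputs) {
        int arity = 0;
        for (int n : inputs) {
            int effective = (n == ALL) ? 1 : n;
            if (arity == 0) {
                arity = effective;
            } else if (arity != effective) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // c = cogroup a by col_a_0, b ALL;
        System.out.println(passesBeforePatch(1, ALL)); // false
        // c = cogroup b ALL, a by col_a_0;
        System.out.println(passesBeforePatch(ALL, 1)); // true
        // With ALL treated as arity 1, both orders pass.
        System.out.println(passesAfterPatch(1, ALL));  // true
        System.out.println(passesAfterPatch(ALL, 1));  // true
    }
}
```

In this model the unpatched check accepts or rejects the same pair of inputs depending only on their order, which is the symptom reported above; treating ALL as arity 1 removes that dependence.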