[jira] [Comment Edited] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641576#comment-17641576 ]

Ritika Maheshwari edited comment on SPARK-41236 at 11/30/22 10:51 PM:
----------------------------------------------------------------------

Hello Zhong,

Try renaming the field to a name different from the original column name:

{code:sql}
select collect_set(age) as ageCol
from db_table.table1
group by name
having size(ageCol) > 1
{code}

Your result will still be zero rows, because each of your names "a", "b", and "c" has only one age, so size(ageCol) > 1 filters every group out. But if your table is

{code}
age  name
1    "a"
2    "a"
3    "a"
4    "b"
5    "c"
6    "c"
{code}

then you will get the result [1,2,3] and [5,6].

> The renamed field name cannot be recognized after group filtering
> -----------------------------------------------------------------
>
>                 Key: SPARK-41236
>                 URL: https://issues.apache.org/jira/browse/SPARK-41236
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jingxiong zhong
>            Priority: Major
>
> {code:java}
> select collect_set(age) as age
> from db_table.table1
> group by name
> having size(age) > 1
> {code}
> A simple SQL query: it works in Spark 2.4 but fails in Spark 3.2.0. Is this a bug or a new standard?
>
> h3. *Like this:*
> {code:sql}
> create table db1.table1(age int, name string);
> insert into db1.table1 values(1, 'a');
> insert into db1.table1 values(2, 'b');
> insert into db1.table1 values(3, 'c');
> -- then run sql like this
> select collect_set(age) as age from db1.table1 group by name having size(age) > 1;
> {code}
>
> h3. Stack Information
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age]; line 4 pos 12;
> 'Filter (size('age, true) > 1)
> +- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0]
>    +- SubqueryAlias spark_catalog.db1.table1
>       +- HiveTableRelation [`db1`.`table1`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], Partition Cols: []]
> 	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
> 	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153)
> 	at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
> {code}
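The behaviour described in the comment above can be sanity-checked without Spark. The sketch below is a minimal plain-Python stand-in for {{collect_set}} + GROUP BY + HAVING semantics (the helper name {{group_collect_set}} is invented for illustration; this is not Spark code):

```python
from collections import defaultdict

def group_collect_set(rows, threshold):
    """Mimic: SELECT collect_set(age) FROM t GROUP BY name HAVING size(...) > threshold.

    `rows` is a list of (age, name) tuples; returns the surviving age sets
    (as sorted lists), in order of each name's first appearance.
    """
    groups = defaultdict(set)
    order = []
    for age, name in rows:
        if name not in groups:
            order.append(name)
        groups[name].add(age)  # collect_set keeps distinct values per group
    return [sorted(groups[n]) for n in order if len(groups[n]) > threshold]

# The reporter's table: one age per name, so HAVING size(...) > 1 removes every group.
reporter_rows = [(1, "a"), (2, "b"), (3, "c")]
print(group_collect_set(reporter_rows, 1))   # []

# The six-row table from the comment: "a" has three ages, "c" has two.
bigger_rows = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "c"), (6, "c")]
print(group_collect_set(bigger_rows, 1))     # [[1, 2, 3], [5, 6]]
```

This matches the comment's claim: the three-row table yields zero rows, the six-row table yields [1,2,3] and [5,6].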
[jira] [Comment Edited] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638626#comment-17638626 ]

huldar chen edited comment on SPARK-41236 at 11/25/22 5:53 PM:
---------------------------------------------------------------

If I fix it according to my idea, it will cause two closed JIRA issues to reappear: SPARK-31663 and SPARK-31519. This may involve knowledge of the SQL standard, which I am not good at, so I don't think I can fix this bug. :(

h4. [jingxiong zhong|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhongjingxiong]
[jira] [Comment Edited] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638532#comment-17638532 ]

jingxiong zhong edited comment on SPARK-41236 at 11/25/22 7:14 AM:
-------------------------------------------------------------------

I think you can raise a PR for it [~huldar]
[jira] [Comment Edited] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638493#comment-17638493 ]

huldar chen edited comment on SPARK-41236 at 11/25/22 4:45 AM:
---------------------------------------------------------------

If the aggregated column in the select list is renamed to the name of an existing column in the table, then when the HAVING clause is analyzed the reference is bound to the original column ID from the table rather than the alias, resulting in an exception. It should be resolved from aggregateExpressions first.

Cases:

Case 1:
{code:java}
select collect_set(testdata2.a) as a from testdata2 group by b having size(a) > 0
{code}
The analyzed plan is:
{code:java}
'Filter (size(tempresolvedcolumn(a#3, a), true) > 0)
+- Aggregate [b#4], [collect_set(a#3, 0, 0) AS a#44]
   +- SubqueryAlias testdata2
      +- View (`testData2`, [a#3,b#4])
         +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#3, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#4]
            +- ExternalRDD [obj#2]
{code}
tempresolvedcolumn(a#3, a) should bind a#44.

Case 2:
{code:java}
select collect_set(testdata2.a) as b from testdata2 group by b having size(b) > 0
{code}
The analyzed plan is:
{code:java}
'Project [b#44]
+- 'Filter (size(tempresolvedcolumn(b#4, b), true) > 0)
   +- Aggregate [b#4], [collect_set(a#3, 0, 0) AS b#44, b#4]
      +- SubqueryAlias testdata2
         +- View (`testData2`, [a#3,b#4])
            +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#3, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#4]
               +- ExternalRDD [obj#2]
{code}
tempresolvedcolumn(b#4, b) should bind b#44.

The buggy code is in org.apache.spark.sql.catalyst.analysis.Analyzer.ResolveAggregateFunctions#resolveExprsWithAggregate#resolveCol
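The resolution order this comment argues for can be sketched in a few lines of plain Python. This is a hypothetical illustration, not Spark's actual resolveCol logic; the function name and the name-to-expression-ID dictionaries are invented. The point is simply that a HAVING reference should be checked against the Aggregate's output aliases before the child relation's columns:

```python
def resolve_having_column(name, agg_output, child_columns):
    """Resolve a column name referenced in HAVING.

    agg_output and child_columns map a column name to an expression ID
    string (e.g. 'a#44'), in the style of Spark's analyzed plans.
    """
    if name in agg_output:       # an alias produced by the Aggregate wins
        return agg_output[name]
    if name in child_columns:    # otherwise fall back to the underlying relation
        return child_columns[name]
    raise ValueError(f"cannot resolve '{name}'")

# Case 1 above: collect_set(a) AS a over a relation with columns a#3, b#4.
# Under this ordering, 'a' binds to the alias a#44, not the base column a#3.
print(resolve_having_column("a", {"a": "a#44"}, {"a": "a#3", "b": "b#4"}))  # a#44

# Case 2 above: collect_set(a) AS b; 'b' binds to b#44, not b#4.
print(resolve_having_column("b", {"b": "b#44"}, {"a": "a#3", "b": "b#4"}))  # b#44
```

With the opposite ordering (child columns first), both cases bind to the base column and the subsequent check fails, which matches the AnalysisException reported in this issue.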