[jira] [Commented] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642332#comment-17642332 ]

Ritika Maheshwari commented on SPARK-41236:
-------------------------------------------

Running the following query against downloaded Spark 3.3.0 code. The error message is improved:

{code:sql}
spark-sql> select collect_set(age) as age
         > from test2.ageGroups
         > GROUP BY name
         > having size(age) > 1;
{code}

{noformat}
Error in query: cannot resolve 'size(age)' due to data type mismatch: argument 1 requires (array or map) type, however, 'spark_catalog.test2.agegroups.age' is of int type.; line 4 pos 7;
'Filter (size('age, true) > 1)
+- Aggregate [name#29], [collect_set(age#30, 0, 0) AS age#27]
   +- SubqueryAlias spark_catalog.test2.agegroups
      +- HiveTableRelation [`test2`.`agegroups`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [eid#28, name#29, age#30], Partition Cols: []]
{noformat}

But this is confusing: if it recognizes age as int here, the following query should not have failed. It fails complaining that age is an array, since age is getting bound to the renamed column.
{code:sql}
spark-sql> select collect_set(age) as age
         > from test2.ageGroups
         > GROUP BY name
         > having age > 1;
{code}

{noformat}
Error in query: cannot resolve '(age > 1)' due to data type mismatch: differing types in '(age > 1)' (array<int> and int).; line 4 pos 7;
'Filter (age#62 > 1)
+- Aggregate [name#64], [collect_set(age#65, 0, 0) AS age#62]
   +- SubqueryAlias spark_catalog.test2.agegroups
      +- HiveTableRelation [`test2`.`agegroups`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [eid#63, name#64, age#65], Partition Cols: []]
{noformat}

> The renamed field name cannot be recognized after group filtering
> -----------------------------------------------------------------
>
>                 Key: SPARK-41236
>                 URL: https://issues.apache.org/jira/browse/SPARK-41236
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jingxiong zhong
>            Priority: Major
>
> {code:java}
> select collect_set(age) as age
> from db_table.table1
> group by name
> having size(age) > 1
> {code}
> a simple SQL; it works well in Spark 2.4 but doesn't work in Spark 3.2.0.
> Is it a bug or a new standard?
> h3. *like this:*
> {code:sql}
> create table db1.table1(age int, name string);
> insert into db1.table1 values(1, 'a');
> insert into db1.table1 values(2, 'b');
> insert into db1.table1 values(3, 'c');
> -- then run sql like this
> select collect_set(age) as age from db1.table1 group by name having size(age) > 1;
> {code}
> h3. Stack Information
> {noformat}
> org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age]; line 4 pos 12;
> 'Filter (size('age, true) > 1)
> +- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0]
>    +- SubqueryAlias spark_catalog.db1.table1
>       +- HiveTableRelation [`db1`.`table1`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], Partition Cols: []]
> 	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
> 	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153)
> 	at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
> 	...
> {noformat}
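One way to sidestep the alias question entirely (a standard-SQL sketch, not part of any proposed fix for this issue) is to repeat the aggregate expression in the HAVING clause instead of referencing the alias, so no alias resolution is involved at all:

{code:sql}
-- Sketch: HAVING filters on the aggregate directly; the alias `age`
-- is only used for the output column name.
select collect_set(age) as age
from test2.ageGroups
group by name
having size(collect_set(age)) > 1
{code}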
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641576#comment-17641576 ]

Ritika Maheshwari commented on SPARK-41236:
-------------------------------------------

Hello Zhong, try renaming the field to a name different from the original column name:

{code:sql}
select collect_set(age) as ageCol
from db_table.table1
group by name
having size(ageCol) > 1
{code}
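When the output column must keep the original name, another common rewrite (a sketch in standard SQL, not verified against this particular bug) is to wrap the aggregation in a subquery, so the filter applies to the subquery's output schema rather than going through HAVING resolution:

{code:sql}
-- Sketch: the outer WHERE sees only the subquery's output columns,
-- so `age` can only mean the collect_set result.
select age
from (
  select collect_set(age) as age
  from db_table.table1
  group by name
) t
where size(age) > 1
{code}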
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638626#comment-17638626 ]

huldar chen commented on SPARK-41236:
-------------------------------------

If I fix it according to my idea, it will cause two closed JIRA issues to reappear: SPARK-31663 and SPARK-31663. This may involve knowledge of the SQL standard, which I am not good at, so I don't think I can fix this bug. :(
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638551#comment-17638551 ]

huldar chen commented on SPARK-41236:
-------------------------------------

OK, let's make it my first PR :)
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638532#comment-17638532 ]

jingxiong zhong commented on SPARK-41236:
-----------------------------------------

I think you can open a PR for it [~huldar]
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638493#comment-17638493 ]

huldar chen commented on SPARK-41236:
-------------------------------------

If the aggregated column in the query is renamed to the name of an existing column in the table, then when the HAVING expression is parsed, the name is bound to the original column ID from the table rather than to the alias, resulting in an exception. It should be resolved from aggregateExpressions first, as in these cases:

Case 1:
{code:sql}
select collect_set(testdata2.a) as a from testdata2 group by b having size(a) > 0
{code}
The analyzed plan is:
{noformat}
'Filter (size(tempresolvedcolumn(a#3, a), true) > 0)
+- Aggregate [b#4], [collect_set(a#3, 0, 0) AS a#44]
   +- SubqueryAlias testdata2
      +- View (`testData2`, [a#3,b#4])
         +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#3, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#4]
            +- ExternalRDD [obj#2]
{noformat}
Here tempresolvedcolumn(a#3, a) should bind to a#44.

Case 2:
{code:sql}
select collect_set(testdata2.a) as b from testdata2 group by b having size(b) > 0
{code}
The analyzed plan is:
{noformat}
'Project [b#44]
+- 'Filter (size(tempresolvedcolumn(b#4, b), true) > 0)
   +- Aggregate [b#4], [collect_set(a#3, 0, 0) AS b#44, b#4]
      +- SubqueryAlias testdata2
         +- View (`testData2`, [a#3,b#4])
            +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#3, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#4]
               +- ExternalRDD [obj#2]
{noformat}
Here tempresolvedcolumn(b#4, b) should bind to b#44.
The buggy code is in org.apache.spark.sql.catalyst.analysis.Analyzer.ResolveAggregateFunctions#resolveExprsWithAggregate (resolveCol).
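For comparison, a sketch (hypothetical column alias, following the rename workaround suggested earlier in this thread) where the alias does not shadow any existing column, so the mis-binding described above cannot occur:

{code:sql}
-- Sketch: `a_set` does not collide with any column of testdata2,
-- so HAVING can only resolve it to the aggregate output.
select collect_set(testdata2.a) as a_set
from testdata2
group by b
having size(a_set) > 0
{code}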
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638142#comment-17638142 ]

jingxiong zhong commented on SPARK-41236:
-----------------------------------------

Thanks a lot. Is this how to write the specific case? [~hyukjin.kwon]

> like this:
> {noformat}
> spark-sql> create table db1.table1(age int, name string);
> Time taken: 1.709 seconds
> spark-sql> insert into db1.table1 values(1, 'a');
> Time taken: 2.114 seconds
> spark-sql> insert into db1.table1 values(2, 'b');
> Time taken: 10.208 seconds
> spark-sql> insert into db1.table1 values(3, 'c');
> Time taken: 0.673 seconds
> spark-sql> select collect_set(age) as age
>          > from db1.table1
>          > group by name
>          > having size(age) > 1;
> Time taken: 3.022 seconds
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638065#comment-17638065 ]

Hyukjin Kwon commented on SPARK-41236:
--------------------------------------

[~zhongjingxiong] mind providing a self-contained reproducer, please?