[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering

2022-11-24 Thread jingxiong zhong (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jingxiong zhong updated SPARK-41236:

Description: 
{code:sql}
select collect_set(age) as age
from db_table.table1
group by name
having size(age) > 1 
{code}

This is a simple SQL query; it works in Spark 2.4 but fails in Spark 3.2.0.
Is this a bug or a new standard behaviour?

h3. How to reproduce

{code:sql}
create table db1.table1(age int, name string);

insert into db1.table1 values(1, 'a');

insert into db1.table1 values(2, 'b');

insert into db1.table1 values(3, 'c');

-- then run a query like this
select collect_set(age) as age from db1.table1 group by name having size(age) > 1;

{code}
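
If this turns out to be intended behaviour in 3.x rather than a regression, two possible workarounds (sketches only, not verified against Spark 3.2.0) are to repeat the aggregate expression in the HAVING clause instead of referring to the select alias, or to give the aggregate an alias that does not shadow the source column and filter in an outer query:

{code:sql}
-- Workaround sketch 1 (assumption: repeating the aggregate avoids the alias lookup)
select collect_set(age) as age
from db1.table1
group by name
having size(collect_set(age)) > 1;

-- Workaround sketch 2: alias to a non-shadowing name and filter in an outer query
select age_set as age
from (
  select collect_set(age) as age_set
  from db1.table1
  group by name
) t
where size(age_set) > 1;
{code}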



h3. Stack Information
{code:java}
org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age]; line 4 pos 12;
'Filter (size('age, true) > 1)
+- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0]
   +- SubqueryAlias spark_catalog.db1.table1
      +- HiveTableRelation [`db1`.`table1`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], Partition Cols: []]

	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154)
	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153)
	at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:196)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:88)
	at ...
{code}

[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering

2022-11-23 Thread Hyukjin Kwon (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41236:
-
Priority: Major  (was: Blocker)

> The renamed field name cannot be recognized after group filtering
> -
>
> Key: SPARK-41236
> URL: https://issues.apache.org/jira/browse/SPARK-41236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jingxiong zhong
>Priority: Major
>
> `select collect_set(age) as age
> from db_table.table1
> group by name
> having size(age) > 1 `
> A simple SQL query; it works in Spark 2.4 but fails in Spark 3.2.0.
> Is this a bug or a new standard behaviour?






[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering

2022-11-23 Thread jingxiong zhong (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jingxiong zhong updated SPARK-41236:

Summary: The renamed field name cannot be recognized after group filtering  (was: The renamed field name cannot be recognized after group filtering, but it is the same as the original field name)

> The renamed field name cannot be recognized after group filtering
> -
>
> Key: SPARK-41236
> URL: https://issues.apache.org/jira/browse/SPARK-41236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jingxiong zhong
>Priority: Blocker
>
> `select collect_set(age) as age
> from db_table.table1
> group by name
> having size(age) > 1 `
> A simple SQL query; it works in Spark 2.4 but fails in Spark 3.2.0.
> Is this a bug or a new standard behaviour?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org