[jira] [Commented] (SPARK-36815) Found duplicate rewrite attributes

2021-09-22 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418703#comment-17418703
 ] 

L. C. Hsieh commented on SPARK-36815:
-

It was resolved by https://github.com/apache/spark/pull/34068.

> Found duplicate rewrite attributes
> --
>
>                 Key: SPARK-36815
>                 URL: https://issues.apache.org/jira/browse/SPARK-36815
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.2
>            Reporter: gaoyajun02
>            Priority: Major
>
> We are running Spark 3.0.2 in production. Some of our ETL jobs contain 
> multi-level CTEs, and the following error occurs when we join them:
> {code:java}
> java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes
>   at scala.Predef$.assert(Predef.scala:223)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
> {code}
> I reproduced the problem with the following simplified SQL:
> {code:java}
> -- SQL
> with
> a as (
>   select name, get_json_object(json, '$.id') id, n
>   from (
>     select get_json_object(json, '$.name') name, json
>     from values ('{"name":"a", "id": 1}') people(json)
>   ) LATERAL VIEW explode(array(1, 1, 2)) num as n
> ),
> b as (
>   select a1.name, a1.id, a1.n
>   from a a1
>   left join (select name, count(1) c from a group by name) a2 on a1.name = a2.name
> )
> select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;
> {code}
> While debugging, I found that an attribute referenced by the root Project 
> (id#219) appears in both subqueries. When `ResolveReferences` resolved the 
> conflict, the rewrite happened in both subqueries and produced two new 
> attrMapping entries for the same attribute. Both entries were eventually 
> passed up to the root Project, which led to this error.
> plan:
> {code:java}
> Project [name#218, id#219, n#229]
> +- Join LeftOuter, (name#218 = name#232)
>    :- SubqueryAlias a1
>    :  +- SubqueryAlias a
>    :     +- Project [name#218, get_json_object(json#225, $.id) AS id#219, n#229]
>    :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
>    :           +- SubqueryAlias __auto_generated_subquery_name
>    :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
>    :                 +- SubqueryAlias people
>    :                    +- LocalRelation [json#225]
>    +- SubqueryAlias a2
>       +- Aggregate [name#232], [name#232, count(1) AS c#220L]
>          +- SubqueryAlias a
>             +- Project [name#232, get_json_object(json#226, $.id) AS id#219, n#230]
>                +- Generate explode(array(1, 1, 2)), false, num, [n#230]
>                   +- SubqueryAlias __auto_generated_subquery_name
>                      +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
>                         +- SubqueryAlias people
>                            +- LocalRelation [json#226]
> {code}
>  newPlan:
> {code:java}
> !Project [name#218, id#219, n#229]
> +- Join LeftOuter, (name#218 = name#232)
>    :- SubqueryAlias a1
>    :  +- SubqueryAlias a
>    :     +- Project [name#218, get_json_object(json#225, $.id) AS id#233, n#229]
>    :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
>    :           +- SubqueryAlias __auto_generated_subquery_name
>    :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
>    :                 +- SubqueryAlias people
>    :                    +- LocalRelation [json#225]
>    +- SubqueryAlias a2
>       +- Aggregate [name#232], [name#232, count(1) AS c#220L]
>          +- SubqueryAlias a
>             +- Project [name#232, get_json_object(json#226, $.id) AS id#234, n#230]
>                +- Generate explode(array(1, 1, 2)), false, num, [n#230]
>                   +- SubqueryAlias __auto_generated_subquery_name
>                      +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
>                         +- SubqueryAlias people
>                            +- LocalRelation [json#226]
> {code}
> attrMapping:
> {code:java}
> attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2
>  0 = {Tuple2@17769} "(id#219,id#233)"
>  1 = {Tuple2@17770} "(id#219,id#234)"
> {code}
>  
>  
>  
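
The failure condition in the quoted report can be illustrated with a small standalone sketch. This is not Spark's code: it only models the idea that one old attribute must not be rewritten to two different new attributes, which is the situation shown in the attrMapping dump. The function name `find_duplicate_rewrites` and the use of plain strings for attributes are assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_rewrites(attr_mapping):
    """Return {old: [new, ...]} for every old attribute that is mapped
    to more than one distinct new attribute.

    attr_mapping is a list of (old, new) pairs, modelled here as strings.
    This is a hypothetical helper, not part of Spark.
    """
    targets = defaultdict(list)
    for old, new in attr_mapping:
        # Record each distinct rewrite target once, preserving order.
        if new not in targets[old]:
            targets[old].append(new)
    return {old: news for old, news in targets.items() if len(news) > 1}


# Pairs copied from the attrMapping dump in the report.
attr_mapping = [("id#219", "id#233"), ("id#219", "id#234")]
print(find_duplicate_rewrites(attr_mapping))
# prints {'id#219': ['id#233', 'id#234']}
```

In this model an empty result means every old attribute has a single rewrite target. The mapping from the report returns a non-empty result, because id#219 was rewritten once per subquery.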



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36815) Found duplicate rewrite attributes

2021-09-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418429#comment-17418429
 ] 

Apache Spark commented on SPARK-36815:
--

User 'gaoyajun02' has created a pull request for this issue:
https://github.com/apache/spark/pull/34068


[jira] [Commented] (SPARK-36815) Found duplicate rewrite attributes

2021-09-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418427#comment-17418427
 ] 

Apache Spark commented on SPARK-36815:
--

User 'gaoyajun02' has created a pull request for this issue:
https://github.com/apache/spark/pull/34068


[jira] [Commented] (SPARK-36815) Found duplicate rewrite attributes

2021-09-21 Thread gaoyajun02 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418064#comment-17418064
 ] 

gaoyajun02 commented on SPARK-36815:


https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the 
Spark 3.0.2 branch does not include that fix.

> Found duplicate rewrite attributes
> --
>
>                 Key: SPARK-36815
>                 URL: https://issues.apache.org/jira/browse/SPARK-36815
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.2
>            Reporter: gaoyajun02
>            Priority: Major
>             Fix For: 3.0.2
>
>