[ https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418064#comment-17418064 ]
gaoyajun02 edited comment on SPARK-36815 at 9/21/21, 12:17 PM:
---------------------------------------------------------------

https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the Spark 3.0.2 branch does not include that fix.

Hi [~cloud_fan], can we open backport PRs for 3.0.2?


was (Author: gaoyajun02):
https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the Spark 3.0.2 branch does not

> Found duplicate rewrite attributes
> ----------------------------------
>
>                 Key: SPARK-36815
>                 URL: https://issues.apache.org/jira/browse/SPARK-36815
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.2
>            Reporter: gaoyajun02
>            Priority: Major
>             Fix For: 3.0.2
>
>
> We are using Spark 3.0.2 in production. Some of our ETL jobs contain
> multi-level CTEs, and the following error occurs when we join them.
> {code:java}
> java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes
>   at scala.Predef$.assert(Predef.scala:223)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
> {code}
> I reproduced the problem with the following simplified SQL:
> {code:java}
> -- SQL
> with
> a as (
>   select name, get_json_object(json, '$.id') id, n
>   from (
>     select get_json_object(json, '$.name') name, json
>     from values ('{"name":"a", "id": 1}') people(json)
>   ) LATERAL VIEW explode(array(1, 1, 2)) num as n
> ),
> b as (
>   select a1.name, a1.id, a1.n
>   from a a1
>   left join (select name, count(1) c from a group by name) a2
>     on a1.name = a2.name
> )
> select b1.name, b1.n, b1.id
> from b b1 join b b2 on b1.name = b2.name;
> {code}
> While debugging, I found that an attribute referenced by the root Project
> exists in both subqueries. When `ResolveReferences` resolved the conflict,
> `rewrite` ran in both subqueries and produced two new attrMapping entries.
> Both entries were eventually passed up to the root Project, which led to
> this error.
> plan:
> {code:java}
> Project [name#218, id#219, n#229]
> +- Join LeftOuter, (name#218 = name#232)
>    :- SubqueryAlias a1
>    :  +- SubqueryAlias a
>    :     +- Project [name#218, get_json_object(json#225, $.id) AS id#219, n#229]
>    :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
>    :           +- SubqueryAlias __auto_generated_subquery_name
>    :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
>    :                 +- SubqueryAlias people
>    :                    +- LocalRelation [json#225]
>    +- SubqueryAlias a2
>       +- Aggregate [name#232], [name#232, count(1) AS c#220L]
>          +- SubqueryAlias a
>             +- Project [name#232, get_json_object(json#226, $.id) AS id#219, n#230]
>                +- Generate explode(array(1, 1, 2)), false, num, [n#230]
>                   +- SubqueryAlias __auto_generated_subquery_name
>                      +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
>                         +- SubqueryAlias people
>                            +- LocalRelation [json#226]
> {code}
> newPlan:
> {code:java}
> !Project [name#218, id#219, n#229]
> +- Join LeftOuter, (name#218 = name#232)
>    :- SubqueryAlias a1
>    :  +- SubqueryAlias a
>    :     +- Project [name#218, get_json_object(json#225, $.id) AS id#233, n#229]
>    :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
>    :           +- SubqueryAlias __auto_generated_subquery_name
>    :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
>    :                 +- SubqueryAlias people
>    :                    +- LocalRelation [json#225]
>    +- SubqueryAlias a2
>       +- Aggregate [name#232], [name#232, count(1) AS c#220L]
>          +- SubqueryAlias a
>             +- Project [name#232, get_json_object(json#226, $.id) AS id#234, n#230]
>                +- Generate explode(array(1, 1, 2)), false, num, [n#230]
>                   +- SubqueryAlias __auto_generated_subquery_name
>                      +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
>                         +- SubqueryAlias people
>                            +- LocalRelation [json#226]
> {code}
> attrMapping:
> {code:java}
> attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2
>  0 = {Tuple2@17769} "(id#219,id#233)"
>  1 = {Tuple2@17770} "(id#219,id#234)"
> {code}
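
To make the failing condition concrete, here is a minimal Python sketch of the kind of check that the reported attrMapping would trip. It is an illustration only, not Spark's implementation: the function name `find_duplicate_rewrites` is hypothetical, and attributes are modeled as plain strings rather than Catalyst attributes with expression IDs. The assumption is that the assertion fires when one original attribute maps to more than one distinct replacement.

```python
from collections import defaultdict


def find_duplicate_rewrites(attr_mapping):
    """Return every old attribute that is rewritten to more than one new attribute.

    attr_mapping is a list of (old_attr, new_attr) pairs, modeled here as
    plain strings. This is a simplified stand-in for the condition behind
    "Found duplicate rewrite attributes", not Spark's actual code.
    """
    targets = defaultdict(set)
    for old_attr, new_attr in attr_mapping:
        targets[old_attr].add(new_attr)
    # Only old attributes with two or more distinct targets are ambiguous.
    return {old: sorted(new) for old, new in targets.items() if len(new) > 1}


# The two entries from the attrMapping dump in the report.
attr_mapping = [("id#219", "id#233"), ("id#219", "id#234")]
duplicates = find_duplicate_rewrites(attr_mapping)
print(duplicates)  # {'id#219': ['id#233', 'id#234']}
```

Under this model, `id#219` has two candidate replacements (`id#233` from the `a1` branch and `id#234` from the `a2` branch), so the root Project cannot be rewritten unambiguously.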