[jira] [Updated] (SPARK-12725) SQL generation suffers from name conflicts introduced by some analysis rules

Cheng Lian (JIRA) Sun, 31 Jan 2016 16:52:08 -0800

     [ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Cheng Lian updated SPARK-12725:
-------------------------------
    Description: 
Some analysis rules generate auxiliary attribute references with the same name 
but different expression IDs. For example, {{ResolveAggregateFunctions}} 
introduces {{havingCondition}} and {{aggOrder}}, and 
{{DistinctAggregationRewriter}} introduces {{gid}}.

This is OK for normal query execution, since these attribute references are 
told apart by their distinct expression IDs. However, it's troublesome when 
converting resolved query plans back to SQL query strings, because expression 
IDs are erased in the generated SQL.
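To make the problem concrete, here is a minimal Scala sketch. The {{Attr}} case class and the {{NameConflictDemo}} object are hypothetical stand-ins, not Spark's {{AttributeReference}}; they only model the fact that a resolved attribute carries both a name and an expression ID, and that naive SQL generation keeps only the name.

```scala
// Hypothetical, simplified model of a resolved attribute reference.
// This is NOT Spark's AttributeReference class, only an illustration.
final case class Attr(name: String, exprId: Long)

object NameConflictDemo {
  // Two auxiliary attributes that share a name but have different
  // expression IDs, like the aggOrder attributes added by an analysis rule.
  val first: Attr  = Attr("aggOrder", 102L)
  val second: Attr = Attr("aggOrder", 103L)

  // Naive SQL generation: only the name survives, the expression ID is dropped.
  def toSql(a: Attr): String = a.name

  def main(args: Array[String]): Unit = {
    // Distinct during execution, because they are matched by expression ID.
    println(first.exprId != second.exprId)
    // Indistinguishable once converted back to SQL.
    println(toSql(first) == toSql(second))
  }
}
```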

Here's an example Spark 1.6.0 snippet for illustration:
{code}
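// Spark 1.6.0 spark-shell already imports these implicits; they let Scala
// symbols such as 'id be used as columns
import sqlContext.implicits._

sqlContext.range(10).select('id as 'a, 'id as 'b).registerTempTable("t")
sqlContext.sql("SELECT SUM(a) FROM t GROUP BY a, b ORDER BY COUNT(a), COUNT(b)").explain(true)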
{code}
The above code produces the following resolved plan:
{noformat}
== Analyzed Logical Plan ==
_c0: bigint
Project [_c0#101L]
+- Sort [aggOrder#102L ASC,aggOrder#103L ASC], true
   +- Aggregate [a#47L,b#48L], [(sum(a#47L),mode=Complete,isDistinct=false) AS _c0#101L,(count(a#47L),mode=Complete,isDistinct=false) AS aggOrder#102L,(count(b#48L),mode=Complete,isDistinct=false) AS aggOrder#103L]
      +- Subquery t
         +- Project [id#46L AS a#47L,id#46L AS b#48L]
            +- LogicalRDD [id#46L], MapPartitionsRDD[44] at range at <console>:26
{noformat}
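Here we can see that both aggregate expressions in {{ORDER BY}} are extracted 
into the {{Aggregate}} operator, and both of them are named {{aggOrder}}, 
distinguished only by their expression IDs (#102L and #103L). Once those IDs 
are erased, the generated SQL would refer to two different columns by the same 
name {{aggOrder}}.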
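One way to avoid the conflict during SQL generation is to derive each emitted column name from its expression ID, so that distinct attributes always get distinct names. The Scala sketch below illustrates that idea on the two {{aggOrder}} attributes above. The {{Attr}} class, the {{sqlName}} and {{orderByClause}} helpers, and the {{name_exprId}} naming scheme are all hypothetical, and this is not necessarily how the issue is fixed in Spark.

```scala
// Hypothetical model of a resolved attribute: a name plus an expression ID.
final case class Attr(name: String, exprId: Long)

object DisambiguateDemo {
  // Hypothetical naming scheme: append the expression ID so that attributes
  // sharing a name (such as the two aggOrder columns) get distinct SQL names.
  def sqlName(a: Attr): String = s"${a.name}_${a.exprId}"

  // The auxiliary aggregate outputs extracted from ORDER BY in the plan above.
  val aggOrders: Seq[Attr] = Seq(Attr("aggOrder", 102L), Attr("aggOrder", 103L))

  // Builds an ORDER BY clause from a list of column names.
  def orderByClause(names: Seq[String]): String =
    names.map(n => s"`$n` ASC").mkString("ORDER BY ", ", ", "")

  def main(args: Array[String]): Unit = {
    // Naive: both sort keys collapse to the same identifier.
    println(orderByClause(aggOrders.map(_.name)))
    // Disambiguated: each sort key refers to a distinct column.
    println(orderByClause(aggOrders.map(sqlName)))
  }
}
```

A real implementation would also have to make sure the generated names cannot collide with user-defined columns, for example a user column that is literally named {{aggOrder_102}}.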

  was:
Some analysis rules generate auxiliary attribute references with the same name 
but different expression IDs. For example, {{ResolveAggregateFunctions}} 
introduces {{havingCondition}} and {{aggOrder}}, and 
{{DistinctAggregationRewriter}} introduces {{gid}}.

This is OK for normal query execution, since these attribute references are 
told apart by their distinct expression IDs. However, it's troublesome when 
converting resolved query plans back to SQL query strings, because expression 
IDs are erased in the generated SQL.


> SQL generation suffers from name conflicts introduced by some analysis rules
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-12725
>                 URL: https://issues.apache.org/jira/browse/SPARK-12725
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Lian



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
