[jira] [Commented] (SPARK-12725) SQL generation suffers from name conflicts introduced by some analysis rules

Cheng Lian (JIRA) Sun, 31 Jan 2016 16:56:39 -0800

    [ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125587#comment-15125587 ]


Cheng Lian commented on SPARK-12725:
------------------------------------

There are other analysis rules that may use generated attributes (e.g., 
{{DistinctAggregationRewriter}}). I think a generic approach is better than 
special-casing them one by one.
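
As a rough illustration of what a generic approach could look like, the sketch below gives every attribute a unique name derived from its expression ID, no matter which rule generated it. {{Attr}} is a hypothetical stand-in for Catalyst's attribute references, and the {{name_exprId}} naming scheme is only an assumption made for this example, not the fix adopted for this issue.

```scala
// Hypothetical stand-in for a Catalyst attribute reference:
// a user-visible name plus an expression ID.
final case class Attr(name: String, exprId: Long)

object GenericRename {
  // Assumed naming scheme: append the expression ID so that the name stays
  // unique even after the ID itself is no longer part of the output.
  def uniqueName(a: Attr): String = s"${a.name}_${a.exprId}"

  def main(args: Array[String]): Unit = {
    // The two sort keys from the analyzed plan quoted in the description.
    val sortKeys = Seq(Attr("aggOrder", 102L), Attr("aggOrder", 103L))
    println(sortKeys.map(uniqueName).mkString(", "))
    // prints: aggOrder_102, aggOrder_103
  }
}
```

Because the renaming depends only on the expression ID, it would not need to know about {{havingCondition}}, {{aggOrder}}, {{gid}}, or any name a future rule might introduce.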

> SQL generation suffers from name conflicts introduced by some analysis rules
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-12725
>                 URL: https://issues.apache.org/jira/browse/SPARK-12725
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Lian
>
> Some analysis rules generate auxiliary attribute references with the same 
> name but different expression IDs. For example, {{ResolveAggregateFunctions}} 
> introduces {{havingCondition}} and {{aggOrder}}, and 
> {{DistinctAggregationRewriter}} introduces {{gid}}.
> This is OK for normal query execution since these attribute references are 
> told apart by their expression IDs. However, it's troublesome when converting 
> resolved query plans back to SQL query strings, since expression IDs are 
> erased and only the identical names remain.
> Here's an example Spark 1.6.0 snippet for illustration:
> {code}
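> // The symbol syntax needs sqlContext.implicits._ (already in scope in spark-shell).
> import sqlContext.implicits._
> sqlContext.range(10).select('id as 'a, 'id as 'b).registerTempTable("t")
> sqlContext.sql("SELECT SUM(a) FROM t GROUP BY a, b ORDER BY COUNT(a), COUNT(b)").explain(true)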
> {code}
> The above code produces the following resolved plan:
> {noformat}
> == Analyzed Logical Plan ==
> _c0: bigint
> Project [_c0#101L]
> +- Sort [aggOrder#102L ASC,aggOrder#103L ASC], true
>    +- Aggregate [a#47L,b#48L], [(sum(a#47L),mode=Complete,isDistinct=false) AS _c0#101L,(count(a#47L),mode=Complete,isDistinct=false) AS aggOrder#102L,(count(b#48L),mode=Complete,isDistinct=false) AS aggOrder#103L]
>       +- Subquery t
>          +- Project [id#46L AS a#47L,id#46L AS b#48L]
>             +- LogicalRDD [id#46L], MapPartitionsRDD[44] at range at <console>:26
> {noformat}
> Here we can see that both aggregate expressions in {{ORDER BY}} are extracted 
> into an {{Aggregate}} operator, and both of them are named {{aggOrder}} with 
> different expression IDs.
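
To make the conflict concrete, here is a minimal sketch that models the output of the {{Aggregate}} operator above and reports which names are shared by more than one expression ID. {{Attr}} and {{conflictingNames}} are hypothetical names used only for illustration and are not part of Spark's API.

```scala
// Hypothetical stand-in for a Catalyst attribute reference:
// a user-visible name plus an expression ID.
final case class Attr(name: String, exprId: Long)

object NameConflicts {
  // Names carried by more than one distinct attribute. Once expression IDs
  // are erased, references to these names can no longer be told apart.
  def conflictingNames(attrs: Seq[Attr]): Set[String] =
    attrs.distinct
      .groupBy(_.name)
      .collect { case (name, group) if group.size > 1 => name }
      .toSet

  def main(args: Array[String]): Unit = {
    // Output attributes of the Aggregate operator in the plan above.
    val output = Seq(Attr("_c0", 101L), Attr("aggOrder", 102L), Attr("aggOrder", 103L))
    println(conflictingNames(output))
    // prints: Set(aggOrder)
  }
}
```

A SQL string generated from this plan would have to refer to two different columns by the single name {{aggOrder}}, which is the ambiguity this issue describes.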



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
