[jira] [Commented] (SPARK-27561) Support "lateral column alias references" to allow column aliases to be used within SELECT clauses

Apache Spark (Jira) Mon, 12 Dec 2022 16:04:08 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-27561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646368#comment-17646368
 ]


Apache Spark commented on SPARK-27561:
--------------------------------------

User 'anchovYu' has created a pull request for this issue:
https://github.com/apache/spark/pull/39040

> Support "lateral column alias references" to allow column aliases to be used 
> within SELECT clauses
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27561
>                 URL: https://issues.apache.org/jira/browse/SPARK-27561
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Josh Rosen
>            Priority: Major
>
> Amazon Redshift has a feature called "lateral column alias references": 
> [https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/].
> Quoting from that announcement:
> {quote}The support for lateral column alias reference enables you to write 
> queries without repeating the same expressions in the SELECT list. For 
> example, you can define the alias 'probability' and use it within the same 
> select statement:
> {code:sql}
> select clicks / impressions as probability, round(100 * probability, 1) as 
> percentage from raw_data;
> {code}
> {quote}
> There's more information about this feature at 
> [https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html]:
> {quote}The benefit of the lateral alias reference is you don't need to repeat 
> the aliased expression when building more complex expressions in the same 
> target list. When Amazon Redshift parses this type of reference, it just 
> inlines the previously defined aliases. If there is a column with the same 
> name defined in the FROM clause as the previously aliased expression, the 
> column in the FROM clause takes priority. For example, in the above query if 
> there is a column named 'probability' in table raw_data, the 'probability' in 
> the second expression in the target list will refer to that column instead of 
> the alias name 'probability'.
> {quote}
> It would be nice if Spark supported this syntax. I don't think that this is 
> standard SQL, so it might be a good idea to research if other SQL databases 
> support similar syntax (and to see if they implement the same column 
> resolution strategy as Redshift).
> We should also consider whether this needs to be feature-flagged as part of a 
> specific SQL compatibility mode / dialect.
> One possibly-related existing ticket: SPARK-9338, which discusses the use of 
> SELECT aliases in GROUP BY expressions.
> /cc [~hvanhovell]
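
The resolution rule quoted above (inline previously defined aliases, but let a FROM-clause column of the same name take priority) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the function name inline_lateral_aliases and its inputs are hypothetical, the matching is a plain regex over identifiers (no string literals, qualified names, or case-insensitivity), and it is not how Redshift or Spark actually implement the feature.

```python
import re

# Hypothetical toy model of the quoted resolution rule; not Redshift's or Spark's code.
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def inline_lateral_aliases(select_items, table_columns):
    """Inline earlier SELECT-list aliases into later expressions.

    select_items: list of (expression, alias) pairs in SELECT-list order.
    table_columns: set of column names available from the FROM clause.
    """
    resolved = {}  # alias -> expression with earlier aliases already inlined
    output = []
    for expr, alias in select_items:
        def substitute(match):
            name = match.group(0)
            if name in table_columns:
                # A FROM-clause column with the same name takes priority.
                return name
            if name in resolved:
                # Otherwise inline the previously defined alias.
                return "(" + resolved[name] + ")"
            return name  # function names and unknown identifiers stay as-is

        new_expr = IDENT.sub(substitute, expr)
        resolved[alias] = new_expr
        output.append((new_expr, alias))
    return output


# The query from the issue description, as (expression, alias) pairs.
items = [
    ("clicks / impressions", "probability"),
    ("round(100 * probability, 1)", "percentage"),
]

# raw_data has no 'probability' column: the alias is inlined.
print(inline_lateral_aliases(items, {"clicks", "impressions"}))

# raw_data does have a 'probability' column: the column wins, nothing is inlined.
print(inline_lateral_aliases(items, {"clicks", "impressions", "probability"}))
```

In the first call the second expression becomes round(100 * (clicks / impressions), 1); in the second call it is left untouched because the table column shadows the alias, which is the ambiguity the description asks to research in other databases.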



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
