[ https://issues.apache.org/jira/browse/SPARK-27561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646824#comment-17646824 ]
Apache Spark commented on SPARK-27561:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39054

> Support "lateral column alias references" to allow column aliases to be used within SELECT clauses
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27561
>                 URL: https://issues.apache.org/jira/browse/SPARK-27561
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Josh Rosen
>            Assignee: Xinyi Yu
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Amazon Redshift has a feature called "lateral column alias references":
> [https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/].
> Quoting from that blogpost:
> {quote}The support for lateral column alias reference enables you to write
> queries without repeating the same expressions in the SELECT list. For
> example, you can define the alias 'probability' and use it within the same
> select statement:
> {code:java}
> select clicks / impressions as probability,
>        round(100 * probability, 1) as percentage
> from raw_data;
> {code}
> {quote}
> There's more information about this feature on
> [https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html]:
> {quote}The benefit of the lateral alias reference is you don't need to repeat
> the aliased expression when building more complex expressions in the same
> target list. When Amazon Redshift parses this type of reference, it just
> inlines the previously defined aliases. If there is a column with the same
> name defined in the FROM clause as the previously aliased expression, the
> column in the FROM clause takes priority. For example, in the above query if
> there is a column named 'probability' in table raw_data, the 'probability' in
> the second expression in the target list will refer to that column instead of
> the alias name 'probability'.
> {quote}
> It would be nice if Spark supported this syntax. I don't think that this is
> standard SQL, so it might be a good idea to research if other SQL databases
> support similar syntax (and to see if they implement the same column
> resolution strategy as Redshift).
> We should also consider whether this needs to be feature-flagged as part of a
> specific SQL compatibility mode / dialect.
> One possibly-related existing ticket: SPARK-9338, which discusses the use of
> SELECT aliases in GROUP BY expressions.
> /cc [~hvanhovell]
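
To make the resolution rule quoted from the Redshift documentation concrete, here is a minimal Python sketch of it. This is a toy model only: it is not Spark's or Redshift's implementation, and the function name resolve_select_list and its parameters are hypothetical names chosen for illustration. It assumes a SELECT list given as (expression, alias) pairs and treats every word-like token as a possible name. A name that matches a FROM-clause column is left alone (the column takes priority), a name that matches an earlier alias is replaced by that alias's expression, and anything else (such as the function name round) is left unchanged.

```python
import re

# Word-like tokens: candidate column names, aliases, or function names.
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def resolve_select_list(select_items, from_columns):
    """Inline lateral alias references in a toy SELECT list.

    select_items: list of (expression, alias) pairs in SELECT-list order.
    from_columns: set of column names provided by the FROM clause.
    Returns a list of (resolved_expression, alias) pairs.
    """
    aliases = {}   # alias name -> already-resolved expression text
    resolved = []

    def substitute(match):
        name = match.group(0)
        if name in from_columns:
            # A FROM-clause column with the same name takes priority.
            return name
        if name in aliases:
            # Inline the previously defined alias, parenthesized so that
            # operator precedence in the outer expression is preserved.
            return "(" + aliases[name] + ")"
        # Unknown token (e.g. a function name): leave it untouched.
        return name

    for expression, alias in select_items:
        resolved_expression = IDENTIFIER.sub(substitute, expression)
        resolved.append((resolved_expression, alias))
        # Only later items in the list can see this alias.
        aliases[alias] = resolved_expression
    return resolved


if __name__ == "__main__":
    items = [
        ("clicks / impressions", "probability"),
        ("round(100 * probability, 1)", "percentage"),
    ]
    # raw_data has no 'probability' column: the alias is inlined.
    print(resolve_select_list(items, {"clicks", "impressions"}))
    # raw_data has a 'probability' column: the column wins over the alias.
    print(resolve_select_list(items, {"clicks", "impressions", "probability"}))
```

With the query from the blog post, the first call rewrites the second item to round(100 * (clicks / impressions), 1), while the second call leaves it as round(100 * probability, 1) because the table column shadows the alias. Whether Spark should adopt the same precedence is the open question raised in the issue description.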