[ https://issues.apache.org/jira/browse/SPARK-26979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-26979.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 23882
[https://github.com/apache/spark/pull/23882]

> [PySpark] Some SQL functions do not take column names
> -----------------------------------------------------
>
>                 Key: SPARK-26979
>                 URL: https://issues.apache.org/jira/browse/SPARK-26979
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Andre Sa de Mello
>            Assignee: Andre Sa de Mello
>            Priority: Minor
>              Labels: easyfix, pull-request-available, usability
>             Fix For: 3.0.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Most SQL functions defined in _org.apache.spark.sql.functions_ have two 
> variations, one taking a Column object as input, and another taking a string 
> representing a column name, which is then converted into a Column object 
> internally.
> There are, however, a few notable exceptions:
>  * lower()
>  * upper()
>  * abs()
>  * bitwiseNOT()
> While this doesn't break anything, as you can easily create a Column object 
> yourself prior to passing it to one of these functions, it has two 
> undesirable consequences:
>  # It is surprising: it breaks coders' expectations when they are first 
> starting with Spark. Every API should be as consistent as possible, so as to 
> make the learning curve smoother and to reduce causes for human error;
>  # It gets in the way of stylistic conventions. Most of the time it makes 
> Python/Scala/Java code more readable to use literal names, and the API 
> provides ample support for that, but these few exceptions prevent this 
> pattern from being universally applicable.
> This is a very easy fix, and I see no reason not to apply it. I have a PR 
> ready.
> *UPDATE:* It turns out there are many more exceptions to this pattern than I 
> was aware of. The reason I missed them is that I had been looking at things 
> from PySpark's point of view, and the API there does support column name 
> literals for almost all SQL functions.
> Exceptions for the PySpark API include all the above plus:
>  * ltrim()
>  * rtrim()
>  * trim()
>  * ascii()
>  * base64()
>  * unbase64()
> The argument for making the API consistent still stands, however. I have been 
> working on a PR to fix this on *PySpark's side*, and it should still be a 
> painless change. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
