[ https://issues.apache.org/jira/browse/SPARK-26979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andre Sa de Mello updated SPARK-26979:
--------------------------------------
    Component/s:     (was: SQL)
                     PySpark

> [PySpark] Some SQL functions do not take column names
> -----------------------------------------------------
>
>                 Key: SPARK-26979
>                 URL: https://issues.apache.org/jira/browse/SPARK-26979
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Andre Sa de Mello
>            Priority: Minor
>              Labels: easyfix, pull-request-available, usability
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Most SQL functions defined in _org.apache.spark.sql.functions_ have two
> variations: one taking a Column object as input, and another taking a string
> representing a column name, which is then converted into a Column object
> internally.
> There are, however, a few notable exceptions:
> * lower()
> * upper()
> * abs()
> * bitwiseNOT()
> While this doesn't break anything, since you can easily create a Column
> object yourself before passing it to one of these functions, it has two
> undesirable consequences:
> # It is surprising: it breaks coders' expectations when they are first
> starting with Spark. Every API should be as consistent as possible, so as to
> make the learning curve smoother and to reduce causes for human error;
> # It gets in the way of stylistic conventions. Most of the time it makes
> Python/Scala/Java code more readable to use literal names, and the API
> provides ample support for that, but these few exceptions prevent this
> pattern from being universally applicable.
> This is a very easy fix, and I see no reason not to apply it. I have a PR
> ready.
> *UPDATE:* It turns out there are many exceptions to this pattern that I
> wasn't aware of. The reason I missed them is that I had been looking at
> things from PySpark's point of view, and the API there does support column
> name literals for almost all SQL functions.
> Exceptions for the PySpark API include all the above plus:
> * ltrim()
> * rtrim()
> * trim()
> * ascii()
> * initcap()
> * base64()
> * unbase64()
> The argument for making the API consistent still stands, however. I have
> been working on a PR to fix this on PySpark's side, and it should still be
> a painless change.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org