[ 
https://issues.apache.org/jira/browse/SPARK-37788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Davies updated SPARK-37788:
----------------------------------
    Description: 
PySpark has largely migrated to accepting both Column objects and string 
column names ("ColumnOrName") in its functions module. A small number of 
functions still seem to need updating: either a conversion of input string 
names into the Column type, or simply an annotation change to indicate that 
the function already supports string column names.

Below are the functions I've seen:
 * F.overlay: Annotation only
 * F.least: Annotation only
 * F.slice: Needs a conversion
 * F.array_repeat: Needs a conversion
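
To illustrate the difference between the two kinds of change, here is a minimal 
sketch in plain Python. It does not reproduce PySpark's internals: the Column 
class below is only a stand-in so the snippet runs without Spark, and 
to_column and the three example functions are hypothetical names made up for 
illustration.

```python
from typing import Union


class Column:
    """Stand-in for pyspark.sql.Column, only so this sketch runs without Spark."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


# Alias mirroring the "ColumnOrName" annotation: a Column or a column-name string.
ColumnOrName = Union[Column, str]


def to_column(col: ColumnOrName) -> Column:
    """Hypothetical helper: wrap a column name in a Column, pass Columns through."""
    return col if isinstance(col, Column) else Column(col)


def annotation_only(src: ColumnOrName) -> str:
    """'Annotation only': the body already converts, so only the hint changes."""
    return f"f({to_column(src).expr})"


def needs_conversion_before(x: Column) -> str:
    """'Needs a conversion', before: the body assumes a Column, so a str fails."""
    return f"g({x.expr})"


def needs_conversion_after(x: ColumnOrName) -> str:
    """'Needs a conversion', after: convert first, then use it as a Column."""
    return f"g({to_column(x).expr})"
```

In this sketch, annotation_only("a") and needs_conversion_after("a") both 
accept a plain name, while needs_conversion_before("a") raises an 
AttributeError because a str has no expr attribute.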

See here for additional context: 
[https://github.com/apache/spark/pull/35032#issuecomment-1003033776]

I'm happy to make a quick PR fixing these, unless there is a reason these 
functions are handled as special cases.

  was:
PySpark has largely migrated to accepting both Column objects and string 
column names ("ColumnOrName") in its functions module. A small number of 
functions still seem to need updating: either a conversion of input string 
names into the Column type, or simply an annotation change to indicate that 
the function already supports string column names.

Below are the functions I've seen:
 * F.overlay: Annotation only
 * F.least: Annotation only
 * F.slice: Needs a conversion
 * F.array_repeat: Needs a conversion

See here for additional context: 
[https://github.com/apache/spark/pull/35032#issuecomment-1003033776]

> ColumnOrName vs Column in PySpark Functions module
> --------------------------------------------------
>
>                 Key: SPARK-37788
>                 URL: https://issues.apache.org/jira/browse/SPARK-37788
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Daniel Davies
>            Priority: Minor
>
> PySpark has largely migrated to accepting both Column objects and string 
> column names ("ColumnOrName") in its functions module. A small number of 
> functions still seem to need updating: either a conversion of input string 
> names into the Column type, or simply an annotation change to indicate that 
> the function already supports string column names.
> Below are the functions I've seen:
>  * F.overlay: Annotation only
>  * F.least: Annotation only
>  * F.slice: Needs a conversion
>  * F.array_repeat: Needs a conversion
> See here for additional context: 
> [https://github.com/apache/spark/pull/35032#issuecomment-1003033776]
> I'm happy to make a quick PR fixing these, unless there is a reason these 
> functions are handled as special cases.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
