[ 
https://issues.apache.org/jira/browse/SPARK-47854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Cao updated SPARK-47854:
----------------------------
    Description: 
Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider 
renaming certain function parameters in Python that shadow built-ins. 
Otherwise, we may need to wait another 4 years to do so.

Example

[https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768]

There are 8 uses of `len` and 35 uses of `str` as parameter names, both of 
which are Python built-ins. Shadowing `str` is somewhat dangerous because code 
like the following becomes nonsensical:

 
{code:python}
from pyspark.sql.functions import lit

def foo(str: "ColumnOrName", bar: "ColumnOrName"):
    # `str` is now the parameter, not the built-in type, so this
    # isinstance() check cannot work as intended
    bar = lit(bar) if isinstance(bar, str) else bar
{code}
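A minimal, PySpark-free sketch of the same hazard, for illustration only: the function names `foo_shadowed` and `foo_renamed` are hypothetical, and plain strings stand in for columns. Once `str` is a parameter, `isinstance(bar, str)` receives a string value instead of a type and raises `TypeError`; renaming the parameter makes `str` refer to the built-in again.

{code:python}
# Hypothetical stand-ins for a PySpark function; strings play the role of columns.


def foo_shadowed(str, bar):
    # `str` is bound to the first argument, so the built-in type is hidden and
    # isinstance() receives a non-type as its second argument.
    return "literal" if isinstance(bar, str) else "column"


def foo_renamed(col, bar):
    # After renaming the parameter, `str` refers to the built-in type again.
    return "literal" if isinstance(bar, str) else "column"


try:
    foo_shadowed("x", "y")
except TypeError as exc:
    print("shadowed parameter:", exc)

print("renamed parameter:", foo_renamed("x", "y"))
{code}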
 

 

Obviously, this would be a breaking change for user code that calls these 
functions with keyword arguments. If we rename `str` to `src` or `col`, old 
code calling `foo(str="x", bar="y")` would break, though `foo("x", bar="y")` 
would still work.
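A second sketch, again with hypothetical function names, shows which call styles such a rename would affect: positional calls behave the same before and after, while a keyword call that names the old parameter fails with `TypeError` (unexpected keyword argument).

{code:python}
# Hypothetical before/after versions of a function whose first parameter is renamed.


def foo_old(str, bar):
    # Original signature: the first parameter shadows the built-in `str`.
    return (str, bar)


def foo_new(col, bar):
    # Same function after renaming `str` to `col`.
    return (col, bar)


# Positional (and `bar=` keyword) calls are unaffected by the rename.
assert foo_old("x", bar="y") == foo_new("x", bar="y") == ("x", "y")

# Keyword calls that name the old parameter now fail.
try:
    foo_new(str="x", bar="y")
except TypeError as exc:
    print("keyword call broke:", exc)
{code}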

 

Is this change a possibility? Or do we think the benefit is not worth the 
breaking change for keyword-argument callers?

 

 

 

  was:
Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider 
renaming certain function parameters in Python that shadow built-ins. 
Otherwise, we may need to wait another 4 years to do so.

Example

[https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768]

There are 8 uses of `len` and 35 uses of `str` as parameter names, both of 
which are Python built-ins. Shadowing `str` is somewhat dangerous because code 
like the following becomes nonsensical:

 
{code:python}
from pyspark.sql.functions import lit

def foo(str: "ColumnOrName", bar: "ColumnOrName"):
    bar = lit(bar) if isinstance(bar, str) else bar  # str is a variable now, cannot be used as a type
{code}
 

 

Obviously, this would be a breaking change for user code that calls these 
functions with keyword arguments. If we rename `str` to `src` or `col`, old 
code calling `foo(str="x", bar="y")` would break, though `foo("x", bar="y")` 
would still work.

 

Is this change a possibility? Or do we think the benefit is not worth the 
breaking change for keyword-argument callers?

 

 

 


> [PYTHON] Avoid shadowing python built-ins in python function variable naming
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-47854
>                 URL: https://issues.apache.org/jira/browse/SPARK-47854
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.1, 3.5.0, 3.5.1, 3.3.4
>            Reporter: Liu Cao
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
