[ https://issues.apache.org/jira/browse/SPARK-47854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liu Cao updated SPARK-47854:
----------------------------
    Description:
Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider renaming certain function parameters in PySpark. Otherwise, we may need to wait another 4 years to do so.

Example:
[https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768]

There are 8 uses of `len` and 35 uses of `str` as variable names, both of which are Python built-ins. Shadowing `str` is somewhat dangerous, because inside such a function `str` refers to the argument rather than the type, so the following would be nonsensical:

{code:python}
def foo(str: "ColumnOrName", bar: "ColumnOrName"):
    # str is a variable now, so it cannot be used as a type
    bar = lit(bar) if isinstance(bar, str) else bar
{code}

Obviously this would be a breaking change for user code that calls the function with keyword arguments. If we rename `str` to `src` or `col`, old code calling `foo(str="x", bar="y")` would break, though `foo("x", bar="y")` would be fine.

Is this change a possibility? Or is the benefit considered too small to justify breaking keyword-argument callers?

  was:
Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider renaming certain function parameters in PySpark. Otherwise, we may need to wait another 4 years to do so.

Example:
[https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768]

There are 8 uses of `len` and 35 uses of `str` as variable names, both of which are Python built-ins. Shadowing `str` is somewhat dangerous, because inside such a function `str` refers to the argument rather than the type, so the following would be nonsensical:

{code:python}
def foo(str: "ColumnOrName", bar: "ColumnOrName"):
    bar = lit(bar) if isinstance(bar, str) else bar  # str is a variable now, so it cannot be used as a type
{code}

Obviously this would be a breaking change for user code that calls the function with keyword arguments. If we rename `str` to `src` or `col`, old code calling `foo(str="x", bar="y")` would break, though `foo("x", bar="y")` would be fine.

Is this change a possibility? Or is the benefit considered too small to justify breaking keyword-argument callers?

> [PYTHON] Avoid shadowing python built-ins in python function variable naming
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-47854
>                 URL: https://issues.apache.org/jira/browse/SPARK-47854
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.1, 3.5.0, 3.5.1, 3.3.4
>            Reporter: Liu Cao
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
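
To make the two concerns in the description concrete, here is a minimal, self-contained sketch. It does not use PySpark: `Column` and `lit` are hypothetical stand-ins, and `shadowed` and `renamed` are illustrative names, not functions from the Spark codebase. The keyword-compatibility shim in `renamed` is only one possible migration approach and is an assumption, not something proposed in the issue.

```python
import warnings


class Column:
    """Hypothetical stand-in for pyspark.sql.Column so the sketch runs without Spark."""

    def __init__(self, expr):
        self.expr = expr


def lit(value):
    """Hypothetical stand-in for pyspark.sql.functions.lit."""
    return Column(value)


def shadowed(str, bar):
    # The parameter `str` hides the built-in type inside this function, so
    # isinstance() receives the caller's value instead of a class and raises
    # TypeError at call time.
    return lit(bar) if isinstance(bar, str) else bar


def renamed(src=None, bar=None, **kwargs):
    # Hypothetical migration shim: keep accepting the old keyword `str` for a
    # deprecation period, but warn so callers move to `src`.
    if "str" in kwargs:
        warnings.warn("'str' is deprecated, use 'src'", FutureWarning, stacklevel=2)
        src = kwargs.pop("str")
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    # `str` is the built-in again here, so the type checks behave as intended.
    src = lit(src) if isinstance(src, str) else src
    bar = lit(bar) if isinstance(bar, str) else bar
    return src, bar
```

With this sketch, `shadowed("x", "y")` fails with a `TypeError`, while both `renamed("x", bar="y")` and the old keyword style `renamed(str="x", bar="y")` succeed, the latter emitting a `FutureWarning`. The cost of such a shim is a looser signature (`**kwargs`), which is part of the trade-off the issue asks about.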