Furcy Pin created SPARK-42258:
---------------------------------

             Summary: pyspark.sql.functions should not expose typing.cast
                 Key: SPARK-42258
                 URL: https://issues.apache.org/jira/browse/SPARK-42258
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.3.1
            Reporter: Furcy Pin


In PySpark, the `pyspark.sql.functions` module imports and exposes the function 
`typing.cast`.

This can lead to user errors that are hard to spot.


*Example*
It took me a few minutes to understand why the following code:

 
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.getOrCreate()
df = spark.sql("""SELECT 1 as a""")
df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()
{code}
which executes without any problem, gives the following result:

{code:java}
root
 |-- a: integer (nullable = false)
{code}

This is because `f.cast` here calls `typing.cast`, which simply returns its second 
argument unchanged at runtime. The correct syntax is:
{code:python}
df.withColumn("a", f.col("a").cast("STRING")).printSchema()
{code}
 
which indeed gives:

{code:java}
root
 |-- a: string (nullable = false) {code}
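
To illustrate why the first version fails silently, here is a minimal sketch that needs no Spark session. It only relies on the documented behaviour of `typing.cast`, which returns the value unchanged and does not check the type argument at runtime. The variable names are just for illustration.

```python
from typing import cast

value = 1

# typing.cast(typ, val) is a hint for static type checkers only:
# at runtime it ignores `typ` and returns `val` as-is.
result = cast("STRING", value)

print(result is value)        # the very same object comes back
print(type(result).__name__)  # still an int, nothing was converted
```

In the example above, `f.cast("STRING", f.col("a"))` therefore just returns `f.col("a")`, so the column keeps its integer type.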

*Suggested solutions*

Option 1: The names imported into the module `pyspark.sql.functions` could be 
aliased with a leading underscore so that they are not exposed as public names. For instance:
{code:java}
from typing import cast as _cast{code}

Option 2: Import only `typing` and replace all occurrences of `cast` with 
`typing.cast`.
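
As a rough sketch of what each option changes, the snippet below builds three throwaway in-memory modules and checks which names they expose. The module names (`functions_current`, `functions_option1`, `functions_option2`) and the `build_module` helper are hypothetical stand-ins, not actual PySpark code.

```python
import types


def build_module(name, source):
    """Create an in-memory module by executing `source` in its namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Hypothetical stand-ins for the import line at the top of the module.
current = build_module("functions_current", "from typing import cast\n")
option1 = build_module("functions_option1", "from typing import cast as _cast\n")
option2 = build_module("functions_option2", "import typing\n")

print(hasattr(current, "cast"))    # True: `cast` leaks into the namespace
print(hasattr(option1, "cast"))    # False: only the private alias exists
print(hasattr(option1, "_cast"))   # True
print(hasattr(option2, "cast"))    # False: reachable only as typing.cast
print(hasattr(option2, "typing"))  # True
```

With either option, a call such as `f.cast(...)` would raise an `AttributeError` instead of silently returning the column unchanged.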


