Furcy Pin created SPARK-42258:
---------------------------------

Summary: pyspark.sql.functions should not expose typing.cast
Key: SPARK-42258
URL: https://issues.apache.org/jira/browse/SPARK-42258
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.3.1
Reporter: Furcy Pin
In PySpark, the `pyspark.sql.functions` module imports, and therefore exposes, the function `typing.cast`. This can lead to user errors that are hard to spot.

*Example*

It took me a few minutes to understand why the following code:
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.getOrCreate()
df = spark.sql("""SELECT 1 as a""")
df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()
{code}
which executes without any error, gives the following result:
{code}
root
 |-- a: integer (nullable = false)
{code}
This is because `f.cast` here calls `typing.cast`, which simply returns its second argument unchanged, so the column is never converted. The correct syntax is:
{code:python}
df.withColumn("a", f.col("a").cast("STRING")).printSchema()
{code}
which indeed gives:
{code}
root
 |-- a: string (nullable = false)
{code}

*Suggested solutions*

Option 1: The names imported into `pyspark.sql.functions` could be aliased with a leading underscore, so they are clearly private and no longer reachable as `f.cast`. For instance:
{code:python}
from typing import cast as _cast
{code}

Option 2: Import only the `typing` module and replace every occurrence of `cast` with `typing.cast`.
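The following is a minimal pure-Python sketch (no Spark needed) of why the call silently does nothing, and of what Option 1 would change. The helper `make_module` and the modules `exposed` and `aliased` are hypothetical stand-ins for `pyspark.sql.functions`, not part of PySpark.

{code:python}
import types
from typing import cast

# 1) typing.cast only informs static type checkers. At runtime it returns
#    its second argument unchanged, so no column type conversion happens.
value = 1
result = cast("STRING", value)
print(result, type(result).__name__)  # 1 int


def make_module(name: str, source: str) -> types.ModuleType:
    """Build a throwaway module from source text (hypothetical helper, illustration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# 2) Hypothetical stand-ins for pyspark.sql.functions, before and after
#    Option 1 (aliasing the import with a leading underscore).
exposed = make_module("exposed", "from typing import cast\n")
aliased = make_module("aliased", "from typing import cast as _cast\n")

print(hasattr(exposed, "cast"))   # True: f.cast silently resolves to typing.cast
print(hasattr(aliased, "cast"))   # False: f.cast would raise AttributeError
print(hasattr(aliased, "_cast"))  # True: still usable inside the module
{code}

Note that restricting `__all__` alone would not be enough here: `__all__` only affects `from module import *`, not attribute access such as `f.cast`.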