Charles Carlson created SPARK-54518:
---------------------------------------
Summary: PySpark 4.0.1 DataFrame Column Type Mismatch
Key: SPARK-54518
URL: https://issues.apache.org/jira/browse/SPARK-54518
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 4.0.1, 4.0.0
Environment: I have a MacBook Pro with an M2 Pro chip. I'm using Python
3.10.18 and PySpark 4.0.1. My Java/JDK info is pasted below.
Reporter: Charles Carlson
It is possible to create a DataFrame with a schema including IntegerType and
DoubleType values that are then cast into StringType incorrectly. In the
attached notebook screenshot we can see that a DataFrame is created in two
normal ways with integers and floats that are then inexplicably cast to
strings, with no apparent way to reverse the cast. The desired behavior is for
the DataFrame to be created with `INT_COL` as an `IntegerType` column and
`DOUBLE_COL` as a `DoubleType` column.
!image-2025-11-25-18-47-57-623.png!
Code to replicate this:
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, IntegerType, DoubleType, StringType,
)
from pyspark.sql.functions import col
import pandas as pd

spark = SparkSession.builder.getOrCreate()
{code}
{code:python}
data_types = StructType(
    [
        StructField("STRING_COL", StringType()),
        StructField("INT_COL", IntegerType()),
        StructField("DOUBLE_COL", DoubleType()),
    ]
)
sdf = spark.createDataFrame(
    [("Hello World", 1, 1 / 2), (None, None, None)],
    schema=data_types,
)
sdf.describe()
{code}
{code:python}
cast_sdf = sdf.withColumn("NEW_INT_COL", col("INT_COL").cast(IntegerType()))
cast_sdf.describe()
{code}
{code:python}
pdf = pd.DataFrame(
    [("Hello World", 1, 1 / 2), (None, None, None)],
    columns=["STRING_COL", "INT_COL", "DOUBLE_COL"],
)
pdf.describe()
new_sdf = spark.createDataFrame(pdf)
new_sdf.describe()
{code}
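To narrow down where the string types appear, it may help to compare the expected column types against the `dtypes` of each DataFrame involved. `describe()` returns a separate summary DataFrame, so it has its own `dtypes`. The sketch below is only an illustration: `mismatched_columns` is a hypothetical helper, not part of PySpark, and `sample_dtypes` is hand-written input in the shape of `DataFrame.dtypes` (a list of `(name, type)` pairs), not output captured from a run.

```python
# Hypothetical helper (not part of PySpark): compare expected column types
# against a list of (name, type) pairs shaped like DataFrame.dtypes.
def mismatched_columns(expected, actual_dtypes):
    """Return {column: (expected_type, actual_type)} for each column that differs."""
    actual = dict(actual_dtypes)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Type names as Spark abbreviates them in DataFrame.dtypes.
expected = {"STRING_COL": "string", "INT_COL": "int", "DOUBLE_COL": "double"}

# Hand-written sample input for illustration only (an assumption, not a real run).
sample_dtypes = [
    ("STRING_COL", "string"),
    ("INT_COL", "string"),
    ("DOUBLE_COL", "string"),
]

print(mismatched_columns(expected, sample_dtypes))
# {'INT_COL': ('int', 'string'), 'DOUBLE_COL': ('double', 'string')}
```

With a live session, calling `mismatched_columns(expected, sdf.dtypes)` and `mismatched_columns(expected, sdf.describe().dtypes)` side by side would show whether the string types belong to `sdf` itself or only to the DataFrame that `describe()` returns.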
--
This message was sent by Atlassian Jira
(v8.20.10#820010)