Charles Carlson created SPARK-54518:
---------------------------------------

             Summary: PySpark 4.0.1 DataFrame Column Type Mismatch
                 Key: SPARK-54518
                 URL: https://issues.apache.org/jira/browse/SPARK-54518
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 4.0.1, 4.0.0
         Environment: I have a MacBook Pro with an M2 Pro chip. I'm using Python 
3.10.18 and PySpark 4.0.1. My Java/JDK info is pasted below.

 

 
            Reporter: Charles Carlson


It is possible to create a DataFrame whose schema includes IntegerType and 
DoubleType columns that are then incorrectly cast to StringType. In the 
attached notebook screenshot, a DataFrame is created in two common ways with 
integers and floats, which are then inexplicably cast to strings with no 
apparent way to cast them back. The desired behavior is for the DataFrame to 
be created with `INT_COL` as an `IntegerType` column and `DOUBLE_COL` as a 
`DoubleType` column.

!image-2025-11-25-18-47-57-623.png!

 

 

Code to replicate this:

 
{code:java}
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, IntegerType, DoubleType, StringType
)
from pyspark.sql.functions import col
import pandas as pd

spark = SparkSession.builder.getOrCreate()
{code}
 
{code:java}
data_types = StructType(
    [
        StructField("STRING_COL", StringType()),
        StructField("INT_COL", IntegerType()),
        StructField("DOUBLE_COL", DoubleType()),
    ]
)

sdf = spark.createDataFrame(
    [("Hello World", 1, 1 / 2), (None, None, None)],
    schema=data_types,
)
sdf.describe()
{code}
{code:java}
cast_sdf = sdf.withColumn("NEW_INT_COL", col("INT_COL").cast(IntegerType())) 
cast_sdf.describe()
{code}
{code:java}
pdf = pd.DataFrame(
    [("Hello World", 1, 1 / 2), (None, None, None)],
    columns=["STRING_COL", "INT_COL", "DOUBLE_COL"],
)
pdf.describe()
new_sdf = spark.createDataFrame(pdf)
new_sdf.describe()
{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
