Derk Crezee created SPARK-35876:
-----------------------------------

             Summary: arrays_zip unexpected column names
                 Key: SPARK-35876
                 URL: https://issues.apache.org/jira/browse/SPARK-35876
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.1.2
            Reporter: Derk Crezee


When I use the {{arrays_zip}} function in combination with renamed columns, I
get an unexpected schema written to disk.

 
{code:python}
from pyspark.sql import Row
from pyspark.sql.functions import arrays_zip, col

data = [
  Row(a1=["a", "a"], b1=["b", "b"]),
]
df = (
  spark.sparkContext.parallelize(data).toDF()
    .withColumnRenamed("a1", "a2")
    .withColumnRenamed("b1", "b2")
    .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
)
df.printSchema()
# root
#  |-- a2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- b2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- zipped: array (nullable = true)
#  |    |-- element: struct (containsNull = false)
#  |    |    |-- a2: string (nullable = true)
#  |    |    |-- b2: string (nullable = true)

df.write.save("test.parquet")
spark.read.load("test.parquet").printSchema()
# root
#  |-- a2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- b2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- zipped: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- a1: string (nullable = true)
#  |    |    |-- b1: string (nullable = true){code}
I would expect the schema written to disk to match the schema printed above.
Instead, the struct field names inside {{zipped}} revert to the old,
pre-rename column names ({{a1}}/{{b1}}) in the written file.
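A possible workaround (a sketch only; I have not verified that this changes what ends up on disk under 3.1.2) is to pin the struct field names with an explicit cast before writing, so the plan carries the intended names rather than whatever {{arrays_zip}} resolved internally:

{code:python}
from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import arrays_zip, col

spark = SparkSession.builder.getOrCreate()

df = (
    spark.createDataFrame([Row(a1=["a", "a"], b1=["b", "b"])])
      .withColumnRenamed("a1", "a2")
      .withColumnRenamed("b1", "b2")
      .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
)

# Force the struct field names via an explicit cast; struct-to-struct casts
# with identical field types are allowed and rename the fields.
df = df.withColumn(
    "zipped",
    col("zipped").cast("array<struct<a2:string,b2:string>>"),
)
{code}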

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
