Hi Team,

I am running the following query in Spark 3.2:

val df1 = sc.parallelize(List((1,2,3,4,5), (1,2,3,4,5)))
  .toDF("id", "col2", "col3", "col4", "col5")
val op_cols_same_case = List("id", "col2", "col3", "col4", "col5", "id")
val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*)
df2.select("id").show()

This query runs fine. But when I change the casing of the duplicate in
op_cols so it mixes upper and lower case ("id" and "ID"), it throws an
ambiguous column reference error.

val df1 = sc.parallelize(List((1,2,3,4,5), (1,2,3,4,5)))
  .toDF("id", "col2", "col3", "col4", "col5")
val op_cols_mixed_case = List("id", "col2", "col3", "col4", "col5", "ID")
val df2 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
df2.select("id").show()

My question is: why is the behavior different between duplicate columns
with identical names ("id", "id") and the same name in different cases
("id", "ID")? Either both should fail or neither should fail, considering
spark.sql.caseSensitive is false by default in 3.2.
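
If I flip that flag, I would expect the mixed-case list to stop resolving
at all rather than become ambiguous (a sketch of my reasoning, not
something I have exhaustively tested, using df1 and op_cols_mixed_case
from the second example):

spark.conf.set("spark.sql.caseSensitive", "true")
// With case sensitivity on, "ID" should not resolve against df1's "id"
// at all, so I would expect a "cannot resolve" error here instead of an
// ambiguity error.
val df2 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)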

Note: I checked, and this issue is present in Spark 2.4 as well. Both
cases (mixed and single casing) work in Spark 2.3.
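
In the meantime I am working around it by deduplicating the column list
case-insensitively before the select. A sketch of what I am doing (the
dedupe is plain Scala on my side, not a Spark API):

// keep only the first occurrence of each name, comparing case-insensitively
val dedupedCols = op_cols_mixed_case.foldLeft(Vector.empty[String]) { (acc, c) =>
  if (acc.exists(_.equalsIgnoreCase(c))) acc else acc :+ c
}
val df2 = df1.select(dedupedCols.head, dedupedCols.tail: _*)
df2.select("id").show()  // no ambiguity: only one "id" column remains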


Thanks
Spark user
