Hi Team,

I am running the following query in Spark 3.2:

val df1 = sc.parallelize(List((1,2,3,4,5), (1,2,3,4,5)))
  .toDF("id", "col2", "col3", "col4", "col5")
val op_cols_same_case = List("id", "col2", "col3", "col4", "col5", "id")
val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*)
df2.select("id").show()

This query runs fine. But when I change the casing of the duplicate in
op_cols so it mixes upper and lower case ("id" and "ID"), it throws an
ambiguous column reference error.

val df1 = sc.parallelize(List((1,2,3,4,5), (1,2,3,4,5)))
  .toDF("id", "col2", "col3", "col4", "col5")
val op_cols_mixed_case = List("id", "col2", "col3", "col4", "col5", "ID")
val df2 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
df2.select("id").show()

My question is: why is the behavior different between duplicate columns
with identical names ("id", "id") and the same name in different cases
("id", "ID")? Either both should fail or neither should fail, considering
spark.sql.caseSensitive is false by default in 3.2.
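
If I flip that flag, I would expect the mixed-case list to stop resolving
at all rather than become ambiguous (a sketch of my reasoning, not
something I have exhaustively tested, using df1 and op_cols_mixed_case
from the second example):

spark.conf.set("spark.sql.caseSensitive", "true")
// With case sensitivity on, "ID" should not resolve against df1's "id"
// at all, so I would expect a "cannot resolve" error here instead of an
// ambiguity error.
val df2 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)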

Note: I checked, and this issue is present in Spark 2.4 as well. Both
cases (mixed and single casing) work in Spark 2.3.
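
In the meantime I am working around it by deduplicating the column list
case-insensitively before the select. A sketch of what I am doing (the
dedupe is plain Scala on my side, not a Spark API):

// keep only the first occurrence of each name, comparing case-insensitively
val dedupedCols = op_cols_mixed_case.foldLeft(Vector.empty[String]) { (acc, c) =>
  if (acc.exists(_.equalsIgnoreCase(c))) acc else acc :+ c
}
val df2 = df1.select(dedupedCols.head, dedupedCols.tail: _*)
df2.select("id").show()  // no ambiguity: only one "id" column remains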


Thanks
Spark user
