Asif created SPARK-43112:
----------------------------

             Summary: Spark may use a column other than the actual specified partitioning column for partitioning, for Hive format tables
                 Key: SPARK-43112
                 URL: https://issues.apache.org/jira/browse/SPARK-43112
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Asif
The class org.apache.spark.sql.catalyst.catalog.HiveTableRelation implements its output method as:

    // The partition column should always appear after data columns.
    override def output: Seq[AttributeReference] = dataCols ++ partitionCols

However, Spark's data-writing commands, such as InsertIntoHiveDirCommand, expect the output of HiveTableRelation to follow the order in which the columns are actually defined in the DDL. As a result, multiple mismatch scenarios can occur:

1) A data type casting exception is thrown, even though the data frame being inserted has a schema identical to the one used in the CREATE TABLE DDL.

2) The wrong column is used for partitioning, if the data types are the same or castable (for example, a date type and a long).

A rough sketch of the kind of scenario involved is shown below. Will be creating a PR with a test reproducing the bug.
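For illustration only, here is a minimal sketch (in Scala) of the kind of setup described above: a Hive-format table whose partition column is not the last column in the DDL, read back and written out through a Hive directory insert. The table name src_tab, the column names, and the output path are hypothetical; this is not the test case from the upcoming PR.

    import org.apache.spark.sql.SparkSession

    // Hypothetical sketch of the scenario described above. Requires a
    // SparkSession built with Hive support.
    object HivePartitionOrderSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SPARK-43112 sketch")
          .enableHiveSupport()
          .getOrCreate()

        // DDL in which the partition column (pkey) is not the last column.
        // The catalog keeps the DDL order (c1, pkey, c2), while
        // HiveTableRelation.output produces dataCols ++ partitionCols,
        // i.e. (c1, c2, pkey).
        spark.sql(
          """CREATE TABLE src_tab (c1 INT, pkey BIGINT, c2 DATE)
            |USING hive
            |PARTITIONED BY (pkey)""".stripMargin)

        spark.sql(
          "INSERT INTO src_tab PARTITION (pkey = 100) VALUES (1, DATE'2023-01-01')")

        // Writing out through InsertIntoHiveDirCommand: per the description
        // above, if the command assumes the DDL column order while the
        // relation's output is reordered, the result is either a cast
        // exception or values written under the wrong columns.
        spark.sql(
          """INSERT OVERWRITE DIRECTORY '/tmp/spark_43112_out'
            |STORED AS PARQUET
            |SELECT * FROM src_tab""".stripMargin)

        spark.stop()
      }
    }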