Ranga Reddy created SPARK-40988:
-----------------------------------

             Summary: Spark3 partition column value is not validated with user provided schema
                 Key: SPARK-40988
                 URL: https://issues.apache.org/jira/browse/SPARK-40988
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0, 3.2.0, 3.1.0, 3.0.0
            Reporter: Ranga Reddy
             Fix For: 3.4.0
Spark 3 does not validate the partition column value against the declared column type when inserting data, whereas Hive throws an exception when a value of a different type is inserted.

*Spark Code:*
{code:java}
scala> val tableName = "test_partition_table"
tableName: String = test_partition_table

scala> spark.sql(s"DROP TABLE IF EXISTS $tableName")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"CREATE EXTERNAL TABLE $tableName ( id INT, name STRING ) PARTITIONED BY (age INT) LOCATION 'file:/tmp/spark-warehouse/$tableName'")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SHOW tables").show(truncate=false)
+---------+---------------------+-----------+
|namespace|tableName            |isTemporary|
+---------+---------------------+-----------+
|default  |test_partition_table |false      |
+---------+---------------------+-----------+

scala> spark.sql("SET spark.sql.sources.validatePartitionColumns").show(50, false)
+------------------------------------------+-----+
|key                                       |value|
+------------------------------------------+-----+
|spark.sql.sources.validatePartitionColumns|true |
+------------------------------------------+-----+

scala> spark.sql(s"""INSERT INTO $tableName partition (age=25) VALUES (1, 'Ranga')""")
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"show partitions $tableName").show(50, false)
+---------+
|partition|
+---------+
|age=25   |
+---------+

scala> spark.sql(s"select * from $tableName").show(50, false)
+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Ranga|25 |
+---+-----+---+

scala> spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") VALUES (2, 'Nishanth')""")
res7: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"show partitions $tableName").show(50, false)
+------------+
|partition   |
+------------+
|age=25      |
|age=test_age|
+------------+

scala> spark.sql(s"select * from $tableName").show(50, false)
+---+--------+----+
|id |name    |age |
+---+--------+----+
|1  |Ranga   |25  |
|2  |Nishanth|null|
+---+--------+----+
{code}

*Hive Code:*
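The silent {{null}} in the last query comes from the string partition value being cast leniently to the column type, so a failed conversion is stored as {{null}} instead of raising an error. A minimal plain-Scala sketch of the two behaviors (hypothetical helper names, not Spark internals):

```scala
object PartitionCast {
  // Lenient conversion: a value that cannot be parsed as INT
  // silently becomes None (surfaced to the reader as null).
  def lenientCast(value: String): Option[Int] =
    scala.util.Try(value.toInt).toOption

  // Strict conversion: the behavior the reporter expects when
  // spark.sql.sources.validatePartitionColumns=true -- reject the value.
  def strictCast(value: String): Int =
    try value.toInt
    catch {
      case _: NumberFormatException =>
        throw new IllegalArgumentException(
          s"Cannot cast partition value '$value' to INT")
    }

  def main(args: Array[String]): Unit = {
    println(lenientCast("25"))       // Some(25)
    println(lenientCast("test_age")) // None
    println(strictCast("25"))        // 25
    // strictCast("test_age") would throw IllegalArgumentException,
    // mirroring the SemanticException Hive raises below.
  }
}
```

Under this sketch, the reported bug is that Spark applies the lenient path to static partition values even with validation enabled.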
{code:java}
> INSERT INTO test_partition_table partition (age="test_age2") VALUES (3, 'Nishanth');
Error: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column age of type string as it cannot be converted to type int (state=42000,code=10248)
{code}

*Expected Result:*

When *spark.sql.sources.validatePartitionColumns=true*, Spark should validate the partition value against the column's data type and throw an exception when a value of the wrong type is provided.

*Reference:* [https://spark.apache.org/docs/3.3.1/sql-migration-guide.html#data-sources]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)