Ranga Reddy created SPARK-40988:
-----------------------------------

             Summary: Spark3 partition column value is not validated with user-provided schema.
                 Key: SPARK-40988
                 URL: https://issues.apache.org/jira/browse/SPARK-40988
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0, 3.2.0, 3.1.0, 3.0.0
            Reporter: Ranga Reddy
             Fix For: 3.4.0


Spark 3 does not validate the partition column value against the column's declared type while inserting data, whereas Hive throws an exception when a value of a different type is inserted.
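The missing check amounts to asking whether the partition value's literal can be cast to the declared partition column type. A minimal, Spark-free sketch of that check (the object and method names below are hypothetical, not part of Spark's code base):

```scala
// Hypothetical helper, not Spark's actual code path: the kind of check that
// could reject a partition value whose literal cannot be cast to the declared
// partition column type (here assumed to be INT, as in the repro below).
object PartitionValueCheck {
  // Returns true if `value` can be represented as an INT partition value.
  def isValidIntPartitionValue(value: String): Boolean =
    scala.util.Try(value.toInt).isSuccess
}
```

With this check, "25" would be accepted and "test_age" rejected before any data is written.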

*Spark Code:*

 
{code:java}
scala> val tableName="test_partition_table"
tableName: String = test_partition_table

scala> spark.sql(s"DROP TABLE IF EXISTS $tableName")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"CREATE EXTERNAL TABLE $tableName ( id INT, name STRING ) PARTITIONED BY (age INT) LOCATION 'file:/tmp/spark-warehouse/$tableName'")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SHOW tables").show(truncate=false)
+---------+---------------------+-----------+
|namespace|tableName            |isTemporary|
+---------+---------------------+-----------+
|default  |test_partition_table |false      |
+---------+---------------------+-----------+

scala> spark.sql("SET spark.sql.sources.validatePartitionColumns").show(50, false)
+------------------------------------------+-----+
|key                                       |value|
+------------------------------------------+-----+
|spark.sql.sources.validatePartitionColumns|true |
+------------------------------------------+-----+

scala> spark.sql(s"""INSERT INTO $tableName partition (age=25) VALUES (1, 'Ranga')""")
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"show partitions $tableName").show(50, false)
+---------+
|partition|
+---------+
|age=25   |
+---------+

scala> spark.sql(s"select * from $tableName").show(50, false)
+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Ranga|25 |
+---+-----+---+

scala> spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") VALUES (2, 'Nishanth')""")
res7: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"show partitions $tableName").show(50, false)
+------------+
|partition   |
+------------+
|age=25      |
|age=test_age|
+------------+

scala> spark.sql(s"select * from $tableName").show(50, false)
+---+--------+----+
|id |name    |age |
+---+--------+----+
|1  |Ranga   |25  |
|2  |Nishanth|null|
+---+--------+----+ {code}
*Hive Code:*
{code:java}
> INSERT INTO test_partition_table partition (age="test_age2") VALUES (3, 'Nishanth');
Error: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column age of type string as it cannot be converted to type int (state=42000,code=10248){code}
 

*Expected Result:*

When *spark.sql.sources.validatePartitionColumns=true*, the partition value's data type needs to be validated against the partition column's declared type, and an exception needs to be thrown when a value of the wrong data type is provided.
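Until such validation exists in Spark, a caller could guard the INSERT with a client-side check that mirrors Hive's SemanticException. The sketch below is a hypothetical workaround (the names and the string-based schema map are illustrative, not a Spark API):

```scala
// Hypothetical client-side guard: check each partition-spec value against the
// declared partition schema and fail fast with a Hive-style error, instead of
// silently writing a row that reads back as null.
object ValidatePartitionSpec {
  def validate(spec: Map[String, String], partitionSchema: Map[String, String]): Unit =
    spec.foreach { case (col, value) =>
      partitionSchema.get(col) match {
        case Some("int") if scala.util.Try(value.toInt).isFailure =>
          throw new IllegalArgumentException(
            s"Cannot add partition column $col of type string as it cannot be converted to type int")
        case _ => () // other partition types omitted in this sketch
      }
    }
}
```

For example, validate(Map("age" -> "25"), Map("age" -> "int")) passes, while validate(Map("age" -> "test_age"), Map("age" -> "int")) throws before the INSERT is issued.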

*Reference:*

[https://spark.apache.org/docs/3.3.1/sql-migration-guide.html#data-sources]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
