[ https://issues.apache.org/jira/browse/SPARK-40988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-40988: ------------------------------------ Assignee: Apache Spark > Test case for insert partition should verify value > --------------------------------------------------- > > Key: SPARK-40988 > URL: https://issues.apache.org/jira/browse/SPARK-40988 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 > Reporter: Ranga Reddy > Assignee: Apache Spark > Priority: Minor > > Spark3 has not validated the Partition Column type while inserting the data > but on the Hive side exception is thrown while inserting different type > values. > *Spark Code:* > > {code:java} > scala> val tableName="test_partition_table" > tableName: String = test_partition_table > scala>scala> spark.sql(s"DROP TABLE IF EXISTS $tableName") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql(s"CREATE EXTERNAL TABLE $tableName ( id INT, name STRING ) > PARTITIONED BY (age INT) LOCATION 'file:/tmp/spark-warehouse/$tableName'") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("SHOW tables").show(truncate=false) > +---------+---------------------+-----------+ > |namespace|tableName |isTemporary| > +---------+---------------------+-----------+ > |default |test_partition_table |false | > +---------+---------------------+-----------+ > scala> spark.sql("SET spark.sql.sources.validatePartitionColumns").show(50, > false) > +------------------------------------------+-----+ > |key |value| > +------------------------------------------+-----+ > |spark.sql.sources.validatePartitionColumns|true | > +------------------------------------------+-----+ > scala> spark.sql(s"""INSERT INTO $tableName partition (age=25) VALUES (1, > 'Ranga')""") > res4: org.apache.spark.sql.DataFrame = []scala> spark.sql(s"show partitions > $tableName").show(50, false) > +---------+ > |partition| > +---------+ > |age=25 | > +---------+ > scala> spark.sql(s"select * from $tableName").show(50, false) > +---+-----+---+ > |id |name |age| > +---+-----+---+ > |1 |Ranga|25 | > +---+-----+---+ > scala> spark.sql(s"""INSERT INTO $tableName partition (age=\"test_age\") > VALUES (2, 'Nishanth')""") > res7: org.apache.spark.sql.DataFrame = []scala> spark.sql(s"show partitions > $tableName").show(50, false) > +------------+ > |partition | > +------------+ > |age=25 | > |age=test_age| > +------------+ > scala> spark.sql(s"select * from $tableName").show(50, false) > +---+--------+----+ > |id |name |age | > +---+--------+----+ > |1 |Ranga |25 | > |2 |Nishanth|null| > +---+--------+----+ {code} > *Hive Code:* > > > {code:java} > > INSERT INTO test_partition_table partition (age="test_age2") VALUES (3, > > 'Nishanth'); > Error: Error while compiling statement: FAILED: SemanticException [Error > 10248]: Cannot add partition column age of type string as it cannot be > converted to type int (state=42000,code=10248){code} > > *Expected Result:* > When *spark.sql.sources.validatePartitionColumns=true* it needs to be > validated the datatype value and exception needs to be thrown if we provide > wrong data type value. > *Reference:* > [https://spark.apache.org/docs/3.3.1/sql-migration-guide.html#data-sources] -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org