maropu commented on a change in pull request #30421:
URL: https://github.com/apache/spark/pull/30421#discussion_r581885977



##########
File path: docs/sql-migration-guide.md
##########
@@ -63,6 +63,8 @@ license: |
  
   - In Spark 3.2, the output schema of `SHOW TBLPROPERTIES` becomes `key: string, value: string` whether you specify the table property key or not. In Spark 3.1 and earlier, the output schema of `SHOW TBLPROPERTIES` is `value: string` when you specify the table property key. To restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.
 
+  - In Spark 3.2, we support using a typed literal that matches the partition column's type as the partition column value in SQL. For example, for a partitioned table with a partition column of date type, we can use the typed date literal `date '2020-01-01'` in the partition spec `PARTITION (dt = date '2020-01-01')`, and it will be treated as the partition column value `2020-01-01`. In Spark 3.1 and earlier, the partition value is treated as the string value `date '2020-01-01'`, which is an illegal date string and is therefore converted to `__HIVE_DEFAULT_PARTITION__`.

Review comment:
       How about simply saying it like this?
   ```
   In Spark 3.2, we support a typed literal for a partition constant value in an INSERT clause.
   For example, a right-side constant value in `PARTITION (dt = date'2020-01-01')` is parsed
   as a date-typed literal in the partition spec. In Spark 3.1 and earlier, ...
   ```
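
   For illustration, a minimal Scala sketch of the behavior difference, assuming the test suite's `sql`/`checkAnswer` helpers and a hypothetical table `t` (not taken from the PR):
   ```scala
   // Spark 3.2: the right-side typed literal is evaluated, so the row lands
   // in the dt=2020-01-01 partition.
   sql("CREATE TABLE t(v STRING, dt DATE) USING PARQUET PARTITIONED BY (dt)")
   sql("INSERT INTO t PARTITION(dt = date'2020-01-01') VALUES('a')")
   checkAnswer(sql("SELECT v, CAST(dt AS STRING) FROM t"), Row("a", "2020-01-01"))

   // Spark 3.1 and earlier: the same spec is kept as the raw string
   // "date '2020-01-01'", which is not a valid date value, so the row
   // would land in the __HIVE_DEFAULT_PARTITION__ partition instead.
   ```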

##########
File path: docs/sql-ref-syntax-dml-insert-into.md
##########
@@ -40,8 +40,8 @@ INSERT INTO [ TABLE ] table_identifier [ partition_spec ] [ ( column_list ) ]
 
 * **partition_spec**
 
-    An optional parameter that specifies a comma-separated list of key and value pairs
-    for partitions.
+    An optional parameter that specifies a comma separated list of key and value pairs

Review comment:
       nit: why did you change `comma-separated` => `comma separated`?

##########
File path: docs/sql-ref-syntax-dml-insert-overwrite-table.md
##########
@@ -40,8 +40,8 @@ INSERT OVERWRITE [ TABLE ] table_identifier [ partition_spec [ IF NOT EXISTS ] ]
 
 * **partition_spec**
 
-    An optional parameter that specifies a comma-separated list of key and value pairs
-    for partitions.
+    An optional parameter that specifies a comma separated list of key and value pairs

Review comment:
       ditto

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4021,6 +4021,27 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       }
     }
   }
+
+  test("SPARK-33474: Support TypeConstructed partition spec value") {
+    withTable("t1", "t2", "t4") {
+      sql("CREATE TABLE t1(name STRING, part DATE) USING PARQUET PARTITIONED 
BY (part)")
+      sql("INSERT INTO t1 PARTITION(part = date'2019-01-02') VALUES('a')")
+      checkAnswer(sql("SELECT name, CAST(part AS STRING) FROM t1"), Row("a", 
"2019-01-02"))
+
+      sql("CREATE TABLE t2(name STRING, part TIMESTAMP) USING PARQUET 
PARTITIONED BY (part)")
+      sql("INSERT INTO t2 PARTITION(part = timestamp'2019-01-02 11:11:11') 
VALUES('a')")
+      checkAnswer(sql("SELECT name, CAST(part AS STRING) FROM t2"), Row("a", 
"2019-01-02 11:11:11"))
+
+      val e = intercept[AnalysisException] {
+        sql("CREATE TABLE t3(name STRING, part INTERVAL) USING PARQUET 
PARTITIONED BY (part)")
+      }.getMessage
+      assert(e.contains("Cannot use interval for partition column"))
+
+      sql("CREATE TABLE t4(name STRING, part BINARY) USING CSV PARTITIONED BY 
(part)")
+      sql(s"INSERT INTO t4 PARTITION(part = X'537061726B2053514C') 
VALUES('a')")
+      checkAnswer(sql("SELECT name, cast(part as string) FROM t4"), Row("a", 
"Spark SQL"))
+    }

Review comment:
       The last thing I'm concerned about is whether we already have tests corresponding to @cloud-fan's last comment.
   
   https://github.com/apache/spark/pull/30421#issuecomment-734049999
   ```
   Let's make sure this feature works correctly:
   
    - All the literals are supported. Non-literals are forbidden. e.g. part_col=array(1) does not create a string value "array(1)".
    - Null literal is supported. We should use null instead of "null" to represent it.
    - If the literal data type doesn't match the partition column data type, we should do type check and cast like normal table insertion.
   ```
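
   It might be worth adding a sketch like the following to this suite; the exception type for the non-literal case and the null-partition semantics are assumptions, not verified behavior:
   ```scala
   test("SPARK-33474: coverage sketch for the remaining cases") {
     withTable("t") {
       sql("CREATE TABLE t(name STRING, part DATE) USING PARQUET PARTITIONED BY (part)")

       // 1. Non-literals such as array(1) should be rejected rather than
       //    coerced to the string "array(1)". Assumes the failure surfaces
       //    as an AnalysisException (ParseException is a subclass of it).
       intercept[AnalysisException] {
         sql("INSERT INTO t PARTITION(part = array(1)) VALUES('a')")
       }

       // 2. A null literal should produce a real null partition value,
       //    not the string "null".
       sql("INSERT INTO t PARTITION(part = null) VALUES('b')")
       checkAnswer(sql("SELECT name FROM t WHERE part IS NULL"), Row("b"))

       // 3. A literal whose type differs from the partition column should be
       //    type-checked and cast, as in a normal table insertion.
       sql("INSERT INTO t PARTITION(part = '2019-01-02') VALUES('c')")
       checkAnswer(sql("SELECT name FROM t WHERE part = date'2019-01-02'"), Row("c"))
     }
   }
   ```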



