AngersZhuuuu commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r626314340



##########
File path: docs/sql-migration-guide.md
##########
@@ -78,7 +78,9 @@ license: |
   - In Spark 3.2, the timestamp subtraction expression, such as `timestamp '2021-03-31 23:48:00' - timestamp '2021-01-01 00:00:00'`, returns values of `DayTimeIntervalType`. In Spark 3.1 and earlier, the type of the same expression is `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
 
   - In Spark 3.2, the `CREATE TABLE .. LIKE ..` command cannot use reserved properties. You need to use their specific clauses to specify them, for example, `CREATE TABLE test1 LIKE test LOCATION 'some path'`. You can set `spark.sql.legacy.notReserveProperties` to `true` to ignore the `ParseException`; in this case, these properties will be silently removed, for example: `TBLPROPERTIES('owner'='yao')` will have no effect. In Spark version 3.1 and below, the reserved properties can be used in the `CREATE TABLE .. LIKE ..` command but have no side effects, for example, `TBLPROPERTIES('location'='/tmp')` does not change the location of the table but only creates a headless property just like `'a'='b'`.
-
+ 
+  - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty  `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.ctas.allowNonEmptyLocation` to `true`.

Review comment:
       >   with non-empty  `LOCATION`
   
   "with non-empty  `LOCATION`" should use a single space before `LOCATION`.

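For reference, a minimal sketch (not part of this PR) of the two behavior changes the hunk above documents, assuming a local `SparkSession` named `spark` on Spark 3.2:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Timestamp subtraction now yields DayTimeIntervalType
// (Spark 3.1 and earlier returned CalendarIntervalType).
val df = spark.sql(
  "SELECT timestamp '2021-03-31 23:48:00' - timestamp '2021-01-01 00:00:00' AS d")
df.printSchema()  // d: interval day to second

// Opt back into the pre-3.2 CTAS behavior for non-empty locations.
spark.sql("SET spark.sql.legacy.ctas.allowNonEmptyLocation=true")
```
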
##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -3707,6 +3716,8 @@ class SQLConf extends Serializable with Logging {
   def allowStarWithSingleTableIdentifierInCount: Boolean =
     getConf(SQLConf.ALLOW_STAR_WITH_SINGLE_TABLE_IDENTIFIER_IN_COUNT)
 
+  def allowCreateTableAsSelectNonEmptyDirPath: Boolean = getConf(SQLConf.ALLOW_CTAS_NON_EMPTY_PATH)

Review comment:
       `allowCreateTableAsSelectNonEmptyLocation`?
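
If renamed as suggested, the accessor could read (hypothetical sketch; `ALLOW_CTAS_NON_EMPTY_LOCATION` is the constant name proposed in the next comment):

```scala
// Hypothetical rename, mirroring the suggested constant name.
def allowCreateTableAsSelectNonEmptyLocation: Boolean =
  getConf(SQLConf.ALLOW_CTAS_NON_EMPTY_LOCATION)
```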

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1604,6 +1604,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val ALLOW_CTAS_NON_EMPTY_PATH =
+    buildConf("spark.sql.legacy.ctas.allowNonEmptyLocation")

Review comment:
       `ALLOW_CTAS_NON_EMPTY_LOCATION`? Should we keep the naming consistent?
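
For reference, a sketch of what the full definition could look like with the suggested name; the `.doc` and `.version` values below are illustrative assumptions, not taken from the PR:

```scala
// Illustrative sketch: the doc text and version are assumptions.
val ALLOW_CTAS_NON_EMPTY_LOCATION =
  buildConf("spark.sql.legacy.ctas.allowNonEmptyLocation")
    .internal()
    .doc("When true, CREATE TABLE AS SELECT may target a non-empty location, " +
      "restoring the behavior of Spark 3.1 and earlier.")
    .version("3.2.0")
    .booleanConf
    .createWithDefault(false)
```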

##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
##########
@@ -96,4 +98,24 @@ object DataWritingCommand {
       sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY),
       metrics.values.toSeq)
   }
+
+  def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, sparkSession: SparkSession) {
+    if (saveMode != SaveMode.Overwrite &&
+      !sparkSession.sqlContext.conf.allowCreateTableAsSelectNonEmptyDirPath) {
+      val filePath = new org.apache.hadoop.fs.Path(tablePath)
+      val fs = filePath.getFileSystem(sparkSession.sparkContext.hadoopConfiguration)
+      if(fs != null && fs.exists(filePath)) {
+        val locStats = fs.getFileStatus(filePath)
+        if(locStats != null && locStats.isDirectory) {
+          val lStats = fs.listStatus(filePath)
+          if(lStats != null && lStats.length != 0) {
+            throw new AnalysisException(
+              s"CREATE-TABLE-AS-SELECT cannot create table" +
+                s" with location to a non-empty directory " +
+                s"${tablePath} .")

Review comment:
       Two lines should be enough for this `throw`.
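
A sketch of the condensed form this asks for (same message text, assumed formatting):

```scala
// Same message, collapsed so the throw fits on two lines.
throw new AnalysisException(
  s"CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory $tablePath.")
```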




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
