spark git commit: [SPARK-22601][SQL] Data load is getting displayed successful on providing non existing nonlocal file path

lixiao Thu, 30 Nov 2017 20:47:39 -0800

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 af8a692d6 -> ba00bd961



[SPARK-22601][SQL] Data load is getting displayed successful on providing non existing nonlocal file path

## What changes were proposed in this pull request?
When a user tries to load data from a non-existing HDFS file path, the system does not
validate the path and the LOAD DATA command reports success.
This is misleading to the user. Validation already exists for a non-existing local
file path; this PR adds the same validation for a non-existing HDFS file path.

## How was this patch tested?
A unit test has been added to verify the issue, and snapshots were added after
verifying the fix on a Spark YARN cluster.

Author: sujith71955 <sujithchacko.2...@gmail.com>

Closes #19823 from sujith71955/master_LoadComand_Issue.

(cherry picked from commit 16adaf634bcca3074b448d95e72177eefdf50069)
Signed-off-by: gatorsmile <gatorsm...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba00bd96
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ba00bd96
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ba00bd96

Branch: refs/heads/branch-2.2
Commit: ba00bd9615cc37a903f4333dad57e0eeafbdfd0c
Parents: af8a692
Author: sujith71955 <sujithchacko.2...@gmail.com>
Authored: Thu Nov 30 20:45:30 2017 -0800
Committer: gatorsmile <gatorsm...@gmail.com>
Committed: Thu Nov 30 20:46:46 2017 -0800

----------------------------------------------------------------------
 .../org/apache/spark/sql/execution/command/tables.scala     | 9 ++++++++-
 .../org/apache/spark/sql/hive/execution/HiveDDLSuite.scala  | 9 +++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ba00bd96/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index 8b61240..126c1cb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -333,7 +333,7 @@ case class LoadDataCommand(
         uri
       } else {
         val uri = new URI(path)
-        if (uri.getScheme() != null && uri.getAuthority() != null) {
+        val hdfsUri = if (uri.getScheme() != null && uri.getAuthority() != null) {
           uri
         } else {
           // Follow Hive's behavior:
@@ -373,6 +373,13 @@ case class LoadDataCommand(
           }
           new URI(scheme, authority, absolutePath, uri.getQuery(), uri.getFragment())
         }
+        val hadoopConf = sparkSession.sessionState.newHadoopConf()
+        val srcPath = new Path(hdfsUri)
+        val fs = srcPath.getFileSystem(hadoopConf)
+        if (!fs.exists(srcPath)) {
+          throw new AnalysisException(s"LOAD DATA input path does not exist: $path")
+        }
+        hdfsUri
       }
 
     if (partition.nonEmpty) {
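
The patched branch can be sketched outside Spark roughly as follows. This is a minimal illustration under stated assumptions, not the Spark code: `LoadPathCheck`, `qualify`, and `validate` are hypothetical names; the Hadoop `FileSystem.exists(srcPath)` call is replaced by an `exists` predicate so the sketch runs without Hadoop on the classpath; `IllegalArgumentException` stands in for Spark's `AnalysisException`; and the Hive-style path resolution elided from the hunk is simplified (a missing scheme takes scheme and authority from the default file system, and only absolute paths are handled).

```scala
import java.net.URI

// Hypothetical helper mirroring the shape of the patched LoadDataCommand branch.
object LoadPathCheck {
  // Qualify a non-local path against the default file system URI.
  // Assumption: `path` is absolute; Hive's relative-path handling is omitted.
  def qualify(path: String, defaultFs: URI): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null && uri.getAuthority != null) {
      uri // already fully qualified, e.g. hdfs://host:port/file
    } else {
      // Simplifying assumption: a missing scheme takes both scheme and
      // authority from the default file system; otherwise keep the URI's own.
      val (scheme, authority) =
        if (uri.getScheme == null) (defaultFs.getScheme, defaultFs.getAuthority)
        else (uri.getScheme, uri.getAuthority)
      new URI(scheme, authority, uri.getPath, uri.getQuery, uri.getFragment)
    }
  }

  // Qualify the path, then fail fast if it does not exist instead of letting
  // LOAD DATA report success. `exists` stands in for FileSystem.exists.
  def validate(path: String, defaultFs: URI, exists: URI => Boolean): URI = {
    val qualified = qualify(path, defaultFs)
    if (!exists(qualified)) {
      throw new IllegalArgumentException(s"LOAD DATA input path does not exist: $path")
    }
    qualified
  }
}
```

For example, with a made-up default file system of `hdfs://nn:8020`, `/doesnotexist.csv` is qualified to `hdfs://nn:8020/doesnotexist.csv`, and `validate` throws when the predicate reports that URI missing, producing the same message prefix that the new `HiveDDLSuite` test checks for.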

http://git-wip-us.apache.org/repos/asf/spark/blob/ba00bd96/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
index c1c8281..f4c2625 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
@@ -1983,4 +1983,13 @@ class HiveDDLSuite
       }
     }
   }
+
+  test("load command for non local invalid path validation") {
+    withTable("tbl") {
+      sql("CREATE TABLE tbl(i INT, j STRING)")
+      val e = intercept[AnalysisException](
+        sql("load data inpath '/doesnotexist.csv' into table tbl"))
+      assert(e.message.contains("LOAD DATA input path does not exist"))
+    }
+  }
 }

