[ https://issues.apache.org/jira/browse/SPARK-27140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27140:
------------------------------------

    Assignee: Apache Spark

> The 'insert overwrite local directory' feature behaves inconsistently 
> in different environments.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27140
>                 URL: https://issues.apache.org/jira/browse/SPARK-27140
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.4.0, 3.0.0
>            Reporter: jiaan.geng
>            Assignee: Apache Spark
>            Priority: Major
>
> In local[*] mode, maropu gave the following test case:
> {code:java}
> $ls /tmp/noexistdir
> ls: /tmp/noexistdir: No such file or directory
> scala> sql("""create table t(c0 int, c1 int)""")
> scala> spark.table("t").explain
> == Physical Plan ==
> Scan hive default.t [c0#5, c1#6], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]
> scala> sql("""insert into t values(1, 1)""")
> scala> sql("""select * from t""").show
> +---+---+
> | c0| c1|
> +---+---+
> |  1|  1|
> +---+---+
> scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
> $ls /tmp/noexistdir/t/
> _SUCCESS  part-00000-bbea4213-071a-49b4-aac8-8510e7263d45-c000
> {code}
> This test case shows that Spark creates the nonexistent path and moves the 
> intermediate result from the local temporary path to the created path. This 
> test is based on the newest master.
> I followed the test case provided by maropu but found a different behavior.
>  I ran the SQL that maropu provided in local[*] deploy mode on 2.3.0.
>  The inconsistent behavior appears as follows:
> {code:java}
> ls /tmp/noexistdir
> ls: cannot access /tmp/noexistdir: No such file or directory
> scala> sql("""create table t(c0 int, c1 int)""")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.table("t").explain
> == Physical Plan ==
> HiveTableScan [c0#5, c1#6], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]
> scala> sql("""insert into t values(1, 1)""")
> scala> sql("""select * from t""").show
> +---+---+
> | c0| c1|
> +---+---+
> |  1|  1|
> +---+---+
> scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
> res1: org.apache.spark.sql.DataFrame = [] 
> ls /tmp/noexistdir/t/
> /tmp/noexistdir/t
> vi /tmp/noexistdir/t
>   1 
> {code}
> Then I pulled the master branch, compiled it, and deployed it on my Hadoop 
> cluster. I got the inconsistent behavior again. The Spark version tested is 
> 3.0.0.
> {code:java}
> ls /tmp/noexistdir
> ls: cannot access /tmp/noexistdir: No such file or directory
> Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector 
> with the Serial old collector is deprecated and will likely be removed in a 
> future release
> Spark context Web UI available at http://10.198.66.204:55326
> Spark context available as 'sc' (master = local[*], app id = 
> local-1551259036573).
> Spark session available as 'spark'.
> Welcome to spark version 3.0.0-SNAPSHOT
> Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> sql("""select * from t""").show
> +---+---+
> | c0| c1|
> +---+---+
> |  1|  1|
> +---+---+
> scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
> res1: org.apache.spark.sql.DataFrame = []
> scala> 
> ll /tmp/noexistdir/t
> -rw-r--r-- 1 xitong xitong 0 Feb 27 17:19 /tmp/noexistdir/t
> vi /tmp/noexistdir/t
>   1
> {code}
> Here too, /tmp/noexistdir/t is a file.
> I created a PR, https://github.com/apache/spark/pull/23950, to test the 
> behavior with a UT.
> The UT results are the same as those of maropu's test, but different from mine.
>  
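
The difference between the runs above comes down to whether /tmp/noexistdir/t ends up as a directory (maropu's result) or as a regular file (the reporter's result). A minimal sketch of a check that makes this explicit is given below. The helper name classify_path is hypothetical and is not part of Spark; it relies only on the standard shell tests -d, -f and -e.

```shell
#!/bin/sh
# classify_path: hypothetical helper (not part of Spark) that reports
# what kind of filesystem entry exists at the given path.
classify_path() {
    if [ -d "$1" ]; then
        echo "directory"      # the outcome seen in maropu's run
    elif [ -f "$1" ]; then
        echo "file"           # the outcome seen in the reporter's runs
    elif [ -e "$1" ]; then
        echo "other"          # exists, but is neither a directory nor a regular file
    else
        echo "missing"        # nothing exists at this path
    fi
}

# Check the target path from the report (or a path passed as the first argument).
classify_path "${1:-/tmp/noexistdir/t}"
```

Running this right after the 'insert overwrite local directory' statement in each environment would give a one-word result that can be compared directly, instead of relying on the differing output of ls, ll and vi.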



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)