[ https://issues.apache.org/jira/browse/SPARK-27140 ]
Apache Spark reassigned SPARK-27140:
------------------------------------

Assignee: Apache Spark

The 'insert overwrite local directory' feature behaves inconsistently in different environments.
-------------------------------------------------------------------------------------------------

Key: SPARK-27140
URL: https://issues.apache.org/jira/browse/SPARK-27140
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0, 2.4.0, 3.0.0
Reporter: jiaan.geng
Assignee: Apache Spark
Priority: Major

In local[*] mode, maropu gave the following test case:

{code:java}
$ ls /tmp/noexistdir
ls: /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")

scala> spark.table("t").explain
== Physical Plan ==
Scan hive default.t [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")

$ ls /tmp/noexistdir/t/
_SUCCESS part-00000-bbea4213-071a-49b4-aac8-8510e7263d45-c000
{code}

This test case shows that Spark creates the nonexistent path and moves the intermediate result from a local temporary path into it. The test is based on the latest master.

I followed the test case provided by maropu but observed different behavior. I ran the same SQL in local[*] deploy mode on 2.3.0. The inconsistent behavior appears as follows:

{code:java}
$ ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.table("t").explain
== Physical Plan ==
HiveTableScan [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
res1: org.apache.spark.sql.DataFrame = []

$ ls /tmp/noexistdir/t/
/tmp/noexistdir/t

$ vi /tmp/noexistdir/t
1
{code}

Here /tmp/noexistdir/t is created as a plain file containing the row data, not as a directory.
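A quick way to tell which behavior a given build produces is to check the destination path right after running the statement. A minimal sketch, runnable from the same spark-shell session (the path is the one used above):

{code:java}
import java.io.File

val target = new File("/tmp/noexistdir/t")
// Master (maropu's run): a directory holding _SUCCESS and a part file.
// 2.3.0 (the run above): a single plain file holding the row data.
println(s"exists=${target.exists}, isDirectory=${target.isDirectory}")
// listFiles() returns null for a plain file, so nothing is printed in that case.
Option(target.listFiles()).foreach(_.foreach(f => println(f.getName)))
{code}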
> scala> sql("""select * from t""").show > +---+---+ > > | c0| c1| > +---+---+ > | 1| 1| > +---+---+ > scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * > from t""") > res1: org.apache.spark.sql.DataFrame = [] > > scala> > ll /tmp/noexistdir/t > -rw-r--r-- 1 xitong xitong 0 Feb 27 17:19 /tmp/noexistdir/t > vi /tmp/noexistdir/t > 1 > {code} > The /tmp/noexistdir/t is a file too. > I create a PR `https://github.com/apache/spark/pull/23950` used for test the > behavior by UT. > UT results are the same as those of maropu's test, but different from mine. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org