[ https://issues.apache.org/jira/browse/SPARK-31178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061292#comment-17061292 ]
Burak Yavuz commented on SPARK-31178:
-------------------------------------

cc [~wenchen] [~rdblue]

> sql("INSERT INTO v2DataSource ...").collect() double inserts
> ------------------------------------------------------------
>
>                 Key: SPARK-31178
>                 URL: https://issues.apache.org/jira/browse/SPARK-31178
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Burak Yavuz
>            Priority: Blocker
>
> The following unit test fails in DataSourceV2SQLSuite:
> {code:java}
> test("do not double insert on INSERT INTO collect()") {
>   import testImplicits._
>   val t1 = s"${catalogAndNamespace}tbl"
>   sql(s"CREATE TABLE $t1 (id bigint, data string) USING $v2Format")
>   val tmpView = "test_data"
>   val df = Seq((1L, "a"), (2L, "b"), (3L, "c")).toDF("id", "data")
>   df.createOrReplaceTempView(tmpView)
>   sql(s"INSERT INTO TABLE $t1 SELECT * FROM $tmpView").collect()
>   verifyTable(t1, df)
> }
> {code}
> The INSERT INTO is double inserting when ".collect()" is called. I think this
> is because the V2 SparkPlans are not commands, and doExecute on a Spark plan
> can be called multiple times causing data to be inserted multiple times.
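
The suspected mechanism in the description (a write performed inside a plan's execution method, which may run more than once) can be illustrated with a toy model. This is only a sketch and does not use Spark's classes: `PlanNode`, `AppendNode`, and `RunOnceNode` are hypothetical names invented for illustration, and the report does not say whether an eventual fix would take this shape.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model only: every type here is hypothetical and is NOT a Spark class.
object DoubleInsertSketch {
  type Row = (Long, String)

  // Stand-in for a physical plan node whose execution may be requested repeatedly.
  trait PlanNode {
    def execute(): Seq[Row]
  }

  // Write node with its side effect inside execute(): every call appends again.
  final class AppendNode(table: ArrayBuffer[Row], rows: Seq[Row]) extends PlanNode {
    def execute(): Seq[Row] = {
      table ++= rows
      Seq.empty
    }
  }

  // Command-style wrapper: the child runs at most once and its result is cached.
  final class RunOnceNode(child: PlanNode) extends PlanNode {
    private lazy val result: Seq[Row] = child.execute()
    def execute(): Seq[Row] = result
  }

  // Returns (rows in table after executing the bare node twice,
  //          rows in table after executing the wrapped node twice).
  def demo(): (Int, Int) = {
    val rows = Seq((1L, "a"), (2L, "b"), (3L, "c"))

    val bareTable = ArrayBuffer.empty[Row]
    val bare = new AppendNode(bareTable, rows)
    bare.execute()
    bare.execute()

    val guardedTable = ArrayBuffer.empty[Row]
    val guarded = new RunOnceNode(new AppendNode(guardedTable, rows))
    guarded.execute()
    guarded.execute()

    (bareTable.size, guardedTable.size)
  }
}
```

In this model the bare node leaves six rows in its table after two executions, while the wrapped node leaves three, which mirrors the difference between a plan whose write can be re-run and a command that is evaluated once.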