[ https://issues.apache.org/jira/browse/SPARK-28411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886381#comment-16886381 ]
Huaxin Gao commented on SPARK-28411:
------------------------------------

I am working on this and will submit a PR soon.

> insertInto with overwrite inconsistent behaviour Python/Scala
> -------------------------------------------------------------
>
>                 Key: SPARK-28411
>                 URL: https://issues.apache.org/jira/browse/SPARK-28411
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.2.1, 2.4.0
>            Reporter: Maria Rebelka
>            Priority: Minor
>
> df.write.mode("overwrite").insertInto("table") behaves inconsistently between Scala and Python. In Python, insertInto ignores the "mode" setting and appends by default; only after changing the call to df.write.insertInto("table", overwrite=True) do we get the expected behaviour.
> This is native Spark syntax, so it is expected to behave the same across languages. Other write methods, such as saveAsTable or write.parquet, do respect "mode".
> Reproduce in Python ("overwrite" is ignored):
> {code:python}
> df = spark.createDataFrame(sc.parallelize([(1, 2), (3, 4)]), ['i', 'j'])
> # create the table and load data
> df.write.saveAsTable("spark_overwrite_issue")
> # insert overwrite, expected result - 2 rows
> df.write.mode("overwrite").insertInto("spark_overwrite_issue")
> spark.sql("select * from spark_overwrite_issue").count()
> # result - 4 rows, insertInto appended the data instead of overwriting{code}
> Reproduce in Scala (works as expected):
> {code:scala}
> val df = Seq((1, 2), (3, 4)).toDF("i", "j")
> df.write.mode("overwrite").insertInto("spark_overwrite_issue")
> spark.sql("select * from spark_overwrite_issue").count()
> // result - 2 rows{code}
> Tested on Spark 2.2.1 (EMR) and 2.4.0 (Databricks)
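A note on the likely root cause (an inference, not something stated in this ticket): the Python DataFrameWriter.insertInto carries its own overwrite flag and derives the save mode from it, discarding whatever was set earlier via .mode(). The 2.4-era implementation looks roughly like the sketch below; treat it as a from-memory paraphrase rather than the verbatim source.

{code:python}
# Paraphrase (from memory, not verbatim) of PySpark 2.4's
# DataFrameWriter.insertInto: the save mode is recomputed from the
# `overwrite` flag, so a preceding .mode("overwrite") call is silently
# replaced by "append" whenever the flag is left at its False default.
def insertInto(self, tableName, overwrite=False):
    self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)
{code}

The Scala insertInto takes no such flag and reads the mode set on the writer, which would explain why the two languages diverge.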
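Until a fix lands, a minimal workaround sketch in the reporter's own terms: pass overwrite=True to insertInto directly instead of relying on .mode(). This assumes the spark_overwrite_issue table created in the reproduction above already exists.

{code:python}
# Workaround: pass the overwrite flag to insertInto() directly;
# .mode("overwrite") is ignored by the Python insertInto in 2.x.
df = spark.createDataFrame([(1, 2), (3, 4)], ['i', 'j'])
df.write.insertInto("spark_overwrite_issue", overwrite=True)
spark.sql("select * from spark_overwrite_issue").count()
# result - 2 rows, matching the Scala behaviour
{code}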