[ https://issues.apache.org/jira/browse/SPARK-16410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-16410.
-------------------------------
    Resolution: Duplicate

> DataFrameWriter's jdbc method drops table in overwrite mode
> -----------------------------------------------------------
>
>                 Key: SPARK-16410
>                 URL: https://issues.apache.org/jira/browse/SPARK-16410
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1, 1.6.2
>            Reporter: Ian Hellstrom
>
> According to the [API documentation|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter], the write mode {{overwrite}} should _overwrite the existing data_, which suggests that the data is removed, i.e. the table is truncated.
> However, that is not what happens in the [source code|https://github.com/apache/spark/blob/0ad6ce7e54b1d8f5946dde652fa5341d15059158/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L421]:
> {code}
> if (mode == SaveMode.Overwrite && tableExists) {
>   JdbcUtils.dropTable(conn, table)
>   tableExists = false
> }
> {code}
> This clearly shows that the table is first dropped and then recreated. This causes two major issues:
> * Existing indexes, partitioning schemes, etc. are completely lost.
> * The case of identifiers may be changed without the user understanding why.
> In my opinion, the table should be truncated, not dropped. Overwriting data is a DML operation and should not cause DDL.
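The truncate-instead-of-drop behavior the reporter proposes can be sketched as a small decision function. This is a hypothetical illustration, not Spark's actual API: the `SaveMode` objects and the `preWriteStatement` helper below are names invented for the sketch, standing in for the branch in `DataFrameWriter` quoted above.

```scala
// Sketch of the proposed pre-write decision: in Overwrite mode against an
// existing table, issue TRUNCATE TABLE (DML-adjacent, keeps indexes,
// partitioning, and identifier case) instead of DROP TABLE (DDL).
// All names here are illustrative, not Spark internals.
object OverwriteSql {
  sealed trait SaveMode
  case object Overwrite extends SaveMode
  case object Append extends SaveMode

  // Returns the statement to run before writing rows, if any.
  def preWriteStatement(mode: SaveMode, table: String, tableExists: Boolean): Option[String] =
    (mode, tableExists) match {
      // Proposed behavior; the current code effectively does s"DROP TABLE $table" here.
      case (Overwrite, true) => Some(s"TRUNCATE TABLE $table")
      case _                 => None
    }
}
```

Under this scheme an overwrite of an existing table empties it in place, so any secondary indexes and the table's original definition survive the write.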