[ https://issues.apache.org/jira/browse/SPARK-24116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460472#comment-16460472 ]
Rui Li commented on SPARK-24116:
--------------------------------

[~hyukjin.kwon], sorry for the late response. For example, assume we have two non-partitioned tables, one a text table and the other a Parquet table. If we INSERT OVERWRITE the text table, the old data goes to the HDFS trash; but if we INSERT OVERWRITE the Parquet table, the old data does not. I believe SparkSQL has different code paths for loading data into different kinds of tables, and whether old data goes to trash is inconsistent among them. Specifically, {{Hive::loadTable}} moves old data to trash, but the other code paths seem to simply delete it. Ideally, SparkSQL would let the user specify whether old data goes to trash when overwriting, via some feature like HIVE-15880. (Illustrative sketches follow at the end of this message.)

> SparkSQL inserting overwrite table has inconsistent behavior regarding HDFS
> trash
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-24116
>                 URL: https://issues.apache.org/jira/browse/SPARK-24116
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Rui Li
>            Priority: Major
>
> When overwriting a table with INSERT OVERWRITE, the old data may or may not go to trash depending on:
> # Data format. E.g. a text table may go to trash while a Parquet table doesn't.
> # Whether the table is partitioned. E.g. a partitioned text table doesn't go to trash while a non-partitioned one does.
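A minimal reproduction sketch of the scenario described in the comment, assuming a Hive-enabled SparkSession (e.g. in spark-shell) and HDFS trash enabled ({{fs.trash.interval}} > 0); the table names are hypothetical:

{code:scala}
// Two non-partitioned tables that differ only in storage format.
spark.sql("CREATE TABLE t_text (id INT) STORED AS TEXTFILE")
spark.sql("CREATE TABLE t_parquet (id INT) STORED AS PARQUET")

spark.sql("INSERT INTO t_text VALUES (1)")
spark.sql("INSERT INTO t_parquet VALUES (1)")

// Overwrite both. The text table goes through Hive::loadTable, so its old
// files land in the user's .Trash; the Parquet table is handled by a
// different code path that deletes the old files outright.
spark.sql("INSERT OVERWRITE TABLE t_text SELECT 2")
spark.sql("INSERT OVERWRITE TABLE t_parquet SELECT 2")
{code}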
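The underlying difference comes down to whether the old data is removed through the Hadoop Trash API or with a plain delete. A sketch of the two behaviors (illustrative only, not Spark's actual code; the warehouse path is hypothetical):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, Trash}

val conf = new Configuration()
val fs = FileSystem.get(conf)
val oldData = new Path("/user/hive/warehouse/t_text")  // hypothetical location

// What Hive::loadTable effectively does: move the old data into the user's
// trash checkpoint, so it can be recovered until the trash is purged.
Trash.moveToAppropriateTrash(fs, oldData, conf)

// What the other code paths effectively do: delete recursively, bypassing trash.
fs.delete(oldData, true)
{code}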
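For reference, HIVE-15880 exposes this choice in Hive through the {{auto.purge}} table property: when set to true, INSERT OVERWRITE deletes the old data directly instead of moving it to trash. A hypothetical sketch of what the equivalent knob could look like if SparkSQL honored the same property:

{code:scala}
// Opt this table out of the trash on overwrite (Hive semantics; as of this
// report SparkSQL does not honor the property, which is the feature request).
spark.sql("ALTER TABLE t_text SET TBLPROPERTIES ('auto.purge' = 'true')")
{code}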