[ https://issues.apache.org/jira/browse/SPARK-39348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max updated SPARK-39348:
------------------------

Description:

When you attempt to rerun an Apache Spark write operation after cancelling the currently running job, the following error occurs:

Error: org.apache.spark.sql.AnalysisException: Cannot create the managed table('`testdb`.`testtable`'). The associated location ('dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable') already exists.;

This problem can occur if:
* The cluster is terminated while a write operation is in progress.
* A temporary network issue occurs.
* The job is interrupted.

Once the metastore data for a particular table is corrupted, it is hard to recover except by dropping the files in that location manually. The underlying problem is that a metadata directory called {{_STARTED}} is not deleted automatically when Azure Databricks tries to overwrite the table.

You can reproduce the problem by following these steps (see the sketch after the quoted issue summary below):
1. Create a DataFrame: {{val df = spark.range(1000)}}
2. Write the DataFrame to a location in overwrite mode: {{df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")}}
3. Cancel the command while it is executing.
4. Re-run the {{write}} command.

> Create table in overwrite mode fails when interrupted
> -----------------------------------------------------
>
>                 Key: SPARK-39348
>                 URL: https://issues.apache.org/jira/browse/SPARK-39348
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.1.1
>            Reporter: Max
>            Priority: Major
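For reference, a minimal Scala sketch of the reproduction steps and of the manual cleanup described above. It assumes a Spark session with Hive support and that the leftover table location matches the path from the error message; both the session setup and the path are deployment-specific assumptions (the {{dbfs:/}} scheme only resolves on Databricks):

{code:scala}
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("SPARK-39348 repro")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS testdb")

// Step 1: create a DataFrame.
val df = spark.range(1000)

// Step 2: write it as a managed table in overwrite mode.
// Step 3: cancel this command while it is executing.
// Step 4: re-run it; with the leftover directory in place, this throws
// the AnalysisException quoted in the description.
df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")

// Manual recovery: delete the orphaned table location, then re-run the
// write. The path below is an assumption based on the error message and
// will differ between deployments.
val leftover =
  new Path("dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable")
val fs = leftover.getFileSystem(spark.sparkContext.hadoopConfiguration)
if (fs.exists(leftover)) fs.delete(leftover, true) // recursive delete
{code}

Note that a 1000-row range writes quickly, so a larger range or slower storage may be needed to cancel in time; the key point is only that the write is interrupted after the table location is created but before it is committed.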
--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org