[ 
https://issues.apache.org/jira/browse/SPARK-39348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max updated SPARK-39348:
------------------------
    Description: 
When you cancel a running Apache Spark write operation and then attempt to
rerun it, the following error occurs:

Error: org.apache.spark.sql.AnalysisException: Cannot create the managed
table('`testdb`.`testtable`'). The associated location
('dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable') already exists.;

This problem can occur if:
 * The cluster is terminated while a write operation is in progress.
 * A temporary network issue occurs.
 * The job is interrupted.

Once the metastore data for a table is corrupted in this way, it is hard to
recover except by manually deleting the files at that location. The underlying
problem is that a metadata directory called {{_STARTED}} isn’t deleted
automatically when Azure Databricks tries to overwrite it.
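
As a manual workaround, the leftover table location can be deleted with the
Hadoop FileSystem API before re-running the write. This is only a minimal
sketch; the path below is taken from the error message above and will differ
per deployment:

{code:scala}
import org.apache.hadoop.fs.Path

// Path taken from the error message above; adjust to your warehouse location.
val location = new Path("dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable")

// Recursively delete the orphaned directory (including the _STARTED metadata).
val fs = location.getFileSystem(spark.sparkContext.hadoopConfiguration)
if (fs.exists(location)) {
  fs.delete(location, true)
}
{code}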

You can reproduce the problem by following these steps (a consolidated,
runnable snippet follows the list):

1. Create a DataFrame:

{{val df = spark.range(1000)}}

2. Write the DataFrame to a location in overwrite mode:

{{df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")}}

3. Cancel the command while it is executing.

4. Re-run the {{write}} command.
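
The steps above, collected into one snippet that can be pasted into
spark-shell. The cancellation in step 3 still has to be triggered manually
while the write is running, and the target database is assumed not to exist
yet, so it is created first:

{code:scala}
import org.apache.spark.sql.SaveMode

// Make sure the target database exists before reproducing.
spark.sql("CREATE DATABASE IF NOT EXISTS testdb")

// Step 1: create a DataFrame.
val df = spark.range(1000)

// Step 2: write it in overwrite mode. Cancel this command while it runs
// (step 3), then re-run it (step 4) to hit the AnalysisException above.
df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")
{code}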


> Create table in overwrite mode fails when interrupted
> -----------------------------------------------------
>
>                 Key: SPARK-39348
>                 URL: https://issues.apache.org/jira/browse/SPARK-39348
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.1.1
>            Reporter: Max
>            Priority: Major
>


