[ 
https://issues.apache.org/jira/browse/SPARK-44543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dipayan Dev updated SPARK-44543:
--------------------------------
    Attachment: image-2023-07-25-17-52-55-006.png

> Cleanup .spark-staging directories when yarn application fails
> --------------------------------------------------------------
>
>                 Key: SPARK-44543
>                 URL: https://issues.apache.org/jira/browse/SPARK-44543
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.4.1
>            Reporter: Dipayan Dev
>            Priority: Major
>         Attachments: image-2023-07-25-17-52-55-006.png
>
>
> Spark creates staging directories such as .hive-staging and .spark-staging 
> when you run an upsert to a partitioned table. If the YARN application 
> master is killed, these directories are left behind in Google Cloud Storage 
> indefinitely.
>  
> Over time they pile up and incur significant cloud storage cost.
>  
> Can we update the file committer to clean up these temporary directories 
> when the job commit fails?
>  
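
A minimal sketch of the cleanup the ticket asks for. This is not Spark's actual committer code; it only illustrates the idea of sweeping leftover staging directories under a table path after a failed job commit. The helper name, the prefix list, and the use of local filesystem calls (rather than the Hadoop FileSystem API a real committer would use) are assumptions for illustration.

```python
import os
import shutil

# Staging-directory prefixes mentioned in the ticket (assumed list).
STAGING_PREFIXES = (".spark-staging", ".hive-staging")

def cleanup_staging_dirs(table_path: str) -> list:
    """Delete leftover staging directories directly under table_path.

    Returns the sorted names of the directories that were removed, so a
    caller (e.g. an abort-job hook) could log what was cleaned up.
    """
    removed = []
    for name in os.listdir(table_path):
        full = os.path.join(table_path, name)
        if name.startswith(STAGING_PREFIXES) and os.path.isdir(full):
            shutil.rmtree(full)  # recursively delete the stale staging dir
            removed.append(name)
    return sorted(removed)
```

In a real implementation this logic would live in the committer's abort path, so the sweep runs whenever the job commit fails rather than as a separate maintenance job.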



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
