[ https://issues.apache.org/jira/browse/SPARK-44543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dipayan Dev updated SPARK-44543:
--------------------------------
    Attachment: image-2023-07-25-17-52-55-006.png

> Cleanup .spark-staging directories when yarn application fails
> --------------------------------------------------------------
>
>                 Key: SPARK-44543
>                 URL: https://issues.apache.org/jira/browse/SPARK-44543
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.4.1
>            Reporter: Dipayan Dev
>            Priority: Major
>         Attachments: image-2023-07-25-17-52-55-006.png
>
>
> Spark creates staging directories such as .hive-staging and .spark-staging when you run an upsert to a partitioned table. If the YARN application master is killed, these directories remain in Google Cloud Storage indefinitely.
>
> Over time they pile up and incur significant cloud storage cost.
>
> Can we update the file committer to clean up the temporary directories when the job commit fails?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
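Until a committer-side fix lands, leftover staging directories can be reaped by an external cleanup job. A minimal sketch of that idea, with an assumed one-week age threshold and the local filesystem standing in for a GCS client (on GCS the same walk would use a storage client; the function name and threshold are hypothetical, not part of Spark):

```python
import shutil
import time
from pathlib import Path

# Staging-directory prefixes mentioned in the issue.
STAGING_PREFIXES = (".spark-staging", ".hive-staging")
# Hypothetical threshold: anything older than a week is assumed abandoned.
MAX_AGE_SECONDS = 7 * 24 * 3600

def cleanup_stale_staging(table_root, now=None):
    """Delete staging directories under table_root older than MAX_AGE_SECONDS.

    Returns the list of directory paths that were removed.
    """
    now = time.time() if now is None else now
    # Collect candidates first so we do not delete while walking the tree.
    candidates = [
        p for p in Path(table_root).rglob("*")
        if p.is_dir() and p.name.startswith(STAGING_PREFIXES)
    ]
    removed = []
    for entry in candidates:
        if now - entry.stat().st_mtime > MAX_AGE_SECONDS:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(str(entry))
    return removed
```

Such a job could run on a schedule (e.g. a cron job or Cloud Scheduler task) against the warehouse path; it only deletes directories matching the staging-name prefixes, so committed data under the table root is untouched.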