[ 
https://issues.apache.org/jira/browse/SPARK-52124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangsheng updated SPARK-52124:
-------------------------------
    Description: 
When applications are submitted to Spark in standalone mode, each node creates 
a folder under its {{work}} directory for every submitted application. The 
folders are named like {{app-20250212191730-0249}}. They contain the resource 
files that each node downloads from the master node when the application is 
submitted. Although there is a scheduled cleanup mechanism 
({{spark.worker.cleanup.enabled}}), it is not immediate. {color:#ff0000}If a 
large number of applications are submitted in a short period of time, and each 
application depends on a significant amount of external resources, the disk 
space can be quickly exhausted.{color}
 
Therefore, I suggest actively deleting an application's folder under the 
{{work}} directory as soon as that application completes.

  was:
When submitting tasks using Spark in standalone mode, a folder is generated 
under the {{work}} directory on each node every time a task is submitted. The 
naming convention for these folders is, for example, 
{{app-20250212191730-0249}}. These folders contain the resource files that 
each node downloads from the master node when the task is submitted. Although 
there is a scheduled cleanup mechanism ({{spark.worker.cleanup.enabled}}), 
it is not immediate. {color:#FF0000}If a large number of tasks are submitted in 
a short period of time, and each task depends on a significant amount of 
external resources, the disk space can be quickly exhausted.{color}
 
Therefore, I suggest actively deleting the disk space occupied under the 
{{work}} directory after each task is completed.
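
The proposal above can be illustrated with a minimal sketch. It is not Spark's worker code: the helper names release_finished_apps and dir_size_bytes are hypothetical, and the sketch assumes the caller already knows which application IDs have finished (how a worker learns that is outside its scope). It only shows the idea of removing the matching app-* folders under the work directory right away instead of waiting for the periodic cleanup.

```python
import re
import shutil
from pathlib import Path
from typing import Dict, Set

# Application folders under the work directory are named like
# app-20250212191730-0249 (timestamp, then a zero-padded counter).
APP_DIR_PATTERN = re.compile(r"^app-\d{14}-\d{4,}$")


def dir_size_bytes(path: Path) -> int:
    """Return the total size of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def release_finished_apps(work_dir: Path, finished_app_ids: Set[str]) -> Dict[str, int]:
    """Delete work_dir/<app-id> for every finished application (hypothetical helper).

    Returns a mapping from each removed application ID to the bytes it occupied.
    """
    freed: Dict[str, int] = {}
    for entry in sorted(work_dir.iterdir()):
        if not entry.is_dir() or not APP_DIR_PATTERN.match(entry.name):
            continue  # not an application folder, so never touch it
        if entry.name not in finished_app_ids:
            continue  # the application may still be running
        size = dir_size_bytes(entry)
        shutil.rmtree(entry)
        freed[entry.name] = size
    return freed
```

For example, release_finished_apps(Path("work"), {"app-20250212191730-0249"}) would remove only that folder and report how many bytes it held. Until something like this exists in the worker, the periodic cleanup can be made more aggressive with the existing spark.worker.cleanup.interval and spark.worker.cleanup.appDataTtl settings, both given in seconds.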


> Actively Releasing Disk Space After Application Completion in Spark 
> Standalone Mode
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-52124
>                 URL: https://issues.apache.org/jira/browse/SPARK-52124
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.5
>            Reporter: huangsheng
>            Priority: Minor


