[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152839#comment-14152839
 ] 

Aaron Davidson commented on SPARK-1860:
---------------------------------------

The Executor could clean up its own jars when it terminates normally; that 
seems fine. The impact of this seems limited, though, and it's a good idea to 
limit the scope of shutdown hooks as much as possible.

There are three classes of things to delete:
1. Shuffle files / block manager blocks -- large -- deleted by graceful 
Executor termination. Can be deleted immediately.
2. Uploaded jars / files -- usually small -- deleted by Worker cleanup. Can be 
deleted immediately.
3. Logs -- small to medium -- deleted by Worker cleanup. Should not be deleted 
immediately.

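Number 1 is the most critical in terms of impact on the system. Numbers 2 and 3 
are of the same order of magnitude in size, so cleaning up 2 but not 3 at best 
roughly halves what each application leaves behind. It is therefore not 
expected to improve the system's stability by more than a factor of ~2x, 
measured in the number of applications a Worker can handle.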

Note that the intent of this particular JIRA is very simple: clean up 2 and 
3 for all executors several days after they have terminated, rather than after 
they have started. If you wish to expand the scope of the Worker or Executor 
cleanup, that should be covered in a separate JIRA (which is welcome -- I just 
want to make sure we're on the same page about this particular issue!).
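
To make the intended rule concrete, here is a minimal sketch in Python. It is not the actual Worker code: the ExecutorDir record, the dirs_to_clean function and the directory names are hypothetical, and the 7-day TTL is only the default mentioned in the issue description. The point it illustrates is that the age check is made against the termination time, and that a running executor is never eligible.

```python
from dataclasses import dataclass
from typing import List, Optional

SECONDS_PER_DAY = 24 * 3600
# Assumed TTL: the 7-day default mentioned in the issue description.
APP_DATA_TTL = 7 * SECONDS_PER_DAY


@dataclass
class ExecutorDir:
    """Hypothetical record for one executor's jars/files/logs directory."""
    path: str
    started_at: float             # epoch seconds
    finished_at: Optional[float]  # None while the executor is still running


def dirs_to_clean(dirs: List[ExecutorDir], now: float,
                  ttl: float = APP_DATA_TTL) -> List[str]:
    """Return paths whose executor terminated more than `ttl` seconds ago."""
    expired = []
    for d in dirs:
        if d.finished_at is None:
            # Still running: never clean up, no matter how long ago it started.
            continue
        if now - d.finished_at > ttl:
            expired.append(d.path)
    return expired


if __name__ == "__main__":
    now = 30 * SECONDS_PER_DAY
    dirs = [
        # Long-running streaming job, started 30 days ago and still alive.
        ExecutorDir("app-1/0", started_at=0, finished_at=None),
        # Terminated 20 days ago: eligible for cleanup.
        ExecutorDir("app-2/0", started_at=0, finished_at=10 * SECONDS_PER_DAY),
        # Terminated 2 days ago: kept for now so its logs stay available.
        ExecutorDir("app-3/0", started_at=0, finished_at=28 * SECONDS_PER_DAY),
    ]
    print(dirs_to_clean(dirs, now))
```

A check based on started_at instead would flag all three directories in this example, including the one belonging to the running streaming job, which is the behavior this issue is about.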

> Standalone Worker cleanup should not clean up running executors
> ---------------------------------------------------------------
>
>                 Key: SPARK-1860
>                 URL: https://issues.apache.org/jira/browse/SPARK-1860
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 1.0.0
>            Reporter: Aaron Davidson
>            Priority: Blocker
>
> With its default settings, the standalone Worker cleanup code cleans up all 
> application data every 7 days. This includes jars that were added to any 
> executors that happen to be running for longer than 7 days, hitting streaming 
> jobs especially hard.
> An Executor's log/data folders should not be cleaned up while it is still 
> running. Until that is fixed, this behavior should not be enabled by default.



