[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running applications

2014-07-28 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076403#comment-14076403
 ] 

Aaron Davidson commented on SPARK-1860:
---

There's no easy way to tell whether an application is still running. However, 
the Worker does keep state about which executors are still running, and that is 
what I originally intended: we must not clean up an Executor's own state out 
from underneath it. I will change the title to reflect this intention.
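A minimal sketch of that rule, written in Python purely for illustration (the actual Worker is Scala). It assumes the Worker keeps a map from (application ID, executor ID) to executor state, and that each application's work folder is named after its application ID. `dirs_safe_to_clean` and the `"RUNNING"` state string are hypothetical, not Spark API.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model of the Worker's bookkeeping: (app_id, executor_id) -> state.
ExecutorKey = Tuple[str, int]


def dirs_safe_to_clean(app_dirs: Iterable[str],
                       running_executors: Dict[ExecutorKey, str]) -> List[str]:
    """Return the app directories that no running executor on this Worker uses."""
    # Collect the app IDs that still have at least one live executor here.
    live_apps = {app_id for (app_id, _), state in running_executors.items()
                 if state == "RUNNING"}
    # Only directories of applications without a live executor are candidates.
    return [d for d in app_dirs if d not in live_apps]


executors = {("app-1", 0): "RUNNING", ("app-2", 0): "EXITED"}
print(dirs_safe_to_clean(["app-1", "app-2", "app-3"], executors))  # → ['app-2', 'app-3']
```

The check relies only on state the Worker already owns, so it needs no new communication with the Master.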

> Standalone Worker cleanup should not clean up running applications
> --
>
> Key: SPARK-1860
> URL: https://issues.apache.org/jira/browse/SPARK-1860
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 1.0.0
> Reporter: Aaron Davidson
> Priority: Critical
> Fix For: 1.1.0
>
>
> With its default settings, the standalone worker cleanup code cleans up all 
> application data every 7 days. This includes jars that were added to any 
> application that happens to run for longer than 7 days, which hits 
> streaming jobs especially hard.
> Applications should not be cleaned up while they are still running. Until 
> that is fixed, this behavior should not be enabled by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running applications

2014-07-27 Thread Mingyu Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075911#comment-14075911
 ] 

Mingyu Kim commented on SPARK-1860:
---

Friendly ping: [~pwendell], is there an easy way to tell from the worker node 
whether an app is active? Or should we just go with the rather fragile design I 
proposed in my previous comment?



[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running applications

2014-06-29 Thread Mingyu Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047074#comment-14047074
 ] 

Mingyu Kim commented on SPARK-1860:
---

[~pwendell], would there be an easy way to tell from the worker node whether an 
app directory is active or not? In other words, can a worker node get the list 
of active application IDs from the master? I assumed this was not doable, so I 
was going to wipe out all app directories that haven't been used (i.e., no 
jobs have run, even if the application is still alive), based on the last 
modified date of the log files. What do you think?
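A sketch of that mtime heuristic, assuming each executor writes `stdout` and `stderr` files somewhere under the application's directory. The helper names, and the choice to treat a directory with no logs at all as unused, are assumptions for illustration.

```python
import os
import time
from typing import Optional

LOG_NAMES = ("stdout", "stderr")  # assumed log file names inside each executor dir


def newest_log_mtime(app_dir: str) -> Optional[float]:
    """Latest mtime of any stdout/stderr file under app_dir, or None if none exist."""
    newest = None
    for root, _dirs, files in os.walk(app_dir):
        for name in files:
            if name in LOG_NAMES:
                mtime = os.path.getmtime(os.path.join(root, name))
                newest = mtime if newest is None else max(newest, mtime)
    return newest


def is_unused(app_dir: str, ttl_seconds: float, now: Optional[float] = None) -> bool:
    """True if no log file under app_dir was modified within the last ttl_seconds."""
    now = time.time() if now is None else now
    newest = newest_log_mtime(app_dir)
    # Assumption: a directory without any log files counts as unused.
    return newest is None or now - newest > ttl_seconds
```

This is exactly the fragile part of the design: an application that is alive but idle for longer than the TTL looks identical to a dead one.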



[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running applications

2014-05-21 Thread Andrew Ash (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005596#comment-14005596
 ] 

Andrew Ash commented on SPARK-1860:
---

The Spark master web UI shows the running applications, so the master at least 
knows what's running. Since this cleanup runs on a worker, the worker may need 
to be told by the master which applications are active. I don't know Spark's 
internals very well, but there must be a way to determine this.
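One hypothetical shape for that: the master periodically pushes its list of running application IDs, and the worker refuses to clean anything until it has heard from the master at least once. `ActiveApplications` and `WorkerView` are invented names for this sketch, not Spark's actual deploy messages.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ActiveApplications:
    """Hypothetical message the master could send to each worker."""
    app_ids: FrozenSet[str]


@dataclass
class WorkerView:
    """Worker-side cache of the master's latest list of running applications."""
    active: Set[str] = field(default_factory=set)
    received: bool = False

    def on_message(self, msg: ActiveApplications) -> None:
        # Replace rather than merge, so finished applications drop out.
        self.active = set(msg.app_ids)
        self.received = True

    def may_clean(self, app_id: str) -> bool:
        # Until the first message arrives, assume everything may still be running.
        return self.received and app_id not in self.active


view = WorkerView()
view.on_message(ActiveApplications(frozenset({"app-1", "app-2"})))
view.on_message(ActiveApplications(frozenset({"app-2"})))
print(view.may_clean("app-1"), view.may_clean("app-2"))  # → True False
```

The `received` flag matters: without it, a freshly restarted worker would see an empty list and treat every application as finished.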



[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running applications

2014-05-21 Thread Mingyu Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004747#comment-14004747
 ] 

Mingyu Kim commented on SPARK-1860:
---

[~aash], is there a reliable way to check whether a "folder is owned by a running 
application"? I thought that wasn't possible, so I was going to implement only 
the second if statement. That means folders for running applications that simply 
haven't been active within the TTS will also get wiped out, assuming the executor 
writes something to stdout or stderr whenever it runs a computation.

This also means that a long-running but inactive application would need to send 
a "heartbeat" by running a trivial computation every so often.
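A sketch of such a heartbeat on the driver side, assuming, as above, that running a task makes the executors write to their logs and so refreshes the mtime the cleanup checks. `start_heartbeat` is a hypothetical helper; in a PySpark driver the action could be something like `lambda: sc.parallelize([1]).count()`, and the interval has to be comfortably shorter than the TTS.

```python
import threading
from typing import Callable


def start_heartbeat(action: Callable[[], None], interval_s: float) -> threading.Event:
    """Call `action` every `interval_s` seconds on a daemon thread.

    Returns an Event; setting it stops the heartbeat.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout (time for another beat) and
        # True once stop.set() has been called, which ends the loop.
        while not stop.wait(interval_s):
            action()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Because the thread is a daemon, it will not keep the driver process alive on its own; setting the returned event stops it early.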

Any suggestions?



[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running applications

2014-05-18 Thread Andrew Ash (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001151#comment-14001151
 ] 

Andrew Ash commented on SPARK-1860:
---

[~mkim] is going to take a look at this after discussion at 
https://issues.apache.org/jira/browse/SPARK-1154

I think the correct fix, as Patrick outlines, would be:

{code}
// pseudocode
for folder in onDiskFolders:
    if folder is owned by a running application:
        continue
    if folder contains any folder/file (recursively) that is more recently touched (mtime) than the TTS:
        continue
    cleanUp(folder)
{code}
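A runnable version of that pseudocode, as a sketch in Python. The `is_running` callback stands in for however the worker learns which applications are live (that part is still open in this discussion), and the flat `work_dir/<appId>` layout is an assumption.

```python
import os
import shutil
import time
from typing import Callable, List, Optional


def newest_mtime(path: str) -> float:
    """Most recent mtime of path or of anything beneath it (recursively)."""
    newest = os.path.getmtime(path)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return newest


def clean_work_dir(work_dir: str,
                   is_running: Callable[[str], bool],
                   tts_seconds: float,
                   now: Optional[float] = None) -> List[str]:
    """Delete app folders that are neither running nor touched within tts_seconds."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(work_dir)):
        folder = os.path.join(work_dir, name)
        if not os.path.isdir(folder):
            continue
        if is_running(name):  # folder is owned by a running application
            continue
        if now - newest_mtime(folder) <= tts_seconds:  # something touched recently
            continue
        shutil.rmtree(folder)  # cleanUp(folder)
        removed.append(name)
    return removed
```

Checking `is_running` first means the worker never walks the directory tree of a live application at all.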

Schedule that to run periodically (with the interval configured by a setting), 
and this should be fully fixed.

Is that right?

An alternative approach would be to have the executor clean up the application's 
work directory when the application terminates. However, an unclean executor 
shutdown could still leave work directories behind, so a TTL approach would 
still need to be included as well.
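A sketch of that alternative together with its caveat, in Python: the directory is removed in a `finally` block, which runs on a normal exit or an exception, but not if the process is killed outright, so a TTL sweep is still needed. `executor_work_dir` is a hypothetical helper, and `path` is whatever work directory the executor owns.

```python
import os
import shutil
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def executor_work_dir(path: str) -> Iterator[str]:
    """Create an executor's work directory and remove it when the executor exits.

    The finally block runs on a normal exit or an exception, but not if the
    process is killed outright (e.g. SIGKILL or a machine crash), which is why
    a TTL-based sweep is still needed as a backstop.
    """
    os.makedirs(path, exist_ok=True)
    try:
        yield path
    finally:
        # ignore_errors: a cleanup failure should not mask the executor's own error.
        shutil.rmtree(path, ignore_errors=True)
```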



[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running applications

2014-05-16 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999602#comment-13999602
 ] 

Patrick Wendell commented on SPARK-1860:
---

I think it would be better to start the TTL only once an executor has finished, 
and to delete only the specific folder used by that executor.
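A sketch of that policy, assuming the Worker records when each executor exits and keys the record by that executor's own folder. `FinishedExecutorDirs` is a hypothetical class; because a running executor never appears in the map, its folder can never be deleted.

```python
import shutil
import time
from typing import Dict, List, Optional


class FinishedExecutorDirs:
    """Hypothetical Worker-side tracker: the TTL clock for an executor's folder
    starts only when that executor finishes, and only that folder is deleted."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self.finished_at: Dict[str, float] = {}  # executor folder -> finish time

    def executor_finished(self, folder: str, now: Optional[float] = None) -> None:
        self.finished_at[folder] = time.time() if now is None else now

    def expired(self, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        return [f for f, t in self.finished_at.items() if now - t > self.ttl_seconds]

    def sweep(self, now: Optional[float] = None) -> List[str]:
        """Delete expired executor folders and forget them."""
        done = self.expired(now)
        for folder in done:
            shutil.rmtree(folder, ignore_errors=True)
            del self.finished_at[folder]
        return done
```

One limitation of this sketch: the finish times live in memory, so a Worker restart forgets them, and folders left over from before a restart would still need an mtime-based sweep.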

> Standalone Worker cleanup should not clean up running applications
> --
>
> Key: SPARK-1860
> URL: https://issues.apache.org/jira/browse/SPARK-1860
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Aaron Davidson
>Priority: Critical
> Fix For: 1.0.0
>
>
> With its default settings, the standalone worker cleanup code cleans up all 
> application data every 7 days. This includes jars that were added to any 
> application that happens to run for longer than 7 days, which hits 
> streaming jobs especially hard.
> Applications should not be cleaned up while they are still running. Until 
> that is fixed, this behavior should not be enabled by default.


