[GitHub] spark pull request #20298: [SPARK-22976][Core]: Cluster mode driver dir remo...

2018-01-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20298





[GitHub] spark pull request #20298: [SPARK-22976][Core]: Cluster mode driver dir remo...

2018-01-17 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request:

https://github.com/apache/spark/pull/20298

[SPARK-22976][Core]: Cluster mode driver dir removed while running

## What changes were proposed in this pull request?

The cleanup logic on the worker previously determined the liveness of a
particular application based on whether or not it had running executors.
This failed for the directory created for a driver running in cluster
mode whenever that driver had no running executors on the same machine.
To preserve driver directories, we consider both executors and running
drivers when checking directory liveness.
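The liveness change described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code in this PR: `ExecutorInfo`, `DriverInfo`, and `DirLiveness` are simplified stand-ins invented here, and the sketch assumes that a cluster-mode driver's work directory is named after its driver ID, as the `/var/lib/spark/worker/driver-...` paths in the logs below suggest.

```scala
// Hypothetical, simplified stand-ins for the worker's bookkeeping (not Spark's classes).
final case class ExecutorInfo(appId: String)
final case class DriverInfo(driverId: String)

object DirLiveness {
  // Old rule: only application IDs with executors on this worker are live.
  def liveIdsBefore(executors: Iterable[ExecutorInfo]): Set[String] =
    executors.map(_.appId).toSet

  // Patched rule (sketch): IDs of drivers running on this worker are live too,
  // so a cluster-mode driver directory survives even with no local executors.
  def liveIdsAfter(executors: Iterable[ExecutorInfo],
                   drivers: Iterable[DriverInfo]): Set[String] =
    (executors.map(_.appId) ++ drivers.map(_.driverId)).toSet

  // Assumes each work directory is named after the app or driver that owns it.
  def isLive(dirName: String, liveIds: Set[String]): Boolean =
    liveIds.contains(dirName)

  def main(args: Array[String]): Unit = {
    // The driver runs here, but its executors landed on the other node.
    val executors = Seq.empty[ExecutorInfo]
    val drivers = Seq(DriverInfo("driver-20180108231954-0002"))
    val dir = "driver-20180108231954-0002"
    println(isLive(dir, liveIdsBefore(executors)))          // false
    println(isLive(dir, liveIdsAfter(executors, drivers)))  // true
  }
}
```

The key point is that the set of live IDs is a union of application IDs and driver IDs, so the same directory-name check covers both kinds of work directory.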

## How was this patch tested?

Manually started up a two-node cluster with a single core on each node.
Turned on worker directory cleanup and set both the cleanup interval and
the liveness period to one second. Without the patch, the driver directory
is removed shortly after the app is launched, while the driver is still
running. With the patch, it is not removed.

### Without Patch
```
INFO  2018-01-05 23:48:24,693 Logging.scala:54 - Asked to launch driver driver-20180105234824-
INFO  2018-01-05 23:48:25,293 Logging.scala:54 - Changing view acls to: cassandra
INFO  2018-01-05 23:48:25,293 Logging.scala:54 - Changing modify acls to: cassandra
INFO  2018-01-05 23:48:25,294 Logging.scala:54 - Changing view acls groups to:
INFO  2018-01-05 23:48:25,294 Logging.scala:54 - Changing modify acls groups to:
INFO  2018-01-05 23:48:25,294 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(cassandra); groups with view permissions: Set(); users  with modify permissions: Set(cassandra); groups with modify permissions: Set()
INFO  2018-01-05 23:48:25,330 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-/writeRead-0.1.jar
INFO  2018-01-05 23:48:25,332 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-/writeRead-0.1.jar
INFO  2018-01-05 23:48:25,361 Logging.scala:54 - Launch Command: "/usr/lib/jvm/jdk1.8.0_40//bin/java" 

INFO  2018-01-05 23:48:56,577 Logging.scala:54 - Removing directory: /var/lib/spark/worker/driver-20180105234824-  ### << Cleaned up

-- 
One minute passes while app runs (app has 1 minute sleep built in)
--

WARN  2018-01-05 23:49:58,080 ShuffleSecretManager.java:73 - Attempted to unregister application app-20180105234831- when it is not registered
INFO  2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831- removed, cleanupLocalDirs = false
INFO  2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831- removed, cleanupLocalDirs = false
INFO  2018-01-05 23:49:58,082 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831- removed, cleanupLocalDirs = true
INFO  2018-01-05 23:50:00,999 Logging.scala:54 - Driver driver-20180105234824- exited successfully
```

### With Patch
```
INFO  2018-01-08 23:19:54,603 Logging.scala:54 - Asked to launch driver driver-20180108231954-0002
INFO  2018-01-08 23:19:54,975 Logging.scala:54 - Changing view acls to: automaton
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls to: automaton
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing view acls groups to:
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls groups to:
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(automaton); groups with view permissions: Set(); users  with modify permissions: Set(automaton); groups with modify permissions: Set()
INFO  2018-01-08 23:19:55,029 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO  2018-01-08 23:19:55,031 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO  2018-01-08 23:19:55,038 Logging.scala:54 - Launch Command: ..
INFO  2018-01-08 23:21:28,674 ShuffleSecretManager.java:69 - Unregistered shuffle secret for application app-20180108232000-
INFO  2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000- removed, cleanupLocalDirs = false
INFO  2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000- removed, cleanupLocalDirs = false
INFO  2018-01-08 23:21:28,681 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000- removed, cleanupLocalDirs = true
INFO  2018-01-08 23:21:31,703 Logging.scala:54 - Driver