[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

2016-04-12 Thread chtyim
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/spark/pull/12318#discussion_r59338307
  
--- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
@@ -155,6 +156,7 @@ private[spark] class HttpServer(
   throw new ServerStateException("Server is already stopped")
 } else {
   server.stop()
+  Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
--- End diff --

I think I'll use the `if` with `isInstanceOf`, as it makes the intention most
obvious and doesn't need an empty catch-all case (which is a bit of overkill
here, since there is only one condition to match).

Just for the sake of discussing style, I think it depends on whether you read
the code the imperative way ("if some condition, then do something") or the
functional way ("create an Option -> filter by condition -> apply an operation"),
which can be a never-ending debate :) I myself lean toward the functional way
for simple tasks (like the one-liner here), as it is more concise, and toward
the imperative way when implementing a more complicated logical flow.
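To make the comparison concrete, here is a minimal sketch of the two styles side by side (assuming Jetty's `ThreadPool` and `LifeCycle` are on the classpath, as they are in Spark core):

```scala
import org.eclipse.jetty.util.component.LifeCycle
import org.eclipse.jetty.util.thread.ThreadPool

// Imperative style: explicit null and type checks, then a cast.
def stopImperative(threadPool: ThreadPool): Unit = {
  if (threadPool != null && threadPool.isInstanceOf[LifeCycle]) {
    threadPool.asInstanceOf[LifeCycle].stop()
  }
}

// Functional style: Option absorbs the null, collect performs the type match.
def stopFunctional(threadPool: ThreadPool): Unit = {
  Option(threadPool).collect { case lc: LifeCycle => lc }.foreach(_.stop())
}
```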


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

2016-04-12 Thread chtyim
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/spark/pull/12318#discussion_r59331847
  
--- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
@@ -155,6 +156,7 @@ private[spark] class HttpServer(
   throw new ServerStateException("Server is already stopped")
 } else {
   server.stop()
+  Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
--- End diff --

I agree that when there are multiple chained calls with not-so-obvious
function/partial-function applications, Scala can become unreadable. However,
for the case we have here, I do think that using `Option`/`collect`/`foreach`
is quite straightforward and easy to read (and compared to someone chaining
multiple RDD operations, this is by far the more readable).
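For contrast, a hypothetical multi-step RDD chain of the kind alluded to above (illustrative only; the input path and threshold are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical word-count style chain: every step adds an anonymous
// function the reader must track, which is where chained style can hurt.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("demo"))
val frequent = sc.textFile("input.txt") // placeholder path
  .flatMap(_.split("\\s+"))
  .map((_, 1))
  .reduceByKey(_ + _)
  .filter(_._2 > 10)
  .collect()
sc.stop()
```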


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

2016-04-12 Thread chtyim
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/spark/pull/12318#discussion_r59326211
  
--- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
@@ -155,6 +156,7 @@ private[spark] class HttpServer(
   throw new ServerStateException("Server is already stopped")
 } else {
   server.stop()
+  Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
--- End diff --

Just checking: should I use `isInstanceOf` instead of `case`? That would avoid
creating a partial function and is the most readable to both Scala and Java
developers. It is also fewer lines of code, with no need to write the rather
ugly `case _ => // Do nothing`?

```scala
val threadPool = server.getThreadPool
if (threadPool != null && threadPool.isInstanceOf[LifeCycle]) {
  threadPool.asInstanceOf[LifeCycle].stop()
}
```

Besides, if we have to check for `null` as you suggested, why not use
`Option`, which is one of the most common constructs in Scala?
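To illustrate the point, `Option(x)` yields `None` for a `null` and `Some(x)` otherwise, so the null check and the follow-up action compose without an explicit `if` (a sketch, not the PR code):

```scala
// Option(x) is None when x is null and Some(x) otherwise, so the null
// check and the action compose without an explicit `if`.
val missing = Option(null: String)                   // None
val present = Option("pool")                         // Some("pool")
present.foreach(name => println(s"stopping $name"))  // runs only for Some
```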


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

2016-04-11 Thread chtyim
Github user chtyim commented on the pull request:

https://github.com/apache/spark/pull/12318#issuecomment-208718393
  
Addressed comment. Please have a look again. Thanks.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

2016-04-11 Thread chtyim
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/spark/pull/12318#discussion_r59320494
  
--- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
@@ -155,6 +158,7 @@ private[spark] class HttpServer(
   throw new ServerStateException("Server is already stopped")
 } else {
   server.stop()
+  condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
--- End diff --

That's a bit non-idiomatic in Scala. I'll go with the approach @srowen
suggested.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

2016-04-11 Thread chtyim
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/spark/pull/12318#discussion_r59317619
  
--- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
@@ -155,6 +158,7 @@ private[spark] class HttpServer(
   throw new ServerStateException("Server is already stopped")
 } else {
   server.stop()
+  condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
--- End diff --

Yes, it is effectively doing the same thing, but using `condOpt` from
`PartialFunction` instead. I can change it to your version if that is
preferable.
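For reference, `condOpt` lives on the `scala.PartialFunction` companion object: it returns `Some` only when the partial function matches, with no `case _ =>` branch needed. A minimal, self-contained sketch (`Stoppable`/`Pool` are hypothetical stand-ins for Jetty's `LifeCycle` and thread pool):

```scala
import scala.PartialFunction.condOpt

object CondOptDemo {
  trait Stoppable { def stop(): Unit }
  class Pool extends Stoppable { def stop(): Unit = println("pool stopped") }

  def main(args: Array[String]): Unit = {
    val pool: AnyRef = new Pool
    // condOpt yields Some only when the pattern matches, None otherwise,
    // so no empty `case _ =>` branch is needed.
    condOpt(pool) { case s: Stoppable => s }.foreach(_.stop())
  }
}
```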


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

2016-04-11 Thread chtyim
GitHub user chtyim opened a pull request:

https://github.com/apache/spark/pull/12318

[SPARK-14513][CORE] Fix threads left behind after stopping SparkContext

## What changes were proposed in this pull request?

Shut down the `QueuedThreadPool` used by the Jetty `Server` to avoid thread
leakage after the SparkContext is stopped.
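
In essence, the change stops the server's thread pool when it implements Jetty's `LifeCycle` interface, which `QueuedThreadPool` does. A minimal sketch paraphrasing the diff (not the exact patch):

```scala
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.util.component.LifeCycle

// After stopping the Jetty Server itself, also stop its thread pool when it
// implements LifeCycle (QueuedThreadPool does), so its worker threads do not
// linger after SparkContext.stop().
def stopServer(server: Server): Unit = {
  server.stop()
  Option(server.getThreadPool).collect { case lc: LifeCycle => lc }.foreach(_.stop())
}
```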

Note: if this fix is going to be applied to `branch-1.6`, one more patch on
the `NettyRpcEnv` class is needed so that `NettyRpcEnv._fileServer.shutdown`
is called in the `NettyRpcEnv.cleanup` method. This is due to the removal of
the `_fileServer` field from the `NettyRpcEnv` class in the master branch.
Please advise if a second PR is necessary for bringing this fix back to
`branch-1.6`.

## How was this patch tested?

Ran `./dev/run-tests` locally.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/spark fixes/SPARK-14513-thread-leak

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12318


commit cf7b296e9b56951a6c114007101e38ca79a49d13
Author: Terence Yim <tere...@cask.co>
Date:   2016-04-11T22:10:51Z

[SPARK-14513][CORE] Fix threads left behind after stopping SparkContext




[GitHub] spark pull request: [SPARK-13441] [YARN] Fix NPE in yarn Client.cr...

2016-02-25 Thread chtyim
Github user chtyim commented on the pull request:

https://github.com/apache/spark/pull/11337#issuecomment-188707421
  
The reason failing is bad in this case is that if either `HADOOP_CONF_DIR` or
`YARN_CONF_DIR` is not readable, an exception is raised, even if the other
directory already provides all the Hadoop conf files that Spark needs.

E.g., in one of our use cases, we launch a Spark job from a YARN container in
which `HADOOP_CONF_DIR` is set to a private directory owned by the NM (that's
how Cloudera CM works anyway), which the container fails to read. Even when we
try to work around it by setting `YARN_CONF_DIR` to a readable directory, the
exception is still raised, so we can't submit the job even though the Hadoop
conf files are available.


[GitHub] spark pull request: [SPARK-13441] [YARN] Fix NPE in yarn Client.cr...

2016-02-24 Thread chtyim
Github user chtyim commented on the pull request:

https://github.com/apache/spark/pull/11337#issuecomment-188416520
  
Thanks @srowen for the review. Do I need to do anything to get this PR
merged?


[GitHub] spark pull request: [SPARK-13441] [YARN] Fix NPE in yarn Client.cr...

2016-02-24 Thread chtyim
Github user chtyim commented on a diff in the pull request:

https://github.com/apache/spark/pull/11337#discussion_r53968653
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -537,9 +537,14 @@ private[spark] class Client(
   sys.env.get(envKey).foreach { path =>
 val dir = new File(path)
 if (dir.isDirectory()) {
-  dir.listFiles().foreach { file =>
-if (file.isFile && !hadoopConfFiles.contains(file.getName())) {
-  hadoopConfFiles(file.getName()) = file
+  val files = dir.listFiles()
+  if (files == null) {
+logWarning("Failed to list files under directory " + dir)
--- End diff --

According to the Java API doc, it returns an empty array if the directory is
empty. It only returns null if the path is not a directory or an I/O error
occurs (e.g., a permission issue). Without logging a warning, we might be
silently ignoring a misconfiguration.

Ref: https://docs.oracle.com/javase/7/docs/api/java/io/File.html#listFiles()
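
A self-contained sketch of the guard being discussed, with `println` standing in for Spark's `logWarning`:

```scala
import java.io.File

// File.listFiles() returns an empty array for an empty directory, but null
// when the path is not a directory or an I/O error (e.g. a permission
// problem) occurs, so guard before iterating.
def listConfFiles(dir: File): Seq[File] = {
  val files = dir.listFiles()
  if (files == null) {
    println(s"Failed to list files under directory $dir") // stand-in for logWarning
    Seq.empty
  } else {
    files.toSeq.filter(_.isFile)
  }
}
```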


[GitHub] spark pull request: [SPARK-13441] [YARN] Fix NPE in yarn Client.cr...

2016-02-23 Thread chtyim
GitHub user chtyim opened a pull request:

https://github.com/apache/spark/pull/11337

[SPARK-13441] [YARN] Fix NPE in yarn Client.createConfArchive method

## What changes were proposed in this pull request?

Instead of using the result of `File.listFiles()` directly, which may be null
and lead to an NPE, check for null first. If it is null, log a warning instead.

## How was this patch tested?

Ran `./dev/run-tests` locally.
Tested manually on a cluster.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chtyim/spark fixes/SPARK-13441-null-check

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11337


commit a8fb7cb0be0588d4c31a1fc24c3473ef39562be5
Author: Terence Yim <tere...@cask.co>
Date:   2016-02-24T00:27:48Z

[SPARK-13441] [YARN] Fix NPE in yarn Client.createConfArchive method

- Log a warning instead of throwing NPE if either
  $HADOOP_CONF_DIR or $YARN_CONF_DIR is not accessible.



