[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58143439
  
@mateiz Aside from restoring the `getThreadLocal` method in order to 
preserve API compatibility, is this patch otherwise ready to merge?





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58143704
  
@mateiz @JoshRosen I've put getThreadLocal() back and deprecated it.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58143977
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58149038
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58149042
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21368/





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58241504
  
Yup, looks good! I'm going to merge it.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2624





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-06 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2624#discussion_r18498163
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -119,30 +118,20 @@ class SparkEnv (
 }
 
 object SparkEnv extends Logging {
-  private val env = new ThreadLocal[SparkEnv]
-  @volatile private var lastSetSparkEnv : SparkEnv = _
+  @volatile private var env: SparkEnv = _
 
   private[spark] val driverActorSystemName = "sparkDriver"
   private[spark] val executorActorSystemName = "sparkExecutor"
 
   def set(e: SparkEnv) {
-    lastSetSparkEnv = e
-    env.set(e)
+    env = e
   }
 
   /**
-   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
-   * previously set in any thread.
+   * Returns the SparkEnv.
    */
   def get: SparkEnv = {
-    Option(env.get()).getOrElse(lastSetSparkEnv)
-  }
-
-  /**
-   * Returns the ThreadLocal SparkEnv.
-   */
-  def getThreadLocal: SparkEnv = {
--- End diff --

Actually I'm surprised MIMA didn't catch this..





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-06 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2624#discussion_r18498152
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -119,30 +118,20 @@ class SparkEnv (
 }
 
 object SparkEnv extends Logging {
-  private val env = new ThreadLocal[SparkEnv]
-  @volatile private var lastSetSparkEnv : SparkEnv = _
+  @volatile private var env: SparkEnv = _
 
   private[spark] val driverActorSystemName = "sparkDriver"
   private[spark] val executorActorSystemName = "sparkExecutor"
 
   def set(e: SparkEnv) {
-    lastSetSparkEnv = e
-    env.set(e)
+    env = e
   }
 
   /**
-   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
-   * previously set in any thread.
+   * Returns the SparkEnv.
    */
   def get: SparkEnv = {
-    Option(env.get()).getOrElse(lastSetSparkEnv)
-  }
-
-  /**
-   * Returns the ThreadLocal SparkEnv.
-   */
-  def getThreadLocal: SparkEnv = {
--- End diff --

You should leave this in instead of removing it, because some user code 
might be calling this public method.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-06 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2624#discussion_r18498170
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -119,30 +118,20 @@ class SparkEnv (
 }
 
 object SparkEnv extends Logging {
-  private val env = new ThreadLocal[SparkEnv]
-  @volatile private var lastSetSparkEnv : SparkEnv = _
+  @volatile private var env: SparkEnv = _
 
   private[spark] val driverActorSystemName = "sparkDriver"
   private[spark] val executorActorSystemName = "sparkExecutor"
 
   def set(e: SparkEnv) {
-    lastSetSparkEnv = e
-    env.set(e)
+    env = e
  }
 
   /**
-   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
-   * previously set in any thread.
+   * Returns the SparkEnv.
    */
   def get: SparkEnv = {
-    Option(env.get()).getOrElse(lastSetSparkEnv)
-  }
-
-  /**
-   * Returns the ThreadLocal SparkEnv.
-   */
-  def getThreadLocal: SparkEnv = {
--- End diff --

(Even though it's a DeveloperApi we shouldn't break it if we can help it)





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2624#discussion_r18498596
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -119,30 +118,20 @@ class SparkEnv (
 }
 
 object SparkEnv extends Logging {
-  private val env = new ThreadLocal[SparkEnv]
-  @volatile private var lastSetSparkEnv : SparkEnv = _
+  @volatile private var env: SparkEnv = _
 
   private[spark] val driverActorSystemName = "sparkDriver"
   private[spark] val executorActorSystemName = "sparkExecutor"
 
   def set(e: SparkEnv) {
-    lastSetSparkEnv = e
-    env.set(e)
+    env = e
   }
 
   /**
-   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
-   * previously set in any thread.
+   * Returns the SparkEnv.
    */
   def get: SparkEnv = {
-    Option(env.get()).getOrElse(lastSetSparkEnv)
-  }
-
-  /**
-   * Returns the ThreadLocal SparkEnv.
-   */
-  def getThreadLocal: SparkEnv = {
--- End diff --

Good point.  Let's throw `@deprecated` on it when we put it back, though.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57758615
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21237/consoleFull) for PR 2624 at commit [`ba77ca4`](https://github.com/apache/spark/commit/ba77ca4f84f8fefff88714f0c0fa196d0a6f844b).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `protected case class Keyword(str: String)`






[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57758618
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21237/





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-03 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57840767
  
Okay, this is a pretty significant change: it removes the ThreadLocal object 
completely. There are two things we can do:
- Either we do something minimal to just clear the reference, so that 
repeated SparkContext creation works from PySpark.
- Or we do this scary refactoring and eliminate the ThreadLocal completely. 
I know that there are probably no use cases where the ThreadLocal object on any 
thread would differ from the global lastSetSparkEnv, but are we 
entirely sure?

I think we need more input from others: @pwendell @mateiz @rxin





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57850587
  
> Either we can do something minimal to just clear the reference, so that 
> repeated SparkContext creation works from PySpark.

I'm not sure that there's an easy, minimal approach that's also correct, 
though.  The problem is that some threads obtain a SparkEnv by calling 
`SparkEnv.get`, so these threads are prone to reading old ThreadLocals that 
haven't been cleaned up.  In order for an approach that clears ThreadLocals to 
be safe, I think we'd need some way to ensure that _any_ thread that sets the 
ThreadLocal eventually clears that ThreadLocal before it's re-used.  I suppose 
that we could audit all calls of `SparkEnv.set()` and add the equivalent of a 
`try ... finally` to ensure that the ThreadLocal is eventually cleared.  This 
is starting to get complex, though, and I'm not sure that it's simpler than 
simply removing the ThreadLocals for now.
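
As a rough illustration of that audit, a call-site wrapper could look something like the sketch below. The `clearThreadLocal` helper is hypothetical and does not exist on the real SparkEnv object; this only shows the try/finally shape being discussed.

```scala
// Hypothetical sketch only: make sure the thread-local SparkEnv set for a unit
// of work is always cleared afterwards, so pooled threads cannot keep a stale env.
def withSparkEnv[T](env: SparkEnv)(body: => T): T = {
  SparkEnv.set(env)
  try {
    body
  } finally {
    SparkEnv.clearThreadLocal() // hypothetical helper that removes only the ThreadLocal entry
  }
}
```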





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-03 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57853572
  
It's worth noting that the ThreadLocals haven't seemed to cause problems in 
any of the existing uses of Spark / PySpark.  In PySpark Streaming, I think 
we're running into a scenario that's something like this:

- Java invokes a Python callback through the Py4J callback server.  
Internally, the callback server uses some thread pool.
- The Python callback calls back into Java through Py4J.
- Somewhere along the line, `SparkEnv.set()` is called, leaking the current 
SparkEnv into one of the Py4J GatewayServer or CallbackServer pool threads.
- This thread is re-used when a new Python SparkContext is created using 
the same GatewayServer.

I thought of another fix that would allow the ThreadLocals to keep working: add a 
mutable field to SparkEnv instances that records whether that environment is 
associated with a SparkContext that's been stopped.  In SparkEnv.get(), we can 
check this field to decide whether to return the ThreadLocal or fall back to 
lastSetSparkEnv.  This approach is more confusing / complex than removing the 
ThreadLocals, though.
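
A minimal sketch of that idea follows; the `stopped` flag and the `SparkEnvHolder` name are hypothetical and only illustrate the proposal, they are not part of the patch.

```scala
// Illustration only: keep the ThreadLocal, but ignore it once the environment's
// SparkContext has been stopped, falling back to the most recently set env.
class SparkEnv {
  @volatile var stopped: Boolean = false // hypothetical flag, set when the context stops
}

object SparkEnvHolder {
  private val threadEnv = new ThreadLocal[SparkEnv]
  @volatile private var lastSetSparkEnv: SparkEnv = _

  def set(e: SparkEnv): Unit = { lastSetSparkEnv = e; threadEnv.set(e) }

  def get: SparkEnv = {
    val local = threadEnv.get()
    // A stale thread-local env is skipped instead of being returned.
    if (local != null && !local.stopped) local else lastSetSparkEnv
  }
}
```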

I'm still strongly in favor of doing the work to confirm that SparkEnv is 
currently used as though it's a global object and then removing the 
ThreadLocals.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-03 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57892709
  
@JoshRosen The simple fix is to delete the ThreadLocal variable completely. 
Then any access to the environment from any thread (even a thread pool in 
Py4J) will effectively be reset.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/2624

[SPARK-3762] clear reference of SparkEnv after stop

SparkEnv is cached in a ThreadLocal object, so after stopping and creating a new 
SparkContext, the old SparkEnv is still used by some threads, which triggers many 
problems; for example, PySpark has problems after restarting a SparkContext, 
because Py4J uses a thread pool for RPC.

This patch clears all the references after a SparkEnv is stopped.
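
To make the failure mode concrete, here is a small, Spark-independent sketch of the hazard: a pooled thread caches a value in a ThreadLocal on first use and keeps seeing the stale value after the "environment" has been replaced. Names and setup are illustrative only.

```scala
// Illustrative sketch, not Spark code: a single-thread pool caches whatever it
// saw first in a ThreadLocal, so a later "environment" never reaches that thread.
import java.util.concurrent.Executors

object StaleThreadLocalDemo extends App {
  val current = new ThreadLocal[String]
  val pool = Executors.newFixedThreadPool(1)

  def useEnv(name: String): Unit = pool.submit(new Runnable {
    override def run(): Unit = {
      if (current.get() == null) current.set(name) // first task caches its value
      println(s"wanted $name, thread still sees ${current.get()}")
    }
  })

  useEnv("env-1") // pool thread caches "env-1"
  useEnv("env-2") // same pool thread still reports "env-1"
  pool.shutdown()
}
```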

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark env

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2624


commit 4d0ea8bf5df513d5d1f4250286ca328192018f08
Author: Davies Liu davies@gmail.com
Date:   2014-10-02T07:10:38Z

clear reference of SparkEnv after stop







[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57592595
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21184/consoleFull) for PR 2624 at commit [`4d0ea8b`](https://github.com/apache/spark/commit/4d0ea8bf5df513d5d1f4250286ca328192018f08).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57598250
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21184/consoleFull) for PR 2624 at commit [`4d0ea8b`](https://github.com/apache/spark/commit/4d0ea8bf5df513d5d1f4250286ca328192018f08).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57598254
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21184/





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2624#discussion_r18371260
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -130,6 +133,12 @@ object SparkEnv extends Logging {
     env.set(e)
   }
 
+  // clear all the threadlocal references
--- End diff --

What about threads that have called `SparkEnv.set(SparkEnv.get)`?  I don't 
think those ThreadLocals will get cleared.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57718816
  
It looks like SparkEnv has two getter methods: `get`, which returns 
either the ThreadLocal value or the last SparkEnv set _by any thread_, and 
`getThreadLocal`, which just reads the current ThreadLocal.

It turns out that there's no code which calls `getThreadLocal`.  We _do_ 
call `SparkEnv.set()` in a few places, but we seem to always set it using 
either a new SparkEnv (e.g. when starting a SparkContext or Executor) or using 
the value obtained from `SparkEnv.get()`.  Therefore, it seems like SparkEnv 
effectively acts as a global object.  Why not just remove the ThreadLocals and 
have `get` just return some `@volatile` field in SparkEnv that holds the 
current instance (a sort of Singleton pattern)?





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57721319
  
@JoshRosen I'd like to move in this direction, but getThreadLocal() is a public 
API, so I'm afraid to remove it.

In the past, I remember that we could have two different SparkEnvs in the 
same JVM, a separate SparkEnv for the executor to hold different TrackerClients. 
Maybe that's not needed anymore.

Alternatively, we could keep getThreadLocal(), deprecate it, and remove the 
ThreadLocal object. How does that sound?
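
Roughly, that compromise could look like the sketch below. It is a standalone illustration, not the exact patch: the object name, the stub class, and the deprecation message/version are placeholders.

```scala
// Sketch of the compromise: drop the ThreadLocal, keep a @volatile singleton,
// and leave getThreadLocal() behind as a deprecated alias for compatibility.
class SparkEnv // stand-in for the real class, for illustration only

object SparkEnvSketch {
  @volatile private var env: SparkEnv = _

  def set(e: SparkEnv): Unit = { env = e }

  def get: SparkEnv = env

  @deprecated("Use SparkEnvSketch.get instead", "placeholder-version")
  def getThreadLocal: SparkEnv = env
}
```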





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57721877
  
@davies The SparkEnv class is marked `@DeveloperApi` and has this note in 
its Scaladoc:

```
 * NOTE: This is not intended for external use. This is exposed for Shark 
and may be made private
 *   in a future release.
```

Nearly every object that's accessible through SparkEnv's public fields is 
`private[spark]`.  I'd be _very_ surprised if there was third-party code that 
was relying on SparkEnv remaining backwards-compatible.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57722450
  
@davies:

> In the past, I remember that we could have two different SparkEnvs in the 
> same JVM, a separate SparkEnv for the executor to hold different TrackerClients. 
> Maybe that's not needed anymore.

`SparkEnv.create` is only called in two places: the SparkContext and Executor 
constructors.

In Executor, we only create a new SparkEnv if we're not running in 
local mode:

```scala
  // Initialize Spark environment (using system properties read above)
  private val env = {
    if (!isLocal) {
      val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
        isDriver = false, isLocal = false)
      SparkEnv.set(_env)
      _env.metricsSystem.registerSource(executorSource)
      _env
    } else {
      SparkEnv.get
    }
  }
```

Since we only have one `Executor` instance per JVM, we only have one 
SparkEnv per JVM (since we don't currently support concurrently-running 
SparkContexts in the same JVM).





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57729754
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21232/consoleFull) for PR 2624 at commit [`ee62bb7`](https://github.com/apache/spark/commit/ee62bb7afdd728a9195b1549d3a9f4db5709e9ff).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57730092
  
We should probably update the docstrings to remove all references to 
ThreadLocal, too.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57733279
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21232/





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57733269
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21232/consoleFull) for PR 2624 at commit [`ee62bb7`](https://github.com/apache/spark/commit/ee62bb7afdd728a9195b1549d3a9f4db5709e9ff).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class GetPeers(blockManagerId: BlockManagerId) extends ToBlockManagerMaster`






[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57739142
  
Hmm, strange test failure:

```
[info] - block generator throttling *** FAILED ***
[info]   org.scalatest.exceptions.TestFailedException was thrown. 
(NetworkReceiverSuite.scala:180)
```

This is a known flaky test, though, so let's try running the tests again.

Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-57751658
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21237/consoleFull) for PR 2624 at commit [`ba77ca4`](https://github.com/apache/spark/commit/ba77ca4f84f8fefff88714f0c0fa196d0a6f844b).
 * This patch merges cleanly.

