[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55483091 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20257/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55484129 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20257/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-13 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55510191 This looks good to me; merging it into master now. I wonder if we'll see a net reduction in Jenkins flakiness due to using significantly fewer ephemeral ports in

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2259

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-13 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55510267 Yeah, the bad diffs are especially weird. `class Dummy`? Really?

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55077329 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55077384 @JoshRosen The problem that caused the hang has been fixed.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55164469 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/15/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55164748 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/16/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55164905 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/16/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55166830 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/17/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55166968 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/17/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55183437 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/27/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55185249 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/28/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55191314 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/27/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55193407 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/28/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55010053 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55015591 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20050/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55032970 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20050/consoleFull)** after a configured wait of `120m`.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55057701 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20060/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55061459 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20062/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55064927 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20060/consoleFull)** after a configured wait of `120m`.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55068034 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20062/consoleFull)** after a configured wait of `120m`.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55068149 Hmm, I wonder why we're seeing these timeouts. It looks like both tests failed in `recommendation.py`, so it might be worth running those tests locally to see whether

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-09 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-55068303 Yeah, I will investigate it locally.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54888018 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2259#discussion_r17268480 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -199,9 +207,47 @@ private[spark] class

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2259#discussion_r17268576 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -199,9 +207,47 @@ private[spark] class

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54890949 One general issue, not specifically related to this patch but still worth fixing, is thread-safety for the collections in PythonWorkerFactory. It looks like there's

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2259#discussion_r17269479 --- Diff: python/pyspark/worker.py --- @@ -69,8 +69,12 @@ def main(infile, outfile): ser = CompressedSerializer(pickleSer) for _

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54892706 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19995/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2259#discussion_r17271207 --- Diff: python/pyspark/worker.py --- @@ -69,8 +69,12 @@ def main(infile, outfile): ser = CompressedSerializer(pickleSer) for _ in

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54904546 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19995/consoleFull)** after a configured wait of `120m`.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54739057 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19943/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54741052 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19943/consoleFull)** after a configured wait of `120m`.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-07 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54755640 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54756131 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19951/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54760002 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19951/consoleFull)** after a configured wait of `120m`.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-07 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54767509 You guys should time out the worker after some time period to avoid it always consuming resources. If we have that, I think it should be on by default -- in general it's

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-07 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54775047 @mateiz It will time out the worker after 1 minute. Worker reuse is on by default and can be disabled with `spark.python.worker.reuse = false`, in which case it will shut down the
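
The behavior described in this comment (reuse on by default, an idle timeout of about one minute, and a switch to turn reuse off) can be illustrated with a small stand-alone sketch. This is not the code from the pull request: `IdleWorkerPool`, its methods, and the string worker handles are hypothetical names invented for illustration, and the injectable `clock` exists only so the timeout can be exercised without waiting.

```python
import time
from collections import deque


class IdleWorkerPool:
    """Hypothetical pool: keeps released workers for reuse and drops stale ones."""

    def __init__(self, reuse=True, idle_timeout_s=60.0, clock=time.monotonic):
        self.reuse = reuse                    # mirrors the idea of spark.python.worker.reuse
        self.idle_timeout_s = idle_timeout_s  # the comment mentions a 1 minute timeout
        self._clock = clock
        self._idle = deque()                  # (worker, released_at), oldest on the left
        self._next_id = 0

    def _spawn(self):
        # Stand-in for forking a new Python worker process.
        self._next_id += 1
        return "worker-%d" % self._next_id

    def _evict_expired(self):
        # Drop workers that have been idle for at least the timeout.
        now = self._clock()
        while self._idle and now - self._idle[0][1] >= self.idle_timeout_s:
            self._idle.popleft()

    def acquire(self):
        # Prefer the most recently released idle worker; spawn only if none is left.
        self._evict_expired()
        if self._idle:
            worker, _ = self._idle.pop()
            return worker
        return self._spawn()

    def release(self, worker):
        # With reuse disabled the worker is simply dropped (shut down).
        if self.reuse:
            self._idle.append((worker, self._clock()))
```

With `reuse=True`, a worker released and re-acquired within the timeout comes back as the same handle; with `reuse=False`, every `acquire()` spawns a fresh one.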

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54704111 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54704441 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19904/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54705444 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19904/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54718008 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54729318 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54731896 Do you think worker re-use should be enabled by default? The only problem that I anticipate is for applications that share a single SparkContext with both

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54731921 It would be interesting to measure the end-to-end performance impact for more realistic jobs, especially ones that make use of large numbers of tasks and large

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54733273 It's already enabled by default. I have added benchmark results to the description.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54733394 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19931/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54734462 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19931/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54735492 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19937/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54736301 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19937/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-06 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54738331 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54657265 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54513840 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-03 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2259 [SPARK-3030] [PySpark] Reuse Python worker Reuse Python workers to avoid the overhead of fork()ing a Python process for each task. It also tracks the broadcasts for each worker, to avoid sending repeated
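
The idea of tracking broadcasts per worker can be sketched roughly as follows. This is only one plausible reading of the truncated description, not the pull request's implementation: `broadcasts_to_send` and the integer broadcast ids are hypothetical, and the sketch assumes each reused worker keeps a set of ids it has already received.

```python
def broadcasts_to_send(already_sent, needed):
    """Return the broadcast ids a reused worker still lacks.

    already_sent: set of ids previously shipped to this worker (updated in place).
    needed: iterable of ids required by the next task.
    """
    # Only the ids the worker has not seen yet need to go over the wire.
    new_ids = sorted(set(needed) - already_sent)
    already_sent.update(new_ids)
    return new_ids
```

For example, a worker that already holds ids 1 and 2 and is handed a task needing 2 and 3 would only be sent id 3.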

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2259#discussion_r17091100 --- Diff: python/run-tests --- @@ -50,7 +50,7 @@ echo Running PySpark tests. Output is in python/unit-tests.log. # Try to test with Python 2.6,

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2259#discussion_r17091311 --- Diff: python/run-tests --- @@ -50,7 +50,7 @@ echo Running PySpark tests. Output is in python/unit-tests.log. # Try to test with Python 2.6,

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54396565 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19732/consoleFull) for PR 2259 at commit

[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54399818 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19732/consoleFull) for PR 2259 at commit