GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/4603
[SPARK-2313] Use socket to communicate GatewayServer port back to Python driver

This patch changes PySpark so that the GatewayServer's port is communicated back to the Python process that launches it over a local socket instead of a pipe. The old pipe-based approach was brittle and could fail if `spark-submit` printed unexpected output to stdout. To accomplish this, I wrote a custom `PythonGatewayServer.main()` function to use in place of Py4J's `GatewayServer.main()`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-2313

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4603.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4603

----
commit 8bf956ea3ac7d9af481acea3a14b9e48dc0ba2fa
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-02-14T01:01:09Z

    Initial cut at passing Py4J gateway port back to driver via socket

commit 2f70689aebed4dcee67d2dbc9ee42255f6324b5f
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-02-14T04:51:01Z

    Use stdin PIPE to share fate with driver

----
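
For readers unfamiliar with the technique, the socket-based handoff described in this pull request can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the environment variable name `DRIVER_CALLBACK_PORT`, the function `launch_and_read_port`, the 4-byte big-endian wire format, and the port value 25333 are all assumptions made for the example, and a small Python child process stands in for the JVM gateway. The child deliberately prints noise to stdout to show that the handoff no longer depends on stdout being clean, and it exits when its stdin is closed, which is one plausible reading of the "share fate with driver" commit.

```python
import os
import socket
import struct
import subprocess
import sys

# Hypothetical stand-in for the JVM gateway process. It connects back to the
# driver's callback socket and reports a port as a 4-byte big-endian integer.
CHILD_SCRIPT = r"""
import os, socket, struct, sys
callback_port = int(os.environ["DRIVER_CALLBACK_PORT"])  # hypothetical name
print("unexpected noise on stdout")  # harmless: the port is not sent via stdout
with socket.create_connection(("127.0.0.1", callback_port)) as s:
    s.sendall(struct.pack("!i", 25333))  # pretend gateway port (assumption)
sys.stdin.read()  # block until the driver closes stdin, then exit
"""


def launch_and_read_port(timeout=10.0):
    """Start the child and return (process, port reported over the socket)."""
    # Listen on an ephemeral localhost port that only this handoff uses.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(timeout)
    callback_port = listener.getsockname()[1]

    # Tell the child where to connect; keep a stdin pipe open so the child
    # can detect the driver going away.
    env = dict(os.environ, DRIVER_CALLBACK_PORT=str(callback_port))
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
    )
    try:
        conn, _ = listener.accept()
        with conn:
            conn.settimeout(timeout)
            data = b""
            while len(data) < 4:  # read exactly four bytes
                chunk = conn.recv(4 - len(data))
                if not chunk:
                    raise RuntimeError("child closed connection before sending port")
                data += chunk
    except BaseException:
        proc.kill()  # do not leave an orphaned child behind on failure
        raise
    finally:
        listener.close()

    (gateway_port,) = struct.unpack("!i", data)
    return proc, gateway_port
```

A caller would use the returned port to connect to the gateway, and later close `proc.stdin` and call `proc.wait()` to let the child shut down.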