GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/30
SPARK-1004. PySpark on YARN

This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sryza/spark sandy-spark-1004

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/30.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #30

----
commit e49ff667154de988a1cb58d90c9743c6c24ef5bc
Author: Josh Rosen <joshro...@apache.org>
Date:   2014-01-24T18:19:58Z

    Automatically set Yarn env vars in PySpark (SPARK-1030).

commit 59ac972026a7600fded49d906ef27bbb017fc9d2
Author: Josh Rosen <joshro...@apache.org>
Date:   2014-01-25T23:28:56Z

    WIP towards PySpark on YARN:

    - Remove reliance on SPARK_HOME on the workers. Only the driver
      should know about SPARK_HOME. On the workers, we ensure that the
      PySpark Python libraries are added to the PYTHONPATH.

    - Add a Makefile for generating a "fat zip" that contains PySpark's
      Python dependencies. This is a bit of a hack and I'd be open to
      better packaging tools, but this doesn't require any extra Python
      libraries. This use case doesn't seem to be well-addressed by the
      existing Python packaging tools: there are plenty of tools to
      package complete Python environments (such as pyinstaller and
      virtualenv) or to bundle *individual* libraries (e.g. distutils),
      but few to generate portable fat zips or eggs.

    This hasn't been tested with YARN and may not actually compile.

commit 54bd8c0aec51d5d5cb24d6453dea2fb627db05cd
Author: Josh Rosen <joshro...@apache.org>
Date:   2014-02-19T06:27:21Z

    Add missing setup.py file for PySpark.

commit 514b2d0cfc8995b86186d02aebf61500d25df7db
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-02-24T07:06:42Z

    Improvements

commit ee3cc204dcabd7d092e3d6ed205e01c5deffc7ca
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-02-24T07:26:01Z

    Don't set SPARK_JAR

----
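The "fat zip" approach described in commit 59ac972 relies on Python's ability to import packages directly from a zip archive placed on PYTHONPATH (the standard-library zipimport mechanism). The sketch below illustrates that idea using only the standard library. It does not reproduce PySpark's actual Makefile or archive layout. The package name `demo_pkg`, the archive name `deps.zip`, and the helpers `build_fat_zip` and `import_from_zip` are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_fat_zip(zip_path, packages):
    """Write several packages into one zip archive.

    `packages` maps a package name to {relative_file_name: source_text}.
    (Hypothetical helper for illustration; not PySpark's build tooling.)
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for pkg_name, files in packages.items():
            for rel_name, source in files.items():
                # Archive paths mirror the package layout, e.g. "demo_pkg/__init__.py".
                zf.writestr(f"{pkg_name}/{rel_name}", source)


def import_from_zip(zip_path, module_name):
    """Put the archive on sys.path (as PYTHONPATH would at startup) and import from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    # Drop stale finder caches so the newly added archive is picked up.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical archive and package names; not PySpark's real layout.
        zip_path = os.path.join(tmp, "deps.zip")
        build_fat_zip(zip_path, {
            "demo_pkg": {
                "__init__.py": "from .util import greet\n",
                "util.py": "def greet(name):\n    return 'hello, ' + name\n",
            },
        })
        mod = import_from_zip(zip_path, "demo_pkg")
        print(mod.greet("worker"))  # prints "hello, worker"
```

On a real cluster, the worker would typically receive the archive path through the PYTHONPATH environment variable rather than through a `sys.path.insert` call. The effect on imports is the same: only the archive has to be shipped, with no per-worker SPARK_HOME installation.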