GitHub user minahlee opened a pull request:
https://github.com/apache/incubator-zeppelin/pull/736
Fix pyspark to work on yarn mode when spark version is lower than or equal
to 1.4.x
### What is this PR for?
pyspark.zip and py4j-\*.zip must be distributed to the yarn nodes for
pyspark to work, but this has been broken since #463 because the [`if
(pythonLibs.length ==
pythonLibUris.size())`](https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java#L329)
condition is never true. This PR fixes the issue by changing the
condition to `pythonLibUris.size() == 2`, where 2 is the number of
libraries to distribute: pyspark.zip and py4j-\*.zip.
In addition, the yarn-install documentation has been updated.
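To illustrate why the old check could never pass, here is a minimal standalone sketch, not the actual `SparkInterpreter` code. It assumes the candidate list names pyspark.zip plus two alternative py4j archives, of which only one exists for a given Spark release. The class name, the helper methods, the two py4j file names, and the `present` set (a stand-in for checking files on disk) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; everything except the names pythonLibs and
// pythonLibUris is hypothetical and not taken from Zeppelin.
class PythonLibCheck {

  // Assumed candidate list: pyspark.zip plus two alternative py4j archives,
  // of which at most one is expected to exist for a given Spark release.
  static final String[] PYTHON_LIBS = {
      "pyspark.zip", "py4j-0.9-src.zip", "py4j-0.8.2.1-src.zip"};

  // Collects the candidates that are present; stands in for a file-existence scan.
  static List<String> collectPythonLibUris(String[] pythonLibs, Set<String> present) {
    List<String> pythonLibUris = new ArrayList<>();
    for (String lib : pythonLibs) {
      if (present.contains(lib)) {
        pythonLibUris.add(lib);
      }
    }
    return pythonLibUris;
  }

  // Condition before this PR: true only if every candidate was found.
  static boolean oldCondition(String[] pythonLibs, List<String> pythonLibUris) {
    return pythonLibs.length == pythonLibUris.size();
  }

  // Condition after this PR: exactly two libraries were found.
  static boolean newCondition(List<String> pythonLibUris) {
    return pythonLibUris.size() == 2;
  }

  public static void main(String[] args) {
    Set<String> present =
        new HashSet<>(Arrays.asList("pyspark.zip", "py4j-0.8.2.1-src.zip"));
    List<String> uris = collectPythonLibUris(PYTHON_LIBS, present);
    System.out.println("old condition: " + oldCondition(PYTHON_LIBS, uris));
    System.out.println("new condition: " + newCondition(uris));
  }
}
```

Under these assumptions the old condition compares 3 against 2 and fails, while the new condition holds once pyspark.zip and one py4j archive are found.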
### What type of PR is it?
Bug Fix
### Is there a relevant Jira issue?
No, but the issue was reported on the [user mailing
list](http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Can-t-get-Pyspark-1-4-1-interpreter-to-work-on-Zeppelin-0-6-td2229.html#a2259)
by Ian Maloney.
### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/minahlee/incubator-zeppelin fix/pyspark_on_yarn
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-zeppelin/pull/736.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #736
----
commit 6465ba89025919adcaa3f884459eb2992ed9f33e
Author: Mina Lee <[email protected]>
Date: 2016-02-21T11:24:27Z
Change condition to make pyspark, py4j libraries be distributed to yarn
executors
commit c544decd16555147c498caa1c14fa8b25db091c3
Author: Mina Lee <[email protected]>
Date: 2016-02-21T11:31:01Z
[DOC] Remove redundant Zeppelin build information from yarn_install.md
[DOC] Guide users to set SPARK_HOME to use spark in yarn mode
[DOC] Change spark version to the latest in yarn config example
[DOC] Add note that spark for cdh4 doesn't support yarn
[DOC] Remove spark properties `spark.home` and `spark.yarn.jar` from doc,
which don't work on zeppelin anymore
[DOC] Fix typos
[DOC] Add info that embedded spark doesn't work on yarn mode anymore when
Spark version is 1.5.0 or higher in README.md
commit 2710c46240a2c047e97e52cd52316d9429185125
Author: Mina Lee <[email protected]>
Date: 2016-02-21T11:45:14Z
[DOC] Remove invalid information of installation location
----