Vitaly Polonetsky created ZEPPELIN-1518:
-------------------------------------------
Summary: Lambda expressions are not working on CDH 5.7.x Spark
Key: ZEPPELIN-1518
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1518
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.6.1, 0.6.0
Reporter: Vitaly Polonetsky
CDH 5.7.x backported RpcEnv and eliminated the class server in Spark 1.6.0 REPL:
https://github.com/cloudera/spark/commit/e0d03eb30e03f589407c3cf37317a64f18db8257
A fix was attempted:
https://github.com/apache/zeppelin/commit/78c7b5567e7fb4985cecf147c39033c554dfc208
Although basic Spark operations work in Zeppelin after this fix, the
following code still fails:
{code}
val rdd2 = sc.parallelize(Seq(1,2,3,4,5))
rdd2.filter(_ > 3).count()
{code}
The lambda expression is not being transferred to the executors:
{{java.lang.ClassNotFoundException:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1}}
As far as I understand, Zeppelin supports RpcEnv only for Scala 2.11, via
the {{-Yrepl-outdir}} compiler option, which is not available in Scala 2.10.
Another way of supporting RpcEnv is to access the new classes over RPC the
same way spark-submit does. Here is the hack I have working locally, though
I'm having trouble testing my pull request:
1. In {{SparkInterpreter.createSparkContext_1()}}, if {{classServerUri}} is
still null after both checks, invoke {{intp.getClassOutputDirectory()}} via
reflection
2. Use the returned value to set the {{spark.repl.class.outputDir}} parameter
on the SparkConf
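A minimal, self-contained sketch of these two steps (the method name {{getClassOutputDirectory}} is taken from the Cloudera commit above; {{ReplStub}} is a hypothetical stand-in for the CDH SparkIMain, and the exact integration into {{createSparkContext_1()}} is an assumption):

{code}
import java.io.File;
import java.lang.reflect.Method;

public class ClassDirFallback {
    // Stand-in for the CDH 5.7.x REPL object: it exposes the directory
    // that holds the classes compiled from notebook code.
    static class ReplStub {
        public File getClassOutputDirectory() {
            return new File("/tmp/repl-classes");
        }
    }

    // Mirror of what createSparkContext_1() could do when classServerUri
    // stays null: look the method up reflectively, so the interpreter still
    // loads against Spark builds that do not have it.
    static String classOutputDir(Object intp) {
        try {
            Method m = intp.getClass().getMethod("getClassOutputDirectory");
            return ((File) m.invoke(intp)).getAbsolutePath();
        } catch (NoSuchMethodException e) {
            return null; // plain Spark 1.6.x: keep using the class server
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String dir = classOutputDir(new ReplStub());
        // Step 2: this value would feed
        // conf.set("spark.repl.class.outputDir", dir)
        System.out.println(dir);
    }
}
{code}

With this in place, executors fetch the lambda classes through RpcEnv from that output directory instead of the removed HTTP class server.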
The same method could be used for Spark 2.0 as well, eliminating the
additional HTTP server that runs inside Zeppelin to serve lambda classes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)