Rahul Palamuttam created ZEPPELIN-714:
-----------------------------------------
Summary: Assigning spark context to variable results in task not
serializable error
Key: ZEPPELIN-714
URL: https://issues.apache.org/jira/browse/ZEPPELIN-714
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-interpreter
Affects Versions: 0.5.6
Environment: Scala and Apache Spark
Reporter: Rahul Palamuttam
[~chrismattmann]
We recently observed the following issue with Zeppelin:
assigning the Spark context (sc) to a new variable and then using that variable
causes a Task Not Serializable exception. The error also occurs in the
spark-shell. However, running the same operation from a Scala or Java file
submitted via spark-submit does not produce the error.
The following three lines of code trigger the error in a Zeppelin
notebook:
val newSC = sc
val temp = 10
val rdd = newSC.parallelize(0 to 10).map(p => p + temp)
For some reason, either sc or newSC is being included in the environment
captured by the closure. Note that if we replace "newSC.parallelize" with
"sc.parallelize", the error goes away.
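One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the REPL wraps notebook lines in generated objects, so a closure that reads "temp" captures the whole wrapper, including the non-serializable "newSC" field. The sketch below reproduces that mechanism with plain Java serialization and no Spark dependency. FakeContext, LineWrapper, TransientLineWrapper and ClosureCaptureDemo are hypothetical names invented for illustration; they are not Spark or Zeppelin classes.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class FakeContext

// Hypothetical stand-in for the wrapper a REPL might generate around notebook lines.
class LineWrapper extends Serializable {
  val newSC = new FakeContext // plain field, like `val newSC = sc`
  val temp = 10
  // `temp` is a field of this wrapper, so the closure captures `this`,
  // and serializing the closure serializes every non-transient field.
  def closure: Int => Int = p => p + temp
}

// Same wrapper, but the context field is excluded from serialization.
class TransientLineWrapper extends Serializable {
  @transient val newSC = new FakeContext
  val temp = 10
  def closure: Int => Int = p => p + temp
}

object ClosureCaptureDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(serializes(new LineWrapper().closure))
    println(serializes(new TransientLineWrapper().closure))
  }
}
```

Under these assumptions, the first closure should fail to serialize and the second should succeed. If the same mechanism is at work in the notebook, declaring the alias as "@transient val newSC = sc" may be worth trying as a workaround; this has not been verified against Zeppelin 0.5.6.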
We came across this when we tried to integrate SciSpark with Zeppelin.
SciSpark has its own SciSparkContext, which is just a wrapper around the
SparkContext. We pass the SparkContext to the SciSparkContext via a base
constructor. You can see the code for the class here:
https://github.com/SciSpark/SciSpark/blob/master/src/main/scala/org/dia/core/SciSparkContext.scala
The SciSparkContext does not extend SparkContext; it just holds a SparkContext
as a member and uses it to read various file formats into an
RDD.