Rahul Palamuttam created ZEPPELIN-714:
-----------------------------------------

             Summary: Assigning spark context to a variable results in Task not
serializable error
                 Key: ZEPPELIN-714
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-714
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.5.6
         Environment: Scala and Apache Spark
            Reporter: Rahul Palamuttam


[~chrismattmann]

We recently observed the following issue with Zeppelin:
assigning the Spark context (sc) to a new variable and then using that variable
causes a Task not serializable exception. The same error occurs in the
spark-shell. However, submitting the same operation via spark-submit from a
Scala or Java file does not trigger the error.

Below are the three lines of code that reproduce the error in a Zeppelin
notebook.

val newSC = sc
val temp = 10
val rdd = newSC.parallelize(0 to 10).map(p => p + temp)

For some reason either sc or newSC is being captured in the closure's
referencing environment. Note that if we replace "newSC.parallelize" with
"sc.parallelize", the error goes away.

We came across this when we tried to integrate SciSpark with Zeppelin.
SciSpark has its own SciSparkContext, which is just a wrapper around the
SparkContext. We pass the SparkContext to the SciSparkContext via its
constructor. You can see the code for the class here:
https://github.com/SciSpark/SciSpark/blob/master/src/main/scala/org/dia/core/SciSparkContext.scala
SciSparkContext does not extend SparkContext; it just holds a SparkContext as a
member and uses it to read various types of file formats into an RDD.
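For context, the wrapper pattern looks roughly like the sketch below. This is a
simplified illustration, not the actual SciSparkContext source; the textFile
method is only an example of delegating to the wrapped context.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class SciSparkContext(val sparkContext: SparkContext) {
  // Delegates reads to the wrapped SparkContext; SciSpark adds readers for
  // scientific file formats in the same style.
  def textFile(path: String): RDD[String] = sparkContext.textFile(path)
}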



