Rahul Palamuttam created ZEPPELIN-714:
-----------------------------------------
             Summary: Assigning spark context to a variable results in a "task not serializable" error
                 Key: ZEPPELIN-714
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-714
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.5.6
         Environment: Scala and Apache Spark
            Reporter: Rahul Palamuttam

[~chrismattmann]

We recently observed the following issue with Zeppelin: assigning the Spark context (sc) to a new variable and then using that variable causes a Task Not Serializable exception. The same error occurs in the spark-shell. However, running the same operation from a Scala or Java application submitted via spark-submit does not trigger it.

The following three lines reproduce the error in a Zeppelin notebook:

val newSC = sc
val temp = 10
val rdd = newSC.parallelize(0 to 10).map(p => p + temp)

For some reason either sc or newSC is being pulled into the referencing environment of the closure. Note that if we replace "newSC.parallelize" with "sc.parallelize", the error goes away.

We came across this while trying to integrate SciSpark with Zeppelin. SciSpark has its own SciSparkContext, which is just a wrapper around the SparkContext; we pass the SparkContext to the SciSparkContext through its constructor. The code for the class is here: https://github.com/SciSpark/SciSpark/blob/master/src/main/scala/org/dia/core/SciSparkContext.scala

The SciSparkContext does not extend SparkContext; it just holds a SparkContext as a member and uses it to read various file formats into RDDs.
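
For reference, the wrapper has roughly the following shape (a simplified sketch, not the actual SciSparkContext code; the class and method names below are illustrative):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Simplified wrapper: holds a SparkContext as a member instead of extending it.
class SciSparkContextSketch(val sparkContext: SparkContext) {

  // Illustrative reader method; the real class reads scientific file formats into RDDs.
  def parallelizeInts(n: Int): RDD[Int] = sparkContext.parallelize(0 to n)
}

Because the wrapper only holds sc as a member, calling it from a notebook paragraph (e.g. new SciSparkContextSketch(sc).parallelizeInts(10).map(p => p + temp)) appears to run into the same closure-capture behavior as the newSC alias above.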