[ 
https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735214#comment-16735214
 ] 

Liang-Chi Hsieh commented on SPARK-26534:
-----------------------------------------

I think the only difference is using Dataset or RDD. Can you replace with RDD 
and see if the issue is still?

> Closure Cleaner Bug
> -------------------
>
>                 Key: SPARK-26534
>                 URL: https://issues.apache.org/jira/browse/SPARK-26534
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: sam
>            Priority: Major
>
> I've found a strange combination of closures where the closure cleaner 
> doesn't seem to be smart enough to figure out how to remove a reference that 
> is not used. I.e. we get a `org.apache.spark.SparkException: Task not 
> serializable` for a Task that is perfectly serializable.  
>  
> In the example below, the only `val` that is actually needed for the closure 
> of the `map` is `foo`, but it tries to serialise `thingy`.  What is odd is 
> changing this code in a number of subtle ways eliminates the error, which 
> I've tried to highlight using comments inline.
>  
> {code:java}
> import org.apache.spark.sql._
> object Test {
>   val sparkSession: SparkSession =
>     SparkSession.builder.master("local").appName("app").getOrCreate()
>   def apply(): Unit = {
>     import sparkSession.implicits._
>     val landedData: Dataset[String] = 
> sparkSession.sparkContext.makeRDD(Seq("foo", "bar")).toDS()
>     // thingy has to be in this outer scope to reproduce, if in someFunc, 
> cannot reproduce
>     val thingy: Thingy = new Thingy
>     // If not wrapped in someFunc cannot reproduce
>     val someFunc = () => {
>       // If don't reference this foo inside the closer (e.g. just use 
> identity function) cannot reproduce
>       val foo: String = "foo"
>       thingy.run(block = () => {
>         landedData.map(r => {
>           r + foo
>         })
>         .count()
>       })
>     }
>     someFunc()
>   }
> }
> class Thingy {
>   def run[R](block: () => R): R = {
>     block()
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to