sam created SPARK-26534:
---------------------------

             Summary: Closure Cleaner Bug
                 Key: SPARK-26534
                 URL: https://issues.apache.org/jira/browse/SPARK-26534
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: sam


I've found a strange combination of closures where the closure cleaner doesn't 
seem to be smart enough to figure out how to remove a reference that is not 
used. I.e. we get a `org.apache.spark.SparkException: Task not serializable` 
for a Task that is perfectly serializable.  

 

In the example below, the only `val` that is actually needed for the closure of 
the `map` is `foo`, but it tries to serialise `thingy`.  What is odd is 
changing this code in a number of subtle ways eliminates the error, which I've 
tried to highlight using comments inline.

 
{code:java}
import org.apache.spark.sql._

object Test {
  val sparkSession: SparkSession =
    SparkSession.builder.master("local").appName("app").getOrCreate()

  def apply(): Unit = {
    import sparkSession.implicits._

    val landedData: Dataset[String] = 
sparkSession.sparkContext.makeRDD(Seq("foo", "bar")).toDS()

    // thingy has to be in this outer scope to reproduce, if in someFunc, 
cannot reproduce
    val thingy: Thingy = new Thingy

    // If not wrapped in someFunc cannot reproduce
    val someFunc = () => {
      // If don't reference this foo inside the closer (e.g. just use identity 
function) cannot reproduce
      val foo: String = "foo"

      thingy.run(block = () => {
        landedData.map(r => {
          r + foo
        })
        .count()
      })
    }

    someFunc()

  }
}

class Thingy {
  def run[R](block: () => R): R = {
    block()
  }
}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to