Irina Truong created SPARK-21096:
------------------------------------

             Summary: Pickle error when passing a member variable to Spark executors
                 Key: SPARK-21096
                 URL: https://issues.apache.org/jira/browse/SPARK-21096
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.1
            Reporter: Irina Truong


Submitting a Spark job fails with a pickle error when a lambda references a 
member variable, even when that member variable is a simple type that should 
be serializable.

Here is a minimal example:

https://gist.github.com/j-bennet/8390c6d9a81854696f1a9b42a4ea8278

In the gist above, this method will throw an exception:

{code:python}
    def build_fail(self):
        processed = self.rdd.map(lambda row: process_row(row, self.multiplier))
        return processed.collect()
{code}

While this method will run just fine:

{code:python}
    def build_ok(self):
        mult = self.multiplier
        processed = self.rdd.map(lambda row: process_row(row, mult))
        return processed.collect()
{code}

In this example, {{self.multiplier}} is just an int. However, referencing it 
inside a lambda throws a pickle error: the lambda captures {{self}} rather 
than the int, so Spark tries to pickle the whole {{self}}, and that contains 
{{sc}}.
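
The capture can be inspected without Spark. The following is a minimal sketch in plain Python, not the code from the gist: {{Builder}}, {{fail_func}}, {{ok_func}} and {{captured}} are hypothetical names, {{process_row}} is a stand-in with an assumed body, and a {{threading.Lock}} stands in for {{sc}} as an attribute that cannot be pickled. It only shows which variables each lambda closes over; it does not reproduce Spark's own serializer.

```python
import pickle
import threading


def process_row(row, multiplier):
    # Hypothetical stand-in for the gist's process_row.
    return row * multiplier


class Builder:
    def __init__(self, multiplier):
        self.multiplier = multiplier
        # Stand-in for sc: a lock cannot be pickled either.
        self.unpicklable = threading.Lock()

    def fail_func(self):
        # The lambda body mentions self, so self is the captured variable.
        return lambda row: process_row(row, self.multiplier)

    def ok_func(self):
        # Copy the int into a local first; the lambda captures only mult.
        mult = self.multiplier
        return lambda row: process_row(row, mult)


def captured(fn):
    """Map each free variable name of fn to the value its closure holds."""
    cells = fn.__closure__ or ()
    return {name: cell.cell_contents
            for name, cell in zip(fn.__code__.co_freevars, cells)}


builder = Builder(3)
fail_vars = captured(builder.fail_func())
ok_vars = captured(builder.ok_func())

print(sorted(fail_vars))  # ['self']
print(ok_vars)            # {'mult': 3}

pickle.dumps(ok_vars["mult"])  # a plain int pickles fine
try:
    pickle.dumps(fail_vars["self"])
except (TypeError, pickle.PicklingError) as exc:
    print("cannot pickle self:", exc)
```

A closure stores the variables named in the function body, not the values of attribute lookups made on them, so {{self.multiplier}} pulls in all of {{self}} while {{mult}} pulls in only the int.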

If this is the expected behavior, then why should assigning 
{{self.multiplier}} to a local variable first make a difference?


