[ https://issues.apache.org/jira/browse/SPARK-10544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739492#comment-14739492 ]
Doug Bateman commented on SPARK-10544:
--------------------------------------

This also fails in Spark 1.5 PySpark:

{code}
sc.parallelize(["the red", "Fox Runs", "FAST"]).map(str.lower).count()
{code}

Basically it can't pickle the {{str.lower}} function unless it's wrapped in a lambda. Same root issue: it can't pickle builtin types.

> Serialization of Python namedtuple subclasses in functions / closures is broken
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-10544
>                 URL: https://issues.apache.org/jira/browse/SPARK-10544
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.0
>            Reporter: Josh Rosen
>            Priority: Blocker
>
> The following example works on Spark 1.4.1 but not in 1.5:
>
> {code}
> from collections import namedtuple
> Person = namedtuple("Person", "id firstName lastName")
> rdd = sc.parallelize([1]).map(lambda x: Person(1, "Jon", "Doe"))
> rdd.count()
> {code}
>
> In 1.5, this gives an "AttributeError: 'builtin_function_or_method' object has no attribute '__code__'" error.
>
> Digging a bit deeper, it seems that the problem is the serialization of the {{Person}} class itself, since serializing _instances_ of the class in the closure seems to work properly:
>
> {code}
> from collections import namedtuple
> Person = namedtuple("Person", "id firstName lastName")
> jon = Person(1, "Jon", "Doe")
> rdd = sc.parallelize([1]).map(lambda x: jon)
> rdd.count()
> {code}
>
> It looks like PySpark has unit tests for serializing individual namedtuples with cloudpickle.dumps and for serializing RDDs of namedtuples, but I don't think that we have any tests for namedtuple classes in closures.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
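The lambda workaround mentioned in the comment can be sketched outside of Spark with a plain {{map}} (a minimal sketch: the ordinary list stands in for {{sc.parallelize}}, so this only illustrates wrapping the built-in in a lambda, not the cloudpickle serialization path that actually fails):

```python
data = ["the red", "Fox Runs", "FAST"]

# str.lower is a built-in method descriptor, which Spark 1.5's cloudpickle
# could not serialize; a lambda is a plain Python function, which it can.
lowered = list(map(lambda s: s.lower(), data))
count = len(lowered)
```

The lambda adds one extra frame per call but is otherwise equivalent to passing {{str.lower}} directly.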
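The distinction in the report, where the {{Person}} class fails but instances serialize fine, can be seen with the standard {{pickle}} module (a sketch independent of Spark: standard pickle stores only the field values plus a by-name reference to the class, whereas the failing path tried to serialize the class object itself):

```python
import pickle
from collections import namedtuple

Person = namedtuple("Person", "id firstName lastName")
jon = Person(1, "Jon", "Doe")

# Pickling an *instance* succeeds: the class is looked up again by
# module and name on load, so the class object is never serialized.
restored = pickle.loads(pickle.dumps(jon))
```

This by-reference lookup only works when the class is importable under the same name on the deserializing side, which is why shipping the class itself inside a closure needs the extra machinery that broke in 1.5.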