Shixiong Zhu created SPARK-13697:
------------------------------------

Summary: TransformFunctionSerializer.loads doesn't restore the function's module name if it's '__main__'
Key: SPARK-13697
URL: https://issues.apache.org/jira/browse/SPARK-13697
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.6.0
Reporter: Shixiong Zhu
Here is a reproducer, run in the PySpark shell (where sc is the predefined SparkContext):

{code}
>>> from pyspark.streaming import StreamingContext
>>> from pyspark.streaming.util import TransformFunction
>>> ssc = StreamingContext(sc, 1)
>>> func = TransformFunction(sc, lambda x: x, sc.serializer)
>>> func.rdd_wrapper(lambda x: x)
TransformFunction(<function <lambda> at 0x106ac8b18>)
>>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func,
...     func.rdd_wrap_func, func.deserializers)))
>>> func2 = ssc._transformerSerializer.loads(bytes)
>>> print(func2.func.__module__)
None
>>> print(func2.rdd_wrap_func.__module__)
None
{code}

Both lambdas were defined in '__main__', but after the round trip through loads their __module__ is None instead of '__main__'.
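The None values are consistent with the function being rebuilt against a fresh globals dict that has no '__name__' key: CPython sets a new function's __module__ from globals['__name__'] when the function object is created. The standalone sketch below assumes that mechanism; it is not PySpark's or cloudpickle's actual code, and rebuild() is a hypothetical helper. It shows the module name being lost and one way a deserializer could restore it explicitly.

```python
import types


def rebuild(func, keep_module=False):
    """Recreate `func` from its code object using an empty globals dict.

    Hypothetical helper that only mimics a by-value function deserializer;
    it is not the real PySpark/cloudpickle implementation.
    """
    new_func = types.FunctionType(func.__code__, {}, func.__name__,
                                  func.__defaults__, func.__closure__)
    if keep_module:
        # Copy the original module name (e.g. '__main__') onto the new function.
        new_func.__module__ = func.__module__
    return new_func


identity = lambda x: x

lost = rebuild(identity)                    # globals lacks '__name__'
kept = rebuild(identity, keep_module=True)  # module name restored by hand

print(identity.__module__)  # the defining module, '__main__' when run as a script
print(lost.__module__)      # None
print(kept.__module__)      # same as identity.__module__
```

Restoring __module__ after construction keeps the rebuilt function's globals untouched; an alternative would be to seed the new globals dict with '__name__' before creating the function.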