flykobe cheng created SPARK-7892:
------------------------------------

             Summary: Python class in __main__ may trigger AssertionError
                 Key: SPARK-7892
                 URL: https://issues.apache.org/jira/browse/SPARK-7892
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
         Environment: Linux, Python 2.7.3, pickled by Python pickle lib
            Reporter: flykobe cheng
            Priority: Minor

Callback functions passed to Spark transformations and actions are pickled. If the callback is an instance method of a class defined in the __main__ module, and that class has more than one instance method that uses class attributes or classmethods, the class is pickled twice. 'pickle.memoize' is then called twice for the same object, which triggers an AssertionError.

Demo code:

    import logging
    import sys

    import pyspark


    class AClass(object):
        _class_var = {'classkey': 'classval', }

        def main_object_method(self, item):
            # Reads a class attribute through the class name.
            logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

        def main_object_method2(self, item):
            # Second method that also reads the class attribute.
            logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))


    def test_main_object_method(sc):
        obj = AClass()
        # The bound method is pickled as the map callback.
        res = sc.parallelize(range(4)).map(obj.main_object_method).collect()


    if __name__ == '__main__':
        cf = pyspark.SparkConf()
        cf.set('spark.cores.max', 1)
        sc = pyspark.SparkContext(appName = "flykobe_demo_pickle_error", conf = cf)
        test_main_object_method(sc)

Traceback:

      File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
        save(f_globals)
      File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
        f(self, obj) # Call unbound method with explicit self
      File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
        pickle.Pickler.save_dict(self, obj)
      File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
        self._batch_setitems(obj.iteritems())
      File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
        save(v)
      File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
        f(self, obj) # Call unbound method with explicit self
      File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
        d),obj=obj)
      File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
        self.memoize(obj)
      File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
        assert id(obj) not in self.memo
    AssertionError

Problem in Python/Lib/pickle.py:

    def memoize(self, obj):
        """Store an object in the memo."""
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj