[ https://issues.apache.org/jira/browse/SPARK-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
flykobe cheng updated SPARK-7892:
---------------------------------
    Description:
Callback functions for Spark transformations and actions are pickled. If the callback is an instance method of a class defined in the __main__ module, and the class has more than one instance method that uses class attributes or classmethods, the class is pickled twice and 'pickle.memoize' is called twice, which triggers an AssertionError.

Demo code and traceback attached.

  was:
Callback functions for Spark transformations and actions are pickled. If the callback is an instance method of a class defined in the __main__ module, and the class has more than one instance method that uses class attributes or classmethods, the class is pickled twice and 'pickle.memoize' is called twice, which triggers an AssertionError.

Demo code:

    # Imports added for completeness; the original snippet omitted them.
    import logging
    import sys

    import pyspark


    class AClass(object):
        _class_var = {'classkey': 'classval', }

        # Both methods read a class attribute through the global name AClass.
        def main_object_method(self, item):
            logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

        def main_object_method2(self, item):
            logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))


    def test_main_object_method(sc):
        obj = AClass()
        # Passing a bound method as the callback forces AClass to be pickled.
        res = sc.parallelize(range(4)).map(obj.main_object_method).collect()


    if __name__ == '__main__':
        cf = pyspark.SparkConf()
        cf.set('spark.cores.max', 1)
        sc = pyspark.SparkContext(appName = "flykobe_demo_pickle_error", conf = cf)
        test_main_object_method(sc)

Traceback:

  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
    save(f_globals)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
    save(v)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
    d),obj=obj)
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
    self.memoize(obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
    assert id(obj) not in self.memo
AssertionError

Problem in Python/Lib/pickle.py:

    def memoize(self, obj):
        """Store an object in the memo."""
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj


> Python class in __main__ may trigger AssertionError
> ---------------------------------------------------
>
>                 Key: SPARK-7892
>                 URL: https://issues.apache.org/jira/browse/SPARK-7892
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Linux, Python 2.7.3
> pickled by Python pickle Lib
>            Reporter: flykobe cheng
>            Priority: Minor
>
> Callback functions for Spark transformations and actions are pickled.
> If the callback is an instance method of a class defined in the __main__
> module, and the class has more than one instance method that uses class
> attributes or classmethods, the class is pickled twice and 'pickle.memoize'
> is called twice, which triggers an AssertionError.
> Demo code and traceback attached.
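
The assertion in memoize can be exercised in isolation, without Spark or cloudpickle. The following is a minimal sketch, not the reporter's code. It assumes Python 3 and uses the private pure-Python pickler `pickle._Pickler`, which carries the same assert as the Python 2.7 source in the traceback. The names `Payload` and `memoize_twice` are hypothetical and exist only for this illustration.

```python
import io
import pickle


class Payload(object):
    """Hypothetical stand-in for the class that ends up pickled twice."""


def memoize_twice(obj):
    """Return True if a second memoize() call on obj raises AssertionError."""
    # Assumption: pickle._Pickler is the pure-Python pickler in Python 3.
    # It is a private name and may change between Python versions.
    pickler = pickle._Pickler(io.BytesIO(), protocol=2)
    pickler.memoize(obj)  # first call records id(obj) in pickler.memo
    try:
        pickler.memoize(obj)  # second call trips the assert on id(obj)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    print(memoize_twice(Payload()))
```

Note that the check is a plain `assert`, so running the interpreter with `-O` strips it and the second call would silently overwrite the memo entry instead of failing.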