flykobe cheng created SPARK-7892:
------------------------------------

             Summary: Python class in __main__ may trigger AssertionError
                 Key: SPARK-7892
                 URL: https://issues.apache.org/jira/browse/SPARK-7892
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
         Environment: Linux, Python 2.7.3; pickled by Python pickle Lib
            Reporter: flykobe cheng
            Priority: Minor


Callback functions passed to Spark transformations and actions are pickled.
If the callback is an instance method of a class defined in the __main__
module, and that class has more than one instance method that uses class
attributes or classmethods, the class is pickled twice. pickle's memoize() is
then called twice for the same class object, which triggers an AssertionError.
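
The double-memoize failure itself can be reproduced without Spark. What
follows is a minimal sketch, not the code path Spark takes. It assumes
Python 3, where the pure-Python pickler is exposed under the private name
pickle._Pickler (in Python 2.7 the equivalent class is pickle.Pickler). The
helper name memoize_twice is hypothetical.

```python
import io
import pickle


def memoize_twice(obj):
    """Return True if memoizing obj a second time raises AssertionError."""
    # pickle._Pickler is CPython's pure-Python pickler (a private name).
    pickler = pickle._Pickler(io.BytesIO(), protocol=2)
    pickler.memoize(obj)      # first call records id(obj) in pickler.memo
    try:
        pickler.memoize(obj)  # second call hits the assert in memoize()
    except AssertionError:
        return True
    return False


class AClass(object):
    pass


if __name__ == "__main__":
    print(memoize_twice(AClass))
```

Note that the check is a plain assert statement, so it is skipped when Python
runs with -O.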

Demo code:
import logging
import sys

import pyspark


class AClass(object):
    _class_var = {'classkey': 'classval'}

    def main_object_method(self, item):
        # References the class itself (AClass._class_var) from inside a method.
        logging.warning("class var by %s: %s" % (
            sys._getframe().f_code.co_name, AClass._class_var['classkey']))

    def main_object_method2(self, item):
        # A second method that also references the class; per the description,
        # more than one such method is needed to trigger the error.
        logging.warning("class var by %s: %s" % (
            sys._getframe().f_code.co_name, AClass._class_var['classkey']))


def test_main_object_method(sc):
    obj = AClass()
    # The bound method is pickled here, together with AClass from __main__.
    res = sc.parallelize(range(4)).map(obj.main_object_method).collect()


if __name__ == '__main__':
    cf = pyspark.SparkConf()
    cf.set('spark.cores.max', 1)

    sc = pyspark.SparkContext(appName="flykobe_demo_pickle_error", conf=cf)

    test_main_object_method(sc)


Traceback:
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
    save(f_globals)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
    save(v)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
    d),obj=obj)
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
    self.memoize(obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
    assert id(obj) not in self.memo
AssertionError


Problem in Python/Lib/pickle.py:
    def memoize(self, obj):
        """Store an object in the memo."""
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj
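
One conceivable way to avoid the assertion is to skip memoization for an
object that is already in the memo. The sketch below only illustrates that
idea; it is not the fix adopted in Spark or cloudpickle. It assumes Python 3
and the private pure-Python class pickle._Pickler. GuardedPickler and
memoize_twice_guarded are hypothetical names.

```python
import io
import pickle


class GuardedPickler(pickle._Pickler):
    """Pure-Python pickler whose memoize() tolerates repeated calls."""

    def memoize(self, obj):
        # Skip objects already recorded instead of tripping the assert.
        if id(obj) in self.memo:
            return
        pickle._Pickler.memoize(self, obj)


def memoize_twice_guarded(obj):
    """Memoize obj twice and return the number of memo entries."""
    pickler = GuardedPickler(io.BytesIO(), protocol=2)
    pickler.memoize(obj)
    pickler.memoize(obj)  # no AssertionError; the second call is a no-op
    return len(pickler.memo)


class AClass(object):
    pass


if __name__ == "__main__":
    print(memoize_twice_guarded(AClass))
```

A guard like this would only hide the symptom. If the class really is saved
twice, as described above, the duplicate save would still need to be
addressed where it happens.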



