[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945
 ] 

Valentyn Tymofieiev commented on BEAM-6158:
-------------------------------------------

The error occurs when the main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
by pickling objects by reference. This approach does not work for restoring a 
pickled main session, since object classes need to be defined at the moment 
of unpickling. This issue will be addressed after 
https://github.com/uqfoundation/dill/issues/300 is fixed, or once we start using 
CloudPickle as a pickler, which is being investigated in BEAM-8123. 
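
A minimal sketch of where the cycle comes from, assuming CPython 3. It uses 
neither Beam nor dill, and the class names (Base, WithSuper, WithoutSuper) are 
hypothetical. Any mention of super in a method body makes the compiler add an 
implicit __class__ closure cell to that method, so the class refers to the 
method and the method refers back to the class:

```python
class Base:
    def __init__(self):
        self.ready = True


class WithSuper(Base):
    def __init__(self):
        # Even the two-argument form triggers the implicit __class__ cell.
        super(WithSuper, self).__init__()


class WithoutSuper(Base):
    def __init__(self):
        # No mention of super, so no closure cell is created.
        Base.__init__(self)


# Free variables captured by each __init__ function.
super_freevars = WithSuper.__init__.__code__.co_freevars
plain_closure = WithoutSuper.__init__.__closure__

# The cell points back at the class that owns the method: a reference cycle.
cell_target = WithSuper.__init__.__closure__[0].cell_contents

print(super_freevars)            # names of the captured free variables
print(cell_target is WithSuper)  # whether the cell refers to the owning class
print(plain_closure)             # closure of the method that avoids super
```

This only illustrates the class-to-method-to-class loop; how dill reacts to it 
is described in the linked dill issue.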

In the meantime, the following workarounds are available:
- don't use super() in the main module.
- refer to superclass methods via SuperClassName.method(self, ...). This is NOT 
an equivalent replacement, but may work in simple class hierarchies. 
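
A small sketch of why the second workaround is not equivalent, using 
hypothetical classes rather than Beam code. In a diamond-shaped hierarchy, 
super() follows the method resolution order and visits every class, while an 
explicit SuperClassName.method(self) call goes straight to the named class and 
can skip a sibling:

```python
class Base:
    def describe(self):
        return ["Base"]


# Cooperative version: each class delegates with super().
class Left(Base):
    def describe(self):
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class CooperativeChild(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()


# Workaround version: each class names its superclass explicitly.
class ExplicitLeft(Base):
    def describe(self):
        return ["Left"] + Base.describe(self)


class ExplicitRight(Base):
    def describe(self):
        return ["Right"] + Base.describe(self)


class ExplicitChild(ExplicitLeft, ExplicitRight):
    def describe(self):
        return ["Child"] + ExplicitLeft.describe(self)


cooperative = CooperativeChild().describe()
explicit = ExplicitChild().describe()

print(cooperative)  # visits Child, Left, Right, Base
print(explicit)     # visits Child, Left, Base; Right is skipped
```

With single inheritance, as in a DoFn subclass that only calls its parent's 
__init__, both forms call the same method, which is why the workaround is 
usually fine for simple hierarchies.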

> Using --save_main_session fails on Python 3 when main module has invocations 
> of superclass method using 'super' .
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-6158
>                 URL: https://issues.apache.org/jira/browse/BEAM-6158
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-harness
>            Reporter: Mark Liu
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several 
> Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", line 164, in <module>
>     run()
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", line 158, in run
>     | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py", line 426, in __exit__
>     self.run().wait_until_finish()
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1338, in wait_until_finish
>     (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 773, in run
>     self._load_main_session(self.local_staging_directory)
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 489, in _load_main_session
>     pickler.load_session(session_file)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 280, in load_session
>     return dill.load_session(file_path)
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in load_session
>     module = unpickler.load()
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in find_class
>     return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'>
> {noformat}
>  
> Note that the example has the following code [1]:
> {code:python}
> class ParseGameEventFn(beam.DoFn):
>   def __init__(self):
>     super(ParseGameEventFn, self).__init__()
> {code}
> [1] https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/examples/complete/game/user_score.py#L81
> +cc: [~tvalentyn] [~robertwb] [~altay]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
