Léopold Boudard created BEAM-9655: ------------------------------------- Summary: Stateful Dataflow runner? Key: BEAM-9655 URL: https://issues.apache.org/jira/browse/BEAM-9655 Project: Beam Issue Type: Wish Components: runner-dataflow Affects Versions: 2.19.0 Reporter: Léopold Boudard
Hi, I'm trying to use python portable DataflowRunner with a [BagStateSpec|[https://beam.apache.org/releases/pydoc/2.6.0/apache_beam.transforms.userstate.html]]. Though I encounter followiung issue: {code:java} Traceback (most recent call last): File "/Users/leopold/.pyenv/versions/3.6.0/lib/python3.6/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/Users/leopold/.pyenv/versions/3.6.0/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/Users/leopold/workspace/BenchmarkListingStreaming/listing_beam_pipeline/test_runner.py", line 49, in <module> run() File "/Users/leopold/workspace/BenchmarkListingStreaming/listing_beam_pipeline/test_runner.py", line 44, in run | 'write to file' >> WriteToText(known_args.output) File "/Users/leopold/.pyenv/versions/BenchmarkListingStreaming/lib/python3.6/site-packages/apache_beam/pipeline.py", line 481, in __exit__ self.run().wait_until_finish() File "/Users/leopold/.pyenv/versions/BenchmarkListingStreaming/lib/python3.6/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1449, in wait_until_finish (self.state, getattr(self._runner, 'last_error_msg', None)), self) apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error: Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 648, in do_work work_executor.execute() File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 176, in execute op.start() File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup File "apache_beam/runners/worker/operations.py", line 636, in apache_beam.runners.worker.operations.DoOperation.setup File "apache_beam/runners/common.py", line 866, in apache_beam.runners.common.DoFnRunner.__init__ Exception: Requested execution of a stateful DoFn, but no user state context is available. This likely means that the current runner does not support the execution of stateful DoFns. {code} I've also seen this issue in stackoverflow [https://stackoverflow.com/questions/55413690/does-google-dataflow-support-stateful-pipelines-developed-with-python-sdk] Do you have any idea/ETA when this feature will be available with beam? Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)