[jira] [Commented] (BEAM-3757) Shuffle read failed using python 2.2.0
[ https://issues.apache.org/jira/browse/BEAM-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383846#comment-16383846 ]

Jonathan Delfour commented on BEAM-3757:

I have created a support ticket with the Google team; they confirm it is not isolated and are investigating. Feel free to close this one. I believe they will contact you should there be an issue with Beam itself.

> Shuffle read failed using python 2.2.0
> --
>
> Key: BEAM-3757
> URL: https://issues.apache.org/jira/browse/BEAM-3757
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.2.0
> Environment: gcp, macos
> Reporter: Jonathan Delfour
> Assignee: Thomas Groh
> Priority: Major
>
> Hi,
> The first issue is that the Beam 2.3.0 Python SDK is apparently not working on GCP. It gets stuck:
> {noformat}
> Workflow failed. Causes: (bf832d44290fbf41): The Dataflow appears to be stuck. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
> {noformat}
> I tried two times.
> Reverting back to 2.2.0: it usually works, but today, after more than 1 hour of processing with 30 workers, I get a failure with this in the logs:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
>     op.start()
>   File "dataflow_worker/shuffle_operations.py", line 49, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
>     def start(self):
>   File "dataflow_worker/shuffle_operations.py", line 50, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
>     with self.scoped_start_state:
>   File "dataflow_worker/shuffle_operations.py", line 65, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
>     with self.shuffle_source.reader() as reader:
>   File "dataflow_worker/shuffle_operations.py", line 67, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
>     for key_values in reader:
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 406, in __iter__
>     for entry in entries_iterator:
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 248, in next
>     return next(self.iterator)
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 206, in __iter__
>     chunk, next_position = self.reader.Read(start_position, end_position)
>   File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 138, in shuffle_client.PyShuffleReader.Read
> IOError: Shuffle read failed: INTERNAL: Received RST_STREAM with error code 2 talking to my-dataflow-02271107-756f-harness-2p65:12346
> {noformat}
> I also get an informational message:
> {noformat}
> Refusing to split at 0x7f03a00fe790> at '\x00\x00\x00\t\x1d\x14\x87\xa3\x00\x01': unstarted
> {noformat}
> For the flow, I am extracting data from BQ, cleaning it using pandas, exporting it as a CSV file, gzipping it, and uploading the compressed file to a bucket using decompressive transcoding (the CSV export, gzip compression, and upload are in the same 'worker', as they are done in the same beam.DoFn).
> PS: I can't find a reasonable way to export the logs from GCP, but I can privately send the log file I have of the run on my machine (the log of the pipeline).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
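A minimal sketch of the export step described in the report (CSV export, gzip compression, and upload with decompressive transcoding inside one beam.DoFn). Only the stdlib portion is shown; the function and column names are hypothetical, and the actual job would run this inside an apache_beam DoFn and upload via google-cloud-storage with `Content-Encoding: gzip` and `Content-Type: text/csv` metadata so that downloads are transparently decompressed.

```python
import csv
import gzip
import io


def export_and_compress(rows, fieldnames):
    """Serialize rows to CSV, then gzip-compress the bytes.

    For decompressive transcoding, the returned bytes would be uploaded
    to a GCS bucket with Content-Encoding: gzip and Content-Type: text/csv,
    so clients fetching the object receive plain CSV.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))


# Round-trip check: decompressing recovers the original CSV text.
compressed = export_and_compress(
    [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}],
    ["id", "value"],
)
assert gzip.decompress(compressed).decode("utf-8").startswith("id,value")
```

Doing all three steps in one DoFn, as the reporter describes, keeps each output file on a single worker, which is why the shuffle failure occurs upstream of this stage rather than inside it.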
[jira] [Commented] (BEAM-3757) Shuffle read failed using python 2.2.0
[ https://issues.apache.org/jira/browse/BEAM-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379850#comment-16379850 ]

Luke Cwik commented on BEAM-3757:

It is best to reach out to dataflow-python-feedb...@google.com for support, with a job id and a link to this bug.