[ https://issues.apache.org/jira/browse/BEAM-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928073#comment-16928073 ]
Valentyn Tymofieiev commented on BEAM-8216: ------------------------------------------- Over to [~chamikara] to triage. > GCS IO fails with uninformative 'Broken pipe' errors while attempting to > write to a GCS bucket without proper permissions. > -------------------------------------------------------------------------------------------------------------------------- > > Key: BEAM-8216 > URL: https://issues.apache.org/jira/browse/BEAM-8216 > Project: Beam > Issue Type: Bug > Components: io-py-gcp > Reporter: Valentyn Tymofieiev > Assignee: Chamikara Jayalath > Priority: Major > > Obvserved while executing a wordcount IT pipeline: > {noformat} > ./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \ > -Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \ > -Dattr=IT > -DpipelineOptions="--project=some_project_different_from_apache_beam_testing \ > --staging_location=gs://some_bucket/ \ > --temp_location=gs://some_bucket/ \ > --input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000* > \ > --output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output \ > --expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \ > --num_workers=10 \ > --autoscaling_algorithm=NONE \ > --runner=TestDataflowRunner \ > --sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz" > \ > --info > {noformat} > gs://temp-storage-for-end-to-end-tests/py-it-cloud/output lives in a > different project than was running the pipeline. > This caused a bunch of Broken pipe errors. Console logs: > {noformat} > root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation > read/Read+split+pair_with_one+group/Reify+group/Write > root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation > group/Close > root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation > group/Close > root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation > group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write > root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most > recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 594, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 666, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", > line 1042, in process > self.writer.write(element) > File > "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", > line 393, in write > self.sink.write_record(self.temp_handle, value) > File > "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", > line 137, in write_record > self.write_encoded_record(file_handle, self.coder.encode(value)) > File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", > line 407, in write_encoded_record > file_handle.write(encoded_value) > File > "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line > 202, in write > self._uploader.put(b) > File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", > line 594, in put > self._conn.send_bytes(data.tobytes()) > File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in > send_bytes > self._send_bytes(m[offset:offset + size]) > File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in > _send_bytes > self._send(header) > File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in > _send > n = write(self._handle, buf) > BrokenPipeError: [Errno 32] Broken pipe > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line > 649, in do_work > ... > root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure > step failure25 > root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. > Causes: > S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write > failed., The job failed because a work item has failed 4 times. Look in > previous log entries for the cause of each one of the 4 failures. For more > information, see https://cloud.google.com/dataflow/docs/guides/common-errors. > The work item was attempted on these workers: > beamapp-valentyn-09111855-09111155-pj3z-harness-5g6h > Root cause: Work item failed., > beamapp-valentyn-09111855-09111155-pj3z-harness-6ccc > Root cause: Work item failed., > beamapp-valentyn-09111855-09111155-pj3z-harness-45pp > Root cause: Work item failed., > beamapp-valentyn-09111855-09111155-pj3z-ha > {noformat} > Errors were gone after I changed the bucket to a bucket in the project where > I ran the pipeline. -- This message was sent by Atlassian Jira (v8.3.2#803003)