Sounds like the job you submitted was somehow incompatible with the Dataflow worker. As Ahmet mentioned, running from a clean virtual environment should help verify this.
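In the quoted traceback, the failure happens at the first step of `pickler.loads`, the `c = base64.b64decode(encoded)` line, before anything is unpickled. The sketch below is a simplified stand-in for that serialize/deserialize path. It assumes Beam's pickler base64-encodes zlib-compressed pickled bytes, which that traceback line suggests. It uses the standard `pickle` module instead of `dill`, and `dumps`/`loads` here are hypothetical helpers, not Beam's API. It shows how a serialized payload that does not reach the worker intact fails with `Incorrect padding` rather than with a more descriptive error.

```python
import base64
import binascii
import pickle
import zlib


def dumps(obj):
    # Hypothetical stand-in for Beam's pickler: pickle, zlib-compress,
    # then base64-encode. (Beam itself uses dill rather than pickle.)
    return base64.b64encode(zlib.compress(pickle.dumps(obj), 9))


def loads(encoded):
    # Reverse order: base64-decode first (the step that fails in the
    # quoted traceback), then decompress and unpickle.
    return pickle.loads(zlib.decompress(base64.b64decode(encoded)))


payload = dumps({"window_fn": "GlobalWindows"})
assert loads(payload) == {"window_fn": "GlobalWindows"}  # clean round trip

# Simulate a payload that did not arrive intact: dropping one character
# leaves a length that is not a multiple of 4.
truncated = payload[:-1]
decode_error = None
try:
    loads(truncated)
except (TypeError, binascii.Error) as err:
    # Python 2.7 raises TypeError here; Python 3 raises binascii.Error.
    decode_error = err

print(type(decode_error).__name__, decode_error)
```

On Python 2.7 the failure is a `TypeError`, matching the quoted traceback. On Python 3 it is `binascii.Error` with the same "Incorrect padding" message.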
On Mon, Jun 3, 2019 at 5:44 PM Ahmet Altay <al...@google.com> wrote:

> Do you have any other changes? Are you trying from head with a clean
> virtual environment?
>
> If you can share a link to dataflow job (in the apache-beam-testing GCP
> project), we can try to look at additional logs as well.
>
> On Mon, Jun 3, 2019 at 1:42 PM Tanay Tummalapalli <ttanay...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I ran the Integration Tests -
>> BigQueryStreamingInsertTransformIntegrationTests[1] and
>> BigQueryFileLoadsIT[2] on the master branch locally, with the following
>> command:
>>
>> ./scripts/run_integration_test.sh --test_opts --tests=apache_beam.io.gcp.bigquery_test:BigQueryStreamingInsertTransformIntegrationTests
>>
>> The Dataflow jobs for the tests failed with the following error:
>>
>> root: INFO: 2019-06-03T18:36:53.021Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
>>     work_executor.execute()
>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 150, in execute
>>     test_shuffle_sink=self._test_shuffle_sink)
>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 116, in create_operation
>>     is_streaming=False)
>>   File "apache_beam/runners/worker/operations.py", line 962, in apache_beam.runners.worker.operations.create_operation
>>     op = BatchGroupAlsoByWindowsOperation(
>>   File "dataflow_worker/shuffle_operations.py", line 219, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.__init__
>>     self.windowing = deserialize_windowing_strategy(self.spec.window_fn)
>>   File "dataflow_worker/shuffle_operations.py", line 207, in dataflow_worker.shuffle_operations.deserialize_windowing_strategy
>>     return pickler.loads(serialized_data)
>>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 248, in loads
>>     c = base64.b64decode(encoded)
>>   File "/usr/lib/python2.7/base64.py", line 78, in b64decode
>>     raise TypeError(msg)
>> TypeError: Incorrect padding
>>
>> I tested the same tests on the 2.13.0-RC#2 branch as well and they
>> passed. These tests also don't fail in the most recent Python post-commit
>> tests[3-5].
>>
>> Keeping in mind the recent b64 changes in BQ, none of the tests in the
>> test classes mentioned above makes use of a "BYTES" type field.
>> Would love to get pointers to possible reasons.
>>
>> Thank You
>> - TT
>>
>> [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_test.py#L479-L630
>> [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py#L358-L528
>> [3] https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/
>> [4] https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/
>> [5] https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/