Sounds like the job you submitted was somehow incompatible with the Dataflow
worker. Running from a clean virtual env should help verify that, as Ahmet
mentioned.
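
For context: the failing frame is pickler.loads(serialized_data), which
base64-decodes the pickled windowing strategy before unpickling it.
"Incorrect padding" from base64.b64decode means the encoded string's length
is not a multiple of four, which suggests the payload was truncated or
mangled somewhere between job submission and the worker. A minimal sketch
reproducing the error (plain Python, no Beam involved; the payload string
here is made up for illustration):

    import base64

    # A well-formed base64 payload round-trips fine.
    payload = base64.b64encode(b"serialized windowing strategy")
    assert base64.b64decode(payload) == b"serialized windowing strategy"

    # Dropping even one character makes the length no longer a multiple
    # of four. Python 2's base64.b64decode wraps the binascii.Error as
    # TypeError("Incorrect padding") -- the exact error in the traceback.
    # (Python 3 raises binascii.Error, a ValueError subclass, instead.)
    try:
        base64.b64decode(payload[:-1])
    except (TypeError, ValueError) as e:
        print(e)  # Incorrect padding

That failure mode is consistent with an SDK/worker version mismatch, which
is another reason a clean virtual environment built from head is a good
first check.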

On Mon, Jun 3, 2019 at 5:44 PM Ahmet Altay <al...@google.com> wrote:

> Do you have any other changes? Are you trying from head with a clean
> virtual environment?
>
> If you can share a link to dataflow job (in the apache-beam-testing GCP
> project), we can try to look at additional logs as well.
>
> On Mon, Jun 3, 2019 at 1:42 PM Tanay Tummalapalli <ttanay...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I ran the Integration Tests -
>> BigQueryStreamingInsertTransformIntegrationTests[1] and
>> BigQueryFileLoadsIT[2] on the master branch locally, with the following
>> command:
>> ./scripts/run_integration_test.sh \
>>     --test_opts --tests=apache_beam.io.gcp.bigquery_test:BigQueryStreamingInsertTransformIntegrationTests
>> The Dataflow jobs for the tests failed with the following error:
>> root: INFO: 2019-06-03T18:36:53.021Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
>>     work_executor.execute()
>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 150, in execute
>>     test_shuffle_sink=self._test_shuffle_sink)
>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 116, in create_operation
>>     is_streaming=False)
>>   File "apache_beam/runners/worker/operations.py", line 962, in apache_beam.runners.worker.operations.create_operation
>>     op = BatchGroupAlsoByWindowsOperation(
>>   File "dataflow_worker/shuffle_operations.py", line 219, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.__init__
>>     self.windowing = deserialize_windowing_strategy(self.spec.window_fn)
>>   File "dataflow_worker/shuffle_operations.py", line 207, in dataflow_worker.shuffle_operations.deserialize_windowing_strategy
>>     return pickler.loads(serialized_data)
>>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 248, in loads
>>     c = base64.b64decode(encoded)
>>   File "/usr/lib/python2.7/base64.py", line 78, in b64decode
>>     raise TypeError(msg)
>> TypeError: Incorrect padding
>>
>>
>> I ran the same tests on the 2.13.0 RC2 branch as well, and they passed.
>> They also don't fail in the most recent Python post-commit builds[3-5].
>>
>> Keeping in mind the recent base64 changes in the BigQuery IO: none of the
>> tests in the test classes mentioned above uses a "BYTES"-type field.
>> I would love pointers to possible causes.
>>
>> Thank You
>> - TT
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_test.py#L479-L630
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py#L358-L528
>> [3]
>> https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/
>> [4]
>> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/
>> [5]
>> https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/
>>
>
