[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280113 ]
ASF GitHub Bot logged work on BEAM-6611:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jul/19 17:33
            Start Date: 20/Jul/19 17:33
    Worklog Time Spent: 10m

Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513485596

I'm unable to run the IT tests for BQFL (except for `test_one_job_fails_all_jobs_fail`) locally (even on a clean master), and also on a VM, due to this error:

```
ERROR: test_multiple_destinations_transform (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py", line 467, in test_multiple_destinations_transform
    max_files_per_bundle=-1))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 426, in __exit__
    self.run().wait_until_finish()
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 406, in run
    self._options).run(False)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 419, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/test_direct_runner.py", line 43, in run_pipeline
    self.result = super(TestDirectRunner, self).run_pipeline(pipeline, options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 128, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 319, in run_pipeline
    default_environment=self._default_environment))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 326, in run_via_runner_api
    return self.run_stages(stage_context, stages)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 408, in run_stages
    stage_context.safe_coders)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 681, in _run_stage
    result, splits = bundle_manager.process_bundle(data_input, data_output)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1562, in process_bundle
    part_inputs):
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1561, in <lambda>
    self._registered).process_bundle(part, expected_outputs),
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1500, in process_bundle
    result_future = self._controller.control_handler.push(process_bundle_req)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1017, in push
    response = self.worker.do_instruction(request)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
    request.instruction_id)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 368, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 593, in process_bundle
    data.ptransform_id].process_encoded(data.data)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 143, in process_encoded
    self.output(decoded_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py", line 256, in output
    cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py", line 143, in receive
    self.consumer.process(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py", line 594, in process
    delayed_application = self.dofn_receiver.receive(o)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 795, in receive
    self.process(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 801, in process
    self._reraise_augmented(exn)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 868, in _reraise_augmented
    raise_with_traceback(new_exn)
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/future/utils/__init__.py", line 421, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 799, in process
    return self.do_fn_invoker.invoke_process(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 611, in invoke_process
    windowed_value, additional_args, additional_kwargs, output_processor)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 683, in _invoke_process_per_window
    windowed_value, self.process_method(*args_for_process))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 914, in process_outputs
    for result in results:
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py", line 413, in process
    additional_load_parameters=additional_parameters)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py", line 594, in perform_load_job
    additional_load_parameters=additional_load_parameters)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/utils/retry.py", line 197, in wrapper
    return fun(*args, **kwargs)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py", line 350, in _insert_load_job
    response = self.client.jobs.Insert(request)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 342, in Insert
    upload=upload, upload_config=upload_config)
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
RuntimeError: apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 20 Jul 2019 17:08:07 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '632', '-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 400,
    "message": "Invalid project ID ''. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.",
    "errors": [
      {
        "message": "Invalid project ID ''. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.",
        "domain": "global",
        "reason": "invalid"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}>
 [while running 'WriteWithMultipleDests/BigQueryBatchFileLoads/ParDo(TriggerLoadJobs)/ParDo(TriggerLoadJobs)']
-------------------- >> begin captured logging << --------------------
```

This is not the case for the tests run on Jenkins. I'm trying to resolve this so I can debug the failing test; hopefully it's just a problem with my local installation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
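[Editor's note] The empty project ID in the request URL above (`projects//jobs`) suggests the test environment supplied no GCP project, so the client built the jobs-insert URL with an empty string. The sketch below is hypothetical illustration only (the `jobs_insert_url` and `is_valid_project_id` helpers are not Beam's actual code); it reproduces the malformed URL from the traceback and the project-ID rules quoted in the 400 response:

```python
import re

# URL template matching the endpoint seen in the HttpError above.
BQ_JOBS_URL = "https://www.googleapis.com/bigquery/v2/projects/%s/jobs?alt=json"


def jobs_insert_url(project_id):
    """Build the BigQuery jobs.insert URL for a project (no validation)."""
    return BQ_JOBS_URL % project_id


def is_valid_project_id(project_id):
    """Check a project ID against the rules quoted in the 400 response:
    6-63 lowercase letters, digits, or dashes; must start with a letter
    and may not end with a dash (domain-prefixed IDs not handled here)."""
    return re.fullmatch(r"[a-z][a-z0-9-]{4,61}[a-z0-9]", project_id) is not None


# An unset/empty project reproduces the URL seen in the HttpError,
# which BigQuery then rejects as an invalid project ID:
assert jobs_insert_url("") == (
    "https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json")
assert not is_valid_project_id("")
assert is_valid_project_id("apache-beam-testing")
```

This points at a missing `--project` pipeline option (or equivalent project configuration) in the local environment rather than a bug in the PR itself, consistent with the tests passing on Jenkins.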
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 280113)
    Time Spent: 6h 10m  (was: 6h)

> A Python Sink for BigQuery with File Loads in Streaming
> -------------------------------------------------------
>
>                 Key: BEAM-6611
>                 URL: https://issues.apache.org/jira/browse/BEAM-6611
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Tanay Tummalapalli
>            Priority: Major
>              Labels: gsoc, gsoc2019, mentor
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The Java SDK supports a number of methods for writing data into BigQuery, while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 7655|https://github.com/apache/beam/pull/7655]]
>
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
>
> The Java SDK also supports file loads for streaming pipelines [see the BatchLoads application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776]. File loads have the advantage of being much cheaper than streaming inserts (although it takes longer for loaded records to show up in the table).

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)