[ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280113
 ]

ASF GitHub Bot logged work on BEAM-6611:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jul/19 17:33
            Start Date: 20/Jul/19 17:33
    Worklog Time Spent: 10m 
      Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513485596
 
 
   I'm unable to run the IT tests for BQFL (except for
`test_one_job_fails_all_jobs_fail`) locally (even on a clean master) and also
on a VM, due to this error:
   ```
   ERROR: test_multiple_destinations_transform 
(apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py",
 line 467, in test_multiple_destinations_transform
       max_files_per_bundle=-1))
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 426, in __exit__
       self.run().wait_until_finish()
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 406, in run
       self._options).run(False)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 419, in run
       return self.runner.run_pipeline(self, self._options)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
 line 43, in run_pipeline
       self.result = super(TestDirectRunner, self).run_pipeline(pipeline, 
options)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 128, in run_pipeline
       return runner.run_pipeline(pipeline, options)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 319, in run_pipeline
       default_environment=self._default_environment))
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 326, in run_via_runner_api
       return self.run_stages(stage_context, stages)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 408, in run_stages
       stage_context.safe_coders)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 681, in _run_stage
       result, splits = bundle_manager.process_bundle(data_input, data_output)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1562, in process_bundle
       part_inputs):
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 586, in 
result_iterator
       yield fs.pop().result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 432, in 
result
       return self.__get_result()
     File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in 
__get_result
       raise self._exception
     File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in 
run
       result = self.fn(*self.args, **self.kwargs)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1561, in <lambda>
       self._registered).process_bundle(part, expected_outputs),
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1500, in process_bundle
       result_future = self._controller.control_handler.push(process_bundle_req)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1017, in push
       response = self.worker.do_instruction(request)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 342, in do_instruction
       request.instruction_id)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 368, in process_bundle
       bundle_processor.process_bundle(instruction_id))
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 593, in process_bundle
       data.ptransform_id].process_encoded(data.data)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
       self.output(decoded_value)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py",
 line 256, in output
       cython.cast(Receiver, 
self.receivers[output_index]).receive(windowed_value)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py",
 line 143, in receive
       self.consumer.process(windowed_value)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py",
 line 594, in process
       delayed_application = self.dofn_receiver.receive(o)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py",
 line 795, in receive
       self.process(windowed_value)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py",
 line 801, in process
       self._reraise_augmented(exn)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py",
 line 868, in _reraise_augmented
       raise_with_traceback(new_exn)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/future/utils/__init__.py",
 line 421, in raise_with_traceback
       raise exc.with_traceback(traceback)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py",
 line 799, in process
       return self.do_fn_invoker.invoke_process(windowed_value)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py",
 line 611, in invoke_process
       windowed_value, additional_args, additional_kwargs, output_processor)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py",
 line 683, in _invoke_process_per_window
       windowed_value, self.process_method(*args_for_process))
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py",
 line 914, in process_outputs
       for result in results:
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py",
 line 413, in process
       additional_load_parameters=additional_parameters)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py",
 line 594, in perform_load_job
       additional_load_parameters=additional_load_parameters)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/utils/retry.py",
 line 197, in wrapper
       return fun(*args, **kwargs)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py",
 line 350, in _insert_load_job
       response = self.client.jobs.Insert(request)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py",
 line 342, in Insert
       upload=upload, upload_config=upload_config)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py",
 line 731, in _RunMethod
       return self.ProcessHttpResponse(method_config, http_response, request)
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py",
 line 737, in ProcessHttpResponse
       self.__ProcessHttpResponse(method_config, http_response, request))
     File 
"/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py",
 line 604, in __ProcessHttpResponse
       http_response, method_config=method_config, request=request)
   RuntimeError: apitools.base.py.exceptions.HttpBadRequestError: HttpError 
accessing <https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json>: 
response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 
'application/json; charset=UTF-8', 'date': 'Sat, 20 Jul 2019 17:08:07 GMT', 
'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 
'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 
'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"', 'transfer-encoding': 
'chunked', 'status': '400', 'content-length': '632', '-content-encoding': 
'gzip'}>, content <{
     "error": {
       "code": 400,
       "message": "Invalid project ID ''. Project IDs must contain 6-63 
lowercase letters, digits, or dashes. Some project IDs also include domain name 
separated by a colon. IDs must start with a letter and may not end with a 
dash.",
       "errors": [
         {
           "message": "Invalid project ID ''. Project IDs must contain 6-63 
lowercase letters, digits, or dashes. Some project IDs also include domain name 
separated by a colon. IDs must start with a letter and may not end with a 
dash.",
           "domain": "global",
           "reason": "invalid"
         }
       ],
       "status": "INVALID_ARGUMENT"
     }
   }
   > [while running 
'WriteWithMultipleDests/BigQueryBatchFileLoads/ParDo(TriggerLoadJobs)/ParDo(TriggerLoadJobs)']
   -------------------- >> begin captured logging << --------------------
   ```
   That's not the case for these tests when they run on Jenkins.
   
   I'm trying to resolve this so I can debug the failing test.
   Hopefully, it's just a problem with my local installation.
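   
   For reference, the request URL in the error above (`.../projects//jobs`) has
an empty project, which points to the pipeline's GCP project option not being
set when the IT test runs locally (the Jenkins jobs supply it through
`--test-pipeline-options`). A minimal sketch of the equivalent in code, with
placeholder values only:
   ```python
   # Sketch with placeholder values; an unset/empty GoogleCloudOptions.project
   # is the most likely source of the "Invalid project ID ''" error above.
   from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

   options = PipelineOptions()
   gcp_options = options.view_as(GoogleCloudOptions)
   gcp_options.project = 'my-gcp-project'            # placeholder project id
   gcp_options.temp_location = 'gs://my-bucket/tmp'  # placeholder temp bucket
   ```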
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 280113)
    Time Spent: 6h 10m  (was: 6h)

> A Python Sink for BigQuery with File Loads in Streaming
> -------------------------------------------------------
>
>                 Key: BEAM-6611
>                 URL: https://issues.apache.org/jira/browse/BEAM-6611
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Tanay Tummalapalli
>            Priority: Major
>              Labels: gsoc, gsoc2019, mentor
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The Java SDK supports several methods for writing data into BigQuery, while 
> the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although records take longer to show up in the table).
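> For illustration, a rough sketch of the usage this issue targets, assuming the
> method/triggering_frequency parameters proposed in the design doc and PR 8871 
> (topic and table names below are placeholders):
> {code:python}
> # Placeholder topic/table; parameter names may change before the PR lands.
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
>
> options = PipelineOptions(streaming=True)
> with beam.Pipeline(options=options) as p:
>     (p
>      | beam.io.ReadFromPubSub(topic='projects/<project>/topics/<topic>')
>      | beam.Map(lambda msg: {'value': msg.decode('utf-8')})
>      | beam.io.WriteToBigQuery(
>          '<project>:<dataset>.<table>',
>          schema='value:STRING',
>          method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
>          triggering_frequency=600))  # start a load job roughly every 10 min
> {code}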


