vchiapaikeo opened a new pull request, #28444:
URL: https://github.com/apache/airflow/pull/28444

   closes: #28441
   related: #28441
   
   GCSToBigQueryOperator supports three ways to specify the schema of the BigQuery table:
   
   1. Setting `autodetect=True`
   1. Setting `schema_fields` directly with `autodetect=False`
   1. Setting a `schema_object`, and optionally a `schema_object_bucket`, with `autodetect=False`
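
As a rough illustration, the three options correspond to keyword arguments along these lines. This is only a sketch: the bucket, object, and table names are made up, and the dicts are meant to be unpacked into `GCSToBigQueryOperator(task_id=..., **kwargs)` rather than being anything the provider itself defines.

```python
# Hypothetical values throughout; only the keyword names come from the operator.
common = {
    "bucket": "my-data-bucket",
    "source_objects": ["exports/users.csv"],
    "destination_project_dataset_table": "my_dataset.users",
}

# 1. Let BigQuery infer the schema from the source files.
autodetect_kwargs = {**common, "autodetect": True}

# 2. Pass the schema inline as a list of field definitions.
schema_fields_kwargs = {
    **common,
    "autodetect": False,
    "schema_fields": [
        {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "email", "type": "STRING", "mode": "NULLABLE"},
    ],
}

# 3. Point at a JSON schema file stored in GCS (the path this PR is about).
schema_object_kwargs = {
    **common,
    "autodetect": False,
    "schema_object": "schemas/users.json",
    "schema_object_bucket": "my-schema-bucket",
}
```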
   
   The third method appears to be broken in the latest provider version (8.6.0) and always fails with this error:
   
   ```
   [2022-12-16, 21:06:18 UTC] {taskinstance.py:1772} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py", line 395, in execute
       self.configuration = self._check_schema_fields(self.configuration)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py", line 524, in _check_schema_fields
       raise RuntimeError(
   RuntimeError: Table schema was not found. Set autodetect=True to automatically set schema fields from source objects or pass schema_fields explicitly
   ```
   
   The cause is that [this block](https://github.com/apache/airflow/blob/25bdbc8e6768712bad6043618242eec9c6632618/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L318-L320), guarded by `if self.schema_object and self.source_format != "DATASTORE_BACKUP":`, fails to set `self.schema_fields`. It only sets the local variable `schema_fields`. When `self._check_schema_fields` is subsequently called [here](https://github.com/apache/airflow/blob/25bdbc8e6768712bad6043618242eec9c6632618/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L395), we enter the [first block](https://github.com/apache/airflow/blob/25bdbc8e6768712bad6043618242eec9c6632618/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L523-L528) because `autodetect` is false and `self.schema_fields` is not set.
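
The failure mode can be reproduced in isolation with a toy class. This is a simplified stand-in, not the provider code: `BuggyOperator` and its `_download` helper are hypothetical, and the check is reduced to the one condition that matters here.

```python
import json


class BuggyOperator:
    """Toy stand-in for the operator; not the real Airflow class."""

    def __init__(self, schema_object=None, schema_fields=None, autodetect=False):
        self.schema_object = schema_object
        self.schema_fields = schema_fields
        self.autodetect = autodetect

    def _download(self, object_name):
        # Hypothetical helper standing in for a GCS download.
        return '[{"name": "id", "type": "INTEGER"}]'

    def _check_schema_fields(self):
        # Mirrors the shape of the real check: no autodetect and no schema is an error.
        if not self.autodetect and not self.schema_fields:
            raise RuntimeError("Table schema was not found.")

    def execute(self):
        if self.schema_object:
            # Bug: the parsed schema lands in a local name,
            # so self.schema_fields is still None afterwards.
            schema_fields = json.loads(self._download(self.schema_object))
        self._check_schema_fields()
        return self.schema_fields


try:
    BuggyOperator(schema_object="schemas/users.json").execute()
    outcome = "ok"
except RuntimeError as err:
    outcome = f"failed: {err}"
```

Here `outcome` ends up recording the `RuntimeError`, even though a schema object was supplied and parsed successfully.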
   
   This PR sets the instance variable `self.schema_fields` when the user passes in a `schema_object`. Additionally, it uses `self.schema_object_bucket` instead of the erroneous `self.bucket`.
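
The shape of the fix can be sketched with a similar toy class. Again, `FixedOperator`, `_download`, and the `downloads` list are hypothetical names used only for illustration, and the fallback to `bucket` when no schema bucket is given is an assumption of this sketch, not something taken from the diff.

```python
import json


class FixedOperator:
    """Toy stand-in showing the two changes; not the real Airflow class."""

    def __init__(self, bucket, schema_object, schema_object_bucket=None, autodetect=False):
        self.bucket = bucket
        self.schema_object = schema_object
        # Assumption: fall back to the data bucket when no schema bucket is given.
        self.schema_object_bucket = schema_object_bucket or bucket
        self.schema_fields = None
        self.autodetect = autodetect
        self.downloads = []  # records (bucket, object) pairs for inspection

    def _download(self, bucket, object_name):
        # Hypothetical helper standing in for a GCS download.
        self.downloads.append((bucket, object_name))
        return '[{"name": "id", "type": "INTEGER"}]'

    def _check_schema_fields(self):
        if not self.autodetect and not self.schema_fields:
            raise RuntimeError("Table schema was not found.")

    def execute(self):
        if self.schema_object:
            # Change 1: assign to the instance attribute, not a local variable.
            # Change 2: read from schema_object_bucket rather than bucket.
            self.schema_fields = json.loads(
                self._download(self.schema_object_bucket, self.schema_object)
            )
        self._check_schema_fields()
        return self.schema_fields


op = FixedOperator(
    bucket="my-data-bucket",
    schema_object="schemas/users.json",
    schema_object_bucket="my-schema-bucket",
)
fields = op.execute()
```

With both changes in place the check passes, and the schema file is fetched from the schema bucket rather than the data bucket.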

