vchiapaikeo opened a new pull request, #28444: URL: https://github.com/apache/airflow/pull/28444
Related issue: #28441

GCSToBigQueryOperator allows multiple ways to specify the schema of the BigQuery table:

1. Setting `autodetect=True`
1. Setting `schema_fields` directly with `autodetect=False`
1. Setting a `schema_object`, and optionally a `schema_object_bucket`, with `autodetect=False`

This third method appears to be broken in the latest provider version (8.6.0) and always results in this error:

```
[2022-12-16, 21:06:18 UTC] {taskinstance.py:1772} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py", line 395, in execute
    self.configuration = self._check_schema_fields(self.configuration)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py", line 524, in _check_schema_fields
    raise RuntimeError(
RuntimeError: Table schema was not found. Set autodetect=True to automatically set schema fields from source objects or pass schema_fields explicitly
```

This happens because [this block](https://github.com/apache/airflow/blob/25bdbc8e6768712bad6043618242eec9c6632618/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L318-L320), guarded by `if self.schema_object and self.source_format != "DATASTORE_BACKUP":`, fails to set `self.schema_fields`; it only sets the local variable `schema_fields`.
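To make the local-versus-instance variable problem concrete, here is a heavily simplified, hypothetical model of that block. `BuggySchemaLoader` and the `download` callable are illustrative stand-ins, not the real operator or GCS hook; only the attribute handling is reproduced.

```python
import json


class BuggySchemaLoader:
    """Hypothetical stand-in for the operator; models only the schema attribute handling."""

    def __init__(self, schema_object=None, schema_fields=None, source_format="CSV"):
        self.schema_object = schema_object
        self.schema_fields = schema_fields
        self.source_format = source_format

    def execute(self, download):
        # `download(name) -> bytes` is a hypothetical stand-in for a GCS download.
        if self.schema_object and self.source_format != "DATASTORE_BACKUP":
            # Bug: this binds a *local* name; self.schema_fields is left untouched.
            schema_fields = json.loads(download(self.schema_object).decode("utf-8"))
        return self.schema_fields


def fake_download(name):
    # Hypothetical schema object contents.
    return b'[{"name": "id", "type": "INTEGER"}]'


loader = BuggySchemaLoader(schema_object="schemas/my_table.json")
print(loader.execute(fake_download))  # None: the parsed schema was discarded
```

The parsed schema is bound to a name that goes out of scope when `execute` returns, so any later check that reads `self.schema_fields` still sees `None`.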
When `self._check_schema_fields` is subsequently called [here](https://github.com/apache/airflow/blob/25bdbc8e6768712bad6043618242eec9c6632618/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L395), execution enters the [first block](https://github.com/apache/airflow/blob/25bdbc8e6768712bad6043618242eec9c6632618/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L523-L528) because `autodetect` is false and `self.schema_fields` is not set.

This PR sets the instance variable `self.schema_fields` when the user passes in a `schema_object`. Additionally, it uses `self.schema_object_bucket` instead of the erroneous `self.bucket`.
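A minimal sketch of the corrected flow, again using a hypothetical simplified class rather than the real operator. It assumes that `schema_object_bucket` falls back to `bucket` when not given, and models the `_check_schema_fields` guard as "no autodetect and no schema fields raises `RuntimeError`", based on the error message above. The `download(bucket, name)` callable is a hypothetical stand-in for a GCS download.

```python
import json


class FixedSchemaLoader:
    """Hypothetical simplified model of the corrected flow (not the real operator)."""

    def __init__(self, bucket, schema_object=None, schema_object_bucket=None,
                 schema_fields=None, autodetect=False, source_format="CSV"):
        self.bucket = bucket
        self.schema_object = schema_object
        # Assumption: fall back to the source bucket when no schema bucket is given.
        self.schema_object_bucket = schema_object_bucket or bucket
        self.schema_fields = schema_fields
        self.autodetect = autodetect
        self.source_format = source_format

    def execute(self, download):
        # `download(bucket, name) -> bytes` is a hypothetical stand-in for a GCS download.
        if self.schema_object and self.source_format != "DATASTORE_BACKUP":
            # Fix 1: assign the instance attribute, not a local variable.
            # Fix 2: read from schema_object_bucket rather than self.bucket.
            raw = download(self.schema_object_bucket, self.schema_object)
            self.schema_fields = json.loads(raw.decode("utf-8"))
        self._check_schema_fields()
        return self.schema_fields

    def _check_schema_fields(self):
        # Modeled on the failing guard: no autodetect and no schema -> error.
        if not self.autodetect and not self.schema_fields:
            raise RuntimeError(
                "Table schema was not found. Set autodetect=True to automatically set "
                "schema fields from source objects or pass schema_fields explicitly"
            )


# Hypothetical object store keyed by (bucket, object name).
store = {("schema-bucket", "schemas/t.json"): b'[{"name": "id", "type": "INTEGER"}]'}
op = FixedSchemaLoader(bucket="data-bucket", schema_object="schemas/t.json",
                       schema_object_bucket="schema-bucket")
print(op.execute(lambda b, n: store[(b, n)]))
# [{'name': 'id', 'type': 'INTEGER'}]
```

Keying the fake store by bucket makes the second fix observable: reading from `self.bucket` ("data-bucket") instead of `self.schema_object_bucket` would raise a `KeyError`.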