shahkshitij15 opened a new issue #21801:
URL: https://github.com/apache/airflow/issues/21801


   ### Apache Airflow version
   
   2.2.4 (latest released)
   
   ### What happened
   
   I was trying to create an external table for a CSV file in GCS using the GCSToBigQueryOperator with autodetect=True, but ran into an error stating that either a schema field or a schema object must be provided to create an external table configuration. On closer inspection of the code, I found that the operator never forwards its autodetect setting, so it cannot autodetect the schema of the file.
   
   In [gcs_to_bigquery.py](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py), the autodetect argument is missing from the call to the create_external_table function at line 262.
   
   This must be an oversight, but it **prevents the creation of an external table with an automatically deduced schema.**
   
   The **solution** is to pass autodetect=self.autodetect when calling the create_external_table function, as shown below:
   ```python
   if self.external_table:
       [...]
       autodetect=self.autodetect,
       [...]
   ```
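
For illustration, the proposed change can be exercised in isolation. This is a minimal sketch, not the provider's code: SketchOperator is a hypothetical stand-in for the external-table branch of GCSToBigQueryOperator, the hook is replaced by a unittest.mock.MagicMock, the argument list of create_external_table is abbreviated, and the project, dataset, table, and bucket names are made up.

```python
from unittest.mock import MagicMock


class SketchOperator:
    """Hypothetical stand-in for the external-table branch of GCSToBigQueryOperator."""

    def __init__(self, hook, destination, source_uris, source_format="CSV",
                 schema_fields=None, autodetect=False, external_table=False):
        self.hook = hook
        self.destination = destination
        self.source_uris = source_uris
        self.source_format = source_format
        self.schema_fields = schema_fields
        self.autodetect = autodetect
        self.external_table = external_table

    def execute(self):
        if self.external_table:
            # Proposed fix: forward the operator's autodetect flag instead of
            # letting the callee silently fall back to its own default.
            self.hook.create_external_table(
                external_project_dataset_table=self.destination,
                schema_fields=self.schema_fields,
                source_uris=self.source_uris,
                source_format=self.source_format,
                autodetect=self.autodetect,
            )


# The mock records the keyword arguments it was called with.
hook = MagicMock()
op = SketchOperator(hook, "my-project.my_dataset.my_table",
                    ["gs://my-bucket/data.csv"], autodetect=True, external_table=True)
op.execute()
print(hook.create_external_table.call_args.kwargs["autodetect"])
```

Asserting on the mock's call_args like this is one possible shape for a regression test, since it checks the forwarded keyword rather than any BigQuery behaviour.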
   
   ### What you expected to happen
   
   The operator should have autodetected the schema of the CSV file and created an external table. Instead, it threw an error stating that either a schema field or a schema object must be provided to create an external table configuration.
   
   The error occurs because the value of autodetect is not passed when calling the create_external_table function in [gcs_to_bigquery.py](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py) at line 262. Since autodetect defaults to False in create_external_table, the function receives neither autodetect=True nor a schema (schema_fields or schema_object), and therefore raises the error.
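
The root cause is a plain Python default-argument fallback: when a keyword argument is not forwarded, the callee silently uses its own default. The sketch below assumes a hypothetical create_external_table_stub whose only logic mirrors the validation implied by the reported error (an explicit schema or autodetect=True is required); the real hook's check and message may differ.

```python
def create_external_table_stub(source_uris, schema_fields=None, autodetect=False):
    """Hypothetical stand-in: requires either an explicit schema or autodetect=True."""
    if not schema_fields and not autodetect:
        raise ValueError("Either a schema or autodetect=True must be provided")
    return {"source_uris": source_uris, "schema_fields": schema_fields, "autodetect": autodetect}


def caller(autodetect, forward_autodetect):
    """Mimics the operator: optionally forwards its autodetect flag to the stub."""
    uris = ["gs://my-bucket/data.csv"]  # hypothetical object path
    if forward_autodetect:
        return create_external_table_stub(uris, schema_fields=None, autodetect=autodetect)
    # Bug being illustrated: autodetect is dropped, so the stub uses its default (False).
    return create_external_table_stub(uris, schema_fields=None)


try:
    caller(autodetect=True, forward_autodetect=False)
except ValueError as err:
    print("before fix:", err)

print("after fix:", caller(autodetect=True, forward_autodetect=True))
```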
   
   ### How to reproduce
   
   The issue can be reproduced by calling the GCSToBigQueryOperator with the following parameters:
   
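   ```python
   from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

   # Replace the <...> placeholders with real values; dag is an existing DAG object.
   create_external_table = GCSToBigQueryOperator(
       task_id="<task_id>",
       bucket="<bucket_name>",
       source_objects=["<gcs path excluding bucket name to csv file>"],
       destination_project_dataset_table="<project_id>.<dataset_name>.<table_name>",
       schema_fields=None,
       schema_object=None,
       source_format="CSV",
       autodetect=True,
       external_table=True,
       dag=dag,
   )

   create_external_table
   ```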
   
   ### Operating System
   
   macOS Monterey 12.2.1
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Composer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.