Joel Croteau created AIRFLOW-5046:
-------------------------------------

             Summary: Allow GoogleCloudStorageToBigQueryOperator to accept source_objects as a string or otherwise take input from XCom
                 Key: AIRFLOW-5046
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5046
             Project: Apache Airflow
          Issue Type: Improvement
          Components: contrib, gcp
    Affects Versions: 1.10.2
            Reporter: Joel Croteau
`GoogleCloudStorageToBigQueryOperator` should be able to have its `source_objects` determined dynamically by the results of a previous task. This is hard to do while the operator expects a list, because any template expansion renders as a string. This could be implemented either as a check for whether `source_objects` is a string, parsing it as a list if it is, or as a separate argument that accepts a string-encoded list. My particular use case is as follows:

# A daily DAG scans a GCS bucket for all objects created in the last day and loads them into BigQuery.
# To find these objects, a `PythonOperator` scans the bucket and returns a list of object names.
# A `GoogleCloudStorageToBigQueryOperator` then loads these objects into BigQuery.

The operator should be able to take its list of objects from XCom, but there is no functionality for this, and a template expansion along the lines of `source_objects='{{ task_instance.xcom_pull(key="KEY") }}'` doesn't work: the expression renders as a string, which `GoogleCloudStorageToBigQueryOperator` then treats as a list, with each character being a single item.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
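A minimal sketch of the first proposed option (checking whether `source_objects` is a string and parsing it as a list) might look like the helper below. The name `coerce_source_objects` is hypothetical and not part of the current operator; the assumption is that a templated XCom pull renders as the `repr` of the pushed Python list, which `ast.literal_eval` can parse safely:

```python
import ast


def coerce_source_objects(source_objects):
    """Accept source_objects as a list, or as the string produced by
    Jinja template expansion of an XCom pull (e.g. "['a.csv', 'b.csv']"),
    and return a list of object names.

    Hypothetical helper illustrating the proposed check; not part of
    GoogleCloudStorageToBigQueryOperator's actual API.
    """
    if isinstance(source_objects, str):
        try:
            parsed = ast.literal_eval(source_objects)
        except (ValueError, SyntaxError):
            # Not a Python literal: treat it as a single object name
            # rather than iterating over it character by character.
            return [source_objects]
        if isinstance(parsed, list):
            return parsed
        return [source_objects]
    return list(source_objects)


# A rendered template like '{{ task_instance.xcom_pull(key="KEY") }}'
# arrives as the repr of the pushed list:
print(coerce_source_objects("['data/a.csv', 'data/b.csv']"))
# A plain string or an actual list also behaves sensibly:
print(coerce_source_objects("single.csv"))
print(coerce_source_objects(["x.csv", "y.csv"]))
```

A `GoogleCloudStorageToBigQueryOperator.__init__` (or `execute`) applying such a check would let the same parameter accept a literal list, a single object name, or a templated XCom value without each caller having to work around the string rendering.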