Hello Airflow dev community!

TL;DR: I'd like Airflow Datasets to support the following use case.

I use a BigQuery table, MyRawRecords, which is partitioned by date. The DAG write_my_raw_records inserts records into this table on a daily schedule and is sometimes rerun, either to correct previously inserted records or to insert late-arriving data. A downstream DAG, aggregate_my_raw_records_for_analysis, is owned by a separate team; it runs daily and computes aggregate values from MyRawRecords. This downstream DAG should be rerun any time MyRawRecords is written to (new or updated data for a single partition). The existing Dataset mechanism doesn't support passing the target partition to the consuming DAG.
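To make the gap concrete, here is a minimal plain-Python sketch of the difference between an update notification with and without partition information. It does not use Airflow at all: UpdateEvent and partitions_to_recompute are hypothetical names invented for illustration, and the 30 daily partitions are an assumed example, not real data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class UpdateEvent:
    """Hypothetical stand-in for a dataset update notification (not Airflow's API)."""

    table: str
    # None models the current situation: the consumer is not told which partition changed.
    partition: Optional[date] = None


def partitions_to_recompute(event: UpdateEvent, all_partitions: List[date]) -> List[date]:
    """Return the partitions a consuming DAG would have to re-aggregate."""
    if event.partition is None:
        # No partition info: the safe fallback is to redo the whole table.
        return list(all_partitions)
    # Partition info available: only the touched partition needs work.
    return [event.partition]


# Assumed example: a table with 30 daily partitions.
all_partitions = [date(2023, 1, day) for day in range(1, 31)]

# Current situation: the event only says "MyRawRecords changed".
blind = partitions_to_recompute(UpdateEvent("MyRawRecords"), all_partitions)

# Requested situation: the producer also reports the partition it wrote.
targeted = partitions_to_recompute(
    UpdateEvent("MyRawRecords", partition=date(2023, 1, 15)), all_partitions
)

print(len(blind), len(targeted))
```

In this toy model the consumer without partition information re-aggregates every partition, while the consumer that receives the partition re-aggregates exactly one.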
At Bombora, the above is a common use case. We produce many internal-facing data products and would like to use a data/event-driven approach to trigger downstream DAGs, without the explicit coupling of TriggerDagRunOperator/ExternalTaskSensor or the implicit coupling of schedule alignment with sensor timeouts.

Today, using Datasets for this use case requires the consuming DAG to work out for itself which partition of the table was updated. Of course, the aggregates for the entire table could be recalculated instead, but that is a huge waste of resources.

This seems like one of many logical next steps for Dataset-related features. Interest in supporting this use case has been voiced in the comments on the original Dataset AIP:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-48+Data+Dependency+Management+and+Data+Driven+Scheduling

Specifically, this comment and its follow-ups:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-48+Data+Dependency+Management+and+Data+Driven+Scheduling?focusedCommentId=217385741#comment-217385741

I have not looked at the Dataset-related code, so I'm not sure how complex something like this would be. I assume it wouldn't be trivial, given that Datasets are currently static objects.

I would love to hear some feedback.

Thanks!
Jeff Payne