Michael-cd30 commented on issue #51563: URL: https://github.com/apache/airflow/issues/51563#issuecomment-2960270703
> > For example, each "transform" DAG will wait for the "extract_load" DAGs to finish.
>
> [@Michael-cd30](https://github.com/Michael-cd30), this is a really good use-case for Assets. I'd give those a look!
>
> https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/asset-scheduling.html#schedule-dags-with-assets

Assets won't help here, as we have several “extract_load” DAGs and several “transform” DAGs that don't know about each other. This lets us add new DAGs without changing the logic, just by tagging the DAGs correctly.

Here's how it works.

When a “transform” DAG launches, it first runs a sensor that asks “Hey guys, is anyone extracting/loading here?” If yes, the transform DAG says “OK, no problem, I'll reschedule in 10 minutes”. Otherwise, it actually launches the transformations.

On the other hand, when an “extract_load” DAG launches, it also first runs a sensor that asks “Hey guys, is there anyone out there transforming?” If so, the “extract_load” DAG says “OK, in that case I'll stop gracefully (soft_fail)”.

In this way, transformations are only executed once all extractions/loads have properly completed. This guarantees data consistency.

It also guarantees that once a transformation starts, an “extract_load” can no longer start. This ensures that the transformation processes are never starved.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
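The handshake described in the comment above can be sketched in plain Python, independent of Airflow. This is only an illustration under stated assumptions: `RunningDag`, `Decision`, `transform_gate`, and `extract_load_gate` are hypothetical names, the tag strings `"extract_load"` and `"transform"` are assumed to match how the DAGs are tagged, and how the list of running DAGs is obtained is left out entirely.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"        # nothing conflicting is running, go ahead
    RESCHEDULE = "reschedule"  # transform side: check again later (e.g. in 10 minutes)
    SOFT_FAIL = "soft_fail"    # extract_load side: stop quietly instead of failing hard


@dataclass(frozen=True)
class RunningDag:
    """Hypothetical view of one currently running DAG and its tags."""
    dag_id: str
    tags: frozenset


def is_tag_running(running, tag):
    """Return True if any running DAG carries the given tag."""
    return any(tag in run.tags for run in running)


def transform_gate(running):
    """First step of a 'transform' DAG: wait while any extract/load is active."""
    if is_tag_running(running, "extract_load"):
        return Decision.RESCHEDULE
    return Decision.PROCEED


def extract_load_gate(running):
    """First step of an 'extract_load' DAG: back off if a transform is active."""
    if is_tag_running(running, "transform"):
        return Decision.SOFT_FAIL
    return Decision.PROCEED
```

Because both gates look only at tags, a newly added DAG takes part in the protocol as soon as it is tagged, without any other DAG needing to reference it. In an actual Airflow deployment these checks would presumably sit inside a custom sensor, with the rescheduling and soft-fail behaviour coming from the sensor's own settings rather than from a return value as in this sketch.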
