Michael-cd30 commented on issue #51563:
URL: https://github.com/apache/airflow/issues/51563#issuecomment-2960270703

   > > For example, each "transform" DAG will wait for the "extract_load" DAGs to finish.
   > 
   > [@Michael-cd30](https://github.com/Michael-cd30), this is a really good use-case for Assets. I'd give those a look!
   > 
   > https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/asset-scheduling.html#schedule-dags-with-assets
   
   Assets won't help here: we have several “extract_load” DAGs and several “transform” DAGs that don't know about each other. That way we can add new DAGs without changing any logic, just by tagging them correctly.
   
   Here's how it works.
   
   When a “transform” DAG starts, its first task is a sensor that asks: “Hey guys, is anyone extracting/loading right now?” If so, the transform DAG says: “OK, no problem, I'll reschedule in 10 minutes.” Otherwise, it goes ahead and runs the transformations.
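   The transform-side check boils down to one predicate over the DAG runs that are currently active. Below is a minimal, framework-free sketch of that decision. It assumes each running DAG is described by its tags and that extract/load DAGs carry a tag literally named `extract_load`. The `RunningDag` type, the tag name, and `transform_gate` are all hypothetical. In Airflow itself this logic would presumably sit in a sensor's `poke()` running with `mode="reschedule"` and `poke_interval=600` (10 minutes). How the active runs are actually queried is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"        # no extraction/load running: start the transformations
    RESCHEDULE = "reschedule"  # extraction/load in progress: check again in 10 minutes


@dataclass(frozen=True)
class RunningDag:
    """Hypothetical view of one active DAG run: its id and its tags."""
    dag_id: str
    tags: frozenset[str]


# Hypothetical tag name; the comment only says DAGs are "tagged correctly".
EXTRACT_LOAD_TAG = "extract_load"


def transform_gate(running: list[RunningDag]) -> Decision:
    """Decision taken by the first sensor of a "transform" DAG."""
    if any(EXTRACT_LOAD_TAG in dag.tags for dag in running):
        return Decision.RESCHEDULE
    return Decision.PROCEED


if __name__ == "__main__":
    running = [RunningDag("load_crm", frozenset({"extract_load"}))]
    print(transform_gate(running))  # Decision.RESCHEDULE
```

   Because the gate only looks at tags, a new extract/load DAG is picked up automatically as soon as it carries the right tag. No transform DAG needs to change.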
   
   Conversely, when an “extract_load” DAG starts, its first task is also a sensor, which asks: “Hey guys, is anyone out there transforming?” If so, the “extract_load” DAG says: “OK, in that case I'll stop gracefully (soft_fail).”
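   The extract/load side is the mirror image, except that it gives up instead of waiting. In Airflow, `soft_fail=True` on a sensor marks the task as skipped rather than failed. This sketch only models that outcome as a `SKIP` decision. It assumes transformation DAGs carry a tag literally named `transform`. That tag name, `RunningDag`, and `extract_load_gate` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"  # nobody is transforming: run the extraction/load
    SKIP = "skip"        # a transformation is running: stop cleanly (soft_fail -> SKIPPED)


@dataclass(frozen=True)
class RunningDag:
    """Hypothetical view of one active DAG run: its id and its tags."""
    dag_id: str
    tags: frozenset[str]


# Hypothetical tag name used to identify transformation DAGs.
TRANSFORM_TAG = "transform"


def extract_load_gate(running: list[RunningDag]) -> Decision:
    """Decision taken by the first sensor of an "extract_load" DAG."""
    if any(TRANSFORM_TAG in dag.tags for dag in running):
        return Decision.SKIP
    return Decision.PROCEED


if __name__ == "__main__":
    running = [RunningDag("build_marts", frozenset({"transform"}))]
    print(extract_load_gate(running))  # Decision.SKIP
```

   Skipping rather than waiting is the asymmetric choice that lets a running transformation keep the pipeline to itself.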
   
   In this way, transformations only run once all extractions/loads have properly completed. This guarantees data consistency.
   
   It also guarantees that once a transformation has started, no new “extract_load” can start. This ensures the transformation processes are not starved.
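   The two guarantees can be checked by replaying a timeline of start/finish events through both rules. This is a toy simulation, not Airflow code, and it rests on a few assumptions:

   - Events are processed strictly one at a time, so it does not model two sensors checking at the same instant.
   - A rescheduled transform simply reappears as a later `start` event.
   - `finish` events are only issued for runs that actually started.

   The function name and event format are hypothetical.

```python
def simulate(events: list[tuple[str, str]]) -> list[str]:
    """Replay (action, kind) events; action is "start" or "finish",
    kind is "extract_load" or "transform". Returns what happened."""
    active = {"extract_load": 0, "transform": 0}
    log = []
    for action, kind in events:
        if action == "finish":
            if active[kind] == 0:
                raise ValueError(f"no running {kind} to finish")
            active[kind] -= 1
            log.append(f"{kind} finished")
            continue
        other = "transform" if kind == "extract_load" else "extract_load"
        if active[other] == 0:
            active[kind] += 1
            log.append(f"{kind} started")
        elif kind == "transform":
            log.append("transform rescheduled")  # waits for extract/load to drain
        else:
            log.append("extract_load skipped")   # soft_fail: gives way to transform
        # Both kinds are never active at the same time.
        assert not (active["extract_load"] and active["transform"])
    return log


if __name__ == "__main__":
    timeline = [
        ("start", "extract_load"), ("start", "extract_load"),
        ("start", "transform"),                      # rescheduled: loads running
        ("finish", "extract_load"), ("finish", "extract_load"),
        ("start", "transform"),                      # retry succeeds
        ("start", "extract_load"),                   # skipped: transform running
        ("finish", "transform"),
        ("start", "extract_load"),
    ]
    for line in simulate(timeline):
        print(line)
```

   The trace shows the transform waiting until both loads have finished. After that, a newly arriving extract/load is skipped until the transform is done.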


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.