Hello! I have a DAG whose input size (number of rows) may grow or shrink significantly from run to run.
The first step (A) determines the size of the input set and groups it into batches of a pre-defined size. In the second step, I want to generate one task per batch, each performing an upload to a third-party API (Google AdWords) or a computation. The final step is a sensor that waits for each batch to reach a completed status, followed by a final task.

Thoughts so far:

- I don't necessarily need all tasks to execute in parallel; I just want to be able to control how many run at once through Pools.
- I could calculate the batch size and the number of tasks required at DAG compile time, but this would make my DAG loading very slow (as I will have lots of DAGs doing this). There's a rough sketch of this approach below.
- Is changing the number of tasks in a DAG dynamically going to screw up Airflow?
- I found this approach (https://stackoverflow.com/a/51977800), but it feels like a bit of a hack.
- I could trigger multiple DAG runs, but this makes it harder to visualise and trace through the UI.

Or am I approaching this problem in the wrong way?
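For concreteness, here is a minimal sketch of the compile-time approach from the second bullet (Airflow 1.x import paths; the Variable key `upload_batch_count`, the pool name `adwords_upload`, the `dag_id`, and the stub callable are placeholder names I made up for illustration, not anything real):

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator


def upload_batch(batch_index):
    """Stub: push one batch to the third-party API."""
    print("uploading batch {}".format(batch_index))


with DAG(
    dag_id="batched_upload",
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,
) as dag:
    # Parse-time lookup: step A (or an external process) would keep this
    # Variable up to date. This is the part that worries me -- the task
    # count only changes when the scheduler re-parses the DAG file, and
    # the lookup itself slows parsing down across many DAGs.
    batch_count = int(Variable.get("upload_batch_count", default_var=1))

    upload_tasks = [
        PythonOperator(
            task_id="upload_batch_{}".format(i),
            python_callable=upload_batch,
            op_kwargs={"batch_index": i},
            pool="adwords_upload",  # cap concurrency with a Pool
        )
        for i in range(batch_count)
    ]

    # In the real DAG, a sensor per batch would sit between the uploads
    # and this join task, waiting for the third party to report success.
    done = DummyOperator(task_id="all_batches_done")

    upload_tasks >> done
```

Thanks for your help, Rob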