I also have that requirement and I'm working on a proposal for rescheduling tasks. My current PoC can be found at [1]; it uses the up_for_retry state, which has some problems. I have started making some changes and hope to send a first proposal this week.
The basic idea is:

* A new "reschedule" flag for sensors. If set to True, the sensor raises an AirflowRescheduleException (with the new schedule date) that causes a reschedule.
* Reschedule requests are recorded in a new `task_reschedule` table and visualized in the Gantt view.
* A new TI dependency that checks if a task is ready to be rescheduled.

Advantages:

* This change is backward compatible. Existing sensors behave as before, but it's possible to set the "reschedule" flag.
* The timeout and poke_interval are still respected and used to calculate the next schedule time.
* Custom sensor implementations can even define the next sensible schedule date.
* This mechanism can also be used by non-sensor operators.

Kind Regards,
Stefan

[1] https://github.com/seelmann/incubator-airflow/tree/reschedule-sensor-3

On 07/10/2018 04:05 PM, Pedro Machado wrote:
> I have a few DAGs that use time sensors to wait until data is ready, which
> can be several days.
>
> I have one daily DAG where, for each execution date, I have to repull the
> data for the next 7 days to capture changes (late arriving revenue data).
> This DAG currently starts 7 TimeDeltaSensors for each execution day with
> delays that range from 0 to 6 days.
>
> I was wondering what the recommendation is for cases like this where a
> large number of sensors is needed.
>
> Are there ways to reduce the footprint of these sensors so that they use
> less CPU and memory?
>
> I noticed that in one of the DAGs that Germain Tanguy had in the
> presentation he shared today a sensor was set to time out every 30 seconds
> but had a large retry count so instead of running constantly, it runs every
> 15 minutes for 30 seconds and then dies.
>
> Are other people using this pattern? Do you have other suggestions?
>
> Thanks,
>
> Pedro
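
P.S. To make the basic idea above a bit more concrete, here is a minimal sketch in plain Python. It is not the PoC code and does not import Airflow. `RescheduleRequest` is only a stand-in for the proposed AirflowRescheduleException, and `SensorTimeout`, `poke_once` and `ready_for_reschedule` are hypothetical names made up for illustration. The sketch assumes the next schedule date is simply the current time plus poke_interval.

```python
from datetime import datetime, timedelta


class RescheduleRequest(Exception):
    """Hypothetical stand-in for the proposed AirflowRescheduleException."""

    def __init__(self, reschedule_date):
        super().__init__("reschedule at " + reschedule_date.isoformat())
        self.reschedule_date = reschedule_date


class SensorTimeout(Exception):
    """Hypothetical error for a sensor that exceeded its timeout."""


def poke_once(poke, started_at, now, poke_interval, timeout):
    """Poke a single time instead of sleeping in a loop.

    Returns True if the condition is met. Otherwise raises SensorTimeout
    once the timeout has elapsed, or RescheduleRequest carrying the next
    schedule date (assumed here to be now + poke_interval).
    """
    if poke():
        return True
    if now - started_at > timeout:
        raise SensorTimeout("waited longer than " + str(timeout))
    raise RescheduleRequest(now + poke_interval)


def ready_for_reschedule(reschedule_date, now):
    """Dependency check: the task may run again once its date is reached."""
    return now >= reschedule_date


# Example: the condition is not met yet, so the task asks to be
# rescheduled 15 minutes later instead of occupying a worker slot.
start = datetime(2018, 7, 10, 16, 0)
try:
    poke_once(lambda: False, start, start, timedelta(minutes=15), timedelta(days=7))
except RescheduleRequest as request:
    next_run = request.reschedule_date
    print("free the slot until", next_run)
```

In the proposal the requested date would be recorded in the `task_reschedule` table and the new TI dependency would play the role of `ready_for_reschedule`. A custom sensor could raise the exception with its own date rather than now + poke_interval.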