We are also using the "high number of retries" pattern rather than sensors (S3KeySensor in our case) for a similar reason: we have weekly data that arrives "some point after Thursday midnight", but it can take 5 or even 8 days to arrive. Yay third parties.
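Here is a rough way to size the retry count for this pattern. It assumes each attempt gives up after the sensor's `timeout` and the next attempt starts a fixed `retry_delay` later, with no exponential backoff. Under that assumption, retry n starts n × (timeout + retry_delay) after the first try. The helper `retries_needed` is a hypothetical illustration of that arithmetic and is not part of Airflow. The inputs are a 30-second timeout and a 15-minute delay, taken loosely from the quoted message, and the 8-day worst case from above.

```python
import math
from datetime import timedelta


def retries_needed(window: timedelta, attempt_timeout: timedelta,
                   retry_delay: timedelta) -> int:
    """Smallest retry count whose last attempt starts at least `window`
    after the first attempt (hypothetical helper, illustration only).

    Assumes every attempt runs for the full `attempt_timeout` before giving
    up, then waits a fixed `retry_delay` (no exponential backoff), so retry n
    starts n * (attempt_timeout + retry_delay) after the first attempt.
    """
    cycle = attempt_timeout + retry_delay
    # timedelta / timedelta gives a float ratio in Python 3.
    return math.ceil(window / cycle)


# 30-second pokes, 15 minutes apart, covering data that can be 8 days late.
print(retries_needed(timedelta(days=8), timedelta(seconds=30),
                     timedelta(minutes=15)))  # 744
```

In a DAG, these numbers would roughly feed the sensor's `timeout` and the task's `retries` and `retry_delay` arguments. Check how your Airflow version treats a sensor timeout before relying on retries kicking in.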
It would be nice to have a different kind of sensor (or a flag on the existing ones) so that, rather than sitting in a busy loop on an executor, it just goes back and reschedules itself. We (meaning where I work) just haven't gotten around to writing that.

-ash

> On 10 Jul 2018, at 15:05, Pedro Machado <pe...@205datalab.com> wrote:
>
> I have a few DAGs that use time sensors to wait until data is ready, which
> can take several days.
>
> I have one daily DAG where, for each execution date, I have to repull the
> data for the next 7 days to capture changes (late-arriving revenue data).
> This DAG currently starts 7 TimeDeltaSensors for each execution day, with
> delays that range from 0 to 6 days.
>
> I was wondering what the recommendation is for cases like this where a
> large number of sensors is needed.
>
> Are there ways to reduce the footprint of these sensors so that they use
> less CPU and memory?
>
> I noticed that in one of the DAGs that Germain Tanguy had in the
> presentation he shared today, a sensor was set to time out every 30 seconds
> but had a large retry count, so instead of running constantly, it runs every
> 15 minutes for 30 seconds and then dies.
>
> Are other people using this pattern? Do you have other suggestions?
>
> Thanks,
>
> Pedro
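Here is a toy sketch of the "go back and reschedule" idea. Instead of each sensor holding a worker slot while it sleeps between pokes, a single loop takes whichever sensor is due next and pokes it once. If its condition isn't met yet, the loop pushes it back onto a queue with a later time. `ScheduledPoke` and `run_reschedule_loop` are hypothetical names made up for this illustration, and time is simulated in hours. This is not Airflow's API. The usage models seven sensors per execution day, like the ones in the quoted message, waiting 0 to 6 days. The 6-hour poke interval is an assumption.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class ScheduledPoke:
    """Hypothetical queue entry: only next_time is used for ordering."""
    next_time: float
    name: str = field(compare=False)
    condition: Callable[[float], bool] = field(compare=False)
    interval: float = field(compare=False)


def run_reschedule_loop(sensors: List[ScheduledPoke],
                        max_time: float) -> List[Tuple[str, float]]:
    """Poke the earliest-due sensor once; re-queue it if not ready yet.

    Nothing sleeps between pokes, so a waiting sensor holds no worker.
    Returns (name, simulated_time) for each sensor that succeeded, in order.
    Sensors still waiting past max_time are dropped (a simulated timeout).
    """
    queue = list(sensors)
    heapq.heapify(queue)
    succeeded = []
    while queue:
        poke = heapq.heappop(queue)
        if poke.next_time > max_time:
            break  # heap order: every remaining entry is due even later
        if poke.condition(poke.next_time):
            succeeded.append((poke.name, poke.next_time))
        else:
            poke.next_time += poke.interval
            heapq.heappush(queue, poke)
    return succeeded


# Seven sensors for one execution day, waiting 0..6 days (time in hours),
# each re-poking every 6 hours (an assumed interval).
sensors = [
    ScheduledPoke(0.0, f"wait_{k}d", lambda t, k=k: t >= 24 * k, 6.0)
    for k in range(7)
]
for name, when in run_reschedule_loop(sensors, max_time=24 * 7):
    print(name, "ready at hour", when)
```

The point of this design is that a sensor that isn't ready yet costs nothing between pokes. Only its queue entry remains, so the number of waiting sensors no longer dictates how many worker slots are tied up.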