Re: Using large numbers of sensors, resource consumption

Ash Berlin-Taylor Tue, 10 Jul 2018 07:26:40 -0700

We are also using the "high number of retries" pattern rather than sensors 
(S3KeySensor in our case) for a similar reason: we have data for a given week 
that is due "some point after Thursday midnight", but it can take 5 or even 8 
days to arrive. Yay third parties.
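
A rough sketch of the arithmetic behind this pattern, in case it helps. The 
helper name retry_settings is made up for illustration; timeout, retries and 
retry_delay are the usual sensor/operator arguments. The 30 second timeout and 
15 minute delay are the figures Pedro quotes below, and treating the 15 minutes 
as the pause after each attempt is an assumption.

```python
import math
from datetime import timedelta


def retry_settings(max_wait, retry_delay, timeout):
    """Return sensor kwargs that keep retrying for at least `max_wait`.

    Each attempt holds a slot for at most `timeout`; the task then fails
    and is retried after `retry_delay`.
    """
    cycle = timeout + retry_delay  # one attempt plus the pause after it
    attempts = math.ceil(max_wait / cycle)
    return {
        "timeout": int(timeout.total_seconds()),  # seconds per attempt
        "retries": attempts - 1,  # the first attempt is not a retry
        "retry_delay": retry_delay,
    }


# Assumed worst case from this thread: data can be up to 8 days late.
settings = retry_settings(
    max_wait=timedelta(days=8),
    retry_delay=timedelta(minutes=15),
    timeout=timedelta(seconds=30),
)
```

With these numbers each task holds a slot for roughly 30 seconds out of every 
15.5 minutes, instead of for the whole wait. The resulting dict would be 
passed as keyword arguments to whichever sensor is in use.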


It would be nice to have a different kind of sensor (or a flag to the existing 
ones) so that rather than sitting in a busy loop on an executor they just go 
back and re-schedule themselves. We've just not gotten around to writing that 
(we being where I work).
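
To make the idea concrete, here is a toy simulation in plain Python, not 
Airflow code. Reschedule, poke_once and run are hypothetical names, and the 
loop only stands in for whatever the scheduler would do. The point it tries to 
show is that each check is a short-lived run that either succeeds or asks to 
be run again later, so nothing sleeps while holding a slot.

```python
from datetime import datetime, timedelta


class Reschedule(Exception):
    """Hypothetical signal: free my slot and run me again at `when`."""

    def __init__(self, when):
        super().__init__(when)
        self.when = when


def poke_once(is_ready, now, interval):
    """Check the condition once; never sleep while holding a slot."""
    if is_ready(now):
        return True
    raise Reschedule(now + interval)


def run(is_ready, start, interval):
    """Stand-in for a scheduler loop; returns (finish time, number of checks)."""
    now, checks = start, 0
    while True:
        checks += 1
        try:
            poke_once(is_ready, now, interval)
            return now, checks
        except Reschedule as signal:
            now = signal.when  # the slot is free until then


# Assumed example: data due Thursday midnight shows up 5 days 3 hours late.
start = datetime(2018, 7, 5)
arrives = start + timedelta(days=5, hours=3)
done, checks = run(lambda now: now >= arrives, start, timedelta(hours=1))
```

In this run the condition is checked once an hour, so the slot is only held 
for the duration of each individual check rather than for the full five days.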

-ash

> On 10 Jul 2018, at 15:05, Pedro Machado <pe...@205datalab.com> wrote:
> 
> I have a few DAGs that use time sensors to wait until data is ready, which
> can be several days.
> 
> I have one daily DAG where, for each execution date, I have to repull the
> data for the next 7 days to capture changes (late arriving revenue data).
> This DAG currently starts 7 TimeDeltaSensors for each execution date, with
> delays that range from 0 to 6 days.
> 
> I was wondering what the recommendation is for cases like this where a
> large number of sensors is needed.
> 
> Are there ways to reduce the footprint of these sensors so that they use
> less CPU and memory?
> 
> I noticed that in one of the DAGs in the presentation Germain Tanguy shared
> today, a sensor was set to time out after 30 seconds but had a large retry
> count, so instead of running constantly it runs for 30 seconds every 15
> minutes and then dies.
> 
> Are other people using this pattern? Do you have other suggestions?
> 
> Thanks,
> 
> Pedro
