Re: Using large numbers of sensors, resource consumption

Maxime Beauchemin Sun, 15 Jul 2018 11:30:34 -0700

There have been conversations in the past about adding an
`evaluation_method` argument to BaseSensor that would allow for different
options:
1. the current approach, which takes up a slot and pokes periodically
(heavy on slot usage)
2. an approach closer to fail/retry, likely introducing a new state
representing that the task is waiting for the next sensing event (heavy on
overhead, MQ traffic, ...)
3. one where the scheduler itself runs the "poke" method inline. In many
cases running that task adds very little overhead for the scheduler, and
the scheduler is already insulated (the DAG is parsed in a subprocess)
(heavy on the scheduler machine). I think it's reasonable to do this even
without a distributed scheduler, especially for cheap-to-check sensors.
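To make the trade-off between the three options concrete, here is a toy cost model. It is only a sketch: the labels `"poke"`, `"reschedule"` and `"inline"` are hypothetical names for options 1 to 3 (not real `evaluation_method` values), poke duration is ignored when placing pokes in time, and the overhead of option 2 (DB writes, queue messages) is counted rather than measured.

```python
import math
from dataclasses import dataclass


@dataclass
class Cost:
    worker_slot_s: float    # time a worker slot is held by the sensor
    queue_round_trips: int  # times the task is queued and picked up again
    scheduler_s: float      # poke time spent inside the scheduler process


def num_pokes(wait_s: float, interval_s: float) -> int:
    # Pokes happen at t = 0, interval, 2*interval, ... until t >= wait_s.
    # Poke duration is ignored for timing purposes (toy model).
    return math.ceil(wait_s / interval_s) + 1


def cost(method: str, wait_s: float, interval_s: float, poke_s: float) -> Cost:
    n = num_pokes(wait_s, interval_s)
    if method == "poke":  # 1. hold the slot and sleep between pokes
        return Cost(n * poke_s + (n - 1) * interval_s, 1, 0.0)
    if method == "reschedule":  # 2. release the slot between pokes
        return Cost(n * poke_s, n, 0.0)
    if method == "inline":  # 3. the scheduler calls poke() itself
        return Cost(0.0, 0, n * poke_s)
    raise ValueError(f"unknown evaluation method: {method}")
```

For a sensor that waits one day with a 60-second poke interval and 0.5-second pokes, option 1 holds a slot for 87,120.5 s, option 2 holds one for only 720.5 s but needs 1,441 queue round trips, and option 3 moves those 720.5 s of poking into the scheduler process.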


Another way to reduce resource consumption that we used at Airbnb is to
have a dedicated sensor queue served by machines that are provisioned more
aggressively (say 16 or 32 slots per CPU core), and to route the cheap
sensing tasks to those machines.
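A minimal sketch of that routing and of the capacity it buys, assuming each task can be tagged as a cheap sensor. In Airflow the routing itself would be expressed through the operator's `queue` argument, with workers listening on that queue; the `route` helper, the `is_cheap_sensor` flag and the 2-slots-per-core figure for the default pool are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_SLOTS_PER_CORE = 2   # hypothetical: a conventionally provisioned pool
SENSOR_SLOTS_PER_CORE = 16   # from the suggestion above (16 or 32 per core)


@dataclass
class Task:
    task_id: str
    is_cheap_sensor: bool  # hypothetical flag: mostly idle, cheap to check


def route(task: Task) -> str:
    # Cheap sensing tasks go to the aggressively provisioned queue.
    return "sensors" if task.is_cheap_sensor else "default"


def queue_capacity(machines: int, cores_per_machine: int, slots_per_core: int) -> int:
    # Total concurrent task slots offered by the workers serving one queue.
    return machines * cores_per_machine * slots_per_core
```

With two 4-core machines, the sensor queue offers 128 slots, while the same hardware provisioned like the default pool would offer 16.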

Max

On Thu, Jul 12, 2018 at 11:51 AM Pedro Machado <pe...@205datalab.com> wrote:

> Thanks, Ash, Alexander, and Stefan for your replies.
>
> I am relatively new to Airflow and not familiar with the code base. I like
> the idea of having a more efficient sensor.
>
> The async approach makes sense, but I don't know how well it would fit
> within the existing architecture.
>
> I like that Stefan's "reschedule" approach can fit the current architecture
> and could be implemented sooner. From the user's point of view, my only
> feedback is that the UI should not show sensors that are still running as
> failed or up for retry as that would draw attention to things that are
> running as expected. I'll add this comment to the JIRA issue.
>
> Thanks!
>
> Pedro
>
>
> On Tue, Jul 10, 2018 at 9:44 AM Stefan Seelmann <m...@stefan-seelmann.de> wrote:
>
> > I also have that requirement and I'm working on a proposal for
> > rescheduling tasks. My current PoC can be found at [1]; it uses the
> > up_for_retry state, which has some problems. I have started to make some
> > changes and hope to make a first proposal this week.
> >
> > The basic idea is:
> > * A new "reschedule" flag for sensors; if set to True, the sensor raises an
> > AirflowRescheduleException (with the new schedule date) that causes a
> > reschedule
> > * Reschedule requests are recorded in a new `task_reschedule` table and
> > visualized in the Gantt view.
> > * A new TI dependency that checks if a task is ready to be rescheduled
> >
> > Advantages:
> > * This change is backward compatible. Existing sensors behave as
> > before, but it's possible to set the "reschedule" flag.
> > * The timeout and poke_interval are still respected and used to
> > calculate the next schedule time
> > * Custom sensor implementations can even define the next sensible
> > schedule date.
> > * This mechanism can also be used by non-sensor operators
> >
> > Kind Regards,
> > Stefan
> >
> > [1] https://github.com/seelmann/incubator-airflow/tree/reschedule-sensor-3
> >
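The core of Stefan's reschedule idea can be sketched in plain Python. This is not the code from the PoC branch: `RescheduleRequested` is a hypothetical stand-in for the proposed AirflowRescheduleException, `SensorTimedOut` and `ready_to_run` are hypothetical, and the start of the first try is assumed to be tracked by the caller (for example from the recorded reschedule requests).

```python
from datetime import datetime, timedelta
from typing import Callable


class RescheduleRequested(Exception):
    """Hypothetical stand-in for the proposed AirflowRescheduleException."""

    def __init__(self, reschedule_date: datetime) -> None:
        super().__init__(f"reschedule at {reschedule_date.isoformat()}")
        self.reschedule_date = reschedule_date


class SensorTimedOut(Exception):
    """Hypothetical: raised once the overall timeout has elapsed."""


def run_once(
    poke: Callable[[], bool],
    first_try_start: datetime,
    now: datetime,
    poke_interval: timedelta,
    timeout: timedelta,
) -> None:
    """One short execution of a sensor in "reschedule" mode (sketch)."""
    if poke():
        return  # condition met: the task succeeds
    # The timeout is measured from the first try, not from this short run.
    if now - first_try_start >= timeout:
        raise SensorTimedOut(f"still waiting after {timeout}")
    # Otherwise give the slot back and ask to run again one poke_interval later.
    raise RescheduleRequested(now + poke_interval)


def ready_to_run(now: datetime, reschedule_date: datetime) -> bool:
    # Hypothetical dependency check: the scheduler skips the task until this is True.
    return now >= reschedule_date
```

Between runs the task holds no worker slot; the only state carried forward is the requested reschedule date, which is what a new TI dependency would check before queueing the task again.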
> > On 07/10/2018 04:05 PM, Pedro Machado wrote:
> > > I have a few DAGs that use time sensors to wait until data is ready,
> > > which can take several days.
> > >
> > > I have one daily DAG where, for each execution date, I have to re-pull
> > > the data for the next 7 days to capture changes (late-arriving revenue
> > > data). This DAG currently starts 7 TimeDeltaSensors for each execution
> > > date, with delays that range from 0 to 6 days.
> > >
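A rough count shows why this pattern adds up. Assuming each sensor holds a worker slot for roughly its delay (the current poke approach), each daily run contributes 0 + 1 + ... + 6 = 21 sensor-days of waiting, so once the DAG has been running for a while about 21 slots are busy at any moment just for this DAG. The `busy_slots` helper below is a hypothetical illustration of that count, not Airflow code.

```python
from typing import Sequence


def busy_slots(delays_days: Sequence[int], day: int) -> int:
    """Sensors still waiting at the start of `day` (days counted from the first run)."""
    # The run started on day r holds one slot per sensor whose delay d has not
    # yet elapsed, i.e. while r <= day < r + d.
    return sum(
        1
        for run_day in range(day + 1)
        for d in delays_days
        if run_day <= day < run_day + d
    )


delays = range(7)  # one sensor per delay of 0..6 days
print([busy_slots(delays, day) for day in range(9)])
# [6, 11, 15, 18, 20, 21, 21, 21, 21]
```

The steady-state value equals the sum of the delays, which is Little's law: arrival rate (7 sensors per day) times mean waiting time (3 days).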
> > > I was wondering what the recommendation is for cases like this where a
> > > large number of sensors is needed.
> > >
> > > Are there ways to reduce the footprint of these sensors so that they
> > > use less CPU and memory?
> > >
> > > I noticed that in one of the DAGs Germain Tanguy showed in the
> > > presentation he shared today, a sensor was set to time out after 30
> > > seconds but had a large retry count, so instead of running constantly,
> > > it runs every 15 minutes for 30 seconds and then dies.
> > >
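The timeout-and-retry pattern needs a retry count large enough to cover the longest expected wait. A small calculation, assuming the 15-minute spacing is the retry delay counted from the end of each 30-second attempt (in Airflow terms, the sensor's `timeout` and the operator's `retry_delay`, with `retries` as the count) and that every timeout leads to a retry; the helper names are hypothetical.

```python
import math
from datetime import timedelta


def retries_needed(max_wait: timedelta, attempt: timedelta, retry_delay: timedelta) -> int:
    """Retries so that some attempt is still running max_wait after the first start."""
    # Attempt i starts at i * (attempt + retry_delay) and runs for `attempt`,
    # so we need the smallest i with i * cycle + attempt >= max_wait.
    cycle = attempt + retry_delay
    remaining = max(max_wait - attempt, timedelta(0))
    return math.ceil(remaining / cycle)


def slot_fraction(attempt: timedelta, retry_delay: timedelta) -> float:
    """Share of wall-clock time during which the sensor holds a worker slot."""
    return attempt / (attempt + retry_delay)
```

For a 6-day wait with 30-second attempts and a 15-minute retry delay, this gives 558 retries, and the sensor holds a slot only about 3.2% of the time (30 s out of every 930 s).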
> > > Are other people using this pattern? Do you have other suggestions?
> > >
> > > Thanks,
> > >
> > > Pedro
> > >
> >
> >
>
