I'm concerned that we would be making the logic more complex, unless the new sensor 'pokeonce' case is just a high number of retries. There's also the other overhead, of course. Running the poke method inline wouldn't be great for perf either, since it's blocking I/O and would need to be handled async in order to not slow down scheduling.
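
To illustrate the async point, here is a rough sketch, not anything that exists in Airflow. It assumes pokes are plain callables that block on I/O; `slow_poke` and `scheduler_loop` are made-up names. The loop hands pokes to a thread pool and keeps doing its own (pretend) scheduling passes while they run:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_poke(delay, result):
    """Hypothetical stand-in for a sensor poke that blocks on I/O."""
    time.sleep(delay)
    return result


def scheduler_loop(pokes, max_ticks=200, tick=0.01):
    """Run pokes in a thread pool so the loop itself never blocks on them.

    `pokes` maps a task name to a zero-argument callable.
    Returns (results by task name, number of loop passes made).
    """
    done = {}
    ticks = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = {name: pool.submit(fn) for name, fn in pokes.items()}
        while pending and ticks < max_ticks:
            ticks += 1  # one pass of (pretend) scheduling work
            # Collect only the pokes that have already finished.
            for name in [n for n, f in pending.items() if f.done()]:
                done[name] = pending.pop(name).result()
            time.sleep(tick)
    return done, ticks


if __name__ == "__main__":
    results, passes = scheduler_loop({
        "wait_a": lambda: slow_poke(0.05, True),
        "wait_b": lambda: slow_poke(0.02, False),
    })
    print(results, passes)
```

Even in this toy version the scheduler has to track pending futures and cap the pool size, which is the kind of added complexity I'm worried about.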
FWIW, our current setup at Airbnb has a separate queue for sensors with a high number of slots per worker.

On Fri, Jul 28, 2017 at 11:14 AM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:

> Thought this was interesting to bubble up to the mailing list. From:
> https://github.com/apache/incubator-airflow/pull/2423#issuecomment-318723842
>
> This is about the issue around sensors utilizing a lot of worker slots. The
> context is a PR from @shaform introducing sensors that check once and give
> up their slot and get rescheduled for each sensing operation (as opposed to
> the current behavior of sleeping and poking while constantly using the slot
> until the criteria are met or the timeout is reached).
>
> ---------------
>
> *So this is legitimate, but shifts some of the burden of slot utilization
> towards other costs like task startup costs and more communication
> overhead. These costs may be preferable depending on the
> scenario/environment. Starting a task can have significant overhead
> depending on the size of the DAG and other factors that depend on the
> executor. Say for the upcoming Kubernetes executor, startup may include
> booting up a docker instance and doing a shallow clone of the repo.*
>
> *Since this is a major change, I would argue that we shouldn't change the
> current default since organizations have provisioned and stabilized their
> environments based on the current behavior. Default behavior could be
> changed when moving to 2.0, which isn't really planned or scheduled at the
> moment.*
>
> *Another idea around reducing the overall sensor slot utilization would be
> to move that burden towards the scheduler (let's call it the supervisor now
> since it does more than just scheduling at this point). My idea there was
> to add a flag to BaseSensorOperator that would tell the scheduler to run
> the poke method in line with the scheduling instead of using the executor.
> In that scenario, there's no startup cost and no communication overhead.
> The downside is that it can slow down the scheduler. This would be a great
> option where sensing is cheap and fast.*
>
> *That gives us potentially 3 sensor_modes, which I would argue should be
> implemented as a BaseOperator argument. Derivative classes can decide to
> expose the argument or force it. Administrators could also use
> the policy function to force a certain sensing mode in certain or all
> contexts in their environment.*
>
> Max
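
For reference, a minimal sketch of how the three sensing modes discussed in this thread could be modeled. Everything here is hypothetical: `SensorMode`, `ToySensor`, the mode names, and the `"sensors"` queue are illustrative stand-ins, not Airflow's actual API. `policy` only mimics the shape of an admin hook that receives a task and may override its settings.

```python
from enum import Enum


class SensorMode(Enum):
    """Hypothetical names for the three modes discussed in this thread."""
    POKE = "poke"              # hold the slot; sleep and poke until met or timeout
    RESCHEDULE = "reschedule"  # poke once, give up the slot, get rescheduled
    INLINE = "inline"          # the scheduler runs poke itself, no executor


class ToySensor:
    """Toy stand-in for a sensor operator that takes a sensor_mode argument."""

    def __init__(self, task_id, sensor_mode=SensorMode.POKE, queue="default"):
        self.task_id = task_id
        self.sensor_mode = sensor_mode  # default keeps the current behavior
        self.queue = queue


def policy(task):
    """Hypothetical admin hook: force a sensing mode in certain contexts."""
    if task.queue == "sensors":
        task.sensor_mode = SensorMode.RESCHEDULE
    return task


if __name__ == "__main__":
    forced = policy(ToySensor("wait_for_partition", queue="sensors"))
    untouched = policy(ToySensor("wait_for_file"))
    print(forced.sensor_mode, untouched.sensor_mode)
```

Keeping `POKE` as the default matches the argument above that environments provisioned around the current behavior should not change unless someone opts in.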