Thanks for bringing this up. I've added a comment on the Wiki: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization
Have you looked into the work by Seelmann? He recently introduced the
ability to reschedule sensors. When a sensor is rescheduled, its slot is
given back to the scheduler after each poke operation, so the slot is not
occupied the whole time. The details are in the PR:
https://github.com/apache/airflow/pull/3596

I would propose to make this the default behavior in Airflow 2.0.

Cheers, Fokko

On Wed, 6 Mar 2019 at 15:32, Yingbo Wang <ybw...@gmail.com> wrote:

> hi,
>
> I would like to open an AIP for Airflow sensor optimization.
>
> *Motivation*:
>
> Low efficiency in Airflow Sensor Implementation
>
> Sensors are a special kind of operator that will keep running until a
> certain criterion is met. Examples include a specific file landing in
> HDFS or S3, a partition appearing in Hive, or a specific time of the day.
> Sensors are derived from BaseSensorOperator and run a poke method at a
> specified poke_interval until it returns True.
>
> Sensor tasks are inefficient because, in the current design, we spawn a
> separate worker process for each partition sensor. This worker might last
> a long time, until the target partition is available. When many sensor
> tasks need to run within certain time limits, we have to allocate a lot
> of resources to have enough workers for them.
>
> *Idea:*
>
> We propose two approaches that could address this issue: batch-sensor
> and smart-sensor.
>
> Batch-sensor
>
> The basic idea of batch-sensor is to batch process sensor tasks to save
> resources. At run time, a batch-sensor takes N partition sensor requests
> as input and pokes those N partitions periodically. If the batch-sensor
> finds that the criterion of some sensor task is met, it updates the
> database for that sensor task.
>
> To do this, we need to create a sensor base class called ‘batchable’ and
> make all sensors inherit from this base class. We also need to change the
> behavior of the scheduler for batchable sensor tasks. The scheduler will
> find as many batchable sensor tasks as possible and run them in a batch.
>
> Smart-sensor
>
> Smart-sensor is an improvement on top of batch-sensor.
>
> The idea of smart-sensor is that the worker process of smart-sensor will
> run like a service. To do this, we need to persist sensor details in the
> Airflow DB. The worker process periodically queries the task instance
> table to find sensor tasks, pokes the metastore, and updates the task
> instance table when it detects that a certain partition or file has been
> created.
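
For illustration, the batch-sensor idea quoted above could be sketched
roughly as follows. This is plain Python, not Airflow code: SensorRequest,
run_batch_sensor, the status strings, and the dict standing in for the task
instance table are all hypothetical names, the partition data is made up,
and the sleep between rounds is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SensorRequest:
    """One sensor task waiting on a criterion (hypothetical structure)."""
    task_id: str
    poke: Callable[[], bool]  # returns True once the criterion is met


def run_batch_sensor(requests: List[SensorRequest],
                     state: Dict[str, str],
                     max_rounds: int) -> Dict[str, str]:
    """Poke all pending requests from one process, one pass per round.

    `state` stands in for the task instance table: task_id -> status.
    """
    pending = list(requests)
    for req in pending:
        state[req.task_id] = "running"

    for _ in range(max_rounds):
        if not pending:
            break
        still_waiting = []
        for req in pending:
            if req.poke():
                # Criterion met: record it and stop poking this task.
                state[req.task_id] = "success"
            else:
                still_waiting.append(req)
        pending = still_waiting
        # A real implementation would wait for poke_interval here.
    return state


# Made-up example: one partition has landed, the other has not.
landed = {"ds=2019-03-05"}
requests = [
    SensorRequest(task_id=p, poke=lambda p=p: p in landed)
    for p in ("ds=2019-03-05", "ds=2019-03-06")
]
state = run_batch_sensor(requests, {}, max_rounds=3)
```

The point of the sketch is only that a single process serves N sensor
requests, instead of one worker per sensor. The smart-sensor variant would
read the pending requests from the database on every round rather than
taking them as an argument.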