+1 (non-binding) for this AIP. I really like the concept and the efficiency improvements. The general SmartSensor concept and the ability to add more sensor classes are elegant.
From an implementation perspective, my one area of concern is the "sharding" concept and the configuration / management overhead involved. I may have missed it in the AIP, but would it be possible to add auto-scaling to minimize this configuration?

Also, a couple of clarifying questions:

1. Do you know if this is more suitable to certain kinds of sensors vs. others?
2. What do you think about leveraging this to enable "async" operations using Airflow i.e. submit a task and then use a "smart sensor" to check for completion?

Best regards,
Vikram

On Thu, Jun 18, 2020 at 3:38 PM Yingbo Wang <ybw...@gmail.com> wrote:

> Hello everyone!
>
> This email calls for a vote to add the airflow smart sensor at
> https://github.com/apache/airflow/pull/5499
>
> AIP:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17%3A+Consolidate+and+de-duplicate+sensor+tasks+in+airflow+Smart+Sensor
>
> Change summary:
>
> - Add a new mode called “smart sensor mode”. In smart sensor mode,
>   instead of holding a long running process for each sensor and poking
>   periodically, a sensor will only store poke context at sensor_instance
>   table and then exits with a ‘sensing’ state.
> - When the smart sensor mode is enabled, a special set of builtin smart
>   sensor DAGs (named smart_sensor_group_shard_xxx) is created by the
>   system; These DAGs contain SmartSensorOperator task and manage the
>   smart sensor jobs for the airflow cluster. The SmartSensorOperator
>   task can fetch hundreds of ‘sensing’ instances from sensor_instance
>   table and poke on behalf of them in batches. Users don’t need to
>   change their existing DAGs.
> - The smart sensor mode currently supports NamedHivePartitionSensor and
>   MetastorePartitionSensor however it can easily be extended to support
>   more sensor classes.
> - Smart sensor mode on/off, the list of smart sensor enabled classes,
>   and the number of SmartSensorOperator tasks can be configured in
>   airflow config.
> - Sensor logs in smart sensors are populated to each task instance log
>   UI.
>
> A PR https://github.com/apache/airflow/pull/5499 is ready for review from
> the committers and community.
>
> This email is formally calling for a vote to accept the AIP and PR. Please
> note that we will update the PR / feature to fix bugs if we find any.
>
> Best
> Yingbo
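
To make the batching and "sharding" idea in the change summary concrete, here is a minimal sketch in plain Python. It is not the code in the PR and does not use any Airflow API: the function names (shard_of, poke_in_batch), the tuple layout of a 'sensing' instance, and the choice of hashing the poke context to pick a shard are all assumptions made for illustration only.

```python
import hashlib
import json
from collections import defaultdict


def shard_of(poke_context, num_shards):
    """Hypothetical helper: map a poke context to a shard index.

    The context is serialized to canonical JSON so that equal contexts
    always hash to the same shard.
    """
    key = json.dumps(poke_context, sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def poke_in_batch(sensing, poke_fn, num_shards, shard):
    """Hypothetical helper: poke once per distinct context in one shard.

    `sensing` is a list of (task_id, poke_context) pairs, standing in for
    rows read from a sensor_instance-like table. Returns the task ids
    whose poke succeeded.
    """
    # Group task ids by identical poke context (the de-duplication step).
    groups = defaultdict(list)
    for task_id, context in sensing:
        if shard_of(context, num_shards) == shard:
            groups[json.dumps(context, sort_keys=True)].append(task_id)

    # One poke per distinct context, shared by every task waiting on it.
    done = []
    for key, task_ids in groups.items():
        if poke_fn(json.loads(key)):
            done.extend(task_ids)
    return done
```

In this sketch, two sensors waiting on the same partition cost a single poke, and each shard only handles the contexts that hash to it, so adding shards spreads the work without any task being poked by two shards.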