+1 (non-binding) for this AIP.

I really like the concept and the efficiency improvements. The general
SmartSensor concept and the ability to add additional sensor classes are
elegant.

From an implementation perspective, my one area of concern is the
"sharding" concept and the configuration / management overhead it involves.
I may have missed it in the AIP, but would it be possible to add auto-scaling
to minimize this configuration?

Also, a couple of clarifying questions:
1. Do you know if this is more suitable for certain kinds of sensors than
others?
2. What do you think about leveraging this to enable "async" operations
using Airflow, i.e., submit a task and then use a "smart sensor" to check
for completion?

Best regards,

Vikram




On Thu, Jun 18, 2020 at 3:38 PM Yingbo Wang <ybw...@gmail.com> wrote:

> Hello everyone!
>
> This email calls for a vote to add the Airflow smart sensor at
> https://github.com/apache/airflow/pull/5499
>
> AIP:
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17%3A+Consolidate+and+de-duplicate+sensor+tasks+in+airflow+Smart+Sensor
>
> Change summary:
>
>    - Add a new mode called “smart sensor mode”. In smart sensor mode,
>    instead of holding a long-running process for each sensor and poking
>    periodically, a sensor only stores its poke context in the
>    sensor_instance table and then exits with a ‘sensing’ state.
>    - When smart sensor mode is enabled, the system creates a special set of
>    built-in smart sensor DAGs (named smart_sensor_group_shard_xxx).
>    These DAGs contain SmartSensorOperator tasks and manage the smart sensor
>    jobs for the Airflow cluster. A SmartSensorOperator task can fetch
>    hundreds of ‘sensing’ instances from the sensor_instance table and poke
>    on their behalf in batches. Users don’t need to change their existing
>    DAGs.
>    - Smart sensor mode currently supports NamedHivePartitionSensor and
>    MetastorePartitionSensor; however, it can easily be extended to support
>    more sensor classes.
>    - Whether smart sensor mode is on, the list of smart-sensor-enabled
>    sensor classes, and the number of SmartSensorOperator tasks can all be
>    set in the Airflow config.
>    - Logs produced by smart sensors are propagated to each task instance's
>    log in the UI.
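
The mechanism in the change summary (sensor tasks record a poke context and exit, while a few long-running tasks poke on their behalf in sharded batches) can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not the PR's code: an in-memory list stands in for the sensor_instance table, and `SensorInstance`, `register`, `smart_sensor_pass`, the hash-modulo shard code and `SHARD_CODE_UPPER_LIMIT` are hypothetical names and design choices made only for illustration. De-duplication follows the AIP title ("Consolidate and de-duplicate sensor tasks").

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

SENSING = "sensing"
SUCCESS = "success"
SHARD_CODE_UPPER_LIMIT = 10000  # assumed size of the shard-code space


@dataclass
class SensorInstance:
    """Hypothetical in-memory stand-in for one sensor_instance row."""
    task_id: str
    poke_context: Dict[str, str]
    shard_code: int
    state: str = SENSING


def context_key(poke_context: Dict[str, str]) -> str:
    # Canonical form so identical contexts compare (and hash) equal.
    return json.dumps(poke_context, sort_keys=True)


def shard_code_for(poke_context: Dict[str, str]) -> int:
    # Stable across processes, unlike Python's salted built-in hash() on str.
    digest = hashlib.sha256(context_key(poke_context).encode()).hexdigest()
    return int(digest, 16) % SHARD_CODE_UPPER_LIMIT


def register(table: List[SensorInstance], task_id: str,
             poke_context: Dict[str, str]) -> SensorInstance:
    # Instead of poking itself, the sensor task only records its poke
    # context and leaves the row in the 'sensing' state.
    inst = SensorInstance(task_id, poke_context, shard_code_for(poke_context))
    table.append(inst)
    return inst


def smart_sensor_pass(table: List[SensorInstance], shard_index: int,
                      num_shards: int, poke: Callable[[Dict[str, str]], bool],
                      batch_size: int = 100) -> int:
    # Each smart sensor task owns a contiguous slice of the shard-code space;
    # together the slices cover [0, SHARD_CODE_UPPER_LIMIT) exactly once.
    lo = SHARD_CODE_UPPER_LIMIT * shard_index // num_shards
    hi = SHARD_CODE_UPPER_LIMIT * (shard_index + 1) // num_shards
    batch = [s for s in table
             if s.state == SENSING and lo <= s.shard_code < hi][:batch_size]
    # Poke once per distinct context and share the result (de-duplication).
    results: Dict[str, bool] = {}
    for inst in batch:
        key = context_key(inst.poke_context)
        if key not in results:
            results[key] = poke(inst.poke_context)
        if results[key]:
            inst.state = SUCCESS
    return len(results)  # number of real pokes issued


if __name__ == "__main__":
    table: List[SensorInstance] = []
    landed = {"db.events/ds=2020-06-18"}
    for i in range(3):
        register(table, f"wait_{i}", {"partition": "db.events/ds=2020-06-18"})
    register(table, "wait_late", {"partition": "db.events/ds=2020-06-19"})
    for shard in range(2):
        smart_sensor_pass(table, shard, 2, lambda ctx: ctx["partition"] in landed)
    for s in table:
        print(s.task_id, s.state)
```

Because identical poke contexts hash to the same shard code, duplicate sensors always land in the same shard, so the three `wait_*` tasks cost one poke rather than three. The shard count only changes how the code space is split among the smart sensor tasks.
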
>
>
> The PR https://github.com/apache/airflow/pull/5499 is ready for review by
> the committers and the community.
>
>
> This email is formally calling for a vote to accept the AIP and PR. Please
> note that we will update the PR / feature to fix bugs if we find any.
>
>
> Best
>
> Yingbo
>
