Also +1 (non-binding) on the AIP, but I have some questions on the implementation.

How would a user enable their own smart sensors? I don’t see any added 
documentation for this. It looks like they need to manually add the name of the 
class to the airflow configuration and do *something* to their sensor class, 
including overriding the "is_smart_sensor" method (why a method and not an 
attribute?).

Having to enable it in multiple places seems a little cumbersome. Why not have 
a "BaseSmartSensor" that the user inherits from, like most of the rest of 
Airflow? Sensors inherited from BaseSmartSensor would be "Smart" when smart 
sensors are enabled in the configuration and not smart when smart sensors are 
not enabled.
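
Roughly what I have in mind, as a sketch only. Every name below (BaseSensor, 
BaseSmartSensor, smart_capable, runs_as_smart_sensor) is made up for 
illustration and is not taken from the PR; BaseSensor is just a stand-in for 
Airflow's real sensor base class.

```python
# Hypothetical sketch: these classes are NOT the ones in the PR.


class BaseSensor:
    """Stand-in for Airflow's sensor base class (assumption)."""

    # A plain class attribute instead of an overridable method.
    smart_capable = False

    def poke(self, context):
        raise NotImplementedError

    def runs_as_smart_sensor(self, smart_mode_enabled):
        # "Smart" only if the global switch is on AND the class opted in.
        return bool(smart_mode_enabled and self.smart_capable)


class BaseSmartSensor(BaseSensor):
    """Users opt in by inheriting from this class, nothing else."""

    smart_capable = True


class MyPartitionSensor(BaseSmartSensor):
    """A user-defined sensor that becomes smart via inheritance."""

    def poke(self, context):
        return True


class MyPlainSensor(BaseSensor):
    """A user-defined sensor that never runs in smart sensor mode."""

    def poke(self, context):
        return True
```

With this shape the only switch would be the on/off setting in the 
configuration: MyPartitionSensor runs as a regular sensor when smart sensors 
are disabled, and no per-class list has to be maintained.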

Damian

-----Original Message-----
From: Vikram Koka <vik...@astronomer.io> 
Sent: Friday, June 19, 2020 00:57
To: dev@airflow.apache.org
Subject: Re: [VOTE] AIP-17: Consolidate and de-duplicate sensor tasks in 
airflow Smart Sensor

+1 (non-binding) for this AIP.

I really like the concept and the efficiency improvements. The general 
SmartSensor concept and the ability to add additional sensor classes is elegant.

From an implementation perspective, my one area of concern is the "sharding" 
concept and the configuration / management overhead involved. I may have 
missed it in the AIP, but would it be possible to add auto-scaling to minimize 
this configuration?

Also, a couple of clarifying questions:
1. Do you know if this is more suitable for certain kinds of sensors than others?
2. What do you think about leveraging this to enable "async" operations using 
Airflow i.e. submit a task and then use a "smart sensor" to check for 
completion?

Best regards,

Vikram




On Thu, Jun 18, 2020 at 3:38 PM Yingbo Wang <ybw...@gmail.com> wrote:

> Hello everyone!
>
> This email calls for a vote to add the airflow smart sensor at
> https://github.com/apache/airflow/pull/5499
>
> AIP:
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17%3A+Consolidate+and+de-duplicate+sensor+tasks+in+airflow+Smart+Sensor
>
> Change summary:
>
>    - Add a new mode called “smart sensor mode”. In smart sensor mode,
>    instead of holding a long running process for each sensor and poking
>    periodically, a sensor only stores its poke context in the sensor_instance
>    table and then exits with a ‘sensing’ state.
>    - When the smart sensor mode is enabled, a special set of builtin smart
>    sensor DAGs (named smart_sensor_group_shard_xxx) is created by the
>    system. These DAGs contain SmartSensorOperator tasks and manage the smart
>    sensor jobs for the airflow cluster. The SmartSensorOperator task can fetch
>    hundreds of ‘sensing’ instances from the sensor_instance table and poke on
>    behalf of them in batches. Users don’t need to change their existing DAGs.
>    - The smart sensor mode currently supports NamedHivePartitionSensor and
>    MetastorePartitionSensor; however, it can easily be extended to support
>    more sensor classes.
>    - Smart sensor mode on/off, the list of smart sensor enabled classes,
>    and the number of SmartSensorOperator tasks can be configured in airflow
>    config.
>    - Sensor logs in smart sensors are populated to each task instance log
>    UI.
>
>
> A PR https://github.com/apache/airflow/pull/5499 is ready for review 
> from the committers and community.
>
>
> This email is formally calling for a vote to accept the AIP and PR. 
> Please note that we will update the PR / feature to fix bugs if we find any.
>
>
> Best
>
> Yingbo
>


