Thank you everyone! We'll keep the vote open until 2020-07-27T00:00Z.

On Thu, Jul 23, 2020 at 4:16 AM Tomasz Urbaszek <turbas...@apache.org>
wrote:

> +1 binding
>
> On Wed, Jul 22, 2020 at 4:57 PM Arthur Wiedmer <arthur.wied...@gmail.com>
> wrote:
>
> > +1 (binding)
> >
> > On Tue, Jul 21, 2020 at 11:12 PM Kevin Yang <yrql...@gmail.com> wrote:
> >
> > > Thank you everyone for the feedback! I think the PR covered what is
> being
> > > proposed in the AIP and does not have big controversial change that
> > cannot
> > > be revised easily. The feature itself has also been baked in Airbnb's
> > large
> > > Airflow setup for 6+ months. So consider this my binding +1.
> > >
> > > I would love to call for more votes from the community, esp. from the
> > > committers. As mentioned already above by Vikram, Kaxil and Jarek, this
> > > change would lay a good foundation for further work on
> onboarding/extend
> > to
> > > other operators.
> > >
> > > Release wise, what would be the implication of planning it for 2.1? If
> > that
> > > doesn't mean we block the PR from merging until 2.0 is ready then I'm
> > > totally fine.
> > >
> > >
> > > Cheers,
> > > Kevin Y
> > >
> > > On Mon, Jul 20, 2020 at 9:24 PM Yingbo Wang <ybw...@gmail.com> wrote:
> > >
> > > > It would be nice to include DAG serialization in the smart sensor
> > however
> > > > we may want to put it in a follow up PR. Cong Zhu has explained why
> it
> > > > would need more investigation to put the DAG serialization in a
> > separate
> > > PR
> > > > here
> > https://github.com/apache/airflow/pull/5499#issuecomment-657828649.
> > > >
> > > > In my opinion, the current PR is already too big and it is becoming
> > > harder
> > > > and harder for us to make the branch tracking with the master branch.
> > At
> > > > the moment, I think it makes more sense to merge this change with the
> > > > master before we make additional incremental improvement.
> > > >
> > > > Yingbo
> > > >
> > > >
> > > > On Sun, Jun 21, 2020 at 1:57 PM Jarek Potiuk <
> jarek.pot...@polidea.com
> > >
> > > > wrote:
> > > >
> > > > > I also llke the idea - it is really nice optimisation. As Kaxil
> > > > mentioned -
> > > > > it would be great to include Dag Serialisation.
> > > > >
> > > > > I think for some of the other operators that just send query and
> poll
> > > > > execution - I think once this one is implemented for sensors, we
> can
> > > > think
> > > > > about splitting some of the "operators" to support it. There was
> > > already
> > > > a
> > > > > PokeReschedule discussion
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/rc6f56234342c87f154865489e3a6555609e4b98a8c62ca4997cb6a6c%40%3Cdev.airflow.apache.org%3E
> > > > > which
> > > > > i think might benefit from this architecture - but I'd say - let's
> > > > > implement it for sensors first and then we can think about
> > implementing
> > > > it
> > > > > also for operators.
> > > > >
> > > > > One thing though - is it something that we could plan for 2.1 ? I
> > think
> > > > our
> > > > > 2.0 release gets some important features that focus on delivering
> > most
> > > > > value for Airflow users. And it keeps on slipping. I think
> > implementing
> > > > > such a change will keep some of the committers distracted (even if
> it
> > > is
> > > > > already at a PR stage). Maybe we can start prioritising new
> features
> > to
> > > > 2.1
> > > > > by default unless we have very good reason to implement it in 2.0?
> > > > >
> > > > > J.
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Jun 20, 2020 at 5:37 PM Kaxil Naik <kaxiln...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Yingbo,
> > > > > >
> > > > > > I like the concept described in the AIP. I was wondering if we
> > could
> > > > > > leverage Dag Serialization (
> > > > > > https://airflow.readthedocs.io/en/latest/dag-serialization.html)
> > to
> > > > get
> > > > > > the
> > > > > > "task level fields" without re-parsing the DAGs or storing it in
> > the
> > > > new
> > > > > > table.
> > > > > >
> > > > > > And can we use some of the operators like the BigQueryOperator,
> > > > > > SparkOperator which just submits the SQL query and polls until
> > > > > completion?
> > > > > >
> > > > > > Regards,
> > > > > > Kaxil
> > > > > >
> > > > > > On Sat, Jun 20, 2020 at 8:26 AM Yingbo Wang <ybw...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Thanks everyone for the feedback. I will also add the details
> > > > mentioned
> > > > > > in
> > > > > > > this thread into the AIP
> > > > > > >
> > > > > > >
> > > > > > > Q: From an implementation perspective, my one area of concern
> is
> > > the
> > > > > > >
> > > > > > > "sharding" concept and the configuration / management overhead
> > > > > involved.
> > > > > > I
> > > > > > >
> > > > > > > may have missed it in the AIP, but would it be possible to add
> > > > > > auto-scaling
> > > > > > >
> > > > > > > to minimize this configuration?
> > > > > > >
> > > > > > > The “sharding” configuration is an integer which implies the
> > number
> > > > of
> > > > > > > concurrently running smart sensor jobs for the whole airflow
> > > > cluster. A
> > > > > > > proper sharding setting mainly depends on the following issues:
> > 1.
> > > > > > Cluster
> > > > > > > load -- how many sensor tasks need to be executed at the same
> > time.
> > > > 2.
> > > > > > How
> > > > > > > often should each sensor be poked at least once. 3. The
> response
> > > time
> > > > > > for a
> > > > > > > sensor task in the current system. As these answers may vary
> for
> > > > > > different
> > > > > > > systems we leave “sharding” as a configurable field for users
> to
> > > > > satisfy
> > > > > > > different use cases.
> > > > > > >
> > > > > > > Also, a couple of clarifying questions:
> > > > > > >
> > > > > > > 1. Do you know if this is more suitable to certain kinds of
> > sensors
> > > > vs.
> > > > > > >
> > > > > > > Others?
> > > > > > >
> > > > > > > Most sensors should be suitable for the smart sensor. Except if
> > the
> > > > > > > argument needed to initialize a sensor object is
> unserializable,
> > > > e.g. a
> > > > > > > function. Serialize more complex types other than builtin and
> > > > datetime
> > > > > is
> > > > > > > not supported right now but we are planning to add them in the
> > > > future.
> > > > > > >
> > > > > > > 2. What do you think about leveraging this to enable "async"
> > > > operations
> > > > > > >
> > > > > > > using Airflow i.e. submit a task and then use a "smart sensor"
> to
> > > > check
> > > > > > for
> > > > > > >
> > > > > > > Completion?
> > > > > > >
> > > > > > > This is a very good point. We do notice the relationship
> between
> > > > these
> > > > > > two
> > > > > > > ideas. Technically this logic should also work. The “task
> > > submission”
> > > > > map
> > > > > > > to the pre_execute() in a sensor task logic and “check for
> > > > completion”
> > > > > > map
> > > > > > > to the sensor’s poke() function. The current implementation of
> > > > > > > SubDagOperator
> > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/master/airflow/operators/subdag_operator.py#L144-L177
> > > > > > > >
> > > > > > > actually follows this pattern. If the operator requires no
> > > > > unserializable
> > > > > > > argument to be instantiated, we should already be able to
> > leverage
> > > > the
> > > > > > > async operation in SmartSensor for it.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Q: How would a user enable their own smart sensors? I don’t see
> > any
> > > > > added
> > > > > > > documentation for this. It looks like they need to manually add
> > the
> > > > > name
> > > > > > of
> > > > > > > the class to the airflow configuration and do *something* to
> > their
> > > > > sensor
> > > > > > > class, including override the "is_smart_sensor" method (why a
> > > method
> > > > > and
> > > > > > > not an attribute?)
> > > > > > >
> > > > > > > Having to enable it in multiple places seems a little
> cumbersome,
> > > why
> > > > > not
> > > > > > > have a "BaseSmartSensor" that the user inherits from like most
> of
> > > the
> > > > > > rest
> > > > > > > of Airflow? Sensors inherited from BaseSmartSensor would be
> > "Smart"
> > > > > when
> > > > > > > smart sensors are enabled in the configuration and not smart
> when
> > > > smart
> > > > > > > sensors are not enabled.
> > > > > > >
> > > > > > > Enabling/Disabling the smart sensor is a system level config
> > which
> > > is
> > > > > > > transparent to the individual users. An example of smart sensor
> > > > enabled
> > > > > > > cluster config is as follows:
> > > > > > >
> > > > > > > [smart_sensor]
> > > > > > >
> > > > > > > use_smart_sensor = true
> > > > > > >
> > > > > > > shard_code_upper_limit = 10000
> > > > > > >
> > > > > > > shards = 5
> > > > > > >
> > > > > > > sensor_enabled = NamedHivePartitionSensor,
> > MetastorePartitionSensor
> > > > > > >
> > > > > > >
> > > > > > > The "use_smart_sensor" config indicates if the smart sensor is
> > > > enabled.
> > > > > > The
> > > > > > > "shards" config indicates the number of concurrently running
> > smart
> > > > > sensor
> > > > > > > jobs for the airflow cluster. The "sensor_enabled" config is a
> > list
> > > > of
> > > > > > > sensor class names that will use the smart sensor.  The users
> use
> > > the
> > > > > > same
> > > > > > > class names (e.g. HivePartitionSensor) in their DAGs and they
> > don’t
> > > > > have
> > > > > > > the control to use smart sensors or not, unless they exclude
> > their
> > > > > tasks
> > > > > > > explicits.
> > > > > > >
> > > > > > >
> > > > > > > Existing DAGs don't need to be changed for enabling/disabling
> the
> > > > smart
> > > > > > > sensor.
> > > > > > >
> > > > > > >
> > > > > > > “Is_smart_sensor_compatible” is a class level configuration
> > > (instead
> > > > of
> > > > > > > instance-level) so that the system knows if a particular sensor
> > > > > operator
> > > > > > > can use the smart sensor. Currently only
> NamedHivePartitionSensor
> > > and
> > > > > > > MetastorePartitionSensor
> > > > > > > are enabled to use the smart sensor in the PR.
> > > > > > >
> > > > > > > To include other sensor operators for smart sensors that are
> not
> > > > > included
> > > > > > > in this PR:
> > > > > > >
> > > > > > >    1.
> > > > > > >
> > > > > > >    Add a class attribute "poke_context_fields" to the operator.
> > > > > > >    "poke_context_fields" include all key names used for
> > > initializing
> > > > a
> > > > > > > sensor
> > > > > > >    object.
> > > > > > >    2.
> > > > > > >
> > > > > > >    In airflow.cfg, add the operator’s classname to the session
> of
> > > > > > >    “[smart_sensor]” with the field “sensors_enabled” as
> follows.
> > > > > > >
> > > > > > >
> > > > > > > Yingbo
> > > > > > >
> > > > > > > On Fri, Jun 19, 2020 at 7:27 AM Shaw, Damian P. <
> > > > > > > damian.sha...@credit-suisse.com> wrote:
> > > > > > >
> > > > > > > > Also +1 (non-binding) on the AIP but questions on the
> > > > implementation.
> > > > > > > >
> > > > > > > > How would a user enable their own smart sensors? I don’t see
> > any
> > > > > added
> > > > > > > > documentation for this. It looks like they need to manually
> add
> > > the
> > > > > > name
> > > > > > > of
> > > > > > > > the class to the airflow configuration and do *something* to
> > > their
> > > > > > sensor
> > > > > > > > class, including override the "is_smart_sensor" method (why a
> > > > method
> > > > > > and
> > > > > > > > not an attribute?)
> > > > > > > >
> > > > > > > > Having to enable it in multiple places seems a little
> > cumbersome,
> > > > why
> > > > > > not
> > > > > > > > have a "BaseSmartSensor" that the user inherits from like
> most
> > of
> > > > the
> > > > > > > rest
> > > > > > > > of Airflow? Sensors inherited from BaseSmartSensor would be
> > > "Smart"
> > > > > > when
> > > > > > > > smart sensors are enabled in the configuration and not smart
> > when
> > > > > smart
> > > > > > > > sensors are not enaled.
> > > > > > > >
> > > > > > > > Damian
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Vikram Koka <vik...@astronomer.io>
> > > > > > > > Sent: Friday, June 19, 2020 00:57
> > > > > > > > To: dev@airflow.apache.org
> > > > > > > > Subject: Re: [VOTE] AIP-17: Consolidate and de-duplicate
> sensor
> > > > tasks
> > > > > > in
> > > > > > > > airflow Smart Sensor
> > > > > > > >
> > > > > > > > +1 (non-binding) for this AIP.
> > > > > > > >
> > > > > > > > I really like the concept and the efficiency improvements.
> The
> > > > > general
> > > > > > > > SmartSensor concept and the ability to add additional sensor
> > > > classes
> > > > > is
> > > > > > > > elegant.
> > > > > > > >
> > > > > > > > From an implementation perspective, my one area of concern is
> > the
> > > > > > > > "sharding" concept and the configuration / management
> overhead
> > > > > > involved.
> > > > > > > I
> > > > > > > > may have missed it in the AIP, but would it be possible to
> add
> > > > > > > auto-scaling
> > > > > > > > to minimize this configuration?
> > > > > > > >
> > > > > > > > Also, a couple of clarifying questions:
> > > > > > > > 1. Do you know if this is more suitable to certain kinds of
> > > sensors
> > > > > vs.
> > > > > > > > others?
> > > > > > > > 2. What do you think about leveraging this to enable "async"
> > > > > operations
> > > > > > > > using Airflow i.e. submit a task and then use a "smart
> sensor"
> > to
> > > > > check
> > > > > > > for
> > > > > > > > completion?
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Vikram
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 18, 2020 at 3:38 PM Yingbo Wang <
> ybw...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello everyone!
> > > > > > > > >
> > > > > > > > > This email calls for a vote to add the airflow smart sensor
> > at
> > > > > > > > > https://github.com/apache/airflow/pull/5499
> > > > > > > > >
> > > > > > > > > AIP:
> > > > > > > > >
> > > > > > > > >
> > > > > >
> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17%3A+Consolid
> > > > > > > > > ate+and+de-duplicate+sensor+tasks+in+airflow+Smart+Sensor
> > > > > > > > >
> > > > > > > > > Change summary:
> > > > > > > > >
> > > > > > > > >    - Add a new mode called “smart sensor mode”. In smart
> > sensor
> > > > > mode,
> > > > > > > > >    instead of holding a long running process for each
> sensor
> > > and
> > > > > > poking
> > > > > > > > >    periodically, a sensor will only store poke context at
> > > > > > > sensor_instance
> > > > > > > > >    table and then exits with a ‘sensing’ state.
> > > > > > > > >    - When the smart sensor mode is enabled, a special set
> of
> > > > > builtin
> > > > > > > > smart
> > > > > > > > >    sensor DAGs (named smart_sensor_group_shard_xxx) is
> > created
> > > by
> > > > > the
> > > > > > > > > system;
> > > > > > > > >    These DAGs contain SmartSensorOperator task and manage
> the
> > > > smart
> > > > > > > > sensor
> > > > > > > > >    jobs for the airflow cluster. The SmartSensorOperator
> task
> > > can
> > > > > > fetch
> > > > > > > > >    hundreds of ‘sensing’ instances from sensor_instance
> table
> > > and
> > > > > > poke
> > > > > > > on
> > > > > > > > >    behalf of them in batches. Users don’t need to change
> > their
> > > > > > > > > existing DAGs.
> > > > > > > > >    - The smart sensor mode currently supports
> > > > > > NamedHivePartitionSensor
> > > > > > > > and
> > > > > > > > >    MetastorePartitionSensor however it can easily be
> extended
> > > to
> > > > > > > > > support more
> > > > > > > > >    sensor classes.
> > > > > > > > >    - Smart sensor mode on/off, the list of smart sensor
> > enabled
> > > > > > > classes,
> > > > > > > > >    and the number of SmartSensorOperator tasks can be
> > > configured
> > > > in
> > > > > > > > airflow
> > > > > > > > >    config.
> > > > > > > > >    - Sensor logs in smart sensors are populated to each
> task
> > > > > instance
> > > > > > > log
> > > > > > > > >    UI.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > A PR https://github.com/apache/airflow/pull/5499 is ready
> > for
> > > > > review
> > > > > > > > > from the committers and community.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This email is formally calling for a vote to accept the AIP
> > and
> > > > PR.
> > > > > > > > > Please note that we will update the PR / feature to fix
> bugs
> > if
> > > > we
> > > > > > find
> > > > > > > > any.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > >
> > > > > > > > > Yingbo
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ===============================================================================
> > > > > > > >
> > > > > > > > Please access the attached hyperlink for an important
> > electronic
> > > > > > > > communications disclaimer:
> > > > > > > >
> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ===============================================================================
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jarek Potiuk
> > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >
> > > > > M: +48 660 796 129 <+48660796129>
> > > > > [image: Polidea] <https://www.polidea.com/>
> > > > >
> > > >
> > >
> >
>

Reply via email to