Thanks for the update, Vincent!

I also mentioned earlier the idea of leveraging patterns for the 80/20
use cases in the poll model.
However, after seeing how you have defined the syntax for "Asset watchers",
I believe that's already covered in your proposal.
So, all good from my perspective.

Vikram


On Fri, Aug 2, 2024 at 12:29 PM Vincent Beck <vincb...@apache.org> wrote:

> Hello everyone,
>
> After multiple conversations with different people, I decided to move
> push-based event scheduling out of the scope of this AIP. I could not
> find a satisfactory solution for the push-based approach, and I think
> more investigation and data from Airflow users are needed before moving
> forward with it. In order not to block poll-based event scheduling, which
> did not receive any concerns, I decided to focus AIP-82 on poll-based
> event scheduling. I updated the AIP accordingly.
>
> Since the poll-based approach did not receive any concerns (and a lot of
> positive feedback), I'll start a vote in another thread.
>
> Thanks for your feedback!
>
> Vincent
>
> On 2024/08/02 00:50:35 Pavankumar Gopidesu wrote:
> > Thanks Vincent and Kaxil. Agreed, it can be added as future work.
> >
> > Looking at the discussion on event mapping: today we know the source
> > systems and we are aware of their schemas (S3, SQS, Eventarc, etc.).
> >
> > However, as new services emerge and their schemas evolve, this may
> > change.
> >
> > I agree with Kaxil that transformers are very effective at converting
> > data into the required format before sending it to target systems.
> >
> > Would it be a good idea for Airflow to define and enforce a schema for
> > incoming events? There are some open standards, such as CloudEvents [1],
> > that we could consider.
> >
> > This approach would allow us to maintain versions (v1, v2, ... vn) as we
> > evolve, providing utilities that users can easily plug into to send
> > events to Airflow.
> >
> > It would also simplify maintenance within Airflow when handling incoming
> > events, because we would get a known version of the schema in each user
> > request.
> >
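To make the CloudEvents idea concrete, here is a minimal Python sketch of how an incoming CloudEvents 1.0 envelope could be validated and mapped to a request body for Airflow's create dataset event endpoint. It assumes the body fields `dataset_uri` and `extra` (check the REST API reference of your Airflow version) and uses a hypothetical convention where the producer puts the asset URI in the CloudEvents `subject` attribute; `to_asset_event` is an illustrative name, not an existing Airflow API.

```python
from typing import Any

# Required context attributes in the CloudEvents 1.0 specification.
REQUIRED_ATTRIBUTES = ("id", "source", "specversion", "type")
# Envelope versions this (hypothetical) receiver understands.
SUPPORTED_SPEC_VERSIONS = {"1.0"}


def to_asset_event(cloud_event: dict[str, Any]) -> dict[str, Any]:
    """Map a CloudEvents-style dict to a create dataset event request body.

    Hypothetical convention: the asset URI travels in the optional
    CloudEvents "subject" attribute.
    """
    missing = [name for name in REQUIRED_ATTRIBUTES if not cloud_event.get(name)]
    if missing:
        raise ValueError(f"missing CloudEvents attributes: {', '.join(missing)}")
    if cloud_event["specversion"] not in SUPPORTED_SPEC_VERSIONS:
        raise ValueError(f"unsupported specversion: {cloud_event['specversion']}")
    subject = cloud_event.get("subject")
    if not subject:
        raise ValueError("expected the asset URI in the 'subject' attribute")
    return {
        # Assumed field names of the create dataset event endpoint body.
        "dataset_uri": subject,
        "extra": {
            "ce_id": cloud_event["id"],
            "ce_type": cloud_event["type"],
            "ce_source": cloud_event["source"],
            "data": cloud_event.get("data"),
        },
    }
```

Because every envelope carries `specversion`, a receiver can reject or route unknown versions explicitly instead of guessing the payload shape, which is the kind of versioning suggested above.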
> >
> > If this is already being addressed, please disregard my comments.
> >
> >
> > [1]: https://github.com/cloudevents/spec
> >
> >
> > Regards,
> >
> > Pavan
> >
> > On Fri, Aug 2, 2024 at 1:40 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Yes, but the big difference is that you will create a single user for
> > > EventBridge, since that is the one sending requests to Airflow, a single
> > > user for Eventarc on GCP, and one user for every other event listener or
> > > application, as compared to one user per type of payload (since Airflow
> > > will need to understand the payload of the original source). In that
> > > case, you will have one user/function mapping for S3, one user/function
> > > mapping for Redshift, and so on. The former approach is also consistent
> > > with our Connection model, where we have one standardized AWS and GCP
> > > connection that works for most, if not all, services.
> > >
> > >
> > >
> > > On Fri, 2 Aug 2024 at 01:22, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > I am all for it. If we want to stick to EventBridge or similar and
> > > > recommend it to our users, it's perfectly fine with me. It would be
> > > > great, however, to add documentation explaining the steps, with some
> > > > examples, ideally for most of our providers and the "standard" ways of
> > > > triggering such an event. This is actually what I proposed originally
> > > > when the first version of the document was created (to just document
> > > > how to map the events externally).
> > > >
> > > > BTW, yes: in this case you need to implement the logic in EventBridge,
> > > > but the user will have to be added anyway (some kind of service
> > > > account, because the API needs to be authorized; that part has not
> > > > changed). Unless, of course, you want to use the same user for all
> > > > kinds of external interfaces, which from a security point of view is a
> > > > very bad idea: each external system should have its own "service
> > > > account". That's the best practice from the security point of view.
> > > >
> > > > J,
> > > >
> > > > On Fri, Aug 2, 2024 at 2:01 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > >
> > > > > I was discussing this with Vincent. In either case, whether with what
> > > > > we have now or with what is proposed in the AIP, users will have to
> > > > > use something like AWS EventBridge [1] or GCP Eventarc [2] to consume
> > > > > the event from object storage (S3 object creation, for example), and
> > > > > then add Airflow's Create Dataset Event endpoint to EventBridge [3].
> > > > > If you just customize the payload to build the URI, which EventBridge
> > > > > allows (via either GET or POST), it works right now. However, with the
> > > > > current proposal, a user will have to create a new user in Airflow
> > > > > and a mapping to a function (either in the provider or a new
> > > > > user-defined function) that can understand this specific payload, in
> > > > > this example the payload of S3 events. This will become huge, because
> > > > > it means that for each payload we will have to provide a new function
> > > > > and keep it updated. From the users' point of view, they will need to
> > > > > create a new user for every new service (S3, Redshift, SNS, Bedrock,
> > > > > etc.), which will again likely have to go to the auth manager backend.
> > > > > Compare that with what's available today, i.e. building a URI and
> > > > > extra metadata, which works not only with EventBridge or Eventarc but
> > > > > with any service.
> > > > >
> > > > > Since we already have to use things like EventBridge or Eventarc for
> > > > > managed service providers to transform the event, this fits well with
> > > > > the existing approach. The AWS blog [3] even has a similar example for
> > > > > Datadog, where they use the input transformer "{"detail":"$.detail"}"
> > > > > before sending the event to Datadog's API.
> > > > > > "having the producer of an event generate events in its standard
> > > > > > way, and make it easy for Airflow to consume them as dataset events
> > > > > > without external entities"
> > > > >
> > > > >
> > > > > [1]: https://aws.amazon.com/eventbridge/ | https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventBridge.html
> > > > > [2]: https://cloud.google.com/eventarc/docs
> > > > > [3]: https://aws.amazon.com/blogs/compute/using-api-destinations-with-amazon-eventbridge/
> > > > >
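A rough Python illustration of the payload customization described above. The `INPUT_TRANSFORMER` dict has the shape of the `InputTransformer` field of an EventBridge target (`InputPathsMap` plus `InputTemplate`), and `render` is a simplified local emulation of the substitution EventBridge performs, useful for dry-running the mapping against a sample S3 "Object Created" event. The Airflow body fields (`dataset_uri`, `extra`) are assumptions about the create dataset event endpoint; unlike the real service, `render` only supports dotted `$.a.b` paths and does no JSON escaping.

```python
import json
from typing import Any

# Shape of an EventBridge target's InputTransformer. The template produces
# the (assumed) body of Airflow's create dataset event endpoint.
INPUT_TRANSFORMER = {
    "InputPathsMap": {
        "bucket": "$.detail.bucket.name",
        "key": "$.detail.object.key",
        "reason": "$.detail.reason",
    },
    "InputTemplate": (
        '{"dataset_uri": "s3://<bucket>/<key>", "extra": {"reason": "<reason>"}}'
    ),
}


def _resolve(event: dict[str, Any], path: str) -> Any:
    """Resolve a simple dotted JSONPath such as "$.detail.bucket.name"."""
    node: Any = event
    for part in path.removeprefix("$.").split("."):
        node = node[part]
    return node


def render(
    event: dict[str, Any], transformer: dict[str, Any] = INPUT_TRANSFORMER
) -> dict[str, Any]:
    """Emulate EventBridge's placeholder substitution locally (simplified)."""
    rendered = transformer["InputTemplate"]
    for name, path in transformer["InputPathsMap"].items():
        rendered = rendered.replace(f"<{name}>", str(_resolve(event, path)))
    return json.loads(rendered)
```

In a real deployment, the same dict would be attached to the rule's target (for example through boto3's `events.put_targets`), with an API destination pointing at the Airflow endpoint, as in the blog post referenced above.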
> > > > > On Fri, 2 Aug 2024 at 00:14, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >
> > > > > > I proposed the mapping because it's the easiest way (I think) to map
> > > > > > between the "native" source and the "Airflow" target expectations.
> > > > > > There are many producers of such events, and Airflow is the
> > > > > > consumer. It seems appropriate to have a way for our users to easily
> > > > > > plug events produced by one system into our "events" API without
> > > > > > having to employ an external "mapper" (say, a Lambda) doing the
> > > > > > conversion. While I think it is indeed "a bit odd", it's a solution
> > > > > > that might leverage most of what we have: authorization and API
> > > > > > exposure via a "user" in the API.
> > > > > >
> > > > > > While I myself find it a bit unusual, I think it might do the job.
> > > > > > But I wonder if there is any alternative solution to the problem of
> > > > > > "having the producer of an event generate events in its standard
> > > > > > way, and make it easy for Airflow to consume them as dataset events
> > > > > > without external entities".
> > > > > >
> > > > > > On Fri, Aug 2, 2024 at 12:57 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > >
> > > > > > > I would love for the VOTE to get started on this one. I think most
> > > > > > > of the commenters and those who replied to this email are happy
> > > > > > > with the proposal for the poll-based approach.
> > > > > > >
> > > > > > > Regarding the push-based approach, I am not convinced that the
> > > > > > > proposed implementation has any gains over what's already
> > > > > > > available with the Dataset Event Create API; the
> > > > > > > one-user-to-one-function mapping is an odd user experience. I'm
> > > > > > > curious to hear what others think.
> > > > > > >
> > > > > > > On Thu, 1 Aug 2024 at 17:39, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I agree with both of you that it is indeed a good idea and that
> > > > > > > > it can be added as future work; it doesn't need to be part of
> > > > > > > > this AIP.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, 1 Aug 2024 at 16:11, Vincent Beck <vincb...@apache.org> wrote:
> > > > > > > >
> > > > > > > >> Hey Pavan,
> > > > > > > >>
> > > > > > > >> Thanks for your interest. I was not aware of such a feature, and it
> > > > > > > >> looks really cool! I definitely think it can be useful for Airflow,
> > > > > > > >> especially for testing, where you can easily replay events received
> > > > > > > >> in the past. However, I do not think it should be part of the AIP
> > > > > > > >> and, as you mentioned, it should be future work or a follow-up item
> > > > > > > >> of the AIP. Please let me know if you (or anyone) disagree with this
> > > > > > > >> and we can talk about it. Otherwise, I'll update the future work
> > > > > > > >> section of the AIP and mention this archive and replay feature.
> > > > > > > >>
> > > > > > > >> On 2024/08/01 01:21:58 Pavankumar Gopidesu wrote:
> > > > > > > >> > Thanks Vincent, I took a look; this is really good. I don't have
> > > > > > > >> > access to the Confluence page to comment :) so I'm adding it here.
> > > > > > > >> >
> > > > > > > >> > As events arrive --> do some work --> end.
> > > > > > > >> >
> > > > > > > >> > So I'm uncertain whether my comment pertains to the current
> > > > > > > >> > poll/push model or fits into future work (I've seen event batching
> > > > > > > >> > there).
> > > > > > > >> >
> > > > > > > >> > Have you given any thought to an event archival mechanism and event
> > > > > > > >> > replay? This could significantly aid in testing and recovering
> > > > > > > >> > workflows, and in testing new functionality by simply replaying the
> > > > > > > >> > events. The archival mechanism I am referring to is similar to what
> > > > > > > >> > we have in AWS today with EventBridge Archive and Replay.
> > > > > > > >> >
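A minimal in-memory sketch of the archive-and-replay idea, in the spirit of EventBridge Archive and Replay: every received event is recorded with its arrival time, and a time window can later be re-delivered, in arrival order, to a handler (for example, one that re-creates the asset event). `EventArchive`, `record` and `replay` are hypothetical names; a real implementation would need durable storage and a policy for how replayed events interact with DAG runs that were already triggered.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ArchivedEvent:
    received_at: datetime
    asset_uri: str
    payload: dict[str, Any]


@dataclass
class EventArchive:
    """Hypothetical append-only archive of external events (in memory only)."""

    events: list[ArchivedEvent] = field(default_factory=list)

    def record(
        self,
        asset_uri: str,
        payload: dict[str, Any],
        received_at: Optional[datetime] = None,
    ) -> ArchivedEvent:
        event = ArchivedEvent(
            received_at or datetime.now(timezone.utc), asset_uri, payload
        )
        self.events.append(event)
        return event

    def replay(
        self,
        start: datetime,
        end: datetime,
        handler: Callable[[ArchivedEvent], None],
    ) -> int:
        """Re-deliver events received in [start, end), oldest first."""
        replayed = 0
        for event in sorted(self.events, key=lambda e: e.received_at):
            if start <= event.received_at < end:
                handler(event)
                replayed += 1
        return replayed
```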
> > > > > > > >> > Regards,
> > > > > > > >> > Pavan
> > > > > > > >> >
> > > > > > > >> > On Thu, Aug 1, 2024 at 1:29 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > > >> >
> > > > > > > >> > > I actually did manage to take a look, thanks for the work. I am +1
> > > > > > > >> > > on the poll-based approach. I left a comment on the push-based one:
> > > > > > > >> > > I am not sure why we need a function, since the create asset event
> > > > > > > >> > > API endpoint should have all the info needed about what the Asset
> > > > > > > >> > > was.
> > > > > > > >> > >
> > > > > > > >> > > On Thu, 1 Aug 2024 at 01:14, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Thanks Vincent, I will take a look again tomorrow.
> > > > > > > >> > > >
> > > > > > > >> > > > On Tue, 30 Jul 2024 at 18:47, Vincent Beck <vincb...@apache.org> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > >> Hi everyone,
> > > > > > > >> > > >>
> > > > > > > >> > > >> I updated AIP-82 based on the different comments and concerns I
> > > > > > > >> > > >> received. I also tried to reply to all comments individually. I
> > > > > > > >> > > >> would really appreciate it if you could do a second pass and let me
> > > > > > > >> > > >> know what you think. Overall, this is what I changed in the AIP:
> > > > > > > >> > > >>
> > > > > > > >> > > >> - Push-based event-driven scheduling. I updated this section
> > > > > > > >> > > >> entirely because I received many concerns about the previous
> > > > > > > >> > > >> proposal. The overall idea now is to leverage the create asset
> > > > > > > >> > > >> event API endpoint to send notifications from external systems
> > > > > > > >> > > >> (e.g. a cloud provider) to the Airflow environment.
> > > > > > > >> > > >>
> > > > > > > >> > > >> - I updated the DAG author experience for poll-based event-driven
> > > > > > > >> > > >> scheduling to use a message queue scenario. I understood that this
> > > > > > > >> > > >> is probably the main use case we are trying to cover with this AIP,
> > > > > > > >> > > >> so I used it as the example and mentioned it multiple times across
> > > > > > > >> > > >> the AIP.
> > > > > > > >> > > >>
> > > > > > > >> > > >> Thanks again for your time :)
> > > > > > > >> > > >>
> > > > > > > >> > > >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> > > > > > > >> > > >>
> > > > > > > >> > > >> Vincent
> > > > > > > >> > > >>
> > > > > > > >> > > >> On 2024/07/29 15:58:23 Vincent Beck wrote:
> > > > > > > >> > > >> > Thanks a lot, all, for the comments; this is very much
> > > > > > > >> > > >> > appreciated! I received many comments in this thread and in
> > > > > > > >> > > >> > Confluence, thanks again. I'll try to address them all in the AIP
> > > > > > > >> > > >> > and will send an email in this thread once done. I will most likely
> > > > > > > >> > > >> > revisit the push-based approach given the number of concerns I
> > > > > > > >> > > >> > received. Thanks, Jarek, for proposing another solution; I'll
> > > > > > > >> > > >> > probably go down that path.
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > One follow-up question, Vikram.
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > > The bespoke triggerer approach completely makes sense for the
> > > > > > > >> > > >> > > long tail here, but can we do better for the 20% of scenarios
> > > > > > > >> > > >> > > which cover well over 80% of usage here is the question in my
> > > > > > > >> > > >> > > mind. Or, are you thinking of those as being covered in the
> > > > > > > >> > > >> > > "push" model?
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > Could you share more details about what these "20% of scenarios
> > > > > > > >> > > >> > which cover well over 80% of usage" are, please?
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > Vincent
> > > > > > > >> > > >> >
> > > > > > > >> > > >> > On 2024/07/29 15:37:50 Kaxil Naik wrote:
> > > > > > > >> > > >> > > Thanks, Vincent, for driving this. I have added my comments to
> > > > > > > >> > > >> > > the AIP too.
> > > > > > > >> > > >> > >
> > > > > > > >> > > >> > > Regards,
> > > > > > > >> > > >> > > Kaxil
> > > > > > > >> > > >> > >
> > > > > > > >> > > >> > > On Fri, 26 Jul 2024 at 20:16, Scheffler Jens (XC-AS/EAE-ADA-T) <jens.scheff...@de.bosch.com.invalid> wrote:
> > > > > > > >> > > >> > >
> > > > > > > >> > > >> > > > +1 on the comments from Vikram and Jarek; I added my main points
> > > > > > > >> > > >> > > > on Confluence.
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > ________________________________
> > > > > > > >> > > >> > > > From: Vikram Koka <vik...@astronomer.io.INVALID>
> > > > > > > >> > > >> > > > Sent: Friday, July 26, 2024 8:46:55 PM
> > > > > > > >> > > >> > > > To: dev@airflow.apache.org <dev@airflow.apache.org>
> > > > > > > >> > > >> > > > Subject: Re: [DISCUSS] External event driven scheduling in Airflow
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > Vincent,
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > Thanks for writing this up. The overview looks really good!
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > I will leave my comments in the AIP as well, but at a high level
> > > > > > > >> > > >> > > > they are both relatively focused on the "how", rather than the
> > > > > > > >> > > >> > > > "what".
> > > > > > > >> > > >> > > > With respect to the pull / polling approach, I completely agree
> > > > > > > >> > > >> > > > that some incarnation of this is needed.
> > > > > > > >> > > >> > > > I am less certain about the "how" for this part. The bespoke
> > > > > > > >> > > >> > > > triggerer approach completely makes sense for the long tail here,
> > > > > > > >> > > >> > > > but can we do better for the 20% of scenarios which cover well
> > > > > > > >> > > >> > > > over 80% of usage? That is the question in my mind. Or are you
> > > > > > > >> > > >> > > > thinking of those as being covered in the "push" model?
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > Which leads to the "push" model approach.
> > > > > > > >> > > >> > > > I am struggling with the same question that Jarek raised here
> > > > > > > >> > > >> > > > about whether we need a new Airflow entity over and beyond the
> > > > > > > >> > > >> > > > existing REST API for this.
> > > > > > > >> > > >> > > > I am concerned about this becoming an attack vector for Airflow.
> > > > > > > >> > > >> > > > I see that this is a hot topic of discussion in the Confluence doc
> > > > > > > >> > > >> > > > as well, but I wanted to summarize it here too, so it doesn't get
> > > > > > > >> > > >> > > > lost in the comment threads.
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > Best regards,
> > > > > > > >> > > >> > > > Vikram
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > On Fri, Jul 26, 2024 at 5:16 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > > > > Thanks Vincent. I took a look and I have a general comment. I
> > > > > > > >> > > >> > > > > strongly think externally driven scheduling is really needed; in
> > > > > > > >> > > >> > > > > particular, it should be much easier for a user to "plug in" such
> > > > > > > >> > > >> > > > > an external event to Airflow. And there are two parts to it, as
> > > > > > > >> > > >> > > > > correctly stated there: pull and push.
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > > For the pull part, I think it would be great to have a kind of
> > > > > > > >> > > >> > > > > specialized Triggers that are started when the DAG is parsed, and
> > > > > > > >> > > >> > > > > those Triggers could generate the events for DAGs. I think that's
> > > > > > > >> > > >> > > > > basically all that is needed. For example, I imagine a Pub/Sub
> > > > > > > >> > > >> > > > > trigger that subscribes to messages coming in on the Pub/Sub queue
> > > > > > > >> > > >> > > > > and fires an "Asset" event when a message is received. Not much
> > > > > > > >> > > >> > > > > controversy there. I am not sure about the polling part, because
> > > > > > > >> > > >> > > > > I've always believed that when an "asyncio-native" Trigger runs in
> > > > > > > >> > > >> > > > > the asyncio event loop, we do not "poll" every second or so (but
> > > > > > > >> > > >> > > > > maybe this just comes from some specific triggers that actually do
> > > > > > > >> > > >> > > > > such regular polling). But yes, there are polls, like running a
> > > > > > > >> > > >> > > > > select on the DB, that cannot easily be made async, so having a
> > > > > > > >> > > >> > > > > configurable polling time would be good there (I am not sure;
> > > > > > > >> > > >> > > > > maybe it's even possible today). I think it would be really great
> > > > > > > >> > > >> > > > > to have that option, because it makes it much easier to set up
> > > > > > > >> > > >> > > > > authorization for Airflow users: rather than setting up
> > > > > > > >> > > >> > > > > authorization and REST calls coming from an external system, we
> > > > > > > >> > > >> > > > > can use Airflow Connections to authorize such a Trigger to
> > > > > > > >> > > >> > > > > subscribe to events.
> > > > > > > >> > > >> > > > >
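The distinction above between awaiting events and polling can be sketched with plain asyncio. The two async generators below only mirror the shape of a trigger's `run()` loop and deliberately do not use Airflow's trigger classes: the first suspends on a queue with no polling interval at all, while the second wraps a blocking call (for example, a database select) in `asyncio.to_thread` and sleeps for a configurable `poll_interval` between calls. The function names and the yielded dict shape are illustrative assumptions.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from typing import Any


async def wait_for_messages(queue: asyncio.Queue) -> AsyncIterator[dict[str, Any]]:
    """Event-driven: suspends on queue.get(), so nothing runs until a message arrives."""
    while True:
        message = await queue.get()
        yield {"asset_uri": message["uri"], "payload": message}


async def poll_blocking_source(
    fetch: Callable[[], list[dict[str, Any]]], poll_interval: float
) -> AsyncIterator[dict[str, Any]]:
    """Polling fallback for sources that cannot be awaited natively."""
    while True:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        rows = await asyncio.to_thread(fetch)
        for row in rows:
            yield {"asset_uri": row["uri"], "payload": row}
        await asyncio.sleep(poll_interval)


async def demo() -> list[str]:
    """Take the first event from each source, then close both generators."""
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put({"uri": "pubsub://projects/p/topics/orders"})
    batches = iter([[], [{"uri": "postgres://db/orders"}]])
    uris = []
    for source in (
        wait_for_messages(queue),
        poll_blocking_source(lambda: next(batches), poll_interval=0.01),
    ):
        event = await source.__anext__()
        uris.append(event["asset_uri"])
        await source.aclose()
    return uris
```

Running `asyncio.run(demo())` returns `["pubsub://projects/p/topics/orders", "postgres://db/orders"]`: the queue-based source returns as soon as the message is available, while the polling source needs a second `fetch` after one `poll_interval`.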
> > > > > > > >> > > >> > > > > For the push proposal, as I read it, the main point behind it is
> > > > > > > >> > > >> > > > > that rather than users having to write the "Airflow" way of
> > > > > > > >> > > >> > > > > triggering events and configuring authentication (using the REST
> > > > > > > >> > > >> > > > > API) to generate asset events, Airflow would natively understand
> > > > > > > >> > > >> > > > > external ways of pushing, effectively authorizing such incoming
> > > > > > > >> > > >> > > > > unauthorized requests and mapping them into events that could
> > > > > > > >> > > >> > > > > otherwise be generated by a REST API call.
> > > > > > > >> > > >> > > > > Honestly, I am not really sure this is something we want
> > > > > > > >> > > >> > > > > "running" in Airflow as an endpoint. I'd say such an unauthorized
> > > > > > > >> > > >> > > > > endpoint is probably not a good idea, for a variety of reasons,
> > > > > > > >> > > >> > > > > mostly security.
> > > > > > > >> > > >> > > > > And as I understand it, the goal is that users can easily point
> > > > > > > >> > > >> > > > > a "3rd-party" notification at Airflow and get the event generated.
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > > My feeling is that while this is needed, it should be externalized
> > > > > > > >> > > >> > > > > from the Airflow webserver. The authorization has to be set up
> > > > > > > >> > > >> > > > > additionally anyway: unlike in the "poll" case, we cannot use
> > > > > > > >> > > >> > > > > Connections for authorizing (because it's not Airflow that
> > > > > > > >> > > >> > > > > authorizes in an external system; it's the other way round). So we
> > > > > > > >> > > >> > > > > have to set up "something extra" in Airflow to authorize the
> > > > > > > >> > > >> > > > > external system anyway. That could be what we have now: a user
> > > > > > > >> > > >> > > > > that allows us to trigger the event. This means that our REST API
> > > > > > > >> > > >> > > > > could potentially be used the same way it is now, but we would need
> > > > > > > >> > > >> > > > > "something" (a library, a Lambda function, etc.) that users could
> > > > > > > >> > > >> > > > > easily set up in the external system to map whatever trigger they
> > > > > > > >> > > >> > > > > generate natively (say, S3 file created) to the Airflow REST API.
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > > As I see it, this is quite often used (and very practical): you
> > > > > > > >> > > >> > > > > deploy a cloud function or Lambda that subscribes to the event
> > > > > > > >> > > >> > > > > emitted when an S3/GCS object is created. So it would be on the
> > > > > > > >> > > >> > > > > user to deploy such a Lambda, but we **could** provide a library of
> > > > > > > >> > > >> > > > > those, say an S3 Lambda and a GCP Cloud Function in the respective
> > > > > > > >> > > >> > > > > providers, with documentation on how to set them up and how to
> > > > > > > >> > > >> > > > > configure authorization, and we would be generally "done". I am
> > > > > > > >> > > >> > > > > just not sure we need a new entity in Airflow for that (Event
> > > > > > > >> > > >> > > > > receiver). It feels like it asks Airflow to take on more
> > > > > > > >> > > >> > > > > responsibility, when we are all thinking about what to "remove"
> > > > > > > >> > > >> > > > > from Airflow rather than "add" to it, especially when it comes to
> > > > > > > >> > > >> > > > > external integrations. It feels to me that Airflow should make it
> > > > > > > >> > > >> > > > > easy to be triggered by such an external system and make it easy
> > > > > > > >> > > >> > > > > to "map" to the way we expect events to be triggered, but this
> > > > > > > >> > > >> > > > > should be done outside of Airflow. If users can easily find in our
> > > > > > > >> > > >> > > > > docs, when they search "what do I do to externally trigger Airflow
> > > > > > > >> > > >> > > > > on an S3 change", either a) configure polling in Airflow using an
> > > > > > > >> > > >> > > > > S3 Connection, or b) "create a user + deploy this Lambda with those
> > > > > > > >> > > >> > > > > parameters", that is "easy enough" and very practical as well.
> > > > > > > >> > > >> > > > >
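A minimal sketch of the kind of Lambda function described above, assuming the S3 bucket sends its native event notifications to the function and that the Airflow webserver accepts basic auth for a dedicated service-account user. The endpoint path `/api/v1/datasets/events` and the body fields `dataset_uri` and `extra` are assumed to match Airflow's create dataset event endpoint (check the REST API reference of your version); the environment variable names are hypothetical. Only the standard library is used, so the function needs no extra packaging.

```python
import base64
import json
import os
import urllib.parse
import urllib.request
from typing import Any


def build_requests(
    s3_event: dict[str, Any], base_url: str, user: str, password: str
) -> list[urllib.request.Request]:
    """Turn an S3 event notification into create-dataset-event HTTP requests."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    prepared = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces become "+").
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = {
            "dataset_uri": f"s3://{bucket}/{key}",  # assumed field name
            "extra": {"event_name": record.get("eventName")},
        }
        prepared.append(
            urllib.request.Request(
                url=f"{base_url.rstrip('/')}/api/v1/datasets/events",  # assumed path
                data=json.dumps(body).encode(),
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Basic {token}",
                },
                method="POST",
            )
        )
    return prepared


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, int]:
    """Lambda entry point; the environment variable names are hypothetical."""
    prepared = build_requests(
        event,
        os.environ["AIRFLOW_BASE_URL"],
        os.environ["AIRFLOW_USER"],
        os.environ["AIRFLOW_PASSWORD"],
    )
    for request in prepared:
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
    return {"sent": len(prepared)}
```

Keeping `build_requests` separate from the network call makes the mapping easy to test locally. In line with the security point made in this thread, the user behind `AIRFLOW_USER` would be a service account dedicated to this one external system.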
> > > > > > > >> > > >> > > > > But maybe I am not seeing the whole picture and the real problem
> > > > > > > >> > > >> > > > > it's solving, so take it as a "first review pass" and a "gut
> > > > > > > >> > > >> > > > > feeling".
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > > J.
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > > On Thu, Jul 25, 2024 at 10:55 PM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > > > Hello everyone,
> > > > > > > >> > > >> > > > > >
> > > > > > > >> > > >> > > > > > I created a draft AIP regarding "External
> event
> > > > > > driven
> > > > > > > >> > > >> scheduling in
> > > > > > > >> > > >> > > > > > Airflow". This proposal is about adding
> > > > capability
> > > > > in
> > > > > > > >> Airflow
> > > > > > > >> > > to
> > > > > > > >> > > >> > > > schedule
> > > > > > > >> > > >> > > > > > DAGs based on external events. Here are
> some
> > > > > examples
> > > > > > > of
> > > > > > > >> such
> > > > > > > >> > > >> external
> > > > > > > >> > > >> > > > > > events:
> > > > > > > >> > > >> > > > > > - A user signs up to one of the user pool
> > > defined
> > > > > in
> > > > > > my
> > > > > > > >> cloud
> > > > > > > >> > > >> provider
> > > > > > > >> > > >> > > > > > - One of the databases used in my company
> has
> > > > been
> > > > > > > >> updated
> > > > > > > >> > > >> > > > > > - A job in my cloud provider has been
> executed
> > > > > > > >> successfully
> > > > > > > >> > > >> > > > > >
> > > > > > > >> > > >> > > > > > The intent of this AIP is to leverage
> datasets
> > > > > (which
> > > > > > > >> will be
> > > > > > > >> > > >> soon
> > > > > > > >> > > >> > > > > assets)
> > > > > > > >> > > >> > > > > > and update them based on external events. I
> > > would
> > > > > > like
> > > > > > > to
> > > > > > > >> > > >> propose this
> > > > > > > >> > > >> > > > > AIP
> > > > > > > >> > > >> > > > > > for discussion and more importantly, hear
> some
> > > > > > > feedbacks
> > > > > > > >> from
> > > > > > > >> > > >> you :)
> > > > > > > >> > > >> > > > > >
> > > > > > > >> > > >> > > > > >
> > > > > > > >> > > >> > > > > >
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >>
> > > > > > > >> > >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FAIRFLOW%2FAIP-82%2BExternal%2Bevent%2Bdriven%2Bscheduling%2Bin%2BAirflow&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C9e55ef9af31e4a669ef108dcada3a726%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638576165598178951%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=3FFvhCI6RA6sPhZoiOBAqzgyTkC6NNYqJYjBRVqEmUY%3D&reserved=0
> > > > > > > >> > > >> > > > <
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >>
> > > > > > > >> > >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > > > >
> > > > > > > >> > > >> > > > > > Vincent
> > > > > > > >> > > >> > > > > >
> > > > > > > >> > > >> > > > >
> > > > > > > >> > > >> > > >
> > > > > > > >> > > >> > >
> > > > > > > >> > > >> >
> > > > > > > >> > > >> >
> > > > > > > >>
> > > > > > > > >> > > >> > ---------------------------------------------------------------------
> > > > > > > > >> > > >> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > > > > >> > > >> > For additional commands, e-mail: dev-h...@airflow.apache.org
> > > > > > > >> > > >>
> > > > > > > >> > > >>
> > > > > > > >>
> > > > >
> ---------------------------------------------------------------------
> > > > > > > >> > > >> To unsubscribe, e-mail:
> > > dev-unsubscr...@airflow.apache.org
> > > > > > > >> > > >> For additional commands, e-mail:
> > > > dev-h...@airflow.apache.org
> > > > > > > >> > > >>
> > > > > > > >> > > >>
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >>
> > > > >
> ---------------------------------------------------------------------
> > > > > > > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > > > >> For additional commands, e-mail:
> dev-h...@airflow.apache.org
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>
