Thanks for the update, Vincent! I also made a comment earlier about leveraging patterns for the 80/20 use cases in the poll model. However, after seeing how you have defined the syntax for "Asset watchers", I believe that's already covered in your proposal. So, all good from my perspective.
Vikram

On Fri, Aug 2, 2024 at 12:29 PM Vincent Beck <vincb...@apache.org> wrote:
> Hello everyone,
>
> After multiple conversations with different actors, I decided to move the push-based event scheduling out of scope of this AIP. I could not find a solution for the push-based approach that was satisfactory, and I think more investigation and data from Airflow users are needed to move on with this approach. In order to not block the poll-based event scheduling, which did not receive any concerns, I decided to focus AIP-82 on the poll-based event scheduling. I updated the AIP accordingly.
>
> Since the poll-based approach did not receive any concerns (and much positive feedback), I'll start voting in another thread.
>
> Thanks for your feedback!
>
> Vincent
>
> On 2024/08/02 00:50:35 Pavankumar Gopidesu wrote:
> > Thanks Vincent and Kaxil, agree it can be added to future work.
> >
> > Looking at the discussion on event mapping: today we know the source systems and we are aware of the schemas from those systems (S3, SQS, EventArc, etc.).
> >
> > However, as new services emerge and their schemas evolve, this may change.
> >
> > I agree with Kaxil that transformers are very effective in converting to the required format before sending data to target systems.
> >
> > Is it a good idea to define a schema for events to enforce from Airflow? There are some open standards, such as CloudEvents [1], that we could consider.
> >
> > This approach would allow us to maintain versions (v1, v2, ... vn) as we evolve, providing utilities that users can easily plug into to send events to Airflow.
> >
> > It would also simplify maintenance within the Airflow system when handling incoming events, because we get a known version of the schema from the user request.
> >
> > If this is already being addressed, please disregard my comments.
> > [1]: https://github.com/cloudevents/spec
> >
> > Regards,
> > Pavan
> >
> > On Fri, Aug 2, 2024 at 1:40 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > Yes, but the big difference is that you will create a single user for EventBridge, since that is the one sending requests to Airflow, a single user for EventArc for GCP, and one user for every other EventListener or application -- as compared to one user per type of payload (since Airflow will need to understand the payload of the original source). So in that case, you will have one user/function mapping for S3, one user/function mapping for Redshift, and it goes on. The former approach is also consistent with our Connection model, where we have one standardized AWS & GCP connection that works for most, if not all, services.
> > >
> > > > but the user will have to be added anyway (some kind of service account - because the API needs to be authorized - that part is not changed), unless of course you want to use the same user for all kinds of external interfaces, which from a security point of view is a very bad idea - each external system should have its own "service account"; that's the best practice from a security point of view.
> > >
> > > On Fri, 2 Aug 2024 at 01:22, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > I am all for it - if we want to stick to EventBridge or similar and recommend it to our users, it's perfectly fine for me. It would be great, however, to add documentation explaining the steps and some examples - ideally for most of our providers and "standard" ways of triggering such an event. This is even what I proposed originally when the first version of the document was created (to just document how to map the events externally).
> > > >
> > > > BTW.
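To make the CloudEvents suggestion above concrete, here is a minimal sketch of wrapping a raw producer payload in a CloudEvents-style envelope before sending it to Airflow. The attribute names (`specversion`, `id`, `source`, `type`, `time`, `subject`, `data`) come from the CloudEvents 1.0 spec; the event `type` string, bucket, and URIs are invented for illustration and are not part of the AIP.

```python
import json
import uuid
from datetime import datetime, timezone

def to_cloudevent(asset_uri: str, payload: dict, source: str) -> dict:
    """Wrap a raw producer payload in a CloudEvents 1.0 style envelope.

    The attribute names follow the CloudEvents spec; the versioned event
    "type" string below is a hypothetical example, not an Airflow API.
    """
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": "org.apache.airflow.asset.event.v1",  # hypothetical type
        "time": datetime.now(timezone.utc).isoformat(),
        "subject": asset_uri,
        "datacontenttype": "application/json",
        "data": payload,
    }

# Example: an S3-style notification wrapped for delivery to Airflow.
event = to_cloudevent(
    asset_uri="s3://my-bucket/data.csv",
    payload={"bucket": "my-bucket", "key": "data.csv"},
    source="arn:aws:s3:::my-bucket",
)
print(json.dumps(event, indent=2))
```

Versioning the `type` string (v1, v2, ...) is one way to get the known-schema property Pavan describes, while the `data` field still carries the producer's native payload.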
> > > > Yes - in this case you need to implement the logic in the event bridge, but the user will have to be added anyway (some kind of service account - because the API needs to be authorized - that part is not changed), unless of course you want to use the same user for all kinds of external interfaces, which from a security point of view is a very bad idea - each external system should have its own "service account"; that's the best practice from a security point of view.
> > > >
> > > > J.
> > > >
> > > > On Fri, Aug 2, 2024 at 2:01 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > I was discussing this with Vincent. In either case, same as now or the one proposed in the AIP, a user will have to use something like AWS EventBridge [1] or GCP Eventarc [2], where users will consume the event from object storage (S3 object creation, for example), and then they will have to add Airflow's Create Dataset endpoint to EventBridge [3]. Now, if you just customize the payload to build the URI which is allowed (either via GET / POST) in EventBridge, it works right now. However, with the current proposal, a user will have to create a new user in Airflow and some mapping to a function (that is either in the provider or a new user-defined function) that can understand this specific payload - in this example, the payload for S3 events. This will become huge, because it means that for each payload we will have to provide a new function and keep it updated. From the users' POV, they will need to create a new user every time for a new service (S3, Redshift, SNS, Bedrock, etc.). This will again likely have to go to the Auth manager backend.
> > > > > Compare that to what's available today -- i.e. building a URI & extra metadata that can work not only with EventBridge or Eventarc but with any service.
> > > > >
> > > > > Since we already have to use things like EventBridge or Eventarc for managed service providers to transform the event, it fits well with the existing approach. The AWS blog [3] even has a similar example for Datadog, where they use the input transformer "{"detail":"$.detail"}" before sending it to Datadog's API.
> > > > >
> > > > > > "Having the producer of an event generate it in their standard way makes it easy for Airflow to consume it as a dataset event without external entities":
> > > > >
> > > > > [1]: https://aws.amazon.com/eventbridge/ | https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventBridge.html
> > > > > [2]: https://cloud.google.com/eventarc/docs
> > > > > [3]: https://aws.amazon.com/blogs/compute/using-api-destinations-with-amazon-eventbridge/
> > > > >
> > > > > On Fri, 2 Aug 2024 at 00:14, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > > I proposed the mapping because it's the easiest way (I think) to map from the "native" source to the "airflow" target expectations. There are many producers of such events, and Airflow is the consumer. It seems appropriate to have a way for our users to easily plug events produced by one system into our "events" API - without having to employ an external "mapper" (say, a lambda) doing the conversion. While I think it is indeed "a bit odd", it's a solution that might leverage most of what we have - authorisation and API exposure via a "user" in the API.
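As a rough illustration of what the input transformer in the AWS example above does, the sketch below approximates the `{"detail":"$.detail"}` extraction in plain Python. Only a simple dotted-path subset is implemented here; real EventBridge transformers also support template substitution, and the sample event shape is abbreviated.

```python
def apply_input_transform(event: dict, paths: dict) -> dict:
    """Approximate EventBridge's input transformer: each value in `paths`
    is a "$.a.b" style path resolved against `event`.

    Only the simple dotted-path subset is supported in this sketch.
    """
    def resolve(path: str):
        node = event
        # Strip the leading "$." and walk the dotted path.
        for part in path.lstrip("$.").split("."):
            node = node[part]
        return node

    return {key: resolve(path) for key, path in paths.items()}

# The Datadog example from the AWS blog: forward only the "detail" field,
# dropping the EventBridge envelope around it.
s3_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "my-bucket"}, "object": {"key": "data.csv"}},
}
slimmed = apply_input_transform(s3_event, {"detail": "$.detail"})
```

The same mechanism is what would let a user reshape a native S3 notification into whatever the Airflow endpoint accepts, without any new Airflow-side entity.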
> > > > > > While I myself find it a bit unusual, I think it might do the job. But I wonder if there is any alternative solution to the problem of "having the producer of an event generate it in their standard way, making it easy for Airflow to consume it as a dataset event without external entities".
> > > > > >
> > > > > > On Fri, Aug 2, 2024 at 12:57 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > > I would love for a VOTE to get started on this one. I think most of the commenters, and those who replied to this email, are happy with the proposal on the poll-based approach.
> > > > > > >
> > > > > > > Regarding the push-based approach, I am not convinced that the proposed implementation has any gains over what's already available with the Dataset Event Create API; the one-user-to-one-function mapping is an odd user experience. I'm curious to hear what others think.
> > > > > > >
> > > > > > > On Thu, 1 Aug 2024 at 17:39, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > > > I agree with both of you that it is indeed a good idea and that it can be added in Future work -- it doesn't need to be part of this AIP.
> > > > > > > >
> > > > > > > > > Thanks for the interest. I was not aware of such a feature and this looks really cool! I definitely think that it can be useful for Airflow, especially for testing, when you can easily replay events received in the past. However, I do not think it should be part of the AIP and, as you mentioned, it should be future work or a follow-up item of the AIP.
> > > > > > > > > Please let me know if you (or anyone) disagree with this and we can talk about it. Otherwise I'll update the future work section of the AIP and mention this archive and replay feature.
> > > > > > > >
> > > > > > > > On Thu, 1 Aug 2024 at 16:11, Vincent Beck <vincb...@apache.org> wrote:
> > > > > > > > > Hey Pavan,
> > > > > > > > >
> > > > > > > > > Thanks for the interest. I was not aware of such a feature and this looks really cool! I definitely think that it can be useful for Airflow, especially for testing, when you can easily replay events received in the past. However, I do not think it should be part of the AIP and, as you mentioned, it should be future work or a follow-up item of the AIP. Please let me know if you (or anyone) disagree with this and we can talk about it. Otherwise I'll update the future work section of the AIP and mention this archive and replay feature.
> > > > > > > > >
> > > > > > > > > On 2024/08/01 01:21:58 Pavankumar Gopidesu wrote:
> > > > > > > > > > Thanks Vincent, I took a look, this is really good. I don't have access to the Confluence page to comment :) so I am adding it here.
> > > > > > > > > >
> > > > > > > > > > As events arrive --> do some work --> end.
> > > > > > > > > >
> > > > > > > > > > So I'm uncertain if my comment pertains to the current poll/push model or if it is part of future work (I have seen event batching mentioned).
> > > > > > > >> > > > > > > > > >> > Have you given any thought to the event archival > mechanism and > > > > > event > > > > > > > >> > replay? This could significantly aid in testing and > recovery > > > of > > > > > > > workflow > > > > > > > >> > and testing new functionality with events by just replay > the > > > > > events. > > > > > > > The > > > > > > > >> > archival mechanism I am referring to is similar to today > in > > > AWS > > > > we > > > > > > > have > > > > > > > >> > Event Bridge Archive and Replay. > > > > > > > >> > > > > > > > > >> > Regards, > > > > > > > >> > Pavan > > > > > > > >> > > > > > > > > >> > On Thu, Aug 1, 2024 at 1:29 AM Kaxil Naik < > > > kaxiln...@gmail.com> > > > > > > > wrote: > > > > > > > >> > > > > > > > > >> > > I actually did manage to take a look, thanks for the > work. I > > > > am > > > > > +1 > > > > > > > on > > > > > > > >> the > > > > > > > >> > > poll-based approach -- left a comment on the > push-based: I > > > am > > > > > not > > > > > > > >> sure of > > > > > > > >> > > why we need a function since create asset event API > endpoint > > > > > > should > > > > > > > >> have > > > > > > > >> > > all info needed for what the Asset was. > > > > > > > >> > > > > > > > > > >> > > On Thu, 1 Aug 2024 at 01:14, Kaxil Naik < > > > kaxiln...@gmail.com> > > > > > > > wrote: > > > > > > > >> > > > > > > > > > >> > > > Thanks Vincent, I will take a look again tomorrow. > > > > > > > >> > > > > > > > > > > >> > > > On Tue, 30 Jul 2024 at 18:47, Vincent Beck < > > > > > vincb...@apache.org > > > > > > > > > > > > > > >> wrote: > > > > > > > >> > > > > > > > > > > >> > > >> Hi everyone, > > > > > > > >> > > >> > > > > > > > >> > > >> I updated the AIP-82 given the different comments and > > > > > concerns > > > > > > I > > > > > > > >> > > >> received. I also tried to reply to all comments > > > > > individually. 
I > > > > > > > >> would > > > > > > > >> > > >> really appreciate if you can do a second pass and > let me > > > > know > > > > > > > what > > > > > > > >> you > > > > > > > >> > > >> think. Overall, this is what I changed in the AIP: > > > > > > > >> > > >> > > > > > > > >> > > >> - Push based event-driven scheduling. I updated this > > > > section > > > > > > > >> entirely > > > > > > > >> > > >> because I received many concerns about the previous > > > > proposal. > > > > > > The > > > > > > > >> > > overall > > > > > > > >> > > >> idea now is to leverage the create asset event API > > > endpoint > > > > > to > > > > > > > send > > > > > > > >> > > >> notifications from external (e.g. cloud provider) to > > > > Airflow > > > > > > > >> > > environment. > > > > > > > >> > > >> > > > > > > > >> > > >> - I updated the poll based event-driven scheduling > DAG > > > > author > > > > > > > >> experience > > > > > > > >> > > >> to use a message queue scenario. I understood that > this > > > is > > > > > > > >> probably the > > > > > > > >> > > >> main use case we are trying to cover with this AIP, > thus > > > I > > > > > used > > > > > > > it > > > > > > > >> as > > > > > > > >> > > >> example and mentioned it multiple times across the > AIP. > > > > > > > >> > > >> > > > > > > > >> > > >> Thanks again for your time :) > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow > > > > > > > >> > > >> > > > > > > > >> > > >> Vincent > > > > > > > >> > > >> > > > > > > > >> > > >> On 2024/07/29 15:58:23 Vincent Beck wrote: > > > > > > > >> > > >> > Thanks a lot all for the comments, this is very > much > > > > > > > >> appreciated! I > > > > > > > >> > > >> received many comments from this thread and in > > > confluence, > > > > > > thanks > > > > > > > >> again. 
> > > > > > > >> > > >> I'll try to address them all in the AIP and will > send an > > > > > email > > > > > > in > > > > > > > >> this > > > > > > > >> > > >> thread once done. I will most likely revisit the > > > push-based > > > > > > > >> approach > > > > > > > >> > > given > > > > > > > >> > > >> the number of concerns I received, thanks Jarek for > > > > proposing > > > > > > > >> another > > > > > > > >> > > >> solution, I'll probably go down that path. > > > > > > > >> > > >> > > > > > > > > >> > > >> > One follow-up question Vikram. > > > > > > > >> > > >> > > > > > > > > >> > > >> > > The bespoke triggerer approach completely makes > sense > > > > for > > > > > > the > > > > > > > >> long > > > > > > > >> > > >> tail here, but can we do better for the 20% of > scenarios > > > > > which > > > > > > > >> cover > > > > > > > >> > > well > > > > > > > >> > > >> over 80% of usage here is the question in my mind. > Or, > > > are > > > > > you > > > > > > > >> thinking > > > > > > > >> > > of > > > > > > > >> > > >> those as being covered in the "push" model? > > > > > > > >> > > >> > > > > > > > > >> > > >> > Could you share more details about what is this > "20% of > > > > > > > scenarios > > > > > > > >> > > which > > > > > > > >> > > >> cover well over 80% of usage" please? > > > > > > > >> > > >> > > > > > > > > >> > > >> > Vincent > > > > > > > >> > > >> > > > > > > > > >> > > >> > On 2024/07/29 15:37:50 Kaxil Naik wrote: > > > > > > > >> > > >> > > Thanks Vincent for driving these, I have added my > > > > > comments > > > > > > to > > > > > > > >> the > > > > > > > >> > > AIP > > > > > > > >> > > >> too. 
> > > > > > > >> > > >> > > > > > > > > > >> > > >> > > Regards, > > > > > > > >> > > >> > > Kaxil > > > > > > > >> > > >> > > > > > > > > > >> > > >> > > On Fri, 26 Jul 2024 at 20:16, Scheffler Jens > > > > > > > (XC-AS/EAE-ADA-T) > > > > > > > >> > > >> > > <jens.scheff...@de.bosch.com.invalid> wrote: > > > > > > > >> > > >> > > > > > > > > > >> > > >> > > > +1 on the comments of Vikram and Jarek, added > main > > > > > points > > > > > > > on > > > > > > > >> > > >> confluence > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > Sent from Outlook for iOS< > https://aka.ms/o0ukef> > > > > > > > >> > > >> > > > ________________________________ > > > > > > > >> > > >> > > > From: Vikram Koka <vik...@astronomer.io.INVALID > > > > > > > > > >> > > >> > > > Sent: Friday, July 26, 2024 8:46:55 PM > > > > > > > >> > > >> > > > To: dev@airflow.apache.org < > dev@airflow.apache.org > > > > > > > > > > > >> > > >> > > > Subject: Re: [DISCUSS] External event driven > > > > scheduling > > > > > > in > > > > > > > >> Airflow > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > Vincent, > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > Thanks for writing this up. The overview looks > > > really > > > > > > good! > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > I will leave my comments in the AIP as well, > but > > > at a > > > > > > high > > > > > > > >> level > > > > > > > >> > > >> they are > > > > > > > >> > > >> > > > both relatively focused on the "how", rather > than > > > the > > > > > > > "what". > > > > > > > >> > > >> > > > With respect to the pull / polling approach, I > > > > > completely > > > > > > > >> agree > > > > > > > >> > > >> that some > > > > > > > >> > > >> > > > incarnation of this is needed. > > > > > > > >> > > >> > > > I am less certain as to how on this part. 
The > > > bespoke > > > > > > > >> triggerer > > > > > > > >> > > >> approach > > > > > > > >> > > >> > > > completely makes sense for the long tail here, > but > > > > can > > > > > we > > > > > > > do > > > > > > > >> > > better > > > > > > > >> > > >> for the > > > > > > > >> > > >> > > > 20% of scenarios which cover well over 80% of > usage > > > > > here > > > > > > is > > > > > > > >> the > > > > > > > >> > > >> question in > > > > > > > >> > > >> > > > my mind. Or, are you thinking of those as being > > > > covered > > > > > > in > > > > > > > >> the > > > > > > > >> > > >> "push" > > > > > > > >> > > >> > > > model? > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > Which leads to the "push" model approach. > > > > > > > >> > > >> > > > I am struggling with the same question that > Jarek > > > > > raised > > > > > > > here > > > > > > > >> > > about > > > > > > > >> > > >> whether > > > > > > > >> > > >> > > > we need a new Airflow entity over and beyond > the > > > > > existing > > > > > > > >> REST API > > > > > > > >> > > >> for the > > > > > > > >> > > >> > > > same. > > > > > > > >> > > >> > > > I am concerned about this becoming a vector of > > > attack > > > > > on > > > > > > > >> Airflow. > > > > > > > >> > > >> > > > I see that this is a hot topic of discussion > in the > > > > > > > >> Confluence doc > > > > > > > >> > > >> as well, > > > > > > > >> > > >> > > > but wanted to summarize here as well, so it > didn't > > > > get > > > > > > lost > > > > > > > >> in the > > > > > > > >> > > >> threads > > > > > > > >> > > >> > > > of comments. > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > Best regards, > > > > > > > >> > > >> > > > Vikram > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > On Fri, Jul 26, 2024 at 5:16 AM Jarek Potiuk < > > > > > > > >> ja...@potiuk.com> > > > > > > > >> > > >> wrote: > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > Thanks Vincent. 
I took a look and I have a > > > general > > > > > > > >> comment. I > > > > > > > >> > > >> > > > > strongly think external driven scheduling is > > > really > > > > > > > needed > > > > > > > >> - > > > > > > > >> > > >> especially, > > > > > > > >> > > >> > > > it > > > > > > > >> > > >> > > > > should be much easier for a user to "plug-in" > > > such > > > > an > > > > > > > >> external > > > > > > > >> > > >> event to > > > > > > > >> > > >> > > > > Airflow. And there are two parts of it - as > > > > correctly > > > > > > > >> stated > > > > > > > >> > > >> there - pull > > > > > > > >> > > >> > > > > and push. > > > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > For the pull - I think it would be great to > have > > > a > > > > > kind > > > > > > > of > > > > > > > >> > > >> specialized > > > > > > > >> > > >> > > > > Triggers that will be started when DAG is > parsed > > > - > > > > > and > > > > > > > >> those > > > > > > > >> > > >> Triggers > > > > > > > >> > > >> > > > could > > > > > > > >> > > >> > > > > generate the events for DAGs. I think > basically > > > > > that's > > > > > > > all > > > > > > > >> that > > > > > > > >> > > is > > > > > > > >> > > >> > > > needed, > > > > > > > >> > > >> > > > > for example I imagine a pubsub trigger that > will > > > > > > > subscribe > > > > > > > >> to > > > > > > > >> > > >> messages > > > > > > > >> > > >> > > > > coming on the pubsub queue and fire "Asset" > event > > > > > when > > > > > > a > > > > > > > >> message > > > > > > > >> > > >> is > > > > > > > >> > > >> > > > > received. 
> > > > Not much controversy there. I am not sure about the polling thing, because I've always believed that when an "asyncio-native" Trigger is run in the asyncio event loop, we do not "poll" every second or so (but maybe this is just coming from some specific triggers that actually do such a regular poll). But yes - there are polls, like running a select on the DB, that cannot be easily "async-ed", so having a configurable polling time would be good there (though I am not sure; maybe it's even possible today). I think it would be really great if we have that option, because it makes it much easier to set up the authorization for Airflow users - rather than setting up authorization and REST calls coming from an external system, we can utilize Airflow's Connections to authorize such a Trigger to subscribe to events.
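The "specialized Trigger" idea discussed above could look roughly like the following. This is not Airflow's actual Trigger API: a plain async generator stands in for a `BaseTrigger` subclass yielding `TriggerEvent`s, and an in-memory queue stands in for a pub/sub subscription, just to show the non-blocking wait-then-yield shape being described.

```python
import asyncio

async def queue_trigger(queue: asyncio.Queue, poll_interval: float = 0.01):
    """Sketch of a poll-style trigger: watch a queue and yield one event
    per message. The asset URI below is a made-up example value."""
    while True:
        try:
            message = queue.get_nowait()
        except asyncio.QueueEmpty:
            # Nothing yet: yield control back to the event loop instead of
            # blocking, which is what keeps many triggers cheap to run in
            # one asyncio loop.
            await asyncio.sleep(poll_interval)
            continue
        yield {"asset_uri": "s3://my-bucket/data.csv", "payload": message}

async def main():
    queue = asyncio.Queue()
    queue.put_nowait({"key": "data.csv"})  # simulate an incoming message
    async for event in queue_trigger(queue):
        return event  # a real trigger would keep looping

event = asyncio.run(main())
```

A truly async source (e.g. a pub/sub subscription) would replace the `get_nowait`/`sleep` pair with an awaitable receive, which is the distinction Jarek draws between polling triggers and "asyncio-native" ones.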
> > > > For the push proposal - as I read it, the main point behind it is, rather than having users write the "Airflow" way of triggering events and configuring authentication (using the REST API) to generate asset events, to make Airflow natively understand external ways of pushing - effectively authorizing and mapping such incoming unauthorized requests into events that could be generated by a REST API call.
> > > >
> > > > I am honestly not really sure if this is something that we want "running" in Airflow as an endpoint. I'd say such an unauthorised endpoint is probably not a good idea - for a variety of reasons, mostly security. And as I understand it, the goal is that users can easily point a "3rd-party" notification at Airflow and get the event generated.
> > > >
> > > > My feeling is that while this is needed - it should be externalised from the Airflow webserver.
> > > > The authorization has to be set up additionally anyway - unlike in the "poll" case, we cannot use Connections for authorizing (because it's not Airflow that authorizes in an external system - it's the other way round). So we have to set up "something extra" in Airflow to authorize the external system anyhow, which could be what we have now - a user that allows us to trigger the event. That means our REST API could potentially be used the same way it is now, but we will need "something" (a library, a lambda function, etc.) that users could easily set up in the external system to map whatever trigger they generate natively (say, an S3 file being created) to the Airflow REST API.
> > > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > As I see it - this is quite often used (and > very > > > > > > > >> practical, that > > > > > > > >> > > >> you > > > > > > > >> > > >> > > > deploy > > > > > > > >> > > >> > > > > a cloud function or lambda that subscribes > on the > > > > > event > > > > > > > >> received > > > > > > > >> > > >> when > > > > > > > >> > > >> > > > > S3/GCS is created. So it would be on the > user to > > > > > deploy > > > > > > > >> such a > > > > > > > >> > > >> lambda - > > > > > > > >> > > >> > > > but > > > > > > > >> > > >> > > > > we **could** provide a library of those: say > s3 > > > > > lambda, > > > > > > > gcp > > > > > > > >> > > cloud > > > > > > > >> > > >> > > > function > > > > > > > >> > > >> > > > > in respective providers - with documentation > how > > > to > > > > > set > > > > > > > >> them up, > > > > > > > >> > > >> and how > > > > > > > >> > > >> > > > to > > > > > > > >> > > >> > > > > configure authorization and we would be > generally > > > > > > "done". > > > > > > > >> I am > > > > > > > >> > > >> just not > > > > > > > >> > > >> > > > > sure if we need a new entity in Airflow for > that > > > > > (Event > > > > > > > >> > > >> receiver). It > > > > > > > >> > > >> > > > feels > > > > > > > >> > > >> > > > > like it asks Airflow to take more > responsibility, > > > > > when > > > > > > we > > > > > > > >> all > > > > > > > >> > > >> think on > > > > > > > >> > > >> > > > what > > > > > > > >> > > >> > > > > to "remove" from Airflow rather than "add" > to it > > > - > > > > > > > >> especially > > > > > > > >> > > >> when it > > > > > > > >> > > >> > > > comes > > > > > > > >> > > >> > > > > to external integrations. 
> > > > It feels to me that Airflow should make it easy to be triggered by such an external system, and make it easy to "map" to the way we expect to get events triggered, but this should be done outside of Airflow. If users searching our docs for "what do I do to externally trigger Airflow on an S3 change" can easily find either a) configure polling in Airflow using an S3 Connection, or b) "create a user + deploy this lambda with those parameters" - that is "easy enough" and very practical as well.
> > > >
> > > > But maybe I am not seeing the whole picture and the real problem it's solving - so take it as a "first review pass" and "gut feeling".
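The "deploy this lambda with those parameters" idea could be sketched as a small handler that maps a native S3 notification to the payload of a dataset-event API call. Everything specific here is hypothetical: the endpoint URL, the payload field names, and the helper function are illustrative only, and the real route and schema would have to come from Airflow's REST API reference.

```python
import json

# Hypothetical endpoint; the actual "create dataset event" route and
# payload shape should be checked against Airflow's REST API reference.
AIRFLOW_ENDPOINT = "https://airflow.example.com/api/v1/datasets/events"

def s3_record_to_dataset_event(record: dict) -> dict:
    """Map one record of a native S3 notification to the URI-plus-extra
    shape a dataset-event call would expect (field names assumed)."""
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    return {
        "dataset_uri": f"s3://{bucket}/{key}",
        "extra": {"event_name": record.get("eventName", "unknown")},
    }

def lambda_handler(event, context):
    """Entry point for the deployed lambda. The HTTP POST (authorized via
    the dedicated service-account user discussed in the thread) is left
    as a comment so the mapping itself stays self-contained."""
    payloads = [s3_record_to_dataset_event(r) for r in event["Records"]]
    # for p in payloads:
    #     requests.post(AIRFLOW_ENDPOINT, json=p, auth=(user, password))
    return {"statusCode": 200, "body": json.dumps(payloads)}

# A trimmed-down S3 notification, as the lambda would receive it.
sample = {"Records": [{"eventName": "ObjectCreated:Put",
                       "s3": {"bucket": {"name": "my-bucket"},
                              "object": {"key": "data.csv"}}}]}
result = lambda_handler(sample, None)
```

A provider-shipped library of such mappers, plus docs on wiring up the service account, is essentially option b) in Jarek's summary.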
On Thu, Jul 25, 2024 at 10:55 PM Beck, Vincent <vincb...@amazon.com.invalid> wrote:

Hello everyone,

I created a draft AIP regarding "External event driven scheduling in Airflow". This proposal is about adding the capability to schedule DAGs in Airflow based on external events. Here are some examples of such external events:
- A user signs up to one of the user pools defined in my cloud provider
- One of the databases used in my company has been updated
- A job in my cloud provider has been executed successfully

The intent of this AIP is to leverage datasets (which will soon be assets) and update them based on external events. I would like to propose this AIP for discussion and, more importantly, hear some feedback from you :)

https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow

Vincent
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org