[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-898028425 > Is the goal of this PR to replace & generalize this use case or the real issue you are trying to solve is something else ? Something completely different. Pre/postoperators in Vertica are SQL statements called before/after (Those are strings rather than callables) What this PR tries to achieve is to provide an option to override pre_/post_execute methods from the base operator with ones provided as constructor parameters. While it was already possible to override them by extending existing operators, it makes it a bit easier (and with less mental barriers for DAG writers) to provide such methods when you create task in DAG. While previously it required to create a new class and use the new class as operator, this on allows to create methods and pass them as parameters of the cosnstructor - which is easier for many users who write the DAGs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-898029293 LGTM - but serialisation needs to be updated to skip those new fields (tests are failing). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-898261046 Looking at the test, there must be a reason the test it here (not only to annoy the user) and it says explicitly that new fields should not be added, so that makes me wonder if just adding the fields to the test is a good idea.. @kaxil - is that OK that we add new fields to Base Operator? Will that work for already serialized Tasks? to de-serialize them without problems ? Or do we need to do something else besides adding the two two fields to the test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-901396372 Hey @kaxil -> do you think we can just add _pre/_post to the test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-903326972 Hey @kaxil - any comments here? And @malthe - woudl you please rebase after the comment from @kaxil? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-903329602 I was asking @kaxil for comment about the serialisation and reason why the tests explicitly mention come fields :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-904010653 > These two new fields you add don't need to be serialized (and callables can't generally be anyway) -- the general rule is that things needed by the Scheduler should be serialized, and that test was there to make people think about the change. > > In this case, since the scheduler doesn't care about these fields they should be added to the ignore list in the test. Ah cool. I will close/reopen to rebuild, but I think this one is good-to-go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-904457121 Hey @malthe - I think this one, will be very powerful. After some discussions on Slack recently, the potential of those overrideable hooks is very interesting for a number of cases :). Glad we got it in for 2.2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-916379253 One of the good cases is to distinguish between DEV/PROD environments. For example some of the operators could be skipped on DEV environments and you might want to add single method throwing "Ski[p" exception in the DEV environment. This is mostly a feature that promotes good engineering practices such as being able to run the same code on different environments, wtihout changing the DAGs or adding artifficial logic in the DAG structure, to handle different execution environments. I think this is the main case (often raised by our users in Slack). If we do not have hooks, the users would have to develop custom operators of different kind to handle different behaviours. Having the configurable hooks you can deploy the same DAGs for different environments without adding complex logic or having to add custom operators. Many of our users (and DAG writers) are Data scientists. You could potentially have the same effect by extending N operators and adding pre/post hooks there but for those users classes and objects are "alien" - they think functional and adding pre/post hooks is so much easier, especially that you can much more easliy have one common function that you pass to a number of different, completley unrelated operators. This is a classic "cross-cutting-concern" implementation IMHO. This is at least what has been my motivation when guiding and approving this one. Any other idea how to approach it? I am not against other solutions, and I would like to hear what anti-patterns it promotes (in the context I described). Could you tell @ash what antipattern is in play here, as I am not sure I see it ? Of course if it somehow makes other parts (non-completed AIP's especially) much more difficult to implement. we can change the way it is implemented, but I think the cabability of making "cross-cutting-concern" - for example tied to an environment where the DAG is executed is a powerful feature that might make the deployment and testing DAGs at scale much easier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-916395891 Just to add one more thing @ashb - if there are other features in upcoming AIP's that might make it useless/obsolete/outdated/replaceable - I am all for it. We just simply might not be aware of some of it as we are not following it super closely (I tried with as much time I could spare) - but if there are other ways to achieve those things mentined above or thing that they can break that you are aware of - it would be great to align here. We might simply not know everything you and others working on new AIPs know :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-916426991 I think I even discussed it with @malthe that time based decision will be better done based on AIP-39. So I agree @ashb - this is likely not a good use of those pre/post_hooks. see https://github.com/apache/airflow/pull/17545 discussion . I agree timetable will be WAY better way of handling the "skip_unless"" " ( and honestly I have not paid attention at the example in PR "description" - more at the implementation). And explicit better and implicit surely - I would never encourage a "logical change" in processing the DAG based on those. I think we are aligned here. But the cross-cutting concerns which are not time-related, but environment-related, I still think there is a good use of it. I actually spent last ~ month or so, mostly tuned to listen, analyse respond and help users in slack/stackoverflow/github discussion and this has been explicitly asked at least once (I think this is where the idea was born that it might make sense0: https://apache-airflow.slack.com/archives/CCR6P6JRL/p1629705521304600 but I also personally think that better support for more "enterprise'y" setup where you have "mostly the same but subtly different" environment and "same dags but behaving slightly differently" is something that is really useful. And just one more comment - this is just a bit easier way of doing something that is entirely possible today, so while I understand we might not want to "promote" it, just having pre/post hooks is encouraging similar patterns. Maybe we should rather get rid of those pre/post hooks if we want to go 'clean" as it promotes even "worse" patterns. For me the pre/post executed looked entirely unnatural and useless to be honest - up until we added the possibility of overriding those (which is the cross-cutting-concern I am mentioning), If anything, I think it makes it much cleaner and easier to manage. (Pseudo code of course) Before: ``` def pre_hook(): raise AirflowSkipException() if DEV def MyCustomOperatorA(OperatorA): def pre_execute(): pre_hook() def MyCustomOperatorB(OperatorB): def pre_execute(): pre_hook() with DAG() as dag:. a = MyCustomOperatorA() b = MyCustomOperatorB() a >> b ``` After: ``` def pre_hook(): raise AirflowSkipException() if DEV with DAG() as dag:. a = OperatorA(pre_execute = pre_hook) b = OperatorB(pre_execute = pre_hook) a >> b ``` But I might of course be biased, I usually think more from "development" point of view - so not "how easy and clean it is to run it in produtiton" but "how easy it is to develop and test it in the dev/test/staging env before it gets into PROD". I just see DAG development as a process rather than end-goal. This might be very biased and maybe it can be achieved in a simpler way :). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-916792442 > I'd like to get 2.2 release train started, and in order to not delay that unnecessarily how about we mark this feature as experimental -- that way if we decide we _don't_ want it for any reason we can remove it before 3.0 (as otherwise it's removeal would be a breaking change under SemVer) Absolutely no problem for me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-916793392 We can even print warning when it is used, that it is experimental. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-920781011 I think that might be indeed why @ashb insisted on making that feature experimental - because I see that each one of us has different "use case" in mind when looking at the pre/post hooks. Now when I think of it, it's really closing the gap between more `functional` and more `object oriented` approach of writing DAGs and I think we need to probably figure out a better way of doing that without this hybrid approach where we have Operator classes and functional extensions of them, that does sound like a bit of a "monster hybrid". The thing is - there is nothing "more" that feature adds - you can do everything you do with the pre-/post- hooks currently (also what you do @JavierLopezT ) but in a slightly less verbose way (with less classes and more functions). I do not yet how, but I have a feeling that maybe indeed we should try to find a bit nicer way to incorporate this in the current direction we are taking with more "decorator" / "task" oriented way of writing dags and make sure that the current operator classes ( we have sooo many useful ones) can be much easier integrated with the "functional" way of writing dags and incorporating that pre/post behaviour there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks
potiuk commented on pull request #17576: URL: https://github.com/apache/airflow/pull/17576#issuecomment-986363233 Another possibly interesting use case for pre_execute: https://apache-airflow.slack.com/archives/CCVDRN0F9/p1638740065139100 Basically adding a "session" attribute that would authorize the call with PythonOperators - defined at a DAG level. Pretty useful IMHO: ``` def authorize(self): CREDENTIALS, _= google.auth.default(scopes=['https://www.googleapis.com/auth/cloud-platform']) self.session = AuthorizedSession(CREDENTIALS) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org