[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-12 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-898028425


   > Is the goal of this PR to replace & generalize this use case or the real 
issue you are trying to solve is something else ?
   
   Something completely different. Pre/postoperators in Vertica are SQL 
statements called before/after (Those are strings rather than callables)
   
   What this PR tries to achieve is to provide an option to override 
pre_/post_execute methods from the base operator with ones provided as 
constructor parameters. While it was already possible to  override them by 
extending existing operators, it makes it a bit easier (and with less mental 
barriers for DAG writers) to provide such methods when you create task in DAG. 
While previously it required to create a new class and use the new class as 
operator, this on allows to create methods and pass them as parameters of the 
cosnstructor - which is easier for many users who write the DAGs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-12 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-898029293


   LGTM - but serialisation needs to be updated to skip those new fields (tests 
are failing).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-13 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-898261046


   Looking at the test, there must be a reason the test it here (not only to 
annoy the user) and it says explicitly that new fields should not be added, so 
that makes me wonder if just adding the fields to the test is a good idea..
   
   @kaxil - is that OK that we add new fields to Base Operator? Will that work 
for already serialized Tasks? to de-serialize them without problems ? Or do we 
need to do something else besides adding the two two fields to the test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-18 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-901396372


   Hey @kaxil -> do you think we can just add _pre/_post to the test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-22 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-903326972


   Hey @kaxil - any comments here? And @malthe - woudl you please rebase after 
the comment from @kaxil?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-22 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-903329602


   I was asking @kaxil  for comment about the serialisation and reason why the 
tests explicitly mention come fields :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-23 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-904010653


   > These two new fields you add don't need to be serialized (and callables 
can't generally be anyway) -- the general rule is that things needed by the 
Scheduler should be serialized, and that test was there to make people think 
about the change.
   > 
   > In this case, since the scheduler doesn't care about these fields they 
should be added to the ignore list in the test.
   
   Ah cool.  I will close/reopen to rebuild, but I think this one is good-to-go.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-08-24 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-904457121


   Hey @malthe  - I think this one, will be very powerful. After some 
discussions on Slack recently, the potential of those overrideable hooks is 
very interesting for a number of cases :). Glad we got it in for 2.2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-09-09 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-916379253


   One of the good cases is to distinguish between DEV/PROD environments. 
   
   For example some of the operators  could be skipped on DEV environments and 
you might want to add single method throwing "Ski[p" exception in the DEV 
environment. 
   
   This is mostly a feature that promotes good engineering practices such as 
being able to run the same code on different environments, wtihout changing the 
DAGs or adding artifficial logic in the DAG structure, to handle different 
execution environments. 
   
   I think this is the main case (often raised by our users in Slack). If we do 
not have hooks, the users would have to develop custom operators of different 
kind to handle different behaviours. Having the configurable hooks you can 
deploy the same DAGs for different environments without adding complex logic or 
having to add custom operators.
   
   Many of our users (and DAG writers) are Data scientists. You could 
potentially have the same effect by extending N operators and adding pre/post 
hooks there but for those users classes and objects are "alien" - they think 
functional and adding pre/post hooks is so much easier, especially that you can 
much more easliy have one common function that you pass to a number of 
different, completley unrelated operators. This is a classic 
"cross-cutting-concern" implementation IMHO.
   
   This is at least what has been my motivation when guiding and approving this 
one.
   
   Any other idea how to approach it? I am not against other solutions, and I 
would like to hear what anti-patterns it promotes (in the context I described). 
Could you tell @ash what antipattern is in play here, as I am not sure  I see 
it ?
   
   Of course if it somehow makes other parts (non-completed AIP's especially) 
much more difficult to implement. we can change the way it is implemented, but 
I think the cabability of making "cross-cutting-concern" - for example tied to 
an environment where the DAG is executed is a powerful feature that might make 
the deployment and testing DAGs at scale much easier. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-09-09 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-916395891


   Just to add one more thing @ashb - if there are other features in upcoming 
AIP's that might make it useless/obsolete/outdated/replaceable - I am all for 
it. We just simply might not be aware of some of it as we are not following it 
super closely (I tried with as much time I could spare) - but if there are 
other ways to achieve those  things mentined above or thing that they can break 
that you are aware of  - it would be great  to align here. 
   
   We might simply not know everything you and others working on new AIPs know 
:)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-09-09 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-916426991


I think I even discussed it with @malthe that time based decision will be 
better done based on AIP-39. So I agree @ashb - this is likely not a good use 
of those pre/post_hooks. see https://github.com/apache/airflow/pull/17545 
discussion . I
   
   agree timetable will be WAY better way of handling the "skip_unless"" 
"  ( and honestly I have not paid attention at the example in PR "description"  
- more at the implementation). And explicit better and implicit surely - I 
would never encourage a "logical change" in processing the DAG based on those. 
I think we are aligned here.
   
   But the cross-cutting concerns which are not time-related, but 
environment-related, I still think there is a good use of it. I actually spent 
last ~ month or so, mostly tuned to listen, analyse respond and help users in 
slack/stackoverflow/github discussion and this has been explicitly asked at 
least once (I think this is where the idea was born that it might make sense0: 
https://apache-airflow.slack.com/archives/CCR6P6JRL/p1629705521304600 but I 
also personally think that better support for more "enterprise'y" setup where 
you have "mostly the same but subtly different" environment and "same dags but 
behaving slightly differently" is something that is really useful.
   
   And just one more comment - this is just a bit easier way of doing something 
that is entirely possible today, so while I understand we might not want to 
"promote" it, just having pre/post hooks is encouraging similar patterns. Maybe 
we should rather get rid of those pre/post hooks if we want to go 'clean" as it 
promotes even "worse" patterns. For me the pre/post executed looked entirely 
unnatural and useless to be honest - up until we added the possibility of 
overriding those (which is the cross-cutting-concern I am mentioning), If 
anything, I think it makes it much cleaner and easier to manage.
   
   (Pseudo code of course)
   
   Before:
   
   ```
   def pre_hook():
raise AirflowSkipException() if DEV
   
   def MyCustomOperatorA(OperatorA):
def pre_execute():
  pre_hook()
   
   def MyCustomOperatorB(OperatorB):
def pre_execute():
  pre_hook()
   
   with DAG() as dag:.
   a = MyCustomOperatorA()
   b =  MyCustomOperatorB() 
   
   a >> b
   ```
   
   After:
   
   ```
   def pre_hook():
raise AirflowSkipException() if DEV
   
   with DAG() as dag:.
   a = OperatorA(pre_execute = pre_hook)
   b =  OperatorB(pre_execute = pre_hook)
   a >> b 
   ```
   
   But I might of course be biased, I usually think more from "development" 
point of view - so not "how easy  and clean it is to run it in produtiton" but 
"how easy it is to develop and test it in the dev/test/staging env before it 
gets into PROD". I just see DAG development as a process rather than end-goal. 
This might be very biased and maybe it can be achieved in a simpler way :). 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-09-10 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-916792442


   > I'd like to get 2.2 release train started, and in order to not delay that 
unnecessarily how about we mark this feature as experimental -- that way if we 
decide we _don't_ want it for any reason we can remove it before 3.0 (as 
otherwise it's removeal would be a breaking change under SemVer)
   
   Absolutely no problem for me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-09-10 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-916793392


   We can even print warning when it is used, that it is experimental.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-09-16 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-920781011


   I think that might be indeed why @ashb insisted on making that feature 
experimental - because I see that each one of us has different "use case" in 
mind when looking at the pre/post hooks.
   
   Now when I think of it, it's really closing the gap between more 
`functional` and more `object oriented` approach of writing DAGs and I think we 
need to probably figure out a better way of doing that without this hybrid 
approach where we have Operator classes and functional extensions of them, that 
does sound like a bit of a "monster hybrid".
   
   The thing is - there is nothing "more" that feature adds - you can do 
everything you do with the pre-/post- hooks currently (also what you do 
@JavierLopezT ) but in a slightly less verbose way (with less classes and more 
functions).
   
   I do not yet how, but I have a feeling that maybe indeed we should try to 
find a bit nicer way to incorporate this in the current direction we are taking 
with more "decorator" / "task" oriented way of writing dags and make sure that 
the current operator classes ( we have sooo many useful ones) can be much 
easier integrated with the "functional" way of writing dags and incorporating 
that pre/post behaviour there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #17576: Add pre/post execution hooks

2021-12-05 Thread GitBox


potiuk commented on pull request #17576:
URL: https://github.com/apache/airflow/pull/17576#issuecomment-986363233


   Another possibly interesting use case for pre_execute: 
   
   https://apache-airflow.slack.com/archives/CCVDRN0F9/p1638740065139100
   
   
   Basically adding a "session" attribute that would authorize the call with 
PythonOperators - defined at a DAG level. Pretty useful IMHO: 
   
   ```
   def authorize(self):
 CREDENTIALS, _= 
google.auth.default(scopes=['https://www.googleapis.com/auth/cloud-platform'])
 self.session = AuthorizedSession(CREDENTIALS)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org