potiuk commented on PR #35097:
URL: https://github.com/apache/airflow/pull/35097#issuecomment-1774812370

   > FWIW I’ve been wondering if it’s worthwhile to implement some magic in 
Dagprocessor to automatically move imports inside task functions so people can 
write DAG files “normally” but receive the function-level import benefits.
   
   That would be rather super-magical if we manage to pull it off IMHO. I think 
the **most** we should do is to detect and warn about such expensive imports 
(which, BTW, I think is a good idea) - but manipulating the sources or bytecode 
of DAG files written by the user is very dangerous. Not only will it change line 
numbers for debugging, but there are a number of edge cases - for example, a 
user might **really** have a good reason to keep even expensive imports at 
module (top) level.
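
   The "detect and warn" idea could be sketched with a simple AST scan over the parsed DAG source. This is only a rough illustration, not anything in Airflow today; the `EXPENSIVE` set of module names and the function name are hypothetical:

   ```python
   import ast

   # Hypothetical deny-list of modules considered expensive to import.
   EXPENSIVE = {"tensorflow", "torch", "pandas", "scipy"}

   def find_expensive_top_level_imports(source: str):
       """Return (module, lineno) pairs for expensive imports at module level."""
       tree = ast.parse(source)
       hits = []
       for node in tree.body:  # only statements at the top level of the file
           if isinstance(node, ast.Import):
               for alias in node.names:
                   root = alias.name.split(".")[0]
                   if root in EXPENSIVE:
                       hits.append((root, node.lineno))
           elif isinstance(node, ast.ImportFrom) and node.module:
               root = node.module.split(".")[0]
               if root in EXPENSIVE:
                   hits.append((root, node.lineno))
       return hits

   dag_src = "import tensorflow as tf\nimport os\n"
   print(find_expensive_top_level_imports(dag_src))  # [('tensorflow', 1)]
   ```

   Note this only catches direct imports in the DAG file itself - the transitive case below is exactly why a static scan alone is not enough.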
   
   There are also all the "transitive" cases - a DAG imports utility code that 
itself imports tensorflow. The utility import is then equally expensive. Should 
we move the whole utility import inside a task? Which task? Maybe the utility 
module also initializes some code that might be needed for all tasks (like 
setting variables needed to authenticate inside the organisation) etc. etc. We 
really do not want to get involved in those.
   
   But we **could** potentially warn if, after the DAG is parsed, we see an 
import that we consider "expensive". That would be very simple to implement and 
a nice feature, I think. We could even likely consider measuring (by some smarts 
or monkeypatching of Python stdlib code, I think) the time it takes to do 
imports and automatically flag imports that take (say) > 0.2s. Why not?
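
   The timing/monkeypatching idea could look roughly like this - wrap `builtins.__import__`, time each call, and flag anything over the threshold. A hypothetical sketch (the 0.2s cutoff is the number from above; `json` stands in for an expensive library, and cached imports will be near-instant on re-import):

   ```python
   import builtins
   import time

   IMPORT_TIME_THRESHOLD = 0.2  # seconds; hypothetical cutoff
   _slow_imports = {}
   _real_import = builtins.__import__

   def _timed_import(name, *args, **kwargs):
       start = time.perf_counter()
       module = _real_import(name, *args, **kwargs)
       elapsed = time.perf_counter() - start
       root = name.split(".")[0]
       if elapsed > IMPORT_TIME_THRESHOLD:
           # keep the slowest observation per top-level module
           _slow_imports[root] = max(elapsed, _slow_imports.get(root, 0.0))
       return module

   builtins.__import__ = _timed_import
   try:
       import json  # stand-in for a heavy import done while parsing a DAG file
   finally:
       builtins.__import__ = _real_import  # always restore the real import

   for mod, secs in _slow_imports.items():
       print(f"WARNING: importing {mod!r} took {secs:.2f}s")
   ```

   `importlib` also exposes hooks (meta path finders) that would be a less invasive place to hang the timing, at the cost of a bit more code.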


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
