potiuk commented on PR #35097:
URL: https://github.com/apache/airflow/pull/35097#issuecomment-1773816055

   Yeah but it does matter and I think we should explain it to educate the 
users. Similarly as the user who asked the question about PEP8 conflicting with 
our advice, they might not realize they are heavily impacting performance of 
scheduler/DAG file processor. 
   
   Over the last few years or so @uranusjr spend enormous effort on moving a 
number of imports of ours to make sure airflow imports faster, shaving 100s of 
milliseconds sometimes precisely because it **actually** matters. 
   
   I also added answer to the same question in Slack of our (seems that the 
author have been asking in multiple places)
   https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1697819766612259
   
   > Things evolve and PEP8 written 5th of July 2001 have not foreseen many of 
things that happened during those 22 years
   > Look at this Ruff rule for example: 
https://docs.astral.sh/ruff/rules/banned-module-level-imports/
   > And Ruff is the "de-facto" linter for Python that took Python world by the 
storm and provides way more linting capabilities than PEP8 did
   > So if you compare the two, you have the choice of following 22 years old 
rule for that or something that became the de-facto standard over the last 2 
years or so. Up to you, but our advice is to follow the modern trends.
   
   
   So while PEP8 - years ago was good advice, it seems that Python community 
recognised that and "top level imports for expensive modules" is the 
recommended practice (and explicitly showing how to do it makes sense IMHO).
   
   The advice from Ruff authors mentions `torch` and `tensorflow` - both being 
expensive and popular in data science world. 
   
    So why don't we simply clarify it, for example this way - if we would like 
to give really precise advice to our users (I took it from some of our example 
DAGs)
   
   ```python
   
   # It's ok to import modules that are not expensive to load at top-level of 
DAG file
   import random
   import pendulum
   
   # Expensive imports should be avoided as top level imports because DAG is 
imported frequently
   # even if it does not follow PEP8 advice (PEP8 have not foreseen that 
certain imports will be very expensive)
   # DON'T DO THAT - import them locally instead (see below)
   #
   # import pandas
   # import torch
   # import tensorflow
   #
   
         @task()
         def do_stuff_with_pandas_and_torch():
             import pandas
             import torch 
             do_some_operations_using_these
   
   
         @task()
         def do_stuff_with_tensorflow():
             import tensorflow
             do_some_operations_using_these
   .....
   
   
   Would you find this confusing ? I think it would be rather help our users 
with decisions how to write the DAGs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to