potiuk commented on PR #35097: URL: https://github.com/apache/airflow/pull/35097#issuecomment-1773816055
Yeah but it does matter and I think we should explain it to educate the users. Similarly as the user who asked the question about PEP8 conflicting with our advice, they might not realize they are heavily impacting performance of scheduler/DAG file processor. Over the last few years or so @uranusjr spend enormous effort on moving a number of imports of ours to make sure airflow imports faster, shaving 100s of milliseconds sometimes precisely because it **actually** matters. I also added answer to the same question in Slack of our (seems that the author have been asking in multiple places) https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1697819766612259 > Things evolve and PEP8 written 5th of July 2001 have not foreseen many of things that happened during those 22 years > Look at this Ruff rule for example: https://docs.astral.sh/ruff/rules/banned-module-level-imports/ > And Ruff is the "de-facto" linter for Python that took Python world by the storm and provides way more linting capabilities than PEP8 did > So if you compare the two, you have the choice of following 22 years old rule for that or something that became the de-facto standard over the last 2 years or so. Up to you, but our advice is to follow the modern trends. So while PEP8 - years ago was good advice, it seems that Python community recognised that and "top level imports for expensive modules" is the recommended practice (and explicitly showing how to do it makes sense IMHO). The advice from Ruff authors mentions `torch` and `tensorflow` - both being expensive and popular in data science world. So why don't we simply clarify it, for example this way - if we would like to give really precise advice to our users (I took it from some of our example DAGs) ```python # It's ok to import modules that are not expensive to load at top-level of DAG file import random import pendulum # Expensive imports should be avoided as top level imports because DAG is imported frequently # even if it does not follow PEP8 advice (PEP8 have not foreseen that certain imports will be very expensive) # DON'T DO THAT - import them locally instead (see below) # # import pandas # import torch # import tensorflow # @task() def do_stuff_with_pandas_and_torch(): import pandas import torch do_some_operations_using_these @task() def do_stuff_with_tensorflow(): import tensorflow do_some_operations_using_these ..... Would you find this confusing ? I think it would be rather help our users with decisions how to write the DAGs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org