potiuk commented on a change in pull request #6596: [AIRFLOW-6004] Untangle 
Executors class to avoid cyclic imports. Depends on [AIRFLOW-6010]
URL: https://github.com/apache/airflow/pull/6596#discussion_r347606197
 
 

 ##########
 File path: tests/dags/test_subdag.py
 ##########
 @@ -24,7 +24,7 @@
 
 from datetime import datetime, timedelta
 
-from airflow.models import DAG
+from airflow.models.dag import DAG
 
 Review comment:
   I think the main reason for our disagreement here is that (in this case) I think more about contributors (especially first-time ones), while you think more about DAG developers. This is the old "library" vs "application" distinction, and we are both kind of right (but with different users in mind).
   
   I agree that users should have a stable place to import DAG and AirflowException from (in this view airflow is a library), while the same import path confuses Airflow's own developers (and might easily lead to circular imports).
   
   Let's try to find an approach here that works for both. I have split the discussion into several smaller subjects to make it more productive.
   
   ## Finding imports
   
   I know not everyone ;) likes IDEs, but a lot of people do. This is what it looks like when I try to import AirflowException in PyCharm:
   
   Before my change:
   
   ![Screenshot 2019-11-18 at 20 58 
19](https://user-images.githubusercontent.com/595491/69085738-6c594680-0a46-11ea-9da0-a6356a575b5f.png)
   
   And after:
   
   ![Screenshot 2019-11-18 at 20 59 
01](https://user-images.githubusercontent.com/595491/69086124-84c96100-0a46-11ea-9950-786d76d0efaa.png)
   
   The class that throws the deprecation warning is no longer shown as importable, so I know exactly which class I have to import, and I even get the option to import it locally. Same with DAG. It's a nuance and not a must-have, but it's nice.
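   
   For context, the backwards-compatible shim I have in mind is roughly the pattern below - a minimal sketch of the general idea (a module-level `__getattr__`, PEP 562), written as an illustration rather than the actual contents of `airflow/__init__.py` in this PR:
   
   ```
   import warnings


   def __getattr__(name):
       # PEP 562 module-level __getattr__: called for attributes not found in
       # the module, which includes "from airflow import DAG".
       if name == "DAG":
           warnings.warn(
               "Importing DAG from the top-level airflow package is deprecated; "
               "use 'from airflow.models.dag import DAG' instead.",
               DeprecationWarning,
               stacklevel=2,
           )
           from airflow.models.dag import DAG
           return DAG
       raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
   ```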
   
   ## Result of the current situation
   
   I can assure you that I myself (and many other people) got confused at this point about which place we should import DAG from. And here is hard data, not guessing:
   
   ```
   [potiuk:~/code/airflow] remove-dag-out-of-airflow-package-import+ 3d6h54m19s 
± git show HEAD | grep 'from airflow import DAG' | grep ^- | wc
         80     346    2219
   [potiuk:~/code/airflow] remove-dag-out-of-airflow-package-import+ 3d6h54m23s 
± git show HEAD | grep 'from airflow.models import DAG' | grep ^- | wc
         97     507    4233
   ```
   We currently have 80 places where someone used `from airflow import DAG` and 97 places where someone used `from airflow.models import DAG`. Maybe we have some rules, but they are neither documented nor enforced.
   
   Can you tell which way is better? Are you sure people know the consequences of using each of them and the circular imports they might create this way? I really, really doubt it. As a new user I cannot reason about the distinction. I am confused about why those different import paths exist. I have no idea which import I should use in my code. It seems like people randomly choose an import without knowing the consequences. Certainly from an "application/contributors" point of view this is not good.
   
   ## DAG developer's perspective
   
   > I'm not a fan of this as it would require almost every single DAG that 
exists right now to be changed.
   
   It does not require them to change - it's merely a deprecation warning. But indeed it encourages them to use the models.dag package rather than airflow.DAG. And yes - I agree that having a single place to import from is good for airflow-as-a-library, but not necessarily good for airflow-as-an-application.
   
   So maybe we should do something else instead. Maybe the deprecation warning should only be thrown if you are importing DAG from the top-level airflow package within airflow core code itself? I will certainly look for a solution here - maybe you can also think of something? Ideally it would even work the other way round - with DAG developers discouraged from importing DAG from airflow.models.dag.
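   
   A very rough sketch of the first idea - this is purely my assumption of how it could work (caller inspection via frames, an invented helper name, and it presumes providers end up under `airflow.providers`), not anything implemented in this PR:
   
   ```
   import sys
   import warnings


   def warn_on_legacy_dag_import(depth=2):
       # `depth` should point at the frame of the module doing the legacy
       # import; the right value depends on where this helper is wired in,
       # so treat it as illustrative only.
       caller = sys._getframe(depth).f_globals.get("__name__", "")
       if caller.startswith("airflow.") and not caller.startswith("airflow.providers"):
           warnings.warn(
               f"{caller} is core Airflow code and should use "
               "'from airflow.models.dag import DAG' instead of the top-level import.",
               DeprecationWarning,
               stacklevel=depth + 1,
           )
   ```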
   
   And thinking more about it in the context of AIP-21 - I think those circular imports are only a problem for the "core" of airflow, and that the "providers" part should be treated differently, especially since we have to be careful about backporting. Since we plan for all the providers packages to be installable on 1.10.*, they should likely use airflow.DAG as the import.
   
   I hope you can agree with me that it makes sense for Airflow "core" in the airflow repository to use a single, direct import (airflow.models.dag.DAG), where circular imports will be least likely, while "providers" might use "airflow.DAG" for backwards compatibility (and to serve as a model for other integrations). I still have to see whether I can make the deprecation work the way I want.
   
   I think pre-commits + AIP-21 are a good place to start - we could in fact enforce different rules for different parts of the code and educate people this way.
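   
   For illustration, a hypothetical pre-commit hook script could look roughly like this (the hook, the file-layout rule and the names are invented for the example; pre-commit passes the staged file names as arguments):
   
   ```
   import re
   import sys
   from pathlib import Path

   # Flags "from airflow import DAG" (the legacy path) but not
   # "from airflow.models.dag import DAG".
   LEGACY_IMPORT = re.compile(r"^\s*from\s+airflow\s+import\s+.*\bDAG\b", re.MULTILINE)


   def find_offenders(filenames):
       offenders = []
       for name in filenames:
           path = Path(name)
           # Hypothetical rule: providers and example DAGs keep the legacy
           # import for 1.10.* compatibility, core code must not use it.
           if "providers" in path.parts or "example_dags" in path.parts:
               continue
           if LEGACY_IMPORT.search(path.read_text(encoding="utf-8")):
               offenders.append(name)
       return offenders


   if __name__ == "__main__":
       bad = find_offenders(sys.argv[1:])
       for name in bad:
           print(f"{name}: use 'from airflow.models.dag import DAG' in core code")
       sys.exit(1 if bad else 0)
   ```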
   
   WDYT @ashb?
   
   ## Avoiding cyclic imports 
   
   > We can manage the cyclic imports by explicitly moving some imports to the 
end of airflow/__init__.py
   
   Which ones? Do you know them by heart? I am utterly confused when it comes to the sequence of imports there, and I always have to go through many files and imports to understand what's going on: airflow imports executors -> which import the kubernetes executor -> which imports pod -> which imports pod launcher -> which imports baseoperator -> which imports dag -> which imports airflow... A good import structure should in fact be a... DAG (!). Only then can you start reasoning about it :).
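   
   A toy illustration of that point - the module names below are simplified stand-ins for the chain above, not the real Airflow module paths:
   
   ```
   # graphlib is in the standard library from Python 3.9.
   from graphlib import CycleError, TopologicalSorter

   # module -> modules it imports (simplified version of the chain above)
   imports = {
       "airflow": {"executors"},
       "executors": {"kubernetes_executor"},
       "kubernetes_executor": {"pod"},
       "pod": {"pod_launcher"},
       "pod_launcher": {"baseoperator"},
       "baseoperator": {"dag"},
       "dag": {"airflow"},  # ...and back to the top: a cycle
   }

   try:
       order = list(TopologicalSorter(imports).static_order())
       print("imports form a DAG; a safe load order is:", order)
   except CycleError as err:
       print("cyclic imports detected:", " -> ".join(err.args[1]))
   ```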
   
   ## Importing 'airflow' package first 
   
   > Once import airflow has happened then all the packages are loaded an 
import order doesn't matter, so we can handle this all "internally" (which we 
already do via our unit tests). If it's working right now, then no matter what 
order it is imported by the user no import cycles are possible is my 
understanding.
   
   But I think we should really be able to not import airflow first. I had a really hard time trying to fix some tests precisely because of the "import airflow first" requirement. That's an anti-pattern, especially because in tests we have to reload the configuration, since it has already been loaded as a side effect of the import. The way we run it now is really bad for tests.
   
   There is some initialisation magic happening in __init__ which then requires you to reload the configuration in a specific way - some tests require this, some do not. I know @nuclearpinguin also had a lot of problems trying to fix it in the pytest context (and I have run into it many times over a long period).
   
   In an ideal world there should (eventually) be no requirement to import airflow first. We should be able to run some initialisation method, or create an object, to bootstrap airflow, but importing airflow should not have this initialisation side-effect (it really is a side-effect).
   
   Running code in __init__.py is, I think, fine for a library (when you have to perform certain initialisations for whoever loads the library), but it's not really good for an application such as airflow, where you have a few clear entry points and can run the initialisation explicitly.
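   
   Something along these lines - a hypothetical sketch of an explicit bootstrap, with an invented function name, only to show the direction (nothing like this exists today):
   
   ```
   def initialize_airflow(config_file=None):
       """Explicitly load configuration, logging and plugins for one entry point."""
       # Deferred import so that merely importing this module has no side effects.
       from airflow.configuration import conf

       if config_file:
           conf.read(config_file)
       # ... configure logging, the ORM and plugins here ...


   def main():
       # e.g. the `airflow` CLI entry point would call this once, explicitly,
       # instead of relying on `import airflow` doing it implicitly.
       initialize_airflow()
       # ... dispatch to sub-commands ...
   ```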
   
   I hope that one day we will get there - and this import restructuring might 
be one of the steps that could get us there.
