I'm massively in favour of this. And as a side effect it would solve an issue I reported almost two years ago: https://issues.apache.org/jira/browse/AIRFLOW-1931 (https://issues.apache.org/jira/browse/AIRFLOW-1931?jql=project%20%3D%20AIRFLOW%20AND%20text%20~%20%22logging%20import%22)
The one outstanding question is how/where we move settings.initialize and integrate_plugins to. I'm specifically thinking of use cases outside of someone running an airflow subcommand, such as in tests, where you want airflow to be initialized.

Perhaps:

    import airflow; airflow.initialize()

Or I wonder if we need that at all? Things should maybe integrate plugins when they need to (by making a property/method somewhere that is memoized) and likewise in settings? Callers not having to do this would be nicer, certainly.

-a

On Feb 15 2020, at 12:31 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> TL;DR; I would like to ask the community for opinion about reducing (or
> even removing) the number of automated imports we have in
> `airflow/__init__.py` for Airflow 2.0.
>
> This issue has been plaguing us for quite a while already and I think we have a
> perfect opportunity to solve it in Airflow 2.0. Currently our
> `airflow/__init__.py` file contains the code I copied below. While looking
> fairly innocent it causes a lot of problems - because importing anything
> from any airflow package automatically imports probably 90% of the airflow
> internal code - all models, configurations, utils, Task Instance,
> BaseOperator and plenty others (also we initialise all plugins where they
> are mostly not needed).
> What it really is - we have implicit dependencies
> in our code that are causing various side effects:
>
> - pylint detects cyclic dependencies that are super-hard and sometimes
>   impossible to remove
> - mypy and pylint are very slow - mypy parallel mode is slowed down by
>   having to parse whole airflow in multiple instances, and pylint cannot be
>   run in parallel at all as it starts behaving randomly with regard to cyclic
>   dependency detection
> - we cannot really apply pylint and type annotations to most of the core
>   classes as it will add even more cyclic dependencies
> - last but not least - our CLI is really, really slow because of that -
>   right now any CLI command, even `airflow version`, has to pull in and
>   initialise all the classes. Solving that slowness is impossible without
>   removing the __init__.py code
>
> The effect of this change is that most of the DAGs and plugins written so far
> for 1.10.* will not be compatible with Airflow 2.0 - in all of the DAGs
> import paths will have to be changed.
>
> However as I see it - it's not a problem whatsoever. People will have to
> perform migration from 1.10.* -> 2.0 and we know it's not going to be
> seamless. We are going to write some tools for the migration and changing
> such import paths is a super easy fix that we can automate super-easily.
>
> I'd love to hear community opinion on that.
> J.
>
> *Current `airflow/__init__.py`:*
> from typing import Callable, Optional
> from airflow import utils
> from airflow import settings
> from airflow import version
> from airflow.utils.log.logging_mixin import LoggingMixin
> from airflow.configuration import conf
> from airflow.exceptions import AirflowException
> from airflow.models.dag import DAG
>
> __version__ = version.version
> settings.initialize()
> from airflow.plugins_manager import integrate_plugins
> login: Optional[Callable] = None
> integrate_plugins()
> --
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129