I'm massively in favour of this. And as a side effect it would solve an issue a 
reports almost two years ago https://issues.apache.org/jira/browse/AIRFLOW-1931 
(https://issues.apache.org/jira/browse/AIRFLOW-1931?jql=project%20%3D%20AIRFLOW%20AND%20text%20~%20%22logging%20import%22)

The one outstanding question is how/where we move settings.initialize and 
integrate_plugins to. I'm specifically thinking of usecases outside of someone 
running an airflow subcommand, such as in tests, where you want airflow to be 
initialized.
Perhaps:
import airflow; airflow.initialize()
Or I wonder if we need that at all? Things sould maybe integrate plugins when 
they need to (by making a property/method somewhere that is memoized) and 
likewise in settings? Callers not having to do this would be nicer, certainly.
-a
On Feb 15 2020, at 12:31 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> TL;DR; I would like to ask the community for opinion about reducing (or
> even removing) the number of automated imports we have in
> `airflow/__init__.py` for Airflow 2.0.
>
> This issue is plaguing us for quite a while already and I think we have a
> perfect opportunity to solve it in AIrflow 2.0. Currently our
> `airflow/__init__.py` file contains the code I copied below. While looking
> fairly innocent it causes a lot of problems - because importing anything
> from any airflow package automatically imports probably 90% of the airflow
> internal code - all models, configurations, utils, Task Instance,
> BaseOperator and plenty others (also we initialise all plugins where they
> are mostly not needed). What it really is - we have implicit dependencies
> in our code that are causing various side effects:
>
> - pylint detects cyclic dependencies that are super-hard and sometimes
> impossible to remove
> - mypy and pylint are very slow - mypy parallel more is slowed down by
> having to parse whole airflow in multiple instances, and pylint cannot be
> run in parallel at all as it starts behaving randomly w/regards cyclic
> dependency detections
> - we cannot really apply pylint and type annotations to most of the core
> classes as it will add even more cyclic dependencies
> - last but not least - our CLI is really, really slow because of that -
> right now any CLI command even `airflow version` has to pull in and
> initialise all the classes. Solving that slowness is impossible without
> removing the __init__.py code
>
> The effect of this change is that most of DAGs and plugins written so far
> for 1.10.* will not be compatible with Airflow 2.0 - in all of the DAGs
> import paths will have to be changed.
>
> However as I see it - it's not a problem whatsoever. People will have to
> perform migration from 1.10.* -> 2.0 and we know it's not going to be
> seamless. We are going to write some tools for the migration and changing
> such import paths is super easy fix that we can automate super-easily.
>
> I'd love to hear community opinion on that.
> J.
>
> *Current `airflow/__init__.py`:*
> from typing import Callable, Optional
> from airflow import utils
> from airflow import settings
> from airflow import version
> from airflow.utils.log.logging_mixin import LoggingMixin
> from airflow.configuration import conf
> from airflow.exceptions import AirflowException
> from airflow.models.dag import DAG
>
> __version__ = version.version
> settings.initialize()
> from airflow.plugins_manager import integrate_plugins
> login: Optional[Callable] = None
> integrate_plugins()
> --
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Reply via email to