TL;DR: I would like to ask the community's opinion on reducing (or even
removing) the automated imports we have in `airflow/__init__.py` for
Airflow 2.0.

This issue has been plaguing us for quite a while and I think we have a
perfect opportunity to solve it in Airflow 2.0. Currently our
`airflow/__init__.py` file contains the code I copied below. While it looks
fairly innocent, it causes a lot of problems - importing anything
from any airflow package automatically imports probably 90% of the airflow
internal code - all models, configuration, utils, Task Instance,
BaseOperator and plenty of others (we also initialise all plugins, even
where they are mostly not needed). What this really means is that we have
implicit dependencies in our code, and they cause various side effects:

   - pylint detects cyclic dependencies that are super-hard and sometimes
   impossible to remove
   - mypy and pylint are very slow - mypy's parallel mode is slowed down by
   having to parse the whole of airflow in multiple instances, and pylint
   cannot be run in parallel at all as it starts behaving randomly with
   regard to cyclic dependency detection
   - we cannot really apply pylint and type annotations to most of the core
   classes as that would add even more cyclic dependencies
   - last but not least - our CLI is really, really slow because of that -
   right now any CLI command, even `airflow version`, has to pull in and
   initialise all the classes. Solving that slowness is impossible without
   removing the `__init__.py` code
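
To put a number on how much a single import drags in, something like the
sketch below could be used. It is only an illustration, not an existing
Airflow tool: the helper name `count_modules_pulled_in` is made up, and it
simply imports one module in a fresh interpreter and counts how many new
entries appear in `sys.modules`.

```python
import subprocess
import sys


def count_modules_pulled_in(module_name: str) -> int:
    """Import `module_name` in a fresh interpreter and return how many
    modules were newly added to sys.modules by that single import.

    Hypothetical helper for illustration only.
    """
    probe = (
        "import sys, importlib\n"
        "before = set(sys.modules)\n"
        f"importlib.import_module({module_name!r})\n"
        "print(len(set(sys.modules) - before))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # The imported package may print to stdout itself, so only the
    # last line (our own print) is parsed.
    return int(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Pass the module to inspect on the command line; "json" is just a
    # default that is always available.
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    print(target, count_modules_pulled_in(target))
```

Running it against a small module such as `airflow.version` (assuming
Airflow is installed) would show how many modules the package `__init__.py`
loads as a side effect. For per-module timings, `python -X importtime`
(Python 3.7+) gives a more detailed breakdown.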

The effect of removing these imports is that most DAGs and plugins written
so far for 1.10.* will not be compatible with Airflow 2.0 - import paths
will have to be changed in all of those DAGs.

However, as I see it, that's not a problem at all. People will have to
perform a migration from 1.10.* -> 2.0 anyway and we know it's not going to
be seamless. We are going to write some tools for the migration, and
changing such import paths is a very easy fix that we can automate.
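
As a rough idea of what such automation might look like, here is a minimal
sketch. It is not the actual migration tool: the `NEW_HOMES` table and the
function names are hypothetical, and the mapping only covers names that the
current `airflow/__init__.py` (quoted below) re-exports. It handles simple
single-line `from airflow import ...` statements and deliberately leaves
anything more complex (aliases, parentheses, trailing comments) untouched.

```python
import re

# Hypothetical mapping: names currently re-exported by airflow/__init__.py
# and the modules they are imported from there.
NEW_HOMES = {
    "DAG": "airflow.models.dag",
    "AirflowException": "airflow.exceptions",
    "LoggingMixin": "airflow.utils.log.logging_mixin",
    "conf": "airflow.configuration",
}

IMPORT_RE = re.compile(r"^(\s*)from airflow import (.+?)\s*$")


def rewrite_line(line: str) -> str:
    """Rewrite one `from airflow import X, Y` line to explicit imports."""
    match = IMPORT_RE.match(line)
    if not match:
        return line
    indent, raw_names = match.groups()
    names = [name.strip() for name in raw_names.split(",")]
    # Bail out on anything that is not a plain list of identifiers
    # (aliases, parentheses, comments): those need manual review.
    if not all(name.isidentifier() for name in names):
        return line
    moved = [
        f"{indent}from {NEW_HOMES[name]} import {name}"
        for name in names
        if name in NEW_HOMES
    ]
    if not moved:
        return line
    # Names we do not know about (e.g. submodules) stay as they were.
    kept = [name for name in names if name not in NEW_HOMES]
    if kept:
        moved.append(f"{indent}from airflow import {', '.join(kept)}")
    return "\n".join(moved)


def rewrite_source(source: str) -> str:
    """Apply rewrite_line to every line of a DAG or plugin file."""
    return "\n".join(rewrite_line(line) for line in source.split("\n"))
```

For example, `rewrite_line("from airflow import DAG")` yields
`from airflow.models.dag import DAG`. A real tool would more likely work on
the syntax tree than on regular expressions, but the rewrite itself is
mechanical.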

I'd love to hear the community's opinion on that.

J.


*Current `airflow/__init__.py`:*

from typing import Callable, Optional

from airflow import utils
from airflow import settings
from airflow import version
from airflow.utils.log.logging_mixin import LoggingMixin
from airflow.configuration import conf
from airflow.exceptions import AirflowException
from airflow.models.dag import DAG

__version__ = version.version

settings.initialize()

from airflow.plugins_manager import integrate_plugins

login: Optional[Callable] = None

integrate_plugins()

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
