Hey everyone,

*TL;DR;* I wanted to discuss that maybe (not really sure- but I wanted to
check with others) we could migrate the whole codebase of airflow to
support PEP 563 - postponed evaluation of annotations. PEP 563 is nicely
supported in our case (Python 3.7+) by `from __future__ import annotations`
and pyupgrade automated pre-commit does the bulk of the work of converting
our code automatically - both current and future.

*Is there a POC?*

Yeah. I have a PR that implements it. I actually implemented it today -
literally in a few hours, in-between of other things.

* On one hand it is huge= 3625 files (+16,459 −11,989 lines)
* On the other hand it really only **really** changes type annotations (so
no real code impact) and those changes are semi-automated. I literally got
it "generated" in a couple of hours - and I can easily either rebase or
reapply it on top of new changes.

There are some spelling mistakes to fix and few tests to fix, but this
should be easy.

PR here: https://github.com/apache/airflow/pull/24816

*Why ?*

Our code gets more and more type-annotated and it's cool. But it does not
come for free (for now). The types for functions/classes are evaluated at
import time and it takes some computational effort. Also we have a number
of cases where we have "string" forward references. This is ugly and does
not get as good support for auto-completions and IDEs in general.

PEP 563 solves both problems. There is an alternative PEP 649 that could be
chosen in the future, but I think we can get some benefits now without
waiting for a decision (which even if it comes, will be only implemented in
3.11 or maybe in 3.12 and in the meantime we could get some benefits of PEP
563).

1) Runtime performance is improved in case we do not use type-hints at
runtime (we don't) - static type checkers work as usual, but at runtime
those annotations are placed in _annotations_ dictionary and evaluated only
when runtime type-hint checking is used.

2)  String forward references are not needed.
No longer `-> 'Type' when you define the type a few lines above, simple `->
Type` will do.

Those two on its own would be worth it IMHO, but there is more.

3) the new syntax of types from Python 3.10 is available and it's quite a
bit nicer when it comes to Optional usage. Example:

CLIENT_AUTH: Optional[Union[Tuple[str, str], Any]] = None

turns into:

CLIENT_AUTH: tuple[str, str] | Any | None = None

4) Interestingly enough - we do not need to do anything to use the new
syntax. The pyupgrade which we have in our pre-commit will automatically
convert the "old way" of using Optionals to the new one - as soon as you
add `from __future__ import annotations`. This also means that if anyone
uses the "old way" in the future, pre-commit will catch it and migrate -
this is actually a cool learning experience for all contributors I think as
they can learn and get usd to Python 3.10 syntax today. This conversion is
done in my PR, you can take a look.

5) It is overwhelmingly mostly backwards-compatible for Python 3.7 except
if you used runtime type annotation checking and analysis (which we did
not) except inferring multiple outputs in decorators (literally single
usage in the entire codebase). And the way we use it seems to be able to
continue to work after we add some small fixes - I think just our tests
need to be fixed (I think  9 of the failing tests to fix are falling into
this camp).

*Caveats*

Almost none since we are already Python 3.7+. Basically each of our
python files must start with:

from __future__ import annotations

Performance gets improved for imports. The PEP
https://peps.python.org/pep-0563/ claims that the overhead for the runtime
typing information needed is very small (memory wise) and since we are
almost not using runtime type-hint evaluation memory usage will be lower
because the type hints will not be evaluated at all during runtime.

And similarly, like we do not care about licenses any more (because they
are added automatically) by pre-commit, it could be the same with the
__future__ import.

PEP 649:

The caveat is  that https://peps.python.org/pep-0649/ - alternative
approach to deferring annotation is still being discussed and PEP 563 is
not confirmed.. PEP 649 is the reason why PEP 563 wasr removed from Python
3.10 - because there is no decision which one will go forward in the
future. Discussion here:
https://discuss.python.org/t/type-annotations-pep-649-and-pep-563/11363

However, one could think we are not really affected too much and even if we
have to change for 3.11 it will be as simple. We do not use some heavy
runtime type-hinting checks, and PEP 0649 `from __future__ import
co_annotations' seems to be an easy switch.

*How complex is it to migrate*

It is simple. We Just merge https://github.com/apache/airflow/pull/24816
and maybe add a pre-commit to force the __future__ import in all our files.
That's it.

I did it in a couple of hours and I think it is all that we need - the PR I
did could likely be merged today and it will be complete. Adding the
"__future__" import is very easily automate-able and `pyupgrade` does all
the rest. There is one exclusion that we have to make with
pyupgrade (gcs.py has "list()" method that interferes with list Type).

There were also a few type fixes that had to be done - but nothing major.

That's about it. No drama. Some tests/docs need fixing but looking at the
changes - very few. We can add pre-commits guarding it.

*Options*

In the PR I added the 'from __future__ import annotations` to all files
(semi-automatically). But it does not have to be like that. One can think
that this introduces some "noise". There are plenty of files that do not
need it currently.

On the other hand there is a nice "property" of having it in all the files:

1) We can even add it in the pre-commit to be injected automatically
(should be simple).
2) Pyupgrade will automatically upgrade any future "forward" refs and
optionals for us
3) we can keep consistent style of typing everywhere and remove __futures__
when we get to Python 3.10

But we can choose to do half-the way - for example do not add __future__ in
empty __init__.py files. Or something similar.

I am not 100% convinced myself, but maybe

WDYT? Anyone has experience with it? bad/good one?

J.

Reply via email to