Re: [DISCUSSION] Assessing what is a breaking change for Airflow (SemVer context)

Jarek Potiuk Thu, 26 Jan 2023 00:21:24 -0800

Yeah. The PR was mostly to bring things together- i.e to be able to
say "this is really a public interface of ours". And by just doing it,
I was actually hoping it will spark those kind of discussions where we
will really detail on what it means for each of those to be public.

Having it written down and part of our docs makes it easy to spark
such discussion by pointing to it and asking "do we really mean that
or that?"

I think once we make a few passes and polish it even more and get to
2.6 release when this documentation part will be first time published,
we **could** vote on it before (or maybe lazy-consent if we won't see
many doubts raised).

Whether it's an AIP - I don't think so, It's mostly about things that
are there that our "intentions" are as public. It's not a legally
binding document (I don't think anyone can make a legally binding
backwards compatibility statement), it's merely a statement of intent
of the community that our users can treat like something that they can
build on when extending Airflow. And understand the rules and
expectations when they look for details of each of the "components" we
publish.

Coming back to the original question:

I personally don't think we should limit just Base Executor interface
- because I can imagine there will be enough cases where extending
existing Kubernetes Executor will be far easier than just using the
interface and our users should understand some of the expectations
they might have when doing so.

One of the ways we could make our intentions clear there is to
describe and follow the Python rules for what's internal and what's
public. (under or even dunder in front of the class/method name). This
is not "entirely" clear (protected vs private is somewhat blurry
here). But maybe we could agree on some rules that we will apply in
all such "public" interfaces of ours. We could do it with the executor
classes and methods that we do not "intend" to be extended and rename
them with the under or dunder in front)

Also It can be implemented in one of two ways:

1) mark the methods/classes that we consider as "changeable" via
adding dunder in front and describe it to the users so that they can
understand what the rules are and make the right decisions
2) make it a bit more formal, automate and easier to use by the users
in automated way - just follow what we have done with "common.sql"
where we implemented a process to generate and control generated .pyi
stubs:

Pre-commit that takes care of it:
https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py
Example generated stub file:
https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi

Recently, we've gone through some first "teething" problems with the
"stubgen" approach and while it is not perfect, we managed to bend it
a bit to our expectations even if the original stubgen generator had
some flaws. And I think we would be able to generalise the common.sql
approach to all kinds of public interfaces we have.

One problem with the stub approach is that it's hard to distinguish
"internal" vs. "external" use of some of the interfaces, However in
our case, rather than generate the .pyi files and storing it in the
code of airflow, we could simply generate and publish the "typeshed"
interface for Apache Airflow which will ONLY contain the "externally
public" interfaces (https://github.com/python/typeshed) . What
typeshed is (from the typeshed github):

""
Typeshed contains external type annotations for the Python standard
library and Python builtins, as well as third party packages as
contributed by people external to those projects.
This data can e.g. be used for static analysis, type checking or type inference.
"""

This way a user who wishes to extend airflow could simply install the
`types-apache-airflow==2.6.0`  and this will give a very strong base
on understanding what is / is not public  - usable in automated way

I think it would not be complex to generate such a typeshed package if
we go that direction and it would help us also to protect from
accidental changes (same as we do in common.sql - by automating the
part where we check if the generated interface has not changed in
backwards-incompatible way).

J.

On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:
>
> Following up here... that PR has been merged.... At some point we should 
> probably have a vote on that, if it's meant to be actual binding policy.  
> Maybe we're still feeling it out?  But "what is public" is a pretty 
> fundamental concern to the project.  Maybe such a policy itself should be an 
> AIP?
>
> Meanwhile, going down into the weeds a bit, there's one aspect of the doc 
> which came up here: 
> https://github.com/apache/airflow/pull/29147#discussion_r1086214750
>
> It quotes the "policy":
>
> > Airflow has a set of Executors that are considered public. You are free to 
> > extend their functionality
>
> Do we really mean that all executors are public?  Or do we just mean that the 
> executor interface is public, i.e. BaseExecutor?
>
> It's of course better for the healthy development of our built-in executors 
> if we have no backcompat guarantees and can change them as needed.  But yes 
> this means that anyone else "building on" our executors would be wise to 
> "vendor in" our executor code instead of just subclassing. Maybe it's a 
> reasonable assumption that any user with the sophistication and, perhaps, 
> chutzpah to customize one of our executors, has also the sophistication to 
> manage this.
>
> What do y'all think?
>
>
>
>
>

Re: [DISCUSSION] Assessing what is a breaking change for Airflow (SemVer context)

Reply via email to