> - We may not want to make a strong statement like "all executors are public". > It's just
> impossible IMHO. > - Let's just mark a limited number of key methods/interfaces in each executor > as public, > and we ensure a strong backcompat for them. For any methods/interfaces not > marked, no > backcompat guaranteed even between minor versions. This should also be added > into > documentation, as well as highlighted in the executor docstrings. If you missed something, then so did I. I think that is an accurate summary. ________________________________ From: Xiaodong Deng <xdd...@apache.org> Sent: Thursday, January 26, 2023 9:13 PM To: dev@airflow.apache.org Subject: RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking change for Airflow (SemVer context) CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Happy to see the change I made was used as an example here ;-) Honestly the discussion is a bit long and getting increasingly general, so a bit hard to follow for me. But if I understand the conversation & issue correctly, here are my thoughts: - We may not want to make a strong statement like "all executors are public". It's just impossible IMHO. - Let's just mark a limited number of key methods/interfaces in each executor as public, and we ensure a strong backcompat for them. For any methods/interfaces not marked, no backcompat guaranteed even between minor versions. This should also be added into documentation, as well as highlighted in the executor docstrings. I believe I must have missed some important points in the conversation, or even misunderstood some points. But just summarizing what I have understood as well as what I would prefer. Regards, XD On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote: I understand it's "convenient" if they want to extend k8s executor. But I guess my thought is, I'm not sure that convenience outweighs the competing good which is, freedom to develop k8s executor. I'm not saying don't extend k8s executor -- please do, and contribute it back. But for subclassing / extending, that feels more like "you're on your own", in the sense of, "just vendor the executor". For example in a recent PR, XD added better support for running k8s executor with multiple namespace. Now instead of a single `watcher` on k8s executor we have a `watchers` dict<https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>. I suppose it's "technically" breaking, in a subclass sense. I don't really think that's the kind of backcompat we should worry about when making changes to k8s executor. Certainly the user-facing executor behavior should follow backcompat guarantees. For example we can't just change the name of `pod_override` (in executor config) since that would break user workflows. Similar, with a conf setting... renaming it without backcompat etc. Where the dag writer "feels" the executor, it should respect semver, certainly. But I'm not sure we need to guarantee backcompat of its internal methods. Another real example: Jed right now is trying to rename pod_id to pod_name, a sensible cleanup. But k8s internals are "public", we can't really do that. And we could imagine lots of grey areas. Thought experiment: maybe we want to move some method call out of `sync` and run it periodically in a thread. That would change what `sync` does and if a subclass depends on that, well I guess we've broken that subclass. WDYT? On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <ja...@potiuk.com<mailto:ja...@potiuk.com>> wrote: Yeah. The PR was mostly to bring things together- i.e to be able to say "this is really a public interface of ours". And by just doing it, I was actually hoping it will spark those kind of discussions where we will really detail on what it means for each of those to be public. Having it written down and part of our docs makes it easy to spark such discussion by pointing to it and asking "do we really mean that or that?" I think once we make a few passes and polish it even more and get to 2.6 release when this documentation part will be first time published, we **could** vote on it before (or maybe lazy-consent if we won't see many doubts raised). Whether it's an AIP - I don't think so, It's mostly about things that are there that our "intentions" are as public. It's not a legally binding document (I don't think anyone can make a legally binding backwards compatibility statement), it's merely a statement of intent of the community that our users can treat like something that they can build on when extending Airflow. And understand the rules and expectations when they look for details of each of the "components" we publish. Coming back to the original question: I personally don't think we should limit just Base Executor interface - because I can imagine there will be enough cases where extending existing Kubernetes Executor will be far easier than just using the interface and our users should understand some of the expectations they might have when doing so. One of the ways we could make our intentions clear there is to describe and follow the Python rules for what's internal and what's public. (under or even dunder in front of the class/method name). This is not "entirely" clear (protected vs private is somewhat blurry here). But maybe we could agree on some rules that we will apply in all such "public" interfaces of ours. We could do it with the executor classes and methods that we do not "intend" to be extended and rename them with the under or dunder in front) Also It can be implemented in one of two ways: 1) mark the methods/classes that we consider as "changeable" via adding dunder in front and describe it to the users so that they can understand what the rules are and make the right decisions 2) make it a bit more formal, automate and easier to use by the users in automated way - just follow what we have done with "common.sql" where we implemented a process to generate and control generated .pyi stubs: Pre-commit that takes care of it: https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py Example generated stub file: https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi Recently, we've gone through some first "teething" problems with the "stubgen" approach and while it is not perfect, we managed to bend it a bit to our expectations even if the original stubgen generator had some flaws. And I think we would be able to generalise the common.sql approach to all kinds of public interfaces we have. One problem with the stub approach is that it's hard to distinguish "internal" vs. "external" use of some of the interfaces, However in our case, rather than generate the .pyi files and storing it in the code of airflow, we could simply generate and publish the "typeshed" interface for Apache Airflow which will ONLY contain the "externally public" interfaces (https://github.com/python/typeshed) . What typeshed is (from the typeshed github): "" Typeshed contains external type annotations for the Python standard library and Python builtins, as well as third party packages as contributed by people external to those projects. This data can e.g. be used for static analysis, type checking or type inference. """ This way a user who wishes to extend airflow could simply install the `types-apache-airflow==2.6.0` and this will give a very strong base on understanding what is / is not public - usable in automated way I think it would not be complex to generate such a typeshed package if we go that direction and it would help us also to protect from accidental changes (same as we do in common.sql - by automating the part where we check if the generated interface has not changed in backwards-incompatible way). J. On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote: > > Following up here... that PR has been merged.... At some point we should > probably have a vote on that, if it's meant to be actual binding policy. > Maybe we're still feeling it out? But "what is public" is a pretty > fundamental concern to the project. Maybe such a policy itself should be an > AIP? > > Meanwhile, going down into the weeds a bit, there's one aspect of the doc > which came up here: > https://github.com/apache/airflow/pull/29147#discussion_r1086214750 > > It quotes the "policy": > > > Airflow has a set of Executors that are considered public. You are free to > > extend their functionality > > Do we really mean that all executors are public? Or do we just mean that the > executor interface is public, i.e. BaseExecutor? > > It's of course better for the healthy development of our built-in executors > if we have no backcompat guarantees and can change them as needed. But yes > this means that anyone else "building on" our executors would be wise to > "vendor in" our executor code instead of just subclassing. Maybe it's a > reasonable assumption that any user with the sophistication and, perhaps, > chutzpah to customize one of our executors, has also the sophistication to > manage this. > > What do y'all think? > > > > >