Yep. I see your point Daniel. And yes. I think you are right it might be too much to add back-compat guarantees to all executors.
After a bit of thinking on it, I am perfectly fine with clarification that "Executor interface" is public but "K8S executor" internal details might be changed any time. Happy to merge such clarification to the docs - maybe someone can phrase it well. XD Yeah, your summary is perfect. J. On Fri, Jan 27, 2023 at 6:14 AM Xiaodong Deng <[email protected]> wrote: > Happy to see the change I made was used as an example here ;-) > > Honestly the discussion is a bit long and getting increasingly general, so > a bit hard to follow for me. But if I understand the conversation & issue > correctly, here are my thoughts: > > - We may not want to make a strong statement like "*all executors are > public*". It's just impossible IMHO. > - Let's just mark a limited number of key methods/interfaces in each > executor as public, and we ensure a strong backcompat for them. For any > methods/interfaces not marked, no backcompat guaranteed even between minor > versions. This should also be added into documentation, as well as > highlighted in the executor docstrings. > > I believe I must have missed some important points in the conversation, or > even misunderstood some points. But just summarizing what I have understood > as well as what I would prefer. > > > Regards, > XD > > On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish > <[email protected]> wrote: > >> I understand it's "convenient" if they want to extend k8s executor. But >> I guess my thought is, I'm not sure that convenience outweighs the >> competing good which is, freedom to develop k8s executor. I'm not saying >> don't extend k8s executor -- please do, and contribute it back. But for >> subclassing / extending, that feels more like "you're on your own", in the >> sense of, "just vendor the executor". >> >> For example in a recent PR, XD added better support for running k8s >> executor with multiple namespace. Now instead of a single `watcher` on k8s >> executor we have a `watchers` dict >> <https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>. >> I suppose it's "technically" breaking, in a subclass sense. I don't really >> think that's the kind of backcompat we should worry about when making >> changes to k8s executor. >> >> Certainly the user-facing executor *behavior *should follow backcompat >> guarantees. For example we can't just change the name of `pod_override` >> (in executor config) since that would break user workflows. Similar, with >> a conf setting... renaming it without backcompat etc. Where the dag writer >> "feels" the executor, it should respect semver, certainly. But I'm not >> sure we need to guarantee backcompat of its internal methods. >> >> Another real example: Jed right now is trying to rename pod_id to >> pod_name, a sensible cleanup. But k8s internals are "public", we can't >> really do that. And we could imagine lots of grey areas. Thought >> experiment: maybe we want to move some method call out of `sync` and run it >> periodically in a thread. That would change what `sync` does and if a >> subclass depends on that, well I guess we've broken that subclass. >> >> WDYT? >> >> >> On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <[email protected]> wrote: >> >>> Yeah. The PR was mostly to bring things together- i.e to be able to >>> say "this is really a public interface of ours". And by just doing it, >>> I was actually hoping it will spark those kind of discussions where we >>> will really detail on what it means for each of those to be public. >>> >>> Having it written down and part of our docs makes it easy to spark >>> such discussion by pointing to it and asking "do we really mean that >>> or that?" >>> >>> I think once we make a few passes and polish it even more and get to >>> 2.6 release when this documentation part will be first time published, >>> we **could** vote on it before (or maybe lazy-consent if we won't see >>> many doubts raised). >>> >>> Whether it's an AIP - I don't think so, It's mostly about things that >>> are there that our "intentions" are as public. It's not a legally >>> binding document (I don't think anyone can make a legally binding >>> backwards compatibility statement), it's merely a statement of intent >>> of the community that our users can treat like something that they can >>> build on when extending Airflow. And understand the rules and >>> expectations when they look for details of each of the "components" we >>> publish. >>> >>> Coming back to the original question: >>> >>> I personally don't think we should limit just Base Executor interface >>> - because I can imagine there will be enough cases where extending >>> existing Kubernetes Executor will be far easier than just using the >>> interface and our users should understand some of the expectations >>> they might have when doing so. >>> >>> One of the ways we could make our intentions clear there is to >>> describe and follow the Python rules for what's internal and what's >>> public. (under or even dunder in front of the class/method name). This >>> is not "entirely" clear (protected vs private is somewhat blurry >>> here). But maybe we could agree on some rules that we will apply in >>> all such "public" interfaces of ours. We could do it with the executor >>> classes and methods that we do not "intend" to be extended and rename >>> them with the under or dunder in front) >>> >>> Also It can be implemented in one of two ways: >>> >>> 1) mark the methods/classes that we consider as "changeable" via >>> adding dunder in front and describe it to the users so that they can >>> understand what the rules are and make the right decisions >>> 2) make it a bit more formal, automate and easier to use by the users >>> in automated way - just follow what we have done with "common.sql" >>> where we implemented a process to generate and control generated .pyi >>> stubs: >>> >>> Pre-commit that takes care of it: >>> >>> https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py >>> Example generated stub file: >>> >>> https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi >>> >>> Recently, we've gone through some first "teething" problems with the >>> "stubgen" approach and while it is not perfect, we managed to bend it >>> a bit to our expectations even if the original stubgen generator had >>> some flaws. And I think we would be able to generalise the common.sql >>> approach to all kinds of public interfaces we have. >>> >>> One problem with the stub approach is that it's hard to distinguish >>> "internal" vs. "external" use of some of the interfaces, However in >>> our case, rather than generate the .pyi files and storing it in the >>> code of airflow, we could simply generate and publish the "typeshed" >>> interface for Apache Airflow which will ONLY contain the "externally >>> public" interfaces (https://github.com/python/typeshed) . What >>> typeshed is (from the typeshed github): >>> >>> "" >>> Typeshed contains external type annotations for the Python standard >>> library and Python builtins, as well as third party packages as >>> contributed by people external to those projects. >>> This data can e.g. be used for static analysis, type checking or type >>> inference. >>> """ >>> >>> This way a user who wishes to extend airflow could simply install the >>> `types-apache-airflow==2.6.0` and this will give a very strong base >>> on understanding what is / is not public - usable in automated way >>> >>> I think it would not be complex to generate such a typeshed package if >>> we go that direction and it would help us also to protect from >>> accidental changes (same as we do in common.sql - by automating the >>> part where we check if the generated interface has not changed in >>> backwards-incompatible way). >>> >>> J. >>> >>> On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish >>> <[email protected]> wrote: >>> > >>> > Following up here... that PR has been merged.... At some point we >>> should probably have a vote on that, if it's meant to be actual binding >>> policy. Maybe we're still feeling it out? But "what is public" is a >>> pretty fundamental concern to the project. Maybe such a policy itself >>> should be an AIP? >>> > >>> > Meanwhile, going down into the weeds a bit, there's one aspect of the >>> doc which came up here: >>> https://github.com/apache/airflow/pull/29147#discussion_r1086214750 >>> > >>> > It quotes the "policy": >>> > >>> > > Airflow has a set of Executors that are considered public. You are >>> free to extend their functionality >>> > >>> > Do we really mean that all executors are public? Or do we just mean >>> that the executor interface is public, i.e. BaseExecutor? >>> > >>> > It's of course better for the healthy development of our built-in >>> executors if we have no backcompat guarantees and can change them as >>> needed. But yes this means that anyone else "building on" our executors >>> would be wise to "vendor in" our executor code instead of just subclassing. >>> Maybe it's a reasonable assumption that any user with the sophistication >>> and, perhaps, chutzpah to customize one of our executors, has also the >>> sophistication to manage this. >>> > >>> > What do y'all think? >>> > >>> > >>> > >>> > >>> > >>> >>
