Yep. I see your point  Daniel. And yes. I think you are right it might be
too much to add back-compat guarantees to all executors.

After a bit of thinking on it, I am perfectly fine with clarification that
"Executor interface" is public but "K8S executor" internal details might be
changed any time. Happy to merge such clarification to the docs  - maybe
someone can phrase it well.

XD

Yeah, your summary is perfect.

J.

On Fri, Jan 27, 2023 at 6:14 AM Xiaodong Deng <[email protected]> wrote:

> Happy to see the change I made was used as an example here ;-)
>
> Honestly the discussion is a bit long and getting increasingly general, so
> a bit hard to follow for me. But if I understand the conversation & issue
> correctly, here are my thoughts:
>
> - We may not want to make a strong statement like "*all executors are
> public*". It's just impossible IMHO.
> - Let's just mark a limited number of key methods/interfaces in each
> executor as public, and we ensure a strong backcompat for them. For any
> methods/interfaces not marked, no backcompat guaranteed even between minor
> versions. This should also be added into documentation, as well as
> highlighted in the executor docstrings.
>
> I believe I must have missed some important points in the conversation, or
> even misunderstood some points. But just summarizing what I have understood
> as well as what I would prefer.
>
>
> Regards,
> XD
>
> On Thu, Jan 26, 2023 at 9:45 AM Daniel Standish
> <[email protected]> wrote:
>
>> I understand it's "convenient" if they want to extend k8s executor.  But
>> I guess my thought is, I'm not sure that convenience outweighs the
>> competing good which is, freedom to develop k8s executor.  I'm not saying
>> don't extend k8s executor -- please do, and contribute it back.  But for
>> subclassing / extending, that feels more like "you're on your own", in the
>> sense of, "just vendor the executor".
>>
>> For example in a recent PR, XD added better support for running k8s
>> executor with multiple namespace.  Now instead of a single `watcher` on k8s
>> executor we have a `watchers` dict
>> <https://github.com/apache/airflow/pull/28047/files#diff-681de8974a439f70dfa41705f5c1681ecce615fac6c4c715c1978d28d8f0da84L254>.
>> I suppose it's "technically" breaking, in a subclass sense.  I don't really
>> think that's the kind of backcompat we should worry about when making
>> changes to k8s executor.
>>
>> Certainly the user-facing executor *behavior *should follow backcompat
>> guarantees.  For example we can't just change the name of `pod_override`
>> (in executor config) since that would break user workflows.  Similar, with
>> a conf setting... renaming it without backcompat etc.  Where the dag writer
>> "feels" the executor, it should respect semver, certainly.  But I'm not
>> sure we need to guarantee backcompat of its internal methods.
>>
>> Another real example: Jed right now is trying to rename pod_id to
>> pod_name, a sensible cleanup.  But k8s internals are "public", we can't
>> really do that.  And we could imagine lots of grey areas.  Thought
>> experiment: maybe we want to move some method call out of `sync` and run it
>> periodically in a thread.  That would change what `sync` does and if a
>> subclass depends on that, well I guess we've broken that subclass.
>>
>> WDYT?
>>
>>
>> On Thu, Jan 26, 2023 at 12:21 AM Jarek Potiuk <[email protected]> wrote:
>>
>>> Yeah. The PR was mostly to bring things together- i.e to be able to
>>> say "this is really a public interface of ours". And by just doing it,
>>> I was actually hoping it will spark those kind of discussions where we
>>> will really detail on what it means for each of those to be public.
>>>
>>> Having it written down and part of our docs makes it easy to spark
>>> such discussion by pointing to it and asking "do we really mean that
>>> or that?"
>>>
>>> I think once we make a few passes and polish it even more and get to
>>> 2.6 release when this documentation part will be first time published,
>>> we **could** vote on it before (or maybe lazy-consent if we won't see
>>> many doubts raised).
>>>
>>> Whether it's an AIP - I don't think so, It's mostly about things that
>>> are there that our "intentions" are as public. It's not a legally
>>> binding document (I don't think anyone can make a legally binding
>>> backwards compatibility statement), it's merely a statement of intent
>>> of the community that our users can treat like something that they can
>>> build on when extending Airflow. And understand the rules and
>>> expectations when they look for details of each of the "components" we
>>> publish.
>>>
>>> Coming back to the original question:
>>>
>>> I personally don't think we should limit just Base Executor interface
>>> - because I can imagine there will be enough cases where extending
>>> existing Kubernetes Executor will be far easier than just using the
>>> interface and our users should understand some of the expectations
>>> they might have when doing so.
>>>
>>> One of the ways we could make our intentions clear there is to
>>> describe and follow the Python rules for what's internal and what's
>>> public. (under or even dunder in front of the class/method name). This
>>> is not "entirely" clear (protected vs private is somewhat blurry
>>> here). But maybe we could agree on some rules that we will apply in
>>> all such "public" interfaces of ours. We could do it with the executor
>>> classes and methods that we do not "intend" to be extended and rename
>>> them with the under or dunder in front)
>>>
>>> Also It can be implemented in one of two ways:
>>>
>>> 1) mark the methods/classes that we consider as "changeable" via
>>> adding dunder in front and describe it to the users so that they can
>>> understand what the rules are and make the right decisions
>>> 2) make it a bit more formal, automate and easier to use by the users
>>> in automated way - just follow what we have done with "common.sql"
>>> where we implemented a process to generate and control generated .pyi
>>> stubs:
>>>
>>> Pre-commit that takes care of it:
>>>
>>> https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py
>>> Example generated stub file:
>>>
>>> https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/hooks/sql.pyi
>>>
>>> Recently, we've gone through some first "teething" problems with the
>>> "stubgen" approach and while it is not perfect, we managed to bend it
>>> a bit to our expectations even if the original stubgen generator had
>>> some flaws. And I think we would be able to generalise the common.sql
>>> approach to all kinds of public interfaces we have.
>>>
>>> One problem with the stub approach is that it's hard to distinguish
>>> "internal" vs. "external" use of some of the interfaces, However in
>>> our case, rather than generate the .pyi files and storing it in the
>>> code of airflow, we could simply generate and publish the "typeshed"
>>> interface for Apache Airflow which will ONLY contain the "externally
>>> public" interfaces (https://github.com/python/typeshed) . What
>>> typeshed is (from the typeshed github):
>>>
>>> ""
>>> Typeshed contains external type annotations for the Python standard
>>> library and Python builtins, as well as third party packages as
>>> contributed by people external to those projects.
>>> This data can e.g. be used for static analysis, type checking or type
>>> inference.
>>> """
>>>
>>> This way a user who wishes to extend airflow could simply install the
>>> `types-apache-airflow==2.6.0`  and this will give a very strong base
>>> on understanding what is / is not public  - usable in automated way
>>>
>>> I think it would not be complex to generate such a typeshed package if
>>> we go that direction and it would help us also to protect from
>>> accidental changes (same as we do in common.sql - by automating the
>>> part where we check if the generated interface has not changed in
>>> backwards-incompatible way).
>>>
>>> J.
>>>
>>> On Thu, Jan 26, 2023 at 12:28 AM Daniel Standish
>>> <[email protected]> wrote:
>>> >
>>> > Following up here... that PR has been merged.... At some point we
>>> should probably have a vote on that, if it's meant to be actual binding
>>> policy.  Maybe we're still feeling it out?  But "what is public" is a
>>> pretty fundamental concern to the project.  Maybe such a policy itself
>>> should be an AIP?
>>> >
>>> > Meanwhile, going down into the weeds a bit, there's one aspect of the
>>> doc which came up here:
>>> https://github.com/apache/airflow/pull/29147#discussion_r1086214750
>>> >
>>> > It quotes the "policy":
>>> >
>>> > > Airflow has a set of Executors that are considered public. You are
>>> free to extend their functionality
>>> >
>>> > Do we really mean that all executors are public?  Or do we just mean
>>> that the executor interface is public, i.e. BaseExecutor?
>>> >
>>> > It's of course better for the healthy development of our built-in
>>> executors if we have no backcompat guarantees and can change them as
>>> needed.  But yes this means that anyone else "building on" our executors
>>> would be wise to "vendor in" our executor code instead of just subclassing.
>>> Maybe it's a reasonable assumption that any user with the sophistication
>>> and, perhaps, chutzpah to customize one of our executors, has also the
>>> sophistication to manage this.
>>> >
>>> > What do y'all think?
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>

Reply via email to