Re: [DISCUSSION] Assessing what is a breaking change for Airflow (SemVer context)

Jarek Potiuk Mon, 05 Dec 2022 08:37:31 -0800

> I've just skimmed the PR a bit and honestly it feels a bit clunky. It relies 
> yet again on some after-coding-enforcement and some kind of IDL to generate a 
> stub. Isn't there a kind of pragma that we could use? Or a decorator?


Just a comment on that: it uses "stubgen" to generate the .pyi files
from Python code. No IDL. We just get the public methods of the package.
The whole complexity comes from stubgen generating "stub-only" packages
(so some boilerplate empty __init__.py code) + handling the case
where we remove a class that was there before + post-processing in
case we have some exceptions. All the stubs are automatically
generated from existing .py files. No need to even decorate stuff - we
just need to point to the packages/classes that should be our API.
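
To illustrate the idea of flagging changes to a generated API
description, here is a minimal sketch. It is not the code from the PR
and it does not call stubgen: it uses the standard-library `ast` module
as a stand-in to collect public signatures from source text and compare
two versions. The names `public_api`, `MyHook`, `run` and
`fetch_handler` are hypothetical, chosen only for the example.

```python
import ast


def public_api(source: str) -> set[str]:
    """Collect public top-level functions, classes and their public methods.

    A simplified stand-in for a generated stub: names starting with "_"
    are treated as private and skipped.
    """
    entries: set[str] = set()
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.parse(source).body:
        if node.name.startswith("_") if hasattr(node, "name") else True:
            continue
        if isinstance(node, func_types):
            entries.add(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef):
            entries.add(f"class {node.name}")
            for item in node.body:
                if isinstance(item, func_types) and not item.name.startswith("_"):
                    entries.add(
                        f"def {node.name}.{item.name}({ast.unparse(item.args)})"
                    )
    return entries


# Hypothetical "before" and "after" versions of a module.
OLD = """
class MyHook:
    def run(self, sql, parameters=None): ...
    def _helper(self): ...

def fetch_handler(cursor): ...
"""

NEW = """
class MyHook:
    def run(self, sql): ...

def _internal(): ...
"""

removed = public_api(OLD) - public_api(NEW)  # candidates for "breaking"
added = public_api(NEW) - public_api(OLD)

for entry in sorted(removed):
    print("removed or changed:", entry)
for entry in sorted(added):
    print("added or changed:", entry)
```

A pre-commit check along these lines would fail whenever `removed` is
non-empty, which forces a conscious decision about whether the change
is breaking.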

On Mon, Dec 5, 2022 at 4:37 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> on 1)
>
> I've just skimmed the PR a bit and honestly it feels a bit clunky. It relies 
> yet again on some after-coding-enforcement and some kind of IDL to generate a 
> stub. Isn't there a kind of pragma that we could use? Or a decorator?
>
> like:
> (see also: https://gist.github.com/latsa/1f3ed52784a6fb423a937aa030679117)
>
> @public
> def my_func(): ...
>
> or
> def my_func(): ...  # pragma: public
>
> I like the immediate feedback instead of MyPy checks which also fail only 
> during tests. I think the decorator style is used in other projects too, but 
> I don't have references.
>
> Again I just skimmed it so I am shooting from the hip as they say ;-)
>
>
>
> On Sat, 3 Dec 2022 at 10:25, Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> Sorry for not following up on this for a bit - it's been hectic these
>> days for me. I think valid points were said, and from the tone of
>> those I feel that we all who participated have the same sense of what
>> is important:
>>
>> 1) users' "peace of mind" as top priority: clarity of what they can
>> expect from Airflow, and avoiding surprises when upgrading
>> 2) targeting minimal disruption to users' workflows (though we might
>> never reach absolute 100%)
>> 3) making it easy for contributors and maintainers to decide on
>> breaking/non-breaking behaviours
>>
>> I think there is a main blocker to all of those (also mentioned in the
>> discussion above):
>>
>> We are extremely cautious about any change because there is a lack of
>> agreement/expectations with our users on what is supposed to be the
>> "public API".
>>
>> # Proposal
>>
>> My proposal is to work on documenting our approach for our users (and
>> for maintainers) in a single page: "What is Airflow Public API?" and
>> what users can expect.
>>
>> There are certain areas where we can define rules and either automate
>> or document (or both) our statement about what is the "public" API and
>> (more importantly) what is clearly NOT, in a single-page document.
>> It should also be accompanied (where possible) by some
>> automation and tooling that would help us to express it in detail (and
>> help our users to validate if they are conforming to the "public
>> API").
>>
>> We won't solve it very quickly, but once we start doing it, it might
>> turn out that it's not that long of a process in fact. And if we start
>> it now - in a few months we might be in a different place.
>>
>> # Some concrete actions we might take
>>
>> 1) On the "Code" level - we can start to define the API that is
>> considered as "public" and add verification of those for our users. We
>> could implement a similar solution to what I proposed to common.sql
>> https://github.com/apache/airflow/pull/27962 (where I followed Ash's
>> idea to use MyPy stubgen and pre-commits to flag changes to it, and
>> where we harness MyPy capabilities to control how the API is used). I
>> believe that we could apply a similar solution to all providers and
>> eventually even all parts of core, to make it very clear which part of
>> the Airflow API is public and which is not. I think MyPy and
>> strong-ish typing is taking the Python world by storm, and we could
>> use it as a standard way of communicating to those who use Airflow as
>> a library, which parts are "public".
>>
>> Having .pyi files as part of our packages with "hidden" parts that are
>> not supposed to be exposed seems to be not only a nice communication
>> tool but also has support for all kinds of tooling from day 0 for
>> our users (IDE integrations, automations to check if the right API is
>> used etc.). We could even easily provide guidelines for the users
>> "Here is how you can check if you are using Airflow code properly".
>> Not 100% foolproof but much better than anything else I can imagine.
>>
>> Also, having it in place will allow the providers to finally be
>> split into separate repositories - and we could use MyPy checks
>> rather than running the full test suite with the Providers to verify
>> that changes in Airflow do not break Providers. That would finally make
>> it possible to loosen the coupling we have between Providers and
>> Airflow (currently we basically run the whole suite of tests to be
>> certain things are working - but we could simply check providers with
>> MyPy if we have proper .pyi files: not the same confidence but very,
>> very close).
>>
>> 2) On the DB level - we already have "AIP-44" as the foundation for
>> telling the users: this is "Airflow", and you can do "this" when you
>> write your DAGs. Direct DB access will be forbidden and we can
>> specifically communicate to the users "do not use DB any more" and we
>> can even work out warnings when our users do. We could even make it
>> the default behaviour later to block direct access (but that is
>> likely only in Airflow 3).
>>
>> 3) On the UI level - we could simply explain that UI changes are
>> exempt from the "no removal" policy. We might simply treat all the UI
>> changes as non-breaking by default and loosen our strictness there.
>> This would be very close to the Chrome/Firefox example by Bolke - I
>> think UI changes are not breaking in the sense that you have to fix
>> your code that uses it, it simply requires changing users' habits.
>> We've already done this. That would simply be acknowledging the
>> approach we already used when TreeView was replaced by GridView.
>>
>> 4) Airflow also has a few non-code interfaces that are considered
>> as part of the platform: statsd metrics is one of them. I can't think
>> of any more but maybe there are more. We could simply make an
>> inventory and discuss our approach on those ONCE and document it. This
>> will avoid discussions, discussions, discussions, and let our users
>> have some clear expectations and maintainers make quick decisions
>> when approving (or not) PRs.
>>
>> # Question
>>
>> Does it sound like a good plan? Is it worth making such an effort? Or
>> maybe what we have as status-quo is "good enough" and that would be a
>> waste of effort? WDYT?
>>
>> J.
>>
>
>
>
> --
>
> --
> Bolke de Bruin
> bdbr...@gmail.com
