On 1): I've just skimmed the PR and honestly it feels a bit clunky. It relies yet again on after-the-fact enforcement and some kind of IDL to generate a stub. Isn't there a kind of pragma that we could use? Or a decorator?
Something like this (see also: https://gist.github.com/latsa/1f3ed52784a6fb423a937aa030679117):

    @public
    def my_func():
        ...

or

    def my_func():  # pragma: public
        ...

I like the immediate feedback, as opposed to MyPy checks, which only fail during tests. I think the decorator style is used in other projects too, but I don't have references.

Again, I just skimmed it, so I am shooting from the hip, as they say ;-)

On Sat, 3 Dec 2022 at 10:25, Jarek Potiuk <ja...@potiuk.com> wrote:

> Sorry for not following up on this for a bit - it's been hectic these
> days for me. I think valid points were made, and from the tone of
> those I feel that we all who participated have the same sense of what
> is important:
>
> 1) users' "peace of mind" as top priority: clarity of what they can
> expect from Airflow, and avoiding surprises when upgrading
> 2) targeting minimal disruption to users' workflows (though we might
> never reach absolute 100%)
> 3) making it easy for contributors and maintainers to decide on
> breaking/non-breaking behaviours
>
> I think there is a main blocker to all of those (also mentioned in the
> discussion above):
>
> We are extremely cautious about any change because there is a lack of
> agreement/expectations with our users on what is supposed to be the
> "public API".
>
> # Proposal
>
> My proposal is to work on documenting our approach for our users (and
> for maintainers) in a single page: "What is Airflow Public API?" and
> what users can expect.
>
> There are certain areas where we can define rules and either automate
> or document (or both) our statement about what is the "public" API and
> (more importantly) what is clearly NOT on a single page document.
> It should also be accompanied (where possible) with some
> automation and tooling that would help us to express it in detail (and
> help our users to validate if they are conforming to the "public
> API").
>
> We won't solve it very quickly, but once we start doing it, it might
> turn out that it's not that long of a process in fact. And if we start
> it now - in a few months we might be in a different place.
>
> # Some concrete actions we might take
>
> 1) On the "Code" level - we can start to define the API that is
> considered as "public" and add verification of those for our users. We
> could implement a similar solution to what I proposed to common.sql
> https://github.com/apache/airflow/pull/27962 (where I followed Ash's
> idea to use MyPy stubgen and pre-commits to flag changes to it, and
> where we harness MyPy capabilities to control how the API is used). I
> believe that we could apply a similar solution to all providers and
> eventually even all parts of core, to make it very clear which part of
> the Airflow API is public and which is not. I think MyPy and
> strong-ish typing is taking the Python world by storm, and we could
> use it as a standard way of communicating to those who use Airflow as
> a library, which parts are "public".
>
> Having .pyi files as part of our packages with "hidden" parts that are
> not supported to be exposed, seems to be not only a nice communication
> tool but also has support for all kinds of tooling from day 0 for
> our users (IDE integrations, automations to check if the right API is
> used etc.). We could even easily provide guidelines for the users
> "Here is how you can check if you are using Airflow code properly".
> Not 100% foolproof but much better than anything else I can imagine.
>
> Also having it in place will allow the providers to be finally
> separated to separate repositories - and we could use MyPy checks
> rather than running the full test suite with the Providers to verify
> if changes in Airflow do not break Providers. That would finally make
> it possible to loosen the coupling we have between Providers and
> Airflow (currently we basically run the whole suite of tests to be certain
> things are working - but we could simply run providers with MyPy
> checks if we have proper .pyi files (not the same confidence but very,
> very close)).
>
> 2) On the DB level - we already have "AIP-44" as the foundation of
> telling the users - those are the "Airflow" you can do "this" when you
> write your DAGs. Direct DB access will be forbidden and we can
> specifically communicate to the users "do not use DB any more" and we
> can even work out warnings when our users do. We could even make it a
> default behaviour later to block direct access by default (but that is
> likely only in Airflow 3).
>
> 3) On the UI level - we could simply explain that UI changes are
> exempt from the "no removal" policy. We might simply treat all the UI
> changes as non-breaking by default and loosen our strictness there.
> This would be very close to the Chrome/Firefox example by Bolke - I
> think UI changes are not breaking in the sense that you have to fix
> your code that uses it, it requires simply changing user's habits.
> We've already done this. That would simply be acknowledging the
> approach we already used when TreeView was replaced by GridView.
>
> 4) Airflow also has a few non-code interfaces that are considered
> as part of the platform: statsd metrics is one of them. I can't think
> of any more but maybe there are more. We could simply make an
> inventory and discuss our approach on those ONCE and document it. This
> will avoid discussions, discussions, discussions, and let our users
> have some clear expectations and maintainers making quick decisions
> when approving (or not) PRs.
>
> # Question
>
> Does it sound like a good plan? Is it worth making such an effort? Or
> maybe what we have as status-quo is "good enough" and that would be a
> waste of effort? WDYT?
>
> J.
>
> J

--
Bolke de Bruin
bdbr...@gmail.com