Hello everyone. I opened a PR with the Public API description proposal:
https://github.com/apache/airflow/pull/28300

WARNING, CAUTION! You might be surprised if you have not followed the
recent ChatGPT stuff (if you've been hiding under a rock somewhere last
week), but about 80% of the PR was actually written this morning by the
ChatGPT chat bot, which I asked questions about the Public API.

It's a fantastic tool, and it has definitely been added to my toolbox when
it comes to writing documentation (I still believe we will need
documentation and humans, even with a chat bot answering most of the
questions and being really helpful in a number of cases).

Have fun reviewing what the chat bot has written (I already reviewed it,
corrected all the mistakes - there were a few - and added links and the
like).

The doc build will fail; I will correct it later. I just wanted to send it
as soon as I got it into a shape that I think makes sense in general, and
I am looking for comments.

J.

On Thu, Dec 8, 2022 at 8:35 PM Ferruzzi, Dennis <[email protected]> wrote:

> I think this sounds like a solid plan for moving forward. This whole
> discussion is kind of new to me so I've just been following along and
> learning, but the concept of defining what constitutes a Public API so we
> can iterate faster on "behind the scenes" stuff sounds like a solid plan.
> We have had a bunch of discussions lately about "well when/why would a
> user ever even USE that method?" and this will definitely make those
> discussions easier.
>
> ------------------------------
> *From:* Jarek Potiuk <[email protected]>
> *Sent:* Thursday, December 8, 2022 10:57 AM
> *To:* [email protected]
> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking change
> for Airflow (SemVer context)
>
> Does anyone have any more comments?
> I think I will make a PR soon
> describing the "Public API" of Airflow (as a work in progress) and I think
> we can discuss any details of it in the PR.
>
> Also the 'common.api' approach does not need to be the "chosen" one to
> automate the API description - this is just an example.
>
> Just so we all know - over the last few months I have been using
> "common.sql" as a "testbed" for various problems and approaches that
> involve common code and exposing it to multiple users in the context of
> Airflow. We already had some super valuable lessons (and there are quite
> a few PRs, issues and discussions I can refer to when we discuss
> potentially splitting out and separating providers in the future, as good
> examples of what "*might*" happen and what we should take care about
> if/when we split). So for me the current approach with MyPy, stubgen and
> the way we are going to keep "API" changes in check is an experiment that
> we **might** apply to Airflow if we find the approach useful.
>
> We do not have to decide now, and we do not even have to implement
> anything in order to define what we **think** the Airflow API is. I hope
> we can do the definition first and revisit the automation when we get
> some lessons from common.sql (which is a cool example because it's small
> but it evolves quickly enough to get some learnings - including some
> failures we learn from).
>
> This would be my current approach now:
>
> * start defining what the Public API is
> * learn more about "keeping the API in check" from common.sql
> * see how we can improve automation around the API check
>
> J.
>
> On Mon, Dec 5, 2022 at 8:58 PM Oliveira, Niko <[email protected]>
> wrote:
>
>> 1) users' "peace of mind" as top priority: clarity of what they can
>> expect from Airflow, and avoiding surprises when upgrading
>> 2) targeting minimal disruption to users' workflows (though we might
>> never reach absolute 100%)
>> 3) making it easy for contributors and maintainers to decide on
>> breaking/non-breaking behaviours
>>
>> Yup, I agree, this is an accurate encapsulation of the issues at hand.
>>
>> My proposal is to work on documenting our approach for our users (and
>> for maintainers) in a single page: "What is Airflow Public API?" and
>> what users can expect.
>>
>> I think this is actually a very important piece we've been missing. The
>> SemVer specification itself says:
>>
>> "*For this system to work, you first need to declare a public API. This
>> may consist of documentation or be enforced by the code itself.
>> Regardless, it is important that this API be clear and precise. Once you
>> identify your public API, you communicate changes to it with specific
>> increments to your version number.*"
>>
>> So as difficult as I think it will be to accurately describe and
>> automate what the Airflow public API is, I think it's a very useful
>> project to undertake. Perhaps even codifying it in an AIP.
>> At the moment we consider even the deepest/smallest "private" helper
>> function within util provider code to be public. This level of public
>> API makes iterating on and maintaining the code very laborious. So I
>> definitely think this is worth the effort.
>> I'll need to have a closer look at that PR, but the exact technical
>> details can certainly be hammered out later.
>>
>> Cheers,
>> Niko
>>
>> ------------------------------
>> *From:* Jarek Potiuk <[email protected]>
>> *Sent:* Saturday, December 3, 2022 1:25 AM
>> *To:* [email protected]
>> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking
>> change for Airflow (SemVer context)
>>
>> Sorry for not following up on this for a bit - it's been hectic these
>> days for me. I think valid points were made, and from the tone of
>> those I feel that all of us who participated have the same sense of what
>> is important:
>>
>> 1) users' "peace of mind" as top priority: clarity of what they can
>> expect from Airflow, and avoiding surprises when upgrading
>> 2) targeting minimal disruption to users' workflows (though we might
>> never reach absolute 100%)
>> 3) making it easy for contributors and maintainers to decide on
>> breaking/non-breaking behaviours
>>
>> I think there is a main blocker to all of those (also mentioned in the
>> discussion above):
>>
>> We are extremely cautious about any change because there is a lack of
>> agreement/expectations with our users on what is supposed to be the
>> "public API".
>>
>> # Proposal
>>
>> My proposal is to work on documenting our approach for our users (and
>> for maintainers) in a single page: "What is Airflow Public API?" and
>> what users can expect.
>>
>> There are certain areas where we can define rules and either automate
>> or document (or both) our statement about what is the "public" API and
>> (more importantly) what is clearly NOT, in a single-page document.
>> It should also be accompanied (where possible) by some
>> automation and tooling that would help us to express it in detail (and
>> help our users to validate if they are conforming to the "public
>> API").
>>
>> We won't solve it very quickly, but once we start doing it, it might
>> turn out that it's not that long of a process in fact. And if we start
>> it now - in a few months we might be in a different place.
>>
>> # Some concrete actions we might take
>>
>> 1) On the "Code" level - we can start to define the API that is
>> considered "public" and add verification of it for our users. We
>> could implement a similar solution to what I proposed for common.sql
>> https://github.com/apache/airflow/pull/27962 (where I followed Ash's
>> idea to use MyPy stubgen and pre-commits to flag changes to it, and
>> where we harness MyPy capabilities to control how the API is used). I
>> believe that we could apply a similar solution to all providers and
>> eventually even all parts of core, to make it very clear which part of
>> the Airflow API is public and which is not. I think MyPy and
>> strong-ish typing is taking the Python world by storm, and we could
>> use it as a standard way of communicating to those who use Airflow as
>> a library which parts are "public".
>>
>> Having .pyi files as part of our packages, with "hidden" parts that are
>> not supposed to be exposed, seems to be not only a nice communication
>> tool but also has support for all kinds of tooling from day 0 for
>> our users (IDE integrations, automations to check if the right API is
>> used etc.). We could even easily provide guidelines for the users:
>> "Here is how you can check if you are using Airflow code properly".
>> Not 100% foolproof but much better than anything else I can imagine.
>>
>> Also having it in place will allow the providers to be finally
>> split into separate repositories - and we could use MyPy checks
>> rather than running the full test suite with the Providers to verify
>> that changes in Airflow do not break Providers.
>> That would finally make
>> it possible to loosen the coupling we have between Providers and
>> Airflow (currently we basically run the whole suite of tests to be
>> certain things are working, but we could simply run the providers
>> through MyPy checks if we have proper .pyi files - not the same
>> confidence but very, very close).
>>
>> 2) On the DB level - we already have "AIP-44" as the foundation for
>> telling the users what they can do with Airflow when they write their
>> DAGs. Direct DB access will be forbidden and we can
>> specifically communicate to the users "do not use the DB any more", and
>> we can even work out warnings for when our users do. We could even make
>> blocking direct access the default behaviour later (but that is
>> likely only in Airflow 3).
>>
>> 3) On the UI level - we could simply explain that UI changes are
>> exempt from the "no removal" policy. We might simply treat all the UI
>> changes as non-breaking by default and loosen our strictness there.
>> This would be very close to the Chrome/Firefox example by Bolke - I
>> think UI changes are not breaking in the sense that you have to fix
>> your code that uses it; they simply require changing users' habits.
>> We've already done this; it would simply acknowledge the
>> approach we already used when TreeView was replaced by GridView.
>>
>> 4) Airflow also has a few non-code interfaces that are considered
>> part of the platform: statsd metrics is one of them. I can't think
>> of any more but maybe there are more. We could simply make an
>> inventory and discuss our approach on those ONCE and document it. This
>> will avoid discussions, discussions, discussions, and let our users
>> have some clear expectations and maintainers make quick decisions
>> when approving (or not) PRs.
>>
>> # Question
>>
>> Does it sound like a good plan? Is it worth making such an effort? Or
>> maybe what we have as the status quo is "good enough" and that would be
>> a waste of effort?
>> WDYT?
>>
>> J.
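
To illustrate item 1 of the quoted proposal (recording what is "public" and flagging changes to it), here is a minimal sketch. It is not the stubgen/pre-commit setup from PR 27962. It swaps in Python's standard `ast` module to extract signatures, and it assumes that a leading underscore is the only marker of "private". The names `public_api` and `api_changes` are hypothetical.

```python
import ast


def public_api(source: str) -> set[str]:
    """Collect signatures of public top-level functions, classes and methods.

    Assumption for this sketch: a name is "private" only if it starts
    with an underscore. Real stub files can hide more than that.
    """
    found: set[str] = set()

    def is_public_def(node: ast.AST) -> bool:
        return (
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and not node.name.startswith("_")
        )

    for node in ast.parse(source).body:
        if is_public_def(node):
            # ast.unparse renders the argument list back to source text.
            found.add(f"{node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            found.add(node.name)
            for item in node.body:
                if is_public_def(item):
                    found.add(
                        f"{node.name}.{item.name}({ast.unparse(item.args)})"
                    )
    return found


def api_changes(recorded: str, current: str) -> tuple[set[str], set[str]]:
    """Return (removed, added) public signatures between two source versions."""
    old, new = public_api(recorded), public_api(current)
    return old - new, new - old
```

A check like this could run in CI against a recorded copy of the module: a non-empty "removed" set would be a candidate breaking change that needs a conscious decision, while edits to underscore-prefixed helpers pass silently.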
