Hello Everyone.

I opened a PR with the Public API description proposal.

https://github.com/apache/airflow/pull/28300

WARNING, CAUTION!

You might be surprised if you have not followed the recent ChatGPT stuff
(if you've been hiding under a rock somewhere last week), but about 80% of
the PR was actually written this morning by ChatGPT, which I asked
questions about the Public API.

It is a fantastic tool, and it's definitely been added to my toolbox when
it comes to writing documentation (I still believe we will need
documentation and humans - even with a chat bot answering most of the
questions and being really helpful in a number of cases).

Have fun reviewing what ChatGPT has written (I already reviewed it,
corrected all the mistakes - there were a few - and added links and the
like).

The doc build will fail and I will correct it later. I just wanted to send
it as soon as I got it into a shape that I think makes sense in general,
and I am looking for comments.

J.






On Thu, Dec 8, 2022 at 8:35 PM Ferruzzi, Dennis <[email protected]>
wrote:

> I think this sounds like a solid plan for moving forward.  This whole
> discussion is kind of new to me so I've just been following along and
> learning, but the concept of defining what constitutes a Public API so we
> can iterate faster on "behind the scenes" stuff sounds like a solid plan.
> We have had a bunch of discussions lately about "well when/why would a
> user ever even USE that method?" and this will definitely make those
> discussions easier.
>
>
> ------------------------------
> *From:* Jarek Potiuk <[email protected]>
> *Sent:* Thursday, December 8, 2022 10:57 AM
> *To:* [email protected]
> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking change
> for Airflow (SemVer context)
>
>
> Does anyone have any more comments? I think I will make a PR soon
> describing the "Public API" of Airflow (as a work in progress) and I think
> we can discuss any details of it in the PR.
>
> Also the 'common.sql' approach does not need to be the "chosen" one to
> automate the API description - this is just an example.
>
> Just so we all know - over the last few months I have been using
> "common.sql" as a "testbed" for various problems and approaches that
> involve common code and exposing it to multiple users in the context of
> Airflow. We have already had some super valuable lessons (and there are
> quite a few PRs, issues and discussions I can refer to when we discuss
> potentially splitting out and separating providers in the future, as good
> examples of what "*might*" happen and what we should take care of
> if/when we split). So for me the current approach with MyPy, stubgen and
> the way we are going to keep "API" changes in check is an experiment that
> we **might** apply to Airflow if we find the approach useful.
>
> We do not have to decide now, we do not even have to implement anything in
> order to define what we **think** the Airflow API is. I hope we can do the
> definition first and revisit the automation when we get some lessons from
> common.sql (which is a cool example because it's small but it evolves
> quickly enough to get some learnings - including some failures we learn
> from).
>
> This would be my current approach:
>
> * start defining what Public API is
> * learn more about "keeping API in-check" from common.sql
> * see how we can improve automation around the API check
>
> J.
>
>
>
>
> On Mon, Dec 5, 2022 at 8:58 PM Oliveira, Niko <[email protected]>
> wrote:
>
>> 1) users' "peace of mind" as top priority: clarity of what they can
>> expect from Airflow, and avoiding surprises when upgrading
>> 2) targeting minimal disruption to users' workflows (though we might
>> never reach absolute 100%)
>> 3) making it easy for contributors and maintainers to decide on
>> breaking/non-breaking behaviours
>>
>> Yupp, I agree, this is an accurate encapsulation of the issues at hand.
>>
>>
>> My proposal is to work on documenting our approach for our users (and
>> for maintainers) in a single page: "What is Airflow Public API?" and
>> what users can expect.
>>
>> I think this is actually a very important piece we've been missing. The
>> SemVer specification itself says:
>>
>>
>> "*For this system to work, you first need to declare a public API. This
>> may consist of documentation or be enforced by the code itself. Regardless,
>> it is important that this API be clear and precise. Once you identify your
>> public API, you communicate changes to it with specific increments to your
>> version number.*"
>>
>> So as difficult as I think it will be to accurately describe and automate
>> what the Airflow public API is, I think it's a very useful project to
>> undertake. Perhaps even codifying it in an AIP.
>>
>> At the moment we consider even the deepest/smallest "private" helper
>> function within util provider code to be public. This level of public API
>> makes iterating and maintaining the code very laborious. So I definitely
>> think this is worth the effort.
>>
>> I'll need to have a closer look at that PR, but the exact technical
>> details can certainly be hammered out later.
>>
>> Cheers,
>> Niko
>>
>> ------------------------------
>> *From:* Jarek Potiuk <[email protected]>
>> *Sent:* Saturday, December 3, 2022 1:25 AM
>> *To:* [email protected]
>> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking
>> change for Airflow (SemVer context)
>>
>> Sorry for not following up on this for a bit - it's been hectic these
>> days for me. I think valid points were made, and from the tone of
>> those I feel that all of us who participated have the same sense of what
>> is important:
>>
>> 1) users' "peace of mind" as top priority: clarity of what they can
>> expect from Airflow, and avoiding surprises when upgrading
>> 2) targeting minimal disruption to users' workflows (though we might
>> never reach absolute 100%)
>> 3) making it easy for contributors and maintainers to decide on
>> breaking/non-breaking behaviours
>>
>> I think there is a main blocker to all of those (also mentioned in the
>> discussion above):
>>
>> We are extremely cautious about any change because there is a lack of
>> agreement/expectations with our users on what is supposed to be the
>> "public API".
>>
>> # Proposal
>>
>> My proposal is to work on documenting our approach for our users (and
>> for maintainers) in a single page: "What is Airflow Public API?" and
>> what users can expect.
>>
>> There are certain areas where we can define rules and either automate
>> or document (or both) our statement about what is the "public" API and
>> (more importantly) what is clearly NOT, all in a single-page document.
>> It should also be accompanied (where possible) by some
>> automation and tooling that would help us to express it in detail (and
>> help our users to validate if they are conforming to the "public
>> API").
>>
>> We won't solve it very quickly, but once we start doing it, it might
>> turn out that it's not that long of a process in fact. And if we start
>> it now - in a few months we might be in a different place.
>>
>> # Some concrete actions we might take
>>
>> 1) On the "Code" level - we can start to define the API that is
>> considered as "public" and add verification of it for our users. We
>> could implement a similar solution to what I proposed for common.sql
>> https://github.com/apache/airflow/pull/27962 (where I followed Ash's
>> idea to use MyPy stubgen and pre-commits to flag changes to it, and
>> where we harness MyPy capabilities to control how the API is used). I
>> believe that we could apply a similar solution to all providers and
>> eventually even all parts of core, to make it very clear which part of
>> the Airflow API is public and which is not. I think MyPy and
>> strong-ish typing are taking the Python world by storm, and we could
>> use them as a standard way of communicating to those who use Airflow as
>> a library which parts are "public".
>>
>> Having .pyi files as part of our packages with "hidden" parts that are
>> not supposed to be exposed seems to be not only a nice communication
>> tool but also has support for all kinds of tooling from day 0 for
>> our users (IDE integrations, automations to check if the right API is
>> used etc.). We could even easily provide guidelines for the users:
>> "Here is how you can check if you are using Airflow code properly".
>> Not 100% foolproof but much better than anything else I can imagine.
>>
>> Also having it in place will allow the providers to be finally
>> split out into separate repositories - and we could use MyPy checks
>> rather than running the full test suite with the Providers to verify
>> that changes in Airflow do not break Providers. That would finally make
>> it possible to loosen the coupling we have between Providers and
>> Airflow (currently we basically run the whole suite of tests to be
>> certain things are working - but we could simply run MyPy checks on
>> providers if we have proper .pyi files; not the same confidence but
>> very, very close).
>>
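
To make point 1) above a bit more concrete, here is a minimal sketch of the
"flag changes to the public API" idea. It is not what the common.sql PR
does: that one relies on MyPy stubgen and pre-commit, while this stand-in
uses only the standard-library `ast` module so that it runs on its own. The
names `public_api`, `api_changes`, `OLD` and `NEW` are made up for
illustration.

```python
import ast


def public_api(source: str) -> set[str]:
    """Summarise public top-level functions and classes of a module.

    Names starting with an underscore are treated as private and skipped.
    This is a simplified stand-in for a generated .pyi stub.
    """
    entries = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_"):
                entries.add(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef):
            if not node.name.startswith("_"):
                entries.add(f"class {node.name}")
    return entries


def api_changes(old_source: str, new_source: str) -> dict[str, list[str]]:
    """Compare two versions of a module and report public API differences."""
    old, new = public_api(old_source), public_api(new_source)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


# Hypothetical "before" and "after" versions of the same module.
OLD = '''
def run_query(sql, parameters=None):
    return []

def _helper():
    pass
'''

NEW = '''
def run_query(sql):
    return []

def _renamed_helper():
    pass
'''

changes = api_changes(OLD, NEW)
print(changes)
```

Renaming the private helper is not reported, while the changed signature of
`run_query` shows up as one removed and one added entry. A check like this
could fail a pre-commit hook whenever the dictionary is not empty.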
>> 2) On the DB level - we already have "AIP-44" as the foundation for
>> telling the users: this is "Airflow" and you can do "this" when you
>> write your DAGs. Direct DB access will be forbidden and we can
>> specifically communicate to the users "do not use DB any more" and we
>> can even add warnings when our users do. We could even make it a
>> default behaviour later to block direct access (but that is
>> likely only in Airflow 3).
>>
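
Point 2) mentions warning users on direct DB access and possibly blocking
it later. A rough sketch of how that could look with the standard
`warnings` module follows. Everything here is hypothetical:
`DirectDBAccessWarning` and `create_session` are illustration-only names
and not Airflow's or AIP-44's actual API.

```python
import warnings


class DirectDBAccessWarning(DeprecationWarning):
    """Hypothetical warning category, not an existing Airflow class."""


def create_session(*, internal: bool = False) -> dict:
    """Hypothetical stand-in for obtaining a metadata DB session."""
    if not internal:
        # stacklevel=2 points the warning at the caller's line.
        warnings.warn(
            "Direct DB access from DAG code is not part of the public API",
            DirectDBAccessWarning,
            stacklevel=2,
        )
    return {"connected": True}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    create_session()               # user-style call: emits the warning
    create_session(internal=True)  # internal call: stays silent

print([str(w.message) for w in caught])
```

Switching from "warn" to "block by default" would then be a matter of a
filter such as `warnings.simplefilter("error", DirectDBAccessWarning)`,
which turns the warning into an exception.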
>> 3) On the UI level - we could simply explain that UI changes are
>> exempt from the "no removal" policy. We might simply treat all the UI
>> changes as non-breaking by default and loosen our strictness there.
>> This would be very close to the Chrome/Firefox example by Bolke - I
>> think UI changes are not breaking in the sense that you have to fix
>> your code that uses it; they simply require changing users' habits.
>> That would simply be acknowledging the approach we already used when
>> TreeView was replaced by GridView.
>>
>> 4) Airflow also has a few non-code interfaces that are considered
>> as part of the platform: statsd metrics is one of them. I can't think
>> of any more but maybe there are more. We could simply make an
>> inventory and discuss our approach on those ONCE and document it. This
>> will avoid discussions, discussions, discussions, give our users
>> some clear expectations and let maintainers make quick decisions
>> when approving (or not) PRs.
>>
>> # Question
>>
>> Does it sound like a good plan? Is it worth making such an effort? Or
>> maybe what we have as status-quo is "good enough" and that would be a
>> waste of effort? WDYT?
>>
>> J.
>>
>
