Does anyone have any more comments? I think I will make a PR soon
describing the "Public API" of Airflow (as a work in progress), and
we can discuss the details in the PR.

Also, the 'common.api' approach does not need to be the "chosen" one for
automating the API description - this is just an example.

Just so we all know - over the last few months I have been using
"common.sql" as a "testbed" for various problems and approaches that
involve common code and exposing it to multiple users in the context of
Airflow. We have already learned some very valuable lessons (and there are
quite a few PRs, issues and discussions I can refer to when we discuss
potentially splitting out and separating providers in the future, as good
examples of what "*might*" happen and what we should take care of
if/when we split). So for me the current approach with MyPy, stubgen and the
way we are going to keep "API" changes in check is an experiment that
we **might** apply to Airflow if we find the approach useful.
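
To make the "keep API changes in check" idea concrete, here is a minimal
sketch. It is not the stubgen-based check from the common.sql PR: it swaps in
the standard-library `inspect` module to build a plain-text description of a
module's public functions and classes, which can then be compared against a
previously stored description. The function names `describe_public_api` and
`api_changes` are made up for illustration.

```python
import inspect
import types


def describe_public_api(module: types.ModuleType) -> list[str]:
    """Return sorted description lines for the module's public names.

    Assumption for this sketch: a name is "public" if it is listed in
    __all__, or (when __all__ is absent) does not start with an underscore.
    """
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in vars(module) if not name.startswith("_")]
    lines = []
    for name in sorted(names):
        obj = getattr(module, name)
        if inspect.isfunction(obj):
            # The signature is part of the description, so changing
            # parameters shows up as an API change.
            lines.append(f"def {name}{inspect.signature(obj)}")
        elif inspect.isclass(obj):
            lines.append(f"class {name}")
    return lines


def api_changes(old: list[str], new: list[str]) -> dict[str, list[str]]:
    """Compare two descriptions and report removed and added lines."""
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
    }
```

A check such as a pre-commit hook could store the output of
`describe_public_api` in the repository and flag a PR whenever `api_changes`
reports anything under "removed", leaving it to a reviewer to decide whether
the change is acceptable.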

We do not have to decide now, and we do not even have to implement anything
in order to define what we **think** the Airflow API is. I hope we can do the
definition first and decide on the tooling later, when we have some lessons
from common.sql (which is a cool example because it's small but evolves
quickly enough to produce some learnings - including some failures we can
learn from).

This would be my approach for now:

* start defining what Public API is
* learn more about "keeping API in-check" from common.sql
* see how we can improve automation around the API check
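
As one possible shape for the automation step, the sketch below checks which
modules a piece of user code imports and reports the ones that fall outside an
allow-list. It only parses the source with the standard-library `ast` module
and never imports anything. The `PUBLIC_PREFIXES` values are placeholders
chosen for illustration, not an agreed definition of the public API, and
`non_public_imports` is a hypothetical helper name.

```python
import ast

# Hypothetical allow-list for illustration only; a real list would come
# from the agreed "Public API" description.
PUBLIC_PREFIXES = ("airflow.models", "airflow.decorators")


def non_public_imports(source: str, package: str = "airflow") -> list[str]:
    """Return modules under `package` that are imported but not allow-listed."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y" statements; relative imports are skipped.
            modules = [node.module]
        else:
            continue
        for mod in modules:
            in_package = mod == package or mod.startswith(package + ".")
            is_public = any(
                mod == prefix or mod.startswith(prefix + ".")
                for prefix in PUBLIC_PREFIXES
            )
            if in_package and not is_public:
                found.add(mod)
    return sorted(found)
```

Running this over DAG files would give users a quick, if rough, signal that
they depend on something outside the documented surface. It only sees import
statements, so it is far weaker than a type-checker-based check.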

J.




On Mon, Dec 5, 2022 at 8:58 PM Oliveira, Niko <oniko...@amazon.com.invalid>
wrote:

> 1) users "peace of mind" as top priority: clarity of what they can
> expect from Airflow, and avoiding surprises when upgrading
> 2) targeting minimal disruption to user's workflows (though we might
> never reach absolute 100%)
> 3) making it easy for contributors and maintainers to decide on
> breaking/non-breaking behaviours
>
> Yupp, I agree, this is an accurate encapsulation of the issues at hand.
>
>
> My proposal is to work on documenting our approach for our users (and
> for maintainers) in a single page: "What is Airflow Public API?" and
> what users can expect.
>
> I think this is actually a very important piece we've been missing. The
> SemVer RFC itself says:
>
>
> "*For this system to work, you first need to declare a public API. This
> may consist of documentation or be enforced by the code itself. Regardless,
> it is important that this API be clear and precise. Once you identify your
> public API, you communicate changes to it with specific increments to your
> version number.*"
>
> So as difficult as I think it will be to accurately describe and automate
> what the Airflow public API is, I think it's a very useful project to
> undertake. Perhaps even codifying it in an AIP.
> At the moment we consider even the deepest/smallest "private" helper
> function within util provider code to be public. This level of public API
> makes iterating and maintaining the code very laborious. So I definitely
> think this is worth the effort.
> I'll need to have a closer look at that PR, but the exact technical
> details can certainly be hammered out later.
>
> Cheers,
> Niko
>
> ------------------------------
> *From:* Jarek Potiuk <ja...@potiuk.com>
> *Sent:* Saturday, December 3, 2022 1:25 AM
> *To:* dev@airflow.apache.org
> *Subject:* RE: [EXTERNAL][DISCUSSION] Assessing what is a breaking change
> for Airflow (SemVer context)
>
> Sorry for not following up on this for a bit - it's been hectic these
> days for me. I think valid points were said, and from the tone of
> those I feel that we all who participated have the same sense of what
> is important:
>
> 1) users "peace of mind" as top priority: clarity of what they can
> expect from Airflow, and avoiding surprises when upgrading
> 2) targeting minimal disruption to user's workflows (though we might
> never reach absolute 100%)
> 3) making it easy for contributors and maintainers to decide on
> breaking/non-breaking behaviours
>
> I think there is a main blocker to all of those (also mentioned in the
> discussion above):
>
> We are extremely cautious about any change because there is a lack of
> agreement/expectations with our users on what is supposed to be the
> "public API".
>
> # Proposal
>
> My proposal is to work on documenting our approach for our users (and
> for maintainers) in a single page: "What is Airflow Public API?" and
> what users can expect.
>
> There are certain areas where we can define rules and either automate
> or document (or both) our statement about what is the "public" API and
> (more importantly) what is clearly NOT, in a single-page document.
> It should also be accompanied (where possible) by some
> automation and tooling that would help us to express it in detail (and
> help our users to validate whether they are conforming to the "public
> API").
>
> We won't solve it very quickly, but once we start doing it, it might
> turn out that it's not that long of a process in fact. And if we start
> it now - in a few months we might be in a different place.
>
> # Some concrete actions we might take
>
> 1) On the "Code" level - we can start to define the API that is
> considered "public" and add verification of it for our users. We
> could implement a similar solution to what I proposed for common.sql
> https://github.com/apache/airflow/pull/27962 (where I followed Ash's
> idea to use MyPy stubgen and pre-commits to flag changes to it, and
> where we harness MyPy capabilities to control how the API is used). I
> believe that we could apply a similar solution to all providers and
> eventually even all parts of core, to make it very clear which part of
> the Airflow API is public and which is not. I think MyPy and
> strong-ish typing are taking the Python world by storm, and we could
> use them as a standard way of communicating to those who use Airflow as
> a library which parts are "public".
>
> Having .pyi files as part of our packages, with the parts that are
> not supposed to be exposed "hidden", seems to be not only a nice
> communication tool but also comes with support for all kinds of tooling
> from day 0 for our users (IDE integrations, automations to check if the
> right API is used etc.). We could even easily provide guidelines for the
> users: "Here is how you can check if you are using Airflow code properly".
> Not 100% foolproof but much better than anything else I can imagine.
>
> Also, having it in place will finally allow the providers to be
> split out into separate repositories - and we could use MyPy checks
> rather than running the full test suite with the Providers to verify
> that changes in Airflow do not break Providers. That would finally make
> it possible to loosen the coupling we have between Providers and
> Airflow (currently we basically run the whole suite of tests to be certain
> things are working - but we could simply run MyPy checks on the providers
> if we have proper .pyi files; not the same confidence but very,
> very close).
>
> 2) On the DB level - we already have "AIP-44" as the foundation for
> telling users: this is what you can do with Airflow when you
> write your DAGs. Direct DB access will be forbidden and we can
> specifically communicate to the users "do not use the DB any more", and
> we can even work out warnings for when our users do. We could even
> block direct access by default later (but that is
> likely only in Airflow 3).
>
> 3) On the UI level - we could simply explain that UI changes are
> exempt from the "no removal" policy. We might simply treat all UI
> changes as non-breaking by default and loosen our strictness there.
> This would be very close to the Chrome/Firefox example by Bolke - I
> think UI changes are not breaking in the sense that you have to fix
> code that uses the UI; they simply require changing users' habits.
> That would simply be acknowledging the approach we already used
> when TreeView was replaced by GridView.
>
> 4) Airflow also has a few non-code interfaces that are considered
> part of the platform: statsd metrics is one of them. I can't think
> of any more but maybe there are more. We could simply make an
> inventory and discuss our approach to those ONCE and document it. This
> will avoid discussions, discussions, discussions, and let our users
> have clear expectations and maintainers make quick decisions
> when approving (or not) PRs.
>
> # Question
>
> Does it sound like a good plan? Is it worth making such an effort? Or
> maybe what we have as the status quo is "good enough" and that would be a
> waste of effort? WDYT?
>
> J.