potiuk commented on code in PR #28300:
URL: https://github.com/apache/airflow/pull/28300#discussion_r1058655785

##########
docs/apache-airflow/administration-and-deployment/public-airflow-interface.rst:
##########
@@ -0,0 +1,87 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Public Interface of Airflow
+===========================
+
+The Public Interface of Apache Airflow is a set of programmatic interfaces that allow developers to interact
+with and access certain features of the Apache Airflow system. This includes operations such as
+creating and managing DAGs (directed acyclic graphs), managing tasks and their dependencies,
+and extending Airflow capabilities by writing new executors, plugins, operators and providers. The
+Public Interface can be useful for building custom tools and integrations with other systems,
+and for automating certain aspects of the Airflow workflow.
+
+You can extend Airflow in three ways:
+
+* By writing new custom Python code (via Operators, Plugins, Provider)
+* By using the `Stable REST API <stable-rest-api-ref>`_ (based on the OpenAPI specification)
+* By using the `Airflow Command Line Interface (CLI) <cli-and-env-variables-ref.rst>`_
+
+How can you extend Apache Airflow with custom Python Code?
+==========================================================
+
+The Public Interface of Airflow consists of a number of different classes and packages that provide access
+to the core features and functionality of the system.
+
+The classes and packages that may be considered as the Public Interface include:
+
+* The :class:`~airflow.DAG`, which provides a way to define and manage DAGs in Airflow.
+* The :class:`~airflow.models.baseoperator.BaseOperator`, which provides a way write custom operators.
+* The :class:`~airflow.hooks.base.BaseHook`, which provides a way write custom hooks.
+* The :class:`~airflow.models.connection.Connection`, which provides access to external service credentials and configuration.
+* The :class:`~airflow.models.variable.Variable`, which provides access to Airflow configuration variables.
+* The :class:`~airflow.models.xcom.XCom` which are used to access to inter-task communication data.
+* The :class:`~airflow.secrets.BaseSecretsBackend` which are used to define custom secret managers.
+* The :class:`~airflow.plugins_manager.AirflowPlugin` which are used to define custom plugins.
+* The :class:`~airflow.triggers.base.BaseTrigger`, which are used to implement custom Custom Deferrable Operators (based on ``asyncio``).
+* The :class:`~airflow.decorators.base.TaskDecorator`, which provides a way write custom decorators.
+* The :class:`~airflow.listeners.listener.ListenerManager` class which provides hooks that can be implemented to respond to DAG/Task lifecycle events.
+
+.. versionadded:: 2.5
+
+   Listener public interface has been added in version 2.5.
+
+* The :class:`~airflow.executors.base_executor.BaseExecutor` - the Executors are the components of Airflow
+  that are responsible for executing tasks.
+
+.. versionadded:: 2.6
+
+   There are a number of different executor implementations built-in Airflow, each with its own unique
+   characteristics and capabilities. Executor interface was available in earlier version of Airflow but
+   only as of version 2.6 executors are fully decoupled and Airflow does not rely on built-in set of executors.
+   You could have implemented (and succeeded) with implementing Executors before Airflow 2.6 and a number
+   of people succeeded in doing so, but there were some hard-coded behaviours that preferred in-built
+   executors, and custom executors could not provide full functionality that built-in executors had.
+
+
+What is not part of the Public Interface of Apache Airflow?
+===========================================================
+
+Everything not mentioned in this document should be considered as non-Public Interface.

Review Comment:

> It's also important to clarify that some methods are intentionally "private". E.g. in the operator, it might have helper methods. But these methods are more implementation detail right? If users subclass, though, they may use and depend on these. So what do we do? Ideally, from maintainer perspective, we want the overall behavior of the operator to be subject to backcompat but not any of its methods. Currently the convention is, i believe, all methods are assumed public unless prefix with underscore / marked meta private. WDYT?

For providers it's far easier, because there is no "internal" sharing between multiple modules. Each of them is really either "standalone" or uses "public" classes from other providers. With common.sql we even introduced stubgen to generate the actual "public interface", so that MyPy can surface violations as errors. We also added some friction (the necessity of regenerating the stubs) for when someone changes the API by accident.

> One problem with the public / private distinction is... from tool perspective ... even using _my_func across modules is sorta forbidden. But there is a difference between the internal "private" and the user-facing "private". We need to be able to write helper code that can be used across modules (internal public), but not have it be user-facing public.

This is a very accurate assessment. That's why I proposed an allowlist: being explicit about everything that is public. From the tooling perspective, one thing is important (and it is what we did for common.sql): keep the list of all public methods, and whenever something there is *removed* or *updated*, fail the build, so that changing it becomes an explicit action (friction) for whoever modifies it accidentally. We can also add a tool that lets whoever uses Airflow as a library check that they are not using something that is not "intentionally" public (we've done that for common.sql via MyPy stubs).

But we cannot (yet) do any automation in Airflow, because first we need to agree on and document what is and what is not public. We need a good list of things that should be "public", so that we can (eventually) add friction to removing or updating them, and possibly later add a tool for others to verify that they are not using something they should not. This doc is mostly here to start the discussion and work out a good set of classes/packages that we should consider "really public". For now in a document, eventually "guarded automatically".

BTW, as I have mentioned in many places, we will never get this 100% correct. [Hyrum's law](https://www.hyrumslaw.com/) is very clear and IMHO very true about this. But we can state our intentions and make an effort to keep those "explicitly intended" APIs controlled. And first ... we need to agree on what our intentions are.

If we don't document them and don't agree on what is and what is not intended to be public API, everyone will have their own understanding of it. And often those understandings will differ (for example, many users still use the Airflow DB directly and rely on its structure, because they assume they can).
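
As a rough illustration of the "keep a list of all public methods and fail the build when one is removed or updated" idea: the sketch below is not the stubgen/MyPy mechanism used for common.sql. It is a simpler stand-in that uses only the standard-library `inspect` module, and `HookV1`, `HookV2`, `snapshot_public_api` and `check_against_allowlist` are hypothetical names made up for this example, not Airflow APIs.

```python
import inspect


def snapshot_public_api(cls):
    """Map each public (non-underscore) callable of cls to its signature string."""
    api = {}
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_"):
            continue  # underscore-prefixed members are treated as private
        api[name] = str(inspect.signature(member))
    return api


def check_against_allowlist(recorded, current):
    """Return descriptions of recorded entries that were removed or changed.

    Names that only exist in `current` are not reported: adding to the
    public surface is a separate decision from breaking it.
    """
    problems = []
    for name, signature in sorted(recorded.items()):
        if name not in current:
            problems.append(f"removed: {name}{signature}")
        elif current[name] != signature:
            problems.append(f"changed: {name}{signature} -> {name}{current[name]}")
    return problems


# Hypothetical "before" and "after" versions of a class, for illustration only.
class HookV1:
    def get_conn(self, conn_id):
        ...

    def run(self, sql, autocommit=False):
        ...

    def _helper(self):
        ...


class HookV2:
    def get_conn(self, conn_id):
        ...

    def run(self, sql):  # `autocommit` dropped by accident
        ...


# The recorded snapshot plays the role of the checked-in allowlist.
recorded = snapshot_public_api(HookV1)
problems = check_against_allowlist(recorded, snapshot_public_api(HookV2))
```

Here `problems` ends up with a single entry reporting that `run` changed, while `_helper` never enters the snapshot. In a CI job, a non-empty `problems` list would fail the build, and the only way to make it pass again would be to update the recorded snapshot on purpose. That deliberate step is the friction.
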
You wrote yourself that you prefer to `minimize backcompat surface` and that "everything that's not an operator / sensor / hook should be private". But this is only your understanding. Others might say "everything in utils can be used by DAG authors and provider writers". Both positions are possible, and they are fundamentally different. Either could be a deliberate decision that we make as a community. But we need to make that decision and be very explicit about it. This is the starting point that is missing now, IMHO.

I just want to have a page where we all agree on what we intentionally make "public", and have clear rules that all the rest is not.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org