potiuk commented on code in PR #28300: URL: https://github.com/apache/airflow/pull/28300#discussion_r1058648489
########## docs/apache-airflow/administration-and-deployment/public-airflow-interface.rst:
##########
@@ -0,0 +1,87 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Public Interface of Airflow
+===========================
+
+The Public Interface of Apache Airflow is a set of programmatic interfaces that allow developers to interact
+with and access certain features of the Apache Airflow system. This includes operations such as
+creating and managing DAGs (directed acyclic graphs), managing tasks and their dependencies,
+and extending Airflow capabilities by writing new executors, plugins, operators and providers. The
+Public Interface can be useful for building custom tools and integrations with other systems,
+and for automating certain aspects of the Airflow workflow.
+
+You can extend Airflow in three ways:
+
+* By writing new custom Python code (via Operators, Plugins, Providers)
+* By using the `Stable REST API <stable-rest-api-ref>`_ (based on the OpenAPI specification)
+* By using the `Airflow Command Line Interface (CLI) <cli-and-env-variables-ref.rst>`_
+
+How can you extend Apache Airflow with custom Python Code?
+==========================================================
+
+The Public Interface of Airflow consists of a number of different classes and packages that provide access
+to the core features and functionality of the system.
+
+The classes and packages that may be considered part of the Public Interface include:
+
+* The :class:`~airflow.DAG`, which provides a way to define and manage DAGs in Airflow.
+* The :class:`~airflow.models.baseoperator.BaseOperator`, which provides a way to write custom operators.
+* The :class:`~airflow.hooks.base.BaseHook`, which provides a way to write custom hooks.
+* The :class:`~airflow.models.connection.Connection`, which provides access to external service credentials and configuration.
+* The :class:`~airflow.models.variable.Variable`, which provides access to Airflow configuration variables.
+* The :class:`~airflow.models.xcom.XCom`, which is used to access inter-task communication data.
+* The :class:`~airflow.secrets.BaseSecretsBackend`, which is used to define custom secrets backends.
+* The :class:`~airflow.plugins_manager.AirflowPlugin`, which is used to define custom plugins.
+* The :class:`~airflow.triggers.base.BaseTrigger`, which is used to implement custom Deferrable Operators (based on ``asyncio``).
+* The :class:`~airflow.decorators.base.TaskDecorator`, which provides a way to write custom decorators.
+* The :class:`~airflow.listeners.listener.ListenerManager` class, which provides hooks that can be implemented to respond to DAG/Task lifecycle events.
+
+.. versionadded:: 2.5
+
+   The listener public interface was added in version 2.5.
+
+* The :class:`~airflow.executors.base_executor.BaseExecutor` - the Executors are the components of Airflow
+  that are responsible for executing tasks.
+
+.. versionadded:: 2.6
+
+   There are a number of different executor implementations built into Airflow, each with its own unique
+   characteristics and capabilities.
+   The executor interface was available in earlier versions of Airflow, but only as of version 2.6
+   are executors fully decoupled, so Airflow no longer relies on a built-in set of executors.
+   It was possible to implement custom executors before Airflow 2.6, and a number of people succeeded
+   in doing so, but there were some hard-coded behaviours that favoured the built-in executors, so
+   custom executors could not provide the full functionality that the built-in executors had.
+
+
+What is not part of the Public Interface of Apache Airflow?
+===========================================================
+
+Everything not mentioned in this document should be considered non-public.

Review Comment:
Maybe this is a misunderstanding - what I wrote above was not about what provider packages "provide", but about what they "consume". This document indeed does not specify anything about what providers provide - it only covers what Airflow "produces" and what providers "consume" as users of that.

The Community Providers already have their own pretty well-defined interface (with a defined JSON schema) for the public API they provide (introduced since the beginning, in Airflow 2.0): a `provider.yaml` kept internally, and a corresponding public `get_provider_info` entrypoint, which describes all the packages/classes/functionalities that are exposed; the `airflow providers` CLI also shows that information. I think (maybe I am wrong) that everything specified there is public - including the automatically generated list of "core extensions" the providers deliver - and the documentation about that is automatically updated based on this metadata (and things like naming, completeness etc. are automatically verified at PR time with unit tests following AIP-21).

And yes, in Providers, whenever something is meant to be internal, it is indeed defined with a `_` prefix, and our Sphinx docs build excludes it automatically.
Actually, all those are things we cross-check against when we decide on making a breaking release of the providers or on adding new features. And I think it's very well controlled by the pre-commits/cross-verification and the generally very limited "code structure" we have there - hooks contain hooks, operators contain operators, etc.

But I think this is very much different for Airflow "core" - there are multiple classes that have no `_` prefix, it is not really specified whether users could or should rely on them, and they are marked with neither `:private:` nor `_`. And I think keeping that in check could be difficult. But maybe that is a good idea? Do you think, @dstandish, that we should synchronize and take a similar approach for Airflow - instead of allowlisting and explicitly specifying which classes are public, make an effort to review all those classes in Airflow and rename them to `_*`, for example, to follow what has been done in providers? I would also be very happy if we did that (and we could even add a pre-commit check to add some friction when adding a new public class). But I think this might be pretty invasive, for multiple reasons.
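For what it's worth, the pre-commit friction idea could be prototyped with a small AST scan. This is a hypothetical sketch - `unlisted_public_classes` and its allowlist are made-up names for illustration, not an existing Airflow or pre-commit tool:

```python
# Hypothetical sketch of the proposed pre-commit check: flag module-level
# classes whose names do not start with "_" and are not on an explicit
# allowlist of intentionally public classes.
import ast


def unlisted_public_classes(source: str, allowlist: set[str]) -> list[str]:
    """Return names of top-level classes that look public but are not allowlisted."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, ast.ClassDef)
        and not node.name.startswith("_")
        and node.name not in allowlist
    ]


# Example: "Foo" is public and unlisted, "_Bar" is private, "DAG" is allowlisted.
flagged = unlisted_public_classes(
    "class Foo: pass\nclass _Bar: pass\nclass DAG: pass\n", {"DAG"}
)
# flagged == ["Foo"]
```

A real check would also need to walk nested modules and handle `__all__`, but the core friction - "new public class, is it on the list?" - is this simple.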