potiuk commented on code in PR #28300:
URL: https://github.com/apache/airflow/pull/28300#discussion_r1058655785


##########
docs/apache-airflow/administration-and-deployment/public-airflow-interface.rst:
##########
@@ -0,0 +1,87 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Public Interface of Airflow
+===========================
+
+The Public Interface of Apache Airflow is a set of programmatic interfaces that allow developers to interact
+with and access certain features of the Apache Airflow system. This includes operations such as
+creating and managing DAGs (directed acyclic graphs), managing tasks and their dependencies,
+and extending Airflow capabilities by writing new executors, plugins, operators and providers. The
+Public Interface can be useful for building custom tools and integrations with other systems,
+and for automating certain aspects of the Airflow workflow.
+
+You can extend Airflow in three ways:
+
+* By writing new custom Python code (via Operators, Plugins, Providers)
+* By using the `Stable REST API <stable-rest-api-ref>`_ (based on the OpenAPI specification)
+* By using the `Airflow Command Line Interface (CLI) <cli-and-env-variables-ref.rst>`_
+
+How can you extend Apache Airflow with custom Python Code?
+==========================================================
+
+The Public Interface of Airflow consists of a number of different classes and packages that provide access
+to the core features and functionality of the system.
+
+The classes and packages that may be considered part of the Public Interface include:
+
+* The :class:`~airflow.DAG`, which provides a way to define and manage DAGs in Airflow.
+* The :class:`~airflow.models.baseoperator.BaseOperator`, which provides a way to write custom operators.
+* The :class:`~airflow.hooks.base.BaseHook`, which provides a way to write custom hooks.
+* The :class:`~airflow.models.connection.Connection`, which provides access to external service credentials and configuration.
+* The :class:`~airflow.models.variable.Variable`, which provides access to Airflow configuration variables.
+* The :class:`~airflow.models.xcom.XCom`, which is used to access inter-task communication data.
+* The :class:`~airflow.secrets.BaseSecretsBackend`, which is used to define custom secrets backends.
+* The :class:`~airflow.plugins_manager.AirflowPlugin`, which is used to define custom plugins.
+* The :class:`~airflow.triggers.base.BaseTrigger`, which is used to implement custom Deferrable Operators (based on ``asyncio``).
+* The :class:`~airflow.decorators.base.TaskDecorator`, which provides a way to write custom decorators.
+* The :class:`~airflow.listeners.listener.ListenerManager` class, which provides hooks that can be implemented to respond to DAG/Task lifecycle events.
+
+.. versionadded:: 2.5
+
+   Listener public interface has been added in version 2.5.
+
+* The :class:`~airflow.executors.base_executor.BaseExecutor`, which provides a way to write custom executors. Executors are the components of Airflow
+  that are responsible for executing tasks.
+
+.. versionadded:: 2.6
+
+   There are a number of different executor implementations built into Airflow, each with its own unique
+   characteristics and capabilities. The executor interface was available in earlier versions of Airflow, but
+   only as of version 2.6 are executors fully decoupled, so that Airflow no longer relies on a built-in set of executors.
+   Implementing custom executors was possible before Airflow 2.6, and a number
+   of people did so successfully, but some hard-coded behaviours favoured the built-in
+   executors, and custom executors could not provide the full functionality that the built-in executors had.
+
+
+What is not part of the Public Interface of Apache Airflow?
+===========================================================
+
+Everything not mentioned in this document should be considered part of the non-Public Interface.

Review Comment:
   > It's also important to clarify that some methods are intentionally "private". E.g. in the operator, it might have helper methods. But these methods are more implementation detail, right? If users subclass, though, they may use and depend on these. So what do we do? Ideally, from a maintainer perspective, we want the overall behavior of the operator to be subject to backcompat but not any of its methods. Currently the convention is, I believe, that all methods are assumed public unless prefixed with an underscore / marked meta private. WDYT?
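
   The convention described in the quote can be made concrete with a small, standard-library-only sketch. Assuming "public" means "no leading underscore and no Sphinx `:meta private:` marker in the docstring", the hypothetical helper `public_methods` below lists what a subclass author would see as public on an equally hypothetical `ExampleOperator` (the sketch does not import Airflow):

```python
from __future__ import annotations

import inspect


def public_methods(cls: type) -> list[str]:
    """Return method names treated as public under the underscore / ``:meta private:`` convention.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    names = []
    for name, member in inspect.getmembers(cls, predicate=inspect.isfunction):
        if name.startswith("_"):
            # Leading underscore: private by convention.
            continue
        doc = inspect.getdoc(member) or ""
        if ":meta private:" in doc:
            # Explicitly hidden from the rendered API docs, so treated as private.
            continue
        names.append(name)
    return sorted(names)


class ExampleOperator:
    """Hypothetical operator used only to exercise the helper."""

    def execute(self, context):
        """Run the task."""
        return self._build_query(context)

    def get_hook(self):
        """Return a hook.

        :meta private:
        """
        return None

    def _build_query(self, context):
        return "SELECT 1"


print(public_methods(ExampleOperator))  # → ['execute']
```

   An explicit allowlist policy would invert this logic: nothing is public unless it is listed, regardless of how it is named.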
   
   For providers it's far easier, because there is no "internal" sharing between multiple modules. Each of them is either really "standalone" or uses "public" classes from other providers. And with common.sql we even introduced stubgen to generate the actual "public interface", so that MyPy can surface violations as errors. We have even added friction (the necessity of regenerating the stubs) when someone changes the API by accident.
   
   > One problem with the public / private distinction is... from a tool perspective ... even using `_my_func` across modules is sorta forbidden. But there is a difference between the internal "private" and the user-facing "private". We need to be able to write helper code that can be used across modules (internal public), but not have it be user-facing public.
   
   This is a very accurate assessment.
   
   That's why I proposed to allowlist and be explicit about everything that is public. And from the tool perspective, one thing is important (this is what we did for common.sql): keep the list of all public methods, and whenever something there is *removed* or *updated*, fail the build and make it an explicit action (friction) for whoever modifies it accidentally. We can also add a tool for those who use Airflow as a library, to check that they are not using anything that is not "intentionally" public (we've done that for common.sql via MyPy stubs).
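
   The build-time friction described above can be sketched with the standard library alone. This is an assumption-laden illustration, not the stubgen/MyPy mechanism actually used for common.sql: the hypothetical `api_surface` records each allowlisted (`__all__`) or non-underscore callable together with its signature, and the hypothetical `check_against_snapshot` reports snapshot entries that were removed or changed while letting additions through. The module, functions, and snapshot contents are made up for the example:

```python
from __future__ import annotations

import inspect
import types


def api_surface(module: types.ModuleType) -> dict[str, str]:
    """Map each public callable in ``module`` to its signature string (hypothetical helper)."""
    # Prefer an explicit allowlist (__all__); otherwise fall back to the underscore convention.
    names = getattr(module, "__all__", None) or [n for n in vars(module) if not n.startswith("_")]
    surface = {}
    for name in names:
        obj = getattr(module, name)
        if callable(obj):
            try:
                surface[name] = str(inspect.signature(obj))
            except (TypeError, ValueError):
                surface[name] = "<no signature>"
    return surface


def check_against_snapshot(current: dict[str, str], snapshot: dict[str, str]) -> list[str]:
    """Return one problem per snapshot entry that was removed or changed; additions are allowed."""
    problems = []
    for name, signature in sorted(snapshot.items()):
        if name not in current:
            problems.append(f"removed: {name}{signature}")
        elif current[name] != signature:
            problems.append(f"changed: {name}{signature} -> {name}{current[name]}")
    return problems


# Simulate a module whose public API was snapshotted earlier (all names are hypothetical).
def get_conn(conn_id): ...
def run(sql, autocommit=False, parameters=None): ...


mod = types.ModuleType("example_hooks")
mod.__all__ = ["get_conn", "run"]
mod.get_conn, mod.run = get_conn, run

snapshot = {"close": "()", "get_conn": "(conn_id)", "run": "(sql, autocommit=False)"}
for problem in check_against_snapshot(api_surface(mod), snapshot):
    print(problem)
```

   The sketch deliberately reports even a backwards-compatible change (a new optional parameter) as `changed`: that is the intended friction, because whoever updates the API has to refresh the snapshot explicitly. A CI step could exit with a non-zero status whenever the returned list is non-empty.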
   
   But we cannot (yet) do any automation in Airflow, because first we need to agree on and document what is and what is not public. We need a good list of things that should be "public", so that we can (eventually) add friction to removing or updating them, and possibly add a tool for others to verify that they are not using something they should not.
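
   One possible shape for such a user-facing check is sketched below, assuming the agreed list is eventually published as fully qualified names: parse DAG or plugin source with Python's `ast` module and report `from airflow... import ...` names that are not on the allowlist. `PUBLIC_ALLOWLIST` and `find_non_public_imports` are hypothetical placeholders, and the three allowlist entries are copied from the class list in this PR purely for illustration:

```python
from __future__ import annotations

import ast

# Hypothetical allowlist of "intentionally public" names; the real list is still to be agreed.
PUBLIC_ALLOWLIST = frozenset(
    {
        "airflow.DAG",
        "airflow.models.baseoperator.BaseOperator",
        "airflow.hooks.base.BaseHook",
    }
)


def find_non_public_imports(source: str, allowlist: frozenset[str] = PUBLIC_ALLOWLIST) -> list[str]:
    """Return fully qualified airflow names imported via ``from ... import`` that are not allowlisted."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # Only absolute "from airflow... import name" statements are inspected.
        if not isinstance(node, ast.ImportFrom) or node.level != 0 or node.module is None:
            continue
        if node.module != "airflow" and not node.module.startswith("airflow."):
            continue
        for alias in node.names:
            qualified = f"{node.module}.{alias.name}"
            if qualified not in allowlist:
                flagged.append(qualified)
    return flagged


dag_source = """
from airflow import DAG
from airflow.models.baseoperator import BaseOperator
from airflow.utils.session import provide_session
from airflow.utils.timezone import utcnow
import os
"""

for name in find_non_public_imports(dag_source):
    print("not intentionally public:", name)
```

   The sketch ignores plain `import airflow.x` statements and attribute access, so it would under-report; it only shows that, once the list exists, checking against it is mechanical.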
   
   This doc is mostly meant to start the discussion and work out a good set of classes/packages that we should consider "really public". For now in a document, eventually "guarded automatically".
   
   BTW, as I have mentioned in many places, we will never get this 100% correct. [Hyrum's law](https://www.hyrumslaw.com/) is very clear, and IMHO very true, about this.
   
   But we can state our intentions, and make an effort to keep those 
"explicitly intended" APIs controlled. 
   
   And first, we need to agree on what our intentions are. If we don't document them and don't agree on what is and what is not intended to be public API, everyone will have their own understanding of it, and those understandings will often differ (for example, many users still use the Airflow DB and rely on its structure because they assume they can). You wrote yourself that you prefer to `minimize backcompat surface` and "everything that's not an operator / sensor / hook should be private". But this is only your understanding; others might say "everything in utils can be used by DAG authors and provider writers". Both statements are possible, and they are fundamentally different. Either might be a deliberate decision we make as a community. But we need to make that decision and be very explicit about it. This is the starting point that is missing now, IMHO.
   
   I just want to have a page where we all agree what we intentionally make 
"public", and have clear rules that all the rest is not.


