amoghrajesh commented on code in PR #52196:
URL: https://github.com/apache/airflow/pull/52196#discussion_r2181905823


##########
airflow-core/docs/installation/upgrading_to_airflow3.rst:
##########
@@ -20,6 +20,43 @@ Upgrading to Airflow 3
 
 Apache Airflow 3 is a major release and contains :ref:`breaking changes<breaking-changes>`. This guide walks you through the steps required to upgrade from Airflow 2.x to Airflow 3.0.
 
+Understanding Airflow 3.x Architecture Changes
+-----------------------------------------------
+
+Airflow 3.x introduces significant architectural changes that improve security, scalability, and maintainability. Understanding these changes helps you prepare for the upgrade and adapt your workflows accordingly.
+
+Airflow 2.x Architecture
+^^^^^^^^^^^^^^^^^^^^^^^^
+.. image:: ../img/airflow-2-arch.png
+   :alt: Airflow 2.x architecture diagram showing scheduler, metadata database, and worker
+   :align: center
+
+- All components communicate directly with the Airflow metadata database.
+- Airflow 2 was designed to run all components within the same network space: task code and task execution code (airflow package code that runs user code) run in the same process.
+- Workers communicate directly with the Airflow database and execute all user code.
+- User code could import sessions and perform malicious actions on the Airflow metadata database.
+- The number of connections to the database was excessive, leading to scaling challenges.
+
+Airflow 3.x Architecture
+^^^^^^^^^^^^^^^^^^^^^^^^
+.. image:: ../img/airflow-3-arch.png
+   :alt: Airflow 3.x architecture diagram showing the decoupled Execution API Server and worker subprocesses
+   :align: center
+
+- The API server is now the sole access point to the metadata DB for tasks and workers.
+- It supports several applications: the Airflow REST API, an internal API for the Airflow UI (which also serves the UI's static assets), and an API that workers interact with when executing task instances (TIs) via the task execution interface.
+- Workers communicate with the API server instead of directly with the database.
+- The DAG processor and Triggerer use the task execution mechanism for their tasks, especially when they require variables or connections.
+
+Database Access Restrictions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In Airflow 3.x, direct metadata database access from task code is now restricted. This is a key security and architectural improvement that affects how DAG authors interact with Airflow resources:

Review Comment:
   ```suggestion
   In Airflow 3, direct metadata database access from task code is now restricted. This is a key security and architectural improvement that affects how DAG authors interact with Airflow resources:
   ```
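
   A hedged sketch right after this sentence might help DAG authors see what the restriction means in practice (illustrative only: the Variable key and task name are made up, and ``airflow.sdk.Variable`` is assumed to be the supported access path here):

   ```python
   # Airflow 2.x antipattern: task code opened a metadata DB session directly.
   # from airflow import settings
   # session = settings.Session()   # no longer possible from task code in Airflow 3

   # Airflow 3 pattern: go through the Task SDK, which routes the lookup
   # over the Task Execution API instead of a direct DB connection.
   from airflow.sdk import Variable, task

   @task
   def read_config():
       # Resolved by the API server on the task's behalf.
       return Variable.get("my_config_key")  # key name is illustrative
   ```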



##########
task-sdk/docs/concepts.rst:
##########
@@ -0,0 +1,62 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+
+Concepts
+========
+
+This section covers the fundamental concepts that DAG authors need to understand when working with the Task SDK.
+
+.. note::
+
+    For information about Airflow 3.x architectural changes and database access restrictions, see the "Upgrading to Airflow 3" guide in the main Airflow documentation.
+
+Terminology
+-----------
+- **Task**: a Python function (decorated with ``@task``) or Operator invocation representing a unit of work in a DAG.
+- **Task Execution**: the runtime machinery that executes user tasks in isolated subprocesses, managed via the Supervisor and Execution API.
+
+Task Lifecycle
+--------------
+
+Understanding the task lifecycle helps DAG authors write more effective tasks and debug issues:
+
+- **Scheduled**: The Airflow scheduler enqueues the task instance. The Executor assigns a workload token used for subsequent API authentication and validation.

Review Comment:
   ```suggestion
   - **Scheduled**: The Airflow scheduler enqueues the task instance. The Executor assigns a workload token used for subsequent API authentication and validation with the Airflow API Server.
   ```



##########
task-sdk/docs/index.rst:
##########
@@ -46,47 +50,109 @@ Define a basic DAG and task in just a few lines of Python:
    :end-before: [END simplest_dag]
    :caption: Simplest DAG with :func:`@dag <airflow.sdk.dag>` and :func:`@task <airflow.sdk.task>`
 
-Examples
---------
+2. Public Interface
+-------------------
+
+Direct metadata database access from task code is now restricted. A dedicated Task Execution API handles all runtime interactions (state transitions, heartbeats, XComs, and resource fetching), ensuring isolation and security.
+
+Airflow now supports a service-oriented architecture, enabling tasks to be executed remotely via a new Task Execution API. This API decouples task execution from the scheduler and introduces a stable contract for running tasks outside of Airflow's traditional runtime environment.
+
+To support remote execution, Airflow provides the Task SDK — a lightweight runtime environment for running Airflow tasks in external systems such as containers, edge environments, or other runtimes. This lays the groundwork for language-agnostic task execution and brings improved isolation, portability, and extensibility to Airflow-based workflows.
+
+Airflow 3.0 also introduces a new ``airflow.sdk`` namespace that exposes the core authoring interfaces for defining DAGs and tasks. DAG authors should now import objects like :class:`airflow.sdk.DAG`, :func:`airflow.sdk.dag`, and :func:`airflow.sdk.task` from ``airflow.sdk`` rather than internal modules. This new namespace provides a stable, forward-compatible interface for DAG authoring across future versions of Airflow.
+
+3. DAG Authoring Enhancements
+-----------------------------
+
+Writing your DAGs is now more consistent in Airflow 3.0. Use the stable :mod:`airflow.sdk` interface to define your workflows and tasks.
+
+Why use ``airflow.sdk``?
+^^^^^^^^^^^^^^^^^^^^^^^^
+- Decouple your DAG definitions from Airflow internals (Scheduler, API Server, etc.)
+- Enjoy a consistent API that won't break across Airflow upgrades
+- Import only the classes and decorators you need, without installing the full Airflow core
+
+**Key imports from airflow.sdk**
+
+**Classes**
+
+- :class:`airflow.sdk.Asset`

Review Comment:
   Add `BaseHook`
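
   A hedged example of authoring purely against the stable surface could accompany this list (the DAG id and connection id below are made up; only ``DAG``, ``task``, and ``BaseHook`` come from the surrounding docs and this comment):

   ```python
   # Illustrative sketch: everything is imported from the stable airflow.sdk namespace.
   from airflow.sdk import DAG, BaseHook, task

   with DAG(dag_id="sdk_imports_demo"):  # dag_id is illustrative

       @task
       def fetch_host() -> str:
           # BaseHook resolves the connection via the Task Execution API,
           # not through a direct metadata-DB session.
           return BaseHook.get_connection("my_conn_id").host

       fetch_host()
   ```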



##########
task-sdk/docs/concepts.rst:
##########
@@ -0,0 +1,62 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+
+Concepts
+========
+
+This section covers the fundamental concepts that DAG authors need to understand when working with the Task SDK.
+
+.. note::
+
+    For information about Airflow 3.x architectural changes and database access restrictions, see the "Upgrading to Airflow 3" guide in the main Airflow documentation.
+
+Terminology
+-----------
+- **Task**: a Python function (decorated with ``@task``) or Operator invocation representing a unit of work in a DAG.
+- **Task Execution**: the runtime machinery that executes user tasks in isolated subprocesses, managed via the Supervisor and Execution API.
+
+Task Lifecycle
+--------------
+
+Understanding the task lifecycle helps DAG authors write more effective tasks and debug issues:
+
+- **Scheduled**: The Airflow scheduler enqueues the task instance. The Executor assigns a workload token used for subsequent API authentication and validation.
+- **Queued**: Workers poll the queue to retrieve and reserve queued task instances.
+- **Subprocess Launch**: The worker's Supervisor process spawns a dedicated subprocess (Task Runner) for the task instance, isolating its execution.
+- **Run API Call**: The Supervisor sends a ``POST /run`` call to the Execution API to mark the task as running; the API server responds with a ``TIRunContext`` (including retry limits, fail-fast flags, etc.).

Review Comment:
   Expand a bit more on what comes in the ``TIRunContext`` (at least the important fields).
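
   For example, a hedged sketch of the kind of fields the response carries (only the retry limit and fail-fast flag come from the bullet above; the field names and the DAG-run metadata are assumptions to be checked against the canonical task-sdk model):

   ```python
   # Hypothetical shape, for illustration only -- not the canonical schema.
   from dataclasses import dataclass

   @dataclass
   class TIRunContextSketch:
       max_tries: int      # retry limit mentioned in the bullet (field name assumed)
       should_retry: bool  # whether a failure may be retried (assumed field)
       fail_fast: bool     # fail-fast flag mentioned in the bullet (field name assumed)
       # ...plus run-level metadata the task needs at startup, e.g. the owning
       # DAG run's run_id and logical date (assumed here).
   ```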



##########
task-sdk/docs/concepts.rst:
##########
@@ -0,0 +1,62 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+
+Concepts
+========
+
+This section covers the fundamental concepts that DAG authors need to understand when working with the Task SDK.
+
+.. note::
+
+    For information about Airflow 3.x architectural changes and database access restrictions, see the "Upgrading to Airflow 3" guide in the main Airflow documentation.
+
+Terminology
+-----------
+- **Task**: a Python function (decorated with ``@task``) or Operator invocation representing a unit of work in a DAG.
+- **Task Execution**: the runtime machinery that executes user tasks in isolated subprocesses, managed via the Supervisor and Execution API.
+
+Task Lifecycle
+--------------
+
+Understanding the task lifecycle helps DAG authors write more effective tasks and debug issues:
+
+- **Scheduled**: The Airflow scheduler enqueues the task instance. The Executor assigns a workload token used for subsequent API authentication and validation.
+- **Queued**: Workers poll the queue to retrieve and reserve queued task instances.
+- **Subprocess Launch**: The worker's Supervisor process spawns a dedicated subprocess (Task Runner) for the task instance, isolating its execution.
+- **Run API Call**: The Supervisor sends a ``POST /run`` call to the Execution API to mark the task as running; the API server responds with a ``TIRunContext`` (including retry limits, fail-fast flags, etc.).
+- **Resource Fetch**: During execution, if the task code requests Airflow resources (variables, connections, etc.), it writes a request to STDOUT. The Supervisor intercepts it, issues a corresponding API call, and writes the API response into the subprocess's STDIN.

Review Comment:
   ```suggestion
   - **Runtime Dependency Fetching**: During execution, if the task code requests Airflow resources (variables, connections, etc.), it writes a request to STDOUT. The Supervisor receives it, issues a corresponding API call, and writes the API response into the subprocess's STDIN.
   ```
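
   To make the STDOUT/STDIN round trip concrete, a toy sketch of the subprocess side could follow (the message shape and the ``GetVariable`` type name are illustrative, not the Task SDK's actual wire format):

   ```python
   # Toy model of the Task Runner side of the exchange: write a request line
   # to STDOUT, then block on STDIN for the Supervisor's response.
   import json
   import sys

   def request_resource(kind: str, key: str) -> dict:
       # 1. Emit a structured request for the Supervisor to pick up.
       sys.stdout.write(json.dumps({"type": kind, "key": key}) + "\n")
       sys.stdout.flush()
       # 2. The Supervisor makes the Execution API call on our behalf and
       #    writes the response back into our STDIN.
       return json.loads(sys.stdin.readline())

   # e.g. value = request_resource("GetVariable", "my_config_key")
   ```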



##########
task-sdk/docs/concepts.rst:
##########
@@ -0,0 +1,62 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+
+Concepts
+========
+
+This section covers the fundamental concepts that DAG authors need to understand when working with the Task SDK.
+
+.. note::
+
+    For information about Airflow 3.x architectural changes and database access restrictions, see the "Upgrading to Airflow 3" guide in the main Airflow documentation.
+
+Terminology
+-----------
+- **Task**: a Python function (decorated with ``@task``) or Operator invocation representing a unit of work in a DAG.
+- **Task Execution**: the runtime machinery that executes user tasks in isolated subprocesses, managed via the Supervisor and Execution API.
+
+Task Lifecycle
+--------------
+
+Understanding the task lifecycle helps DAG authors write more effective tasks and debug issues:
+
+- **Scheduled**: The Airflow scheduler enqueues the task instance. The Executor assigns a workload token used for subsequent API authentication and validation.
+- **Queued**: Workers poll the queue to retrieve and reserve queued task instances.
+- **Subprocess Launch**: The worker's Supervisor process spawns a dedicated subprocess (Task Runner) for the task instance, isolating its execution.
+- **Run API Call**: The Supervisor sends a ``POST /run`` call to the Execution API to mark the task as running; the API server responds with a ``TIRunContext`` (including retry limits, fail-fast flags, etc.).
+- **Resource Fetch**: During execution, if the task code requests Airflow resources (variables, connections, etc.), it writes a request to STDOUT. The Supervisor intercepts it, issues a corresponding API call, and writes the API response into the subprocess's STDIN.
+- **Heartbeats & Token Renewal**: The Task Runner periodically emits ``POST /heartbeat`` calls. Each call authenticates via JWT; if the token has expired, the API server returns a refreshed token in the ``Refreshed-API-Token`` header.

Review Comment:
   ```suggestion
   - **Heartbeats & Token Renewal**: The Task Runner periodically emits ``POST /heartbeat`` calls through the Supervisor. Each call authenticates via JWT; if the token has expired, the API server returns a refreshed token in the ``Refreshed-API-Token`` header.
   ```
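
   A hedged sketch of the renewal handshake from the client side might help here (assumes the ``requests`` library; only ``POST /heartbeat`` and the ``Refreshed-API-Token`` header come from the bullet above, while the base URL and payload shape are illustrative):

   ```python
   import requests

   def heartbeat(base_url: str, ti_id: str, token: str) -> str:
       """Send one heartbeat; return the (possibly refreshed) JWT."""
       resp = requests.post(
           f"{base_url}/heartbeat",
           headers={"Authorization": f"Bearer {token}"},
           json={"ti_id": ti_id},  # payload shape is illustrative
           timeout=10,
       )
       resp.raise_for_status()
       # If the JWT expired, the API server hands back a fresh one here.
       return resp.headers.get("Refreshed-API-Token", token)
   ```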



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
