Re: [PR] docs: update architecture overview to Airflow 3 architecture [airflow]

via GitHub Tue, 09 Jun 2026 14:30:08 -0700


o-nikolas commented on code in PR #67994:
URL: https://github.com/apache/airflow/pull/67994#discussion_r3383900701



##########
airflow-core/docs/core-concepts/overview.rst:
##########
@@ -49,15 +49,19 @@ A minimal Airflow installation consists of the following 
components:
   a configuration property of the *scheduler*, not a separate component and 
runs within the scheduler
   process. There are several executors available out of the box, and you can 
also write your own.
 
-* A *Dag processor*, which parses Dag files and serializes them into the
+* A *Dag processor*, which parses Dag files from a *Dag bundle* and serializes 
them into the
   *metadata database*. More about processing Dag files can be found in
   :doc:`/administration-and-deployment/dagfile-processing`
 
-* A *webserver*, which presents a handy user interface to inspect, trigger and 
debug the behaviour of
-  Dags and tasks.
+* A *Dag bundle*, which is configured for the *Dag processor* to parse Dag 
files from and allow *workers* to access the correct version of the Dag file. 
By default, this is a local folder on disk. More about Dag bundles can be found 
in
+  :doc:`/administration-and-deployment/dag-bundles`
 
-* A folder of *Dag files*, which is read by the *scheduler* to figure out what 
tasks to run and when to
-  run them.
+* An *API Server*, which serves the REST API and presents a handy user 
interface to inspect, trigger and debug the behaviour of

Review Comment:
   Nit: "handy user interface" is a bit colloquial/AI-sounding.
   
    
   ```suggestion
   * An *API Server*, which serves the REST API and presents a user interface 
to inspect, trigger and debug the behaviour of
   ```



##########
airflow-core/docs/core-concepts/overview.rst:
##########
@@ -49,15 +49,19 @@ A minimal Airflow installation consists of the following 
components:
   a configuration property of the *scheduler*, not a separate component and 
runs within the scheduler
   process. There are several executors available out of the box, and you can 
also write your own.
 
-* A *Dag processor*, which parses Dag files and serializes them into the
+* A *Dag processor*, which parses Dag files from a *Dag bundle* and serializes 
them into the
   *metadata database*. More about processing Dag files can be found in
   :doc:`/administration-and-deployment/dagfile-processing`
 
-* A *webserver*, which presents a handy user interface to inspect, trigger and 
debug the behaviour of
-  Dags and tasks.
+* A *Dag bundle*, which is configured for the *Dag processor* to parse Dag 
files from and allow *workers* to access the correct version of the Dag file. 
By default, this is a local folder on disk. More about Dag bundles can be found 
in
+  :doc:`/administration-and-deployment/dag-bundles`
 
-* A folder of *Dag files*, which is read by the *scheduler* to figure out what 
tasks to run and when to
-  run them.
+* An *API Server*, which serves the REST API and presents a handy user 
interface to inspect, trigger and debug the behaviour of
+  Dags and tasks. The API server is also used by *workers* to communicate 
state back to Airflow, without requiring direct access

Review Comment:
   I'd say Tasks communicate state back (or, even more specifically, the task 
supervisor, but I don't think users need that level of detail). Workers are a 
very informal component that doesn't exist in all cases; it depends heavily on 
which executor you're using. I know this doc uses that noun a lot, but I think 
we should refrain from it where we can and use a more specific/accurate noun 
instead (especially in new updates).



##########
airflow-core/docs/core-concepts/overview.rst:
##########
@@ -49,15 +49,19 @@ A minimal Airflow installation consists of the following 
components:
   a configuration property of the *scheduler*, not a separate component and 
runs within the scheduler
   process. There are several executors available out of the box, and you can 
also write your own.
 
-* A *Dag processor*, which parses Dag files and serializes them into the
+* A *Dag processor*, which parses Dag files from a *Dag bundle* and serializes 
them into the
   *metadata database*. More about processing Dag files can be found in
   :doc:`/administration-and-deployment/dagfile-processing`
 
-* A *webserver*, which presents a handy user interface to inspect, trigger and 
debug the behaviour of
-  Dags and tasks.
+* A *Dag bundle*, which is configured for the *Dag processor* to parse Dag 
files from and allow *workers* to access the correct version of the Dag file. 
By default, this is a local folder on disk. More about Dag bundles can be found 
in
+  :doc:`/administration-and-deployment/dag-bundles`
 
-* A folder of *Dag files*, which is read by the *scheduler* to figure out what 
tasks to run and when to
-  run them.
+* An *API Server*, which serves the REST API and presents a handy user 
interface to inspect, trigger and debug the behaviour of
+  Dags and tasks. The API server is also used by *workers* to communicate 
state back to Airflow, without requiring direct access
+  to the *metadata database*.
+
+* The *Task SDK*, which is an isolated runtime environment inside the 
*workers* that executes the user-defined Dag code.

Review Comment:
   The "Task SDK" is just that: an SDK. It isn't a runtime environment (and 
that environment is only isolated if you make it isolated; you can still have 
the credentials for the Metadata DB on the workers/compute if you want it 
that way).
   
   I would just merge this with the above sentence about the API server, 
noting that Tasks use the Task SDK to communicate state back via the Task API.



##########
airflow-core/newsfragments/67994.doc.rst:
##########
@@ -0,0 +1 @@
+Updated the Architecture Overview page to reflect Airflow 3 architecture 
changes: replaced ``webserver`` references with ``api-server``, introduced 
``DAG bundles``as a required component, corrected ``DAG processor`` as required 
in all deployments, and fixed the claim that the scheduler reads DAG files 
directly.

Review Comment:
   I don't think you need a newsfragment for a docs update; that's quite 
overkill.
   
   But if you do really want to keep it:
   
   
   ```suggestion
   Updated the Architecture Overview page to reflect Airflow 3 architecture 
changes: replaced ``webserver`` references with ``API Server``, introduced 
``Dag bundles`` as a required component, corrected ``Dag Processor`` as required 
in all deployments, and fixed the claim that the scheduler reads Dag files 
directly.
   ```



##########
airflow-core/docs/core-concepts/overview.rst:
##########
@@ -177,21 +180,22 @@ Helm Chart documentation. Helm chart is one of the ways 
how to deploy Airflow in
 Separate Dag processing architecture
 ....................................
 
-In a more complex installation where security and isolation are important, 
you'll also see the
-standalone *Dag processor* component that allows to separate *scheduler* from 
accessing *Dag files*.
-This is suitable if the deployment focus is on isolation between parsed tasks. 
While Airflow does not yet
-support full multi-tenant features, it can be used to make sure that **Dag 
author** provided code is never
-executed in the context of the scheduler.
+The *Dag processor* is a required component in all Airflow 3 deployments. In 
distributed
+deployments it runs as a standalone process, ensuring the *scheduler* never 
has direct access

Review Comment:
   The Dag Processor is always a standalone process in Airflow 3, whether 
Airflow is deployed in a distributed manner across several compute instances or 
on just a single compute instance.



##########
airflow-core/docs/core-concepts/overview.rst:
##########
@@ -92,14 +96,14 @@ All the components are Python applications that can be 
deployed using various de
 They can have extra *installed packages* installed in their Python 
environment. This is useful for example to
 install custom operators or sensors or extend Airflow functionality with 
custom plugins.
 
-While Airflow can be run in a single machine and with simple installation 
where only *scheduler* and
-*webserver* are deployed, Airflow is designed to be scalable and secure, and 
is able to run in a distributed
+While Airflow can be run in a single machine and with simple installation 
where only *scheduler*, *Dag processor* and
+*API server* are deployed, Airflow is designed to be scalable and secure, and 
is able to run in a distributed
 environment - where various components can run on different machines, with 
different security perimeters
 and can be scaled by running multiple instances of the components above.
 
 The separation of components also allow for increased security, by isolating 
the components from each other
 and by allowing to perform different tasks. For example separating *Dag 
processor* from *scheduler*
-allows to make sure that the *scheduler* does not have access to the *Dag 
files* and cannot execute
+in Airflow 3 makes sure that the *scheduler* does not have access to the *Dag 
bundles* and cannot execute

Review Comment:
   Those two components were separated before Airflow 3, so I don't think 
we're obliged to call that out. It was possible to run them both ways (separate 
or not) for quite a while.
   



##########
airflow-core/docs/core-concepts/overview.rst:
##########
@@ -177,21 +180,22 @@ Helm Chart documentation. Helm chart is one of the ways 
how to deploy Airflow in
 Separate Dag processing architecture
 ....................................
 
-In a more complex installation where security and isolation are important, 
you'll also see the
-standalone *Dag processor* component that allows to separate *scheduler* from 
accessing *Dag files*.
-This is suitable if the deployment focus is on isolation between parsed tasks. 
While Airflow does not yet
-support full multi-tenant features, it can be used to make sure that **Dag 
author** provided code is never
-executed in the context of the scheduler.
+The *Dag processor* is a required component in all Airflow 3 deployments. In 
distributed
+deployments it runs as a standalone process, ensuring the *scheduler* never 
has direct access
+to *Dag bundles* and cannot execute code provided by a **Dag author**. While 
Airflow does not
+yet support full multi-tenant features, this separation ensures that **Dag 
author** provided
+code is never executed in the context of the *scheduler*.
 
 .. image:: ../img/diagram_dag_processor_airflow_architecture.png
 
 .. note::
 
-    When Dag file is changed there can be cases where the scheduler and the 
worker will see different
-    versions of the Dag until both components catch up. You can avoid the 
issue by making sure Dag is
-    deactivated during deployment and reactivate once finished. If needed, the 
cadence of sync and scan
-    of Dag folder can be configured. Please make sure you really know what you 
are doing if you change
-    the configurations.
+    When using the default local disk *Dag bundle* backend, which does not 
support
+    versioning, there can be cases where the *Dag processor* and *workers* see 
different
+    versions of a DAG until both catch up to the latest files. Versioned *Dag 
bundle*
+    backends (such as git) address this by allowing the *scheduler* to pin a 
specific
+    bundle version when dispatching each task. If needed, the cadence of sync 
and scan
+    of the *Dag bundle* can be configured.

Review Comment:
   ```suggestion
       When using the default local disk *Dag bundle* backend, which does not 
support
       versioning, there can be cases where the *Dag processor* and *workers* 
see different
       versions of a Dag until both catch up to the latest files. Versioned 
*Dag bundle*
       backends (such as Git) address this by allowing the *scheduler* to pin a 
specific
       bundle version when dispatching each task. If needed, the cadence of 
sync and scan
       of the *Dag bundle* can be configured.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

