andrewgodwin commented on a change in pull request #15444:
URL: https://github.com/apache/airflow/pull/15444#discussion_r616242002



##########
File path: docs/apache-airflow/concepts/overview.rst
##########
@@ -0,0 +1,96 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Architecture Overview
+=====================
+
+Airflow is a platform that lets you build and run *workflows*. A workflow is represented as a :doc:`DAG <dags>` (a Directed Acyclic Graph), and contains individual pieces of work called :doc:`tasks`, arranged with dependencies and data flows taken into account.
+
+.. image:: /img/edge_label_example.png
+  :alt: An example Airflow DAG, rendered in Graph View
+
+A DAG specifies the dependencies between Tasks, the order in which to execute them, and how retries are run; the Tasks themselves describe what to do, be it fetching data, running analysis, triggering other systems, or more.
+
+An Airflow installation generally consists of the following components:
+
+* A :doc:`scheduler <scheduler>`, which handles both triggering scheduled workflows and submitting :doc:`tasks` to the executor to run.
+
+* An :doc:`executor </executor/index>`, which handles running tasks. In the default Airflow installation, this runs everything *inside* the scheduler, but most production-suitable executors actually push task execution out to *workers*.
+
+* A *webserver*, which presents a handy user interface to inspect, trigger, and debug the behaviour of DAGs and tasks.
+
+* A folder of *DAG files*, read by the scheduler and executor (and any workers the executor has).
+
+* A *metadata database*, used by the scheduler, executor and webserver to store state.
+
+.. image:: /img/arch-diag-basic.png
+
+Most executors also introduce other components to let them talk to their workers - such as a task queue - but you can still think of the executor and its workers as a single logical component in Airflow overall, handling the actual task execution.
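As a sketch of how the executor choice is configured in an Airflow 2.x deployment (the connection string is a placeholder; only the keys are real):

```ini
[core]
# LocalExecutor runs tasks in subprocesses on the scheduler machine;
# CeleryExecutor pushes them to remote workers via a task queue.
executor = LocalExecutor

# All components (scheduler, executor, webserver) share one metadata database.
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost/airflow
```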
+
+Airflow itself is agnostic to what you're running - it will happily orchestrate and run anything, either with high-level support from one of our providers, or directly as a command using the shell or Python :doc:`operators`.
+
+Workloads
+---------
+
+A DAG runs through a series of :doc:`tasks`, and there are three common types of task you will see:
+
+* :doc:`operators`, predefined tasks that you can string together quickly to build most parts of your DAGs.
+
+* :doc:`sensors`, a special subclass of Operators which are entirely about waiting for an external event to happen.
+
+* A :doc:`taskflow`-decorated ``@task``, which is a custom Python function packaged up as a Task.
+
+Internally, these are all actually subclasses of Airflow's ``BaseOperator``, and the concepts of Task and Operator are somewhat interchangeable, but it's useful to think of them as separate concepts: essentially, Operators and Sensors are *templates*, and when you call one in a DAG file, you're making a Task.
+
+
+Control Flow

Review comment:
       Yes, it's deliberately repeated twice, as it's one of the most confusing terminology differences.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

