This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch v2-8-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit a2573503a685c6c1de76a9b91b0543e0b59e9882
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Mon Dec 4 11:07:43 2023 +0100

    Add the section describing the security model of DAG Author capabilities (#36022)
    
    * Add the section describing the security model of DAG Author capabilities
    
    This change codifies and explains assumptions and decisions made by
    Airflow maintainers with regard to capabilities of DAG Authors.
    
    While DAG authors are pretty powerful and capable actors in Airflow,
    they cannot do everything and Deployment Managers have ways to restrict
    their potential capabilities, especially in the context of influencing
    other tasks and common components such as Scheduler, Webserver and
    Triggerer.
    
    This PR adds a chapter explaining those assumptions and decisions and
    tells the Deployment Managers what responsibilities they have in
    that regard and what mechanisms they currently have available to
    limit capabilities of DAG Authors.
    
    * Update docs/apache-airflow/security/security_model.rst
    
    Co-authored-by: Pankaj Koti <pankajkoti...@gmail.com>
    
    * Update docs/apache-airflow/security/security_model.rst
    
    Co-authored-by: Pankaj Koti <pankajkoti...@gmail.com>
    
    ---------
    
    Co-authored-by: Pankaj Koti <pankajkoti...@gmail.com>
    (cherry picked from commit 395ac463494dba1478a05a32900218988495889c)
---
 docs/apache-airflow/security/security_model.rst | 280 +++++++++++++++++-------
 1 file changed, 197 insertions(+), 83 deletions(-)

diff --git a/docs/apache-airflow/security/security_model.rst b/docs/apache-airflow/security/security_model.rst
index 4b426ea041..867b9faa1a 100644
--- a/docs/apache-airflow/security/security_model.rst
+++ b/docs/apache-airflow/security/security_model.rst
@@ -15,6 +15,9 @@
     specific language governing permissions and limitations
     under the License.
 
+.. contents::
+    :local:
+
 Airflow Security Model
 ======================
 
@@ -32,30 +35,41 @@ Airflow security model - user types
 The Airflow security model involves different types of users with
 varying access and capabilities:
 
-1. **Deployment Managers**: They have the highest level of access and
-   control. They install and configure Airflow, and make decisions about
-   technologies and permissions. They can potentially delete the entire
-   installation and have access to all credentials. Deployment Managers
-   can also decide to keep audits, backups and copies of information
-   outside of Airflow, which are not covered by Airflow's security
-   model.
-
-2. **DAG Authors**: They can upload, modify, and delete DAG files. The
-   code in DAG files is executed on workers and in the DAG File Processor. Note
-   that in the simple deployment configuration, parsing DAGs is executed as
-   a subprocess of the Scheduler process, but with Standalone DAG File Processor
-   deployment managers might separate parsing DAGs from the Scheduler process.
-   Therefore, DAG authors can create and change code executed on workers
-   and the DAG File Processor and potentially access the credentials that the DAG
-   code uses to access external systems. DAG Authors have full access
-   to the metadata database and internal audit logs.
-
-3. **Authenticated UI users**: They have access to the UI and API. See below
-   for more details on the capabilities authenticated UI users may have.
-
-4. **Non-authenticated UI users**: Airflow doesn't support
-   unauthenticated users by default. If allowed, potential vulnerabilities
-   must be assessed and addressed by the Deployment Manager.
+Deployment Managers
+...................
+
+They have the highest level of access and
+control. They install and configure Airflow, and make decisions about
+technologies and permissions. They can potentially delete the entire
+installation and have access to all credentials. Deployment Managers
+can also decide to keep audits, backups and copies of information
+outside of Airflow, which are not covered by Airflow's security
+model.
+
+DAG Authors
+...........
+
+They can upload, modify, and delete DAG files. The
+code in DAG files is executed on workers and in the DAG File Processor. Note
+that in the simple deployment configuration, parsing DAGs is executed as
+a subprocess of the Scheduler process, but with Standalone DAG File Processor
+deployment managers might separate parsing DAGs from the Scheduler process.
+Therefore, DAG authors can create and change code executed on workers
+and the DAG File Processor and potentially access the credentials that the DAG
+code uses to access external systems. DAG Authors have full access
+to the metadata database and internal audit logs.
+
+Authenticated UI users
+.......................
+
+They have access to the UI and API. See below for more details on the capabilities
+authenticated UI users may have.
+
+Non-authenticated UI users
+..........................
+
+Airflow doesn't support unauthenticated users by default. If allowed, potential vulnerabilities
+must be assessed and addressed by the Deployment Manager.
 
 Capabilities of authenticated UI users
 --------------------------------------
@@ -67,40 +81,134 @@ scoped as tightly as a single DAG, for example, or as broad as Admin.
 Below are four general categories to help conceptualize some of the
 capabilities authenticated users may have:
 
-1. **Admin users**: They manage and grant permissions to other users,
-   with full access to all UI capabilities. They can potentially execute
-   code on workers by configuring connections and need to be trusted not
-   to abuse these privileges. They have access to sensitive credentials
-   and can modify them. By default, they don't have access to
-   system-level configuration. They should be trusted not to misuse
-   sensitive information accessible through connection configuration.
-   They also have the ability to create a Webserver Denial of Service
-   situation and should be trusted not to misuse this capability.
-
-2. **Operations users**: The primary difference between an operator and admin
-   if the ability to manage and grant permissions to other users - only admins
-   are able to do this. Otherwise assume they have the same access as an admin.
-
-3. **Connection configuration users**: They configure connections and
-   potentially execute code on workers during DAG execution. Trust is
-   required to prevent misuse of these privileges. They have full access
-   to sensitive credentials stored in connections and can modify them.
-   Access to sensitive information through connection configuration
-   should be trusted not to be abused. They also have the ability to
-   create a Webserver Denial of Service situation and should be trusted
-   not to misuse this capability.
-
-4. **Audit log users**: They can view audit events for the whole Airflow installation.
-
-5. **Normal Users**: They can view and interact with the UI and API.
-   They are able to view and edit DAGs, task instances, and DAG runs, and view task logs.
+Admin users
+...........
+
+They manage and grant permissions to other users,
+with full access to all UI capabilities. They can potentially execute
+code on workers by configuring connections and need to be trusted not
+to abuse these privileges. They have access to sensitive credentials
+and can modify them. By default, they don't have access to
+system-level configuration. They should be trusted not to misuse
+sensitive information accessible through connection configuration.
+They also have the ability to create a Webserver Denial of Service
+situation and should be trusted not to misuse this capability.
+
+Operations users
+................
+
+The primary difference between an operator and admin is the ability to manage and grant permissions
+to other users - only admins are able to do this. Otherwise assume they have the same access as an admin.
+
+Connection configuration users
+..............................
+
+They configure connections and potentially execute code on workers during DAG execution. Trust is
+required to prevent misuse of these privileges. They have full access
+to sensitive credentials stored in connections and can modify them.
+Access to sensitive information through connection configuration
+should be trusted not to be abused. They also have the ability to
+create a Webserver Denial of Service situation and should be trusted
+not to misuse this capability.
+
+Audit log users
+...............
+
+They can view audit events for the whole Airflow installation.
+
+Regular users
+.............
+
+They can view and interact with the UI and API. They are able to view and edit DAGs,
+task instances, and DAG runs, and view task logs.
 
 For more information on the capabilities of authenticated UI users, see :doc:`/security/access-control`.
 
+Capabilities of DAG Authors
+---------------------------
+
+DAG authors are able to submit code - via Python files placed in the DAGS_FOLDER - that will be executed
+in a number of circumstances. The code to execute is neither verified, checked nor sand-boxed by Airflow
+(that would be very difficult if not impossible to do), so effectively DAG authors can execute arbitrary
+code on the workers (part of Celery Workers for Celery Executor, local processes run by scheduler in case
+of Local Executor, Task Kubernetes POD in case of Kubernetes Executor), in the DAG File Processor
+(which can be either executed as standalone process or can be part of the Scheduler) and in the Triggerer.
+
+There are several consequences of this model chosen by Airflow, that deployment managers need to be aware of:
+
+Local executor and built-in DAG File Processor
+..............................................
+
+In case of Local Executor and DAG File Processor running as part of the Scheduler, DAG authors can execute
+arbitrary code on the machine where scheduler is running. This means that they can affect the scheduler
+process itself, and potentially affect the whole Airflow installation - including modifying cluster-wide
+policies and changing Airflow configuration. If you are running Airflow with one of those settings,
+the Deployment Manager must trust the DAG authors not to abuse this capability.
+
+Celery Executor
+...............
+
+In case of Celery Executor, DAG authors can execute arbitrary code on the Celery Workers. This means that
+they can potentially influence all the tasks executed on the same worker. If you are running Airflow with
+Celery Executor, the Deployment Manager must trust the DAG authors not to abuse this capability and unless
+Deployment Manager separates task execution by queues by Cluster Policies, they should assume, there is no
+isolation between tasks.
+
+Kubernetes Executor
+...................
+
+In case of Kubernetes Executor, DAG authors can execute arbitrary code on the Kubernetes POD they run. Each
+task is executed in a separate POD, so there is already isolation between tasks as generally speaking
+Kubernetes provides isolation between PODs.
+
+Triggerer
+.........
+
+In case of Triggerer, DAG authors can execute arbitrary code in Triggerer. Currently there are no
+enforcement mechanisms that would allow to isolate tasks that are using deferrable functionality from
+each other and arbitrary code from various tasks can be executed in the same process/machine. Deployment
+Manager must trust that DAG authors will not abuse this capability.
+
+DAG files not needed for Scheduler and Webserver
+................................................
+
+The Deployment Manager might isolate the code execution provided by DAG authors - particularly in
+Scheduler and Webserver by making sure that the Scheduler and Webserver don't even
+have access to the DAG Files (that requires standalone DAG File Processor to be deployed). Generally
+speaking - no DAG author provided code should ever be executed in the Scheduler or Webserver process.
+
+Allowing DAG authors to execute selected code in Scheduler and Webserver
+........................................................................
+
+There are a number of functionalities that allow the DAG author to use pre-registered custom code to be
+executed in scheduler or webserver process - for example they can choose custom Timetables, UI plugins,
+Connection UI Fields, Operator extra links, macros, listeners - all of those functionalities allow the
+DAG author to choose the code that will be executed in the scheduler or webserver process. However this
+should not be arbitrary code that DAG author can add in DAG folder. All those functionalities are
+only available via ``plugins`` and ``providers`` mechanisms where the code that is executed can only be
+provided by installed packages (or in case of plugins it can also be added to PLUGINS folder where DAG
+authors should not have write access to). PLUGINS FOLDER is a legacy mechanism coming from Airflow 1.10
+- but we recommend using entrypoint mechanism that allows the Deployment Manager to - effectively -
+choose and register the code that will be executed in those contexts. DAG Author has no access to
+install or modify packages installed in Webserver and Scheduler, and this is the way to prevent
+the DAG Author to execute arbitrary code in those processes.
+
+The Deployment Manager might decide to introduce additional control mechanisms to prevent DAG authors from
+executing arbitrary code. This is all fully in hands of the Deployment Manager and it is discussed in the
+following chapter.
+
 Responsibilities of Deployment Managers
 ---------------------------------------
 
-Deployment Managers are responsible for deploying airflow and make it accessible to the users
+As a Deployment Manager, you should be aware of the capabilities of DAG authors and make sure that
+you trust them not to abuse the capabilities they have. You should also make sure that you have
+properly configured the Airflow installation to prevent DAG authors from executing arbitrary code
+in the Scheduler and Webserver processes.
+
+Deploying and protecting Airflow installation
+.............................................
+
+Deployment Managers are also responsible for deploying airflow and make it accessible to the users
 in the way that follows best practices of secure deployment applicable to the organization where
 Airflow is deployed. This includes but is not limited to:
 
@@ -112,37 +220,43 @@ Airflow is deployed. This includes but is not limited to:
 * any kind of detection of unusual activity and protection against it
 * choosing the right session backend and configuring it properly including timeouts for the session
 
+Limiting DAG Author capabilities
+.................................
+
+The Deployment Manager might also use additional mechanisms to prevent DAG authors from executing
+arbitrary code - for example they might introduce tooling around DAG submission that would allow
+to review the code before it is deployed, statically-check it and add other ways to prevent malicious
+code to be submitted. The way how submitting code to DAG folder is done and protected is completely
+up to the Deployment Manager - Airflow does not provide any tooling or mechanisms around it and it
+expects that the Deployment Manager will provide the tooling to protect access to the DAG folder and
+make sure that only trusted code is submitted there.
+
 Airflow does not implement any of those feature natively, and delegates it to the deployment managers
 to deploy all the necessary infrastructure to protect the deployment - as external infrastructure components.
 
-Deployment Managers also determine access levels and must understand the potential
-damage users can cause. Some Deployment Managers may further limit
-access through fine-grained privileges for the **Authenticated UI
-users**. However, these limitations are outside the basic Airflow's
-security model and are at the discretion of Deployment Managers.
-Examples of fine-grained access control include (but are not limited
-to):
-
--  Limiting login permissions: Restricting the accounts that users can
-   log in with, allowing only specific accounts or roles belonging to
-   access the Airflow system.
-
--  Access restrictions to views or DAGs: Controlling user access to
-   certain views or specific DAGs, ensuring that users can only view or
-   interact with authorized components.
-
--  Implementing static code analysis and code review: Introducing
-   processes such as static code analysis and code review as part of the
-   DAG submission pipeline. This helps enforce code quality standards,
-   security checks, and adherence to best practices before DAGs are
-   deployed.
-
-These examples showcase ways in which Deployment Managers can refine and
-limit user privileges within Airflow, providing tighter control and
-ensuring that users have access only to the necessary components and
-functionalities based on their roles and responsibilities. However,
-fine-grained access control does not provide full isolation and
-separation of access to allow isolation of different user groups in a
-multi-tenant fashion yet. In future versions of Airflow, some
-fine-grained access control features could become part of the Airflow security
-model, as the Airflow community is working on a multi-tenant model currently.
+Limiting access for authenticated UI users
+...........................................
+
+Deployment Managers also determine access levels and must understand the potential damage users can cause.
+Some Deployment Managers may further limit access through fine-grained privileges for the **Authenticated UI
+users**. However, these limitations are outside the basic Airflow's security model and are at the
+discretion of Deployment Managers.
+
+Examples of fine-grained access control include (but are not limited to):
+
+*  Limiting login permissions: Restricting the accounts that users can log in with, allowing only specific
+   accounts or roles belonging to access the Airflow system.
+
+*  Access restrictions to views or DAGs: Controlling user access to certain views or specific DAGs,
+   ensuring that users can only view or interact with authorized components.
+
+Future: multi-tenancy isolation
+...............................
+
+These examples showcase ways in which Deployment Managers can refine and limit user privileges within Airflow,
+providing tighter control and ensuring that users have access only to the necessary components and
+functionalities based on their roles and responsibilities. However, fine-grained access control does not
+provide full isolation and separation of access to allow isolation of different user groups in a
+multi-tenant fashion yet. In future versions of Airflow, some fine-grained access control features could
+become part of the Airflow security model, as the Airflow community is working on a multi-tenant model
+currently.
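
The "Limiting DAG Author capabilities" section added in this diff mentions
tooling around DAG submission that statically checks code before it is
deployed. A minimal sketch of one such check follows. It is an illustration
only: the deny-list ``DISALLOWED_MODULES`` and the helper
``find_disallowed_imports`` are hypothetical names, Airflow itself provides no
such tooling, and a check like this is easy to evade (for example through
``importlib`` or ``__import__``), so it could complement code review but not
replace it.

```python
import ast

# Hypothetical deny-list; each deployment would choose its own policy.
DISALLOWED_MODULES = {"subprocess", "ctypes", "socket"}


def find_disallowed_imports(source: str, filename: str = "<dag>") -> list[tuple[int, str]]:
    """Return sorted (line number, module) pairs for deny-listed imports."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports ("from . import x") have module=None.
            names = [node.module] if node.module else []
        else:
            continue
        for name in names:
            # Compare the top-level package, so "os.path" maps to "os".
            top_level = name.split(".")[0]
            if top_level in DISALLOWED_MODULES:
                findings.append((node.lineno, name))
    return sorted(findings)


if __name__ == "__main__":
    sample = "import os\nimport subprocess\nfrom socket import create_connection\n"
    for lineno, module in find_disallowed_imports(sample):
        print(f"line {lineno}: import of {module!r} is not allowed")
```

A Deployment Manager could run a check of this kind in the pipeline that
copies files into the DAG folder and reject a submission when the returned
list is non-empty.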
