pankajkoti commented on code in PR #36022: URL: https://github.com/apache/airflow/pull/36022#discussion_r1412798224
########## docs/apache-airflow/security/security_model.rst ##########

@@ -97,10 +97,74 @@ capabilities authenticated users may have:

 For more information on the capabilities of authenticated UI users, see :doc:`/security/access-control`.

+Capabilities of DAG Authors
+---------------------------
+
+DAG authors are able to submit code - via Python files placed in the DAG_FOLDER - that will be executed

Review Comment:
```suggestion
DAG authors are able to submit code - via Python files placed in the DAGS_FOLDER - that will be executed
```

########## docs/apache-airflow/security/security_model.rst ##########

@@ -97,10 +97,74 @@ capabilities authenticated users may have:

+Capabilities of DAG Authors
+---------------------------
+
+DAG authors are able to submit code - via Python files placed in the DAG_FOLDER - that will be executed
+in a number of circumstances. The code to execute is not verified nor checked nor sand-boxed by Airflow
+(that would be very difficult if not impossible to do), so effectively DAG authors can execute arbitrary
+code on the workers (part of Celery Workers for Celery Executor, local processes run by scheduler in case
+of Local Executor, Task Kubernetes POD in case of Kubernetes Executor), in the DAG File Processor
+(which can be either executed as standalone process or can be part of the Scheduler) and in the Triggerer.
+
+There are several consequences of this model chosen by Airflow, that deployment managers need to be aware of:
+
+* In case of Local Executor and DAG File Processor running as part of the Scheduler, DAG authors can execute
+  arbitrary code on the machine where scheduler is running. This means that they can affect the scheduler
+  process itself, and potentially affect the whole Airflow installation - including modifying cluster-wide
+  policies and changing Airflow configuration . If you are running Airflow with one of those settings,
+  the Deployment Manager must trust the DAG authors not to abuse this capability.
+
+* In case of Celery Executor, DAG authors can execute arbitrary code on the Celery Workers. This means that
+  they can potentially influence all the task executed on the same worker. If you are running Airflow with
+  Celery Executor, the Deployment Manager must trust the DAG authors not to abuse this capability and unless
+  Deployment Manager separates task execution by queues by Cluster Policies, they should assume, there is no
+  isolation between tasks.
+
+* In case of Kubernetes Executor, DAG authors can execute arbitrary code on the Kubernetes POD they run. Each
+  task is executed in a separate POD, so there is already isolation between tasks as generally speaking
+  Kubernetes provides isolation between PODs.
+
+* In case of Triggerer, DAG authors can execute arbitrary code in Triggerer. Currently there are no
+  enforcement mechanisms that would allow to isolate tasks that are using deferrable functionality from
+  each other and arbitrary code from various tasks can be executed in the same process/machine. Deployment
+  Manager must trust that DAG authors will not abuse this capability.
+
+* The Deployment Manager might isolate the code execution provided by DAG authors - particularly in

Review Comment:
This point is really helpful 💯. I am wondering whether it would make sense to move this list item further down. The list items above explain capabilities of DAG authors, and there is one more list item below this one explaining further capabilities; this item, however, reads more like a helpful suggestion for the Deployment Manager, and there is one more such suggestion at the end, about adding tooling to review code. So perhaps we could move these two suggestions to the end of the list, after the capabilities of DAG authors have been explained.
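To make the "arbitrary code" point in the quoted hunk concrete: anything at the top level of a Python file in the DAGs folder runs as soon as the file is parsed, before any task is scheduled. The following is a deliberately simplified, hypothetical stand-in for the DAG File Processor (Airflow's real parser is far more involved), illustrating why DAG author code is effectively arbitrary code in the parsing process:

```python
import pathlib
import tempfile

# Toy stand-in for Airflow's DAG File Processor: it executes every Python
# file found in the DAGs folder. Top-level statements run unconditionally,
# which is why a DAG author's code is effectively arbitrary code in the
# process that parses DAG files.
def parse_dag_folder(dags_folder: pathlib.Path) -> dict:
    namespace = {}
    for py_file in sorted(dags_folder.glob("*.py")):
        source = py_file.read_text()
        # This is the crux: the author's file is simply executed.
        exec(compile(source, str(py_file), "exec"), namespace)
    return namespace

with tempfile.TemporaryDirectory() as tmp:
    dags = pathlib.Path(tmp)
    # The "DAG author" ships a file whose top level does more than declare
    # a DAG - any side effect could go here (network calls, file writes...).
    (dags / "example_dag.py").write_text(
        "side_effect = 'ran at parse time'\n"
    )
    ns = parse_dag_folder(dags)
    print(ns["side_effect"])  # → ran at parse time
```

This is why the documentation being reviewed stresses trust in DAG authors: the parse step alone already executes author-controlled code, wherever the DAG File Processor runs (standalone or inside the Scheduler).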
########## docs/apache-airflow/security/security_model.rst ##########

@@ -97,10 +97,74 @@ capabilities authenticated users may have:

 For more information on the capabilities of authenticated UI users, see :doc:`/security/access-control`.
+Capabilities of DAG Authors
+---------------------------
+
+DAG authors are able to submit code - via Python files placed in the DAG_FOLDER - that will be executed
+in a number of circumstances. The code to execute is not verified nor checked nor sand-boxed by Airflow

Review Comment:
```suggestion
in a number of circumstances. The code to execute is neither verified, checked nor sand-boxed by Airflow
```
or it could simply be "Airflow does not verify, check or sandbox the code to be executed".

########## docs/apache-airflow/security/security_model.rst ##########

@@ -97,10 +97,74 @@ capabilities authenticated users may have:

+* In case of Celery Executor, DAG authors can execute arbitrary code on the Celery Workers. This means that
+  they can potentially influence all the task executed on the same worker. If you are running Airflow with

Review Comment:
```suggestion
  they can potentially influence all the tasks executed on the same worker. If you are running Airflow with
```

########## docs/apache-airflow/security/security_model.rst ##########

@@ -97,10 +97,74 @@ capabilities authenticated users may have:

+  process itself, and potentially affect the whole Airflow installation - including modifying cluster-wide
+  policies and changing Airflow configuration . If you are running Airflow with one of those settings,

Review Comment:
```suggestion
  policies and changing Airflow configuration. If you are running Airflow with one of those settings,
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
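The Celery Executor bullet in the hunk mentions that a Deployment Manager can separate task execution by queues via Cluster Policies. A `task_policy` hook in `airflow_local_settings.py` is Airflow's cluster-policy mechanism for mutating tasks at parse time; the sketch below shows one way it might be used to pin tasks to dedicated queues. The mapping, DAG ids and queue names are invented for illustration, and a `SimpleNamespace` stands in for a real `BaseOperator` so the sketch runs without Airflow installed:

```python
from types import SimpleNamespace

# Hypothetical mapping of DAG ids to dedicated Celery queues; the names
# here are invented for illustration only.
TEAM_QUEUES = {"finance_etl": "finance_workers"}

def task_policy(task):
    """Cluster-policy hook: Airflow calls this for every task at parse time.

    Placed in airflow_local_settings.py, it may mutate the task - here it
    pins tasks of selected DAGs to a dedicated queue, so their execution is
    separated from other teams' Celery workers.
    """
    queue = TEAM_QUEUES.get(task.dag_id)
    if queue is not None:
        task.queue = queue

# Stand-in for a BaseOperator so the sketch runs without Airflow installed;
# a real task would be e.g. a PythonOperator inside a DAG.
demo_task = SimpleNamespace(dag_id="finance_etl", queue="default")
task_policy(demo_task)
print(demo_task.queue)  # → finance_workers
```

Note this only routes tasks to workers listening on specific queues; as the quoted text says, without such separation the Deployment Manager should assume there is no isolation between tasks on a shared Celery worker.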