potiuk commented on code in PR #36022: URL: https://github.com/apache/airflow/pull/36022#discussion_r1412794928
########## docs/apache-airflow/security/security_model.rst:
##########
@@ -97,10 +97,74 @@ capabilities authenticated users may have:

 For more information on the capabilities of authenticated UI users, see :doc:`/security/access-control`.

+Capabilities of DAG Authors
+---------------------------
+
+DAG authors are able to submit code - via Python files placed in the DAG_FOLDER - that will be executed
+in a number of circumstances. The code to execute is neither verified, checked, nor sandboxed by Airflow
+(that would be very difficult, if not impossible, to do), so effectively DAG authors can execute arbitrary
+code on the workers (as part of Celery Workers for the Celery Executor, local processes run by the scheduler
+in the case of the Local Executor, the task's Kubernetes POD in the case of the Kubernetes Executor), in the
+DAG File Processor (which can be executed either as a standalone process or as part of the Scheduler), and
+in the Triggerer.
+
+There are several consequences of this model chosen by Airflow that Deployment Managers need to be aware of:
+
+* In the case of the Local Executor, or the DAG File Processor running as part of the Scheduler, DAG authors
+  can execute arbitrary code on the machine where the scheduler is running. This means that they can affect
+  the scheduler process itself, and potentially the whole Airflow installation - including modifying
+  cluster-wide policies and changing the Airflow configuration. If you are running Airflow with one of those
+  settings, the Deployment Manager must trust the DAG authors not to abuse this capability.
+
+* In the case of the Celery Executor, DAG authors can execute arbitrary code on the Celery Workers. This
+  means that they can potentially influence all the tasks executed on the same worker.
If you are running Airflow with
+  the Celery Executor, the Deployment Manager must trust the DAG authors not to abuse this capability, and
+  unless the Deployment Manager separates task execution by queues via Cluster Policies, they should assume
+  that there is no isolation between tasks.
+
+* In the case of the Kubernetes Executor, DAG authors can execute arbitrary code in the Kubernetes POD where
+  their task runs. Each task is executed in a separate POD, so there is already isolation between tasks, as
+  generally speaking Kubernetes provides isolation between PODs.
+
+* In the case of the Triggerer, DAG authors can execute arbitrary code in the Triggerer. Currently there are
+  no enforcement mechanisms that would allow isolating tasks that use the deferrable functionality from each
+  other, and arbitrary code from various tasks can be executed in the same process/machine. The Deployment
+  Manager must trust that DAG authors will not abuse this capability.
+
+* The Deployment Manager might isolate the code execution provided by DAG authors - particularly in the
+  Scheduler and Webserver - by making sure that the Scheduler and Webserver do not even have access to the
+  DAG Files (this requires the standalone DAG File Processor to be deployed). Generally speaking, no
+  DAG-author-provided code should ever be executedted in the Scheduler or Webserver process.
+
+* There are a number of functionalities that allow the DAG author to point out code to be executed in the
+  scheduler or webserver process - for example they can choose custom Timetables, UI plugins, Connection UI

Review Comment:
   Perfect :) . I was just asking in the original PR https://github.com/apache/airflow/pull/35210

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
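As an aside, the queue-based separation via Cluster Policies that the diff mentions could look roughly like the sketch below. ``task_policy`` is the hook name Airflow looks up in ``airflow_local_settings.py``; the team prefixes and queue names here are hypothetical, and a real deployment would pair this with Celery workers started with matching ``--queues`` flags:

```python
# airflow_local_settings.py -- a sketch of a cluster policy that routes tasks
# to per-team Celery queues. The "team_a_"/"team_b_" DAG id prefixes and the
# queue names are made-up examples, not anything Airflow defines.
#
# Airflow discovers a module-level function named ``task_policy`` in this
# module and calls it for every task when DAG files are parsed.

def task_policy(task) -> None:
    """Pin each task to a queue based on its DAG id prefix, so that Celery
    workers subscribed to only one queue never run another team's code."""
    if task.dag_id.startswith("team_a_"):
        task.queue = "team_a"
    elif task.dag_id.startswith("team_b_"):
        task.queue = "team_b"
    else:
        # Everything unrecognised lands on a shared default queue, where
        # no isolation between DAG authors should be assumed.
        task.queue = "default"
```

Note this only *partitions* task execution across workers; tasks sharing a queue (and therefore a worker) still have no isolation from each other, as the diff points out.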