potiuk commented on code in PR #64760: URL: https://github.com/apache/airflow/pull/64760#discussion_r3045251205
########## airflow-core/docs/security/security_model.rst: ########## @@ -282,6 +312,309 @@ Access to all Dags All Dag authors have access to all Dags in the Airflow deployment. This means that they can view, modify, and update any Dag without restrictions at any time. +.. _jwt-authentication-and-workload-isolation: + +JWT authentication and workload isolation +----------------------------------------- + +Airflow uses JWT (JSON Web Token) authentication for both its public REST API and its internal +Execution API. For a detailed description of the JWT authentication flows, token structure, and +configuration, see :doc:`/security/jwt_token_authentication`. + +Current isolation limitations +............................. + +While Airflow 3 significantly improved the security model by preventing worker task code from +directly accessing the metadata database (workers now communicate exclusively through the +Execution API), **perfect isolation between Dag authors is not yet achieved**. Dag author code +potentially still executes with direct database access in the Dag File Processor and Triggerer. + +**Software guards vs. intentional access** + Airflow implements software-level guards that prevent **accidental and unintentional** direct database + access from Dag author code. The Dag File Processor removes the database session and connection + information before forking child processes that parse Dag files, and worker tasks use the Execution + API exclusively. + + However, these software guards **do not protect against intentional, malicious access**. The child + processes that parse Dag files and execute trigger code run as the **same Unix user** as their parent + processes (the Dag File Processor manager and the Triggerer respectively). 
Because of how POSIX + process isolation works, a child process running as the same user can retrieve the parent's + credentials through several mechanisms: + + * **Environment variables**: On Linux, any process can read ``/proc/<PID>/environ`` of another + process running as the same user — so database credentials passed via environment variables + (e.g., ``AIRFLOW__DATABASE__SQL_ALCHEMY_CONN``) can be read from the parent process. + * **Configuration files**: If configuration is stored in files, those files must be readable by the + parent process and are therefore also readable by the child process running as the same user. + * **Command-based secrets** (``_CMD`` suffix options): The child process can execute the same + commands to retrieve secrets. + * **Secrets manager access**: If the parent uses a secrets backend, the child can access the same + secrets manager using credentials available in the process environment or filesystem. + + This means that a deliberately malicious Dag author can retrieve database credentials and gain + **full read/write access to the metadata database** — including the ability to modify any Dag, + task instance, connection, or variable. The software guards address accidental access (e.g., a Dag + author importing ``airflow.settings.Session`` out of habit from Airflow 2) but do not prevent a + determined actor from circumventing them. + + On workers, the isolation is stronger: worker processes do not receive database credentials at all + (neither via environment variables nor configuration). Workers communicate exclusively through the + Execution API using short-lived JWT tokens. A task running on a worker genuinely cannot access the + metadata database directly — there are no credentials to retrieve. + +**Dag File Processor and Triggerer potentially bypass JWT authentication** + The Dag File Processor and Triggerer use an in-process transport to access the Execution API, + which bypasses JWT authentication. 
Since these components execute user-submitted code + (Dag files and trigger code respectively), a Dag author whose code runs in these components + potentially has unrestricted access to all Execution API operations — including the ability to + read any connection, variable, or XCom — without needing a valid JWT token. + + Furthermore, the Dag File Processor has direct access to the metadata database (it needs this to + store serialized Dags). As described above, Dag author code executing in the Dag File Processor + context could potentially retrieve the database credentials from the parent process and access + the database directly, including the JWT signing key configuration if it is available in the + process environment. If a Dag author obtains the JWT signing key, they could forge arbitrary tokens. + +**Dag File Processor and Triggerer are shared across teams** + In the default deployment, a **single Dag File Processor instance** parses all Dag files and a + **single Triggerer instance** handles all triggers — regardless of team assignment. There is no + built-in support for running per-team Dag File Processor or Triggerer instances. This means that + Dag author code from different teams executes within the same process, potentially sharing the + in-process Execution API and direct database access. + + For multi-team deployments that require separation, Deployment Managers must run **separate + Dag File Processor and Triggerer instances per team** as a deployment-level measure (for example, + by configuring each instance to only process bundles belonging to a specific team). 
However, even + with separate instances, each Dag File Processor and Triggerer potentially retains direct access + to the metadata database — a Dag author whose code runs in these components can potentially + retrieve credentials from the parent process and access the database directly, including reading + or modifying data belonging to other teams, unless the Deployment Manager implements Unix + user-level isolation (see :ref:`deployment-hardening-for-improved-isolation`). + +**No cross-workload isolation in the Execution API** + All worker workloads authenticate to the same Execution API with tokens signed by the same key and + sharing the same audience. While the ``ti:self`` scope enforcement prevents a worker from accessing + another task's specific endpoints (heartbeat, state transitions), shared resources such as connections, + variables, and XComs are accessible to all tasks. There is no isolation between tasks belonging to + different teams or Dag authors at the Execution API level. + +**Token signing key is a shared secret** + In symmetric key mode (``[api_auth] jwt_secret``), the same secret key is used to both generate and + validate tokens. Any component that has access to this secret can forge tokens with arbitrary claims, + including tokens for other task instances or with elevated scopes. + +**Sensitive configuration values can be leaked through logs** + Dag authors can write code that prints environment variables or configuration values to task logs + (e.g., ``print(os.environ)``). Airflow masks known sensitive values in logs, but masking depends on + recognizing the value patterns. Dag authors who intentionally or accidentally log raw environment + variables may expose database credentials, JWT signing keys, Fernet keys, or other secrets in task + logs. Deployment Managers should restrict access to task logs and ensure that sensitive configuration + is only provided to components where it is needed (see the sensitive variables tables below). + +.. 
_deployment-hardening-for-improved-isolation: + +Deployment hardening for improved isolation +........................................... + +Deployment Managers who require stronger isolation between Dag authors and teams can take the following +measures. Note that these are deployment-specific actions that go beyond Airflow's built-in security +model — Airflow does not enforce these natively. + +**Mandatory code review of Dag files** + Implement a review process for all Dag submissions to Dag bundles. This can include: + + * Requiring pull request reviews before Dag files are deployed. + * Static analysis of Dag code to detect suspicious patterns (e.g., direct database access attempts, + reading environment variables, importing configuration modules). + * Automated linting rules that flag potentially dangerous code. + +**Restrict sensitive configuration to the components that need it** + Do not share all configuration parameters across all components. In particular: + + * The JWT signing key (``[api_auth] jwt_secret`` or ``[api_auth] jwt_private_key_path``) should only + be available to components that need to generate tokens (Scheduler/Executor, API Server) and + components that need to validate tokens (API Server). Workers should not have access to the signing + key — they only need the tokens provided to them. + * Connection credentials for external systems should only be available to the API Server + (which serves them to workers via the Execution API), not to the Scheduler, Dag File Processor, + or Triggerer processes directly. + * Database connection strings should only be available to components that need direct database access + (API Server, Scheduler, Dag File Processor), not to workers. Review Comment: Yeah. I think that part is already pretty separated - and there is a bit of a difference between "higher level" (security model) and technical details (workload isolation, jwt_authentication).
Some level of duplication is needed as there are different audiences for those, but I think the deduplication here is now "good enough" (at least for now); we can improve it later. This was simply a small omission in the higher-level doc.
