A KMS has nothing strictly to do with Kubernetes - both Google Cloud and
AWS offer a Key Management Service. And putting the KMS fields inside the
JSON relies on `extra` actually being JSON, which it might not be in
every case.
Comments in-line, Jasper.

The end goal of only allowing certain tasks to decrypt connections is a
good one, and sounds useful for a multi-team/multi-tenant Airflow. Even
without containerised workers, KMS encryption could make things a bit
better (although it provides no further security over the current setup,
as any Airflow process would be able to impersonate another DAG's worker
and get at other secrets).

> On 7 Jul 2018, at 08:11, Shah Altaf <mend...@gmail.com> wrote:
>
> Hi, my feedback is - this feels far too k8s specific. There would now
> be extra fields in the connection form (and CLI) that are
> hosting/implementation specific and not at all agnostic. These could
> probably go as additional params in the existing extra field's JSON.
> That would avoid any confusion about what those extra fields are, and
> any k8s users who want that specific implementation have a place to
> put it.
>
> On Fri, Jul 6, 2018 at 9:02 PM Jasper Kahn <jak...@google.com.invalid>
> wrote:
>
>> Hello,
>>
>> In support of adding fine-grained Connection encryption (Jira issue:
>> https://issues.apache.org/jira/browse/AIRFLOW-2062) I wanted to
>> gather feedback on a proposed design, as it affects a few different
>> Airflow components. A full design doc is coming next week.
>>
>> The end goal is to allow per-Connection encryption (as opposed to the
>> global fernet key) to support providing containerized tasks with
>> independent credentials to limit access, and to enable integration
>> with Key Management Systems.
>>
>> At a high level, Connection objects will be augmented with 2
>> additional fields: `KMS_type` and `KMS_extras`, which are modeled
>> (somewhat) after the existing `conn_type` and `extras` fields.
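To make the two layouts being debated concrete, here is a rough sketch of what a connection record might look like with the proposed new columns versus with everything tucked into the existing `extra` JSON. All field values here are illustrative guesses on my part, not from the proposal:

```python
import json

# Option A: two new columns on the Connection model, as proposed.
# (Column and key names below are my guesses, not from the design.)
conn_new_columns = {
    "conn_id": "my_postgres",
    "conn_type": "postgres",
    "kms_type": "aws_kms",
    "kms_extras": json.dumps({"key_id": "alias/airflow-connections"}),
}

# Option B: the same data inside the existing `extra` JSON, as suggested
# in the feedback above - no schema change, but it relies on `extra`
# actually containing JSON.
conn_extra_json = {
    "conn_id": "my_postgres",
    "conn_type": "postgres",
    "extra": json.dumps({
        "kms_type": "aws_kms",
        "kms_extras": {"key_id": "alias/airflow-connections"},
    }),
}

def kms_config(conn):
    """Single place to ask: is this connection KMS-encrypted, and how?"""
    if "kms_type" in conn:
        extras = conn["kms_extras"]
        return conn["kms_type"], json.loads(extras) if isinstance(extras, str) else extras
    extra = json.loads(conn.get("extra") or "{}")
    if "kms_type" in extra:
        return extra["kms_type"], extra["kms_extras"]
    return None, None
```

Either shape can be normalised behind one accessor like `kms_config`, so the rest of the codebase would not need to care which storage layout wins.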
>> Each connection can be flagged as "independently encrypted", which
>> then prompts the user to pick a KMS (from a predefined list, like
>> Connection type) and enter the relevant authentication and metadata
>> that KMS requires to operate (mirroring how choosing a Connection
>> type results in additional configuration).

You don't mention where the KMSes will be defined for the user to pick
from? Will this be a connection itself? Do we need totally new fields?
Would a single extra flag/type field be enough? In the case where
kms_conn_id is set, what would be in the current `extra` field? And
although unlikely, might we want to support using two different KMS ids
concurrently? (In which case a `kms_type` isn't unique enough.) Perhaps
example values for a connection that is KMS encrypted, and one that
isn't, would help make this clearer.

>> The credentials to authenticate with the KMS can either be manually
>> placed (like some key files for Connections are now) or, in the case
>> of containerized workers, injected as a key file (through file system
>> mapping) or environment variable on a per-worker basis. These changes
>> are primarily in support of the second (containerized workers) model.
>>
>> When creating an encrypted Connection, Airflow will generate a
>> cryptographic key (likely AES, possibly a separate fernet key) for
>> that connection and encrypt the Connection fields. It will then
>> encrypt that key (K_conn) using the KMS.

With the AWS KMS I think it's possible to let it handle this process
entirely - you just give it the data to de/encrypt. Where a KMS supports
that, handing off the entire process should be preferred - the less
crypto code we have to write ourselves the better!

>> KMS communication happens through KMSClients, which are implemented
>> very similarly to Connection types and Hooks, with a mapping from
>> KMS_type to a particular Client. New clients can be added by the
>> community (as with hooks/Connection Types).
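For anyone following along, the envelope-encryption flow described (generate K_conn, encrypt the fields with it, wrap K_conn via the KMS) looks roughly like this. `FakeKMS` is a toy stand-in I made up so the sketch runs without cloud credentials - a real KMS (e.g. AWS KMS via GenerateDataKey/Decrypt) wraps the key server-side - and the XOR "cipher" is only there to keep it dependency-free; none of this is real cryptography:

```python
import secrets

class FakeKMS:
    """Toy stand-in for a real KMS. A real client never holds the master
    key; it sends K_conn to the service and gets back an opaque blob.
    Here we just XOR against a local key so the flow is runnable."""

    def __init__(self):
        self._master = secrets.token_bytes(32)

    def _xor(self, data):
        return bytes(b ^ k for b, k in zip(data, self._master))

    def encrypt(self, plaintext_key):   # returns the "wrapped" key blob
        return self._xor(plaintext_key)

    def decrypt(self, blob):            # unwraps back to K_conn
        return self._xor(blob)

def _stream_xor(data, key):
    # NOT real crypto - stands in for AES/fernet in this sketch.
    keystream = key * (len(data) // len(key) + 1)
    return bytes(b ^ k for b, k in zip(data, keystream))

def encrypt_connection(fields: bytes, kms):
    k_conn = secrets.token_bytes(32)          # 1. fresh per-connection key
    ciphertext = _stream_xor(fields, k_conn)  # 2. encrypt Connection fields
    wrapped = kms.encrypt(k_conn)             # 3. wrap K_conn via the KMS
    del k_conn                                # purge plaintext key (Python-level)
    return ciphertext, wrapped

def decrypt_connection(ciphertext, wrapped, kms):
    k_conn = kms.decrypt(wrapped)             # unwrap K_conn via the KMS
    fields = _stream_xor(ciphertext, k_conn)  # recover the Connection fields
    del k_conn
    return fields
```

Only the wrapped key ever hits the metadata DB, which is what makes per-connection access control possible: a worker that can't call the KMS for that key can't read the connection.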
>> The API for a KMSClient is simple: Encryption and Decryption. The
>> `encrypt` method would take in K_conn and the configuration data,
>> encrypt K_conn through the KMS, and return JSON to be stored in the
>> KMS_extra field. `decrypt` is passed this KMS_extra JSON, decrypts
>> K_conn through the KMS, and returns K_conn to be used to decrypt the
>> Connection data. After both operations, K_conn is purged from memory.

Is the "configuration data" here the encryption/decryption context that
would be passed to the AWS KMS APIs, for example? (I use the AWS one as
that is the one I'm most familiar with.)

>> Decryption would be implemented where the Connection is loaded from
>> the database or environment. This makes the presence of
>> per-Connection encryption transparent to any calling code, much like
>> the fernet encryption works now.
>>
>> As mentioned, all feedback and criticism is welcome to try to improve
>> this design. Thanks!
>>
>> Jasper Kahn
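To check I'm reading the proposed KMSClient API right, this is roughly the interface I'd imagine - class and method names are my guesses, and the base64 "client" is a deliberately degenerate example of the call pattern, with no security whatsoever:

```python
import abc
import base64
import json

class BaseKMSClient(abc.ABC):
    """Sketch of the proposed KMSClient interface (names are guesses).

    Concrete clients (e.g. an AWS or Google Cloud KMS client) would be
    looked up from kms_type, mirroring how conn_type maps to a Hook."""

    @abc.abstractmethod
    def encrypt(self, k_conn: bytes, config: dict) -> str:
        """Wrap K_conn via the KMS; return JSON for the KMS_extras field."""

    @abc.abstractmethod
    def decrypt(self, kms_extras: str) -> bytes:
        """Unwrap and return K_conn from the stored KMS_extras JSON."""

class Base64KMSClient(BaseKMSClient):
    """Degenerate 'KMS' that only demonstrates the round trip."""

    def encrypt(self, k_conn, config):
        return json.dumps({
            "wrapped": base64.b64encode(k_conn).decode("ascii"),
            "key_id": config.get("key_id"),  # illustrative context field
        })

    def decrypt(self, kms_extras):
        return base64.b64decode(json.loads(kms_extras)["wrapped"])
```

If the interface really is just these two methods, community clients should be cheap to add, the same way new Hooks are today.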