A KMS has nothing strictly to do with Kubernetes - both Google Cloud and AWS 
have a Key Management Service. Putting the new fields in the `extra` JSON 
relies on `extra` being JSON, which it might not be in every case.

Comments in-line, Jasper.

The end goal of only allowing certain tasks to decrypt Connections is a good 
one, and sounds useful for a multi-team/multi-tenant Airflow. Even without 
containerised workers, KMS encryption could make things a bit better (although 
it provides no further security over the current setup, as any Airflow process 
would be able to impersonate another DAG's worker and get its secrets).


> On 7 Jul 2018, at 08:11, Shah Altaf <mend...@gmail.com> wrote:
> 
> Hi my feedback is - This feels far too k8s specific.  There would now be
> extra fields in the connection form (and CLI) that are
> hosting/implementation specific and not at all agnostic.  These could
> probably go as additional params in the existing extra field's JSON.  That
> would avoid any confusion about what those extra fields are and any
> k8s-users that want that specific implementation have a place to put it.
> 
> 
> 
> 
> On Fri, Jul 6, 2018 at 9:02 PM Jasper Kahn <jak...@google.com.invalid>
> wrote:
> 
>> Hello,
>> 
>> In support of adding fine-grained Connection encryption (Jira Issue:
>> https://issues.apache.org/jira/browse/AIRFLOW-2062) I wanted to gather
>> feedback on a proposed design, as it affects a few different Airflow
>> components. A full design doc is coming next week.
>> 
>> The end goal is to allow per-Connection encryption (as opposed the global
>> fernet key) to support providing containerized tasks with independent
>> credentials to limit access, and to enable integration with Key Management
>> Systems.
>> 
>> 
>> At a high level, Connection objects will be augmented with 2 additional
>> fields: `KMS_type` and `KMS_extras`, which are modeled (somewhat) after the
>> existing `conn_type` and `extras` fields. Each connection can be flagged as
>> "independently encrypted", which then prompts the user to pick a KMS (from
>> a predefined list, like Connection type) and enter the relevant
>> authentication and metadata that KMS requires to operate (mirroring how
>> choosing a Connection type results in additional configuration).

You don't mention where the KMSs will be defined for the user to pick from. 
Will a KMS be a Connection itself?

Do we need totally new fields? Would a single extra flag/type field be enough? 
In the case where `kms_conn_id` is set, what would be in the current `extra` 
field? Although unlikely, might we want to support using two different KMS ids 
concurrently? (In which case a `kms_type` alone isn't unique enough.)

Perhaps example values for the case where a Connection is KMS-encrypted, and 
one where it isn't, would help make this clearer.
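To illustrate the kind of example I mean - something like the following, where every field value and the `kms_type`/`kms_extras` naming are purely hypothetical placeholders, not part of the proposal:

```python
# Hypothetical field values for the two cases; all names and values are
# illustrative assumptions, not anything specified in the proposal.

plain_conn = {
    "conn_id": "my_postgres",
    "conn_type": "postgres",
    "password": "s3cret",  # stored fernet-encrypted, as today
    "kms_type": None,
    "kms_extras": None,
}

kms_conn = {
    "conn_id": "team_a_postgres",
    "conn_type": "postgres",
    "password": "<ciphertext under K_conn>",  # opaque without the KMS
    "kms_type": "aws_kms",
    "kms_extras": '{"key_id": "alias/team-a", "region": "eu-west-1"}',
}
```

Seeing concrete values like these side by side would make it obvious what lives where, and what a reader of the database actually sees in each case.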

>> 
>> The credentials to authenticate with the KMS can either be manually placed
>> (like some key files for Connections are now) or, in the case of
>> containerized workers, injected as a key file (through file system mapping)
>> or environment variable on a per-worker basis. These changes are primarily
>> in support of the second (containerized workers) model.
>> 
>> When creating an encrypted Connection, Airflow will generate a
>> cryptographic key (likely AES, possibly a separate fernet key) for that
>> connection and encrypt the Connection fields. It will then encrypt that key
>> (K_conn) using the KMS.

With the AWS KMS I think it's possible to let it handle this process entirely - 
you just give it the data to en/decrypt. Where the KMS supports it, handing off 
the entire process is what we should aim for - the less crypto code we have to 
write ourselves the better!
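For reference, the envelope-encryption flow described above can be sketched as follows. This is a minimal illustration only: the `FakeKMSClient` is a local stand-in for a real KMS (e.g. AWS KMS, which exposes equivalent encrypt/decrypt operations), and it uses a fernet key for K_conn as the proposal suggests:

```python
from cryptography.fernet import Fernet

class FakeKMSClient:
    """Local stand-in for a real KMS; the master key never leaves it."""

    def __init__(self):
        self._master = Fernet(Fernet.generate_key())

    def encrypt(self, plaintext: bytes) -> bytes:
        return self._master.encrypt(plaintext)

    def decrypt(self, ciphertext: bytes) -> bytes:
        return self._master.decrypt(ciphertext)

kms = FakeKMSClient()

# 1. Generate a per-connection key K_conn and encrypt the fields with it.
k_conn = Fernet.generate_key()
encrypted_password = Fernet(k_conn).encrypt(b"s3cret")

# 2. Encrypt K_conn itself via the KMS; only the wrapped key is stored.
wrapped_k_conn = kms.encrypt(k_conn)

# 3. To read the Connection: unwrap K_conn, then decrypt the fields.
recovered = Fernet(kms.decrypt(wrapped_k_conn)).decrypt(encrypted_password)
assert recovered == b"s3cret"
```

The alternative I'm suggesting above would collapse steps 1-3: hand the field data straight to the KMS and never hold a key locally at all.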

>> 
>> KMS communication happens through KMSClients, which are implemented very
>> similarly to Connection types and Hooks, with a mapping from KMS_type to a
>> particular Client. New clients can be added by the community (as with
>> hooks/Connection Types). The API for a KMSClient is simple: Encryption and
>> Decryption. The `encrypt` method would take in K_conn and the configuration
>> data, encrypt K_conn through the KMS, and return JSON to be stored in the
>> KMS_extra field. `decrypt` is passed this KMS_extra JSON, decrypts K_conn
>> though the KMS, and returns K_conn to be used to decrypt the Connection
>> data. After both operations, K_conn is purged from memory.

"configuration data" is encryption/decryption context that would be passed to 
the AWS KMS APIs? (for example. I use the AWS one as that is the one I'm most 
familiar with)

>> 
>> Decryption would be implemented where the Connection is loaded from the
>> database or environment. This makes the presence of per-Connection
>> encryption transparent to any calling code, much like the fernet encryption
>> works now.
>> 
>> 
>> As mentioned, all feedback and criticism is welcome to try to improve this
>> design. Thanks!
>> 
>> Jasper Kahn
>> 
