To avoid any confusion: the high level interface described in the previous mail is a possible future add-on.
The low level API, already defined and implemented, exposes the full capabilities of Parquet encryption. The high level interface will expose a (useful) subset of these capabilities, as explained in the doc below.

Today, we have at least three companies building end-to-end data protection systems using the low level Parquet encryption API. The API is simple; these folks focus on managing the keys and auth on top of it, and they are skilled enough to handle that.

In the high level interface, we're using this experience to help less skilled users with the key management and auth. There is no auto-magic solution for that, but we will create a set of helper tools and a simple interface to them. The interface concepts will be somewhat similar to the low level API: pass a list of columns to be encrypted, but now, instead of explicit keys and their metadata, pass master key IDs for each column. See the doc for examples of such table/column properties <https://docs.google.com/document/d/1boH6HPkG0ZhgxcaRkGk3QpZ8X_J91uXZwVGwYN45St4/edit#heading=h.o9oq8a9wa6em>. The translation of master key IDs into encryption keys and metadata will be performed by these helper tools, the KMS, etc.

In other words, we should build this bottom up: complete and merge the low level APIs first, and then use the community experience with them to design and build the high level add-on interface and helper tools.

Cheers, Gidon.

---------- Forwarded message ---------
From: Gidon Gershinsky <[email protected]>
Date: Wed, Jun 5, 2019 at 4:51 PM
Subject: High level interface to Parquet encryption
To: <[email protected]>

Hi all,

As discussed at the last sync, we've briefly explored the current proposals for the high level interface to encryption. While the initial goal was to merge them into a single doc, it turned out that PARQUET-1396 has evolved in the meantime, becoming a full interface system.
So we have two parallel proposals, both presented for community discussion:

[1] Crypto Interface for Schema Activation of Parquet Encryption
<https://docs.google.com/document/d/17GTQAezl1ZC1pMNHjYU_bPVxMU6DIPjtXOiLclXUlyA/edit#heading=h.r9wntu3s8swd>
Corresponds to PARQUET-1396 <https://issues.apache.org/jira/browse/PARQUET-1396>

[2] Properties-based Interface to Parquet Encryption
<https://docs.google.com/document/d/1boH6HPkG0ZhgxcaRkGk3QpZ8X_J91uXZwVGwYN45St4/edit?usp=sharing>
I've created PARQUET-1568 <https://issues.apache.org/jira/browse/PARQUET-1568> for this one. Both the title and the description of the Jira are subject to change.

The doc [2] is not a design draft, but rather a writeup of the current proposal and prototype code, put together mainly to facilitate community feedback and discussion of goals, approach, etc.

Cheers, Gidon
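
P.S. To make the "master key IDs instead of explicit keys" idea from the top of this mail more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not the API from either doc: ToyKms, ColumnKey, build_column_keys, recover_key and the JSON metadata fields are all hypothetical names, and the XOR "wrapping" is a placeholder for whatever a real KMS would do. The point is just the flow: the user passes column names with master key IDs, and a helper turns them into the explicit keys and key metadata that a low level API expects.

```python
import base64
import json
import secrets
from dataclasses import dataclass
from typing import Dict


@dataclass
class ColumnKey:
    key: bytes      # explicit data key, as a low level API would expect
    metadata: str   # key metadata kept with the column, used to recover the key


class ToyKms:
    """Hypothetical in-memory stand-in for a KMS. NOT secure."""

    def __init__(self, master_keys: Dict[str, bytes]):
        self._master_keys = master_keys

    def wrap(self, master_key_id: str, data_key: bytes) -> bytes:
        master = self._master_keys[master_key_id]
        if len(master) != len(data_key):
            raise ValueError("toy wrapping needs equal key lengths")
        # XOR is only a placeholder for real key wrapping done by a KMS.
        return bytes(a ^ b for a, b in zip(data_key, master))

    def unwrap(self, master_key_id: str, wrapped: bytes) -> bytes:
        # XOR is its own inverse, so unwrapping repeats the same operation.
        return self.wrap(master_key_id, wrapped)


def build_column_keys(column_master_keys: Dict[str, str],
                      kms: ToyKms) -> Dict[str, ColumnKey]:
    """Translate {column: master key ID} into explicit keys plus metadata."""
    result = {}
    for column, master_key_id in column_master_keys.items():
        data_key = secrets.token_bytes(16)  # fresh random key per column
        wrapped = kms.wrap(master_key_id, data_key)
        metadata = json.dumps({
            "masterKeyID": master_key_id,
            "wrappedKey": base64.b64encode(wrapped).decode("ascii"),
        })
        result[column] = ColumnKey(key=data_key, metadata=metadata)
    return result


def recover_key(metadata: str, kms: ToyKms) -> bytes:
    """Reader side: recover the data key from the stored key metadata."""
    fields = json.loads(metadata)
    wrapped = base64.b64decode(fields["wrappedKey"])
    return kms.unwrap(fields["masterKeyID"], wrapped)


# Usage: the caller only names columns and master key IDs.
kms = ToyKms({"mk_pii": secrets.token_bytes(16),
              "mk_finance": secrets.token_bytes(16)})
column_keys = build_column_keys({"ssn": "mk_pii", "salary": "mk_finance"}, kms)
```

In this sketch the user never handles key bytes directly; everything below the column-to-master-key mapping is the job of the helper tools and the KMS, which is the split the high level interface is aiming for.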
