[ 
https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875247#comment-16875247
 ] 

Xinli Shang edited comment on HIVE-21848 at 6/28/19 9:39 PM:
-------------------------------------------------------------

Hi [~owen.omalley], yes, I looked at HadoopShims.java earlier. I still 
remember your very smart workaround that avoids two round trips to KMS to 
generate/encrypt a working key. It cut the traffic in half. 

For the nested column questions above, I generally agree that makes sense. 
There are only a few corner cases that we need to discuss.

For the example above, "name: struct<first:string,last:string>", if the 
table properties contain the entry "encrypt.columns" = 
"pii:name;other_category:name.first", what do we do? Should we throw an 
exception? Or do we just ignore "other_category:name.first" and let the 
parent override it? 
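
To make the conflict concrete, here is a rough Python sketch of how such an 
entry could be parsed and checked. The "category:column" layout is taken from 
the example above; accepting a comma-separated column list per category, and 
the function names themselves, are assumptions for illustration only and not 
part of any agreed design.

```python
def parse_encrypt_columns(value):
    """Parse 'category:col[,col...];category:col' into {column_path: category}.

    Assumption: entries are separated by ';' and a category may list several
    columns separated by ','. Only the single-column form appears in the
    example being discussed.
    """
    mapping = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        category, _, columns = entry.partition(":")
        for column in columns.split(","):
            mapping[column.strip()] = category.strip()
    return mapping


def find_nested_conflicts(mapping):
    """Return (child, child_category, parent, parent_category) tuples for every
    nested column whose ancestor is listed under a different category."""
    conflicts = []
    for column, category in mapping.items():
        parts = column.split(".")
        # Walk every proper prefix of the dotted path: 'a.b.c' -> 'a', 'a.b'.
        for depth in range(1, len(parts)):
            parent = ".".join(parts[:depth])
            if parent in mapping and mapping[parent] != category:
                conflicts.append((column, category, parent, mapping[parent]))
    return conflicts


mapping = parse_encrypt_columns("pii:name;other_category:name.first")
conflicts = find_nested_conflicts(mapping)
```

Whether a non-empty conflict list should raise an exception or be resolved 
silently in favor of the parent is exactly the open question. 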

Do we allow some leaf columns to be excluded from encryption when their parent 
is specified to be encrypted? I guess people will raise this feature request 
once it is rolled out. 
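
As a sketch of what the two policies would mean for each leaf column, the 
snippet below resolves the governing rule either by the most specific matching 
path (a child entry can override or exclude) or by the least specific one (the 
parent always wins). The PLAINTEXT marker and the function name are 
hypothetical and only serve the illustration; nothing like them has been 
proposed.

```python
PLAINTEXT = "plaintext"  # hypothetical exclusion marker, not part of any proposal


def effective_category(leaf, mapping, most_specific_wins=True):
    """Return the category that governs `leaf`, or None if no rule matches.

    `mapping` maps dotted column paths to categories. Every prefix of `leaf`
    (including `leaf` itself) that appears in `mapping` is a candidate rule.
    """
    parts = leaf.split(".")
    # 'name.first' -> ['name', 'name.first'], ordered from parent to leaf.
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    matches = [p for p in prefixes if p in mapping]
    if not matches:
        return None
    chosen = matches[-1] if most_specific_wins else matches[0]
    return mapping[chosen]


rules = {"name": "pii", "name.first": PLAINTEXT}

child_override = effective_category("name.first", rules)
parent_override = effective_category("name.first", rules, most_specific_wins=False)
sibling = effective_category("name.last", rules)
```

With "most specific wins", name.first would stay unencrypted while name.last 
still inherits the parent's category; with "parent wins", the child entry has 
no effect. 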

With that said, I am not objecting to the proposal; these are just some 
thoughts on corner cases. 

 



> Table property name definition between ORC and Parquet encryption
> -----------------------------------------------------------------
>
>                 Key: HIVE-21848
>                 URL: https://issues.apache.org/jira/browse/HIVE-21848
>             Project: Hive
>          Issue Type: Task
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names 
> that can be used for both Parquet and ORC column encryption. There is no code 
> change needed for this Jira.
> *Background:*
> ORC-14 and Parquet-1178 introduced column encryption to ORC and Parquet. To 
> configure the encryption, e.g. which columns are sensitive, which master key 
> to use, which algorithm, etc., table properties can be used. It is important 
> that both Parquet and ORC can use unified names.
> According to the slides 
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
>  ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The 
> Parquet community is still discussing several configuration options; table 
> properties are one of them, but there is no detailed design of the table 
> property names yet.
> So it is a good time for the two communities to agree on unified table 
> property names as a superset.
> *Proposal:*
> There are several encryption properties that need to be specified for a 
> table. Here is the list. This is the superset of Parquet and ORC. Some of 
> them might not apply to both.
>  # PII columns, including nested columns
>  # Column key metadata, master key metadata
>  # Encryption algorithm, for example, Parquet supports AES_GCM and AES_CTR. 
> ORC might support AES_CTR.
>  # Footer encryption - Parquet allows the footer to be encrypted or plaintext
>  # Footer key metadata
> Here is the table properties proposal.  
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and 
> encrypted footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to 
> the KMS to define what key metadata is. The metadata should contain enough 
> information for the KMS to find the corresponding key.  |
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column 
> name, for example, ‘address.zipcode’. 
>  
> It is up to the KMS to define what key metadata is. The metadata should 
> contain enough information for the KMS to find the corresponding key.|
>  
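
For illustration, the proposed properties could be assembled roughly as in the 
Python sketch below. The property names come from the table above; the helper 
name, its parameters, and the raw key-metadata bytes are made up for the 
example, since the actual metadata format is left to the KMS.

```python
import base64


def build_encryption_properties(algorithm, footer_key_metadata,
                                column_key_metadata, footer_plaintext=False):
    """Build a dict of the proposed table properties.

    `footer_key_metadata` is raw bytes; `column_key_metadata` maps a column
    path such as 'address.zipcode' to raw bytes. Both are base64-encoded into
    the property values, as the proposal describes.
    """
    if algorithm not in ("aes_ctr", "aes_gcm"):
        raise ValueError("unsupported algorithm: " + algorithm)

    def b64(raw):
        return base64.b64encode(raw).decode("ascii")

    props = {
        "encrypt_algorithm": algorithm,
        "encrypt_footer_plaintext": "true" if footer_plaintext else "false",
        "encrypt_footer_key_metadata": b64(footer_key_metadata),
    }
    for column, metadata in column_key_metadata.items():
        props["encrypt_col_" + column] = b64(metadata)
    return props


# Placeholder metadata; a real KMS would define its own format.
props = build_encryption_properties(
    "aes_gcm",
    footer_key_metadata=b"footer-key-id",
    column_key_metadata={"address.zipcode": b"zip-key-id"},
)
```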



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
