[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793897#comment-13793897
 ] 

Jerry Chen commented on HIVE-5207:
----------------------------------

Hi Larry, thanks for pointing out the gaps in the docs. Yes, we will add more 
javadocs and documentation as our next step.
 
{quote}1. TwoTieredKey - exactly the purpose, how it's used what the tiers are, 
etc{quote}
TwoTieredKey is used when the table key is stored in the Hive metastore. The 
table key is encrypted with a master key that is provided externally. In this 
case, the user maintains and manages only the master key externally, rather 
than managing all the table keys externally. This is useful when no 
full-fledged key management system is available.
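
As a rough illustration of the two tiers (a minimal sketch, not the code in 
the patch), the table key can be wrapped with the master key using the 
standard JCE "AESWrap" cipher, and only the wrapped form would be kept in the 
metastore. The class and method names below are hypothetical, and the choice 
of AES key wrap is an assumption made only for this example.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical sketch of a two-tiered key: tier 1 is the master key (held
// externally), tier 2 is the table key (stored only in wrapped form).
final class TwoTieredKeySketch {

    // Generates a random 128-bit AES key (used for both tiers in this demo).
    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts (wraps) the table key with the master key; the Base64 result
    // is what would be stored as a table property.
    static String wrapTableKey(SecretKey masterKey, SecretKey tableKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKey);
            return Base64.getEncoder().encodeToString(cipher.wrap(tableKey));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recovers the table key; fails if the master key is not the right one.
    static SecretKey unwrapTableKey(SecretKey masterKey, String stored) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKey);
            byte[] wrapped = Base64.getDecoder().decode(stored);
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey master = newAesKey();
        SecretKey table = newAesKey();
        String stored = wrapTableKey(master, table);
        SecretKey recovered = unwrapTableKey(master, stored);
        System.out.println("stored in metastore: " + stored);
        System.out.println("round trip ok: "
                + Arrays.equals(table.getEncoded(), recovered.getEncoded()));
    }
}
```

With this layout, rotating or protecting a single master key is enough to 
protect every table key derived from the metastore.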
 
{quote}2. External KeyManagement integration - where and what is the expected 
contract for this integration{quote}
To integrate with an external key management system, we use the KeyProvider 
interface from HADOOP-9331. An implementation of the KeyProvider interface for 
a specific key management system can be configured as the KeyProvider used to 
retrieve keys.
 
{quote}3. A specific usecase description for exporting keys into an external 
keystore and who has the authority to initiate the export and where the 
password comes from{quote}
Internal keys are exported through the Hive command line. Because the internal 
table keys are encrypted with the master key, the master key must be available 
in the environment, which is controlled by the user, when the export is 
performed. If the master key is not available, the encrypted table keys cannot 
be decrypted and thus cannot be exported. The KeyProvider implementation that 
retrieves the master key can provide its own authentication and authorization 
to decide whether the current user has access to a specific key.
 
{quote}4. An explanation as to why we should ever store the key with the data 
which seems like a bad idea. I understand that it is encrypted with the master 
secret - which takes me to the next question.  {quote}
Strictly speaking, the key is not stored with the data. The table key is stored 
in the Hive metastore. I see your point on this question. As mentioned above, 
this is useful for use cases where no full-fledged, ready-to-use key management 
system is available. We provide several alternatives for managing keys. When 
creating an encrypted table, the user can specify whether the key is managed 
externally or internally. For externally managed keys, only the key name 
(alias) is stored in the Hive metastore, and the key is retrieved through the 
KeyProvider set in the configuration.
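
To make the two modes concrete, here is a minimal sketch of how a table key 
might be resolved at query time. Everything named here is an assumption for 
illustration: KeyLookup is a hypothetical stand-in and not the HADOOP-9331 
KeyProvider interface, and the "sketch.*" property names are invented rather 
than the properties used by the patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

final class TableKeyResolutionSketch {

    /** Hypothetical stand-in for a key provider; NOT the HADOOP-9331 interface. */
    interface KeyLookup {
        SecretKey getKey(String alias);
    }

    // Hypothetical table property names, invented for this sketch.
    static final String MODE = "sketch.key.management"; // "internal" or "external"
    static final String KEY_ALIAS = "sketch.key.alias";
    static final String MASTER_ALIAS = "sketch.master.alias";
    static final String WRAPPED_KEY = "sketch.key.wrapped";

    // External mode: the metastore holds only an alias, the provider holds the key.
    // Internal mode: the metastore holds the wrapped key, the provider holds the master key.
    static SecretKey resolveTableKey(Map<String, String> props, KeyLookup provider) {
        if ("external".equals(props.get(MODE))) {
            return provider.getKey(props.get(KEY_ALIAS));
        }
        SecretKey master = provider.getKey(props.get(MASTER_ALIAS));
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, master);
            byte[] wrapped = Base64.getDecoder().decode(props.get(WRAPPED_KEY));
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot unwrap table key", e);
        }
    }

    // Helper used only to prepare the internal-mode property for the demo.
    static String wrap(SecretKey master, SecretKey tableKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, master);
            return Base64.getEncoder().encodeToString(cipher.wrap(tableKey));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey demoKey(String sixteenAsciiChars) {
        return new SecretKeySpec(sixteenAsciiChars.getBytes(StandardCharsets.US_ASCII), "AES");
    }

    public static void main(String[] args) {
        Map<String, SecretKey> store = new HashMap<>();
        store.put("master", demoKey("0123456789abcdef"));
        store.put("sales", demoKey("fedcba9876543210"));
        KeyLookup provider = store::get;

        Map<String, String> external = new HashMap<>();
        external.put(MODE, "external");
        external.put(KEY_ALIAS, "sales");

        Map<String, String> internal = new HashMap<>();
        internal.put(MODE, "internal");
        internal.put(MASTER_ALIAS, "master");
        internal.put(WRAPPED_KEY, wrap(store.get("master"), store.get("sales")));

        System.out.println(resolveTableKey(external, provider).getAlgorithm());
        System.out.println(resolveTableKey(internal, provider).getAlgorithm());
    }
}
```

In both modes the key material itself comes from, or is unlocked by, the 
provider, so the metastore alone is never sufficient to decrypt the data.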
 
{quote}5. Where is the master secret established and stored and how is it 
protected{quote}
Currently, we assume that the user manages the master key. For example, for 
simple use cases, the user can store the master key in a Java KeyStore that is 
protected by a password and kept in a folder that is read-only for specific 
users or groups. The user can also store the master key in another key 
management system, since the master key is retrieved through KeyProvider.
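
For the simple Java KeyStore case, the following is a minimal sketch of 
storing and loading a master key using the standard JCEKS keystore type. The 
class and method names are hypothetical, and the keystore is kept in memory 
here only to keep the example self-contained; in practice it would be written 
to a file in the restricted folder described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Arrays;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: a password-protected JCEKS keystore holding the master key.
final class MasterKeyStoreSketch {

    // Creates a new keystore containing the master key and returns its bytes.
    static byte[] storeMasterKey(SecretKey key, String alias, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, password); // initialize an empty keystore
            ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                    new KeyStore.PasswordProtection(password));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, password);
            return out.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Loads the master key; a wrong password makes load() or getKey() fail.
    static SecretKey loadMasterKey(byte[] keystoreBytes, String alias, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(new ByteArrayInputStream(keystoreBytes), password);
            return (SecretKey) ks.getKey(alias, password);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Fixed demo key bytes; a real master key would be randomly generated.
        SecretKey master = new SecretKeySpec(
                "0123456789abcdef".getBytes(StandardCharsets.US_ASCII), "AES");
        char[] password = "changeit".toCharArray();
        byte[] keystore = storeMasterKey(master, "hive.master", password);
        SecretKey loaded = loadMasterKey(keystore, "hive.master", password);
        System.out.println("loaded matches stored: "
                + Arrays.equals(master.getEncoded(), loaded.getEncoded()));
    }
}
```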
 
Really appreciate your time reviewing this.
Thanks

> Support data encryption for Hive tables
> ---------------------------------------
>
>                 Key: HIVE-5207
>                 URL: https://issues.apache.org/jira/browse/HIVE-5207
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Jerry Chen
>              Labels: Rhino
>         Attachments: HIVE-5207.patch, HIVE-5207.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> For sensitive and legally protected data such as personal information, it is 
> common practice to store the data encrypted in the file system. Enabling 
> Hive to store and query encrypted data is crucial for Hive data analysis in 
> the enterprise. 
>  
> When creating a table, the user can specify whether it is an encrypted table 
> by setting a property in TBLPROPERTIES. Once an encrypted table is created, 
> queries on it are transparent as long as the corresponding key management 
> facilities are set up in the environment in which the query runs. We can use 
> the Hadoop crypto support provided by HADOOP-9331 for the underlying data 
> encryption and decryption. 
>  
> As for key management, we would support several common key management use 
> cases. First, the table key (data key) can be stored in the Hive metastore, 
> associated with the table in its properties. The table key can be explicitly 
> specified or auto-generated, and will be encrypted with a master key. There 
> are cases where the data being processed is generated by other applications, 
> so we need to support externally managed or imported table keys. Also, the 
> data generated by Hive may be consumed by other applications in the system, 
> so we need a tool or command for exporting the table key to a Java keystore 
> for external use.
>  
> To handle versions of Hadoop that do not have crypto support, we can avoid 
> compilation problems by segregating crypto API usage into separate files 
> (shims) to be included only if a flag is defined on the Ant command line 
> (something like -Dcrypto=true).



--
This message was sent by Atlassian JIRA
(v6.1#6144)
