[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425320#comment-13425320
 ] 

Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------

To Alejandro's questions:

1) If a compression codec is used for encryption, do you lose the compression 
capability when encrypting, or do the two work as a composition?

I compress first and then encrypt. The compression is currently hardcoded to 
ZIP. I can expose it as a configuration option with a choice of {UNCOMPRESSED, 
ZIP, ZLIB, BZIP2}; that is an enhancement I can add.
I have also provided a DistributedSplitter so that files can be split into 
smaller files.
I am not aware of a way to chain multiple compression codecs, though that 
would have been a desirable capability in this case. 
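
To illustrate the compress-then-encrypt ordering described above, here is a 
minimal sketch. It is not the code from the patch: the patch uses ZIP and PGP, 
while this sketch substitutes GZIP and AES-GCM because both ship with the JDK. 
The class and method names (CompressThenEncrypt, seal, open) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration only: GZIP + AES-GCM stand in for ZIP + PGP.
class CompressThenEncrypt {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Compress first, then encrypt. Output layout: IV || ciphertext+tag.
    static byte[] seal(byte[] plain, SecretKey key)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(plain);
        }
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);     // fresh IV for every message
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(compressed.toByteArray());
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    // Reverse order on the way back: decrypt first, then decompress.
    static byte[] open(byte[] sealed, SecretKey key)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
        byte[] compressed = cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }
}
```

The order matters because well-encrypted output looks random and does not 
compress, so compression has to happen before encryption to be of any use.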

2) For the keystores, are you proposing to store them in HDFS and use file 
system permissions to protect them?

Actually, I am not proposing to store them in HDFS. The keystores themselves 
are encrypted and a password is required to read keys from them. 

In the use cases that I have encountered, the keystores were external to the 
cluster. They were either on the CLI machine from which the jobs were submitted 
or on a separate machine from which the keys were retrieved based on the user's 
credentials. (Alfredo was used in this regard to fetch keys via a web service.)
So there are two schemes that I have supported:
  1) reading keys from a Java keystore
  2) reading keys from a web-service-based keystore ("Safe")
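
As a rough sketch of the first scheme, reading a key from a password-protected 
Java keystore could look like the following. This is not the code from the 
patch. The class name KeyStoreReader, the method readKey, and the choice of the 
JCEKS keystore type are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.KeyStoreException;

// Hypothetical helper, not part of Hadoop or of the attached patch.
class KeyStoreReader {
    // Loads a keystore file and returns the key stored under the given alias.
    // JCEKS is assumed here because it can hold secret (symmetric) keys.
    static Key readKey(Path file, char[] storePassword, String alias, char[] keyPassword)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        try (InputStream in = Files.newInputStream(file)) {
            ks.load(in, storePassword);   // fails if the store password is wrong
        }
        Key key = ks.getKey(alias, keyPassword);
        if (key == null) {
            throw new KeyStoreException("no key entry for alias: " + alias);
        }
        return key;
    }
}
```

Because the keystore file is itself password-protected, a copy of the file 
alone is not enough to recover the keys.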




                
> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted 
> wherever it is stored. A common use case is to pull encrypted data out of a 
> datasource and store it in HDFS for analysis. The keys are stored in an 
> external keystore. 
> The feature adds a customizable framework to integrate different types of 
> keystores, support for Java KeyStore, read keys from keystores, and transport 
> keys from JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to 
> perform encryption related steps.
> The design document is attached. It explains the requirement, design and use 
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this against 1.1 and will upload it soon as initial 
> work for further refinement. 

