[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425320#comment-13425320 ]
Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------

To Alejandro's questions:

1) If using a compression codec for encryption, are you losing the compression capabilities when using encryption, or will it work as a composition?

What I have done is to first compress and then encrypt. The compression format is currently hardcoded to ZIP. I can expose it as a configuration option with a choice of {UNCOMPRESSED, ZIP, ZLIB, BZIP2}; this is an enhancement that I can add. I have also provided a DistributedSplitter so that files can be split into smaller files. I am not aware of a way to chain multiple compression codecs, though that would have been a desirable capability in this case.

2) For the keystores, are you proposing to store them in HDFS and use file system permissions to protect them?

Actually, I am not proposing to store them in HDFS. The keystores themselves are encrypted, and a password is required to read keys from them. In the use cases that I have encountered, the keystores were external to the cluster. They were either on the CLI machine from which the jobs were submitted, or on a separate machine from which the keys were retrieved based on the user's credentials. (Alfredo was used in this regard to fetch keys via a web service.)

So there are two schemes that I have supported:
1) reading keys from a Java keystore
2) reading keys from a web-service-based keystore ("Safe")

> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted
> wherever it is stored. Common use case is to pull encrypted data out of a
> datasource and store in HDFS for analysis. The keys are stored in an external
> keystore.
> The feature adds a customizable framework to integrate different types of
> keystores, support for Java KeyStore, read keys from keystores, and transport
> keys from JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to
> perform encryption related steps.
> The design document is attached. It explains the requirement, design and use
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as an initial
> work for further refinement.
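
To illustrate the compress-then-encrypt ordering described in the comment above, here is a minimal sketch. It is not the code from the patch: it uses GZIP and AES-GCM from the standard JDK as stand-ins for the ZIP compression and PGP encryption mentioned in the issue, and the class and method names (CompressThenEncrypt, encode, decode) are hypothetical. The order matters because ciphertext is effectively incompressible, so compression has to happen before encryption to have any effect.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration only: GZIP + AES-GCM stand in for ZIP + PGP.
class CompressThenEncrypt {
    private static final int IV_LEN = 12;    // bytes, usual GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag length

    // Step 1 of encoding: compress the plaintext.
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    public static byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Compress first, then encrypt. Output layout: IV followed by ciphertext.
    public static byte[] encode(byte[] plain, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] compressed = compress(plain);
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(compressed);
        byte[] out = new byte[IV_LEN + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ciphertext, 0, out, IV_LEN, ciphertext.length);
        return out;
    }

    // Reverse order on the way back: decrypt first, then decompress.
    public static byte[] decode(byte[] blob, SecretKey key)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, blob, 0, IV_LEN));
        byte[] compressed = cipher.doFinal(blob, IV_LEN, blob.length - IV_LEN);
        return decompress(compressed);
    }

    public static void main(String[] args) throws Exception {
        // In the proposal the key would come from a keystore; here it is
        // generated on the spot purely for the demonstration.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        byte[] plain = new byte[10000];
        Arrays.fill(plain, (byte) 'a'); // highly compressible sample data

        byte[] blob = encode(plain, key);
        System.out.println("plain bytes:   " + plain.length);
        System.out.println("encoded bytes: " + blob.length);
        System.out.println("round trip ok: " + Arrays.equals(plain, decode(blob, key)));
    }
}
```

Making the compression step configurable, as suggested in the comment, would amount to swapping the compress and decompress methods for the chosen format while leaving the encryption step unchanged.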