[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425286#comment-13425286 ]
Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------

To Rob's questions:

Different encryption keys for different files:
At this point, the PGPCodec supports only one secret key/key pair for all input files. What we need is the ability to specify a secret key/key pair per input file. Another enhancement will be to specify a secret key/key pair per phase, such as map->output and reduce->output. As you mentioned, this mapping has to be specified via configuration. I'll try to add these two enhancements.

Decryption/encryption of different columns within the same file:
This is actually left to the MapReduce programmer, who has to decrypt/encrypt the fields programmatically. The programmer can choose to use different keys for different fields in the MapReduce program. Multiple keys can be read from the keystore and then retrieved in the mapper/reducer using the Credentials API. In a higher-level interface like Hive, it may be possible to add metadata to specify the key name. Another reviewer has also recommended adding this capability to Hive: identifying an encrypted field and specifying the key (the name of the key) used to decrypt/encrypt it.

Thanks for the review and recommendations, Rob. Please let me know if I have not answered your questions correctly.

> Encryption and Key Protection
> -----------------------------
>
> Key: MAPREDUCE-4491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: documentation, security, task-controller, tasktracker
> Reporter: Benoy Antony
> Assignee: Benoy Antony
> Attachments: Hadoop_Encryption.pdf
>
> When dealing with sensitive data, it is required to keep the data encrypted
> wherever it is stored. A common use case is to pull encrypted data out of a
> data source and store it in HDFS for analysis. The keys are stored in an
> external keystore.
> The feature adds a customizable framework to integrate different types of
> keystores, support for the Java KeyStore, the ability to read keys from
> keystores, and transport of keys from the JobClient to Tasks.
> The feature also adds PGP encryption as a codec, plus additional utilities to
> perform encryption-related steps.
> The design document is attached. It explains the requirements, design and use
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as initial
> work for further refinement.
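As a rough illustration of the per-field approach discussed above (the programmer encrypts individual columns, each with its own named key), here is a minimal Java sketch under stated assumptions. It is not taken from the patch: it uses the JDK's AES-GCM cipher rather than the PGPCodec, the aliases `ssn.key` and `email.key` are hypothetical, and the plain `Map` of alias to raw key bytes stands in for the secret keys a task would read from its Hadoop `Credentials` (for example via `getSecretKey(new Text(alias))`).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/**
 * Sketch of field-level encryption with a different key per field.
 * Assumption: the alias -> key-bytes map stands in for the secret keys
 * a task would read from its Hadoop Credentials; this is not the PGPCodec.
 */
public class FieldEncryption {
    private static final int IV_LEN = 12;     // 96-bit IV, the usual size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RNG = new SecureRandom();

    private final Map<String, byte[]> secretKeys; // key alias -> raw AES key bytes

    public FieldEncryption(Map<String, byte[]> secretKeys) {
        this.secretKeys = secretKeys;
    }

    // Looks up the key for one field by its alias, failing loudly if it is missing.
    private SecretKeySpec keyFor(String alias) {
        byte[] raw = secretKeys.get(alias);
        if (raw == null) {
            throw new IllegalArgumentException("no key for alias " + alias);
        }
        return new SecretKeySpec(raw, "AES");
    }

    /** Encrypts one field value; returns Base64(IV || ciphertext || tag). */
    public String encrypt(String alias, String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv); // fresh random IV for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keyFor(alias), new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
        return Base64.getEncoder().encodeToString(out);
    }

    /** Reverses encrypt(); throws if the alias names the wrong key or the value was altered. */
    public String decrypt(String alias, String encoded) throws GeneralSecurityException {
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, keyFor(alias), new GCMParameterSpec(TAG_BITS, in, 0, IV_LEN));
        byte[] pt = cipher.doFinal(in, IV_LEN, in.length - IV_LEN);
        return new String(pt, StandardCharsets.UTF_8);
    }

    /** Generates a random 128-bit AES key, standing in for a keystore entry. */
    public static byte[] newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey().getEncoded();
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> keys = new HashMap<>();
        keys.put("ssn.key", newKey());   // hypothetical alias for one column
        keys.put("email.key", newKey()); // hypothetical alias for another column
        FieldEncryption fe = new FieldEncryption(keys);

        String ssn = fe.encrypt("ssn.key", "123-45-6789");
        String email = fe.encrypt("email.key", "alice@example.com");
        System.out.println(fe.decrypt("ssn.key", ssn));     // 123-45-6789
        System.out.println(fe.decrypt("email.key", email)); // alice@example.com
    }
}
```

Running `main` prints the two decrypted values. Because each value carries its own random IV and authentication tag, decrypting a field with another column's key fails with an exception instead of silently returning garbage, which is the property you want when several keys are in play within one record.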