[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425320#comment-13425320 ]
Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------
To Alejandro's questions:
1) If using a compression codec for encryption, are you losing the compression
capabilities when encryption is used, or will it work as a composition?
What I have done is to compress first and then encrypt. The compression is
currently hardcoded to ZIP. I can expose it as a configuration option with a
choice of {UNCOMPRESSED, ZIP, ZLIB, BZIP2}; this is an enhancement that I can add.
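To make the compress-then-encrypt order concrete, here is a minimal sketch using only JDK classes. It assumes DEFLATE (java.util.zip) for the compression step and AES-GCM as a stand-in for the PGP encryption the patch actually uses; the class name CompressThenEncrypt and the IV-prefixed output layout are illustrative and not taken from the patch. Compressing first matters because well-encrypted output is statistically close to random and would barely compress afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class CompressThenEncrypt {
    private static final int IV_LEN = 12;    // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128; // GCM authentication tag length

    /** Compresses with DEFLATE, then encrypts with AES-GCM. Output: IV || ciphertext+tag. */
    public static byte[] seal(byte[] plain, SecretKey key)
            throws IOException, GeneralSecurityException {
        // 1. Compress first: ciphertext looks random and would barely compress afterwards.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(compressed)) {
            dos.write(plain);
        }
        // 2. Encrypt the compressed bytes under a fresh random IV.
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(compressed.toByteArray());
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    /** Reverses seal(): decrypts and verifies the tag, then inflates. */
    public static byte[] open(byte[] sealed, SecretKey key)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
        byte[] compressed = cipher.doFinal(sealed, IV_LEN, sealed.length - IV_LEN);
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("id,name,account\n");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] sealed = seal(data, key);
        System.out.println(data.length + " bytes in, " + sealed.length + " bytes sealed");
        System.out.println("round trip ok: " + Arrays.equals(data, open(sealed, key)));
    }
}
```

Running main() prints the plaintext and sealed sizes and whether the round trip succeeded. Because GCM authenticates the ciphertext, a corrupted or tampered file fails in open() with an AEADBadTagException instead of decompressing garbage.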
I have also provided a DistributedSplitter so that files can be split into
smaller files.
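The interface of DistributedSplitter isn't described here, so the following is only a single-machine sketch of one plausible splitting rule, assuming newline-delimited records: cut at the last newline before a size limit, so that each piece can be compressed and encrypted on its own. The class LineBoundarySplitter is hypothetical. The reason to split up front is that a compressed-then-encrypted file generally cannot be split at arbitrary byte offsets, so several smaller files give MapReduce several independent inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineBoundarySplitter {
    /**
     * Splits newline-delimited data into chunks of at most maxBytes, cutting only
     * right after a '\n' so no record straddles two chunks. A single record longer
     * than maxBytes becomes one oversized chunk rather than being cut.
     */
    public static List<byte[]> split(byte[] data, int maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int start = 0;
        while (start < data.length) {
            // long arithmetic avoids int overflow for very large maxBytes
            int limit = (int) Math.min((long) start + maxBytes, data.length);
            int cut = limit;
            if (limit < data.length) {
                // Prefer the last newline inside the window [start, limit).
                int lastNl = -1;
                for (int i = limit - 1; i >= start; i--) {
                    if (data[i] == '\n') {
                        lastNl = i;
                        break;
                    }
                }
                if (lastNl >= 0) {
                    cut = lastNl + 1;
                } else {
                    // No newline in the window: extend to the end of the current record.
                    while (cut < data.length && data[cut - 1] != '\n') {
                        cut++;
                    }
                }
            }
            chunks.add(Arrays.copyOfRange(data, start, cut));
            start = cut;
        }
        return chunks;
    }
}
```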
I am not aware of a way to chain multiple compression codecs, though that
capability would have been useful in this case.
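Hadoop's CompressionCodec produces its stream by wrapping another one (createOutputStream(OutputStream)), so chaining would amount to nesting such wrappers in a fixed order. The sketch below shows that idea with plain JDK streams and a hypothetical Stage interface rather than real codecs; AES-CBC via CipherOutputStream stands in for the encryption stage.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class StreamChain {
    /** Hypothetical stage: wraps a downstream stream with one transformation. */
    public interface Stage {
        OutputStream wrap(OutputStream downstream) throws IOException;
    }

    /**
     * Returns a stream whose writes pass through stages.get(0), then
     * stages.get(1), and so on, before reaching sink.
     */
    public static OutputStream chain(OutputStream sink, List<Stage> stages) throws IOException {
        OutputStream out = sink;
        // Wrap from the last stage inward so that the first stage ends up outermost.
        for (int i = stages.size() - 1; i >= 0; i--) {
            out = stages.get(i).wrap(out);
        }
        return out;
    }

    /** Compression stage (DEFLATE). */
    public static Stage deflate() {
        return DeflaterOutputStream::new;
    }

    /** Encryption stage; AES-CBC keeps the sketch short but gives no integrity protection. */
    public static Stage encrypt(SecretKey key, byte[] iv) {
        return downstream -> {
            try {
                Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
                c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
                return new CipherOutputStream(downstream, c);
            } catch (GeneralSecurityException e) {
                throw new IOException(e);
            }
        };
    }
}
```

Closing the outermost stream cascades: DeflaterOutputStream finishes its last block into the cipher stream, which then writes the final padded block and closes the sink.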
2) For the keystores, are you proposing to store them in HDFS and use file
system permissions to protect them?
Actually, I am not proposing to store them in HDFS. The keystores themselves
are encrypted and a password is required to read keys from them.
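For the Java keystore case, reading a key comes down to loading the store with its password and fetching the entry by alias. A minimal sketch, assuming a JCEKS store holding an AES SecretKey entry (for example one created with keytool -genseckey); the class name KeyStoreKeyReader is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.KeyStoreException;
import javax.crypto.SecretKey;

public class KeyStoreKeyReader {
    /**
     * Loads a JCEKS keystore from the stream and returns the secret key stored
     * under alias. storePassword verifies the store's integrity; keyPassword
     * decrypts the individual entry.
     */
    public static SecretKey readKey(InputStream keystore, char[] storePassword,
                                    String alias, char[] keyPassword)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(keystore, storePassword);        // wrong store password -> IOException
        Key key = ks.getKey(alias, keyPassword); // wrong key password -> UnrecoverableKeyException
        if (!(key instanceof SecretKey)) {       // also covers a missing alias (null)
            throw new KeyStoreException("no secret key under alias '" + alias + "'");
        }
        return (SecretKey) key;
    }
}
```

On the submitting (CLI) machine this would be called with a FileInputStream over the local keystore file and passwords read from the console, so neither the keystore nor its passwords need to reach HDFS.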
In the use cases that I have encountered, the keystores were external to the
cluster. They were either on the CLI machine from which the jobs were submitted,
or on a separate machine from which the keys were retrieved based on the user's
credentials (Alfredo was used here to fetch the keys via a web service).
So there are two schemes that I have supported:
1) reading keys from a Java keystore
2) reading keys from a web-service-based keystore ("Safe")
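Supporting two schemes suggests a small plug-in interface that the rest of the code depends on, with one implementation per keystore type. A sketch under stated assumptions: the KeyProvider interface, the registry keyed by scheme name, and the web service's wire format (GET baseUrl/alias returning a Base64-encoded AES key) are all hypothetical, and the caller authentication that Alfredo handled is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.util.Base64;
import java.util.Map;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public class KeyProviders {
    /** Hypothetical plug-in point: each keystore type resolves an alias to a key. */
    public interface KeyProvider {
        SecretKey getKey(String alias) throws IOException, GeneralSecurityException;
    }

    /** Scheme 1: keys held in an already-loaded Java KeyStore. */
    public static KeyProvider fromKeyStore(KeyStore ks, char[] keyPassword) {
        return alias -> {
            Key k = ks.getKey(alias, keyPassword);
            if (!(k instanceof SecretKey)) {
                throw new KeyStoreException("no secret key under alias '" + alias + "'");
            }
            return (SecretKey) k;
        };
    }

    /**
     * Scheme 2: a remote key service. Assumed (hypothetical) protocol:
     * GET baseUrl + "/" + alias answers 200 with a Base64-encoded AES key.
     * Caller authentication is deliberately left out of this sketch.
     */
    public static KeyProvider fromWebService(String baseUrl) {
        return alias -> {
            URL url = new URL(baseUrl + "/" + URLEncoder.encode(alias, "UTF-8"));
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try {
                if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                    throw new IOException("key service returned HTTP " + conn.getResponseCode());
                }
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                try (InputStream in = conn.getInputStream()) {
                    byte[] buf = new byte[1024];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        body.write(buf, 0, n);
                    }
                }
                String b64 = new String(body.toByteArray(), StandardCharsets.US_ASCII).trim();
                return new SecretKeySpec(Base64.getDecoder().decode(b64), "AES");
            } finally {
                conn.disconnect();
            }
        };
    }

    /** Picks the provider registered under the configured scheme name. */
    public static KeyProvider select(String scheme, Map<String, KeyProvider> registered) {
        KeyProvider p = registered.get(scheme);
        if (p == null) {
            throw new IllegalArgumentException("unknown keystore scheme: " + scheme);
        }
        return p;
    }
}
```

Job-side code then only asks select(configuredScheme, registry).getKey(alias) and never needs to know where the key lives.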
> Encryption and Key Protection
> -----------------------------
>
> Key: MAPREDUCE-4491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: documentation, security, task-controller, tasktracker
> Reporter: Benoy Antony
> Assignee: Benoy Antony
> Attachments: Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, the data must be kept encrypted wherever
> it is stored. A common use case is to pull encrypted data out of a data
> source and store it in HDFS for analysis. The keys are stored in an external
> keystore.
> The feature adds a customizable framework to integrate different types of
> keystores, support for Java KeyStore, reading of keys from keystores, and
> transport of keys from JobClient to Tasks.
> The feature also adds PGP encryption as a codec and additional utilities to
> perform encryption-related steps.
> The design document is attached. It explains the requirements, design, and
> use cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this against 1.1 and will upload it soon as initial
> work for further refinement.
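On transporting keys from JobClient to Tasks: in Hadoop 1.x the usual carrier for per-job secrets is the job's Credentials object (addSecretKey / getSecretKey), which travels with the job to its tasks; the description does not say whether the patch uses it. The sketch below mimics only the shape of that hand-off with plain JDK classes: the client serializes alias/key pairs into a blob and the task reads them back. KeyBundle and its byte format are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyBundle {
    /**
     * Client side: serializes alias -> raw key bytes.
     * The blob itself is NOT encrypted; it must be protected in transit and
     * at rest like any other credential.
     */
    public static byte[] write(Map<String, byte[]> keys) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(keys.size());
            for (Map.Entry<String, byte[]> e : keys.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
                out.write(e.getValue());
            }
        }
        return bos.toByteArray();
    }

    /** Task side: reads back the bundle produced by write(). */
    public static Map<String, byte[]> read(byte[] blob) throws IOException {
        Map<String, byte[]> keys = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String alias = in.readUTF();
                byte[] key = new byte[in.readInt()];
                in.readFully(key);
                keys.put(alias, key);
            }
        }
        return keys;
    }
}
```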
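PGP-style encryption is hybrid: the bulk data is encrypted with a random one-time session key, and only that session key is encrypted with the recipient's public key. The following sketch shows just that principle with JDK primitives (RSA-OAEP wrapping an AES-GCM session key); it does not produce OpenPGP-format output and is not the patch's codec. HybridEnvelope and its byte layout are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class HybridEnvelope {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;

    /** Encrypts data so that only the holder of the matching private key can read it. */
    public static byte[] seal(byte[] data, PublicKey recipient) throws GeneralSecurityException {
        // 1. Fresh one-time session key for this message.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey session = kg.generateKey();
        // 2. Bulk data encrypted under the session key.
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.ENCRYPT_MODE, session, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = aes.doFinal(data);
        // 3. Session key wrapped with the recipient's public key.
        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.WRAP_MODE, recipient);
        byte[] wrapped = rsa.wrap(session);
        // Layout: wrappedLen (4 bytes) | wrapped key | IV | ciphertext+tag
        return ByteBuffer.allocate(4 + wrapped.length + IV_LEN + body.length)
                .putInt(wrapped.length).put(wrapped).put(iv).put(body).array();
    }

    /** Unwraps the session key with the private key, then decrypts the body. */
    public static byte[] open(byte[] envelope, PrivateKey recipient) throws GeneralSecurityException {
        ByteBuffer buf = ByteBuffer.wrap(envelope);
        byte[] wrapped = new byte[buf.getInt()];
        buf.get(wrapped);
        byte[] iv = new byte[IV_LEN];
        buf.get(iv);
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.UNWRAP_MODE, recipient);
        Key session = rsa.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.DECRYPT_MODE, session, new GCMParameterSpec(TAG_BITS, iv));
        return aes.doFinal(body);
    }
}
```

Because only the public key is needed to seal, data can be encrypted on machines that never hold the private key.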