[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048355#comment-14048355
 ] 

Arun Suresh commented on MAPREDUCE-5890:
----------------------------------------

[~chris.douglas],
I had initially tried to directly modify the {{IFile}} format to handle the IV. 
I felt this would not be a clean solution, for the following reasons:
* The {{IFile}} currently does not have a notion of an explicit header/metadata.
* While it is possible to use the {{IFile.Writer}} constructor to write the IV 
(and thus make it transparent to the rest of the code base), the reading 
code path is not so straightforward. There are two classes that extend 
{{IFile.Reader}} ({{InMemoryReader}} and {{RawKVIteratorReader}}). The 
{{InMemoryReader}} totally ignores the input stream that is initialized in the 
base class constructor, and there are places in the code base where the input 
stream is initialized not in the Reader but in the {{Segment::init()}} method 
(which, in my opinion, makes the {{IFile}} abstraction a bit leaky, since the 
underlying stream should be handled in its entirety in the {{IFile}} 
Writer/Reader; the {{Segment}} class (which is part of the {{Merger}} 
framework) should avoid dealing with the internals of the ).
* Also, I was not able to do away with a lot of if-then checks in the Shuffle 
phase (another instance of the leaky abstraction mentioned in the previous 
point): the implementations of the {{MapOutput::shuffle}} method create 
{{IFileInputStream}}s directly, without an associated {{IFile.Reader}}.
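
To illustrate the second point, here is a minimal sketch of the "IV as a 
header" idea, using only JDK classes. It is not code from the patch and does 
not use any Hadoop API: the class {{IvHeaderSketch}}, its method names, and the 
choice of AES/CTR are all assumptions made for illustration. The writer side 
emits the IV in the clear before any encrypted bytes; the reader side has to 
consume that IV from the very same stream before it can decrypt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only; not part of Hadoop or of the patch.
class IvHeaderSketch {
    static final int IV_LEN = 16; // AES block size in bytes

    static Cipher cipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    // Writer side: write a fresh IV unencrypted, then encrypt what follows.
    static OutputStream wrapForWrite(OutputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        raw.write(iv);
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    // Reader side: works only if this code is the first to touch the stream.
    static InputStream wrapForRead(InputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new DataInputStream(raw).readFully(iv); // consumes exactly IV_LEN bytes
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE, key, iv));
    }

    // Writes payload to an in-memory "file" and reads it back.
    static byte[] roundTrip(byte[] key, byte[] payload)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (OutputStream out = wrapForWrite(file, key)) {
            out.write(payload);
        }
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = wrapForRead(
                new ByteArrayInputStream(file.toByteArray()), key)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }
}
```

The sketch works because a single piece of code owns the stream on each 
side. Any reader that ignores the base-class stream, or any caller that opens 
the stream itself, would need its own copy of the {{wrapForRead}} step, which 
is where the extra if-then checks come from.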

> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>              Labels: encryption
>         Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
> MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
> MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
> MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
> org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
> syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (network) is not 
> sufficient; the data must also be encrypted while at rest. 
> HADOOP-10150 & HDFS-6134 bring encryption at rest for data in filesystem 
> using Hadoop FileSystem API. MapReduce intermediate data and spills should 
> also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
