[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041955#comment-14041955
 ] 

Chris Douglas commented on MAPREDUCE-5890:
------------------------------------------

Thanks for updating the patch, Arun. Adding seeks for serving map output would 
be regrettable.

A few nits:
* unused, private static field {{counter}} added to {{Fetcher}}
* unit test should use JUnit4 annotations rather than extending {{TestCase}}
* {noformat}
+      InputStream is = input;
+      is = CryptoUtils.wrap(jobConf, iv, is, offset, compressedLength);
{noformat} is equivalent to {{InputStream is = CryptoUtils.wrap(jobConf, iv, 
input, offset, compressedLength);}}
* While not terribly expensive, there are a lot of redundant lookups for the 
encrypted shuffle config parameter.
* There are many counterexamples, but running an MR job is a heavyweight way to 
test this.
* To be sure I understand the IV logic, it's injected in the stream as a prefix 
to the segment during a merge, but is part of the index record during a spill. 
Is that accurate? Adding a few comments calling this out would be appreciated, 
particularly since it's hard to spot in the merge.
* Has this been tested on spills with intermediate merges? With more than a 
single reduce? Looking at the patch, it appears to create the stream with the 
IV but not reset the IV for each segment (apologies, I haven't tried applying 
it, so I might just be misreading the context).
* Since the IV size is hard-coded in {{CryptoUtils}} to 16 bytes (and part of 
the {{IndexRecord}} format), it should probably fail if the 
{{CryptoCodec::getAlgorithmBlockSize}} returns anything else.
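To illustrate the last nit, here is a minimal sketch of a fail-fast check. The class and method names ({{IvSizeCheck}}, {{checkBlockSize}}) are hypothetical and not taken from the patch; the caller is assumed to pass in whatever block size the codec reports.

```java
/** Hypothetical sketch; names are illustrative and not from the patch. */
final class IvSizeCheck {
  /** IV length assumed by the on-disk format (16 bytes, per the review comment). */
  static final int IV_LENGTH = 16;

  private IvSizeCheck() {}

  /**
   * Fails fast when the block size reported by the codec does not match
   * the fixed IV length, instead of writing a record that cannot be parsed.
   */
  static void checkBlockSize(int algorithmBlockSize) {
    if (algorithmBlockSize != IV_LENGTH) {
      throw new IllegalStateException(
          "Unsupported cipher block size " + algorithmBlockSize
              + "; the spill format assumes " + IV_LENGTH + " bytes");
    }
  }
}
```

Running a check like this once, when the codec is created, would surface a mismatch at job start rather than as a corrupt segment at fetch time.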

Much of the logic in here is internal to MapReduce, so it would be unfair to 
ask that this create better abstractions than what exists, but the IV handling 
is pretty ad hoc. Other improvements under consideration (particularly native 
implementations and other frameworks building on the {{ShuffleHandler}}) may 
rely on this code, as may older versions of MapReduce, which will fail unless 
two versions of the {{ShuffleHandler}} are deployed.

To make it backwards compatible, the IV can be part of each {{IFile}} segment 
(requiring no changes to {{ShuffleHandler}} or the 
{{SpillRecord}}/{{IndexRecord}} format), or the IVs can be added to the end of 
the {{SpillRecord}}. In the latter case, the {{Fetcher}} will need to request 
the alternate interpretation by including a header; old versions will get 
the existing interpretation of the {{SpillRecord}}.
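A rough sketch of the first option, using only the JDK's {{javax.crypto}} rather than the patch's {{CryptoUtils}}: each segment carries its own freshly generated IV as a 16-byte prefix, so a reader needs only the segment's offset and length. The class name, the method names, and the choice of AES/CTR are assumptions made for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of "IV as a prefix of each segment"; not the patch's code. */
final class SegmentIvSketch {
  static final int IV_LENGTH = 16;
  private static final SecureRandom RANDOM = new SecureRandom();

  private SegmentIvSketch() {}

  private static Cipher cipher(int mode, byte[] key, byte[] iv) {
    try {
      Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
      c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
      return c;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Encrypts one segment with a fresh IV and returns IV || ciphertext. */
  static byte[] writeSegment(byte[] key, byte[] plaintext) {
    byte[] iv = new byte[IV_LENGTH];
    RANDOM.nextBytes(iv); // a new IV per segment, never reused across segments
    try {
      byte[] ciphertext = cipher(Cipher.ENCRYPT_MODE, key, iv).doFinal(plaintext);
      byte[] out = Arrays.copyOf(iv, IV_LENGTH + ciphertext.length);
      System.arraycopy(ciphertext, 0, out, IV_LENGTH, ciphertext.length);
      return out;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Decrypts the segment stored at [offset, offset + length) of the file. */
  static byte[] readSegment(byte[] key, byte[] file, int offset, int length) {
    // The IV is read from the segment itself, not from the index record.
    byte[] iv = Arrays.copyOfRange(file, offset, offset + IV_LENGTH);
    try {
      return cipher(Cipher.DECRYPT_MODE, key, iv)
          .doFinal(file, offset + IV_LENGTH, length - IV_LENGTH);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With this layout the offsets and lengths in the index record simply cover the IV prefix as well, so nothing that serves byte ranges needs to know the data is encrypted.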

> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>              Labels: encryption
>         Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
> org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
> syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (network) is not 
> sufficient, it is required that while at rest it should be encrypted. 
> HADOOP-10150 & HDFS-6134 bring encryption at rest for data in filesystem 
> using Hadoop FileSystem API. MapReduce intermediate data and spills should 
> also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
