[ https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987015#comment-13987015 ]

Owen O'Malley commented on HADOOP-10150:
----------------------------------------

I've been working through this. We have two metadata items that we need for 
each file:
* the key name and version
* the IV (initialization vector)
Note that the current patches store only the IV, but we really need to store 
the key name and version as well. The version is absolutely critical: if you 
roll to a new key version, you don't want to have to re-write all of the 
existing data.
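
For concreteness, the per-file metadata amounts to roughly this (a 
hypothetical sketch; the class and field names are illustrative, not from 
the attached patches):

{code:java}
/**
 * Hypothetical per-file encryption metadata. Only a key *reference*
 * (name + version) is stored, never the key material itself; the IV is
 * unique to the file.
 */
class FileEncryptionMetadata {
  final String keyName;   // which key in the key-management framework
  final int keyVersion;   // critical so that key rolls don't force re-writes
  final byte[] iv;        // initialization vector for this file

  FileEncryptionMetadata(String keyName, int keyVersion, byte[] iv) {
    this.keyName = keyName;
    this.keyVersion = keyVersion;
    this.iv = iv;
  }
}
{code}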

It seems to me there are three reasonable places to store the small amount of 
metadata:
* at the beginning of the file
* in a side file
* encoded using a filename mangling scheme

The beginning of the file creates trouble because it throws off the block 
calculations done by MapReduce. (In other words, if we slide all of the 
data down by 1k, every input split will cross an HDFS block boundary.) On 
the other hand, it doesn't add any load to the namenode and will always be 
consistent with the file.
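
To make the split problem concrete (the numbers are illustrative; 128MB is 
just a typical block size):

{code:java}
// Illustrative arithmetic only: a fixed-size header shifts every logical
// offset, so a split that used to align with one HDFS block straddles two.
public class SplitShift {
  public static void main(String[] args) {
    final long blockSize = 128L * 1024 * 1024; // a typical HDFS block size
    final long header = 1024;                  // metadata at the front of the file
    for (int i = 0; i < 3; i++) {
      long start = i * blockSize + header; // physical start of split i
      long end = start + blockSize - 1;    // physical last byte of split i
      System.out.printf("split %d spans HDFS blocks %d..%d%n",
          i, start / blockSize, end / blockSize);
    }
  }
}
// prints: split 0 spans HDFS blocks 0..1, split 1 spans 1..2, ...
{code}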

A side file doesn't change the offsets into the file, but it roughly doubles 
the traffic and storage required on the namenode, since every encrypted file 
now needs a companion metadata file.

Doing name mangling means the underlying HDFS file names are more 
complicated, but it neither changes the file offsets nor increases the load 
on the namenode.
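
For example, a mangled name could look something like this (purely 
illustrative; the actual scheme, separators, and encoding are all up for 
discussion):

{code:java}
// Hypothetical mangling: append key name, key version, and a hex-encoded IV
// to the underlying HDFS file name. Nothing here is from a current patch.
public class NameMangler {
  static String mangle(String hdfsName, String keyName, int keyVersion,
                       byte[] iv) {
    StringBuilder hex = new StringBuilder();
    for (byte b : iv) {
      hex.append(String.format("%02x", b));
    }
    return hdfsName + ".cfs." + keyName + "." + keyVersion + "." + hex;
  }
}
// e.g. mangle("part-00000", "mykey", 3, iv)
//   -> "part-00000.cfs.mykey.3.0f1e2d..."
{code}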

I think we should do the name mangling. What do others think?


> Hadoop cryptographic file system
> --------------------------------
>
>                 Key: HADOOP-10150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10150
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>              Labels: rhino
>             Fix For: 3.0.0
>
>         Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file 
> system-V2.docx, HADOOP cryptographic file system.pdf, 
> HDFSDataAtRestEncryptionAlternatives.pdf, 
> HDFSDataatRestEncryptionAttackVectors.pdf, 
> HDFSDataatRestEncryptionProposal.pdf, cfs.patch, extended information based 
> on INode feature.patch
>
>
> There is an increasing need to secure data as Hadoop customers use various 
> upper-layer applications such as MapReduce, Hive, Pig, HBase, and so on.
> HADOOP CFS (HADOOP Cryptographic File System) secures data by decorating 
> DFS or other file systems with HADOOP “FilterFileSystem”, and is 
> transparent to upper-layer applications. It’s configurable, scalable and 
> fast.
> High-level requirements:
> 1.    Transparent to upper-layer applications, which require no 
> modification.
> 2.    “Seek” and “PositionedReadable” are supported on CFS input streams 
> if the wrapped file system supports them.
> 3.    Very high performance for encryption and decryption, so that they do 
> not become a bottleneck.
> 4.    Can decorate HDFS and all other file systems in Hadoop without 
> modifying the existing file system structure (for example, the namenode 
> and datanode structure when the wrapped file system is HDFS).
> 5.    Admins can configure encryption policies, such as which directories 
> are encrypted.
> 6.    A robust key management framework.
> 7.    Support pread and append operations if the wrapped file system 
> supports them.
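
For reference, the decoration approach described above boils down to 
something like the following (a minimal sketch assuming a hypothetical 
CryptoFileSystem class; the real implementation is in the attached patches):

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// A minimal sketch of the FilterFileSystem decoration described above;
// illustrative only, not the code from the attached patches.
public class CryptoFileSystem extends FilterFileSystem {
  public CryptoFileSystem(FileSystem wrapped) {
    super(wrapped); // delegate everything to the wrapped file system
  }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    FSDataInputStream raw = fs.open(f, bufferSize);
    // A real implementation would wrap 'raw' in a decrypting stream that
    // preserves Seek and PositionedReadable; decryption is elided here.
    return raw;
  }
}
{code}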



--
This message was sent by Atlassian JIRA
(v6.2#6252)
