[ https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995305#comment-13995305 ]

Alejandro Abdelnur commented on HADOOP-10150:
---------------------------------------------

[cross-posting with HDFS-6134]

Reopening HDFS-6134

After some offline discussions with Yi, Tianyou, ATM, Todd, Andrew and Charles, 
we think it makes more sense to implement encryption for HDFS directly in the 
DistributedFileSystem client, and to use CryptoFileSystem to support encryption 
for FileSystems that don't support native encryption.

The reasons for this change of direction are:

* If we want to add support for transparent HDFS compression, the compression 
must be done before the encryption, because encrypted data has near-maximal 
entropy and compresses poorly. If compression is to be handled by the HDFS 
DistributedFileSystem, then the encryption has to be handled after it in the 
write path. (A short demonstration of the entropy point follows this list.)

* The proposed CryptoSupport abstraction significantly complicates the 
implementation of CryptoFileSystem and the wiring in the HDFS FileSystem 
client.

* Building it directly into the HDFS FileSystem client may allow us to avoid 
an extra copy of the data.
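
To make the entropy point above concrete, here is a minimal, self-contained 
sketch (illustration only, not from any patch on this issue): DEFLATE shrinks 
repetitive plaintext dramatically, but gains nothing once the same bytes have 
passed through AES/CTR.

import java.util.Arrays;
import java.util.zip.Deflater;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CompressThenEncryptDemo {

  // Returns the number of bytes DEFLATE needs to represent 'data'.
  static int deflatedSize(byte[] data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    deflater.finish();
    byte[] out = new byte[data.length * 2];   // large enough for one pass
    int n = deflater.deflate(out);
    deflater.end();
    return n;
  }

  public static void main(String[] args) throws Exception {
    byte[] plaintext = new byte[64 * 1024];
    Arrays.fill(plaintext, (byte) 'a');       // highly compressible input

    // Throwaway all-zero key/IV, just for the demo.
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE,
        new SecretKeySpec(new byte[16], "AES"),
        new IvParameterSpec(new byte[16]));
    byte[] ciphertext = cipher.doFinal(plaintext);  // near-random bytes

    System.out.println("plaintext  deflates to " + deflatedSize(plaintext) + " bytes");
    System.out.println("ciphertext deflates to " + deflatedSize(ciphertext) + " bytes");
  }
}

The 64 KB of 'a' bytes should deflate to well under 1 KB, while the ciphertext 
should come out at or slightly above 64 KB, which is why compression only helps 
on the plaintext side of the cipher.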

Because of this, the idea is now:

* A common set of Crypto Input/Output streams. They would be used by 
CryptoFileSystem, HDFS encryption, and MapReduce intermediate data and spills. 
Note we cannot use the JDK Cipher Input/Output streams directly because we need 
to support the additional interfaces that the Hadoop FileSystem streams 
implement (Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor, 
CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess, Syncable).
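
To illustrate why the JDK's CipherInputStream is not enough and what the 
common streams have to do instead: with a counter-mode cipher the keystream 
can be recomputed for any byte offset, which is exactly what makes Seekable 
and PositionedReadable implementable. The class below is a read-side sketch 
against the public Hadoop interfaces, assuming AES/CTR and a per-file IV; it 
covers only two of the interfaces above and is in no way the proposed 
implementation.

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.PositionedReadable;
import org.apache.hadoop.fs.Seekable;

/**
 * Sketch of a decrypting stream that stays seekable. AES/CTR allows the
 * keystream for any file offset to be recomputed: set the counter to
 * offset/16 and discard offset%16 keystream bytes.
 */
public class CryptoInputStream extends InputStream
    implements Seekable, PositionedReadable {

  private static final int BLOCK = 16;       // AES block size
  private final FSDataInputStream in;        // underlying encrypted bytes
  private final SecretKeySpec key;
  private final byte[] baseIv;               // IV for file offset 0
  private Cipher cipher;                     // cipher for sequential reads
  private long pos;                          // plaintext position

  public CryptoInputStream(FSDataInputStream in, byte[] keyBytes, byte[] iv)
      throws IOException {
    this.in = in;
    this.key = new SecretKeySpec(keyBytes, "AES");
    this.baseIv = iv.clone();
    this.cipher = cipherAt(0);
  }

  /** Build an AES/CTR cipher whose keystream is aligned to 'offset'. */
  private Cipher cipherAt(long offset) throws IOException {
    try {
      byte[] iv = baseIv.clone();
      // Add offset/BLOCK to the big-endian counter embedded in the IV.
      long delta = offset / BLOCK;
      for (int i = iv.length - 1; i >= 0 && delta != 0; i--) {
        delta += iv[i] & 0xff;
        iv[i] = (byte) delta;
        delta >>>= 8;
      }
      Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
      c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
      c.update(new byte[(int) (offset % BLOCK)]);   // skip partial block
      return c;
    } catch (Exception e) {
      throw new IOException("cannot initialize cipher", e);
    }
  }

  /** CTR is a stream mode, so update() yields exactly n output bytes. */
  private static void decrypt(Cipher c, byte[] b, int off, int n) {
    byte[] plain = c.update(b, off, n);
    System.arraycopy(plain, 0, b, off, n);
  }

  @Override public int read(byte[] b, int off, int len) throws IOException {
    int n = in.read(b, off, len);
    if (n > 0) { decrypt(cipher, b, off, n); pos += n; }
    return n;
  }

  @Override public int read() throws IOException {
    byte[] one = new byte[1];
    return read(one, 0, 1) == -1 ? -1 : one[0] & 0xff;
  }

  // ---- Seekable: move the raw stream, then realign the keystream.
  @Override public void seek(long newPos) throws IOException {
    in.seek(newPos);
    cipher = cipherAt(newPos);
    pos = newPos;
  }
  @Override public long getPos() { return pos; }
  @Override public boolean seekToNewSource(long target) throws IOException {
    boolean moved = in.seekToNewSource(target);
    if (moved) { cipher = cipherAt(target); pos = target; }
    return moved;
  }

  // ---- PositionedReadable: stateless pread with a throwaway cipher.
  @Override public int read(long position, byte[] buf, int off, int len)
      throws IOException {
    int n = in.read(position, buf, off, len);
    if (n > 0) decrypt(cipherAt(position), buf, off, n);
    return n;
  }
  @Override public void readFully(long position, byte[] buf, int off, int len)
      throws IOException {
    int done = 0;
    while (done < len) {
      int n = read(position + done, buf, off + done, len - done);
      if (n < 0) throw new EOFException();
      done += n;
    }
  }
  @Override public void readFully(long position, byte[] buf)
      throws IOException {
    readFully(position, buf, 0, buf.length);
  }
}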

* CryptoFileSystem. To support encryption in arbitrary FileSystems.

* HDFS client encryption. To support transparent HDFS encryption.

Both the CryptoFileSystem and HDFS client encryption implementations would be 
built using the Crypto Input/Output streams, xattrs and the KeyProvider API.
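
On the read path, that wiring might look like the sketch below. To be clear 
about assumptions: the xattr names are invented for illustration, taking the 
first configured provider is a simplification, and CryptoInputStream refers to 
the sketch earlier in this comment, not to any committed class.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenEncryptedFile {

  // Hypothetical xattr names; the real naming scheme is not decided here.
  static final String KEY_NAME_XATTR = "user.crypto.key.name";
  static final String IV_XATTR = "user.crypto.iv";

  public static CryptoInputStream open(FileSystem fs, Path file,
      Configuration conf) throws IOException {
    // 1. Per-file crypto metadata travels with the file as xattrs.
    byte[] keyName = fs.getXAttr(file, KEY_NAME_XATTR);
    byte[] iv = fs.getXAttr(file, IV_XATTR);

    // 2. Key material comes from the KeyProvider API, not from the FS.
    KeyProvider provider = KeyProviderFactory.getProviders(conf).get(0);
    KeyProvider.KeyVersion kv =
        provider.getCurrentKey(new String(keyName, StandardCharsets.UTF_8));

    // 3. Wrap the raw (encrypted) stream in the common crypto stream.
    FSDataInputStream raw = fs.open(file);
    return new CryptoInputStream(raw, kv.getMaterial(), iv);
  }
}

In this model the FileSystem only ever stores ciphertext plus non-secret 
metadata; key material is resolved on the client through the KeyProvider, 
which is what makes the encryption client-side and transparent to the server.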



> Hadoop cryptographic file system
> --------------------------------
>
>                 Key: HADOOP-10150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10150
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>              Labels: rhino
>             Fix For: 3.0.0
>
>         Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file 
> system-V2.docx, HADOOP cryptographic file system.pdf, 
> HDFSDataAtRestEncryptionAlternatives.pdf, 
> HDFSDataatRestEncryptionAttackVectors.pdf, 
> HDFSDataatRestEncryptionProposal.pdf, cfs.patch, extended information based 
> on INode feature.patch
>
>
> There is an increasing need for securing data when Hadoop customers use 
> various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so 
> on.
> HADOOP CFS (Hadoop Cryptographic File System) secures data by decorating DFS 
> or other file systems with Hadoop's “FilterFileSystem”, and is transparent to 
> upper layer applications. It is configurable, scalable and fast.
> High level requirements:
> 1.    Transparent to upper layer applications, with no modification required.
> 2.    “Seek” and “PositionedReadable” are supported for the CFS input stream 
> if the wrapped file system supports them.
> 3.    Very high performance for encryption and decryption; they will not 
> become a bottleneck.
> 4.    Can decorate HDFS and all other file systems in Hadoop, without 
> modifying the existing structure of the wrapped file system, such as the 
> namenode and datanode structures when the wrapped file system is HDFS.
> 5.    Admins can configure encryption policies, such as which directories 
> will be encrypted.
> 6.    A robust key management framework.
> 7.    Support Pread and append operations if the wrapped file system supports 
> them.


