[ 
https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959123#comment-13959123
 ] 

Todd Lipcon commented on HADOOP-10150:
--------------------------------------

A few questions here...

First, let me confirm my understanding of the key structure and storage:

- Client master key: this lives on the Key Management Server, and might be 
different from application to application. In many cases there may be just one 
per cluster, though in a multitenant cluster, perhaps we could have one per 
tenant.
- Data key: this is set per encrypted directory. This key is stored in the 
directory xattr on the NN, but encrypted by the client master key (which the NN 
doesn't know).

So, when a client wants to read a file, the process is as follows:
1) The client notices that the file is in an encrypted directory and fetches 
the encrypted data key from the NN's xattr on the directory.
2) The client somehow associates this encrypted data key with the master key 
that was used to encrypt it (perhaps it's tagged with some identifier), then 
fetches the appropriate master key from the key store.
2a) The keystore somehow authenticates and authorizes the client's access to 
this key.
3) The client decrypts the data key using the master key, and is now able to 
set up a decrypting stream for the file itself. (I've ignored the IV here, but 
assume it's also stored in an xattr.)
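
To make the flow above concrete, here is a rough Java sketch of how I 
understand it. Everything in it is an assumption for illustration and not the 
patch's actual API: the class and method names are hypothetical, a plain Map 
stands in for the KMS, a small holder object stands in for the directory 
xattr, and I've picked the JCE "AESWrap" transformation for wrapping the data 
key and AES/CTR for the file bytes. Step 2a (authentication and authorization) 
is not modeled at all.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical sketch; none of these names come from the CFS patch.
final class EnvelopeReadSketch {

    /** Stand-in for what the NN stores in the directory xattr (no plaintext key). */
    static final class DirXattr {
        final String masterKeyId;    // tag identifying the wrapping master key
        final byte[] wrappedDataKey; // data key encrypted under that master key
        final byte[] iv;             // assumed to live in an xattr as well

        DirXattr(String masterKeyId, byte[] wrappedDataKey, byte[] iv) {
            this.masterKeyId = masterKeyId;
            this.wrappedDataKey = wrappedDataKey;
            this.iv = iv;
        }
    }

    /** Generates a fresh 128-bit AES key (used for master and data keys alike here). */
    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Setup side: wrap the directory's data key under the master key. */
    static DirXattr wrapForDirectory(String masterKeyId, SecretKey masterKey,
                                     SecretKey dataKey) {
        try {
            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, masterKey);
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            return new DirXattr(masterKeyId, wrap.wrap(dataKey), iv);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Steps 2 and 3: find the master key by its tag, then unwrap the data key. */
    static SecretKey unwrapDataKey(DirXattr xattr, Map<String, SecretKey> kms) {
        SecretKey masterKey = kms.get(xattr.masterKeyId);
        if (masterKey == null) {
            throw new IllegalStateException("no access to master key " + xattr.masterKeyId);
        }
        try {
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) unwrap.unwrap(xattr.wrappedDataKey, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts or decrypts file bytes; mode is Cipher.ENCRYPT_MODE or DECRYPT_MODE. */
    static byte[] crypt(int mode, SecretKey dataKey, byte[] iv, byte[] input) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, dataKey, new IvParameterSpec(iv));
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch a reader would call unwrapDataKey with the xattr it fetched and 
then crypt(Cipher.DECRYPT_MODE, ...) on the file bytes. The NN-side object 
never holds a plaintext key, which is the property the questions below depend 
on.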

In terms of attack vectors:
- Let's say that the NN disk is stolen. The thief now has access to a bunch of 
data keys, but they're all encrypted by various master keys. So we're OK.
- Let's say that a client is malicious. It can get whichever master keys it has 
access to from the KMS. If we only have one master key per cluster, then the 
combination of one malicious client plus a stolen fsimage will expose all of 
the data keys.
- Let's say that a client has escalated to root access on one of the slave 
nodes in the cluster, or otherwise has malicious access to a NodeManager 
process. By looking at a running MR task, it could steal whatever credentials 
the task is using to access the KMS, and/or dump the memory of the client 
process in order to recover the master key above.
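
The second case is easy to state as a toy model. The Java sketch below is 
purely illustrative (the class name, method name, paths and key ids are all 
made up): given which master key wraps each directory's data key, and which 
master keys an attacker holding the fsimage has also obtained, it lists the 
directories whose data keys can be unwrapped.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the exposure from "stolen fsimage + some master keys".
final class BlastRadiusSketch {

    /**
     * Returns the directories whose wrapped data keys the attacker can unwrap,
     * i.e. those wrapped under a master key id the attacker has obtained.
     */
    static Set<String> exposedDirs(Map<String, String> dirToMasterKeyId,
                                   Set<String> stolenMasterKeyIds) {
        Set<String> exposed = new TreeSet<>();
        for (Map.Entry<String, String> e : dirToMasterKeyId.entrySet()) {
            if (stolenMasterKeyIds.contains(e.getValue())) {
                exposed.add(e.getKey());
            }
        }
        return exposed;
    }
}
```

With a single cluster-wide master key every directory maps to the same id, so 
one compromised client exposes everything. With one master key per tenant, the 
exposure is limited to that tenant's directories.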

Does the above look right? It would be nice to add to the design doc a clear 
description of the threat model here. Do we assume that the adversary will 
never have root on the cluster? Do we assume the adversary won't have access to 
the "mapred" user (or whoever runs the NM)?

How does the MR task in this context get the credentials to fetch keys from the 
KMS? If the KMS accepts the same authentication tokens as the NameNode, then is 
there any reason that this is more secure than having the NameNode supply the 
keys? Or is it just that decoupling the NameNode and the key server allows this 
approach to work for non-HDFS filesystems, at the expense of an additional 
daemon running a key distribution service?


> Hadoop cryptographic file system
> --------------------------------
>
>                 Key: HADOOP-10150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10150
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>              Labels: rhino
>             Fix For: 3.0.0
>
>         Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file 
> system-V2.docx, HADOOP cryptographic file system.pdf, cfs.patch, extended 
> information based on INode feature.patch
>
>
> There is an increasing need for securing data when Hadoop customers use 
> various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so 
> on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based 
> on HADOOP “FilterFileSystem” decorating DFS or other file systems, and 
> transparent to upper layer applications. It’s configurable, scalable and fast.
> High level requirements:
> 1.    Transparent to and no modification required for upper layer 
> applications.
> 2.    “Seek”, “PositionedReadable” are supported for input stream of CFS if 
> the wrapped file system supports them.
> 3.    Very high performance for encryption and decryption; they will not 
> become a bottleneck.
> 4.    Can decorate HDFS and all other file systems in Hadoop, and will not 
> modify existing structure of file system, such as namenode and datanode 
> structure if the wrapped file system is HDFS.
> 5.    Admin can configure encryption policies, such as which directory will 
> be encrypted.
> 6.    A robust key management framework.
> 7.    Support Pread and append operations if the wrapped file system supports 
> them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
