[ https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959123#comment-13959123 ]
Todd Lipcon commented on HADOOP-10150: -------------------------------------- A few questions here... First, let me confirm my understanding of the key structure and storage: - Client master key: this lives on the Key Management Server, and might be different from application to application. In many cases there may be just one per cluster, though in a multitenant cluster, perhaps we could have one per tenant. - Data key: this is set per encrypted directory. This key is stored in the directory xattr on the NN, but encrypted by the client master key (which the NN doesn't know). So, when a client wants to read a file, the following is the process: 1) Notices that the file is in an encrypted directory. Fetches the encrypted data key from the NN's xattr on the directory. 2) Somehow associates this encrypted data key with the master key that was used to encrypt it (perhaps it's tagged with some identifier). Fetches the appropriate master key from the key store. 2a) The keystore somehow authenticates and authorizes the client's access to this key 3) The client decrypts the data key using the master key, and is now able to set up a decrypting stream for the file itself. (I've ignored the IV here, but assume it's also stored in an xattr) In terms of attack vectors: - let's say that the NN disk is stolen. The thief now has access to a bunch of keys, but they're all encrypted by various master keys. So we're OK. - let's say that a client is malicious. It can get whichever master keys it has access to from the KMS. If we only have one master key per cluster, then the combination of one malicious client plus stealing the fsimage will give up all the keys - let's say that a client has escalated to root access on one of the slave nodes in the cluster, or otherwise has malicious access to a NodeManager process. By looking at a running MR task, it could steal whatever credentials the task is using to access the KMS, and/or dump the memory of the client process in order to give up the master key above. Does the above look right? It would be nice to add to the design doc a clear description of the threat model here. Do we assume that the adversary will never have root on the cluster? Do we assume the adversary won't have access to the "mapred" user (or whoever runs the NM?) How does the MR task in this context get the credentials to fetch keys from the KMS? If the KMS accepts the same authentication tokens as the NameNode, then is there any reason that this is more secure than having the NameNode supply the keys? Or is it just that decoupling the NameNode and the key server allows this approach to work for non-HDFS filesystems, at the expense of an additional daemon running a key distribution service? > Hadoop cryptographic file system > -------------------------------- > > Key: HADOOP-10150 > URL: https://issues.apache.org/jira/browse/HADOOP-10150 > Project: Hadoop Common > Issue Type: New Feature > Components: security > Affects Versions: 3.0.0 > Reporter: Yi Liu > Assignee: Yi Liu > Labels: rhino > Fix For: 3.0.0 > > Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file > system-V2.docx, HADOOP cryptographic file system.pdf, cfs.patch, extended > information based on INode feature.patch > > > There is an increasing need for securing data when Hadoop customers use > various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so > on. > HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based > on HADOOP “FilterFileSystem” decorating DFS or other file systems, and > transparent to upper layer applications. It’s configurable, scalable and fast. > High level requirements: > 1. Transparent to and no modification required for upper layer > applications. > 2. “Seek”, “PositionedReadable” are supported for input stream of CFS if > the wrapped file system supports them. > 3. Very high performance for encryption and decryption, they will not > become bottleneck. > 4. Can decorate HDFS and all other file systems in Hadoop, and will not > modify existing structure of file system, such as namenode and datanode > structure if the wrapped file system is HDFS. > 5. Admin can configure encryption policies, such as which directory will > be encrypted. > 6. A robust key management framework. > 7. Support Pread and append operations if the wrapped file system supports > them. -- This message was sent by Atlassian JIRA (v6.2#6252)