[ 
https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429961#comment-13429961
 ] 

Aaron T. Myers commented on HDFS-3637:
--------------------------------------

Thanks a lot for the very thorough review, Eli. Updated patch incoming.

bq. Testing?

In addition to the included automated tests, I've tested this on a 4-node 
cluster, reading and writing files, running MR jobs (tera gens/sorts), etc. 
I've seen no issues.

bq. What's the latest performance slowdown for the basic HDFS read/write path 
with RC4 enabled?

I haven't done a really thorough benchmark, but my testing indicates about a 
1.8-2.2x slowdown with RC4, and a much higher slowdown with 3DES. I think this 
description of the relative speed of cipher algorithms in Java is pretty 
accurate:

http://www.javamex.com/tutorials/cryptography/ciphers.shtml

bq. Seems like DFSOutputStream#newBlockReader in the conf.useLegacyBlockReader 
conditional should use a precondition or throw an RTE (eg AssertionError) if 
encryptionKey is null, otherwise the client will just consider this a dead DN 
and keep trying.

Good point. Changed to a RuntimeException.

bq. In the other case it should blow up if encryptionKey is null right, 
otherwise we can have it enabled server side but allow a client not to use it?

Not quite sure what you mean by this. In which case should we blow up if 
encryptionKey is null? Note that the client will never be allowed to not use 
encryption if the DN is configured to use it. The error message won't be nice, 
but no data will ever be transmitted in the clear.

bq. The dfs.encrypt.data.transfer description that this is a server-side config

Done.

bq. Add dfs.encrypt.data.transfer.algorithm with out a default and list two 
supported values?

Added the following:
{code}
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value></value>
  <description>
    This value may be set to either "3des" or "rc4". If nothing is set, then
    the configured JCE default on the system is used (usually 3DES.) It is
    widely believed that 3DES is more cryptographically secure, but RC4 is
    substantially faster.
  </description>
</property>
{code}

bq. Shouldn't shouldEncryptData throw an exception if server defaults is null 
instead of assume it shouldn't encrypt? Seems more secure, eg if we ever 
introduce a bug that results in the NN returning a null server default (should 
never happen currently).

No, for compatibility purposes. With the current implementation, an upgraded 
client talking to an older server (without encryption support) will correctly 
conclude that it does not need to encrypt data. Again, if we ever were to 
introduce a bug like you describe, nothing would be sent in the clear, and the 
client would blow up eventually.

bq. Consider pulling out the block manager not setting the block pool ID bug to 
a separate change?

Sorry, it's not a bug. It's because I changed BlockTokenSecretManager to take 
the BlockPoolId at creation time, instead of every time a BlockToken is 
created. This is a reasonable change to make since a single 
BlockTokenSecretManager cannot actually issue valid BlockTokens for anything 
but a single BlockPoolId. Sorry, I should have mentioned this change in my 
description of the patch.

bq. Use DFS_BLOCK_ACCESS_TOKEN_LIFETIME_DEFAULT instead of 15s?

This wouldn't be right, since we've lowered the key update interval and token 
lifetime earlier in the test. It also needs to be a few multiples of the block 
token lifetime, since several block tokens are valid at any given time (the 
current and the last two, by default.)

bq. Also perhaps update the relevant NN java doc to indicate that "getting" the 
key generates a new key with this timeout.

I called it "getEncryptionKey" to be in keeping with "getDelegationToken". More 
appropriate for these would probably be "generate" instead of "get". What are 
your thoughts on this?

bq. Jira for supporting encryption or remove this TODO?

Well, since we're sort of phasing out support for RemoteBlockReader, I doubt 
such a JIRA will actually ever be implemented. Perhaps we should just remove 
the TODO?

bq. Are the sendReadResult write timeout and DFSOutputStream#flush a separate 
issue or something introduced here?

It's no functional change - just a refactor so that 
RemoteBlockReader2#writeReadResult takes a stream as an argument, instead of 
always creating a new stream from the given socket.
                
> Add support for encrypting the DataTransferProtocol
> ---------------------------------------------------
>
>                 Key: HDFS-3637
>                 URL: https://issues.apache.org/jira/browse/HDFS-3637
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, hdfs client, security
>    Affects Versions: 2.0.0-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch
>
>
> Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally 
> encrypted. However, actual data read or written between DNs and clients (or 
> DNs to DNs) is sent in the clear. When processing sensitive data on a shared 
> cluster, confidentiality of the data read/written from/to HDFS may be desired.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to