[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yi Liu updated HADOOP-10603:
----------------------------
    Attachment: HADOOP-10603.3.patch

Hi [~clamb] and [~tucu00], thanks for your good comments. I merged part of your attached code into the new patch. The {{Encryptor}} and {{Decryptor}} interfaces in the latest patch are defined as follows:
{code}
public interface Encryptor {
  … …
  public void encrypt(ByteBuffer inBuffer, ByteBuffer outBuffer) throws IOException;
}
{code}
For each encryption it is better to invoke as few Cipher# methods as possible, since those calls are expensive. We can treat the whole input or output stream as the encryption/decryption unit, instead of treating each individual encrypt/decrypt call as a unit.
{quote}
The AESCTREncryptor/AESCTRDecryptor do update() instead of doFinal() on the cipher. This could lead to incomplete decryption of the buffer in some cipher implementations since the contract is that a doFinal() must be done.
{quote}
For encryption modes that need padding we should use {{doFinal}}; for CTR mode, {{update}} is fine. The difference between the two is that {{update}} maintains the cipher's internal state, which we can exploit so that each "encrypt/decrypt" needs only one Cipher# call. The {{Cipher#update}} javadoc says it encrypts/decrypts the data, and the JCE default provider, Diceros, and even the UPDATE interface of OpenSSL all work this way. I also agree that some cipher implementations may not behave well and {{update}} may not encrypt/decrypt data even in CTR mode; we need to handle that case, and there we fall back to {{doFinal}}. Please review the patch to see how all the possible situations are handled. In most cases we invoke only one Cipher# method per encrypt/decrypt, so the {{Cipher#init}}, {{Cipher#update}} and {{Cipher#doFinal}} calls in each #process method of the attached code are not necessary.
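To illustrate the point that {{update}} maintains the cipher's internal counter state in CTR mode, here is a minimal standalone sketch (not part of the patch; the class name is made up and the key/IV are dummy all-zero values for demonstration only). A sequence of {{update}} calls after one {{init}} produces the same bytes as a single {{doFinal}}:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class CtrUpdateDemo {
    static final String TRANSFORM = "AES/CTR/NoPadding";

    // One-shot: a fresh cipher and a single doFinal() call.
    static byte[] oneShot(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance(TRANSFORM);
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(iv));
        return c.doFinal(plain);
    }

    // Chunked: one init(), then update() per 16-byte buffer; the cipher's
    // internal counter carries over between calls, so no re-init is needed.
    static byte[] chunked(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance(TRANSFORM);
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(iv));
        byte[] out = new byte[plain.length];
        for (int off = 0; off < plain.length; off += 16) {
            c.update(plain, off, 16, out, off);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];   // demo key (all zeros)
        byte[] iv = new byte[16];    // demo IV/counter (all zeros)
        byte[] plain = new byte[48];
        for (int i = 0; i < plain.length; i++) plain[i] = (byte) i;
        System.out.println(Arrays.equals(oneShot(key, iv, plain),
            chunked(key, iv, plain))); // prints "true"
    }
}
```

This is why one initialized Cipher can serve a whole stream in CTR mode; a provider that buffers data in {{update}} until {{doFinal}} would break this, which is the fallback case mentioned above.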
The #process method in question:
{code}
void process(Cipher cipher, int mode, SecretKeySpec key, byte[] originalIV, …
  cipher.init(mode, key, new IvParameterSpec(workingIV));
  int mod = (int) (absolutePos % BLOCK_SIZE);
  cipher.update(IV_FORWARD_SINK, 0, mod, IV_FORWARD_SINK, 0);
  cipher.doFinal(in, out);
  …
}
{code}
In the stream implementations of the latest patch, decryption looks like this:
{code}
@Override
public int read(byte[] b, int off, int len) throws IOException {
  checkStream();
  if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
    throw new IndexOutOfBoundsException();
  }
  int remaining = outBuffer.remaining();
  if (remaining > 0) {
    int n = Math.min(len, remaining);
    outBuffer.get(b, off, n);
    return n;
  } else {
    int n = 0;
    if (in instanceof ByteBufferReadable) {
      n = ((ByteBufferReadable) in).read(inBuffer);
      if (n <= 0) {
        return n;
      }
    } else {
      int toRead = inBuffer.remaining();
      byte[] tmp = getTmpBuf();
      n = in.read(tmp, 0, toRead);
      if (n <= 0) {
        return n;
      }
      inBuffer.put(tmp, 0, n);
    }
    streamOffset += n; // Read n bytes
    return process(b, off, n);
  }
}
{code}
We need to use {{ByteBufferReadable}} when the underlying stream supports it, since it avoids a copy; HDFS streams do support it.
{quote}
The current implementation assumes the counter portion always starts with zero, right?
{quote}
I will address this in the next version of the patch. The other issues should be resolved in the latest patch; please review it again, thanks. We also have more test cases covering them in HDFS-6405, and I will add test cases in common as well.

> Crypto input and output streams implementing Hadoop stream interfaces
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-10603
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10603
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Alejandro Abdelnur
>            Assignee: Yi Liu
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams.
> They would be used by CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills.
> Note we cannot use the JDK Cipher Input/Output streams directly because we need to support the additional interfaces that the Hadoop FileSystem streams implement (Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor, CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess, Syncable).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
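To illustrate the counter handling discussed in the comment above (advance the counter portion of the IV by {{absolutePos / BLOCK_SIZE}}, then discard {{absolutePos % BLOCK_SIZE}} keystream bytes to align within the block, as the {{cipher.update(IV_FORWARD_SINK, …)}} call in #process does), here is a minimal standalone sketch. {{advanceIV}} and {{decryptAt}} are hypothetical names invented for this illustration, not identifiers from the patch, and the key/IV are dummy all-zero values:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class CtrSeekDemo {
    static final int BLOCK_SIZE = 16;

    // Add `counter` into the 128-bit big-endian IV (ripple-carry add from
    // the least significant byte), mirroring what seek support has to do.
    static byte[] advanceIV(byte[] initIV, long counter) {
        byte[] iv = initIV.clone();
        int i = iv.length;
        while (counter != 0 && i > 0) {
            counter += iv[--i] & 0xFF;
            iv[i] = (byte) counter;
            counter >>>= 8;
        }
        return iv;
    }

    // Decrypt ciphertext that starts at an arbitrary absolute position,
    // without having read any of the bytes before it.
    static byte[] decryptAt(byte[] key, byte[] initIV, long pos, byte[] cipherText)
            throws Exception {
        byte[] iv = advanceIV(initIV, pos / BLOCK_SIZE);
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(iv));
        int mod = (int) (pos % BLOCK_SIZE);
        c.update(new byte[mod]);  // burn keystream to align within the block
        return c.update(cipherText);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16], iv = new byte[16];  // demo values
        byte[] plain = new byte[100];
        for (int i = 0; i < plain.length; i++) plain[i] = (byte) i;

        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(iv));
        byte[] cipherText = enc.doFinal(plain);

        // Random-access decrypt of bytes [37, 100) without touching [0, 37).
        byte[] tail = decryptAt(key, iv, 37, Arrays.copyOfRange(cipherText, 37, 100));
        System.out.println(Arrays.equals(tail, Arrays.copyOfRange(plain, 37, 100)));
    }
}
```

Note the sketch adds the block counter into the whole IV rather than assuming the counter portion starts at zero, which is the generalization the quoted question asks for.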