Because the IBM proposal lacks any optimization, I am currently 
working on a GCM version as well. My current work includes: 

- EVP support for the CTR128 modes *1) (AES and Camellia), as these 
are required in the GCTR [SP800-38D] function of the GCM (instead of 
a block-wise use of the ECB mode),

- replacement of the byte-wise shift/xor/swap loops with unrolled, 
platform-specific macros for BE/LE 8/16/32/64-bit architectures, 

- versions with SSE2/SSE3 support (if OPENSSL_ia32cap indicates a 
suitable processor), reducing the number of asm instructions within a 
loop to 16 with a 4k table, and to 26 with a 256-byte table (without 
SSE2/SSE3: 34 and 62),

- replacement of allocated tables by local (stack) tables (as the table 
generation is now faster than the overhead for an alloc), removal of the 
64k table mode (as it is inefficient due to cache misses), removal of 
the 8k table mode (takes more instructions in the loop than an optimized 
4k table), 

- better execution of multiple blocks within GCTR and GHASH [SP800-38D] 
to optimize the use of local tables.
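
As an illustration of the second item above, here is a minimal sketch of 
replacing a byte-wise shift/xor loop with full-word operations. It shows 
the "multiply by x" step of GHASH (shift the block right by one bit and 
xor in R = 0xE1 || 0^120 from [SP800-38D]) once per byte and once on two 
64-bit words. All names (gf128_words, gf128_mulx_bytes, ...) are 
hypothetical and are not taken from the actual patch; the load/store 
helpers use portable shifts rather than the BE/LE macros mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a GCM block as two big-endian 64-bit words. */
typedef struct { uint64_t hi, lo; } gf128_words;

/* Byte-wise reference: shift right by one bit, then reduce with R. */
void gf128_mulx_bytes(uint8_t v[16])
{
    uint8_t lsb = v[15] & 1;      /* bit that falls off the right end */
    uint8_t carry = 0;
    int i;

    for (i = 0; i < 16; i++) {
        uint8_t next = v[i] & 1;
        v[i] = (uint8_t)((v[i] >> 1) | (carry << 7));
        carry = next;
    }
    if (lsb)
        v[0] ^= 0xE1;             /* R = 0xE1 || 0^120 */
}

/* Endian-neutral load of 16 bytes into two 64-bit words. */
gf128_words gf128_load(const uint8_t b[16])
{
    gf128_words w = { 0, 0 };
    int i;

    for (i = 0; i < 8; i++) {
        w.hi = (w.hi << 8) | b[i];
        w.lo = (w.lo << 8) | b[i + 8];
    }
    return w;
}

/* Endian-neutral store of two 64-bit words into 16 bytes. */
void gf128_store(uint8_t b[16], gf128_words w)
{
    int i;

    for (i = 7; i >= 0; i--) {
        b[i] = (uint8_t)w.hi;
        w.hi >>= 8;
        b[i + 8] = (uint8_t)w.lo;
        w.lo >>= 8;
    }
}

/* Word-wise version: two shifts, one or, one masked xor, no loop. */
gf128_words gf128_mulx_words(gf128_words v)
{
    uint64_t mask = (uint64_t)0 - (v.lo & 1);   /* all ones if lsb set */
    gf128_words r;

    r.lo = (v.lo >> 1) | (v.hi << 63);
    r.hi = (v.hi >> 1) ^ (mask & UINT64_C(0xE100000000000000));
    return r;
}

/* Usage example: both versions must agree on the same input. */
void gf128_mulx_example(void)
{
    uint8_t a[16] = { 0 }, b[16] = { 0 };
    int i;

    a[15] = b[15] = 1;            /* only the last bit of the block set */
    gf128_mulx_bytes(a);
    gf128_store(b, gf128_mulx_words(gf128_load(b)));
    for (i = 0; i < 16; i++)
        assert(a[i] == b[i]);
    assert(a[0] == 0xE1);         /* the block was reduced to R */
}
```

The word-wise variant also avoids the data-dependent branch by using a 
mask, which is a design choice of this sketch, not a requirement.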


To be done:

- an SSSE3/PSHUFB version (I currently do not have a suitable processor),

- a PCLMULQDQ version (same as above),

- rewrite/optimization of the counter increment function,

- introduction of an OPENSSL_ia32cap2 macro (see [openssl.org #2176]) 
for a safe use of the SSE3, SSSE3 and PCLMULQDQ extensions, 

- redesign of the EVP interface.
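
To make the capability item above more concrete: SSE3, SSSE3 and the 
carry-less multiply instruction are reported in the ECX register of 
CPUID leaf 1 (bits 0, 9 and 1). The following is only a sketch of how a 
second capability word could be decoded, under the assumption that it 
simply mirrors that ECX value. The type and function names are 
hypothetical; they are not part of OpenSSL or of the proposal in 
[openssl.org #2176].

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits in ECX of CPUID leaf 1. */
#define CPUID1_ECX_SSE3       (UINT32_C(1) << 0)
#define CPUID1_ECX_PCLMULQDQ  (UINT32_C(1) << 1)
#define CPUID1_ECX_SSSE3      (UINT32_C(1) << 9)

/* Hypothetical result type, one flag per extension. */
typedef struct {
    int sse3;
    int pclmulqdq;
    int ssse3;
} x86_ext_caps;

/* Decode a capability word assumed to hold CPUID.1:ECX. */
x86_ext_caps x86_decode_cpuid1_ecx(uint32_t ecx)
{
    x86_ext_caps c;

    c.sse3      = (ecx & CPUID1_ECX_SSE3) != 0;
    c.pclmulqdq = (ecx & CPUID1_ECX_PCLMULQDQ) != 0;
    c.ssse3     = (ecx & CPUID1_ECX_SSSE3) != 0;
    return c;
}

/* Usage example with a made-up ECX value (bits 0, 1 and 9 set). */
void x86_caps_example(void)
{
    x86_ext_caps c = x86_decode_cpuid1_ecx(UINT32_C(0x00000203));

    assert(c.sse3 && c.pclmulqdq && c.ssse3);
}
```

A code path would then be selected only if every extension it needs is 
flagged, falling back to the table-driven versions otherwise.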


My suggestion for the interface is to introduce functions like *2)
EVP_AeadInit(EVP_AEAD_CTX *ectx, EVP_CIPHER_CTX *ctx,...); 

where EVP_AEAD_CTX could be EVP_gcm(), EVP_ccm(), EVP_eax(), 
EVP_cwc() and EVP_ocb() for the moment *3),

and EVP_CIPHER_CTX could be EVP_aes_128_ctr(), EVP_aes_192_ctr(), 
EVP_aes_256_ctr(), EVP_camellia_128_ctr(), EVP_camellia_192_ctr(),
EVP_camellia_256_ctr() for EVP_gcm(), and so on.


It would be nice if we could bring our parts together to build and 
test a version that combines all of their advantages.

Peter-Michael


*1) Though the CTR128 modes have a full 128-bit counter and [SP800-38D] 
specifies a 32-bit counter in the least significant bits, there is a 
defined limit of 64 gigabytes per invocation. With a 96-bit IV the counter 
starts at 1, so this limit effectively prevents a counter overflow into 
the 33rd bit, and using CTR128 instead of a CTR32 is possible in that case. 
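
A small sketch of the two increment functions may make this footnote 
easier to check. gcm_inc32 only touches the last four bytes, as inc32 in 
[SP800-38D] does; ctr128_inc carries through all 16 bytes. Both function 
names are hypothetical. Starting from a counter block whose low 32 bits 
are 1 (the 96-bit IV case), the two stay identical; they only diverge 
when the low 32 bits wrap, which can happen if the initial counter block 
has arbitrary low bits, as with IVs of other lengths.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit increment: bytes 12..15 are a big-endian counter, mod 2^32. */
void gcm_inc32(uint8_t ctr[16])
{
    int i;

    for (i = 15; i >= 12; i--)
        if (++ctr[i] != 0)
            break;
}

/* 128-bit increment: the carry may run through all 16 bytes. */
void ctr128_inc(uint8_t ctr[16])
{
    int i;

    for (i = 15; i >= 0; i--)
        if (++ctr[i] != 0)
            break;
}

/* Usage example comparing both increments. */
void ctr_compare_example(void)
{
    uint8_t a[16], b[16];
    int n;

    /* Counter block with low 32 bits = 1 (IV bytes left zero here). */
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    a[15] = b[15] = 1;
    for (n = 0; n < 100000; n++) {
        gcm_inc32(a);
        ctr128_inc(b);
    }
    assert(memcmp(a, b, 16) == 0);   /* no wrap, so no difference */

    /* Low 32 bits = 0xFFFFFFFF: the next increment wraps. */
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    memset(a + 12, 0xFF, 4);
    memset(b + 12, 0xFF, 4);
    gcm_inc32(a);
    ctr128_inc(b);
    assert(a[11] == 0 && b[11] == 1); /* carry into the 33rd bit */
}
```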

*2) AEAD = Authenticated Encryption with Associated Data
[http://en.wikipedia.org/wiki/AEAD_block_cipher_modes_of_operation]

*3) Depends on the status of possible patents.

--
Peter-Michael Hager - HAGER-ELECTRONICS GmbH - Germany


-----Original Message-----
From: owner-openssl-...@openssl.org [mailto:owner-openssl-...@openssl.org] 
On Behalf Of Andy Polyakov via RT
Sent: Tuesday, March 02, 2010 6:01 PM
To: pwal...@au1.ibm.com
Cc: openssl-dev@openssl.org
Subject: Re: [openssl.org #2162] Updated CMAC, CCM, GCM code

> This is an update to the sources (only) for the CMAC, CCM and GCM code we
> donated previously.

Just to denote that alternative GCM implementation is available now,
see http://cvs.openssl.org/rlog?f=openssl/crypto/modes/gcm128.c. It's
initial version and interface is still subject to change. Things that
won't change is that the module in question is cipher agnostic (modulo
block-size), doesn't rely on EVP (i.e. will be free from circular
dependency when deployed from EVP) and is more aggressively optimized.
Latter refers to the fact that unlike code proposed by IBM it uses full
machine word logical operations instead of byte-oriented ones. As for
multiplication itself it currently opts for 4-bit multiplication. 1-bit
and 8-bit subroutines are available and tested. A.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
