Re: [openssl.org #2162] Updated CMAC, CCM, GCM code

Andy Polyakov Mon, 08 Mar 2010 15:25:06 -0800

> - versions with SSE2/SSE3 support (if OPENSSL_ia32cap signals a valid processor), reducing the number of asm instructions within a loop to 16 with a 4k table, and to 26 with a 256-byte table (w/o: 34 and 62),

Are these numbers really per loop spin, i.e. per every 8 and 4 bits respectively, or per byte for either? In another message you wrote that you observe "over 50% improvement" with SSE2 code. On which platform? Which compiler? Etc.

I've sketched *32-bit* integer and MMX (yes, pure MMX) gcm_gmult_4bit, i.e. one operating with a 256-byte table. The MMX code was observed to process one byte in ~35 cycles on P4(*) and in ~22 cycles on Core2 and Opteron, which is ~2-3x faster than the code generated by gcc. Compared to the integer assembler, the MMX code was observed to be ~35% faster on Core2 and Opteron and 2.5x faster on P4. The latter is because I've chosen shrd for the integer assembler, and it just kills P4.

(*) CPU oscillator's *cycles* per byte, not instructions. In case you wonder, I have 25 instructions per *byte* in the MMX loop and 27 in the integer loop.
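
For readers unfamiliar with the 4-bit table method, here is a minimal portable C sketch of what a gcm_gmult_4bit with a 256-byte table (16 entries of 16 bytes) computes. It is not the MMX or integer assembler discussed above, and it makes no attempt at being fast or constant-time. The u128 type and the helpers load_be, store_be, mul_x and gf128_mul_ref are illustrative names assumed for this sketch; gf128_mul_ref is a plain bit-by-bit multiply included only as a cross-check.

```c
#include <stdint.h>
#include <string.h>

/* 128-bit field element; hi holds block bytes 0..7, lo bytes 8..15 (big-endian). */
typedef struct { uint64_t hi, lo; } u128;

static u128 load_be(const uint8_t b[16]) {
    u128 v = {0, 0};
    for (int i = 0; i < 8; i++) {
        v.hi = (v.hi << 8) | b[i];
        v.lo = (v.lo << 8) | b[i + 8];
    }
    return v;
}

static void store_be(uint8_t b[16], u128 v) {
    for (int i = 7; i >= 0; i--) {
        b[i] = (uint8_t)v.hi;     v.hi >>= 8;
        b[i + 8] = (uint8_t)v.lo; v.lo >>= 8;
    }
}

/* Multiply by x in GCM's bit-reflected GF(2^128): shift right by one bit and
 * fold the dropped bit back in with the constant 0xE1 || 0^120. */
static u128 mul_x(u128 v) {
    uint64_t carry = v.lo & 1;
    v.lo = (v.lo >> 1) | (v.hi << 63);
    v.hi = (v.hi >> 1) ^ (carry ? 0xe100000000000000ULL : 0);
    return v;
}

/* Reduction terms for the 4 bits dropped by a 4-bit right shift,
 * to be XORed into the top 16 bits of hi. */
static const uint16_t rem_4bit[16] = {
    0x0000, 0x1C20, 0x3840, 0x2460, 0x7080, 0x6CA0, 0x48C0, 0x54E0,
    0xE100, 0xFD20, 0xD940, 0xC560, 0x9180, 0x8DA0, 0xA9C0, 0xB5E0
};

/* Build the 16-entry table: Htable[n] = (nibble n as a polynomial) * H. */
void gcm_init_4bit(u128 Htable[16], const uint8_t H[16]) {
    u128 V = load_be(H);
    Htable[0].hi = Htable[0].lo = 0;
    Htable[8] = V;                       /* nibble 1000b is x^0, i.e. H itself */
    for (int i = 4; i > 0; i >>= 1) {    /* 0100b = H*x, 0010b = H*x^2, 0001b = H*x^3 */
        V = mul_x(V);
        Htable[i] = V;
    }
    for (int i = 2; i < 16; i <<= 1)     /* remaining entries by linearity */
        for (int j = 1; j < i; j++) {
            Htable[i + j].hi = Htable[i].hi ^ Htable[j].hi;
            Htable[i + j].lo = Htable[i].lo ^ Htable[j].lo;
        }
}

/* Xi = Xi * H, consuming Xi one nibble at a time from the last nibble to the first. */
void gcm_gmult_4bit(uint8_t Xi[16], const u128 Htable[16]) {
    u128 Z = {0, 0};
    for (int i = 15; i >= 0; i--) {
        for (int half = 0; half < 2; half++) {
            /* the low nibble of each byte carries the higher powers of x */
            unsigned nib = (half == 0) ? (Xi[i] & 0xfu) : (unsigned)(Xi[i] >> 4);
            unsigned rem = (unsigned)(Z.lo & 0xf);
            /* Z = Z * x^4, with table-driven reduction of the dropped bits */
            Z.lo = (Z.lo >> 4) | (Z.hi << 60);
            Z.hi = (Z.hi >> 4) ^ ((uint64_t)rem_4bit[rem] << 48);
            Z.hi ^= Htable[nib].hi;
            Z.lo ^= Htable[nib].lo;
        }
    }
    store_be(Xi, Z);
}

/* Bit-by-bit reference multiply, X = X * H, used only to cross-check the table code. */
void gf128_mul_ref(uint8_t X[16], const uint8_t H[16]) {
    u128 V = load_be(H), Z = {0, 0};
    for (int i = 0; i < 128; i++) {
        if ((X[i / 8] >> (7 - i % 8)) & 1) { Z.hi ^= V.hi; Z.lo ^= V.lo; }
        V = mul_x(V);
    }
    store_be(X, Z);
}
```

Each byte of input costs two table lookups and two 4-bit shifts of the 128-bit accumulator. On a 32-bit target each of those shifts has to be carried across four registers, which is the kind of work where the choice between shrd and MMX shifts would show up.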


A.
