ni...@lysator.liu.se (Niels Möller) writes:

> I've also added a cbc-aes128-encrypt.asm.
> That gives more significant speedup, almost 60%. I think main reason for
> the speedup is that we avoid reloading subkeys between blocks.

I've continued down this path; see branch aes-cbc. The aes128 variant is at

https://git.lysator.liu.se/nettle/nettle/-/blob/aes-cbc/x86_64/aesni/cbc-aes128-encrypt.asm

Benchmark results are positive but a bit puzzling. On my laptop (AMD
Ryzen 5) I get

            aes128  ECB encrypt 5450.18

This is the latest version, doing two blocks per iteration.

            aes128  CBC encrypt  547.34

The general CBC mode written in C, with one call to aes128_encrypt per
block. 10(!) times slower than ECB.

        cbc_aes128      encrypt  865.11

The new assembly function. Almost 60% speedup over the old code, which
is nice, and large enough that I think it justifies adding the new
function. But it's still 6 times slower than ECB, and I'm not sure why.
Let's look a bit closer at cycle numbers.

I'm not sure I get accurate cycle numbers (it's a bit tricky with
variable clock frequency, turbo modes and whatnot), but it looks like ECB
mode takes 6 cycles per block, which would be consistent with issuing two
aesenc instructions per cycle (AES-128 has 10 rounds per block). CBC
mode, on the other hand, takes 37 cycles per block, almost 4 cycles per
aesenc.

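This could be explained if (i) the latency of aesenc is 3-4 cycles, and
(ii) the processor's out-of-order machinery keeps as many as 7-8 blocks
in flight when executing the ECB loop, i.e., it issues instructions for
3-4 iterations of the loop before the results of the first iteration
are ready. CBC encryption can't overlap blocks like that: each block is
xored with the previous ciphertext block before encryption, so every
aesenc has to wait for the result of the one before it.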

The interface for the new function is 

  struct cbc_aes128_ctx CBC_CTX(struct aes128_ctx, AES_BLOCK_SIZE);
  void
  cbc_aes128_encrypt(struct cbc_aes128_ctx *ctx, size_t length, 
                     uint8_t *dst, const uint8_t *src);

I'm not that fond of struct cbc_aes128_ctx, though, since it bundles
the (constant) subkeys together with the iv. So I'm considering changing
that to

  void
  cbc_aes128_encrypt(const struct aes128_ctx *ctx, uint8_t *iv,
                     size_t length, uint8_t *dst, const uint8_t *src);

I.e., similar to cbc_encrypt, but without the arguments
nettle_cipher_func *f, size_t block_size.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
