Re: [PATCH v2 0/2] Implement AES on ARM using x86 instructions and vv

Richard Henderson Wed, 31 May 2023 09:34:13 -0700

On 5/31/23 04:22, Ard Biesheuvel wrote:

Use the host native instructions to implement the AES instructions
exposed by the emulated target. The mapping is not 1:1, so it requires a
bit of fiddling to get the right result.


This is still RFC material - the current approach feels too ad-hoc, but
given the non-1:1 correspondence, doing a proper abstraction is rather
difficult.

Changes since v1/RFC:
- add second patch to implement x86 AES instructions on ARM hosts - this
   helps illustrate what an abstraction should cover.
- use cpuinfo framework to detect host support for AES instructions.
- implement ARM aesimc using x86 aesimc directly

Patch #1 produces a 1.5-2x speedup in tests using the Linux kernel's
tcrypt benchmark (mode=500)

Patch #2 produces a 2-3x speedup. The discrepancy is most likely due to
the fact that ARM uses two instructions to implement a single AES round,
whereas x86 only uses one.

Thanks. I spent some time yesterday looking at this with an encrypted disk test case, and could only measure 0.6% and 0.5% total overhead for decrypt and encrypt respectively.

As for the design of an abstraction: I imagine we could introduce a
host/aes.h API that implements some building blocks that the TCG helper
implementation could use.


Indeed.  I was considering interfaces like

/* Perform SubBytes + ShiftRows on state. */
Int128 aesenc_SB_SR(Int128 state);

/* Perform MixColumns on state. */
Int128 aesenc_MC(Int128 state);

/* Perform SubBytes + ShiftRows + MixColumns on state. */
Int128 aesenc_SB_SR_MC(Int128 state);

/* Perform SubBytes + ShiftRows + MixColumns + AddRoundKey. */
Int128 aesenc_SB_SR_MC_AK(Int128 state, Int128 roundkey);

and so forth for aesdec as well. All but aesenc_MC should be implementable on x86 and Power7, and all of them on aarch64.
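For illustration only, here is a minimal portable sketch of what a generic C fallback behind those four entry points might look like. It is not taken from the patches: AESState is a hypothetical stand-in for Int128 (16 bytes in FIPS-197 order, column-major), the S-box is computed from its definition rather than read from the AES_Te* tables, and aesenc_selftest is an invented name. The self-test uses round 1 of the FIPS-197 Appendix B example as a known answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Int128: 16 state bytes in FIPS-197 order. */
typedef struct { uint8_t b[16]; } AESState;

/* Multiply by x in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Shift-and-add multiplication in GF(2^8). */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (; b; b >>= 1, a = xtime(a)) {
        if (b & 1) {
            r ^= a;
        }
    }
    return r;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-box from its definition: field inverse (x^254), then the affine map. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 1;
    for (int i = 0; i < 254; i++) {
        inv = gmul(inv, x);          /* 0 stays 0, as the S-box requires */
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Perform SubBytes + ShiftRows on state (row r rotates left by r columns). */
AESState aesenc_SB_SR(AESState s)
{
    AESState o;
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            o.b[4 * c + r] = sbox(s.b[4 * ((c + r) % 4) + r]);
        }
    }
    return o;
}

/* Perform MixColumns on state. */
AESState aesenc_MC(AESState s)
{
    AESState o;
    for (int c = 0; c < 4; c++) {
        const uint8_t *a = &s.b[4 * c];
        uint8_t t = (uint8_t)(a[0] ^ a[1] ^ a[2] ^ a[3]);
        for (int r = 0; r < 4; r++) {
            /* Equals 2*a[r] ^ 3*a[r+1] ^ a[r+2] ^ a[r+3], indices mod 4. */
            o.b[4 * c + r] = (uint8_t)(a[r] ^ t ^ xtime(a[r] ^ a[(r + 1) % 4]));
        }
    }
    return o;
}

/* Perform SubBytes + ShiftRows + MixColumns on state. */
AESState aesenc_SB_SR_MC(AESState s)
{
    return aesenc_MC(aesenc_SB_SR(s));
}

/* Perform SubBytes + ShiftRows + MixColumns + AddRoundKey. */
AESState aesenc_SB_SR_MC_AK(AESState s, AESState rk)
{
    AESState o = aesenc_SB_SR_MC(s);
    for (int i = 0; i < 16; i++) {
        o.b[i] ^= rk.b[i];
    }
    return o;
}

/* Known-answer check: round 1 of the FIPS-197 Appendix B example. */
void aesenc_selftest(void)
{
    const AESState in = {{ 0x19, 0x3d, 0xe3, 0xbe, 0xa0, 0xf4, 0xe2, 0x2b,
                           0x9a, 0xc6, 0x8d, 0x2a, 0xe9, 0xf8, 0x48, 0x08 }};
    const AESState rk = {{ 0xa0, 0xfa, 0xfe, 0x17, 0x88, 0x54, 0x2c, 0xb1,
                           0x23, 0xa3, 0x39, 0x39, 0x2a, 0x6c, 0x76, 0x05 }};
    const AESState want = {{ 0xa4, 0x9c, 0x7f, 0xf2, 0x68, 0x9f, 0x35, 0x2b,
                             0x6b, 0x5b, 0xea, 0x43, 0x02, 0x6a, 0x50, 0x49 }};
    AESState out = aesenc_SB_SR_MC_AK(in, rk);
    assert(memcmp(out.b, want.b, sizeof(out.b)) == 0);
}
```

Because the state is a plain byte array here, the sketch sidesteps host byte order entirely. A host-accelerated version would have to agree with it byte for byte, which is what makes it usable as a reference in tests.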

I suppose it really depends on whether there is a third host
architecture that could make use of this, and how its AES instructions
map onto the primitive AES ops above.


There are Power6 (v{,n}cipher{,last}) and RISC-V Zkn (aes64{es,esm,ds,dsm,im}).

Where I got hung up yesterday was understanding the different endian requirements of
x86 vs Power.

ppc64:

    asm("lxvd2x 32,0,%1;"
        "lxvd2x 33,0,%2;"
        "vcipher 0,0,1;"
        "stxvd2x 32,0,%0"
        : : "r"(o), "r"(i), "r"(k) : "memory", "v0", "v1", "v2");

ppc64le:

    unsigned char le[16] = {8,9,10,11,12,13,14,15,0,1,2,3,4,5,6,7};
    asm("lxvd2x 32,0,%1;"
        "lxvd2x 33,0,%2;"
        "lxvd2x 34,0,%3;"
        "vperm 0,0,0,2;"
        "vperm 1,1,1,2;"
        "vcipher 0,0,1;"
        "vperm 0,0,0,2;"
        "stxvd2x 32,0,%0"
        : : "r"(o), "r"(i), "r"(k), "r"(le) : "memory", "v0", "v1", "v2");

There are also differences in their AES_Te* based C routines, which made me wonder if we are handling host endianness differences correctly in emulation right now. I think I should most definitely add some generic-ish tests for this...
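As a rough sketch of the kind of building block such tests might lean on (the helper names below are hypothetical, not existing QEMU functions): loading and storing 64-bit halves with explicit shifts pins the byte order, whereas a plain memcpy into a uint64_t yields a value that depends on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 8 bytes as a little-endian value, whatever the host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Write a 64-bit value as 8 little-endian bytes, whatever the host is. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Report the host byte order by inspecting the first byte of a probe. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Host-dependent load, for contrast with load_le64(). */
static uint64_t load_host64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

On a little-endian host load_host64() and load_le64() agree; on a big-endian host they differ by a byte swap, so a test that compares a helper's output against fixed byte arrays rather than host integers would catch a mismatch on either.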

r~
