On Tue, 30 May 2023 at 18:43, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> On 5/30/23 06:52, Ard Biesheuvel wrote:
> > +#ifdef __x86_64__
> > +    if (have_aes()) {
> > +        __m128i *d = (__m128i *)rd;
> > +
> > +        *d = decrypt ? _mm_aesdeclast_si128(rk.vec ^ st.vec, (__m128i){})
> > +                     : _mm_aesenclast_si128(rk.vec ^ st.vec, (__m128i){});
>
> Do I correctly understand that the ARM xor is pre-shift
>
> > +        return;
> > +    }
> > +#endif
> > +
> >      /* xor state vector with round key */
> >      rk.l[0] ^= st.l[0];
> >      rk.l[1] ^= st.l[1];
>
> (like so)
>
> whereas the x86 xor is post-shift
>
> > void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
> > {
> >     int i;
> >     Reg st = *v;
> >     Reg rk = *s;
> >
> >     for (i = 0; i < 8 << SHIFT; i++) {
> >         d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & ~15))]);
> >     }
>
> (like so, from target/i386/ops_sse.h)?
>

Indeed. Using the primitive operations defined in the AES paper, we
basically have the following for n rounds of AES (for n in {10, 12,
14}):

for (n-1 rounds) {
  AddRoundKey
  ShiftRows
  SubBytes
  MixColumns
}
AddRoundKey
ShiftRows
SubBytes
AddRoundKey

AddRoundKey is just XOR, but it is incorporated into the instructions
that combine a couple of these steps.

So on x86, we have

aesenc:
  ShiftRows
  SubBytes
  MixColumns
  AddRoundKey

aesenclast:
  ShiftRows
  SubBytes
  AddRoundKey

and on ARM we have

aese:
  AddRoundKey
  ShiftRows
  SubBytes

aesmc:
  MixColumns

> What might help: could we do the reverse -- emulate the x86 aesdeclast
> instruction with the aarch64 aesd instruction?
>

Help in what sense? To emulate the x86 instructions on an ARM host?

But yes, aesenclast can be implemented using aese in a similar way,
i.e., by passing a {0} vector as the round key into the instruction,
and performing the XOR explicitly using the real round key afterwards.
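
To make the two directions concrete, here is a rough portable C sketch
that models both instructions in software and checks the two
identities. It is only an illustration: vec128, shift_sub(),
x86_aesenclast(), arm_aese() and check_equivalences() are hypothetical
names made up for this example, not QEMU helpers or compiler
intrinsics, and the S-box is generated at run time rather than taken
from QEMU's AES_sbox table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector type: 16 bytes, column-major AES state. */
typedef struct { uint8_t b[16]; } vec128;

static uint8_t sbox[256];

/* Rotate an 8-bit value left by n bits (0 < n < 8). */
static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Build the AES S-box: inverse in GF(2^8), then the affine transform. */
static void init_sbox(void)
{
    uint8_t p = 1, q = 1;

    do {
        /* p *= 3 and q /= 3, so q remains the inverse of p */
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1b : 0));
        q ^= (uint8_t)(q << 1);
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80) {
            q ^= 0x09;
        }
        sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^
                            rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63; /* 0 has no inverse; handled separately */
}

/* AddRoundKey: plain bytewise XOR. */
static vec128 xor128(vec128 a, vec128 b)
{
    for (int i = 0; i < 16; i++) {
        a.b[i] ^= b.b[i];
    }
    return a;
}

/* ShiftRows followed by SubBytes (row r of the state rotates left by r). */
static vec128 shift_sub(vec128 s)
{
    vec128 out;

    for (int i = 0; i < 16; i++) {
        out.b[i] = sbox[s.b[(i + 4 * (i & 3)) & 15]];
    }
    return out;
}

/* Model of x86 aesenclast: ShiftRows, SubBytes, then AddRoundKey. */
static vec128 x86_aesenclast(vec128 st, vec128 rk)
{
    return xor128(shift_sub(st), rk);
}

/* Model of Arm aese: AddRoundKey first, then ShiftRows, SubBytes. */
static vec128 arm_aese(vec128 st, vec128 rk)
{
    return shift_sub(xor128(st, rk));
}

/* Check both emulation directions on one arbitrary state/key pair. */
void check_equivalences(void)
{
    vec128 zero = {{0}}, st, rk;

    init_sbox();
    for (int i = 0; i < 16; i++) {
        st.b[i] = (uint8_t)(i * 17);
        rk.b[i] = (uint8_t)(0xa5 ^ (i * 3));
    }

    /* aese on an x86 host: XOR up front, zero key into aesenclast */
    vec128 a = arm_aese(st, rk);
    vec128 b = x86_aesenclast(xor128(st, rk), zero);
    assert(memcmp(&a, &b, sizeof(a)) == 0);

    /* aesenclast on an Arm host: zero key into aese, XOR afterwards */
    vec128 c = x86_aesenclast(st, rk);
    vec128 d = xor128(arm_aese(st, zero), rk);
    assert(memcmp(&c, &d, sizeof(c)) == 0);
}
```

Calling check_equivalences() from a main() should run without tripping
either assert. Both identities hold simply because XOR with a zero
vector is a no-op, so the only thing that differs between the two
instructions is where the real round key gets folded in.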