On Fri, 14 Dec 2018 at 05:24, Richard Henderson <richard.hender...@linaro.org> wrote: > > This is the main crypto routine, an implementation of QARMA. > This matches, as much as possible, ARM pseudocode. > > Signed-off-by: Richard Henderson <richard.hender...@linaro.org> > --- > target/arm/helper-a64.c | 241 +++++++++++++++++++++++++++++++++++++++- > 1 file changed, 240 insertions(+), 1 deletion(-) > > diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c > index 19486b9677..1da7867a42 100644 > --- a/target/arm/helper-a64.c > +++ b/target/arm/helper-a64.c > @@ -1057,10 +1057,249 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp) > * Helpers for ARMv8.3-PAuth. > */ > > +static uint64_t pac_cell_shuffle(uint64_t i) > +{ > + uint64_t o = 0; > + > + o |= extract64(i, 52, 4); > + o |= extract64(i, 24, 4) << 4; > + o |= extract64(i, 44, 4) << 8; > + o |= extract64(i, 0, 4) << 12; > + > + o |= extract64(i, 28, 4) << 16; > + o |= extract64(i, 48, 4) << 20; > + o |= extract64(i, 4, 4) << 24; > + o |= extract64(i, 40, 4) << 28; > + > + o |= i & MAKE_64BIT_MASK(32, 4);
Can't we just use o |= extract64(i, 32, 4) << 32; to stay parallel with everything else? Similarly below. > + o |= extract64(i, 12, 4) << 36; > + o |= extract64(i, 56, 4) << 40; > + o |= extract64(i, 8, 4) << 44; The pseudocode in the DDI0487D.a Arm Arm says that bits outdata<47:44> are indata<23:20>... > + > + o |= extract64(i, 36, 4) << 48; > + o |= extract64(i, 16, 4) << 52; ...and these don't match either... > + o |= extract64(i, 40, 4) << 56; and this definitely looks wrong as we've already used bits 43:40 earlier. > + o |= i & MAKE_64BIT_MASK(60, 4); > + > + return o; > +} > +static int rot_cell(int cell, int n) > +{ > + cell |= cell << 4; > + cell >>= n; > + return cell & 0xf; This doesn't seem to do what the RotCell pseudocode does? Unless I've made an error, RotCell(ABCD, 1) == BCDA, but rot_cell(ABCD, 1) == DABC. > +} > + > +static uint64_t pac_mult(uint64_t i) > +{ > + uint64_t o = 0; > + int b; > + > + for (b = 0; b < 4 * 4; b += 4) { > + int i0, i4, i8, ic, t0, t1, t2, t3; > + > + i0 = extract64(i, b, 4); > + i4 = extract64(i, b + 4 * 4, 4); > + i8 = extract64(i, b + 8 * 4, 4); > + ic = extract64(i, b + 12 * 4, 4); > + > + t0 = rot_cell(i8, 1) ^ rot_cell(i4, 2) ^ rot_cell(i0, 1); > + t1 = rot_cell(ic, 1) ^ rot_cell(i4, 1) ^ rot_cell(i0, 2); > + t2 = rot_cell(ic, 2) ^ rot_cell(i8, 1) ^ rot_cell(i0, 1); > + t3 = rot_cell(ic, 2) ^ rot_cell(i8, 2) ^ rot_cell(i4, 1); Shouldn't the first term for t3 be rot_cell(ic, 1) ? > + > + o |= (uint64_t)t3 << b; > + o |= (uint64_t)t2 << (b + 4 * 4); > + o |= (uint64_t)t1 << (b + 8 * 4); > + o |= (uint64_t)t0 << (b + 12 * 4); > + } > + return o; > +} thanks -- PMM