2017-04-01 5:44 GMT+02:00 Eric Biggers <ebigge...@gmail.com>:
> Also, I realized that for gf128mul_x_lle() now that we aren't using the table
> we don't need to shift '_tt' but rather can use the constant
> 0xe100000000000000:
>
>         /* equivalent to (u64)gf128mul_table_le[(b << 7) & 0xff] << 48
>          * (see crypto/gf128mul.c): */
>         u64 _tt = gf128mul_mask_from_bit(b, 0) & 0xe100000000000000;
>
>         r->b = cpu_to_be64((b >> 1) | (a << 63));
>         r->a = cpu_to_be64((a >> 1) ^ _tt);
>
> I think that would be better and you could send a v4 to do it that way if you
> want.  It's not a huge deal though.

Yes, I was hoping the compiler would be wise enough to fold the shift
into the constant, but I didn't actually check the assembly output...
I took the time to write a quick benchmark, and the version without
the shift is indeed noticeably faster.
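
For reference, here is a minimal user-space sketch of the two ways of
computing '_tt' that are being compared. It is not the patch itself:
mask_from_bit() is a stand-in modelled on gf128mul_mask_from_bit(), the
"shifted" variant (mask with 0xe100, then shift by 48) is only an assumed
shape of the earlier code, and tt_shifted(), tt_const() and
check_variants_agree() are hypothetical names used for illustration.
Endianness conversion is left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

/*
 * All-ones if bit 'which' of x is set, all-zeroes otherwise.
 * Assumption: right-shifting a negative signed value is an arithmetic
 * shift, which holds for gcc and clang.
 */
static uint64_t mask_from_bit(uint64_t x, int which)
{
	return (uint64_t)((int64_t)(x << (63 - which)) >> 63);
}

/* Assumed "shifted" variant: mask a 16-bit value, shift it at use time. */
static uint64_t tt_shifted(uint64_t b)
{
	uint64_t tt = mask_from_bit(b, 0) & 0xe100;

	return tt << 48;
}

/* Variant suggested in the review: mask with the pre-shifted constant. */
static uint64_t tt_const(uint64_t b)
{
	return mask_from_bit(b, 0) & 0xe100000000000000ULL;
}

/* Both variants must agree whether the low bit of b is set or clear. */
static void check_variants_agree(void)
{
	assert(tt_shifted(0) == tt_const(0));
	assert(tt_shifted(1) == tt_const(1));
	assert(tt_shifted(0xfffffffffffffffeULL) == tt_const(0xfffffffffffffffeULL));
	assert(tt_shifted(0xffffffffffffffffULL) == tt_const(0xffffffffffffffffULL));
}
```

Both functions return the same value for every input, since
0xe100 << 48 == 0xe100000000000000; the second one simply does not depend
on the compiler folding the shift into the constant.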

That said, I'll go the extra mile and send a v4.

Thanks for the review!

O.M.
