Maamoun TK <maamoun...@googlemail.com> writes:

> +L4x_loop:
[...]
> +    C polynomial multiplication "classical" pre-processing
> +    xxmrghd        VSR(C23h),VSR(C2),VSR(C3)
> +    xxmrgld        VSR(C23l),VSR(C2),VSR(C3)
> +    xxmrghd        VSR(C01h),VSR(C0),VSR(C1)
> +    xxmrgld        VSR(C01l),VSR(C0),VSR(C1)
> +
> +    C polynomial multiplication "classical"
> +    vpmsumd        C3,C3,H1                      C M3 = H^1l*C3h⊕H^1h*C3l
> +    vpmsumd        C2,C2,H2                      C M2 = H^2l*C2h⊕H^2h*C2l
> +    vpmsumd        C1,C1,H3                      C M1 = H^3l*C1h⊕H^3h*C1l
> +    vpmsumd        C0,C0,H4                      C M0 = H^4l*C0h⊕H^4h*C0l
> +    vpmsumd        C23h,C23h,H21h                C H23 = H^2h*C2h⊕H^1h*C3h
> +    vpmsumd        C23l,C23l,H21l                C L23 = H^2l*C2l⊕H^1l*C3l
> +    vpmsumd        C01h,C01h,H43h                C H01 = H^4h*C0h⊕H^4h*C1h
> +    vpmsumd        C01l,C01l,H43l                C L01 = H^4l*C0l⊕H^4l*C1l

So you do 4 blocks with only 10 vpmsumd instructions (the 8 above, and 2
more below for the reduction). That's nice! It would be even nicer if
the H terms could be rearranged to eliminate the pre-processing of the C
values.

And it is all with polynomials bits reversed compared to the spec? I
find the spec,
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf,
not crystal clear, but as I understand it, the bit that is least
significant when interpreted as a plain integer corresponds to the
highest power of x, while instructions like vpmsumd use the opposite
(and more natural, imo) convention.

This is how I understand bit reversal. We want

  c(x) = a(x) b(x) mod p(x) = a(x) b(x) + q(x) p(x)

where a, b and c all are 128 bits, p is 129 bits, and q is 127 bits
(chosen so the high 127 bits of the 255 bit product cancel). Reverse as

  c'(x) = x^127 c(1/x), similarly for a', b',

  q'(x) = x^126 q(1/x), p'(x) = x^128 p(1/x)

Then we get

  c'(x) = [a'(x) b'(x) + q'(x) p'(x)] / x^127

i.e., q' now cancels the *low* 127 bits of the product. Which can also be
written as

  c'(x) = a'(x) b'(x) / x^127 (mod p'(x))

So in this way, bit reversal is a bit similar to montgomery
representation for modular arithmetic on integers. And if 128 and
corresponding bit boundary is more convenient than 127, one can either
shift the product one bit left, or premultiply one of the factors with x
mod p'(x).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs

Reply via email to