Maamoun TK <maamoun...@googlemail.com> writes:

> Thanks for the clarification, I just misunderstanded the division with the
> partial reduction in a previous reply.
>
> Ok, so you mean a polynomial division of  b_0(x) by P(x) where P(x) = X^128
> +  X^127 +  X^126 +  X^121 + 1
> b_0(x)/P(x) = (b_0(x)*(p^-1 mod P(x))) mod P(x)
> b_0(x)/P(x) = (b_0(x)*(p')) mod P(x)
> P(x) = X^64 +  X^63 +  X^62 +  X^57
> P' = p^-1 mod P(x) = X^63 +  X^62 +  X^57
> so the constant 0xC2

Hmm, I'm not following you (and it seems you are using P(x) to refer to
two different polynomials). The operation is a bit similar to Montgomery
redc, and it has taken me some time to get used to the initially strange
mix of operations mod P(x) and mod x^k (or in the integer case, it's a
mix of mod m, with m odd, and mod 2^k).

  b_0(x) / x^64 (mod P(x)) 

can be computed in two different ways, giving the same result: 

   (i) Compute the inverse u(x) = (x^64){-1} mod P(x), then multiply
       b_0(x) u(x) (mod P(x))

       We get u(x) = x^64 + x^63 + x^62 + x^57, with degree only 64.
       (This is thanks to the special structure of P(x); an inverse of
       some arbitrary element mod a 128-degree polynomial would be
       expected to be larger).
       

  (ii) Add a multiple of P(x) to b_0(x) to make the the lowest 64
       coefficients all cancel, then divide by x^64 by simply shifting /
       subtracting 64 from exponents of remaining terms. And because of
       the special structure of P(x), P(x) = 1 mod x^64, the right
       multiple is b_0(x) P(x). So

       b_0(x) + P(x) b_0(x)

       is a degree 191 polynomial, with the least 64 coefficients all
       zero. When simplified, it boils down to the same b_0(x) u(x), no
       additional reduction needed.

I tend to think about reductions (from either end) in terms of
cancelling bits, so to me, (ii) is a familiar way to think about it.

> let me show you part of the new implementation of _nettle_gcm_init_key in
> PPC
>
>     C --- calculate [H = H << 1 modulo polynomial] ---
>
>     vupkhsb        EMSB,H                        C extend most significant
> bit to first byte
>     vspltisb       B1,1                          C
> 0x01010101010101010101010101010101
>     vspltb         EMSB,EMSB,0                   C first byte
> quadword-extend
>     vsl            H,H,B1                        C H = H << 1
>     vand           EMSB,EMSB,POLY                C EMSB &=
> 0xC2000000000000000000000000000001
>     vxor           ZERO,ZERO,ZERO                C
> 0x00000000000000000000000000000000
>     vxor           H,H,EMSB                      C H ^= EMSB
>
>     xxmrghd        VSR(POLY_L),VSR(ZERO),VSR(POLY) C
> 0x0000000000000000C200000000000000
>     xxmrghd        VSR(POLY_H),VSR(POLY),VSR(ZERO) C
> 0xC2000000000000000000000000000000
>
>     C --- calculate [H^2 = H*H] ---
>
>     xxswapd        VSR(Hm),VSR(H)
>     vpmsumd        HP,H,POLY_L
>     vxor           L,HP,Hm
>     xxmrghd        VSR(M),VSR(H),VSR(HP)
>     xxmrgld        VSR(L),VSR(H),VSR(L)
>
>     vpmsumd        R,M,H
>     vpmsumd        F,L,H
>
>     vpmsumd        T,F,POLY_H
>     xxswapd        VSR(F),VSR(F)
>     vxor           R,R,T
>     vxor           R,R,F
>
> R still yields the wrong value of H^2, did I miss something in the
> implementation or did it wrong?

I'm afraid I don't follow all details (and I would suggest trying to get
the most basic variant working first, doing only a single block at a
time). But one potential issue is the powers of x. If H(x) is the
original key, then I think you need to precompute values corresponding
to

  x H(x) (mod P(x))
  x H(x)^2 (mod P(x))

But it looks like you may be computing

  (x H(x))^2 / x^128 =  x^2 H(x)^2 / x^128

which is different. It's easy to get confused, but I think the
precomputation needs a standard mod P(x), i.e., adding a multiple of
P(x) cancelling the most significant coefficients, rather than the more
efficient reduction cancelling the least significant coefficients.

If you're familiar with montgomery multiplication of integers, there
it's possible to consistently use the operation a,b -> ab / 2^k
everywhere, but that requires that all inputs are transformed to
montgomery form, a --> 2^k a mod m. And in this case, it's desirable to
scale the precomputed values appropriately, so no such transformation is
needed for the inputs corresponding to message blocks.

To compute more powers of H, one could do the standard reduction only
once, essentially transforming H to montgomery form,

  H' = H(x) x^128 (mod P(x)),

then further powers of H can use the reduction of least significant
coefficients,

  (x H(x)) H' / x^128 = x H(x)^2

  (x H(x)^2) H' / x^128 = x H(x)^3

etc.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs

Reply via email to