Maamoun TK <maamoun...@googlemail.com> writes:

> +Lmod:
> +    C --- process the modulo bytes, padding the low-order bytes with zeros ---
> +
> +    cmpldi         LENGTH,0
> +    beq            Ldone
> +
> +    C load table elements
> +    li             r8,1*TableElemAlign
> +    lxvd2x         VSR(H1M),0,TABLE
> +    lxvd2x         VSR(H1L),r8,TABLE
> +
> +    C push every modulo byte to the stack and load them with padding into vector register
> +    vxor           ZERO,ZERO,ZERO
> +    addi           r8,SP,-16
> +    stvx           ZERO,0,r8
> +Lstb_loop:
> +    subic.         LENGTH,LENGTH,1
> +    lbzx           r7,LENGTH,DATA
> +    stbx           r7,LENGTH,r8
> +    bne            Lstb_loop
> +    lxvd2x         VSR(C0),0,r8

It's always a bit annoying to have to deal with leftovers like this
in the assembly code. Can we avoid having to store it to memory and read
back? I can see three other approaches:

1. Loop, reading a byte at a time, and shift into a target register. I
   guess we would need to assemble the bytes in a regular register, and
   then transfer the final value to a vector register. Is that
   expensive?

2. Round the address down to make it aligned, read an aligned word and,
   only if needed, the next word. And shift and mask to get the needed
   bytes. I think it is fine to read a few bytes outside of the input
   area, as long as the reads do *not* cross any word boundary (and
   hence a potential page boundary). We do things like this in some
   other places, but then for reading unaligned data in general, not
   just leftover parts.

3. Adapt the internal C/asm interface, so that the assembly routine only
   needs to handle complete blocks. It could provide a gcm_gf_mul, and
   let the C code handle partial blocks using memxor + gcm_gf_mul.
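
To make approach (1) concrete, here is a rough C model of the byte loop. It is only a sketch: the helper name load_partial_block is made up for illustration and is not part of nettle, and the big-endian placement of the bytes is an assumption about what the multiplication code expects. In the assembly, the two 64-bit words would still have to be moved into a vector register afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of nettle: gather 1..15 leftover bytes
   into a 128-bit value held as two 64-bit words. The first input byte
   ends up in the most significant byte of *hi (an assumed byte order);
   the missing low-order bytes stay zero, which gives the padding. */
static void
load_partial_block(uint64_t *hi, uint64_t *lo,
                   const uint8_t *data, size_t length)
{
  uint64_t w[2] = { 0, 0 };
  size_t i;

  assert(length < 16);

  for (i = 0; i < length; i++)
    /* Byte i belongs to word i/8, at bit offset 56 - 8*(i%8). */
    w[i / 8] |= (uint64_t) data[i] << (56 - 8 * (i % 8));

  *hi = w[0];
  *lo = w[1];
}
```

No store to memory and no read-back is involved; the cost is one shift and one OR per leftover byte, plus the final transfer to the vector unit.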

I would guess (1) or maybe (3) is the most reasonable. I don't think
performance is that important, since it looks like for each message,
this case can happen only for the last call to gcm_update and the last
call to gcm_encrypt/gcm_decrypt.
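
For (3), the C side could look roughly like the sketch below. All names here (sketch_gf_mul, sketch_hash, BLOCK_SIZE) are invented for illustration. sketch_gf_mul is a slow bit-by-bit stand-in for whatever multiplication routine the assembly would export, and a plain XOR loop stands in for memxor so that the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Stand-in for the multiplication the assembly would provide:
   x <- x * h in GF(2^128), using GCM's bit order and the plain
   shift-and-add method. Hypothetical name, illustration only. */
static void
sketch_gf_mul(uint8_t *x, const uint8_t *h)
{
  uint8_t z[BLOCK_SIZE] = { 0 };
  uint8_t v[BLOCK_SIZE];
  int i, j;

  memcpy(v, h, BLOCK_SIZE);
  for (i = 0; i < 128; i++)
    {
      int carry;

      /* Bit i of x, counting from the top bit of byte 0. */
      if (x[i / 8] & (0x80 >> (i % 8)))
        for (j = 0; j < BLOCK_SIZE; j++)
          z[j] ^= v[j];

      /* Shift v right one bit; reduce if a bit fell off the end. */
      carry = v[BLOCK_SIZE - 1] & 1;
      for (j = BLOCK_SIZE - 1; j > 0; j--)
        v[j] = (uint8_t) ((v[j] >> 1) | (v[j - 1] << 7));
      v[0] >>= 1;
      if (carry)
        v[0] ^= 0xe1;
    }
  memcpy(x, z, BLOCK_SIZE);
}

/* Hash loop kept in C: complete blocks first, then the leftover. */
static void
sketch_hash(uint8_t *x, const uint8_t *h,
            size_t length, const uint8_t *data)
{
  size_t i;

  for (; length >= BLOCK_SIZE;
       length -= BLOCK_SIZE, data += BLOCK_SIZE)
    {
      for (i = 0; i < BLOCK_SIZE; i++)
        x[i] ^= data[i];
      sketch_gf_mul(x, h);
    }
  if (length > 0)
    {
      /* XORing only the bytes present is the same as zero padding. */
      for (i = 0; i < length; i++)
        x[i] ^= data[i];
      sketch_gf_mul(x, h);
    }
}
```

With this split, the assembly never sees a partial block. Whether the complete-block loop also stays in C or moves into assembly is a separate choice.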

What about test coverage? It looks like we have test cases for sizes up
to 8 blocks, and for partial blocks, so I guess that should be fine?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
