Another patch for the register defines. I apologize for that.

regards,
Mamone

On Tue, Nov 17, 2020 at 11:10 PM Maamoun TK <maamoun...@googlemail.com>
wrote:

> I replaced the method of using the stack to handle the leftovers with the
> first approach. I also changed some vector registers in the defines,
> because I had defined `LE_MASK' in a non-volatile register that was not
> being preserved.
>
> This patch is built on top of the ppc-gcm branch.
>
> regards,
> Mamone
>
> On Sat, Nov 14, 2020 at 8:11 PM Maamoun TK <maamoun...@googlemail.com>
> wrote:
>
>> For the first approach I can think of this method:
>> lxvd2x      VSR(C0),0,DATA
>> IF_LE(`
>> vperm       C0,C0,C0,LE_MASK
>> ')
>> slwi        LENGTH,LENGTH,3     (shift left 3 bits, vsro reads the octet count from bit[121:124])
>> vspltisb    v10,-1              (0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)
>> mtvrwz      v11,LENGTH          (LENGTH in bit[57:60])
>> xxspltd     VSR(v11),VSR(v11),0 (LENGTH in bit[121:124])
>> vsro        v10,v10,v11         (shift right by octets)
>> vnot        v10,v10
>> vand        C0,C0,v10
>>
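>> In case it helps to sanity-check that sequence, here is a plain C model
>> of the mask it builds (after the IF_LE vperm the bytes are in big-endian
>> lane order); the helper name is made up for illustration:
>>
>> #include <stdint.h>
>>
>> /* Byte mask matching the vspltisb/vsro/vnot sequence above: ones in
>>    the first `length' bytes, zeros in the low-order rest. */
>> static void
>> leftover_mask (uint8_t mask[16], unsigned length)
>> {
>>   unsigned i;
>>   for (i = 0; i < 16; i++)   /* length < 16 */
>>     mask[i] = i < length ? 0xff : 0;
>> }
>>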
>> I recommend the third approach so we don't have to deal with the leftover
>> bytes in the upcoming implementations, but the problem is that
>> gcm_init_key() initializes the table for the matching gcm_hash() function.
>> That means we can't process the remaining bytes with gcm_gf_mul() or
>> gcm_gf_shift_8(), because their tables have potentially not been
>> initialized. So I'm thinking of keeping the variant of gcm_gf_mul() that
>> doesn't need a table (the GCM_TABLE_BITS == 0 one) and always processing
>> the remaining bytes with that function.
>>
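>> A minimal sketch of the C side, assuming the assembly only handles
>> complete blocks and the table-less multiply is kept around. This is how
>> it could look inside gcm.c, using the types already there (the helper
>> itself is hypothetical):
>>
>> /* Fold a final partial block into the hash state X; equivalent to
>>    zero-padding the data to a full block. */
>> static void
>> gcm_hash_partial (const struct gcm_key *key, union nettle_block16 *x,
>>                   size_t length, const uint8_t *data)
>> {
>>   memxor (x->b, data, length);  /* tail bytes of x are left as-is */
>>   gcm_gf_mul (x, key->h);       /* the GCM_TABLE_BITS == 0 variant */
>> }
>>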
>> The test coverage is fine, I can't think of any potential untested cases.
>>
>> regards,
>> Mamone
>>
>>
>> On Sat, Nov 14, 2020 at 6:54 PM Niels Möller <ni...@lysator.liu.se>
>> wrote:
>>
>>> Maamoun TK <maamoun...@googlemail.com> writes:
>>>
>>> > +Lmod:
>>> > +    C --- process the modulo bytes, padding the low-order bytes with zeros ---
>>> > +
>>> > +    cmpldi         LENGTH,0
>>> > +    beq            Ldone
>>> > +
>>> > +    C load table elements
>>> > +    li             r8,1*TableElemAlign
>>> > +    lxvd2x         VSR(H1M),0,TABLE
>>> > +    lxvd2x         VSR(H1L),r8,TABLE
>>> > +
>>> > +    C push every modulo byte to the stack and load them with padding into a vector register
>>> > +    vxor           ZERO,ZERO,ZERO
>>> > +    addi           r8,SP,-16
>>> > +    stvx           ZERO,0,r8
>>> > +Lstb_loop:
>>> > +    subic.         LENGTH,LENGTH,1
>>> > +    lbzx           r7,LENGTH,DATA
>>> > +    stbx           r7,LENGTH,r8
>>> > +    bne            Lstb_loop
>>> > +    lxvd2x         VSR(C0),0,r8
>>>
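>>> In plain C, that Lmod path amounts to staging the leftover bytes in a
>>> zero-padded stack buffer and reloading it with lxvd2x (the helper name
>>> is mine, for illustration):
>>>
>>> #include <stdint.h>
>>> #include <string.h>
>>>
>>> static void
>>> stage_leftover (uint8_t block[16], const uint8_t *data, size_t length)
>>> {
>>>   memset (block, 0, 16);          /* the stvx of ZERO */
>>>   while (length--)                /* the lbzx/stbx loop */
>>>     block[length] = data[length];
>>> }
>>>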
>>> It's always a bit annoying to have to deal with leftovers like this
>>> in the assembly code. Can we avoid having to store it to memory and read
>>> back? I can see three other approaches:
>>>
>>> 1. Loop, reading a byte at a time, and shift into a target register. I
>>>    guess we would need to assemble the bytes in a regular register, and
>>>    then transfer the final value to a vector register. Is that
>>>    expensive? (A C model of this is sketched below, after the list.)
>>>
>>> 2. Round the address down to make it aligned, read an aligned word and,
>>>    only if needed, the next word. And shift and mask to get the needed
>>>    bytes. I think it is fine to read a few bytes outside of the input
>>>    area, as long as the reads do *not* cross any word boundary (and
>>>    hence a potential page boundary). We do things like this in some
>>>    other places, but then for reading unaligned data in general, not
>>>    just leftover parts.
>>>
>>> 3. Adapt the internal C/asm interface, so that the assembly routine only
>>>    needs to handle complete blocks. It could provide a gcm_gf_mul, and
>>>    let the C code handle partial blocks using memxor + gcm_gf_mul.
>>>
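>>> To make (1) concrete, a rough C model: assemble the bytes into two
>>> 64-bit halves in general registers, which the assembly would then move
>>> over to a vector register (e.g. with mtvsrd). Big-endian lane order is
>>> assumed, and the helper name is made up:
>>>
>>> #include <stdint.h>
>>> #include <stddef.h>
>>>
>>> static void
>>> gather_leftover (uint64_t w[2], const uint8_t *data, size_t length)
>>> {
>>>   size_t i;
>>>   w[0] = w[1] = 0;
>>>   for (i = 0; i < length; i++)   /* length < 16 */
>>>     w[i / 8] |= (uint64_t) data[i] << (56 - 8 * (i % 8));
>>> }
>>>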
>>> I would guess (1) or maybe (3) is the most reasonable. I don't think
>>> performance is that important, since it looks like for each message,
>>> this case can happen only for the last call to gcm_update and the last
>>> call to gcm_encrypt/gcm_decrypt.
>>>
>>> What about test coverage? It looks like we have test cases for sizes up
>>> to 8 blocks, and for partial blocks, so I guess that should be fine?
>>>
>>> Regards,
>>> /Niels
>>>
>>> --
>>> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
>>> Internet email is subject to wholesale government surveillance.
>>>
>>