I replaced the method of using the stack to handle the leftover bytes with the first approach. I also changed some vector registers in the defines, because I had defined `LE_MASK' in a non-volatile register, which must be preserved across calls.
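For reference, the intent of the vspltisb/vsro/vnot/vand masking sequence in the first approach can be sketched in portable C. This is only an illustration of the masking idea, not code from the patch; the function name is made up:

```c
#include <stdint.h>
#include <stddef.h>

#define GCM_BLOCK_SIZE 16

/* Build a mask that is all-ones in the first LENGTH byte lanes and
   zero elsewhere, then AND it onto the loaded block, so that any
   bytes read past the end of the input are cleared. This is what the
   vspltisb/vsro/vnot/vand sequence achieves in vector registers
   without a round trip through the stack. */
static void
mask_partial_block(uint8_t block[GCM_BLOCK_SIZE], size_t length)
{
  size_t i;
  for (i = 0; i < GCM_BLOCK_SIZE; i++)
    {
      uint8_t mask = (i < length) ? 0xff : 0x00;
      block[i] &= mask;
    }
}
```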
This patch is built on top of the ppc-gcm branch.

regards,
Mamone

On Sat, Nov 14, 2020 at 8:11 PM Maamoun TK <maamoun...@googlemail.com> wrote:

> For the first approach I can think of this method:
>
>     lxvd2x VSR(C0),0,DATA
>     IF_LE(`
>     vperm C0,C0,C0,LE_MASK
>     ')
>     slwi LENGTH,LENGTH,4          (Shift left 4 bits because vsro gets bit[121:124])
>     vspltisb v10,-1               (0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)
>     mtvrwz v11,LENGTH             (LENGTH in bit[57:60])
>     xxspltd VSR(v11),VSR(v11),0   (LENGTH in bit[121:124])
>     vsro v10,v10,v11              (Shift right by octet)
>     vnot v10,v10
>     vand C0,C0,v10
>
> I recommend the third approach so we don't have to deal with the
> leftover bytes in the upcoming implementations, but the problem is that
> gcm_init_key() initializes the table for the compatible gcm_hash()
> function. That means we can't process the remaining bytes using
> gcm_gf_mul() or gcm_gf_shift_8(), because its table has potentially not
> been initialized. So I'm thinking of keeping the gcm_gf_mul() that
> doesn't need a table (where GCM_TABLE_BITS == 0) and always processing
> the remaining bytes with that function.
>
> The test coverage is fine, I can't think of any potential untested cases.
>
> regards,
> Mamone
>
> On Sat, Nov 14, 2020 at 6:54 PM Niels Möller <ni...@lysator.liu.se> wrote:
>
>> Maamoun TK <maamoun...@googlemail.com> writes:
>>
>> > +Lmod:
>> > +	C --- process the modulo bytes, padding the low-order bytes with zeros ---
>> > +
>> > +	cmpldi LENGTH,0
>> > +	beq Ldone
>> > +
>> > +	C load table elements
>> > +	li r8,1*TableElemAlign
>> > +	lxvd2x VSR(H1M),0,TABLE
>> > +	lxvd2x VSR(H1L),r8,TABLE
>> > +
>> > +	C push every modulo byte to the stack and load them with padding into vector register
>> > +	vxor ZERO,ZERO,ZERO
>> > +	addi r8,SP,-16
>> > +	stvx ZERO,0,r8
>> > +Lstb_loop:
>> > +	subic. LENGTH,LENGTH,1
>> > +	lbzx r7,LENGTH,DATA
>> > +	stbx r7,LENGTH,r8
>> > +	bne Lstb_loop
>> > +	lxvd2x VSR(C0),0,r8
>>
>> It's always a bit annoying to have to deal with leftovers like this
>> in the assembly code. Can we avoid having to store it to memory and read
>> back? I can see three other approaches:
>>
>> 1. Loop, reading a byte at a time, and shift into a target register. I
>>    guess we would need to assemble the bytes in a regular register, and
>>    then transfer the final value to a vector register. Is that
>>    expensive?
>>
>> 2. Round the address down to make it aligned, read an aligned word and,
>>    only if needed, the next word. And shift and mask to get the needed
>>    bytes. I think it is fine to read a few bytes outside of the input
>>    area, as long as the reads do *not* cross any word boundary (and
>>    hence a potential page boundary). We do things like this in some
>>    other places, but then for reading unaligned data in general, not
>>    just leftover parts.
>>
>> 3. Adapt the internal C/asm interface, so that the assembly routine only
>>    needs to handle complete blocks. It could provide a gcm_gf_mul, and
>>    let the C code handle partial blocks using memxor + gcm_gf_mul.
>>
>> I would guess (1) or maybe (3) is the most reasonable. I don't think
>> performance is that important, since it looks like for each message,
>> this case can happen only for the last call to gcm_update and the last
>> call to gcm_encrypt/gcm_decrypt.
>>
>> What about test coverage? It looks like we have test cases for sizes up
>> to 8 blocks, and for partial blocks, so I guess that should be fine?
>>
>> Regards,
>> /Niels
>>
>> --
>> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
>> Internet email is subject to wholesale government surveillance.
>
> _______________________________________________
> nettle-bugs mailing list
> nettle-bugs@lists.lysator.liu.se
> http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
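As a side note, the C-side dispatch that approach 3 implies (the assembly only ever sees whole blocks; C xors in the zero-padded leftover and calls the block multiply) might look roughly like the sketch below. All names here (`ghash_update`, `block_mul_func`, `counting_mul`) are illustrative stubs, not Nettle's actual interface, and the multiply is stubbed out so only the dispatch logic is shown:

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 16

/* Hypothetical single-block multiply type: in approach 3, the
   assembly routine would export one such function. */
typedef void (*block_mul_func)(uint8_t x[BLOCK_SIZE], void *key);

/* Stub multiply for demonstration only: it just counts invocations.
   A real implementation would multiply x by H in GF(2^128). */
static void
counting_mul(uint8_t x[BLOCK_SIZE], void *key)
{
  (void) x;
  ++*(int *) key;
}

/* Drive the block multiply over DATA: whole blocks are xor:ed into
   the state and multiplied; a trailing partial block is xor:ed in
   byte by byte (implicitly zero-padded), so the multiply routine
   never has to handle a short block. */
static void
ghash_update(uint8_t x[BLOCK_SIZE], void *key, block_mul_func mul,
             const uint8_t *data, size_t length)
{
  size_t i;
  while (length >= BLOCK_SIZE)
    {
      for (i = 0; i < BLOCK_SIZE; i++)
        x[i] ^= data[i];
      mul(x, key);
      data += BLOCK_SIZE;
      length -= BLOCK_SIZE;
    }
  if (length > 0)
    {
      for (i = 0; i < length; i++)
        x[i] ^= data[i];
      mul(x, key);
    }
}
```

With this split, the table-initialization concern above disappears: the partial-block path needs no table at all, only the single-block multiply.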