Richard Henderson <r...@twiddle.net> writes:

> On 10/26/2016 08:47 PM, David Gibson wrote:
>>> > +void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
>>> > +{
>>> > +    int i;
>>> > +    uint8_t s = 0;
>>> > +    for (i = 0; i < 16; i++) {
>>> > +        s ^= (b->u8[i] & 1);
>>> > +    }
>>> > +    r->u64[LO_IDX] = (!s) ? 0 : 1;
>>> > +    r->u64[HI_IDX] = 0;
>>> > +}
>>> > +
>> I think you can implement these better.  First mask with 0x01010101
>> (of the appropriate length) to extract the LSB bits of each byte.
>> Then XOR the two halves together, then quarters and so forth,
>> ln2(size) times to arrive at the parity.  This is similar to the usual
>> Hamming weight implementation.
>>
>
> You don't even have to mask with 0x01010101 to start.  Just fold halves
> til you get to the byte level and then mask with 1.

Right, it does reduce the number of operations:

+#define SIZE_MASK(x) ((1ULL << (x)) - 1)
+static uint64_t vparity(uint64_t f1, uint64_t f2, int size)
+{
+    uint64_t res = f1 ^ f2;
+    if (size == 8) return res;
+    return vparity(res & SIZE_MASK(size/2), res >> (size/2), size/2);
+}
+

Regards
Nikunj
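
For reference, here is a minimal standalone sketch of how the fold above
might be used for the quadword case, checked against the per-byte loop
from the original patch. The names vprtybq_sketch() and parity_ref() are
hypothetical and exist only for this illustration; the sketch takes the
two 64-bit halves as plain integers rather than a ppc_avr_t, and it
assumes the caller applies the final "& 1" that Richard describes.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_MASK(x) ((1ULL << (x)) - 1)

/*
 * XOR two halves of `size` bits each, then keep folding the result in
 * half until a single byte is left. Bit 0 of that byte is the XOR of
 * bit 0 of every input byte.
 */
static uint64_t vparity(uint64_t f1, uint64_t f2, int size)
{
    uint64_t res = f1 ^ f2;
    if (size == 8) {
        return res;
    }
    return vparity(res & SIZE_MASK(size / 2), res >> (size / 2), size / 2);
}

/* Reference (hypothetical name): XOR the LSB of each of the 16 bytes. */
static unsigned parity_ref(uint64_t hi, uint64_t lo)
{
    unsigned s = 0;
    for (int i = 0; i < 8; i++) {
        s ^= (unsigned)(hi >> (8 * i)) & 1;
        s ^= (unsigned)(lo >> (8 * i)) & 1;
    }
    return s;
}

/* Hypothetical wrapper: fold the quadword, then mask with 1 at the end. */
static unsigned vprtybq_sketch(uint64_t hi, uint64_t lo)
{
    return (unsigned)(vparity(hi, lo, 64) & 1);
}
```

The recursion is never entered with size == 64 on the masking side, so
SIZE_MASK() is only ever evaluated for 32, 16 and 8, which keeps the
shift within the width of unsigned long long.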