ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros

Richard Henderson Sun, 27 Jan 2019 10:08:38 -0800

On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>> I would expect the i < n/2 loop to be faster, because the assignments are
>> unconditional.  FWIW.
> 
> Do you have any idea as to how much faster? Is it something that would show
> up as significant within the context of QEMU?


I don't have any numbers on that, no.

> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
> version much easier to read, so I would prefer to keep it if possible.
> What about unrolling the loop into 2 separate ones...

I doubt that would be helpful.

I would think that

#define VMRG_DO(name, access, ofs)
...
    int i, half = ARRAY_SIZE(r->access(0)) / 2;
...
    for (i = 0; i < half; i++) {
        result.access(2 * i + 0) = a->access(i + ofs);
        result.access(2 * i + 1) = b->access(i + ofs);
    }

where ofs = 0 for HI and ofs = half for LO, is best.  I find it quite readable,
and it avoids duplicating code between LO and HI as you're currently doing.
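
For illustration only, the loop above can be sketched as a stand-alone function.
This is a minimal sketch, not the QEMU patch: the vec128 type, VEC_BYTES, and
vmrg_u8 are hypothetical names, and plain array indexing stands in for the
Vsr* accessor macros, so the host-endian handling those macros provide is
ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16                    /* hypothetical: one 128-bit vector */

/* Hypothetical stand-in for a vector register, viewed as 16 bytes. */
typedef struct {
    uint8_t u8[VEC_BYTES];
} vec128;

/*
 * Interleave half of a with half of b.
 * ofs selects which half is read: 0 for the first half of each input,
 * VEC_BYTES / 2 for the second half (the HI and LO cases in the mail).
 */
static void vmrg_u8(vec128 *r, const vec128 *a, const vec128 *b, size_t ofs)
{
    vec128 result;                      /* temporary, so r may alias a or b */
    size_t half = VEC_BYTES / 2;

    assert(ofs == 0 || ofs == half);

    /* Both assignments are unconditional; only the source offset varies. */
    for (size_t i = 0; i < half; i++) {
        result.u8[2 * i + 0] = a->u8[i + ofs];
        result.u8[2 * i + 1] = b->u8[i + ofs];
    }
    *r = result;
}
```

Calling vmrg_u8(&r, &a, &b, 0) and vmrg_u8(&r, &a, &b, VEC_BYTES / 2) covers
both variants with a single loop body, which is the point of passing the
offset as a parameter.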


r~

Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros