On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>> I would expect the i < n/2 loop to be faster, because the assignments are
>> unconditional. FWIW.
>
> Do you have any idea as to how much faster? Is it something that would show
> up as significant within the context of QEMU?

I don't have any numbers on that, no.

> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
> version much easier to read, so I would prefer to keep it if possible.
> What about unrolling the loop into 2 separate ones...

I doubt that would be helpful.

I would think that

  #define VMRG_DO(name, access, ofs)
  ...
      int i, half = ARRAY_SIZE(r->access(0)) / 2;
  ...
      for (i = 0; i < half; i++) {
          result.access(2 * i + 0) = a->access(i + ofs);
          result.access(2 * i + 1) = b->access(i + ofs);
      }

where ofs = 0 for HI and ofs = half for LO, is best.  I find it quite
readable, and it avoids duplicating code between LO and HI as you're
currently doing.
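
To make the shape concrete, here is a minimal standalone sketch of the same
loop.  It is not the QEMU helper: vec16 is a made-up stand-in for ppc_avr_t,
the access() macro is replaced by plain array indexing (so host endianness is
ignored), and the names vmrg_do, vmrgh and vmrgl are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NELEM = 16, HALF = NELEM / 2 };

/* Hypothetical stand-in for ppc_avr_t: 16 byte-sized elements. */
typedef struct {
    uint8_t u8[NELEM];
} vec16;

/*
 * Interleave HALF elements of a with HALF elements of b, starting at
 * element ofs.  ofs = 0 selects the HI half, ofs = HALF the LO half.
 */
static void vmrg_do(vec16 *r, const vec16 *a, const vec16 *b, int ofs)
{
    vec16 result;
    int i;

    assert(ofs >= 0 && ofs + HALF <= NELEM);

    for (i = 0; i < HALF; i++) {
        /* Both stores are unconditional; no per-element branch. */
        result.u8[2 * i + 0] = a->u8[i + ofs];
        result.u8[2 * i + 1] = b->u8[i + ofs];
    }
    /* Write back through a temporary so r may alias a or b. */
    *r = result;
}

static void vmrgh(vec16 *r, const vec16 *a, const vec16 *b)
{
    vmrg_do(r, a, b, 0);
}

static void vmrgl(vec16 *r, const vec16 *a, const vec16 *b)
{
    vmrg_do(r, a, b, HALF);
}
```

With a = {0, 1, ..., 15} and b = {100, 101, ..., 115}, vmrgh produces
0, 100, 1, 101, ... and vmrgl produces 8, 108, 9, 109, ...  The only
difference between the two is the ofs argument.
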
r~