On 27.02.19 17:20, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> +    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
>> +        src_idx = dst_idx / 2;
>> +        if (!high) {
>> +            src_idx += NUM_VEC_ELEMENTS(es) / 2;
>> +        }
>> +        if (dst_idx % 2 == 0) {
>> +            read_vec_element_i64(tmp, v2, src_idx, es);
>> +        } else {
>> +            read_vec_element_i64(tmp, v3, src_idx, es);
>> +        }
>> +        write_vec_element_i64(tmp, dst_v, dst_idx, es);
>> +    }
> 
> Note that you do not need a vector temporary here, so long as you load
> both source elements before writing, and you iterate in the proper direction.
> 
> For VMRL, iterate forward as you do now.  The element access order for MO_32:
> 
>  read  v2: 2   3
>  read  v3:   2   3
>  write v1: 0 1 2 3
> 
> For VMRH, iterate backward:
> 
>  read  v2: 1   0
>  read  v3:   1   0
>  write v1: 3 2 1 0
> 
> 
> r~
> 

Let's have a look at VMRH when iterating forward (my brain is a little
slow in the morning):

v1[0] = v2[0]
v1[1] = v3[0]
v1[2] = v2[1]
v1[3] = v3[1]

If all three registers overlap (v1 == v2 == v3):

v1[0] = v1[0]
v1[1] = v1[0] -> v1[0] already written (value unchanged in this case)
v1[2] = v1[1] -> v1[1] already modified
v1[3] = v1[1] -> v1[1] already modified

When iterating backwards:

v1[3] = v3[1]
v1[2] = v2[1]
v1[1] = v3[0]
v1[0] = v2[0]

If all three registers overlap (v1 == v2 == v3):

v1[3] = v1[1]
v1[2] = v1[1]
v1[1] = v1[0]
v1[0] = v1[0]


VMRL when iterating forward:

v1[0] = v2[2]
v1[1] = v3[2]
v1[2] = v2[3]
v1[3] = v3[3]

If all three registers overlap (v1 == v2 == v3):

v1[0] = v1[2]
v1[1] = v1[2]
v1[2] = v1[3]
v1[3] = v1[3]

Perfect :) I'll split up the two cases! Thanks!
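
For illustration, here is a minimal standalone sketch of the two cases. It
uses plain uint32_t arrays instead of the TCG helpers, with element 0 as the
leftmost element. The names merge_high/merge_low are made up for this example
and are not the patch code. The idea is only to show that loading both source
elements before writing, plus the right iteration direction, makes the
operation safe when v1 aliases v2 and/or v3:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for VMRH: interleave elements 0..n/2-1 of v2 and v3
 * into v1. Iterates backward, so a source element is never read after the
 * destination slot aliasing it has been overwritten.
 */
static void merge_high(uint32_t *v1, const uint32_t *v2, const uint32_t *v3,
                       size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = n / 2; i-- > 0; ) {
        /* Load both sources before writing either destination element. */
        uint32_t a = v2[i];
        uint32_t b = v3[i];
        v1[2 * i] = a;
        v1[2 * i + 1] = b;
    }
}

/*
 * Hypothetical stand-in for VMRL: interleave elements n/2..n-1 of v2 and v3
 * into v1. Iterates forward; writes always stay below the next read index.
 */
static void merge_low(uint32_t *v1, const uint32_t *v2, const uint32_t *v3,
                      size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        /* Load both sources before writing either destination element. */
        uint32_t a = v2[n / 2 + i];
        uint32_t b = v3[n / 2 + i];
        v1[2 * i] = a;
        v1[2 * i + 1] = b;
    }
}
```

Calling merge_high(v, v, v, 4) on {10, 20, 30, 40} leaves {10, 10, 20, 20},
and merge_low(v, v, v, 4) on the same input leaves {30, 30, 40, 40}, which
matches the fully overlapping tables above.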

-- 

Thanks,

David / dhildenb
