On 27.02.19 17:20, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> +    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
>> +        src_idx = dst_idx / 2;
>> +        if (!high) {
>> +            src_idx += NUM_VEC_ELEMENTS(es) / 2;
>> +        }
>> +        if (dst_idx % 2 == 0) {
>> +            read_vec_element_i64(tmp, v2, src_idx, es);
>> +        } else {
>> +            read_vec_element_i64(tmp, v3, src_idx, es);
>> +        }
>> +        write_vec_element_i64(tmp, dst_v, dst_idx, es);
>> +    }
>
> TODO: Note that you do not need a vector temporary here, so long as you load
> both source elements before writing, and you iterate in the proper direction.
>
> For VMRL, iterate forward as you do now. The element access order for MO_32:
>
>   read v2: 2 3
>   read v3: 2 3
>   write v1: 0 1 2 3
>
> For VMRH, iterate backward:
>
>   read v2: 1 0
>   read v3: 1 0
>   write v1: 3 2 1 0
>
>
> r~
>
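The access pattern Richard describes can be sketched in C roughly as follows. This is a minimal, hypothetical model, not the TCG code from the patch: plain `uint32_t` arrays stand in for vector registers (MO_32, so four elements, element 0 being the leftmost/"high" one), and `vreg`, `merge_low` and `merge_high` are illustrative names, not QEMU helpers. Each pair of source elements is loaded before either destination element is written, so no vector temporary should be needed even when v1 aliases v2 or v3.

```c
#include <stdint.h>

#define NUM_ELEMS 4 /* MO_32: four 32-bit elements per 128-bit vector */

/* Hypothetical stand-in for a vector register; element 0 is the leftmost
 * ("high") element, as in s390x element numbering. */
typedef struct {
    uint32_t e[NUM_ELEMS];
} vreg;

/* VMRL: iterate forward. Both source elements of a pair are loaded before
 * either destination element is written, so v1 may alias v2 and/or v3. */
static void merge_low(vreg *v1, vreg *v2, vreg *v3)
{
    for (int dst = 0; dst < NUM_ELEMS; dst += 2) {
        int src = NUM_ELEMS / 2 + dst / 2;
        uint32_t a = v2->e[src];
        uint32_t b = v3->e[src];

        v1->e[dst] = a;
        v1->e[dst + 1] = b;
    }
}

/* VMRH: iterate backward, again loading both sources before writing. */
static void merge_high(vreg *v1, vreg *v2, vreg *v3)
{
    for (int dst = NUM_ELEMS - 2; dst >= 0; dst -= 2) {
        int src = dst / 2;
        uint32_t a = v2->e[src];
        uint32_t b = v3->e[src];

        v1->e[dst + 1] = b;
        v1->e[dst] = a;
    }
}
```

The direction is what makes the in-place update safe: in the forward low loop, step k reads index NUM_ELEMS / 2 + k while earlier writes only reach index 2k - 1; in the backward high loop, step k reads index k while earlier writes start at index 2k + 2. In neither case is an element read after it has been overwritten.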
Let's have a look at VMRH when iterating forward (my brain is a little
slow in the morning):

v1[0] = v2[0]
v1[1] = v3[0]
v1[2] = v2[1]
v1[3] = v3[1]

If all registers overlap:

v1[0] = v1[0]
v1[1] = v1[0] -> v1[0] already modified
v1[2] = v1[1] -> v1[1] already modified
v1[3] = v1[1] -> v1[1] already modified

When iterating backwards:

v1[3] = v3[1]
v1[2] = v2[1]
v1[1] = v3[0]
v1[0] = v2[0]

If all registers overlap:

v1[3] = v1[1]
v1[2] = v1[1]
v1[1] = v1[0]
v1[0] = v1[0]

VMRL when iterating forward:

v1[0] = v2[2]
v1[1] = v3[2]
v1[2] = v2[3]
v1[3] = v3[3]

If all registers overlap:

v1[0] = v1[2]
v1[1] = v1[2]
v1[2] = v1[3]
v1[3] = v1[3]

Perfect :) I'll split up the two cases!

Thanks!

-- 

Thanks,

David / dhildenb
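David's overlap check can also be replayed mechanically. The sketch below is again a hypothetical model (`merge_inplace`, `merge_ref` and `overlap_ok` are illustrative names, not QEMU functions, and plain `uint32_t` arrays stand in for vector registers). It performs the element-by-element read-then-write of the original loop in either direction with v1 == v2 == v3, and compares the outcome against a result computed through a temporary.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_ELEMS 4 /* MO_32: four elements per vector */

/* Source index for destination element dst, as in the patch's loop:
 * dst / 2, shifted by NUM_ELEMS / 2 for the low variant. */
static int src_index(int dst, bool high)
{
    return dst / 2 + (high ? 0 : NUM_ELEMS / 2);
}

/* In-place merge that, like the original loop, reads and writes one element
 * at a time; 'forward' selects the iteration direction. */
static void merge_inplace(uint32_t *v1, const uint32_t *v2,
                          const uint32_t *v3, bool high, bool forward)
{
    for (int i = 0; i < NUM_ELEMS; i++) {
        int dst = forward ? i : NUM_ELEMS - 1 - i;
        const uint32_t *src = (dst % 2 == 0) ? v2 : v3;

        v1[dst] = src[src_index(dst, high)];
    }
}

/* Reference result built in a temporary, so aliasing cannot matter. */
static void merge_ref(uint32_t *v1, const uint32_t *v2, const uint32_t *v3,
                      bool high)
{
    uint32_t tmp[NUM_ELEMS];

    for (int dst = 0; dst < NUM_ELEMS; dst++) {
        const uint32_t *src = (dst % 2 == 0) ? v2 : v3;

        tmp[dst] = src[src_index(dst, high)];
    }
    memcpy(v1, tmp, sizeof(tmp));
}

/* Fully overlapping case (v1 == v2 == v3), compared with the reference. */
static bool overlap_ok(bool high, bool forward)
{
    uint32_t v[NUM_ELEMS] = {1, 2, 3, 4};
    uint32_t ref[NUM_ELEMS] = {1, 2, 3, 4};

    merge_inplace(v, v, v, high, forward);
    merge_ref(ref, ref, ref, high);
    return memcmp(v, ref, sizeof(v)) == 0;
}
```

With the values {1, 2, 3, 4}, forward VMRH ends up as {1, 1, 1, 1} instead of {1, 1, 2, 2}, and backward VMRL ends up as {4, 4, 4, 4} instead of {3, 3, 4, 4}; backward VMRH and forward VMRL both match the reference, which is why the two cases need different loop directions.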