On 5/20/19 4:25 AM, Arnd Bergmann wrote:
> On Sun, May 19, 2019 at 7:11 PM Alex Elder <el...@linaro.org> wrote:
>> On 5/17/19 1:44 PM, Alex Elder wrote:
>>> On 5/17/19 1:33 PM, Arnd Bergmann wrote:
>>>> On Fri, May 17, 2019 at 8:08 PM Alex Elder <el...@linaro.org>
>>
>> So it seems that I must *not* apply a volatile qualifier,
>> because doing so restricts the compiler from making the
>> single instruction optimization.
>
> Right, I guess that makes sense.
>
>> If I've missed something and you have another suggestion for
>> me to try let me know and I'll try it.
>
> A memcpy() might do the right thing as well.  Another idea would

I find memcpy() does the right thing.

> be a cast to __int128 like

I find that my environment supports 128-bit integers.  But...

> #ifdef CONFIG_ARCH_SUPPORTS_INT128
> typedef __int128 tre128_t;
> #else
> typedef struct { __u64 a; __u64 b; } tre128_t;
> #endif
>
> static inline void set_tre(struct gsi_tre *dest_tre, struct gsi_tre *src_tre)
> {
>         *(volatile tre128_t *)dest_tre = *(tre128_t *)src_tre;
> }

...this produces two 8-byte assignments.  Could it be because
it's implemented as two 64-bit values?  I think so.  Dropping
the volatile qualifier produces a single "stp" instruction.

The only other thing I thought I could do to encourage
the compiler to do the right thing is define the type (or
variables) to have 128-bit alignment.  Doing that for
the original simple assignment didn't change the (desirable)
outcome, but I don't think it's really necessary in this
case, considering the single instruction uses two 64-bit
registers.

I'm going to leave it as it was originally; it's the simplest:
	*dest_tre = tre;

I added a comment about structuring the code this way with
the intention of getting the single instruction.  If a different
compiler produces different result

					-Alex

>
>        Arnd
>
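
For anyone wanting to repeat the comparison outside the kernel, here
is a minimal userspace sketch of the three variants discussed above.
It is only an illustration under stated assumptions: "struct tre" and
its two 64-bit fields are hypothetical stand-ins (the real struct
gsi_tre layout is not shown in this thread), and the set_tre_*()
helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical 16-byte element standing in for struct gsi_tre.
 * The field names and layout are assumptions for illustration only.
 */
struct tre {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Variant 1: plain structure assignment.  The compiler is free to
 * combine the two 64-bit halves into one store if the target allows.
 */
static inline void set_tre_assign(struct tre *dest, const struct tre *src)
{
	*dest = *src;
}

/* Variant 2: memcpy() of the whole element, equally open to optimization. */
static inline void set_tre_memcpy(struct tre *dest, const struct tre *src)
{
	memcpy(dest, src, sizeof(*dest));
}

/*
 * Variant 3: store through a volatile-qualified lvalue.  The qualifier
 * constrains how the compiler may treat the access, which in the case
 * reported in this thread led to two separate 64-bit stores.
 */
static inline void set_tre_volatile(struct tre *dest, const struct tre *src)
{
	*(volatile struct tre *)dest = *src;
}
```

All three helpers copy the same 16 bytes; they differ only in how much
freedom the compiler has when choosing instructions.  To compare the
generated code, compile with something like "gcc -O2 -S" for the target
of interest and look at the stores emitted for each helper.  The result
is compiler- and version-dependent, so treat whatever you see as an
observation rather than a guarantee.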