On 9/6/2024 2:02 PM, Varghese, Vipin wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > <snipped> > >> > > >> --- a/app/test-pmd/macswap_sse.h >> > > >> +++ b/app/test-pmd/macswap_sse.h >> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t >> nb, >> > > >> uint64_t ol_flags; >> > > >> int i; >> > > >> int r; >> > > >> - __m128i addr0, addr1, addr2, addr3; >> > > >> + register __m128i addr0, addr1, addr2, addr3; >> > > > Some compilers treat register as a no-op. Are you sure? Did you check >> with godbolt. >> > > >> > > Thank you Stephen, I have tested the code changes on Linux using GCC >> > > and Clang compiler. >> > > >> > > In both cases in Linux environment, we have seen the the values >> > > loaded onto register `xmm`. >> > > >> > > ``` >> > > registerconst__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4, >> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6); vmovdqaxmm0, xmmwordptr[rip+ >> > > .LCPI0_0] >> >> Yep, that what I would probably expect: one time load before the loop starts, >> right? >> Curious what exactly it would generate then if 'register' keyword is missed? >> BTW, on my box, gcc-11 with '-O3 -msse4.2 ...' I am seeing expected >> behavior without 'register' keyword. >> Is it some particular compiler version that misbehaves? > > Thank you, Konstantin, for this pointer. I have been trying this > understand this a bit more internally. Here are my observations > > 1. shuf simd ISA works on XMM register only. > 2. Any values from variables has to be loaded to `xmm` register before > processing. > 3. when compiled for `-march=native` with compiler not aware (SoC Arch > gcc weights) without patch might have generating with `movzx eax, BYTE > PTR [rbp-48]` > 4. when register keyword is applied for both shufl_mask and addr, the > compiler generates trying to get the variables directly into xmm using ` > vmovdqu (%rsi),%xmm1` > > So, I think you are right, from gcc12.3 and gcc 13.1 which supports `- > march=znver4` this problem will not come. >
Hi Konstantin, Stephen, There is no negative impact of adding 'register' keyword, right? At worst it is useless, but Vipin can demonstrate that it has benefit for some cases, so I think OK to get it.