<snipped>

> > > >> --- a/app/test-pmd/macswap_sse.h
> > > >> +++ b/app/test-pmd/macswap_sse.h
> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
> > > >>   uint64_t ol_flags;
> > > >>   int i;
> > > >>   int r;
> > > >> - __m128i addr0, addr1, addr2, addr3;
> > > >> + register __m128i addr0, addr1, addr2, addr3;
> > > > Some compilers treat register as a no-op. Are you sure? Did you check
> > > > with godbolt.
> > >
> > > Thank you Stephen, I have tested the code changes on Linux using GCC
> > > and Clang compiler.
> > >
> > > In both cases in Linux environment, we have seen the values
> > > loaded onto register `xmm`.
> > >
> > > ```
> > > register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
> > > vmovdqa xmm0, xmmword ptr [rip + .LCPI0_0]
>
> Yep, that what I would probably expect: one time load before the loop starts,
> right?
> Curious what exactly it would generate then if 'register' keyword is missed?
> BTW, on my box, gcc-11 with '-O3 -msse4.2 ...' I am seeing expected
> behavior without 'register' keyword.
> Is it some particular compiler version that misbehaves?

Thank you, Konstantin, for this pointer. I have been trying to understand this a bit more internally. Here are my observations:

1. The shuffle SIMD instruction works on XMM registers only.
2. Any value held in a variable has to be loaded into an `xmm` register before processing.
3. When compiled with `-march=native` by a compiler that is not aware of the SoC architecture (no gcc tuning weights for it), the code without the patch might be generating `movzx eax, BYTE PTR [rbp-48]`.
4. When the `register` keyword is applied to both shufl_mask and addr, the compiler tries to get the variables directly into xmm using `vmovdqu (%rsi),%xmm1`.

So I think you are right: from gcc 12.3 and gcc 13.1, which support `-march=znver4`, this problem will not occur.

> > >
> > > ```
> > >
> > > Both cases we have performance improvement.
> > >
> > > Can you please help us understand if we have missed out something?
> >
> > Ok, not sure why compiler would not decide to already use a register here?
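
To make the with/without `register` comparison easier to reproduce outside testpmd, here is a minimal standalone sketch. It is not the testpmd code: the function names (`macswap_scalar`, `macswap_plain`, `macswap_register`, `macswap_variants_agree`) are made up for illustration, it handles a single 16-byte header instead of four mbufs, and it assumes GCC or Clang (for the `target` attribute). Only the shuffle mask is taken from the snippet quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#endif

/* Scalar reference: swap dst MAC (bytes 0-5) with src MAC (bytes 6-11)
 * and leave bytes 12-15 untouched. */
void macswap_scalar(uint8_t hdr[16])
{
	uint8_t tmp[6];

	memcpy(tmp, hdr, 6);
	memcpy(hdr, hdr + 6, 6);
	memcpy(hdr + 6, tmp, 6);
}

#ifdef HAVE_X86_SIMD
/* Variant without the 'register' keyword. */
__attribute__((target("ssse3")))
void macswap_plain(uint8_t hdr[16])
{
	const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
					      3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
	__m128i addr0 = _mm_loadu_si128((const __m128i *)hdr);

	addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
	_mm_storeu_si128((__m128i *)hdr, addr0);
}

/* Same code with the 'register' keyword, as in the patch. */
__attribute__((target("ssse3")))
void macswap_register(uint8_t hdr[16])
{
	register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
						       3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
	register __m128i addr0 = _mm_loadu_si128((const __m128i *)hdr);

	addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
	_mm_storeu_si128((__m128i *)hdr, addr0);
}
#endif

/* Run every variant on the same input; return 1 if all results match
 * the scalar reference. On non-x86 builds only the scalar path runs. */
int macswap_variants_agree(void)
{
	uint8_t ref[16], a[16], b[16];
	int i;

	for (i = 0; i < 16; i++)
		ref[i] = a[i] = b[i] = (uint8_t)i;

	macswap_scalar(ref);
	assert(ref[0] == 6 && ref[6] == 0); /* reference really swapped */
#ifdef HAVE_X86_SIMD
	macswap_plain(a);
	macswap_register(b);
#else
	macswap_scalar(a);
	macswap_scalar(b);
#endif
	return memcmp(ref, a, 16) == 0 && memcmp(ref, b, 16) == 0;
}
```

Building this with the compiler and flags under test (for example `-O3 -S`, with and without `-march=native`) and diffing the bodies of `macswap_plain` and `macswap_register` should show whether `register` changes the generated loads for that particular compiler version. The functional result is the same either way; only the codegen is in question.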