On 9/6/2024 2:02 PM, Varghese, Vipin wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> <snipped>
>> > > >> --- a/app/test-pmd/macswap_sse.h
>> > > >> +++ b/app/test-pmd/macswap_sse.h
>> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t
>> nb,
>> > > >>        uint64_t ol_flags;
>> > > >>        int i;
>> > > >>        int r;
>> > > >> -     __m128i addr0, addr1, addr2, addr3;
>> > > >> +     register __m128i addr0, addr1, addr2, addr3;
>> > > > Some compilers treat register as a no-op. Are you sure? Did you check
>> with godbolt.
>> > >
>> > > Thank you Stephen, I have tested the code changes on Linux using GCC
>> > > and Clang compiler.
>> > >
>> > > In both cases, in the Linux environment, we have seen the values
>> > > loaded into an `xmm` register.
>> > >
>> > > ```
>> > > register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
>> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
>> > > vmovdqa xmm0, xmmword ptr [rip + .LCPI0_0]
>> > > ```
>> Yep, that's what I would probably expect: a one-time load before the loop
>> starts, right?
>> Curious what exactly it would generate if the 'register' keyword is omitted?
>> BTW, on my box, gcc-11 with '-O3 -msse4.2 ...' I am seeing the expected
>> behavior without the 'register' keyword.
>> Is it some particular compiler version that misbehaves?
> Thank you, Konstantin, for this pointer. I have been trying to
> understand this a bit more internally. Here are my observations:
> 1. The shuffle SIMD instruction works on XMM registers only.
> 2. Any values from variables have to be loaded into an `xmm` register
> before processing.
> 3. When compiled with `-march=native` by a compiler that is not aware of
> the SoC architecture (gcc weights), the code without the patch might be
> generated with `movzx   eax, BYTE PTR [rbp-48]`.
> 4. When the register keyword is applied to both shufl_mask and addr, the
> compiler tries to load the variables directly into xmm using
> `vmovdqu (%rsi),%xmm1`.
> So, I think you are right: from gcc 12.3 and gcc 13.1, which support
> `-march=znver4`, this problem will not occur.

Hi Konstantin, Stephen,

There is no negative impact from adding the 'register' keyword, right? At
worst it is useless, but Vipin can demonstrate that it has a benefit in
some cases, so I think it is OK to take it.
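
For readers following the thread, here is a minimal sketch of what the
shuffle mask quoted above does. It is not taken from the patch: the helper
names `macswap_scalar` and `macswap_burst_sse` and the `shfl_msk_bytes`
table are hypothetical, and the SSE variant only builds when the compiler
enables SSSE3 (for example with `-mssse3` or `-msse4.2`). It assumes every
header pointer has at least 16 readable and writable bytes.

```c
#include <stdint.h>
#include <string.h>

/*
 * The mask from the thread, written lowest byte first. _mm_set_epi8()
 * lists its arguments from byte 15 down to byte 0, so this table is
 * the reverse of that argument list.
 */
static const uint8_t shfl_msk_bytes[16] = {
    6, 7, 8, 9, 10, 11,  /* bytes 0..5   <- source MAC      */
    0, 1, 2, 3, 4, 5,    /* bytes 6..11  <- destination MAC */
    12, 13, 14, 15       /* bytes 12..15 left in place      */
};

/* Scalar model of a 16-byte shuffle, valid for indices below 16. */
static void macswap_scalar(uint8_t hdr[16])
{
    uint8_t out[16];

    for (int i = 0; i < 16; i++)
        out[i] = hdr[shfl_msk_bytes[i]];
    memcpy(hdr, out, sizeof(out));
}

#ifdef __SSSE3__
#include <tmmintrin.h>

/*
 * Same permutation with intrinsics. The mask is a loop-invariant
 * constant declared without 'register'; whether it is loaded once
 * before the loop is up to the compiler and its tuning flags.
 */
static void macswap_burst_sse(uint8_t *hdrs[], int nb)
{
    const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
                                          3, 2, 1, 0, 11, 10, 9, 8, 7, 6);

    for (int i = 0; i < nb; i++) {
        __m128i addr = _mm_loadu_si128((const __m128i *)hdrs[i]);

        addr = _mm_shuffle_epi8(addr, shfl_msk);
        _mm_storeu_si128((__m128i *)hdrs[i], addr);
    }
}
#endif
```

Compiling the SSE variant with and without 'register' on a given compiler
version and comparing the generated assembly is one way to check the
behavior discussed above.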
