On 9/6/2024 2:02 PM, Varghese, Vipin wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
> <snipped>
>  
>> > > >> --- a/app/test-pmd/macswap_sse.h
>> > > >> +++ b/app/test-pmd/macswap_sse.h
>> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t
>> nb,
>> > > >>        uint64_t ol_flags;
>> > > >>        int i;
>> > > >>        int r;
>> > > >> -     __m128i addr0, addr1, addr2, addr3;
>> > > >> +     register __m128i addr0, addr1, addr2, addr3;
>> > > > Some compilers treat register as a no-op. Are you sure? Did you check
>> with godbolt.
>> > >
>> > > Thank you Stephen, I have tested the code changes on Linux using GCC
>> > > and Clang compiler.
>> > >
>> > > In both cases in Linux environment, we have seen the the values
>> > > loaded onto register `xmm`.
>> > >
>> > > ```
>> > > registerconst__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,
>> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6); vmovdqaxmm0, xmmwordptr[rip+
>> > > .LCPI0_0]
>> 
>> Yep, that what I would probably expect: one time load before the loop starts,
>> right?
>> Curious  what exactly it would generate then if 'register' keyword is missed?
>> BTW, on my box,  gcc-11  with '-O3 -msse4.2 ...'  I am seeing expected
>> behavior without 'register' keyword.
>> Is it some particular compiler version that misbehaves?
>  
> Thank you, Konstantin, for this pointer. I have been trying this
> understand this a bit more internally. Here are my observations
>  
> 1. shuf simd ISA works on XMM register only.
> 2. Any values from variables has to be loaded to `xmm` register before
> processing.
> 3. when compiled for `-march=native` with compiler not aware (SoC Arch
> gcc weights) without patch might have generating with `movzx   eax, BYTE
> PTR [rbp-48]`
> 4. when register keyword is applied for both shufl_mask and addr, the
> compiler generates trying to get the variables directly into xmm using `
> vmovdqu (%rsi),%xmm1`
>  
> So, I think you are right, from gcc12.3 and gcc 13.1 which supports `-
> march=znver4` this problem will not come.
>  

Hi Konstantin, Stephen,

There is no negative impact of adding 'register' keyword, right? At
worst it is useless, but Vipin can demonstrate that it has benefit for
some cases, so I think OK to get it.

Reply via email to