On 28/11/2022 07:40, Tobias Burnus wrote:
It turned out that cprop cleverly propagated the unspec_volatile
to the preceding (pseudo)register, which allowed the
'set (s0) (pseudoregister)' to be removed at -O2.  Unfortunately, it does
matter whether the assignment is done to 's2' (previously: pseudoregister)
or to s1.  Just having a hard register is not enough ...

Solution: Use USE (alias gen_rtx_USE) instead.

Additionally, I removed the s0 modification (which should leave the result unchanged) by adding 'gcn_operand_part (DImode, reg, 1)' and then working with SImode. Result:

   if (__builtin_gcn_first_call_this_thread_p())
     x = 42;

now becomes the following with -O2; the builtin code runs up to and including
'.L2', and the rest is the 'if' and 'x=42':

         s_lshr_b32      s2, s1, 16
         s_cmpk_lg_u32   s2, 12345
         s_mov_b32       s12, scc
         s_mov_b32       vcc_lo, scc
         s_mov_b32       vcc_hi, 0
         s_cbranch_vccz  .L2
         s_and_b32       s2, s1, 65535   (= 0xFFFF)
         s_or_b32        s1, s2, 809041920 (= 0x30390000 = (12345 << 16))
.L2:
         s_getpc_b64     s[2:3]
         s_add_u32       s2, s2, x@rel32@lo+4
         s_addc_u32      s3, s3, x@rel32@hi+4
         s_mov_b32       vcc_lo, s12
         s_mov_b32       vcc_hi, 0
         s_cbranch_vccz  .L3
         s_mov_b32       s12, 42
         v_writelane_b32 v0, s12, 0
         s_mov_b64       exec, 1
         global_store_dword      v1, v0, s[2:3]
.L3:
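
For readers who do not want to trace the assembly, the builtin part above (up to '.L2') can be modelled in plain C roughly as follows. This is only a sketch of what the listing appears to do, not the GCC implementation: the function name first_call_model and the macro FIRST_CALL_MAGIC are hypothetical, and the uint32_t pointer merely stands in for the hard register s1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the marker value 12345 seen in the listing. */
#define FIRST_CALL_MAGIC 12345u

/* Hypothetical C model of the builtin part of the listing;
   *s1 stands in for the hard register s1.  */
static bool
first_call_model (uint32_t *s1)
{
  /* s_lshr_b32 + s_cmpk_lg_u32: are the upper 16 bits still unmarked?  */
  bool first = (*s1 >> 16) != FIRST_CALL_MAGIC;

  if (first)
    /* s_and_b32 + s_or_b32: keep the low 16 bits and store the marker
       in the upper 16 bits (12345 << 16 == 0x30390000).  */
    *s1 = (*s1 & 0xFFFFu) | (FIRST_CALL_MAGIC << 16);

  /* The saved compare result (s12 in the listing) drives the 'if'.  */
  return first;
}
```

In this model the first call returns true and marks the register; every later call sees the marker and returns false without changing the value.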


OK for mainline?

OK.

Andrew
