[Mesa-dev] [Bug 108949] RADV: Subgroup codegen is sub-optimal

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108949

Connor Abbott  changed:

      What |Removed |Added
-----------+--------+---------
Resolution |---     |FIXED
    Status |NEW     |RESOLVED

--- Comment #4 from Connor Abbott  ---
Yes, the LLVM patch to re-enable the DPP combining pass landed recently:
https://github.com/llvm-mirror/llvm/commit/a0ecdf4bba1ba47b4dd8550c5a8c4a3a9183832d#diff-ad4812397731e1d4ff6992207b4d38fa

So neither of these should be issues anymore.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 108949] RADV: Subgroup codegen is sub-optimal

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108949

--- Comment #3 from Samuel Pitoiset  ---
Can this be closed now?


[Mesa-dev] [Bug 108949] RADV: Subgroup codegen is sub-optimal

2018-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108949

--- Comment #2 from mais...@archlinux.us ---
Interesting. No, haven't tried with an LLVM that recent. I'll post when I have
results.



[Mesa-dev] [Bug 108949] RADV: Subgroup codegen is sub-optimal

2018-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108949

--- Comment #1 from Connor Abbott  ---
This should be fixed by
https://github.com/llvm-mirror/llvm/commit/e3924b1c15606bb5bf98392e0c20e731b4965311
which was just committed 5 days ago. You'll need to build LLVM and Mesa master
to try it out.



[Mesa-dev] [Bug 108949] RADV: Subgroup codegen is sub-optimal

2018-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108949

    Bug ID: 108949
   Summary: RADV: Subgroup codegen is sub-optimal
   Product: Mesa
   Version: 18.2
  Hardware: Other
        OS: All
    Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/Vulkan/radeon
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: mais...@archlinux.us
QA Contact: mesa-dev@lists.freedesktop.org

I have some code using subgroup operations that generates suboptimal code: I
expect more use of SGPRs, but I see a lot of VGPRs and vector loads instead.
As a result, the codegen is worse than AMDVLK's and much worse than that of
AMD's Windows driver. I filed a similar issue here:
https://github.com/GPUOpen-Drivers/AMDVLK/issues/68.
On a more useful, more complicated test, I get 0% uplift from subgroup
operations on RADV, 5% on AMDVLK and 15% on Windows. The GPU is an RX 470
(Polaris). The Mesa version is 18.2.5.

With a trivial test:
https://github.com/Themaister/Granite/blob/master/tests/assets/shaders/subgroup.comp
I expect the subgroupBroadcastFirst(subgroupOr) to turn all of the loads into
scalar loads, but in the loop I get:

BB629_1:
    s_load_dwordx4 s[8:11], s[0:1], 0x0               ; C00A0200
    s_ff1_i32_b32 s3, s2                              ; BE831002
    v_mul_u32_u24_e64 v7, s3, 48                      ; D1080007 00016003
    v_or_b32_e32 v5, 4, v7                            ; 280A0E84
    v_mad_u32_u24 v10, s3, 48, 20                     ; D1C3000A 02516003
    v_mad_u32_u24 v8, s3, 48, 16                      ; D1C30008 02416003
    s_waitcnt lgkmcnt(0)                              ; BF8C007F
*   buffer_load_dwordx2 v[5:6], v5, s[8:11], 0 offen  ; E0541000 80020505
*   buffer_load_dword v10, v10, s[8:11], 0 offen      ; E0501000 80020A0A
*   buffer_load_dword v14, v7, s[8:11], 0 offen       ; E0501000 80020E07
*   buffer_load_dword v8, v8, s[8:11], 0 offen        ; E0501000 80020808
    v_mad_u32_u24 v11, s3, 48, 24                     ; D1C3000B 02616003
    v_or_b32_e32 v7, 12, v7                           ; 280E0E8C
    v_mad_u32_u24 v12, s3, 48, 28                     ; D1C3000C 02716003
*   buffer_load_dword v7, v7, s[8:11], 0 offen        ; E0501000 80020707
    v_mad_u32_u24 v9, s3, 48, 32                      ; D1C30009 02816003
    buffer_load_dword v11, v11, s[8:11], 0 offen      ; E0501000 80020B0B
    v_mad_u32_u24 v13, s3, 48, 36                     ; D1C3000D 02916003
*   buffer_load_dword v12, v12, s[8:11], 0 offen      ; E0501000 80020C0C
*   buffer_load_dword v9, v9, s[8:11], 0 offen        ; E0501000 80020909
    ...

where Windows codegen is:

label_0028:
  s_cmp_eq_i32 s0, 0                            // 00A0: BF008000
  s_cbranch_scc1 label_0052                     // 00A4: BF850028
  s_and_b32 s1, s3, 0x                          // 00A8: 8601FF03
  s_ff1_i32_b32 s4, s0                          // 00B0: BE841000
  s_andn2_b32 s1, s1, 0x3fff                    // 00B4: 8901FF01 3FFF
  s_mul_i32 s5, s4, 48                          // 00BC: 9205B004
  s_mov_b32 s12, s2                             // 00C0: BE8C0002
  s_mov_b32 s13, s1                             // 00C4: BE8D0001
  s_movk_i32 s14, 0x                            // 00C8: B00E
  s_mov_b32 s15, 0x00024fac                     // 00CC: BE8F00FF 00024FAC
  s_buffer_load_dwordx8 s[16:23], s[12:15], s5  // 00D4: C02C0406 0005
  s_add_u32 s1, s5, 32                          // 00DC: 8001A005
  s_buffer_load_dwordx4 s[12:15], s[12:15], s1  // 00E0: C0280306 0001
  s_lshl_b32 s1, 1, s4                          // 00E8: 8E010481
  s_xor_b32 s0, s0, s1                          // 00EC: 88000100
  s_waitcnt vmcnt(0) & lgkmcnt(0)               // 00F0: BF8C0070
...
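
For readers who do not read GCN assembly, here is a rough C model of what the scalar loop in the Windows listing appears to do: take the lowest set bit of a wave-uniform mask, turn the bit index into a byte offset, load from there, then clear the bit. This is only a sketch based on my reading of the listing. The function name walk_mask and the constant ELEMENT_STRIDE are made up for illustration, and the 48-byte stride is inferred from the "* 48" multiplies, not taken from the shader source.

```c
#include <stdint.h>

/* Assumed element size in bytes, inferred from the "* 48" in both listings. */
#define ELEMENT_STRIDE 48u

/* Index of the lowest set bit; the caller guarantees mask != 0.
 * Plays the role of s_ff1_i32_b32 in the listing. */
static unsigned find_first_one(uint32_t mask)
{
    unsigned bit = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        bit++;
    }
    return bit;
}

/* Hypothetical helper: visits every set bit of `mask`, lowest first, and
 * stores the byte offset of the matching element in `offsets` (which must
 * have room for 32 entries). Returns the number of offsets written. */
static unsigned walk_mask(uint32_t mask, uint32_t *offsets)
{
    unsigned count = 0;
    while (mask != 0) {                           /* s_cmp_eq_i32 + s_cbranch_scc1 */
        unsigned bit = find_first_one(mask);      /* s_ff1_i32_b32 */
        offsets[count++] = bit * ELEMENT_STRIDE;  /* s_mul_i32; the scalar loads would use this offset */
        mask ^= UINT32_C(1) << bit;               /* s_lshl_b32 + s_xor_b32 */
    }
    return count;
}
```

Since the mask and the bit index are the same for every lane, everything in this loop can stay in SGPRs, which is what the Windows listing does. The RADV listing computes the same offsets with v_mul/v_mad and then uses buffer_load with a VGPR offset.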

The subgroupOr is also implemented strangely; the generated code is similar to
what AMDVLK produces, i.e. this:

v_mov_b32_dpp v7, v7  quad_perm:[1,0,3,2] row_mask:0xf bank_mask:0xf ; 7E0E02FA FF00B107
v_or_b32_e32 v5, v5, v7                                              ; 280A0F05
v_mov_b32_e32 v7, v5                                                 ; 7E0E0305
s_nop 1                                                              ; BF81
v_mov_b32_dpp v7, v7  quad_perm