On 24/10/2025 18:03, Alex Deucher wrote:
On Fri, Oct 24, 2025 at 12:36 PM Tvrtko Ursulin
<[email protected]> wrote:
On 24/10/2025 17:11, Alex Deucher wrote:
On Wed, Oct 8, 2025 at 10:28 AM Tvrtko Ursulin
<[email protected]> wrote:
On 08/10/2025 15:08, Alex Deucher wrote:
Applied the series. Thanks!
Thank you and fingers crossed I did not fat finger anything that your CI
wouldn't find!
Sorry for the late reply. This series broke the VCN IB tests on at
least navi4x. The issue is a GPUVM page fault caused by VCN when
running the IB tests.
I don't see any obvious problems with the patches, but I haven't had a
chance to dig too much further.
Oh I see the problem.. sorry about that.
I missed the fact that after amdgpu_vcn_unified_ring_ib_header() I need
to update the pointer.
Good news is that only VCN has this pattern.
Okay to send a fix next week? Would you need a fixup or a replacement
patch in case you want to revert it in the meantime?
Fixup is fine. I'll squash it into the original VCN patch and give it
a test next week.
How did it go? I noticed amd-staging-drm-next currently has the original
buggy version.
Given other patches from the series are also missing perhaps you are
testing it in stages?
Regards,
Tvrtko
I wish something similar could be done for amdgpu_ring_write too, but
that one is waiting on Christian to, AFAIR, become idle enough to
untangle some ptr masking issues.
Regards,
Tvrtko
On Thu, Sep 11, 2025 at 7:42 AM Tvrtko Ursulin
<[email protected]> wrote:
In short, this series mostly does a lot of replacing of this pattern:
ib->ptr[ib->length_dw++] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
ib->ptr[ib->length_dw++] = lower_32_bits(pe);
ib->ptr[ib->length_dw++] = upper_32_bits(pe);
ib->ptr[ib->length_dw++] = ndw - 1;
for (; ndw > 0; ndw -= 2) {
ib->ptr[ib->length_dw++] = lower_32_bits(value);
ib->ptr[ib->length_dw++] = upper_32_bits(value);
value += incr;
}
With this one:
u32 *ptr = &ib->ptr[ib->length_dw];
*ptr++ = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
*ptr++ = lower_32_bits(pe);
*ptr++ = upper_32_bits(pe);
*ptr++ = ndw - 1;
for (; ndw > 0; ndw -= 2) {
*ptr++ = lower_32_bits(value);
*ptr++ = upper_32_bits(value);
value += incr;
}
ib->length_dw = ptr - ib->ptr;
Latter avoids register reloads and length updates on every dword written, and on
the overall makes the IB emission much more compact:
add/remove: 0/1 grow/shrink: 10/58 up/down: 260/-6598 (-6338)
Function old new delta
sdma_v7_0_ring_pad_ib 99 127 +28
sdma_v6_0_ring_pad_ib 99 127 +28
sdma_v5_2_ring_pad_ib 99 127 +28
sdma_v5_0_ring_pad_ib 99 127 +28
sdma_v4_4_2_ring_pad_ib 99 127 +28
sdma_v4_0_ring_pad_ib 99 127 +28
sdma_v3_0_ring_pad_ib 99 127 +28
sdma_v2_4_ring_pad_ib 99 127 +28
cik_sdma_ring_pad_ib 99 127 +28
si_dma_ring_pad_ib 36 44 +8
amdgpu_ring_generic_pad_ib 56 52 -4
si_dma_emit_fill_buffer 108 71 -37
si_dma_vm_write_pte 158 115 -43
amdgpu_vcn_dec_sw_send_msg 810 767 -43
si_dma_vm_copy_pte 137 87 -50
si_dma_emit_copy_buffer 134 84 -50
sdma_v3_0_vm_write_pte 163 102 -61
sdma_v2_4_vm_write_pte 163 102 -61
cik_sdma_vm_write_pte 163 102 -61
sdma_v7_0_vm_write_pte 168 105 -63
sdma_v7_0_emit_fill_buffer 119 56 -63
sdma_v6_0_vm_write_pte 168 105 -63
sdma_v6_0_emit_fill_buffer 119 56 -63
sdma_v5_2_vm_write_pte 168 105 -63
sdma_v5_2_emit_fill_buffer 119 56 -63
sdma_v5_0_vm_write_pte 168 105 -63
sdma_v5_0_emit_fill_buffer 119 56 -63
sdma_v4_4_2_vm_write_pte 168 105 -63
sdma_v4_4_2_emit_fill_buffer 119 56 -63
sdma_v4_0_vm_write_pte 168 105 -63
sdma_v4_0_emit_fill_buffer 119 56 -63
sdma_v3_0_emit_fill_buffer 116 53 -63
sdma_v2_4_emit_fill_buffer 116 53 -63
cik_sdma_emit_fill_buffer 116 53 -63
sdma_v6_0_emit_copy_buffer 169 76 -93
sdma_v5_2_emit_copy_buffer 169 76 -93
sdma_v5_0_emit_copy_buffer 169 76 -93
sdma_v4_4_2_emit_copy_buffer 169 76 -93
sdma_v4_0_emit_copy_buffer 169 76 -93
sdma_v3_0_vm_copy_pte 158 64 -94
sdma_v3_0_emit_copy_buffer 155 61 -94
sdma_v2_4_vm_copy_pte 158 64 -94
sdma_v2_4_emit_copy_buffer 155 61 -94
cik_sdma_vm_copy_pte 158 64 -94
cik_sdma_emit_copy_buffer 155 61 -94
sdma_v6_0_vm_copy_pte 163 68 -95
sdma_v5_2_vm_copy_pte 163 68 -95
sdma_v5_0_vm_copy_pte 163 68 -95
sdma_v4_4_2_vm_copy_pte 163 68 -95
sdma_v4_0_vm_copy_pte 163 68 -95
sdma_v7_0_vm_copy_pte 183 75 -108
sdma_v7_0_emit_copy_buffer 317 202 -115
si_dma_vm_set_pte_pde 338 214 -124
amdgpu_vce_get_destroy_msg 784 652 -132
sdma_v7_0_vm_set_pte_pde 218 72 -146
sdma_v6_0_vm_set_pte_pde 218 72 -146
sdma_v5_2_vm_set_pte_pde 218 72 -146
sdma_v5_0_vm_set_pte_pde 218 72 -146
sdma_v4_4_2_vm_set_pte_pde 218 72 -146
sdma_v4_0_vm_set_pte_pde 218 72 -146
sdma_v3_0_vm_set_pte_pde 215 69 -146
sdma_v2_4_vm_set_pte_pde 215 69 -146
cik_sdma_vm_set_pte_pde 215 69 -146
amdgpu_vcn_unified_ring_ib_header 172 - -172
gfx_v9_4_2_run_shader.constprop 739 532 -207
uvd_v6_0_enc_ring_test_ib 1464 1162 -302
uvd_v7_0_enc_ring_test_ib 1464 1138 -326
amdgpu_vce_ring_test_ib 1357 936 -421
amdgpu_vcn_enc_ring_test_ib 2042 1524 -518
Total: Before=9262623, After=9256285, chg -0.07%
* Notice how _pad_ib functions have grown. I think the compiler used the
opportunity to unroll the loops.
** Series was only smoke tested on the Steam Deck.
Tvrtko Ursulin (16):
drm/amdgpu: Use memset32 for IB padding
drm/amdgpu: More compact VCE IB emission
drm/amdgpu: More compact VCN IB emission
drm/amdgpu: More compact UVD 6 IB emission
drm/amdgpu: More compact UVD 7 IB emission
drm/amdgpu: More compact SI SDMA emission
drm/amdgpu: More compact CIK SDMA IB emission
drm/amdgpu: More compact GFX 9.4.2 IB emission
drm/amdgpu: More compact SDMA 2.4 IB emission
drm/amdgpu: More compact SDMA 3.0 IB emission
drm/amdgpu: More compact SDMA 4.0 IB emission
drm/amdgpu: More compact SDMA 4.4.2 IB emission
drm/amdgpu: More compact SDMA 5.0 IB emission
drm/amdgpu: More compact SDMA 5.2 IB emission
drm/amdgpu: More compact SDMA 6.0 IB emission
drm/amdgpu: More compact SDMA 7.0 IB emission
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 12 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 90 +++++++++--------
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 101 ++++++++++---------
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 105 ++++++++++++--------
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 46 ++++-----
drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 108 ++++++++++++--------
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 108 ++++++++++++--------
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 109 ++++++++++++---------
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 108 ++++++++++++--------
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 106 ++++++++++++--------
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 110 ++++++++++++---------
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 110 ++++++++++++---------
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 119 +++++++++++++----------
drivers/gpu/drm/amd/amdgpu/si_dma.c | 84 +++++++++-------
drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 66 +++++++------
drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 66 +++++++------
16 files changed, 849 insertions(+), 599 deletions(-)
--
2.48.0