On Fri, Oct 24, 2025 at 12:36 PM Tvrtko Ursulin <[email protected]> wrote: > > > On 24/10/2025 17:11, Alex Deucher wrote: > > On Wed, Oct 8, 2025 at 10:28 AM Tvrtko Ursulin > > <[email protected]> wrote: > >> > >> > >> On 08/10/2025 15:08, Alex Deucher wrote: > >>> Applied the series. Thanks! > >> > >> Thank you and fingers crossed I did not fat finger anything that your CI > >> wouldn't find! > > > > Sorry for the late reply. This series broke the VCN IB tests on at > > least navi4x. The issue is a GPUVM page fault caused by VCN when > > running the IB tests. > > I don't see any obvious problems with the patches, but I haven't had a > > chance to dig too much further. > > Oh I see the problem.. sorry about that. > > I missed the fact that after amdgpu_vcn_unified_ring_ib_header() I need > to update the pointer. > > Good news is that only VCN has this pattern. > > Okay to send a fix next week? Would you need a fixup or a replacement > patch in case you want to revert it in the meantime?
Fixup is fine. I'll squash it into the original VCN patch and give it a test next week. Thanks! Alex > > Regards, > > Tvrtko > > >> I wish something similar could be done for amdgpu_ring_write too, but > >> that one is waiting on Christian to, AFAIR, become idle enough to > >> untangle some ptr masking issues. > >> > >> Regards, > >> > >> Tvrtko > >> > >>> On Thu, Sep 11, 2025 at 7:42 AM Tvrtko Ursulin > >>> <[email protected]> wrote: > >>>> > >>>> In short, this series mostly does a lot of replacing of this pattern: > >>>> > >>>> ib->ptr[ib->length_dw++] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) | > >>>> SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR); > >>>> ib->ptr[ib->length_dw++] = lower_32_bits(pe); > >>>> ib->ptr[ib->length_dw++] = upper_32_bits(pe); > >>>> ib->ptr[ib->length_dw++] = ndw - 1; > >>>> for (; ndw > 0; ndw -= 2) { > >>>> ib->ptr[ib->length_dw++] = lower_32_bits(value); > >>>> ib->ptr[ib->length_dw++] = upper_32_bits(value); > >>>> value += incr; > >>>> } > >>>> > >>>> With this one: > >>>> > >>>> u32 *ptr = &ib->ptr[ib->length_dw]; > >>>> > >>>> *ptr++ = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) | > >>>> SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR); > >>>> *ptr++ = lower_32_bits(pe); > >>>> *ptr++ = upper_32_bits(pe); > >>>> *ptr++ = ndw - 1; > >>>> for (; ndw > 0; ndw -= 2) { > >>>> *ptr++ = lower_32_bits(value); > >>>> *ptr++ = upper_32_bits(value); > >>>> value += incr; > >>>> } > >>>> > >>>> ib->length_dw = ptr - ib->ptr; > >>>> > >>>> Latter avoids register reloads and length updates on every dword > >>>> written, and on > >>>> the overall makes the IB emission much more compact: > >>>> > >>>> add/remove: 0/1 grow/shrink: 10/58 up/down: 260/-6598 (-6338) > >>>> Function old new delta > >>>> sdma_v7_0_ring_pad_ib 99 127 +28 > >>>> sdma_v6_0_ring_pad_ib 99 127 +28 > >>>> sdma_v5_2_ring_pad_ib 99 127 +28 > >>>> sdma_v5_0_ring_pad_ib 99 127 +28 > >>>> sdma_v4_4_2_ring_pad_ib 99 127 +28 > >>>> sdma_v4_0_ring_pad_ib 99 127 +28 > >>>> sdma_v3_0_ring_pad_ib 99 127 +28 > >>>> sdma_v2_4_ring_pad_ib 99 127 +28 > >>>> cik_sdma_ring_pad_ib 99 127 +28 > >>>> si_dma_ring_pad_ib 36 44 +8 > >>>> amdgpu_ring_generic_pad_ib 56 52 -4 > >>>> si_dma_emit_fill_buffer 108 71 -37 > >>>> si_dma_vm_write_pte 158 115 -43 > >>>> amdgpu_vcn_dec_sw_send_msg 810 767 -43 > >>>> si_dma_vm_copy_pte 137 87 -50 > >>>> si_dma_emit_copy_buffer 134 84 -50 > >>>> sdma_v3_0_vm_write_pte 163 102 -61 > >>>> sdma_v2_4_vm_write_pte 163 102 -61 > >>>> cik_sdma_vm_write_pte 163 102 -61 > >>>> sdma_v7_0_vm_write_pte 168 105 -63 > >>>> sdma_v7_0_emit_fill_buffer 119 56 -63 > >>>> sdma_v6_0_vm_write_pte 168 105 -63 > >>>> sdma_v6_0_emit_fill_buffer 119 56 -63 > >>>> sdma_v5_2_vm_write_pte 168 105 -63 > >>>> sdma_v5_2_emit_fill_buffer 119 56 -63 > >>>> sdma_v5_0_vm_write_pte 168 105 -63 > >>>> sdma_v5_0_emit_fill_buffer 119 56 -63 > >>>> sdma_v4_4_2_vm_write_pte 168 105 -63 > >>>> sdma_v4_4_2_emit_fill_buffer 119 56 -63 > >>>> sdma_v4_0_vm_write_pte 168 105 -63 > >>>> sdma_v4_0_emit_fill_buffer 119 56 -63 > >>>> sdma_v3_0_emit_fill_buffer 116 53 -63 > >>>> sdma_v2_4_emit_fill_buffer 116 53 -63 > >>>> cik_sdma_emit_fill_buffer 116 53 -63 > >>>> sdma_v6_0_emit_copy_buffer 169 76 -93 > >>>> sdma_v5_2_emit_copy_buffer 169 76 -93 > >>>> sdma_v5_0_emit_copy_buffer 169 76 -93 > >>>> sdma_v4_4_2_emit_copy_buffer 169 76 -93 > >>>> sdma_v4_0_emit_copy_buffer 169 76 -93 > >>>> sdma_v3_0_vm_copy_pte 158 64 -94 > >>>> sdma_v3_0_emit_copy_buffer 155 61 -94 > >>>> sdma_v2_4_vm_copy_pte 158 64 -94 > >>>> sdma_v2_4_emit_copy_buffer 155 61 -94 > >>>> cik_sdma_vm_copy_pte 158 64 -94 > >>>> cik_sdma_emit_copy_buffer 155 61 -94 > >>>> sdma_v6_0_vm_copy_pte 163 68 -95 > >>>> sdma_v5_2_vm_copy_pte 163 68 -95 > >>>> sdma_v5_0_vm_copy_pte 163 68 -95 > >>>> sdma_v4_4_2_vm_copy_pte 163 68 -95 > >>>> sdma_v4_0_vm_copy_pte 163 68 -95 > >>>> sdma_v7_0_vm_copy_pte 183 75 -108 > >>>> sdma_v7_0_emit_copy_buffer 317 202 -115 > >>>> si_dma_vm_set_pte_pde 338 214 -124 > >>>> amdgpu_vce_get_destroy_msg 784 652 -132 > >>>> sdma_v7_0_vm_set_pte_pde 218 72 -146 > >>>> sdma_v6_0_vm_set_pte_pde 218 72 -146 > >>>> sdma_v5_2_vm_set_pte_pde 218 72 -146 > >>>> sdma_v5_0_vm_set_pte_pde 218 72 -146 > >>>> sdma_v4_4_2_vm_set_pte_pde 218 72 -146 > >>>> sdma_v4_0_vm_set_pte_pde 218 72 -146 > >>>> sdma_v3_0_vm_set_pte_pde 215 69 -146 > >>>> sdma_v2_4_vm_set_pte_pde 215 69 -146 > >>>> cik_sdma_vm_set_pte_pde 215 69 -146 > >>>> amdgpu_vcn_unified_ring_ib_header 172 - -172 > >>>> gfx_v9_4_2_run_shader.constprop 739 532 -207 > >>>> uvd_v6_0_enc_ring_test_ib 1464 1162 -302 > >>>> uvd_v7_0_enc_ring_test_ib 1464 1138 -326 > >>>> amdgpu_vce_ring_test_ib 1357 936 -421 > >>>> amdgpu_vcn_enc_ring_test_ib 2042 1524 -518 > >>>> Total: Before=9262623, After=9256285, chg -0.07% > >>>> > >>>> * Notice how _pad_ib functions have grown. I think the compiler used the > >>>> opportunity to unroll the loops. > >>>> > >>>> ** Series was only smoke tested on the Steam Deck. > >>>> > >>>> Tvrtko Ursulin (16): > >>>> drm/amdgpu: Use memset32 for IB padding > >>>> drm/amdgpu: More compact VCE IB emission > >>>> drm/amdgpu: More compact VCN IB emission > >>>> drm/amdgpu: More compact UVD 6 IB emission > >>>> drm/amdgpu: More compact UVD 7 IB emission > >>>> drm/amdgpu: More compact SI SDMA emission > >>>> drm/amdgpu: More compact CIK SDMA IB emission > >>>> drm/amdgpu: More compact GFX 9.4.2 IB emission > >>>> drm/amdgpu: More compact SDMA 2.4 IB emission > >>>> drm/amdgpu: More compact SDMA 3.0 IB emission > >>>> drm/amdgpu: More compact SDMA 4.0 IB emission > >>>> drm/amdgpu: More compact SDMA 4.4.2 IB emission > >>>> drm/amdgpu: More compact SDMA 5.0 IB emission > >>>> drm/amdgpu: More compact SDMA 5.2 IB emission > >>>> drm/amdgpu: More compact SDMA 6.0 IB emission > >>>> drm/amdgpu: More compact SDMA 7.0 IB emission > >>>> > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 12 ++- > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 90 +++++++++-------- > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 101 ++++++++++--------- > >>>> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 105 ++++++++++++-------- > >>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 46 ++++----- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 108 ++++++++++++-------- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 108 ++++++++++++-------- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 109 ++++++++++++--------- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 108 ++++++++++++-------- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 106 ++++++++++++-------- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 110 ++++++++++++--------- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 110 ++++++++++++--------- > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 119 +++++++++++++---------- > >>>> drivers/gpu/drm/amd/amdgpu/si_dma.c | 84 +++++++++------- > >>>> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 66 +++++++------ > >>>> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 66 +++++++------ > >>>> 16 files changed, 849 insertions(+), 599 deletions(-) > >>>> > >>>> -- > >>>> 2.48.0 > >>>> > >> >
