On Fri, Oct 31, 2025 at 4:44 AM Tvrtko Ursulin <[email protected]> wrote: > > > On 24/10/2025 18:03, Alex Deucher wrote: > > On Fri, Oct 24, 2025 at 12:36 PM Tvrtko Ursulin > > <[email protected]> wrote: > >> > >> > >> On 24/10/2025 17:11, Alex Deucher wrote: > >>> On Wed, Oct 8, 2025 at 10:28 AM Tvrtko Ursulin > >>> <[email protected]> wrote: > >>>> > >>>> > >>>> On 08/10/2025 15:08, Alex Deucher wrote: > >>>>> Applied the series. Thanks! > >>>> > >>>> Thank you and fingers crossed I did not fat finger anything that your CI > >>>> wouldn't find! > >>> > >>> Sorry for the late reply. This series broke the VCN IB tests on at > >>> least navi4x. The issue is a GPUVM page fault caused by VCN when > >>> running the IB tests. > >>> I don't see any obvious problems with the patches, but I haven't had a > >>> chance to dig too much further. > >> > >> Oh I see the problem.. sorry about that. > >> > >> I missed the fact that after amdgpu_vcn_unified_ring_ib_header() I need > >> to update the pointer. > >> > >> Good news is that only VCN has this pattern. > >> > >> Okay to send a fix next week? Would you need a fixup or a replacement > >> patch in case you want to revert it in the meantime? > > > > Fixup is fine. I'll squash it into the original VCN patch and give it > > a test next week. > > How did it go? I noticed amd-staging-drm-next currently has the original > buggy version.
Those were reverted. > > Given other patches from the series are also missing perhaps you are > testing it in stages? I was running it through CI again and noticed that polaris seems to regress with these changes, so likely related to the VCE or UVD v6 changes. Alex > > Regards, > > Tvrtko > > >>>> I wish something similar could be done for amdgpu_ring_write too, but > >>>> that one is waiting on Christian to, AFAIR, become idle enough to > >>>> untangle some ptr masking issues. > >>>> > >>>> Regards, > >>>> > >>>> Tvrtko > >>>> > >>>>> On Thu, Sep 11, 2025 at 7:42 AM Tvrtko Ursulin > >>>>> <[email protected]> wrote: > >>>>>> > >>>>>> In short, this series mostly does a lot of replacing of this pattern: > >>>>>> > >>>>>> ib->ptr[ib->length_dw++] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) > >>>>>> | > >>>>>> SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR); > >>>>>> ib->ptr[ib->length_dw++] = lower_32_bits(pe); > >>>>>> ib->ptr[ib->length_dw++] = upper_32_bits(pe); > >>>>>> ib->ptr[ib->length_dw++] = ndw - 1; > >>>>>> for (; ndw > 0; ndw -= 2) { > >>>>>> ib->ptr[ib->length_dw++] = lower_32_bits(value); > >>>>>> ib->ptr[ib->length_dw++] = upper_32_bits(value); > >>>>>> value += incr; > >>>>>> } > >>>>>> > >>>>>> With this one: > >>>>>> > >>>>>> u32 *ptr = &ib->ptr[ib->length_dw]; > >>>>>> > >>>>>> *ptr++ = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) | > >>>>>> SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR); > >>>>>> *ptr++ = lower_32_bits(pe); > >>>>>> *ptr++ = upper_32_bits(pe); > >>>>>> *ptr++ = ndw - 1; > >>>>>> for (; ndw > 0; ndw -= 2) { > >>>>>> *ptr++ = lower_32_bits(value); > >>>>>> *ptr++ = upper_32_bits(value); > >>>>>> value += incr; > >>>>>> } > >>>>>> > >>>>>> ib->length_dw = ptr - ib->ptr; > >>>>>> > >>>>>> Latter avoids register reloads and length updates on every dword > >>>>>> written, and on > >>>>>> the overall makes the IB emission much more compact: > >>>>>> > >>>>>> add/remove: 0/1 grow/shrink: 10/58 up/down: 260/-6598 (-6338) > >>>>>> Function old new delta > >>>>>> sdma_v7_0_ring_pad_ib 99 127 +28 > >>>>>> sdma_v6_0_ring_pad_ib 99 127 +28 > >>>>>> sdma_v5_2_ring_pad_ib 99 127 +28 > >>>>>> sdma_v5_0_ring_pad_ib 99 127 +28 > >>>>>> sdma_v4_4_2_ring_pad_ib 99 127 +28 > >>>>>> sdma_v4_0_ring_pad_ib 99 127 +28 > >>>>>> sdma_v3_0_ring_pad_ib 99 127 +28 > >>>>>> sdma_v2_4_ring_pad_ib 99 127 +28 > >>>>>> cik_sdma_ring_pad_ib 99 127 +28 > >>>>>> si_dma_ring_pad_ib 36 44 +8 > >>>>>> amdgpu_ring_generic_pad_ib 56 52 -4 > >>>>>> si_dma_emit_fill_buffer 108 71 -37 > >>>>>> si_dma_vm_write_pte 158 115 -43 > >>>>>> amdgpu_vcn_dec_sw_send_msg 810 767 -43 > >>>>>> si_dma_vm_copy_pte 137 87 -50 > >>>>>> si_dma_emit_copy_buffer 134 84 -50 > >>>>>> sdma_v3_0_vm_write_pte 163 102 -61 > >>>>>> sdma_v2_4_vm_write_pte 163 102 -61 > >>>>>> cik_sdma_vm_write_pte 163 102 -61 > >>>>>> sdma_v7_0_vm_write_pte 168 105 -63 > >>>>>> sdma_v7_0_emit_fill_buffer 119 56 -63 > >>>>>> sdma_v6_0_vm_write_pte 168 105 -63 > >>>>>> sdma_v6_0_emit_fill_buffer 119 56 -63 > >>>>>> sdma_v5_2_vm_write_pte 168 105 -63 > >>>>>> sdma_v5_2_emit_fill_buffer 119 56 -63 > >>>>>> sdma_v5_0_vm_write_pte 168 105 -63 > >>>>>> sdma_v5_0_emit_fill_buffer 119 56 -63 > >>>>>> sdma_v4_4_2_vm_write_pte 168 105 -63 > >>>>>> sdma_v4_4_2_emit_fill_buffer 119 56 -63 > >>>>>> sdma_v4_0_vm_write_pte 168 105 -63 > >>>>>> sdma_v4_0_emit_fill_buffer 119 56 -63 > >>>>>> sdma_v3_0_emit_fill_buffer 116 53 -63 > >>>>>> sdma_v2_4_emit_fill_buffer 116 53 -63 > >>>>>> cik_sdma_emit_fill_buffer 116 53 -63 > >>>>>> sdma_v6_0_emit_copy_buffer 169 76 -93 > >>>>>> sdma_v5_2_emit_copy_buffer 169 76 -93 > >>>>>> sdma_v5_0_emit_copy_buffer 169 76 -93 > >>>>>> sdma_v4_4_2_emit_copy_buffer 169 76 -93 > >>>>>> sdma_v4_0_emit_copy_buffer 169 76 -93 > >>>>>> sdma_v3_0_vm_copy_pte 158 64 -94 > >>>>>> sdma_v3_0_emit_copy_buffer 155 61 -94 > >>>>>> sdma_v2_4_vm_copy_pte 158 64 -94 > >>>>>> sdma_v2_4_emit_copy_buffer 155 61 -94 > >>>>>> cik_sdma_vm_copy_pte 158 64 -94 > >>>>>> cik_sdma_emit_copy_buffer 155 61 -94 > >>>>>> sdma_v6_0_vm_copy_pte 163 68 -95 > >>>>>> sdma_v5_2_vm_copy_pte 163 68 -95 > >>>>>> sdma_v5_0_vm_copy_pte 163 68 -95 > >>>>>> sdma_v4_4_2_vm_copy_pte 163 68 -95 > >>>>>> sdma_v4_0_vm_copy_pte 163 68 -95 > >>>>>> sdma_v7_0_vm_copy_pte 183 75 -108 > >>>>>> sdma_v7_0_emit_copy_buffer 317 202 -115 > >>>>>> si_dma_vm_set_pte_pde 338 214 -124 > >>>>>> amdgpu_vce_get_destroy_msg 784 652 -132 > >>>>>> sdma_v7_0_vm_set_pte_pde 218 72 -146 > >>>>>> sdma_v6_0_vm_set_pte_pde 218 72 -146 > >>>>>> sdma_v5_2_vm_set_pte_pde 218 72 -146 > >>>>>> sdma_v5_0_vm_set_pte_pde 218 72 -146 > >>>>>> sdma_v4_4_2_vm_set_pte_pde 218 72 -146 > >>>>>> sdma_v4_0_vm_set_pte_pde 218 72 -146 > >>>>>> sdma_v3_0_vm_set_pte_pde 215 69 -146 > >>>>>> sdma_v2_4_vm_set_pte_pde 215 69 -146 > >>>>>> cik_sdma_vm_set_pte_pde 215 69 -146 > >>>>>> amdgpu_vcn_unified_ring_ib_header 172 - -172 > >>>>>> gfx_v9_4_2_run_shader.constprop 739 532 -207 > >>>>>> uvd_v6_0_enc_ring_test_ib 1464 1162 -302 > >>>>>> uvd_v7_0_enc_ring_test_ib 1464 1138 -326 > >>>>>> amdgpu_vce_ring_test_ib 1357 936 -421 > >>>>>> amdgpu_vcn_enc_ring_test_ib 2042 1524 -518 > >>>>>> Total: Before=9262623, After=9256285, chg -0.07% > >>>>>> > >>>>>> * Notice how _pad_ib functions have grown. I think the compiler used > >>>>>> the > >>>>>> opportunity to unroll the loops. > >>>>>> > >>>>>> ** Series was only smoke tested on the Steam Deck. > >>>>>> > >>>>>> Tvrtko Ursulin (16): > >>>>>> drm/amdgpu: Use memset32 for IB padding > >>>>>> drm/amdgpu: More compact VCE IB emission > >>>>>> drm/amdgpu: More compact VCN IB emission > >>>>>> drm/amdgpu: More compact UVD 6 IB emission > >>>>>> drm/amdgpu: More compact UVD 7 IB emission > >>>>>> drm/amdgpu: More compact SI SDMA emission > >>>>>> drm/amdgpu: More compact CIK SDMA IB emission > >>>>>> drm/amdgpu: More compact GFX 9.4.2 IB emission > >>>>>> drm/amdgpu: More compact SDMA 2.4 IB emission > >>>>>> drm/amdgpu: More compact SDMA 3.0 IB emission > >>>>>> drm/amdgpu: More compact SDMA 4.0 IB emission > >>>>>> drm/amdgpu: More compact SDMA 4.4.2 IB emission > >>>>>> drm/amdgpu: More compact SDMA 5.0 IB emission > >>>>>> drm/amdgpu: More compact SDMA 5.2 IB emission > >>>>>> drm/amdgpu: More compact SDMA 6.0 IB emission > >>>>>> drm/amdgpu: More compact SDMA 7.0 IB emission > >>>>>> > >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 12 ++- > >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 90 +++++++++-------- > >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 101 ++++++++++--------- > >>>>>> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 105 ++++++++++++-------- > >>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 46 ++++----- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 108 ++++++++++++-------- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 108 ++++++++++++-------- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 109 > >>>>>> ++++++++++++--------- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 108 ++++++++++++-------- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 106 ++++++++++++-------- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 110 > >>>>>> ++++++++++++--------- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 110 > >>>>>> ++++++++++++--------- > >>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 119 > >>>>>> +++++++++++++---------- > >>>>>> drivers/gpu/drm/amd/amdgpu/si_dma.c | 84 +++++++++------- > >>>>>> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 66 +++++++------ > >>>>>> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 66 +++++++------ > >>>>>> 16 files changed, 849 insertions(+), 599 deletions(-) > >>>>>> > >>>>>> -- > >>>>>> 2.48.0 > >>>>>> > >>>> > >> >
