On Fri, Oct 31, 2025 at 10:41 AM Tvrtko Ursulin <[email protected]> wrote: > > > On 31/10/2025 14:21, Alex Deucher wrote: > > On Fri, Oct 31, 2025 at 4:44 AM Tvrtko Ursulin > > <[email protected]> wrote: > >> > >> > >> On 24/10/2025 18:03, Alex Deucher wrote: > >>> On Fri, Oct 24, 2025 at 12:36 PM Tvrtko Ursulin > >>> <[email protected]> wrote: > >>>> > >>>> > >>>> On 24/10/2025 17:11, Alex Deucher wrote: > >>>>> On Wed, Oct 8, 2025 at 10:28 AM Tvrtko Ursulin > >>>>> <[email protected]> wrote: > >>>>>> > >>>>>> > >>>>>> On 08/10/2025 15:08, Alex Deucher wrote: > >>>>>>> Applied the series. Thanks! > >>>>>> > >>>>>> Thank you and fingers crossed I did not fat finger anything that your > >>>>>> CI > >>>>>> wouldn't find! > >>>>> > >>>>> Sorry for the late reply. This series broke the VCN IB tests on at > >>>>> least navi4x. The issue is a GPUVM page fault caused by VCN when > >>>>> running the IB tests. > >>>>> I don't see any obvious problems with the patches, but I haven't had a > >>>>> chance to dig too much further. > >>>> > >>>> Oh I see the problem.. sorry about that. > >>>> > >>>> I missed the fact that after amdgpu_vcn_unified_ring_ib_header() I need > >>>> to update the pointer. > >>>> > >>>> Good news is that only VCN has this pattern. > >>>> > >>>> Okay to send a fix next week? Would you need a fixup or a replacement > >>>> patch in case you want to revert it in the meantime? > >>> > >>> Fixup is fine. I'll squash it into the original VCN patch and give it > >>> a test next week. > >> > >> How did it go? I noticed amd-staging-drm-next currently has the original > >> buggy version. > > > > Those were reverted. > > > >> > >> Given other patches from the series are also missing perhaps you are > >> testing it in stages? > > > > I was running it through CI again and noticed that polaris seems to > > regress with these changes, so likely related to the VCE or UVD v6 > > changes. > > Not sure I follow and sorry if there is still some confusion. > > I did not spot anything obviously wrong in VCE and UVD v6. > > But currently a broken version of "drm/amdgpu: More compact VCN IB > emission" is in the staging branch. It needs to be squashed with > "drm/amdgpu: Fixup VCN IB emission".
It's already been reverted: https://gitlab.freedesktop.org/agd5f/linux/-/commit/bdb17ea8a391885c884d4e8dfb27a1e22d3d03ce I was planning to land the whole series along with the VCN fix squashed into the original, but the CI for polaris keeps failing with this set. I haven't had time to dig further yet. Alex > > Regards, > > Tvrtko > > > > Alex > > > >> > >> Regards, > >> > >> Tvrtko > >> > >>>>>> I wish something similar could be done for amdgpu_ring_write too, but > >>>>>> that one is waiting on Christian to, AFAIR, become idle enough to > >>>>>> untangle some ptr masking issues. > >>>>>> > >>>>>> Regards, > >>>>>> > >>>>>> Tvrtko > >>>>>> > >>>>>>> On Thu, Sep 11, 2025 at 7:42 AM Tvrtko Ursulin > >>>>>>> <[email protected]> wrote: > >>>>>>>> > >>>>>>>> In short, this series mostly does a lot of replacing of this pattern: > >>>>>>>> > >>>>>>>> ib->ptr[ib->length_dw++] = > >>>>>>>> SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) | > >>>>>>>> SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR); > >>>>>>>> ib->ptr[ib->length_dw++] = lower_32_bits(pe); > >>>>>>>> ib->ptr[ib->length_dw++] = upper_32_bits(pe); > >>>>>>>> ib->ptr[ib->length_dw++] = ndw - 1; > >>>>>>>> for (; ndw > 0; ndw -= 2) { > >>>>>>>> ib->ptr[ib->length_dw++] = lower_32_bits(value); > >>>>>>>> ib->ptr[ib->length_dw++] = upper_32_bits(value); > >>>>>>>> value += incr; > >>>>>>>> } > >>>>>>>> > >>>>>>>> With this one: > >>>>>>>> > >>>>>>>> u32 *ptr = &ib->ptr[ib->length_dw]; > >>>>>>>> > >>>>>>>> *ptr++ = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) | > >>>>>>>> SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR); > >>>>>>>> *ptr++ = lower_32_bits(pe); > >>>>>>>> *ptr++ = upper_32_bits(pe); > >>>>>>>> *ptr++ = ndw - 1; > >>>>>>>> for (; ndw > 0; ndw -= 2) { > >>>>>>>> *ptr++ = lower_32_bits(value); > >>>>>>>> *ptr++ = upper_32_bits(value); > >>>>>>>> value += incr; > >>>>>>>> } > >>>>>>>> > >>>>>>>> ib->length_dw = ptr - ib->ptr; > >>>>>>>> > >>>>>>>> Latter avoids register reloads and length updates on every dword > >>>>>>>> written, and on > >>>>>>>> the overall makes the IB emission much more compact: > >>>>>>>> > >>>>>>>> add/remove: 0/1 grow/shrink: 10/58 up/down: 260/-6598 (-6338) > >>>>>>>> Function old new delta > >>>>>>>> sdma_v7_0_ring_pad_ib 99 127 +28 > >>>>>>>> sdma_v6_0_ring_pad_ib 99 127 +28 > >>>>>>>> sdma_v5_2_ring_pad_ib 99 127 +28 > >>>>>>>> sdma_v5_0_ring_pad_ib 99 127 +28 > >>>>>>>> sdma_v4_4_2_ring_pad_ib 99 127 +28 > >>>>>>>> sdma_v4_0_ring_pad_ib 99 127 +28 > >>>>>>>> sdma_v3_0_ring_pad_ib 99 127 +28 > >>>>>>>> sdma_v2_4_ring_pad_ib 99 127 +28 > >>>>>>>> cik_sdma_ring_pad_ib 99 127 +28 > >>>>>>>> si_dma_ring_pad_ib 36 44 +8 > >>>>>>>> amdgpu_ring_generic_pad_ib 56 52 -4 > >>>>>>>> si_dma_emit_fill_buffer 108 71 -37 > >>>>>>>> si_dma_vm_write_pte 158 115 -43 > >>>>>>>> amdgpu_vcn_dec_sw_send_msg 810 767 -43 > >>>>>>>> si_dma_vm_copy_pte 137 87 -50 > >>>>>>>> si_dma_emit_copy_buffer 134 84 -50 > >>>>>>>> sdma_v3_0_vm_write_pte 163 102 -61 > >>>>>>>> sdma_v2_4_vm_write_pte 163 102 -61 > >>>>>>>> cik_sdma_vm_write_pte 163 102 -61 > >>>>>>>> sdma_v7_0_vm_write_pte 168 105 -63 > >>>>>>>> sdma_v7_0_emit_fill_buffer 119 56 -63 > >>>>>>>> sdma_v6_0_vm_write_pte 168 105 -63 > >>>>>>>> sdma_v6_0_emit_fill_buffer 119 56 -63 > >>>>>>>> sdma_v5_2_vm_write_pte 168 105 -63 > >>>>>>>> sdma_v5_2_emit_fill_buffer 119 56 -63 > >>>>>>>> sdma_v5_0_vm_write_pte 168 105 -63 > >>>>>>>> sdma_v5_0_emit_fill_buffer 119 56 -63 > >>>>>>>> sdma_v4_4_2_vm_write_pte 168 105 -63 > >>>>>>>> sdma_v4_4_2_emit_fill_buffer 119 56 -63 > >>>>>>>> sdma_v4_0_vm_write_pte 168 105 -63 > >>>>>>>> sdma_v4_0_emit_fill_buffer 119 56 -63 > >>>>>>>> sdma_v3_0_emit_fill_buffer 116 53 -63 > >>>>>>>> sdma_v2_4_emit_fill_buffer 116 53 -63 > >>>>>>>> cik_sdma_emit_fill_buffer 116 53 -63 > >>>>>>>> sdma_v6_0_emit_copy_buffer 169 76 -93 > >>>>>>>> sdma_v5_2_emit_copy_buffer 169 76 -93 > >>>>>>>> sdma_v5_0_emit_copy_buffer 169 76 -93 > >>>>>>>> sdma_v4_4_2_emit_copy_buffer 169 76 -93 > >>>>>>>> sdma_v4_0_emit_copy_buffer 169 76 -93 > >>>>>>>> sdma_v3_0_vm_copy_pte 158 64 -94 > >>>>>>>> sdma_v3_0_emit_copy_buffer 155 61 -94 > >>>>>>>> sdma_v2_4_vm_copy_pte 158 64 -94 > >>>>>>>> sdma_v2_4_emit_copy_buffer 155 61 -94 > >>>>>>>> cik_sdma_vm_copy_pte 158 64 -94 > >>>>>>>> cik_sdma_emit_copy_buffer 155 61 -94 > >>>>>>>> sdma_v6_0_vm_copy_pte 163 68 -95 > >>>>>>>> sdma_v5_2_vm_copy_pte 163 68 -95 > >>>>>>>> sdma_v5_0_vm_copy_pte 163 68 -95 > >>>>>>>> sdma_v4_4_2_vm_copy_pte 163 68 -95 > >>>>>>>> sdma_v4_0_vm_copy_pte 163 68 -95 > >>>>>>>> sdma_v7_0_vm_copy_pte 183 75 -108 > >>>>>>>> sdma_v7_0_emit_copy_buffer 317 202 -115 > >>>>>>>> si_dma_vm_set_pte_pde 338 214 -124 > >>>>>>>> amdgpu_vce_get_destroy_msg 784 652 -132 > >>>>>>>> sdma_v7_0_vm_set_pte_pde 218 72 -146 > >>>>>>>> sdma_v6_0_vm_set_pte_pde 218 72 -146 > >>>>>>>> sdma_v5_2_vm_set_pte_pde 218 72 -146 > >>>>>>>> sdma_v5_0_vm_set_pte_pde 218 72 -146 > >>>>>>>> sdma_v4_4_2_vm_set_pte_pde 218 72 -146 > >>>>>>>> sdma_v4_0_vm_set_pte_pde 218 72 -146 > >>>>>>>> sdma_v3_0_vm_set_pte_pde 215 69 -146 > >>>>>>>> sdma_v2_4_vm_set_pte_pde 215 69 -146 > >>>>>>>> cik_sdma_vm_set_pte_pde 215 69 -146 > >>>>>>>> amdgpu_vcn_unified_ring_ib_header 172 - -172 > >>>>>>>> gfx_v9_4_2_run_shader.constprop 739 532 -207 > >>>>>>>> uvd_v6_0_enc_ring_test_ib 1464 1162 -302 > >>>>>>>> uvd_v7_0_enc_ring_test_ib 1464 1138 -326 > >>>>>>>> amdgpu_vce_ring_test_ib 1357 936 -421 > >>>>>>>> amdgpu_vcn_enc_ring_test_ib 2042 1524 -518 > >>>>>>>> Total: Before=9262623, After=9256285, chg -0.07% > >>>>>>>> > >>>>>>>> * Notice how _pad_ib functions have grown. I think the compiler used > >>>>>>>> the > >>>>>>>> opportunity to unroll the loops. > >>>>>>>> > >>>>>>>> ** Series was only smoke tested on the Steam Deck. > >>>>>>>> > >>>>>>>> Tvrtko Ursulin (16): > >>>>>>>> drm/amdgpu: Use memset32 for IB padding > >>>>>>>> drm/amdgpu: More compact VCE IB emission > >>>>>>>> drm/amdgpu: More compact VCN IB emission > >>>>>>>> drm/amdgpu: More compact UVD 6 IB emission > >>>>>>>> drm/amdgpu: More compact UVD 7 IB emission > >>>>>>>> drm/amdgpu: More compact SI SDMA emission > >>>>>>>> drm/amdgpu: More compact CIK SDMA IB emission > >>>>>>>> drm/amdgpu: More compact GFX 9.4.2 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 2.4 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 3.0 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 4.0 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 4.4.2 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 5.0 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 5.2 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 6.0 IB emission > >>>>>>>> drm/amdgpu: More compact SDMA 7.0 IB emission > >>>>>>>> > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 12 ++- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 90 +++++++++-------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 101 > >>>>>>>> ++++++++++--------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 105 > >>>>>>>> ++++++++++++-------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 46 ++++----- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 108 > >>>>>>>> ++++++++++++-------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 108 > >>>>>>>> ++++++++++++-------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 109 > >>>>>>>> ++++++++++++--------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 108 > >>>>>>>> ++++++++++++-------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 106 > >>>>>>>> ++++++++++++-------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 110 > >>>>>>>> ++++++++++++--------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 110 > >>>>>>>> ++++++++++++--------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 119 > >>>>>>>> +++++++++++++---------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/si_dma.c | 84 +++++++++------- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 66 +++++++------ > >>>>>>>> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 66 +++++++------ > >>>>>>>> 16 files changed, 849 insertions(+), 599 deletions(-) > >>>>>>>> > >>>>>>>> -- > >>>>>>>> 2.48.0 > >>>>>>>> > >>>>>> > >>>> > >> >
