On Fri, Oct 24, 2025 at 12:36 PM Tvrtko Ursulin
<[email protected]> wrote:
>
>
> On 24/10/2025 17:11, Alex Deucher wrote:
> > On Wed, Oct 8, 2025 at 10:28 AM Tvrtko Ursulin
> > <[email protected]> wrote:
> >>
> >>
> >> On 08/10/2025 15:08, Alex Deucher wrote:
> >>> Applied the series.  Thanks!
> >>
> >> Thank you and fingers crossed I did not fat finger anything that your CI
> >> wouldn't find!
> >
> > Sorry for the late reply.  This series broke the VCN IB tests on at
> > least navi4x.  The issue is a GPUVM page fault caused by VCN when
> > running the IB tests.
> > I don't see any obvious problems with the patches, but I haven't had a
> > chance to dig too much further.
>
> Oh I see the problem.. sorry about that.
>
> I missed the fact that after amdgpu_vcn_unified_ring_ib_header() I need
> to update the pointer.
>
> Good news is that only VCN has this pattern.
>
> Okay to send a fix next week? Would you need a fixup or a replacement
> patch in case you want to revert it in the meantime?

Fixup is fine.  I'll squash it into the original VCN patch and give it
a test next week.

Thanks!

Alex

>
> Regards,
>
> Tvrtko
>
> >> I wish something similar could be done for amdgpu_ring_write too, but
> >> that one is waiting on Christian to, AFAIR, become idle enough to
> >> untangle some ptr masking issues.
> >>
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>> On Thu, Sep 11, 2025 at 7:42 AM Tvrtko Ursulin
> >>> <[email protected]> wrote:
> >>>>
> >>>> In short, this series mostly does a lot of replacing of this pattern:
> >>>>
> >>>>          ib->ptr[ib->length_dw++] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
> >>>>                  SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
> >>>>          ib->ptr[ib->length_dw++] = lower_32_bits(pe);
> >>>>          ib->ptr[ib->length_dw++] = upper_32_bits(pe);
> >>>>          ib->ptr[ib->length_dw++] = ndw - 1;
> >>>>          for (; ndw > 0; ndw -= 2) {
> >>>>                 ib->ptr[ib->length_dw++] = lower_32_bits(value);
> >>>>                 ib->ptr[ib->length_dw++] = upper_32_bits(value);
> >>>>                  value += incr;
> >>>>          }
> >>>>
> >>>> With this one:
> >>>>
> >>>>          u32 *ptr = &ib->ptr[ib->length_dw];
> >>>>
> >>>>          *ptr++ = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
> >>>>                   SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
> >>>>          *ptr++ = lower_32_bits(pe);
> >>>>          *ptr++ = upper_32_bits(pe);
> >>>>          *ptr++ = ndw - 1;
> >>>>          for (; ndw > 0; ndw -= 2) {
> >>>>                  *ptr++ = lower_32_bits(value);
> >>>>                  *ptr++ = upper_32_bits(value);
> >>>>                   value += incr;
> >>>>           }
> >>>>
> >>>>          ib->length_dw = ptr - ib->ptr;
> >>>>
> >>>> Latter avoids register reloads and length updates on every dword 
> >>>> written, and on
> >>>> the overall makes the IB emission much more compact:
> >>>>
> >>>> add/remove: 0/1 grow/shrink: 10/58 up/down: 260/-6598 (-6338)
> >>>> Function                                     old     new   delta
> >>>> sdma_v7_0_ring_pad_ib                         99     127     +28
> >>>> sdma_v6_0_ring_pad_ib                         99     127     +28
> >>>> sdma_v5_2_ring_pad_ib                         99     127     +28
> >>>> sdma_v5_0_ring_pad_ib                         99     127     +28
> >>>> sdma_v4_4_2_ring_pad_ib                       99     127     +28
> >>>> sdma_v4_0_ring_pad_ib                         99     127     +28
> >>>> sdma_v3_0_ring_pad_ib                         99     127     +28
> >>>> sdma_v2_4_ring_pad_ib                         99     127     +28
> >>>> cik_sdma_ring_pad_ib                          99     127     +28
> >>>> si_dma_ring_pad_ib                            36      44      +8
> >>>> amdgpu_ring_generic_pad_ib                    56      52      -4
> >>>> si_dma_emit_fill_buffer                      108      71     -37
> >>>> si_dma_vm_write_pte                          158     115     -43
> >>>> amdgpu_vcn_dec_sw_send_msg                   810     767     -43
> >>>> si_dma_vm_copy_pte                           137      87     -50
> >>>> si_dma_emit_copy_buffer                      134      84     -50
> >>>> sdma_v3_0_vm_write_pte                       163     102     -61
> >>>> sdma_v2_4_vm_write_pte                       163     102     -61
> >>>> cik_sdma_vm_write_pte                        163     102     -61
> >>>> sdma_v7_0_vm_write_pte                       168     105     -63
> >>>> sdma_v7_0_emit_fill_buffer                   119      56     -63
> >>>> sdma_v6_0_vm_write_pte                       168     105     -63
> >>>> sdma_v6_0_emit_fill_buffer                   119      56     -63
> >>>> sdma_v5_2_vm_write_pte                       168     105     -63
> >>>> sdma_v5_2_emit_fill_buffer                   119      56     -63
> >>>> sdma_v5_0_vm_write_pte                       168     105     -63
> >>>> sdma_v5_0_emit_fill_buffer                   119      56     -63
> >>>> sdma_v4_4_2_vm_write_pte                     168     105     -63
> >>>> sdma_v4_4_2_emit_fill_buffer                 119      56     -63
> >>>> sdma_v4_0_vm_write_pte                       168     105     -63
> >>>> sdma_v4_0_emit_fill_buffer                   119      56     -63
> >>>> sdma_v3_0_emit_fill_buffer                   116      53     -63
> >>>> sdma_v2_4_emit_fill_buffer                   116      53     -63
> >>>> cik_sdma_emit_fill_buffer                    116      53     -63
> >>>> sdma_v6_0_emit_copy_buffer                   169      76     -93
> >>>> sdma_v5_2_emit_copy_buffer                   169      76     -93
> >>>> sdma_v5_0_emit_copy_buffer                   169      76     -93
> >>>> sdma_v4_4_2_emit_copy_buffer                 169      76     -93
> >>>> sdma_v4_0_emit_copy_buffer                   169      76     -93
> >>>> sdma_v3_0_vm_copy_pte                        158      64     -94
> >>>> sdma_v3_0_emit_copy_buffer                   155      61     -94
> >>>> sdma_v2_4_vm_copy_pte                        158      64     -94
> >>>> sdma_v2_4_emit_copy_buffer                   155      61     -94
> >>>> cik_sdma_vm_copy_pte                         158      64     -94
> >>>> cik_sdma_emit_copy_buffer                    155      61     -94
> >>>> sdma_v6_0_vm_copy_pte                        163      68     -95
> >>>> sdma_v5_2_vm_copy_pte                        163      68     -95
> >>>> sdma_v5_0_vm_copy_pte                        163      68     -95
> >>>> sdma_v4_4_2_vm_copy_pte                      163      68     -95
> >>>> sdma_v4_0_vm_copy_pte                        163      68     -95
> >>>> sdma_v7_0_vm_copy_pte                        183      75    -108
> >>>> sdma_v7_0_emit_copy_buffer                   317     202    -115
> >>>> si_dma_vm_set_pte_pde                        338     214    -124
> >>>> amdgpu_vce_get_destroy_msg                   784     652    -132
> >>>> sdma_v7_0_vm_set_pte_pde                     218      72    -146
> >>>> sdma_v6_0_vm_set_pte_pde                     218      72    -146
> >>>> sdma_v5_2_vm_set_pte_pde                     218      72    -146
> >>>> sdma_v5_0_vm_set_pte_pde                     218      72    -146
> >>>> sdma_v4_4_2_vm_set_pte_pde                   218      72    -146
> >>>> sdma_v4_0_vm_set_pte_pde                     218      72    -146
> >>>> sdma_v3_0_vm_set_pte_pde                     215      69    -146
> >>>> sdma_v2_4_vm_set_pte_pde                     215      69    -146
> >>>> cik_sdma_vm_set_pte_pde                      215      69    -146
> >>>> amdgpu_vcn_unified_ring_ib_header            172       -    -172
> >>>> gfx_v9_4_2_run_shader.constprop              739     532    -207
> >>>> uvd_v6_0_enc_ring_test_ib                   1464    1162    -302
> >>>> uvd_v7_0_enc_ring_test_ib                   1464    1138    -326
> >>>> amdgpu_vce_ring_test_ib                     1357     936    -421
> >>>> amdgpu_vcn_enc_ring_test_ib                 2042    1524    -518
> >>>> Total: Before=9262623, After=9256285, chg -0.07%
> >>>>
> >>>> * Notice how _pad_ib functions have grown. I think the compiler used the
> >>>> opportunity to unroll the loops.
> >>>>
> >>>> ** Series was only smoke tested on the Steam Deck.
> >>>>
> >>>> Tvrtko Ursulin (16):
> >>>>     drm/amdgpu: Use memset32 for IB padding
> >>>>     drm/amdgpu: More compact VCE IB emission
> >>>>     drm/amdgpu: More compact VCN IB emission
> >>>>     drm/amdgpu: More compact UVD 6 IB emission
> >>>>     drm/amdgpu: More compact UVD 7 IB emission
> >>>>     drm/amdgpu: More compact SI SDMA emission
> >>>>     drm/amdgpu: More compact CIK SDMA IB emission
> >>>>     drm/amdgpu: More compact GFX 9.4.2 IB emission
> >>>>     drm/amdgpu: More compact SDMA 2.4 IB emission
> >>>>     drm/amdgpu: More compact SDMA 3.0 IB emission
> >>>>     drm/amdgpu: More compact SDMA 4.0 IB emission
> >>>>     drm/amdgpu: More compact SDMA 4.4.2 IB emission
> >>>>     drm/amdgpu: More compact SDMA 5.0 IB emission
> >>>>     drm/amdgpu: More compact SDMA 5.2 IB emission
> >>>>     drm/amdgpu: More compact SDMA 6.0 IB emission
> >>>>     drm/amdgpu: More compact SDMA 7.0 IB emission
> >>>>
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++-
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  |  90 +++++++++--------
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c  | 101 ++++++++++---------
> >>>>    drivers/gpu/drm/amd/amdgpu/cik_sdma.c    | 105 ++++++++++++--------
> >>>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c  |  46 ++++-----
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   | 108 ++++++++++++--------
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   | 108 ++++++++++++--------
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 109 ++++++++++++---------
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 108 ++++++++++++--------
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   | 106 ++++++++++++--------
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   | 110 ++++++++++++---------
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   | 110 ++++++++++++---------
> >>>>    drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c   | 119 +++++++++++++----------
> >>>>    drivers/gpu/drm/amd/amdgpu/si_dma.c      |  84 +++++++++-------
> >>>>    drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c    |  66 +++++++------
> >>>>    drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c    |  66 +++++++------
> >>>>    16 files changed, 849 insertions(+), 599 deletions(-)
> >>>>
> >>>> --
> >>>> 2.48.0
> >>>>
> >>
>

Reply via email to