On 31/10/2025 14:48, Alex Deucher wrote:
On Fri, Oct 31, 2025 at 10:41 AM Tvrtko Ursulin
<[email protected]> wrote:


On 31/10/2025 14:21, Alex Deucher wrote:
On Fri, Oct 31, 2025 at 4:44 AM Tvrtko Ursulin
<[email protected]> wrote:


On 24/10/2025 18:03, Alex Deucher wrote:
On Fri, Oct 24, 2025 at 12:36 PM Tvrtko Ursulin
<[email protected]> wrote:


On 24/10/2025 17:11, Alex Deucher wrote:
On Wed, Oct 8, 2025 at 10:28 AM Tvrtko Ursulin
<[email protected]> wrote:


On 08/10/2025 15:08, Alex Deucher wrote:
Applied the series.  Thanks!

Thank you and fingers crossed I did not fat finger anything that your CI
wouldn't find!

Sorry for the late reply.  This series broke the VCN IB tests on at
least navi4x.  The issue is a GPUVM page fault caused by VCN when
running the IB tests.
I don't see any obvious problems with the patches, but I haven't had a
chance to dig too much further.

Oh I see the problem.. sorry about that.

I missed the fact that after amdgpu_vcn_unified_ring_ib_header() I need
to update the pointer.

Good news is that only VCN has this pattern.

Okay to send a fix next week? Would you need a fixup or a replacement
patch in case you want to revert it in the meantime?

Fixup is fine.  I'll squash it into the original VCN patch and give it
a test next week.

How did it go? I noticed amd-staging-drm-next currently has the original
buggy version.

Those were reverted.


Given other patches from the series are also missing perhaps you are
testing it in stages?

I was running it through CI again and noticed that polaris seems to
regress with these changes, so likely related to the VCE or UVD v6
changes.

Not sure I follow and sorry if there is still some confusion.

I did not spot anything obviously wrong in VCE and UVD v6.

But currently a broken version of "drm/amdgpu: More compact VCN IB
emission" is in the staging branch. It needs to be squashed with
"drm/amdgpu: Fixup VCN IB emission".

It's already been reverted:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/bdb17ea8a391885c884d4e8dfb27a1e22d3d03ce

I was planning to land the whole series along with the VCN fix
squashed into the original, but the CI for polaris keeps failing with
this set.  I haven't had time to dig further yet.

Ah you probably mean VCN is not relevant for Polaris? What do you recommend for sort of a code name to IP block version decoding? If something like that exists in the public domain.

Regards,

Tvrtko

I wish something similar could be done for amdgpu_ring_write too, but
that one is waiting on Christian to, AFAIR, become idle enough to
untangle some ptr masking issues.

Regards,

Tvrtko

On Thu, Sep 11, 2025 at 7:42 AM Tvrtko Ursulin
<[email protected]> wrote:

In short, this series mostly does a lot of replacing of this pattern:

            ib->ptr[ib->length_dw++] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
                    SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
            ib->ptr[ib->length_dw++] = lower_32_bits(pe);
            ib->ptr[ib->length_dw++] = upper_32_bits(pe);
            ib->ptr[ib->length_dw++] = ndw - 1;
            for (; ndw > 0; ndw -= 2) {
                   ib->ptr[ib->length_dw++] = lower_32_bits(value);
                   ib->ptr[ib->length_dw++] = upper_32_bits(value);
                    value += incr;
            }

With this one:

            u32 *ptr = &ib->ptr[ib->length_dw];

            *ptr++ = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
                     SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
            *ptr++ = lower_32_bits(pe);
            *ptr++ = upper_32_bits(pe);
            *ptr++ = ndw - 1;
            for (; ndw > 0; ndw -= 2) {
                    *ptr++ = lower_32_bits(value);
                    *ptr++ = upper_32_bits(value);
                     value += incr;
             }

            ib->length_dw = ptr - ib->ptr;

Latter avoids register reloads and length updates on every dword written, and on
the overall makes the IB emission much more compact:

add/remove: 0/1 grow/shrink: 10/58 up/down: 260/-6598 (-6338)
Function                                     old     new   delta
sdma_v7_0_ring_pad_ib                         99     127     +28
sdma_v6_0_ring_pad_ib                         99     127     +28
sdma_v5_2_ring_pad_ib                         99     127     +28
sdma_v5_0_ring_pad_ib                         99     127     +28
sdma_v4_4_2_ring_pad_ib                       99     127     +28
sdma_v4_0_ring_pad_ib                         99     127     +28
sdma_v3_0_ring_pad_ib                         99     127     +28
sdma_v2_4_ring_pad_ib                         99     127     +28
cik_sdma_ring_pad_ib                          99     127     +28
si_dma_ring_pad_ib                            36      44      +8
amdgpu_ring_generic_pad_ib                    56      52      -4
si_dma_emit_fill_buffer                      108      71     -37
si_dma_vm_write_pte                          158     115     -43
amdgpu_vcn_dec_sw_send_msg                   810     767     -43
si_dma_vm_copy_pte                           137      87     -50
si_dma_emit_copy_buffer                      134      84     -50
sdma_v3_0_vm_write_pte                       163     102     -61
sdma_v2_4_vm_write_pte                       163     102     -61
cik_sdma_vm_write_pte                        163     102     -61
sdma_v7_0_vm_write_pte                       168     105     -63
sdma_v7_0_emit_fill_buffer                   119      56     -63
sdma_v6_0_vm_write_pte                       168     105     -63
sdma_v6_0_emit_fill_buffer                   119      56     -63
sdma_v5_2_vm_write_pte                       168     105     -63
sdma_v5_2_emit_fill_buffer                   119      56     -63
sdma_v5_0_vm_write_pte                       168     105     -63
sdma_v5_0_emit_fill_buffer                   119      56     -63
sdma_v4_4_2_vm_write_pte                     168     105     -63
sdma_v4_4_2_emit_fill_buffer                 119      56     -63
sdma_v4_0_vm_write_pte                       168     105     -63
sdma_v4_0_emit_fill_buffer                   119      56     -63
sdma_v3_0_emit_fill_buffer                   116      53     -63
sdma_v2_4_emit_fill_buffer                   116      53     -63
cik_sdma_emit_fill_buffer                    116      53     -63
sdma_v6_0_emit_copy_buffer                   169      76     -93
sdma_v5_2_emit_copy_buffer                   169      76     -93
sdma_v5_0_emit_copy_buffer                   169      76     -93
sdma_v4_4_2_emit_copy_buffer                 169      76     -93
sdma_v4_0_emit_copy_buffer                   169      76     -93
sdma_v3_0_vm_copy_pte                        158      64     -94
sdma_v3_0_emit_copy_buffer                   155      61     -94
sdma_v2_4_vm_copy_pte                        158      64     -94
sdma_v2_4_emit_copy_buffer                   155      61     -94
cik_sdma_vm_copy_pte                         158      64     -94
cik_sdma_emit_copy_buffer                    155      61     -94
sdma_v6_0_vm_copy_pte                        163      68     -95
sdma_v5_2_vm_copy_pte                        163      68     -95
sdma_v5_0_vm_copy_pte                        163      68     -95
sdma_v4_4_2_vm_copy_pte                      163      68     -95
sdma_v4_0_vm_copy_pte                        163      68     -95
sdma_v7_0_vm_copy_pte                        183      75    -108
sdma_v7_0_emit_copy_buffer                   317     202    -115
si_dma_vm_set_pte_pde                        338     214    -124
amdgpu_vce_get_destroy_msg                   784     652    -132
sdma_v7_0_vm_set_pte_pde                     218      72    -146
sdma_v6_0_vm_set_pte_pde                     218      72    -146
sdma_v5_2_vm_set_pte_pde                     218      72    -146
sdma_v5_0_vm_set_pte_pde                     218      72    -146
sdma_v4_4_2_vm_set_pte_pde                   218      72    -146
sdma_v4_0_vm_set_pte_pde                     218      72    -146
sdma_v3_0_vm_set_pte_pde                     215      69    -146
sdma_v2_4_vm_set_pte_pde                     215      69    -146
cik_sdma_vm_set_pte_pde                      215      69    -146
amdgpu_vcn_unified_ring_ib_header            172       -    -172
gfx_v9_4_2_run_shader.constprop              739     532    -207
uvd_v6_0_enc_ring_test_ib                   1464    1162    -302
uvd_v7_0_enc_ring_test_ib                   1464    1138    -326
amdgpu_vce_ring_test_ib                     1357     936    -421
amdgpu_vcn_enc_ring_test_ib                 2042    1524    -518
Total: Before=9262623, After=9256285, chg -0.07%

* Notice how _pad_ib functions have grown. I think the compiler used the
opportunity to unroll the loops.

** Series was only smoke tested on the Steam Deck.

Tvrtko Ursulin (16):
       drm/amdgpu: Use memset32 for IB padding
       drm/amdgpu: More compact VCE IB emission
       drm/amdgpu: More compact VCN IB emission
       drm/amdgpu: More compact UVD 6 IB emission
       drm/amdgpu: More compact UVD 7 IB emission
       drm/amdgpu: More compact SI SDMA emission
       drm/amdgpu: More compact CIK SDMA IB emission
       drm/amdgpu: More compact GFX 9.4.2 IB emission
       drm/amdgpu: More compact SDMA 2.4 IB emission
       drm/amdgpu: More compact SDMA 3.0 IB emission
       drm/amdgpu: More compact SDMA 4.0 IB emission
       drm/amdgpu: More compact SDMA 4.4.2 IB emission
       drm/amdgpu: More compact SDMA 5.0 IB emission
       drm/amdgpu: More compact SDMA 5.2 IB emission
       drm/amdgpu: More compact SDMA 6.0 IB emission
       drm/amdgpu: More compact SDMA 7.0 IB emission

      drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++-
      drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  |  90 +++++++++--------
      drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c  | 101 ++++++++++---------
      drivers/gpu/drm/amd/amdgpu/cik_sdma.c    | 105 ++++++++++++--------
      drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c  |  46 ++++-----
      drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   | 108 ++++++++++++--------
      drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   | 108 ++++++++++++--------
      drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 109 ++++++++++++---------
      drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 108 ++++++++++++--------
      drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   | 106 ++++++++++++--------
      drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   | 110 ++++++++++++---------
      drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   | 110 ++++++++++++---------
      drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c   | 119 +++++++++++++----------
      drivers/gpu/drm/amd/amdgpu/si_dma.c      |  84 +++++++++-------
      drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c    |  66 +++++++------
      drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c    |  66 +++++++------
      16 files changed, 849 insertions(+), 599 deletions(-)

--
2.48.0






Reply via email to