Applied the series.  Thanks!

Alex

On Thu, Sep 11, 2025 at 7:42 AM Tvrtko Ursulin
<[email protected]> wrote:
>
> In short, this series mostly does a lot of replacing of this pattern:
>
>        ib->ptr[ib->length_dw++] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
>                SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
>        ib->ptr[ib->length_dw++] = lower_32_bits(pe);
>        ib->ptr[ib->length_dw++] = upper_32_bits(pe);
>        ib->ptr[ib->length_dw++] = ndw - 1;
>        for (; ndw > 0; ndw -= 2) {
>                ib->ptr[ib->length_dw++] = lower_32_bits(value);
>                ib->ptr[ib->length_dw++] = upper_32_bits(value);
>                value += incr;
>        }
>
> With this one:
>
>        u32 *ptr = &ib->ptr[ib->length_dw];
>
>        *ptr++ = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
>                 SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
>        *ptr++ = lower_32_bits(pe);
>        *ptr++ = upper_32_bits(pe);
>        *ptr++ = ndw - 1;
>        for (; ndw > 0; ndw -= 2) {
>                *ptr++ = lower_32_bits(value);
>                *ptr++ = upper_32_bits(value);
>                value += incr;
>        }
>
>        ib->length_dw = ptr - ib->ptr;
>
> The latter avoids register reloads and length updates on every dword written,
> and overall makes the IB emission much more compact:
>
> add/remove: 0/1 grow/shrink: 10/58 up/down: 260/-6598 (-6338)
> Function                                     old     new   delta
> sdma_v7_0_ring_pad_ib                         99     127     +28
> sdma_v6_0_ring_pad_ib                         99     127     +28
> sdma_v5_2_ring_pad_ib                         99     127     +28
> sdma_v5_0_ring_pad_ib                         99     127     +28
> sdma_v4_4_2_ring_pad_ib                       99     127     +28
> sdma_v4_0_ring_pad_ib                         99     127     +28
> sdma_v3_0_ring_pad_ib                         99     127     +28
> sdma_v2_4_ring_pad_ib                         99     127     +28
> cik_sdma_ring_pad_ib                          99     127     +28
> si_dma_ring_pad_ib                            36      44      +8
> amdgpu_ring_generic_pad_ib                    56      52      -4
> si_dma_emit_fill_buffer                      108      71     -37
> si_dma_vm_write_pte                          158     115     -43
> amdgpu_vcn_dec_sw_send_msg                   810     767     -43
> si_dma_vm_copy_pte                           137      87     -50
> si_dma_emit_copy_buffer                      134      84     -50
> sdma_v3_0_vm_write_pte                       163     102     -61
> sdma_v2_4_vm_write_pte                       163     102     -61
> cik_sdma_vm_write_pte                        163     102     -61
> sdma_v7_0_vm_write_pte                       168     105     -63
> sdma_v7_0_emit_fill_buffer                   119      56     -63
> sdma_v6_0_vm_write_pte                       168     105     -63
> sdma_v6_0_emit_fill_buffer                   119      56     -63
> sdma_v5_2_vm_write_pte                       168     105     -63
> sdma_v5_2_emit_fill_buffer                   119      56     -63
> sdma_v5_0_vm_write_pte                       168     105     -63
> sdma_v5_0_emit_fill_buffer                   119      56     -63
> sdma_v4_4_2_vm_write_pte                     168     105     -63
> sdma_v4_4_2_emit_fill_buffer                 119      56     -63
> sdma_v4_0_vm_write_pte                       168     105     -63
> sdma_v4_0_emit_fill_buffer                   119      56     -63
> sdma_v3_0_emit_fill_buffer                   116      53     -63
> sdma_v2_4_emit_fill_buffer                   116      53     -63
> cik_sdma_emit_fill_buffer                    116      53     -63
> sdma_v6_0_emit_copy_buffer                   169      76     -93
> sdma_v5_2_emit_copy_buffer                   169      76     -93
> sdma_v5_0_emit_copy_buffer                   169      76     -93
> sdma_v4_4_2_emit_copy_buffer                 169      76     -93
> sdma_v4_0_emit_copy_buffer                   169      76     -93
> sdma_v3_0_vm_copy_pte                        158      64     -94
> sdma_v3_0_emit_copy_buffer                   155      61     -94
> sdma_v2_4_vm_copy_pte                        158      64     -94
> sdma_v2_4_emit_copy_buffer                   155      61     -94
> cik_sdma_vm_copy_pte                         158      64     -94
> cik_sdma_emit_copy_buffer                    155      61     -94
> sdma_v6_0_vm_copy_pte                        163      68     -95
> sdma_v5_2_vm_copy_pte                        163      68     -95
> sdma_v5_0_vm_copy_pte                        163      68     -95
> sdma_v4_4_2_vm_copy_pte                      163      68     -95
> sdma_v4_0_vm_copy_pte                        163      68     -95
> sdma_v7_0_vm_copy_pte                        183      75    -108
> sdma_v7_0_emit_copy_buffer                   317     202    -115
> si_dma_vm_set_pte_pde                        338     214    -124
> amdgpu_vce_get_destroy_msg                   784     652    -132
> sdma_v7_0_vm_set_pte_pde                     218      72    -146
> sdma_v6_0_vm_set_pte_pde                     218      72    -146
> sdma_v5_2_vm_set_pte_pde                     218      72    -146
> sdma_v5_0_vm_set_pte_pde                     218      72    -146
> sdma_v4_4_2_vm_set_pte_pde                   218      72    -146
> sdma_v4_0_vm_set_pte_pde                     218      72    -146
> sdma_v3_0_vm_set_pte_pde                     215      69    -146
> sdma_v2_4_vm_set_pte_pde                     215      69    -146
> cik_sdma_vm_set_pte_pde                      215      69    -146
> amdgpu_vcn_unified_ring_ib_header            172       -    -172
> gfx_v9_4_2_run_shader.constprop              739     532    -207
> uvd_v6_0_enc_ring_test_ib                   1464    1162    -302
> uvd_v7_0_enc_ring_test_ib                   1464    1138    -326
> amdgpu_vce_ring_test_ib                     1357     936    -421
> amdgpu_vcn_enc_ring_test_ib                 2042    1524    -518
> Total: Before=9262623, After=9256285, chg -0.07%
>
> * Notice how _pad_ib functions have grown. I think the compiler used the
> opportunity to unroll the loops.
>
> ** Series was only smoke tested on the Steam Deck.
>
> Tvrtko Ursulin (16):
>   drm/amdgpu: Use memset32 for IB padding
>   drm/amdgpu: More compact VCE IB emission
>   drm/amdgpu: More compact VCN IB emission
>   drm/amdgpu: More compact UVD 6 IB emission
>   drm/amdgpu: More compact UVD 7 IB emission
>   drm/amdgpu: More compact SI SDMA emission
>   drm/amdgpu: More compact CIK SDMA IB emission
>   drm/amdgpu: More compact GFX 9.4.2 IB emission
>   drm/amdgpu: More compact SDMA 2.4 IB emission
>   drm/amdgpu: More compact SDMA 3.0 IB emission
>   drm/amdgpu: More compact SDMA 4.0 IB emission
>   drm/amdgpu: More compact SDMA 4.4.2 IB emission
>   drm/amdgpu: More compact SDMA 5.0 IB emission
>   drm/amdgpu: More compact SDMA 5.2 IB emission
>   drm/amdgpu: More compact SDMA 6.0 IB emission
>   drm/amdgpu: More compact SDMA 7.0 IB emission
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  |  90 +++++++++--------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c  | 101 ++++++++++---------
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c    | 105 ++++++++++++--------
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c  |  46 ++++-----
>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   | 108 ++++++++++++--------
>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   | 108 ++++++++++++--------
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 109 ++++++++++++---------
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 108 ++++++++++++--------
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   | 106 ++++++++++++--------
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   | 110 ++++++++++++---------
>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   | 110 ++++++++++++---------
>  drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c   | 119 +++++++++++++----------
>  drivers/gpu/drm/amd/amdgpu/si_dma.c      |  84 +++++++++-------
>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c    |  66 +++++++------
>  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c    |  66 +++++++------
>  16 files changed, 849 insertions(+), 599 deletions(-)
>
> --
> 2.48.0
>
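
For anyone following along outside the tree, the two emission patterns quoted
above can be compared in a small standalone sketch. Everything here is an
assumption made for illustration: struct sketch_ib, SKETCH_HDR, lo32()/hi32()
and emit_patterns_match() are hypothetical stand-ins, not the driver's types,
opcodes, or helpers. It only shows that the cursor form writes the same dwords
and ends with the same length as the indexed form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's IB: only the two fields the
 * quoted snippets touch. */
struct sketch_ib {
	uint32_t *ptr;      /* start of the dword buffer */
	uint32_t length_dw; /* dwords emitted so far */
};

/* Placeholder header value, not a real SDMA opcode encoding. */
#define SKETCH_HDR 0x2u

static uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Old pattern: every store indexes through and updates ib->length_dw. */
static void emit_indexed(struct sketch_ib *ib, uint64_t pe, uint64_t value,
			 uint32_t ndw, uint32_t incr)
{
	ib->ptr[ib->length_dw++] = SKETCH_HDR;
	ib->ptr[ib->length_dw++] = lo32(pe);
	ib->ptr[ib->length_dw++] = hi32(pe);
	ib->ptr[ib->length_dw++] = ndw - 1;
	for (; ndw > 0; ndw -= 2) {
		ib->ptr[ib->length_dw++] = lo32(value);
		ib->ptr[ib->length_dw++] = hi32(value);
		value += incr;
	}
}

/* New pattern: advance a local cursor, write the length back once. */
static void emit_cursor(struct sketch_ib *ib, uint64_t pe, uint64_t value,
			uint32_t ndw, uint32_t incr)
{
	uint32_t *ptr = &ib->ptr[ib->length_dw];

	*ptr++ = SKETCH_HDR;
	*ptr++ = lo32(pe);
	*ptr++ = hi32(pe);
	*ptr++ = ndw - 1;
	for (; ndw > 0; ndw -= 2) {
		*ptr++ = lo32(value);
		*ptr++ = hi32(value);
		value += incr;
	}

	ib->length_dw = (uint32_t)(ptr - ib->ptr);
}

/* Returns 1 when both patterns produce identical buffers and lengths.
 * ndw must be even (the loops step by 2) and small enough to fit the
 * 32-dword scratch buffers; anything else is rejected with 0. */
int emit_patterns_match(uint32_t ndw)
{
	uint32_t a[32] = { 0 }, b[32] = { 0 };
	struct sketch_ib ia = { a, 0 }, ib = { b, 0 };

	if (ndw % 2 || ndw > 28)
		return 0;

	emit_indexed(&ia, 0x123456789abcULL, 0x1000, ndw, 0x1000);
	emit_cursor(&ib, 0x123456789abcULL, 0x1000, ndw, 0x1000);

	return ia.length_dw == ib.length_dw &&
	       ia.length_dw == ndw + 4 &&
	       memcmp(a, b, sizeof(a)) == 0;
}
```

Calling emit_patterns_match() with an even ndw exercises both variants on
identical inputs. Any code size difference between the two depends on the
compiler and flags, so this sketch says nothing about the bloat-o-meter
numbers in the cover letter.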
