On Mon, Sep 15, 2025 at 9:23 AM <[email protected]> wrote:
>
> On Mon, 2025-09-15 at 09:07 -0400, Alex Deucher wrote:
> > On Sat, Sep 13, 2025 at 1:28 AM <[email protected]> wrote:
> > >
> > > On Fri, 2025-09-12 at 15:38 -0400, Alex Deucher wrote:
> > > > On Thu, Sep 11, 2025 at 2:18 PM Alex Deucher <[email protected]> wrote:
> > > > >
> > > > > On Thu, Sep 11, 2025 at 1:25 PM Alex Deucher <[email protected]> wrote:
> > > > > >
> > > > > > SDMA 5.2.x has increased transfer limits.
> > > > > >
> > > > > > v2: fix harder, use shifts to make it more obvious
> > > > > >
> > > > > > Signed-off-by: Alex Deucher <[email protected]>
> > > > > > ---
> > > > > >  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > index a8e39df29f343..bf227eadbe487 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > @@ -2065,11 +2065,11 @@ static void sdma_v5_2_emit_fill_buffer(struct amdgpu_ib *ib,
> > > > > >  }
> > > > > >
> > > > > >  static const struct amdgpu_buffer_funcs sdma_v5_2_buffer_funcs = {
> > > > > > -       .copy_max_bytes = 0x400000,
> > > > > > +       .copy_max_bytes = 1 << 30,
> > > > > >         .copy_num_dw = 7,
> > > > > >         .emit_copy_buffer = sdma_v5_2_emit_copy_buffer,
> > > > > >
> > > > > > -       .fill_max_bytes = 0x400000,
> > > > > > +       .fill_max_bytes = 1 << 30,
> > > > >
> > > > > The hw docs and PAL differ here.  I've asked the hw designers to
> > > > > clarify.
> > > >
> > > > The HW team verified that the hardware supports the extended range
> > > > for both copies and fills.
> > > >
> > > > Alex
> > >
> > > Hi Alex,
> > >
> > > This is still pretty confusing.
> > > According to PAL, only SDMA v6 has the extended range for fills,
> > > and its fill limit is 4 bytes lower than the copy limit.
> > >
> > > Are you sure that PAL is wrong about this?
> >
> > I can talk to the PAL team as well.  I talked to the hardware
> > designers and they verified that the hardware has the higher limit.
> > It's the same underlying hardware so it makes sense that both copies
> > and fills would have the same limit.
>
> I am worried that they found some issues with it and that's why they
> didn't enable it.

No objections from me.

>
> >
> > >
> > > For reference:
> > > https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/sdma/gfx10/gfx10DmaCmdBuffer.cpp
> > > https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/sdma/gfx12/gfx12DmaCmdBuffer.cpp
> > >
> > > MaxCopySize on GFX10: 1 << 22
> > > MaxCopySize on GFX10.3+: 1 << 30
> > >
> > > MaxFillSize on GFX10-10.3: ((1 << 22) - 1) & ~3
> > > MaxFillSize on GFX11+: ((1 << 30) - 1) & ~3
> > > This makes sense because they program the count field in the packet
> > > using the byte count minus four.
> >
> > They are setting up the packet for dword fill rather than byte fill,
> > so the count becomes dword aligned:
> >
> >     // Because we will set fillsize = 2, the low two bits of our "count" are ignored, but we still program
> >     // this in terms of bytes.
>
> Yes. I thought we would prefer to use dword fill in the kernel as well;
> isn't that the case? I thought dword fill is faster and that everything
> the kernel fills would already be dword aligned. Am I missing
> something?

Yes, the kernel could be switched to use dword fills as well.

Alex
