On Mon, 2025-09-15 at 09:25 -0400, Alex Deucher wrote:
> On Mon, Sep 15, 2025 at 9:23 AM <[email protected]> wrote:
> > 
> > On Mon, 2025-09-15 at 09:07 -0400, Alex Deucher wrote:
> > > On Sat, Sep 13, 2025 at 1:28 AM <[email protected]> wrote:
> > > > 
> > > > On Fri, 2025-09-12 at 15:38 -0400, Alex Deucher wrote:
> > > > > On Thu, Sep 11, 2025 at 2:18 PM Alex Deucher <[email protected]> wrote:
> > > > > > 
> > > > > > On Thu, Sep 11, 2025 at 1:25 PM Alex Deucher <[email protected]> wrote:
> > > > > > > 
> > > > > > > SDMA 5.2.x has increased transfer limits.
> > > > > > > 
> > > > > > > v2: fix harder, use shifts to make it more obvious
> > > > > > > 
> > > > > > > Signed-off-by: Alex Deucher <[email protected]>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 4 ++--
> > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > > index a8e39df29f343..bf227eadbe487 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > > > > @@ -2065,11 +2065,11 @@ static void
> > > > > > > sdma_v5_2_emit_fill_buffer(struct amdgpu_ib *ib,
> > > > > > >  }
> > > > > > > 
> > > > > > > >  static const struct amdgpu_buffer_funcs sdma_v5_2_buffer_funcs = {
> > > > > > > -       .copy_max_bytes = 0x400000,
> > > > > > > +       .copy_max_bytes = 1 << 30,
> > > > > > >         .copy_num_dw = 7,
> > > > > > >         .emit_copy_buffer = sdma_v5_2_emit_copy_buffer,
> > > > > > > 
> > > > > > > -       .fill_max_bytes = 0x400000,
> > > > > > > +       .fill_max_bytes = 1 << 30,
> > > > > > 
> > > > > > > The hw docs and PAL differ here.  I've asked the hw designers to
> > > > > > > clarify.
> > > > > 
> > > > > > The HW team verified that the hardware supports the extended range
> > > > > > for both copies and fills.
> > > > > 
> > > > > Alex
> > > > 
> > > > Hi Alex,
> > > > 
> > > > This is still pretty confusing.
> > > > > According to PAL, only SDMA v6 has the extended range for fills,
> > > > > and it can do 4 bytes fewer.
> > > > 
> > > > Are you sure that PAL is wrong about this?
> > > 
> > > > I can talk to the PAL team as well.  I talked to the hardware
> > > > designers and they verified that the hardware has the higher limit.
> > > > It's the same underlying hardware so it makes sense that both copies
> > > > and fills would have the same limit.
> > 
> > > I am worried that they found some issues with it and that's why they
> > > didn't enable it.
> 
> No objections from me.
> 
> > 
> > > 
> > > > 
> > > > For reference:
> > > > https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/sdma/gfx10/gfx10DmaCmdBuffer.cpp
> > > > https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/sdma/gfx12/gfx12DmaCmdBuffer.cpp
> > > > 
> > > > MaxCopySize on GFX10: 1 << 22
> > > > MaxCopySize on GFX10.3+: 1 << 30
> > > > 
> > > > > MaxFillSize on GFX10-10.3: ((1 << 22) - 1) & ~3
> > > > > MaxFillSize on GFX11+: ((1 << 30) - 1) & ~3
> > > > > This makes sense because they program the count field in the packet
> > > > > using the byte count minus four.
> > > 
> > > > They are setting up the packet for dword fill rather than byte fill,
> > > > so count becomes dword aligned:
> > > 
> > > >     // Because we will set fillsize = 2, the low two bits of our "count" are ignored, but we still program
> > > >     // this in terms of bytes.
> > 
> > > Yes. I thought we would prefer to use dword fill in the kernel as well,
> > > isn't that the case? I thought dword fill is faster and everything that
> > > the kernel fills would be already dword aligned. Am I missing something?
> 
> Yes, the kernel could be switched to use dword fills as well.
> 

Oh, I see.
I thought the kernel already used dword fills.
If it doesn't, I can write a patch to do so.
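
For reference, here is a minimal sketch of the fill limits quoted from PAL above, written out in C. The helper name max_dword_fill_bytes is hypothetical (it is not a kernel or PAL function), and the 22-bit and 30-bit widths are simply the values quoted in this thread, not something checked against the hardware docs. The parentheses are explicit because in C `1 << 22 - 1` parses as `1 << 21`.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the kernel or PAL.
 *
 * Returns the largest dword-aligned byte count that fits in a count
 * field of count_bits bits (count_bits must be < 32). The low two bits
 * are masked off because a dword fill ignores them.
 */
static uint32_t max_dword_fill_bytes(unsigned int count_bits)
{
	return ((UINT32_C(1) << count_bits) - 1) & ~UINT32_C(3);
}
```

With these assumptions, max_dword_fill_bytes(22) is 0x3FFFFC and max_dword_fill_bytes(30) is 0x3FFFFFFC, which is 4 bytes fewer than 1 << 22 and 1 << 30 respectively.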
