Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-14 Thread Rob Clark
On Fri, May 14, 2021 at 12:54 AM Pekka Paalanen  wrote:
>
> On Wed, 12 May 2021 07:57:26 -0700
> Rob Clark  wrote:
>
> > On Wed, May 12, 2021 at 1:23 AM Pekka Paalanen  wrote:
> > >
> > > On Tue, 11 May 2021 18:44:17 +0200
> > > Daniel Vetter  wrote:
> > >
> > > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > > > From: Rob Clark 
> > > > > > > > >
> > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank 
> > > > > > > > > on "video
> > > > > > > > > mode" type displays, which is pointless and unnecessary.  Add 
> > > > > > > > > an
> > > > > > > > > optional helper vfunc to determine if a plane is attached to 
> > > > > > > > > a CRTC
> > > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Rob Clark 
>
> ...
>
> > > > > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > > > > per-crtc basis and hoist the loop and check if dirtyfb is needed 
> > > > > > > out
> > > > > > > of drm_atomic_helper_dirtyfb()
> > > > > >
> > > > > > That's still using information that userspace doesn't have, which 
> > > > > > is a
> > > > > > bit irky. We might as well go with your thing here then.
> > > > >
> > > > > arguably, this is something we should expose to userspace.. for DSI
> > > > > command-mode panels, you probably want to make a different decision
> > > > > with regard to how many buffers in your flip-chain..
> > > > >
> > > > > Possibly we should add/remove the fb_damage_clips property depending
> > > > > on the display type (ie. video/pull vs cmd/push mode)?
> > > >
> > > > I'm not sure whether atomic actually needs this exposed:
> > > > - clients will do full flips for every frame anyway, I've not heard of
> > > >   anyone seriously doing frontbuffer rendering.
> > >
> > > That may or may not be changing, depending on whether the DRM drivers
> > > will actually support tearing flips. There has been a huge amount of
> > > debate for needing tearing for Wayland [1], and while I haven't really
> > > joined that discussion, using front-buffer rendering (blits) to work
> > > around the driver inability to flip-tear might be something some people
> > > will want.
> >
> > jfwiw, there is a lot of hw that just can't do tearing pageflips.. I
> > think this probably includes most arm hw.  What is done instead is to
> > skip the pageflip and render directly to the front-buffer.
> >
> > EGL_KHR_mutable_render_buffer is a thing you might be interested in..
> > it is wired up for android on i965 and there is a WIP MR[1] for
> > mesa/st (gallium):
> >
> > Possibly it could be useful to add support for platform_wayland?
> >
> > [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10685
>
> Thanks Rob, that's interesting.
>
> I would like to say that EGL Wayland platform cannot and has no reason
> to support frontbuffer rendering in Wayland clients, because the
> compositor may be reading the current client frontbuffer at any time,
> including *not reading it again* until another update is posted via
> Wayland. So if a Wayland client is doing frontbuffer rendering, then I
> would expect it to be very likely that the window might almost never
> show a "good" picture, meaning that it is literally just the
> half-rendered frame after the current one with continuously updating
> clients.
>
> That is because a Wayland client doing frontbuffer rendering is
> completely unrelated to the Wayland compositor putting the client
> buffer on scanout.
>
> Frontbuffer rendering used by a Wayland compositor would be used for
> fallback tearing updates, where the rendering is roughly just a blit
> from a client buffer. By definition, it means blit instead of scanout
> from client buffers, so the performance/power hit is going to be...
> let's say observable.
>
> If a Wayland client did frontbuffer rendering, and then it used a
> shadow buffer of its own to minimise the level of garbage on screen by
> doing only blits into the frontbuffer, that would again be a blit. And
> then the compositor might be doing another blit because it doesn't know
> the client is doing frontbuffer rendering which would theoretically
> allow the compositor to scan out the client buffer.
>
> There emerges the need for a Wayland extension for clients to be
> telling the compositor explicitly that they are going to be doing
> frontbuffer rendering now, it is ok to put the client buffer on scanout
> even if you want to do tearing updates on hardware that cannot
> tear-flip.
>
> However, knowing that a client buffer is good for scanout is not
> sufficient for scanout in a compositor, s

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-14 Thread Pekka Paalanen
On Wed, 12 May 2021 07:57:26 -0700
Rob Clark  wrote:

> On Wed, May 12, 2021 at 1:23 AM Pekka Paalanen  wrote:
> >
> > On Tue, 11 May 2021 18:44:17 +0200
> > Daniel Vetter  wrote:
> >  
> > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:  
> > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote: 
> > > >  
> > > > >
> > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  
> > > > > wrote:  
> > > > > >
> > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > wrote:  
> > > > > > >
> > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:  
> > > > > > > > From: Rob Clark 
> > > > > > > >
> > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > > "video
> > > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > > CRTC
> > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark   

...

> > > > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > > > > of drm_atomic_helper_dirtyfb()  
> > > > >
> > > > > That's still using information that userspace doesn't have, which is a
> > > > > bit irky. We might as well go with your thing here then.  
> > > >
> > > > arguably, this is something we should expose to userspace.. for DSI
> > > > command-mode panels, you probably want to make a different decision
> > > > with regard to how many buffers in your flip-chain..
> > > >
> > > > Possibly we should add/remove the fb_damage_clips property depending
> > > > on the display type (ie. video/pull vs cmd/push mode)?  
> > >
> > > I'm not sure whether atomic actually needs this exposed:
> > > - clients will do full flips for every frame anyway, I've not heard of
> > >   anyone seriously doing frontbuffer rendering.  
> >
> > That may or may not be changing, depending on whether the DRM drivers
> > will actually support tearing flips. There has been a huge amount of
> > debate for needing tearing for Wayland [1], and while I haven't really
> > joined that discussion, using front-buffer rendering (blits) to work
> > around the driver inability to flip-tear might be something some people
> > will want.  
> 
> jfwiw, there is a lot of hw that just can't do tearing pageflips.. I
> think this probably includes most arm hw.  What is done instead is to
> skip the pageflip and render directly to the front-buffer.
> 
> EGL_KHR_mutable_render_buffer is a thing you might be interested in..
> it is wired up for android on i965 and there is a WIP MR[1] for
> mesa/st (gallium):
> 
> Possibly it could be useful to add support for platform_wayland?
> 
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10685

Thanks Rob, that's interesting.

I would like to say that EGL Wayland platform cannot and has no reason
to support frontbuffer rendering in Wayland clients, because the
compositor may be reading the current client frontbuffer at any time,
including *not reading it again* until another update is posted via
Wayland. So if a Wayland client is doing frontbuffer rendering, then I
would expect it to be very likely that the window might almost never
show a "good" picture, meaning that it is literally just the
half-rendered frame after the current one with continuously updating
clients.

That is because a Wayland client doing frontbuffer rendering is
completely unrelated to the Wayland compositor putting the client
buffer on scanout.

Frontbuffer rendering used by a Wayland compositor would be used for
fallback tearing updates, where the rendering is roughly just a blit
from a client buffer. By definition, it means blit instead of scanout
from client buffers, so the performance/power hit is going to be...
let's say observable.

If a Wayland client did frontbuffer rendering, and then it used a
shadow buffer of its own to minimise the level of garbage on screen by
doing only blits into the frontbuffer, that would again be a blit. And
then the compositor might be doing another blit because it doesn't know
the client is doing frontbuffer rendering which would theoretically
allow the compositor to scan out the client buffer.

There emerges the need for a Wayland extension for clients to be
telling the compositor explicitly that they are going to be doing
frontbuffer rendering now, it is ok to put the client buffer on scanout
even if you want to do tearing updates on hardware that cannot
tear-flip.

However, knowing that a client buffer is good for scanout is not
sufficient for scanout in a compositor, so it might still not be
scanned out. If the compositor is instead render-compositing, you have
the first problem of "likely never a good picture".

I'm sure there can be specialised use cases (e.g. a game console
Wayland compositor) where scanout can be guaranteed, but generally

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-12 Thread Rob Clark
On Wed, May 12, 2021 at 1:23 AM Pekka Paalanen  wrote:
>
> On Tue, 11 May 2021 18:44:17 +0200
> Daniel Vetter  wrote:
>
> > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> > > >
> > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > > > >
> > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > > > > >
> > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > From: Rob Clark 
> > > > > > >
> > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > "video
> > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > CRTC
> > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > >
> > > > > > > Signed-off-by: Rob Clark 
> > > > > >
> > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > legacy uapi
> > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > anything
> > > > > > beyond what userspace also has available. So adding hacks for them 
> > > > > > feels
> > > > > > really bad.
> > > > >
> > > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > > (or similar) is actually required or is a no-op.
> > > > >
> > > > > But it is perhaps less of a problem because this essentially boils
> > > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > > > the app?
> > > >
> > > > Yeah it's about not adequately batching up rendering and syncing with
> > > > hw. bare metal x11 is just especially stupid about it :-)
> > > >
> > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > either.
> > > > > > We've had this problem already on the fbcon emulation side (which 
> > > > > > also
> > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and 
> > > > > > the fix
> > > > > > there was to have a worker which batches up all the updates and 
> > > > > > avoids any
> > > > > > stalls in bad places.
> > > > >
> > > > > I'm not too worried about fbcon not being able to render faster than
> > > > > vblank.  OTOH it is a pretty big problem for x11
> > > >
> > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > > (if sometimes flushes each character, so you have to pile them up into
> > > > a single update if that's still pending).
> > > >
> > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > probably get
> > > > > > away with assuming there's only a single fb, so the implementation 
> > > > > > becomes
> > > > > > pretty simple:
> > > > > >
> > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > - if there's already a dirty fb pending on a different fb, we stall 
> > > > > > for
> > > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > > track is
> > > > > >   reset to NULL)
> > > > > > - if it's pending on the same fb we just toss away all the updates 
> > > > > > and go
> > > > > >   with a full update, since merging the clip rects is too much work 
> > > > > > :-) I
> > > > > >   think there's helpers so you could be slightly more clever and 
> > > > > > just have
> > > > > >   an overall bounding box
> > > > >
> > > > > This doesn't really fix the problem, you still end up delaying sending
> > > > > the next back-buffer to mesa
> > > >
> > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > tracking corruption is possible, but that's not the kernel's problem.
> > > > So how would anything get held up in userspace.
> > >
> > > the part about stalling if a dirtyfb is pending was what I was worried
> > > about.. but I suppose you meant the worker stalling, rather than
> > > userspace stalling (where I had interpreted it the other way around).
> > > As soon as userspace needs to stall, you're losing again.
> >
> > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> > of dirtyfb request in the kernel.
> >
> > But also I never expect userspace that uses dirtyfb to actually hit this
> > stall point (otherwise we'd need to look at this again). It would really
> > be only there as defense against abuse.
> >
> > > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > > > of drm_atomic_helper_dirtyfb()
> > > >
> > > > That's still using information that userspace doesn't have, which is a
> > > > bit irky. We might as well go with your thing here then.
> > >
> > > arguably, this is something we should expose to userspace.. for DSI
> > > command-mode panels, you probably want to make a dif

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-12 Thread Daniel Vetter
On Wed, May 12, 2021 at 11:46 AM Pekka Paalanen  wrote:
>
> On Wed, 12 May 2021 10:44:29 +0200
> Daniel Vetter  wrote:
>
> > On Wed, May 12, 2021 at 11:23:30AM +0300, Pekka Paalanen wrote:
> > > On Tue, 11 May 2021 18:44:17 +0200
> > > Daniel Vetter  wrote:
> > >
> > > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > > > From: Rob Clark 
> > > > > > > > >
> > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank 
> > > > > > > > > on "video
> > > > > > > > > mode" type displays, which is pointless and unnecessary.  Add 
> > > > > > > > > an
> > > > > > > > > optional helper vfunc to determine if a plane is attached to 
> > > > > > > > > a CRTC
> > > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Rob Clark 
> > > > > > > >
> > > > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > > > legacy uapi
> > > > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > > > anything
> > > > > > > > beyond what userspace also has available. So adding hacks for 
> > > > > > > > them feels
> > > > > > > > really bad.
> > > > > > >
> > > > > > > I suppose the root problem is that userspace doesn't know if 
> > > > > > > dirtyfb
> > > > > > > (or similar) is actually required or is a no-op.
> > > > > > >
> > > > > > > But it is perhaps less of a problem because this essentially boils
> > > > > > > down to "x11 vs wayland", and it seems like wayland compositors 
> > > > > > > for
> > > > > > > non-vsync'd rendering just pageflips and throws away extra frames 
> > > > > > > from
> > > > > > > the app?
> > > > > >
> > > > > > Yeah it's about not adequately batching up rendering and syncing 
> > > > > > with
> > > > > > hw. bare metal x11 is just especially stupid about it :-)
> > > > > >
> > > > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > > > either.
> > > > > > > > We've had this problem already on the fbcon emulation side 
> > > > > > > > (which also
> > > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), 
> > > > > > > > and the fix
> > > > > > > > there was to have a worker which batches up all the updates and 
> > > > > > > > avoids any
> > > > > > > > stalls in bad places.
> > > > > > >
> > > > > > > I'm not too worried about fbcon not being able to render faster 
> > > > > > > than
> > > > > > > vblank.  OTOH it is a pretty big problem for x11
> > > > > >
> > > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > > > the same with fbcon, which trivially can get ahead of vblank 
> > > > > > otherwise
> > > > > > (if sometimes flushes each character, so you have to pile them up 
> > > > > > into
> > > > > > a single update if that's still pending).
> > > > > >
> > > > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > > > probably get
> > > > > > > > away with assuming there's only a single fb, so the 
> > > > > > > > implementation becomes
> > > > > > > > pretty simple:
> > > > > > > >
> > > > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > > > - if there's already a dirty fb pending on a different fb, we 
> > > > > > > > stall for
> > > > > > > >   the worker to start processing that one already (i.e. the fb 
> > > > > > > > we track is
> > > > > > > >   reset to NULL)
> > > > > > > > - if it's pending on the same fb we just toss away all the 
> > > > > > > > updates and go
> > > > > > > >   with a full update, since merging the clip rects is too much 
> > > > > > > > work :-) I
> > > > > > > >   think there's helpers so you could be slightly more clever 
> > > > > > > > and just have
> > > > > > > >   an overall bounding box
> > > > > > >
> > > > > > > This doesn't really fix the problem, you still end up delaying 
> > > > > > > sending
> > > > > > > the next back-buffer to mesa
> > > > > >
> > > > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > > > tracking corruption is possible, but that's not the kernel's 
> > > > > > problem.
> > > > > > So how would anything get held up in userspace.
> > > > >
> > > > > the part about stalling if a dirtyfb is pending was what I was worried
> > > > > about.. but I suppose you meant the worker stalling, rather than
> > > > > userspace stalling (where I had interpreted it the other way around).
> > > > > As soon as userspace needs to stall, you're losing again.
> > > >
> > > > Nah, I did mean userspace stalling, so we can't pile up unlimited 
> > > > amounts
> > > > of dirtyfb request in the ke

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-12 Thread Pekka Paalanen
On Wed, 12 May 2021 10:44:29 +0200
Daniel Vetter  wrote:

> On Wed, May 12, 2021 at 11:23:30AM +0300, Pekka Paalanen wrote:
> > On Tue, 11 May 2021 18:44:17 +0200
> > Daniel Vetter  wrote:
> >   
> > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:  
> > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote: 
> > > >
> > > > >
> > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > > From: Rob Clark 
> > > > > > > >
> > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > > "video
> > > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > > CRTC
> > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark 
> > > > > > >
> > > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > > legacy uapi
> > > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > > anything
> > > > > > > beyond what userspace also has available. So adding hacks for 
> > > > > > > them feels
> > > > > > > really bad.
> > > > > >
> > > > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > > > (or similar) is actually required or is a no-op.
> > > > > >
> > > > > > But it is perhaps less of a problem because this essentially boils
> > > > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > > > non-vsync'd rendering just pageflips and throws away extra frames 
> > > > > > from
> > > > > > the app?
> > > > >
> > > > > Yeah it's about not adequately batching up rendering and syncing with
> > > > > hw. bare metal x11 is just especially stupid about it :-)
> > > > >
> > > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > > either.
> > > > > > > We've had this problem already on the fbcon emulation side (which 
> > > > > > > also
> > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), 
> > > > > > > and the fix
> > > > > > > there was to have a worker which batches up all the updates and 
> > > > > > > avoids any
> > > > > > > stalls in bad places.
> > > > > >
> > > > > > I'm not too worried about fbcon not being able to render faster than
> > > > > > vblank.  OTOH it is a pretty big problem for x11
> > > > >
> > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > > > (if sometimes flushes each character, so you have to pile them up into
> > > > > a single update if that's still pending).
> > > > >
> > > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > > probably get
> > > > > > > away with assuming there's only a single fb, so the 
> > > > > > > implementation becomes
> > > > > > > pretty simple:
> > > > > > >
> > > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > > - if there's already a dirty fb pending on a different fb, we 
> > > > > > > stall for
> > > > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > > > track is
> > > > > > >   reset to NULL)
> > > > > > > - if it's pending on the same fb we just toss away all the 
> > > > > > > updates and go
> > > > > > >   with a full update, since merging the clip rects is too much 
> > > > > > > work :-) I
> > > > > > >   think there's helpers so you could be slightly more clever and 
> > > > > > > just have
> > > > > > >   an overall bounding box
> > > > > >
> > > > > > This doesn't really fix the problem, you still end up delaying 
> > > > > > sending
> > > > > > the next back-buffer to mesa
> > > > >
> > > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > > tracking corruption is possible, but that's not the kernel's problem.
> > > > > So how would anything get held up in userspace.
> > > > 
> > > > the part about stalling if a dirtyfb is pending was what I was worried
> > > > about.. but I suppose you meant the worker stalling, rather than
> > > > userspace stalling (where I had interpreted it the other way around).
> > > > As soon as userspace needs to stall, you're losing again.
> > > 
> > > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> > > of dirtyfb request in the kernel.
> > > 
> > > But also I never expect userspace that uses dirtyfb to actually hit this
> > > stall point (otherwise we'd need to look at this again). It would really
> > > be only there as defense against abuse.
> > >   
> > > > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > > > per

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-12 Thread Daniel Vetter
On Wed, May 12, 2021 at 11:23:30AM +0300, Pekka Paalanen wrote:
> On Tue, 11 May 2021 18:44:17 +0200
> Daniel Vetter  wrote:
> 
> > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:  
> > > >
> > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:  
> > > > >
> > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > wrote:  
> > > > > >
> > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:  
> > > > > > > From: Rob Clark 
> > > > > > >
> > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > "video
> > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > CRTC
> > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > >
> > > > > > > Signed-off-by: Rob Clark   
> > > > > >
> > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > legacy uapi
> > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > anything
> > > > > > beyond what userspace also has available. So adding hacks for them 
> > > > > > feels
> > > > > > really bad.  
> > > > >
> > > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > > (or similar) is actually required or is a no-op.
> > > > >
> > > > > But it is perhaps less of a problem because this essentially boils
> > > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > > > the app?  
> > > >
> > > > Yeah it's about not adequately batching up rendering and syncing with
> > > > hw. bare metal x11 is just especially stupid about it :-)
> > > >  
> > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > either.
> > > > > > We've had this problem already on the fbcon emulation side (which 
> > > > > > also
> > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and 
> > > > > > the fix
> > > > > > there was to have a worker which batches up all the updates and 
> > > > > > avoids any
> > > > > > stalls in bad places.  
> > > > >
> > > > > I'm not too worried about fbcon not being able to render faster than
> > > > > vblank.  OTOH it is a pretty big problem for x11  
> > > >
> > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > > (if sometimes flushes each character, so you have to pile them up into
> > > > a single update if that's still pending).
> > > >  
> > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > probably get
> > > > > > away with assuming there's only a single fb, so the implementation 
> > > > > > becomes
> > > > > > pretty simple:
> > > > > >
> > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > - if there's already a dirty fb pending on a different fb, we stall 
> > > > > > for
> > > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > > track is
> > > > > >   reset to NULL)
> > > > > > - if it's pending on the same fb we just toss away all the updates 
> > > > > > and go
> > > > > >   with a full update, since merging the clip rects is too much work 
> > > > > > :-) I
> > > > > >   think there's helpers so you could be slightly more clever and 
> > > > > > just have
> > > > > >   an overall bounding box  
> > > > >
> > > > > This doesn't really fix the problem, you still end up delaying sending
> > > > > the next back-buffer to mesa  
> > > >
> > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > tracking corruption is possible, but that's not the kernel's problem.
> > > > So how would anything get held up in userspace.  
> > > 
> > > the part about stalling if a dirtyfb is pending was what I was worried
> > > about.. but I suppose you meant the worker stalling, rather than
> > > userspace stalling (where I had interpreted it the other way around).
> > > As soon as userspace needs to stall, you're losing again.  
> > 
> > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> > of dirtyfb request in the kernel.
> > 
> > But also I never expect userspace that uses dirtyfb to actually hit this
> > stall point (otherwise we'd need to look at this again). It would really
> > be only there as defense against abuse.
> > 
> > > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > > > of drm_atomic_helper_dirtyfb()  
> > > >
> > > > That's still using information that userspace doesn't have, which is a
> > > > bit irky. We might as well go with your thing here then.  
> > > 
> > > arguably, this is something we should expose to userspace.. for DSI

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-12 Thread Pekka Paalanen
On Tue, 11 May 2021 18:44:17 +0200
Daniel Vetter  wrote:

> On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:  
> > >
> > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:  
> > > >
> > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:  
> > > > >
> > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:  
> > > > > > From: Rob Clark 
> > > > > >
> > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > "video
> > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > optional helper vfunc to determine if a plane is attached to a CRTC
> > > > > > that actually needs dirtyfb, and skip over them.
> > > > > >
> > > > > > Signed-off-by: Rob Clark   
> > > > >
> > > > > So this is a bit annoying because the idea of all these "remap legacy 
> > > > > uapi
> > > > > to atomic constructs" helpers is that they shouldn't need/use anything
> > > > > beyond what userspace also has available. So adding hacks for them 
> > > > > feels
> > > > > really bad.  
> > > >
> > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > (or similar) is actually required or is a no-op.
> > > >
> > > > But it is perhaps less of a problem because this essentially boils
> > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > > the app?  
> > >
> > > Yeah it's about not adequately batching up rendering and syncing with
> > > hw. bare metal x11 is just especially stupid about it :-)
> > >  
> > > > > Also I feel like it's not entirely the right thing to do here either.
> > > > > We've had this problem already on the fbcon emulation side (which also
> > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and 
> > > > > the fix
> > > > > there was to have a worker which batches up all the updates and 
> > > > > avoids any
> > > > > stalls in bad places.  
> > > >
> > > > I'm not too worried about fbcon not being able to render faster than
> > > > vblank.  OTOH it is a pretty big problem for x11  
> > >
> > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > (if sometimes flushes each character, so you have to pile them up into
> > > a single update if that's still pending).
> > >  
> > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > probably get
> > > > > away with assuming there's only a single fb, so the implementation 
> > > > > becomes
> > > > > pretty simple:
> > > > >
> > > > > - 1 worker, and we keep track of a single pending fb
> > > > > - if there's already a dirty fb pending on a different fb, we stall 
> > > > > for
> > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > track is
> > > > >   reset to NULL)
> > > > > - if it's pending on the same fb we just toss away all the updates 
> > > > > and go
> > > > >   with a full update, since merging the clip rects is too much work 
> > > > > :-) I
> > > > >   think there's helpers so you could be slightly more clever and just 
> > > > > have
> > > > >   an overall bounding box  
> > > >
> > > > This doesn't really fix the problem, you still end up delaying sending
> > > > the next back-buffer to mesa  
> > >
> > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > tracking corruption is possible, but that's not the kernel's problem.
> > > So how would anything get held up in userspace.  
> > 
> > the part about stalling if a dirtyfb is pending was what I was worried
> > about.. but I suppose you meant the worker stalling, rather than
> > userspace stalling (where I had interpreted it the other way around).
> > As soon as userspace needs to stall, you're losing again.  
> 
> Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> of dirtyfb request in the kernel.
> 
> But also I never expect userspace that uses dirtyfb to actually hit this
> stall point (otherwise we'd need to look at this again). It would really
> be only there as defense against abuse.
> 
> > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > > of drm_atomic_helper_dirtyfb()  
> > >
> > > That's still using information that userspace doesn't have, which is a
> > > bit irky. We might as well go with your thing here then.  
> > 
> > arguably, this is something we should expose to userspace.. for DSI
> > command-mode panels, you probably want to make a different decision
> > with regard to how many buffers in your flip-chain..
> > 
> > Possibly we should add/remove the fb_damage_clips property depending
> > on the display type (ie. video/pull vs cmd/push mode)?  
> 
> I'm not sure whether atomic actually needs this exposed:
> -

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:42:58AM -0700, Rob Clark wrote:
> On Tue, May 11, 2021 at 10:21 AM Daniel Vetter  wrote:
> >
> > On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote:
> > > On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
> > > >
> > > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > > > From: Rob Clark 
> > > > > > > > >
> > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank 
> > > > > > > > > on "video
> > > > > > > > > mode" type displays, which is pointless and unnecessary.  Add 
> > > > > > > > > an
> > > > > > > > > optional helper vfunc to determine if a plane is attached to 
> > > > > > > > > a CRTC
> > > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Rob Clark 
> > > > > > > >
> > > > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > > > legacy uapi
> > > > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > > > anything
> > > > > > > > beyond what userspace also has available. So adding hacks for 
> > > > > > > > them feels
> > > > > > > > really bad.
> > > > > > >
> > > > > > > I suppose the root problem is that userspace doesn't know if 
> > > > > > > dirtyfb
> > > > > > > (or similar) is actually required or is a no-op.
> > > > > > >
> > > > > > > But it is perhaps less of a problem because this essentially boils
> > > > > > > down to "x11 vs wayland", and it seems like wayland compositors 
> > > > > > > for
> > > > > > > non-vsync'd rendering just pageflips and throws away extra frames 
> > > > > > > from
> > > > > > > the app?
> > > > > >
> > > > > > Yeah it's about not adequately batching up rendering and syncing 
> > > > > > with
> > > > > > hw. bare metal x11 is just especially stupid about it :-)
> > > > > >
> > > > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > > > either.
> > > > > > > > We've had this problem already on the fbcon emulation side 
> > > > > > > > (which also
> > > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), 
> > > > > > > > and the fix
> > > > > > > > there was to have a worker which batches up all the updates and 
> > > > > > > > avoids any
> > > > > > > > stalls in bad places.
> > > > > > >
> > > > > > > I'm not too worried about fbcon not being able to render faster 
> > > > > > > than
> > > > > > > vblank.  OTOH it is a pretty big problem for x11
> > > > > >
> > > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > > > the same with fbcon, which trivially can get ahead of vblank 
> > > > > > otherwise
> > > > > > (if sometimes flushes each character, so you have to pile them up 
> > > > > > into
> > > > > > a single update if that's still pending).
> > > > > >
> > > > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > > > probably get
> > > > > > > > away with assuming there's only a single fb, so the 
> > > > > > > > implementation becomes
> > > > > > > > pretty simple:
> > > > > > > >
> > > > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > > > - if there's already a dirty fb pending on a different fb, we 
> > > > > > > > stall for
> > > > > > > >   the worker to start processing that one already (i.e. the fb 
> > > > > > > > we track is
> > > > > > > >   reset to NULL)
> > > > > > > > - if it's pending on the same fb we just toss away all the 
> > > > > > > > updates and go
> > > > > > > >   with a full update, since merging the clip rects is too much 
> > > > > > > > work :-) I
> > > > > > > >   think there's helpers so you could be slightly more clever 
> > > > > > > > and just have
> > > > > > > >   an overall bounding box
> > > > > > >
> > > > > > > This doesn't really fix the problem, you still end up delaying 
> > > > > > > sending
> > > > > > > the next back-buffer to mesa
> > > > > >
> > > > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > > > tracking corruption is possible, but that's not the kernel's 
> > > > > > problem.
> > > > > > So how would anything get held up in userspace.
> > > > >
> > > > > the part about stalling if a dirtyfb is pending was what I was worried
> > > > > about.. but I suppose you meant the worker stalling, rather than
> > > > > userspace stalling (where I had interpreted it the other way around).
> > > > > As soon as userspace needs to stall, you're losing again.
> > > >
> > > > Nah, I did mean userspace stalling, so we can't pile up unlimited 
> > > > amounts
> > > > of dirtyfb request in the kernel.
> > > >

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Rob Clark
On Tue, May 11, 2021 at 10:21 AM Daniel Vetter  wrote:
>
> On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote:
> > On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
> > >
> > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> > > > >
> > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > > > > >
> > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > > From: Rob Clark 
> > > > > > > >
> > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > > "video
> > > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > > CRTC
> > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark 
> > > > > > >
> > > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > > legacy uapi
> > > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > > anything
> > > > > > > beyond what userspace also has available. So adding hacks for 
> > > > > > > them feels
> > > > > > > really bad.
> > > > > >
> > > > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > > > (or similar) is actually required or is a no-op.
> > > > > >
> > > > > > But it is perhaps less of a problem because this essentially boils
> > > > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > > > non-vsync'd rendering just pageflips and throws away extra frames 
> > > > > > from
> > > > > > the app?
> > > > >
> > > > > Yeah it's about not adequately batching up rendering and syncing with
> > > > > hw. bare metal x11 is just especially stupid about it :-)
> > > > >
> > > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > > either.
> > > > > > > We've had this problem already on the fbcon emulation side (which 
> > > > > > > also
> > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), 
> > > > > > > and the fix
> > > > > > > there was to have a worker which batches up all the updates and 
> > > > > > > avoids any
> > > > > > > stalls in bad places.
> > > > > >
> > > > > > I'm not too worried about fbcon not being able to render faster than
> > > > > > vblank.  OTOH it is a pretty big problem for x11
> > > > >
> > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > > > (if sometimes flushes each character, so you have to pile them up into
> > > > > a single update if that's still pending).
> > > > >
> > > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > > probably get
> > > > > > > away with assuming there's only a single fb, so the 
> > > > > > > implementation becomes
> > > > > > > pretty simple:
> > > > > > >
> > > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > > - if there's already a dirty fb pending on a different fb, we 
> > > > > > > stall for
> > > > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > > > track is
> > > > > > >   reset to NULL)
> > > > > > > - if it's pending on the same fb we just toss away all the 
> > > > > > > updates and go
> > > > > > >   with a full update, since merging the clip rects is too much 
> > > > > > > work :-) I
> > > > > > >   think there's helpers so you could be slightly more clever and 
> > > > > > > just have
> > > > > > >   an overall bounding box
> > > > > >
> > > > > > This doesn't really fix the problem, you still end up delaying 
> > > > > > sending
> > > > > > the next back-buffer to mesa
> > > > >
> > > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > > tracking corruption is possible, but that's not the kernel's problem.
> > > > > So how would anything get held up in userspace.
> > > >
> > > > the part about stalling if a dirtyfb is pending was what I was worried
> > > > about.. but I suppose you meant the worker stalling, rather than
> > > > userspace stalling (where I had interpreted it the other way around).
> > > > As soon as userspace needs to stall, you're losing again.
> > >
> > > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> > > of dirtyfb request in the kernel.
> > >
> > > But also I never expect userspace that uses dirtyfb to actually hit this
> > > stall point (otherwise we'd need to look at this again). It would really
> > > be only there as defense against abuse.
> >
> > I don't believe modesetting ddx throttles dirtyfb, it (indirectly)
> > calls this from it's BlockHandler.. so if you do end up blocking after
> > the N'th dirtyfb, you are still going to end up

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote:
> On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
> >
> > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> > > >
> > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > > > >
> > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > > > > >
> > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > From: Rob Clark 
> > > > > > >
> > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > "video
> > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > CRTC
> > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > >
> > > > > > > Signed-off-by: Rob Clark 
> > > > > >
> > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > legacy uapi
> > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > anything
> > > > > > beyond what userspace also has available. So adding hacks for them 
> > > > > > feels
> > > > > > really bad.
> > > > >
> > > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > > (or similar) is actually required or is a no-op.
> > > > >
> > > > > But it is perhaps less of a problem because this essentially boils
> > > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > > > the app?
> > > >
> > > > Yeah it's about not adequately batching up rendering and syncing with
> > > > hw. bare metal x11 is just especially stupid about it :-)
> > > >
> > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > either.
> > > > > > We've had this problem already on the fbcon emulation side (which 
> > > > > > also
> > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and 
> > > > > > the fix
> > > > > > there was to have a worker which batches up all the updates and 
> > > > > > avoids any
> > > > > > stalls in bad places.
> > > > >
> > > > > I'm not too worried about fbcon not being able to render faster than
> > > > > vblank.  OTOH it is a pretty big problem for x11
> > > >
> > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > > (if sometimes flushes each character, so you have to pile them up into
> > > > a single update if that's still pending).
> > > >
> > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > probably get
> > > > > > away with assuming there's only a single fb, so the implementation 
> > > > > > becomes
> > > > > > pretty simple:
> > > > > >
> > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > - if there's already a dirty fb pending on a different fb, we stall 
> > > > > > for
> > > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > > track is
> > > > > >   reset to NULL)
> > > > > > - if it's pending on the same fb we just toss away all the updates 
> > > > > > and go
> > > > > >   with a full update, since merging the clip rects is too much work 
> > > > > > :-) I
> > > > > >   think there's helpers so you could be slightly more clever and 
> > > > > > just have
> > > > > >   an overall bounding box
> > > > >
> > > > > This doesn't really fix the problem, you still end up delaying sending
> > > > > the next back-buffer to mesa
> > > >
> > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > tracking corruption is possible, but that's not the kernel's problem.
> > > > So how would anything get held up in userspace.
> > >
> > > the part about stalling if a dirtyfb is pending was what I was worried
> > > about.. but I suppose you meant the worker stalling, rather than
> > > userspace stalling (where I had interpreted it the other way around).
> > > As soon as userspace needs to stall, you're losing again.
> >
> > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> > of dirtyfb request in the kernel.
> >
> > But also I never expect userspace that uses dirtyfb to actually hit this
> > stall point (otherwise we'd need to look at this again). It would really
> > be only there as defense against abuse.
> 
> I don't believe modesetting ddx throttles dirtyfb, it (indirectly)
> calls this from it's BlockHandler.. so if you do end up blocking after
> the N'th dirtyfb, you are still going to end up stalling for vblank,
> you are just deferring that for a frame or two..

Nope, that's not what I mean.

By default we pile up the updates, so you _never_ stall. The worker then
takes the entire update every time it runs and batches them up.

We _only_ stall when we get a dirtyfb with a different

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Rob Clark
On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
>
> On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> > >
> > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > > >
> > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > > > >
> > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > From: Rob Clark 
> > > > > >
> > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > "video
> > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > optional helper vfunc to determine if a plane is attached to a CRTC
> > > > > > that actually needs dirtyfb, and skip over them.
> > > > > >
> > > > > > Signed-off-by: Rob Clark 
> > > > >
> > > > > So this is a bit annoying because the idea of all these "remap legacy 
> > > > > uapi
> > > > > to atomic constructs" helpers is that they shouldn't need/use anything
> > > > > beyond what userspace also has available. So adding hacks for them 
> > > > > feels
> > > > > really bad.
> > > >
> > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > (or similar) is actually required or is a no-op.
> > > >
> > > > But it is perhaps less of a problem because this essentially boils
> > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > > the app?
> > >
> > > Yeah it's about not adequately batching up rendering and syncing with
> > > hw. bare metal x11 is just especially stupid about it :-)
> > >
> > > > > Also I feel like it's not entirely the right thing to do here either.
> > > > > We've had this problem already on the fbcon emulation side (which also
> > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and 
> > > > > the fix
> > > > > there was to have a worker which batches up all the updates and 
> > > > > avoids any
> > > > > stalls in bad places.
> > > >
> > > > I'm not too worried about fbcon not being able to render faster than
> > > > vblank.  OTOH it is a pretty big problem for x11
> > >
> > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > (if sometimes flushes each character, so you have to pile them up into
> > > a single update if that's still pending).
> > >
> > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > probably get
> > > > > away with assuming there's only a single fb, so the implementation 
> > > > > becomes
> > > > > pretty simple:
> > > > >
> > > > > - 1 worker, and we keep track of a single pending fb
> > > > > - if there's already a dirty fb pending on a different fb, we stall 
> > > > > for
> > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > track is
> > > > >   reset to NULL)
> > > > > - if it's pending on the same fb we just toss away all the updates 
> > > > > and go
> > > > >   with a full update, since merging the clip rects is too much work 
> > > > > :-) I
> > > > >   think there's helpers so you could be slightly more clever and just 
> > > > > have
> > > > >   an overall bounding box
> > > >
> > > > This doesn't really fix the problem, you still end up delaying sending
> > > > the next back-buffer to mesa
> > >
> > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > tracking corruption is possible, but that's not the kernel's problem.
> > > So how would anything get held up in userspace.
> >
> > the part about stalling if a dirtyfb is pending was what I was worried
> > about.. but I suppose you meant the worker stalling, rather than
> > userspace stalling (where I had interpreted it the other way around).
> > As soon as userspace needs to stall, you're losing again.
>
> Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> of dirtyfb request in the kernel.
>
> But also I never expect userspace that uses dirtyfb to actually hit this
> stall point (otherwise we'd need to look at this again). It would really
> be only there as defense against abuse.

I don't believe modesetting ddx throttles dirtyfb, it (indirectly)
calls this from it's BlockHandler.. so if you do end up blocking after
the N'th dirtyfb, you are still going to end up stalling for vblank,
you are just deferring that for a frame or two..

The thing is, for a push style panel, you don't necessarily have to
wait for "vblank" (because "vblank" isn't necessarily a real thing),
so in that scenario dirtyfb could in theory be fast.  What you want to
do is fundamentally different for push vs pull style displays.

> > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > > of drm_atomic_helper_dirtyfb()
> > >
> > > That's still using information that userspace doesn't ha

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Daniel Vetter
On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> >
> > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > >
> > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > > >
> > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > From: Rob Clark 
> > > > >
> > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
> > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > optional helper vfunc to determine if a plane is attached to a CRTC
> > > > > that actually needs dirtyfb, and skip over them.
> > > > >
> > > > > Signed-off-by: Rob Clark 
> > > >
> > > > So this is a bit annoying because the idea of all these "remap legacy 
> > > > uapi
> > > > to atomic constructs" helpers is that they shouldn't need/use anything
> > > > beyond what userspace also has available. So adding hacks for them feels
> > > > really bad.
> > >
> > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > (or similar) is actually required or is a no-op.
> > >
> > > But it is perhaps less of a problem because this essentially boils
> > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > the app?
> >
> > Yeah it's about not adequately batching up rendering and syncing with
> > hw. bare metal x11 is just especially stupid about it :-)
> >
> > > > Also I feel like it's not entirely the right thing to do here either.
> > > > We've had this problem already on the fbcon emulation side (which also
> > > > shouldn't be able to peek behind the atomic kms uapi curtain), and the 
> > > > fix
> > > > there was to have a worker which batches up all the updates and avoids 
> > > > any
> > > > stalls in bad places.
> > >
> > > I'm not too worried about fbcon not being able to render faster than
> > > vblank.  OTOH it is a pretty big problem for x11
> >
> > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > the same with fbcon, which trivially can get ahead of vblank otherwise
> > (if sometimes flushes each character, so you have to pile them up into
> > a single update if that's still pending).
> >
> > > > Since this is for frontbuffer rendering userspace only we can probably 
> > > > get
> > > > away with assuming there's only a single fb, so the implementation 
> > > > becomes
> > > > pretty simple:
> > > >
> > > > - 1 worker, and we keep track of a single pending fb
> > > > - if there's already a dirty fb pending on a different fb, we stall for
> > > >   the worker to start processing that one already (i.e. the fb we track 
> > > > is
> > > >   reset to NULL)
> > > > - if it's pending on the same fb we just toss away all the updates and 
> > > > go
> > > >   with a full update, since merging the clip rects is too much work :-) 
> > > > I
> > > >   think there's helpers so you could be slightly more clever and just 
> > > > have
> > > >   an overall bounding box
> > >
> > > This doesn't really fix the problem, you still end up delaying sending
> > > the next back-buffer to mesa
> >
> > With this the dirtyfb would never block. Also glorious frontbuffer
> > tracking corruption is possible, but that's not the kernel's problem.
> > So how would anything get held up in userspace.
> 
> the part about stalling if a dirtyfb is pending was what I was worried
> about.. but I suppose you meant the worker stalling, rather than
> userspace stalling (where I had interpreted it the other way around).
> As soon as userspace needs to stall, you're losing again.

Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
of dirtyfb request in the kernel.

But also I never expect userspace that uses dirtyfb to actually hit this
stall point (otherwise we'd need to look at this again). It would really
be only there as defense against abuse.

> > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > of drm_atomic_helper_dirtyfb()
> >
> > That's still using information that userspace doesn't have, which is a
> > bit irky. We might as well go with your thing here then.
> 
> arguably, this is something we should expose to userspace.. for DSI
> command-mode panels, you probably want to make a different decision
> with regard to how many buffers in your flip-chain..
> 
> Possibly we should add/remove the fb_damage_clips property depending
> on the display type (ie. video/pull vs cmd/push mode)?

I'm not sure whether atomic actually needs this exposed:
- clients will do full flips for every frame anyway, I've not heard of
  anyone seriously doing frontbuffer rendering.
- transporting the cliprects around and then tossing them if the driver
  doesn't need them in their flip is probably not a measurable win

But yeah if I'm wrong and we have a need here and it's useful, then
exposing th

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-10 Thread Rob Clark
On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
>
> On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> >
> > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > >
> > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > From: Rob Clark 
> > > >
> > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
> > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > optional helper vfunc to determine if a plane is attached to a CRTC
> > > > that actually needs dirtyfb, and skip over them.
> > > >
> > > > Signed-off-by: Rob Clark 
> > >
> > > So this is a bit annoying because the idea of all these "remap legacy uapi
> > > to atomic constructs" helpers is that they shouldn't need/use anything
> > > beyond what userspace also has available. So adding hacks for them feels
> > > really bad.
> >
> > I suppose the root problem is that userspace doesn't know if dirtyfb
> > (or similar) is actually required or is a no-op.
> >
> > But it is perhaps less of a problem because this essentially boils
> > down to "x11 vs wayland", and it seems like wayland compositors for
> > non-vsync'd rendering just pageflips and throws away extra frames from
> > the app?
>
> Yeah it's about not adequately batching up rendering and syncing with
> hw. bare metal x11 is just especially stupid about it :-)
>
> > > Also I feel like it's not entirely the right thing to do here either.
> > > We've had this problem already on the fbcon emulation side (which also
> > > shouldn't be able to peek behind the atomic kms uapi curtain), and the fix
> > > there was to have a worker which batches up all the updates and avoids any
> > > stalls in bad places.
> >
> > I'm not too worried about fbcon not being able to render faster than
> > vblank.  OTOH it is a pretty big problem for x11
>
> That's why we'd let the worker get ahead at most one dirtyfb. We do
> the same with fbcon, which trivially can get ahead of vblank otherwise
> (if sometimes flushes each character, so you have to pile them up into
> a single update if that's still pending).
>
> > > Since this is for frontbuffer rendering userspace only we can probably get
> > > away with assuming there's only a single fb, so the implementation becomes
> > > pretty simple:
> > >
> > > - 1 worker, and we keep track of a single pending fb
> > > - if there's already a dirty fb pending on a different fb, we stall for
> > >   the worker to start processing that one already (i.e. the fb we track is
> > >   reset to NULL)
> > > - if it's pending on the same fb we just toss away all the updates and go
> > >   with a full update, since merging the clip rects is too much work :-) I
> > >   think there's helpers so you could be slightly more clever and just have
> > >   an overall bounding box
> >
> > This doesn't really fix the problem, you still end up delaying sending
> > the next back-buffer to mesa
>
> With this the dirtyfb would never block. Also glorious frontbuffer
> tracking corruption is possible, but that's not the kernel's problem.
> So how would anything get held up in userspace.

the part about stalling if a dirtyfb is pending was what I was worried
about.. but I suppose you meant the worker stalling, rather than
userspace stalling (where I had interpreted it the other way around).
As soon as userspace needs to stall, you're losing again.

> > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > of drm_atomic_helper_dirtyfb()
>
> That's still using information that userspace doesn't have, which is a
> bit irky. We might as well go with your thing here then.

arguably, this is something we should expose to userspace.. for DSI
command-mode panels, you probably want to make a different decision
with regard to how many buffers in your flip-chain..

Possibly we should add/remove the fb_damage_clips property depending
on the display type (ie. video/pull vs cmd/push mode)?

BR,
-R

> -Daniel
>
> > BR,
> > -R
> >
> > >
> > > Could probably steal most of the implementation.
> > >
> > > This approach here feels a tad too much in the hacky area ...
> > >
> > > Thoughts?
> > > -Daniel
> > >
> > > > ---
> > > >  drivers/gpu/drm/drm_damage_helper.c  |  8 
> > > >  include/drm/drm_modeset_helper_vtables.h | 14 ++
> > > >  2 files changed, 22 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_damage_helper.c 
> > > > b/drivers/gpu/drm/drm_damage_helper.c
> > > > index 3a4126dc2520..a0bed1a2c2dc 100644
> > > > --- a/drivers/gpu/drm/drm_damage_helper.c
> > > > +++ b/drivers/gpu/drm/drm_damage_helper.c
> > > > @@ -211,6 +211,7 @@ int drm_atomic_helper_dirtyfb(struct 
> > > > drm_framebuffer *fb,
> > > >  retry:
> > > >   drm_for_each_plane(plane, fb->dev) {
> > > >   struct drm_plane_state *plane_state;
> > > > + struct drm_crtc *crtc;
> > > >
> > > >   ret = drm_modeset_lock(&plane->mutex

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-10 Thread Daniel Vetter
On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
>
> On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> >
> > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
> > > mode" type displays, which is pointless and unnecessary.  Add an
> > > optional helper vfunc to determine if a plane is attached to a CRTC
> > > that actually needs dirtyfb, and skip over them.
> > >
> > > Signed-off-by: Rob Clark 
> >
> > So this is a bit annoying because the idea of all these "remap legacy uapi
> > to atomic constructs" helpers is that they shouldn't need/use anything
> > beyond what userspace also has available. So adding hacks for them feels
> > really bad.
>
> I suppose the root problem is that userspace doesn't know if dirtyfb
> (or similar) is actually required or is a no-op.
>
> But it is perhaps less of a problem because this essentially boils
> down to "x11 vs wayland", and it seems like wayland compositors for
> non-vsync'd rendering just pageflips and throws away extra frames from
> the app?

Yeah it's about not adequately batching up rendering and syncing with
hw. bare metal x11 is just especially stupid about it :-)

> > Also I feel like it's not entirely the right thing to do here either.
> > We've had this problem already on the fbcon emulation side (which also
> > shouldn't be able to peek behind the atomic kms uapi curtain), and the fix
> > there was to have a worker which batches up all the updates and avoids any
> > stalls in bad places.
>
> I'm not too worried about fbcon not being able to render faster than
> vblank.  OTOH it is a pretty big problem for x11

That's why we'd let the worker get ahead at most one dirtyfb. We do
the same with fbcon, which trivially can get ahead of vblank otherwise
(if sometimes flushes each character, so you have to pile them up into
a single update if that's still pending).

> > Since this is for frontbuffer rendering userspace only we can probably get
> > away with assuming there's only a single fb, so the implementation becomes
> > pretty simple:
> >
> > - 1 worker, and we keep track of a single pending fb
> > - if there's already a dirty fb pending on a different fb, we stall for
> >   the worker to start processing that one already (i.e. the fb we track is
> >   reset to NULL)
> > - if it's pending on the same fb we just toss away all the updates and go
> >   with a full update, since merging the clip rects is too much work :-) I
> >   think there's helpers so you could be slightly more clever and just have
> >   an overall bounding box
>
> This doesn't really fix the problem, you still end up delaying sending
> the next back-buffer to mesa

With this the dirtyfb would never block. Also glorious frontbuffer
tracking corruption is possible, but that's not the kernel's problem.
So how would anything get held up in userspace.

> But we could re-work drm_framebuffer_funcs::dirty to operate on a
> per-crtc basis and hoist the loop and check if dirtyfb is needed out
> of drm_atomic_helper_dirtyfb()

That's still using information that userspace doesn't have, which is a
bit irky. We might as well go with your thing here then.
-Daniel

> BR,
> -R
>
> >
> > Could probably steal most of the implementation.
> >
> > This approach here feels a tad too much in the hacky area ...
> >
> > Thoughts?
> > -Daniel
> >
> > > ---
> > >  drivers/gpu/drm/drm_damage_helper.c  |  8 
> > >  include/drm/drm_modeset_helper_vtables.h | 14 ++
> > >  2 files changed, 22 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/drm_damage_helper.c 
> > > b/drivers/gpu/drm/drm_damage_helper.c
> > > index 3a4126dc2520..a0bed1a2c2dc 100644
> > > --- a/drivers/gpu/drm/drm_damage_helper.c
> > > +++ b/drivers/gpu/drm/drm_damage_helper.c
> > > @@ -211,6 +211,7 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer 
> > > *fb,
> > >  retry:
> > >   drm_for_each_plane(plane, fb->dev) {
> > >   struct drm_plane_state *plane_state;
> > > + struct drm_crtc *crtc;
> > >
> > >   ret = drm_modeset_lock(&plane->mutex, state->acquire_ctx);
> > >   if (ret)
> > > @@ -221,6 +222,13 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer 
> > > *fb,
> > >   continue;
> > >   }
> > >
> > > + crtc = plane->state->crtc;
> > > + if (crtc->helper_private->needs_dirtyfb &&
> > > + !crtc->helper_private->needs_dirtyfb(crtc)) 
> > > {
> > > + drm_modeset_unlock(&plane->mutex);
> > > + continue;
> > > + }
> > > +
> > >   plane_state = drm_atomic_get_plane_state(state, plane);
> > >   if (IS_ERR(plane_state)) {
> > >   ret = PTR_ERR(plane_state);
> > > diff --git a/include/drm/drm_modeset_helper_vtables.h 
> > > b/include/drm/drm_modeset_helper_vtables.h
> > > inde

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-10 Thread Rob Clark
On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
>
> On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
> > mode" type displays, which is pointless and unnecessary.  Add an
> > optional helper vfunc to determine if a plane is attached to a CRTC
> > that actually needs dirtyfb, and skip over them.
> >
> > Signed-off-by: Rob Clark 
>
> So this is a bit annoying because the idea of all these "remap legacy uapi
> to atomic constructs" helpers is that they shouldn't need/use anything
> beyond what userspace also has available. So adding hacks for them feels
> really bad.

I suppose the root problem is that userspace doesn't know if dirtyfb
(or similar) is actually required or is a no-op.

But it is perhaps less of a problem because this essentially boils
down to "x11 vs wayland", and it seems like wayland compositors for
non-vsync'd rendering just pageflips and throws away extra frames from
the app?

> Also I feel like it's not entirely the right thing to do here either.
> We've had this problem already on the fbcon emulation side (which also
> shouldn't be able to peek behind the atomic kms uapi curtain), and the fix
> there was to have a worker which batches up all the updates and avoids any
> stalls in bad places.

I'm not too worried about fbcon not being able to render faster than
vblank.  OTOH it is a pretty big problem for x11

> Since this is for frontbuffer rendering userspace only we can probably get
> away with assuming there's only a single fb, so the implementation becomes
> pretty simple:
>
> - 1 worker, and we keep track of a single pending fb
> - if there's already a dirty fb pending on a different fb, we stall for
>   the worker to start processing that one already (i.e. the fb we track is
>   reset to NULL)
> - if it's pending on the same fb we just toss away all the updates and go
>   with a full update, since merging the clip rects is too much work :-) I
>   think there's helpers so you could be slightly more clever and just have
>   an overall bounding box

This doesn't really fix the problem, you still end up delaying sending
the next back-buffer to mesa

But we could re-work drm_framebuffer_funcs::dirty to operate on a
per-crtc basis and hoist the loop and check if dirtyfb is needed out
of drm_atomic_helper_dirtyfb()

BR,
-R

>
> Could probably steal most of the implementation.
>
> This approach here feels a tad too much in the hacky area ...
>
> Thoughts?
> -Daniel
>
> > ---
> >  drivers/gpu/drm/drm_damage_helper.c  |  8 
> >  include/drm/drm_modeset_helper_vtables.h | 14 ++
> >  2 files changed, 22 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_damage_helper.c 
> > b/drivers/gpu/drm/drm_damage_helper.c
> > index 3a4126dc2520..a0bed1a2c2dc 100644
> > --- a/drivers/gpu/drm/drm_damage_helper.c
> > +++ b/drivers/gpu/drm/drm_damage_helper.c
> > @@ -211,6 +211,7 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer 
> > *fb,
> >  retry:
> >   drm_for_each_plane(plane, fb->dev) {
> >   struct drm_plane_state *plane_state;
> > + struct drm_crtc *crtc;
> >
> >   ret = drm_modeset_lock(&plane->mutex, state->acquire_ctx);
> >   if (ret)
> > @@ -221,6 +222,13 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer 
> > *fb,
> >   continue;
> >   }
> >
> > + crtc = plane->state->crtc;
> > + if (crtc->helper_private->needs_dirtyfb &&
> > + !crtc->helper_private->needs_dirtyfb(crtc)) {
> > + drm_modeset_unlock(&plane->mutex);
> > + continue;
> > + }
> > +
> >   plane_state = drm_atomic_get_plane_state(state, plane);
> >   if (IS_ERR(plane_state)) {
> >   ret = PTR_ERR(plane_state);
> > diff --git a/include/drm/drm_modeset_helper_vtables.h 
> > b/include/drm/drm_modeset_helper_vtables.h
> > index eb706342861d..afa8ec5754e7 100644
> > --- a/include/drm/drm_modeset_helper_vtables.h
> > +++ b/include/drm/drm_modeset_helper_vtables.h
> > @@ -487,6 +487,20 @@ struct drm_crtc_helper_funcs {
> >bool in_vblank_irq, int *vpos, int *hpos,
> >ktime_t *stime, ktime_t *etime,
> >const struct drm_display_mode *mode);
> > +
> > + /**
> > +  * @needs_dirtyfb
> > +  *
> > +  * Optional callback used by damage helpers to determine if 
> > fb_damage_clips
> > +  * update is needed.
> > +  *
> > +  * Returns:
> > +  *
> > +  * True if fb_damage_clips update is needed to handle DIRTYFB, False
> > +  * otherwise.  If this callback is not implemented, then True is
> > +  * assumed.
> > +  */
> > + bool (*needs_dirtyfb)(struct drm_crtc *crtc);
> >  };
> >
> >  /**
> > --
> > 2.30.2
> >
>
> --
> Dan

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-10 Thread Daniel Vetter
On Mon, May 10, 2021 at 06:14:20PM +0200, Daniel Vetter wrote:
> On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > From: Rob Clark 
> > 
> > drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
> > mode" type displays, which is pointless and unnecessary.  Add an
> > optional helper vfunc to determine if a plane is attached to a CRTC
> > that actually needs dirtyfb, and skip over them.
> > 
> > Signed-off-by: Rob Clark 
> 
> So this is a bit annoying because the idea of all these "remap legacy uapi
> to atomic constructs" helpers is that they shouldn't need/use anything
> beyond what userspace also has available. So adding hacks for them feels
> really bad.
> 
> Also I feel like it's not entirely the right thing to do here either.
> We've had this problem already on the fbcon emulation side (which also
> shouldn't be able to peek behind the atomic kms uapi curtain), and the fix
> there was to have a worker which batches up all the updates and avoids any
> stalls in bad places.
> 
> Since this is for frontbuffer rendering userspace only we can probably get
> away with assuming there's only a single fb, so the implementation becomes
> pretty simple:
> 
> - 1 worker, and we keep track of a single pending fb
> - if there's already a dirty fb pending on a different fb, we stall for
>   the worker to start processing that one already (i.e. the fb we track is
>   reset to NULL)
> - if it's pending on the same fb we just toss away all the updates and go
>   with a full update, since merging the clip rects is too much work :-) I
>   think there's helpers so you could be slightly more clever and just have
>   an overall bounding box
> 
> Could probably steal most of the implementation.

Maybe we should even do all this in the common dirtyfb code, before we
call into the driver hook. Gives more symmetry in how it works between
fbcon and direct rendering userspace.
-Daniel

> 
> This approach here feels a tad too much in the hacky area ...
> 
> Thoughts?
> -Daniel
> 
> > ---
> >  drivers/gpu/drm/drm_damage_helper.c  |  8 
> >  include/drm/drm_modeset_helper_vtables.h | 14 ++
> >  2 files changed, 22 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_damage_helper.c 
> > b/drivers/gpu/drm/drm_damage_helper.c
> > index 3a4126dc2520..a0bed1a2c2dc 100644
> > --- a/drivers/gpu/drm/drm_damage_helper.c
> > +++ b/drivers/gpu/drm/drm_damage_helper.c
> > @@ -211,6 +211,7 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer 
> > *fb,
> >  retry:
> > drm_for_each_plane(plane, fb->dev) {
> > struct drm_plane_state *plane_state;
> > +   struct drm_crtc *crtc;
> >  
> > ret = drm_modeset_lock(&plane->mutex, state->acquire_ctx);
> > if (ret)
> > @@ -221,6 +222,13 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer 
> > *fb,
> > continue;
> > }
> >  
> > +   crtc = plane->state->crtc;
> > +   if (crtc->helper_private->needs_dirtyfb &&
> > +   !crtc->helper_private->needs_dirtyfb(crtc)) {
> > +   drm_modeset_unlock(&plane->mutex);
> > +   continue;
> > +   }
> > +
> > plane_state = drm_atomic_get_plane_state(state, plane);
> > if (IS_ERR(plane_state)) {
> > ret = PTR_ERR(plane_state);
> > diff --git a/include/drm/drm_modeset_helper_vtables.h 
> > b/include/drm/drm_modeset_helper_vtables.h
> > index eb706342861d..afa8ec5754e7 100644
> > --- a/include/drm/drm_modeset_helper_vtables.h
> > +++ b/include/drm/drm_modeset_helper_vtables.h
> > @@ -487,6 +487,20 @@ struct drm_crtc_helper_funcs {
> >  bool in_vblank_irq, int *vpos, int *hpos,
> >  ktime_t *stime, ktime_t *etime,
> >  const struct drm_display_mode *mode);
> > +
> > +   /**
> > +* @needs_dirtyfb
> > +*
> > +* Optional callback used by damage helpers to determine if 
> > fb_damage_clips
> > +* update is needed.
> > +*
> > +* Returns:
> > +*
> > +* True if fb_damage_clips update is needed to handle DIRTYFB, False
> > +* otherwise.  If this callback is not implemented, then True is
> > +* assumed.
> > +*/
> > +   bool (*needs_dirtyfb)(struct drm_crtc *crtc);
> >  };
> >  
> >  /**
> > -- 
> > 2.30.2
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-10 Thread Daniel Vetter
On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
> mode" type displays, which is pointless and unnecessary.  Add an
> optional helper vfunc to determine if a plane is attached to a CRTC
> that actually needs dirtyfb, and skip over them.
> 
> Signed-off-by: Rob Clark 

So this is a bit annoying because the idea of all these "remap legacy uapi
to atomic constructs" helpers is that they shouldn't need/use anything
beyond what userspace also has available. So adding hacks for them feels
really bad.

Also I feel like it's not entirely the right thing to do here either.
We've had this problem already on the fbcon emulation side (which also
shouldn't be able to peek behind the atomic kms uapi curtain), and the fix
there was to have a worker which batches up all the updates and avoids any
stalls in bad places.

Since this is for frontbuffer rendering userspace only we can probably get
away with assuming there's only a single fb, so the implementation becomes
pretty simple:

- 1 worker, and we keep track of a single pending fb
- if there's already a dirty fb pending on a different fb, we stall for
  the worker to start processing that one already (i.e. the fb we track is
  reset to NULL)
- if it's pending on the same fb we just toss away all the updates and go
  with a full update, since merging the clip rects is too much work :-) I
  think there's helpers so you could be slightly more clever and just have
  an overall bounding box

Could probably steal most of the implementation.

This approach here feels a tad too much in the hacky area ...

Thoughts?
-Daniel

> ---
>  drivers/gpu/drm/drm_damage_helper.c  |  8 
>  include/drm/drm_modeset_helper_vtables.h | 14 ++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_damage_helper.c 
> b/drivers/gpu/drm/drm_damage_helper.c
> index 3a4126dc2520..a0bed1a2c2dc 100644
> --- a/drivers/gpu/drm/drm_damage_helper.c
> +++ b/drivers/gpu/drm/drm_damage_helper.c
> @@ -211,6 +211,7 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>  retry:
>   drm_for_each_plane(plane, fb->dev) {
>   struct drm_plane_state *plane_state;
> + struct drm_crtc *crtc;
>  
>   ret = drm_modeset_lock(&plane->mutex, state->acquire_ctx);
>   if (ret)
> @@ -221,6 +222,13 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
>   continue;
>   }
>  
> + crtc = plane->state->crtc;
> + if (crtc->helper_private->needs_dirtyfb &&
> + !crtc->helper_private->needs_dirtyfb(crtc)) {
> + drm_modeset_unlock(&plane->mutex);
> + continue;
> + }
> +
>   plane_state = drm_atomic_get_plane_state(state, plane);
>   if (IS_ERR(plane_state)) {
>   ret = PTR_ERR(plane_state);
> diff --git a/include/drm/drm_modeset_helper_vtables.h 
> b/include/drm/drm_modeset_helper_vtables.h
> index eb706342861d..afa8ec5754e7 100644
> --- a/include/drm/drm_modeset_helper_vtables.h
> +++ b/include/drm/drm_modeset_helper_vtables.h
> @@ -487,6 +487,20 @@ struct drm_crtc_helper_funcs {
>bool in_vblank_irq, int *vpos, int *hpos,
>ktime_t *stime, ktime_t *etime,
>const struct drm_display_mode *mode);
> +
> + /**
> +  * @needs_dirtyfb
> +  *
> +  * Optional callback used by damage helpers to determine if 
> fb_damage_clips
> +  * update is needed.
> +  *
> +  * Returns:
> +  *
> +  * True if fb_damage_clips update is needed to handle DIRTYFB, False
> +  * otherwise.  If this callback is not implemented, then True is
> +  * assumed.
> +  */
> + bool (*needs_dirtyfb)(struct drm_crtc *crtc);
>  };
>  
>  /**
> -- 
> 2.30.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-08 Thread Rob Clark
From: Rob Clark 

drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
mode" type displays, which is pointless and unnecessary.  Add an
optional helper vfunc to determine if a plane is attached to a CRTC
that actually needs dirtyfb, and skip over them.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/drm_damage_helper.c  |  8 
 include/drm/drm_modeset_helper_vtables.h | 14 ++
 2 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/drm_damage_helper.c 
b/drivers/gpu/drm/drm_damage_helper.c
index 3a4126dc2520..a0bed1a2c2dc 100644
--- a/drivers/gpu/drm/drm_damage_helper.c
+++ b/drivers/gpu/drm/drm_damage_helper.c
@@ -211,6 +211,7 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
 retry:
drm_for_each_plane(plane, fb->dev) {
struct drm_plane_state *plane_state;
+   struct drm_crtc *crtc;
 
ret = drm_modeset_lock(&plane->mutex, state->acquire_ctx);
if (ret)
@@ -221,6 +222,13 @@ int drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
continue;
}
 
+   crtc = plane->state->crtc;
+   if (crtc->helper_private->needs_dirtyfb &&
+   !crtc->helper_private->needs_dirtyfb(crtc)) {
+   drm_modeset_unlock(&plane->mutex);
+   continue;
+   }
+
plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(plane_state)) {
ret = PTR_ERR(plane_state);
diff --git a/include/drm/drm_modeset_helper_vtables.h 
b/include/drm/drm_modeset_helper_vtables.h
index eb706342861d..afa8ec5754e7 100644
--- a/include/drm/drm_modeset_helper_vtables.h
+++ b/include/drm/drm_modeset_helper_vtables.h
@@ -487,6 +487,20 @@ struct drm_crtc_helper_funcs {
 bool in_vblank_irq, int *vpos, int *hpos,
 ktime_t *stime, ktime_t *etime,
 const struct drm_display_mode *mode);
+
+   /**
+* @needs_dirtyfb
+*
+* Optional callback used by damage helpers to determine if 
fb_damage_clips
+* update is needed.
+*
+* Returns:
+*
+* True if fb_damage_clips update is needed to handle DIRTYFB, False
+* otherwise.  If this callback is not implemented, then True is
+* assumed.
+*/
+   bool (*needs_dirtyfb)(struct drm_crtc *crtc);
 };
 
 /**
-- 
2.30.2