Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-10-13 Thread Ray Strode
Hi

On Fri, Oct 13, 2023 at 5:41 AM Daniel Vetter  wrote:
> > I mean we're not talking about scientific computing, or code
> > compilation, or seti@home. We're talking about nearly the equivalent
> > of `while (1) __asm__ ("nop");`
>
> I don't think anyone said this shouldn't be fixed or improved.

Yea but we apparently disagree that it would be an improvement to stop
discrediting user space for driver problems.
You see, to me, there are two problems 1) The behavior itself is not
nice and should be fixed 2) The blame/accounting/attribution for a
driver problem is getting directed to user space. I think both issues
should be fixed. One brought the other issue to light, but both
problems, in my mind, are legitimate and should be addressed. You
think fixing the second problem is some tactic to paper over the first
problem. Christian thinks the second problem isn't a problem at all
but the correct design.  So none of us are completely aligned and I
don't see anyone changing their mind anytime soon.

> What I'm saying is that the atomic ioctl is not going to make guarantees
> that it will not take up to much cpu time (for some extremely vague value
> of "too much") to the point that userspace can configure it's compositor
> in a way that it _will_ get killed if we _ever_ violate this rule.
yea I find that strange, all kernel tasks have a certain implied
baseline responsibility to be good citizens to the system.
And how user space is configured seems nearly immaterial.

But we're talking in circles.

> Fixing the worst offenders I don't think will see any pushback at all.
Yea we all agree fixing this one busy loop is a problem but we don't
agree on where the cpu time blame should go.

> > But *this* feels like duct tape: You've already said there's no
> > guarantee the problem won't also happen during preliminary computation
> > during non-blocking commits or via other drm entry points. So it
> > really does seem like a fix that won't age well. I won't be surprised
> > if in ~3 years (or whatever) in some RHEL release there's a customer
> > bug leading to the real-time thread getting blocklisted for obscure
> > server display hardware because it's causing the session to tank on a
> > production machine.
>
> Yes the atomic ioctl makes no guarantees across drivers and hw platforms
> of guaranteed "we will never violate, you can kill your compositor if you
> do" cpu bound limits. We'll try to not suck too badly, and we'll try to
> focus on fixing the suck where it upsets the people the most.
>
> But there is fundamentally no hard real time guarantee in atomic. At least
> not with the current uapi.

So in your mind mutter should get out of the real-time thread business entirely?

> The problem isn't about wasting cpu time. It's about you having set up the
> system in a way so that mutter gets killed if we ever waste too much cpu
> time, ever.

mutter is not set up in a way to kill itself if the driver wastes too
much cpu time,
ever. mutter is set up in a way to kill itself if it, itself, wastes
too much cpu time, ever.
The fact that the kernel is killing it when a kernel driver is wasting
cpu time is the
bug that led to the patch in the first place!

> The latter is flat out not a guarantee the kernel currently makes, and I
> see no practical feasible way to make that happen. And pretending we do
> make this guarantee will just result in frustrated users filling bugs that
> they desktop session died and developers closing them as "too hard to
> fix".

I think all three of us agree busy loops are not nice (though maybe we
aren't completely aligned on severity). And I'll certainly concede
that fixing all the busy loops can be hard. Some of the cpu-bound code
paths may even be doing legitimate computation.  I still assert that
if the uapi calls into driver code that might potentially be doing
something slow where it can't sleep, it should be doing it on a
workqueue or thread. That seems orthogonal to fixing all the places
where the drivers aren't acting nicely.

> Maybe as a bit more useful outcome of this entire discussion: Could you
> sign up to write a documentation patch for the atomic ioctl that adds a
> paragraph stating extremely clearly that this ioctl does not make hard
> real time guarantees, but only best effort soft realtime, and even then
> priority of the effort is focused entirely on the !ALLOW_MODESET case,
> because that is the one users care about the most.

I don't think I'm the best positioned to write such documentation. I
still think what the kernel is doing is wrong here and I don't even
fully grok what you mean by best effort.

--Ray


Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-10-12 Thread Ray Strode
Hi,

On Mon, Oct 09, 2023 at 02:36:17PM +0200, Christian König wrote:
> > > > To be clear, my take is, if driver code is running in process context
> > > > and needs to wait for periods of time on the order of or in excess of
> > > > a typical process time slice it should be sleeping during the waiting.
> > > > If the operation is at a point where it can be cancelled without side
> > > > effects, the sleeping should be INTERRUPTIBLE. If it's past the point
> > > > of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
> > > > opinion, should kernel code busy block a typical process for dozens of
> > > > milliseconds while keeping the process RUNNING. I don't think this is
> > > > a controversial take.
> > > Exactly that's what I completely disagree on.

Okay if we can't agree that it's not okay for user space (or the
kernel running in the context of user space) to busy loop a cpu core
at 100% utilization throughout and beyond the process's entire
scheduled time slice then we really are at an impasse. I gotta say i'm
astonished that this seemingly indefensible behavior is somehow a
point of contention, but I'm not going to keep arguing about it beyond
this email.

I mean we're not talking about scientific computing, or code
compilation, or seti@home. We're talking about nearly the equivalent
of `while (1) __asm__ ("nop");`

> > The key point here is that the patch puts the work into the background just
> > to avoid that it is accounted to the thread issuing it, and that in turn is
> > not valid as far as I can see.
>
> Yeah it's that aspect I'm really worried about, because we essentially
> start to support some gurantees that a) most drivers can't uphold without
> a huge amount of work, some of the DC state recomputations are _really_
> expensive b) without actually making the semantics clear, it's just
> duct-tape.

If DC plane state computation (or whatever) is really taking 50ms or
200ms, then it probably should be done in chunks so it doesn't get
preempted at an inopportune point? Look, this is not my wheelhouse,
this is your wheelhouse, and I don't want to keep debating forever. It
seems there is a discrepancy between our understandings of implied
acceptable behavior.

> Yes compositors want to run kms in real-time, and yes that results in fun
> if you try to strictly account for cpu time spent. Especially if your
> policy is to just nuke the real time thread instead of demoting it to
> SCHED_NORMAL for a time.

So I ended up going with this suggestion for blocking modesets:

https://gitlab.gnome.org/GNOME/mutter/-/commit/5d3e31a49968fc0da04e98c0f9d624ea5095c9e0

But *this* feels like duct tape: You've already said there's no
guarantee the problem won't also happen during preliminary computation
during non-blocking commits or via other drm entry points. So it
really does seem like a fix that won't age well. I won't be surprised
if in ~3 years (or whatever) in some RHEL release there's a customer
bug leading to the real-time thread getting blocklisted for obscure
server display hardware because it's causing the session to tank on a
production machine.

> I think if we want more than hacks here we need to answer two questions:
> - which parts of the kms api are real time
> - what exactly do we guarantee with that

imo, this isn't just about real-time versus non-real-time. It's no
more acceptable for non-real-time mutter to be using 100% CPU doing
busywaits than it is for real-time mutter to be using 100% cpu doing
busywaits.

Also, both you and Christian have suggested using the non-blocking
modeset api with a fence fd to poll on is equivalent to the blocking
api flushing the commit_tail work before returning from the ioctl, but
that's not actually true. I think we all now agree the EBUSY problem
you mentioned as an issue with my proposed patch wasn't actually a
problem for blocking commits, but that very same issue is a problem
with the non-blocking commits that then block on a fence fd, right? I
guess we'd need to block on a fence fd from the prior non-blocking
commit first before starting the blocking commit (or something)

--Ray


Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-10-06 Thread Ray Strode
Hi,

On Fri, Oct 6, 2023 at 3:12 AM Christian König  wrote:
> When the operation busy waits then that *should* get accounted to the
> CPU time of the current process. When the operation sleeps and waits for
> some interrupt for example it should not get accounted.
> What you suggest is to put the parts of the operation which busy wait
> into a background task and then sleep for that task to complete. This is
> not a solution to the problem, but just hides it.

Actually, I think we both probably agree that there shouldn't be long
busy waits in the context of the current process. After all, we both
agree what the AMD DC driver code is doing is wrong.

To be clear, my take is, if driver code is running in process context
and needs to wait for periods of time on the order of or in excess of
a typical process time slice it should be sleeping during the waiting.
If the operation is at a point where it can be cancelled without side
effects, the sleeping should be INTERRUPTIBLE. If it's past the point
of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
opinion, should kernel code busy block a typical process for dozens of
milliseconds while keeping the process RUNNING. I don't think this is
a controversial take.

Actually, I think (maybe?) you might even agree with that, but you're
also saying: user space processes aren't special here. While it's not
okay to busy block them, it's also not okay to busy block on the
system unbound workqueue either. If that's your sentiment, I don't
disagree with it.

So I think we both agree the busy waiting is a problem, but maybe we
disagree on the best place for the problem to manifest when it
happens.

One thought re the DC code is regardless of where the code is running,
the scheduler is going to forcefully preempt it at some point right?
Any writereg/udelay(1)/readreg loop is going to get disrupted by a
much bigger than 1us delay by the kernel if the loop goes on long
enough. I'm not wrong about that? if that's true, the code might as
well switch out the udelay(1) for a usleep(1) and call it a day (well
modulo the fact I think it can be called from an interrupt handler; at
least "git grep" says there's a link_set_dpms_off in
link_dp_irq_handler.c)

> Stuff like that is not a valid justification for the change. Ville
> changes on the other hand tried to prevent lock contention which is a
> valid goal here.

Okay so let's land his patchset! (assuming it's ready to go in).
Ville, is that something you'd want to resend for review?

Note, a point that I don't think has been brought up yet, too, is the
system unbound workqueue doesn't run with real time priority. Given
the lion's share of mutter's drmModeAtomicCommit calls are nonblock,
and so are using the system unbound workqueue for handling the
commits, even without this patch, that somewhat diminishes the utility
of using a realtime thread anyway. I believe the original point of the
realtime thread was to make sure mouse cursor motion updates are as
responsive as possible, because any latency there is something the
user really feels. Maybe there needs to be a different mechanism in
place to make sure user perceived interactivity is given priority.

--Ray


Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-10-05 Thread Ray Strode
Hi,

On Thu, Oct 5, 2023 at 5:57 AM Daniel Vetter  wrote:
> So imo the trouble with this is that we suddenly start to make
> realtime/cpu usage guarantees in the atomic ioctl. That's a _huge_ uapi
> change, because even limited to the case of !ALLOW_MODESET we do best
> effort guarantees at best.
So you're saying there's a best effort to not use up process CPU time,
but Christian was trying to argue nearly the opposite, that any needed
CPU time for the operation should get accounted toward the process.

What you're saying makes more sense to me and it tracks with what I'm
getting skimming over the code. I see it potentially sleeps during
blocking drmModeAtomicCommit() calls in several places:

- copy_from_user when copying properties
- fence and vblank event allocation
- waiting on modeset locks
- And even plane changes:

for_each_new_plane_in_state(state, plane, new_plane_state, i) {•
...
→   /*•
→* If waiting for fences pre-swap (ie: nonblock), userspace can•
→* still interrupt the operation. Instead of blocking until the•
→* timer expires, make the wait interruptible.•
→*/•
→   ret = dma_fence_wait(new_plane_state->fence, pre_swap);•
...
}•

(Ignore the typo where it says "nonblock" instead of "!nonblock",
the point is while waiting on plane state changes it sleeps.)

So, in general, the ioctl sleeps when it needs to. It's not trying
to be some non-premptible RT task with a dedicated cpu budget that it
strictly adheres to. That makes sense to me.

> And that state recomputation has to happen synchronously, because
> it always influences the ioctl errno return value.

Not sure I'm following. The task doesn't have to stay in RUNNING the
entire time the ioctl is getting carried out. It can get preempted,
and rescheduled later, all before returning from the ioctl and
delivering the errno back to userspace. What am I missing?

The problem isn't that the ioctl blocks, the problem is that when it
blocks it shouldn't be leaving the task state in RUNNING.

> My take is that you're papering over a performance problem here of the
> "the driver is too slow/wastes too much cpu time". We should fix the
> driver, if that's possible.

I think everyone agrees the amdgpu DC code doing several 100ms busy
loops in a row with no break in between is buggy behavior, and getting
that fixed is important.

Also, there's no dispute that the impetus for my proposed change was
that misbehavior in the driver code.

Neither of those facts mean we shouldn't improve
drm_atomic_helper_commit too. Making the change is a good idea. It was
even independently proposed a year ago, for reasons unrelated to the
current situation. It makes the nonblock and the
!nonblock code paths behave closer to the same. it makes the userspace
experience better in the face of driver bugs. You said best effort
earlier, this change helps provide a best effort.

Realistically, if it was done this way from the start, no one would be
batting an eye, right? It just improves things all around. I think
maybe you're worried about a slippery slope?

> We can also try to change the atomic uapi to give some hard real-time
> guarantees so that running compositors as SCHED_RT is possible, but that
> - means a very serious stream of bugs to fix all over
> - therefore needs some very wide buy-in from drivers that they're willing
>   to make this guarantee
> - probably needs some really carefully carved out limitations, because
>   there's imo flat-out no way we'll make all atomic ioctl hard time limit
>   bound

We don't need a guarantee that reprogramming ast BMC framebuffer
offsets or displayport link training (or whatever) takes less than
200ms.  If a driver has to sit and wait 32ms for vblank before
twiddling things in hardware to keep crud from showing up on screen or
something, that's fine.  But in none of these cases should the
userspace process be kept in RUNNING while it does it!

I take your point that there are a lot of drivers that may be doing
slow things, but surely they should let the process nap while they do
their work?

> Also, as König has pointed out, you can roll this duct-tape out in
> userspace by making the commit non-blocking and immediately waiting for
> the fences.

Sure, userspace can do that (even, it turns out, in the all crtc
disabled case which was an open question earlier in the thread).

That's not really an argument against fixing the !nonblock case
though.

> One thing I didn't see mention is that there's a very subtle uapi
> difference between non-blocking and blocking:
> - non-blocking is not allowed to get ahead of the previous commit, and
>   will return EBUSY in that case. See the comment in
>   drm_atomic_helper_commit()
> - blocking otoh will just block until any previous pending commit has
>   finished
>
> Not taking that into account in your patch here breaks uapi because
> userspace will suddenly get EBUSY when they don't expect that.

I don't think that's right, drm_atomic_helper_setup_co

Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-10-04 Thread Ray Strode
Hi,

On Wed, Oct 4, 2023 at 1:28 PM Ville Syrjälä
 wrote:
> No one really seemed all that interested in it. I'd still like to get
> it in, if for no other reason than to make things operate more uniformly.
> Though there are lots of legacy codepaths left that still hold the locks
> over the whole commit, but those could be fixed up as a followup.

Ah I see what you're saying.

> > > execpt I went further and moved the flush past the unlock in the end.
> >
> > Is that necessary? I was wondering about that but there's this comment
> > in the code:
>
> I'm not really sure there is any point in doing this otherwise.
> It would just change which thread executes the code but everything
> else would still get blocked on the locks the same as before.

Well in my case, I just want driver work to not charge the process cpu time.

But I get what you're saying, Checking if new configurations are valid
shouldnt block on existing configurations getting applied.

Doing it outside the locks isn't necessary to prevents deadlocks, it's
necessary because you're trying to avoid contention when there doesn't
need to be contention.

Your patchset sort of has a different goal, but it solves the issue I
care about at the same time.

I'd be happy if it went in... Daniel, Dave, what are your takes on this?

--Ray


Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-09-28 Thread Ray Strode
hI,

On Thu, Sep 28, 2023 at 11:05 AM Ville Syrjälä
 wrote:
> Here's my earlier take on this: 
> https://patchwork.freedesktop.org/series/108668/

Nice. Was there push back? Why didn't it go in?

> execpt I went further and moved the flush past the unlock in the end.

Is that necessary? I was wondering about that but there's this comment
in the code:

*  ... Locks, especially struct
 * &drm_modeset_lock, should not be held in worker threads or any other
 * asynchronous context used to commit the hardware state.

So it made me think there would be no locks held, and then I threw a
scratch build
at someone who reported it didn't deadlock and solved their issue.

--Ray


Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-09-28 Thread Ray Strode
Hi,

On Thu, Sep 28, 2023 at 9:24 AM Christian König
 wrote:
> If you see a large delay in the dpms off case then we probably have a driver 
> bug somewhere.

This is something we both agree on, I think.

>> I'm getting the idea that you think there is some big bucket of kernel
>> syscalls that block for a large fraction of a second by design and are
>> not meant to reset RLIMIT_RTTIME.
>
> Yes, exactly that's the case as far as I know.

Okay, well i'm willing to be proven wrong. My previously stated list
is: futex, sched_yield, and maybe a few obscure ioctls. If you have a
big bucket of syscalls in your mind, i'm all ears. I think I'm
probably right and the lions share of all syscalls that can block for
a long time don't leave the process RUNNABLE, though.

> The purpose of RLIMIT_RTTIME is to have a watchdog if an application is still 
> responsive. Only things which make the application look for outside events 
> are considered a reset for this whatdog.
>
> So for example calling select() and poll() will reset the watchdog, but not 
> calling any DRM modeset functions or an power management IOCTL for example.

what power management ioctls are you talking about?

>
> There are only two possibilities I can see how a DRM modeset is triggering 
> this:
>
> 1. We create enough CPU overhead to actually exceed the limit. In this case 
> triggering it is certainly intentional.

As I said before I suspect that not all modeset work for all drivers
is happening in the process context already anyway. So if this is
intentional, I suspect it's not actually working as designed (but I
haven't confirmed that, aside from my `git grep -E
'INIT_WORK|DECLARE_TASKLET|open_softirq|timer_setup|kthread_run' | wc
-l` litmus test) . I can't come up with a good reason it would be
designed like that, however, and you haven't provided one, so I
suspect it's not actually intentional.

>> I don't think you really thought through what you're saying here. It
>> just flatly doesn't apply for drmModeAtomicCommit. What is an
>> application supposed to do?
>
> Get terminated and restart. That's what the whole functionality is good for.
>
> If you don't desire that than don't use the RLIMIT_RTTIME functionality.

No, getting terminated and restarting the session because
drmModeAtomicCommit is taking too long is not really acceptable
behavior for a display server. In general, crashing the session is
maybe better than locking up the entire system hard, so RLIMIT_RTTIME
is still a good idea, but in either case we're in bug land here, each
such bug needs to be fixed. In this case, the bug is
drmModeAtomicCommit isn't sleeping like nearly every other syscall
does.

> From the RLIMIT_RTTIME documentation: "The intended use of this limit
> is to stop a runaway real-time process from locking up the system."
>
> And when drmAtomicCommit() is triggering this then we either have a
> problem with the application doing something it is not supposed to do
> (like blocking for vblank while it should listen to network traffic) or
> the driver is somehow buggy.
Yes! This is about bugs.

> drmModeAtomicCommit() is used by display servers. If drmModeAtomicCommit
> runs away in e.g. a set of 100ms busy loops responding to a confused or
> slow responding GPU, the system will seemingly lock up to the user. That
> is an intractable problem that we can not get away from. It doesn't
> matter if the kernel is stuck in process context or on a workqueue. And,
> regardless, it's not reasonable to expect userspace to craft elaborate
> workarounds for driver bugs. We just have to fix the bugs.
>
> Exactly that's the reason why I'm pointing out that this isn't a good idea.

Not following your logic here.

> That just papers over the problem. The process doesn't become more responsive 
> by hiding the problem.

It's not about the process being responsive. In meat-space, the
sub-second delay is imperceptible because the screen is turned off.

It's about not killing an innocent display server because of a driver bug.

> What you need to do here is to report those problems to the driver teams and 
> not try to hide them this way.

There is already a bug, it's mentioned in the merge request, the
reason I cc'd you (vs say just Dave and Daniel) is because the driver
bug is in amdgpu code.

--Ray


Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-09-28 Thread Ray Strode
hi,

On Thu, Sep 28, 2023 at 5:43 AM Michel Dänzer
 wrote:
> >>> When it's really not desirable to account the CPU overhead to the
> >>> process initiating it then you probably rather want to use an non
> >>> blocking commit plus a dma_fence to wait for the work to end from 
> >>> userspace.
> >> Well, first I don't think that's very convenient. You're talking about
> >> a per-plane property, so there would need to be a separate file
> >> descriptor allocated for every plane, right? and user-space would have
> >> to block on all of them before proceeding?
>
> OUT_FENCE_PTR is a per-CRTC property, not per-plane.

Okay, sure.

> Also, at least in this particular case, a single sync file (not dma_fence) 
> for any CRTC might suffice.

I don't see how we could rely on that given the provided api and
multitude of drivers. It might
work and then break randomly.

> >> Is the code really going to allocate a vblank_event when all the
> >> crtc's are disabled ? I guess it's possible, but that's
> >> counterintuitive to me. I really don't know. You're confident a set of
> >> fences will actually work?
> >
> > No when you disable everything there is of course no fence allocated.
>
> I don't see why not. "new_crtc_in_state" in this code means the atomic
> commit contains some state for the CRTC (such as setting the
> OUT_FENCE_PTR property), it could disable or leave it disabled though.
> I can't see any other code which would prevent this from working with a
> disabled CRTC, I could be missing something though.

So I'll concede it's possible it works, and the fact it's using a vblank
type event is just technical debt (though Christian says he doesn't think it
works, so I think it's possible it doesn't work as well.)

Having said that, this whole argument of "the dysfunctional blocking api
is actually functioning as designed, but don't use it, since it doesn't work,
use the non-blocking api instead, because maybe it might work or it might not,
not sure" is pretty non-convincing to me.

> Handling modesets asynchronously does seem desirable in mutter to me.

Sure, fine. That doesn't mean we shouldn't fix the blocking call to behave like
almost all other blocking system calls.

--Ray


Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-09-28 Thread Ray Strode
Hi,

On Thu, Sep 28, 2023 at 2:56 AM Christian König
 wrote:
> > To say the "whole point" is about CPU overhead accounting sounds
> > rather absurd to me. Is that really what you meant?
>
> Yes, absolutely. See the functionality you try to implement already exists.

You say lower in this same message that you don't believe the
functionality actually works for the dpms off case I mentioned.

> After making a non blocking commit userspace can still wait on the
> commit to finish by looking at the out fence.

fences, not fence, fences. drmModeAtomicCommit works on multiple objects
at the same time. To follow the spirit of such an api, we would need a
separate fd allocated for each crtc and would have to wait on all of
them.  Surely you can see how that is much less straightforward than
using a blocking api. But mutter already uses non-blocking apis for the
lion's share of cases. It doesn't need fences for those cases, though,
because it can just use page flip events.

The main reason it uses blocking apis are for modesets and when doing
dpms off. The latter case you said you don't think can use fences, and
it certainly can't use page flip events.

So if you're right that fences can't be used for the dpms off case, it's
not workable answer. If you're wrong, and fences can be used for the
dpms off case, then it's a messy answer.

> A blocking system call in the sense of RLIMIT_RTTIME means something
> which results in the process listening for external events, e.g. calling
> select(), epoll() or read() on (for example) a network socket etc...
>
> As far as I can see drmAtomicCommit() is *not* meant with that what
> similar to for example yield() also doesn't reset the RLIMIT_RTTIME counter.
No no no, drmModeAtomicCommit() is not like sched_yield(). That's a
really strange thing to say (you do mean sched_yield() right?).
sched_yield() is an oddball because it's specifically for giving other
threads a turn if they need it without causing the current thread to
sleep if they don't.  It's a niche api that's meant for high performance
use cases. It's a way to reduce scheduling latency and increase running
time predictability.  drmModeAtomicCommit() using up rt time, busy
looping while waiting on the hardware to respond, eating into userspace
RLIMIT_RTTIME is nothing like that.

I'm getting the idea that you think there is some big bucket of kernel
syscalls that block for a large fraction of a second by design and are
not meant to reset RLIMIT_RTTIME. I could be wrong, but I don't think
that's true. Off the top of my head, the only ones I can think of that
might reasonably do that are futex() (which obviously can't sleep),
sched_yield() (who's whole point is to not sleep), and maybe a
some obscure ioctls (some probably legitimately, some probably
illegitimately and noone has noticed.). I'm willing to be proven wrong
here, and I might be, but right now from thinking about it, my guess is
the above list is pretty close to complete.

> Well you are breaking the RLIMIT_RTTIME functionality.
>
> The purpose of that functionality is to allow debugging and monitoring
> applications to make sure that they keep alive and can react to external
> signals.

I don't think you really thought through what you're saying here. It
just flatly doesn't apply for drmModeAtomicCommit. What is an
application supposed to do? It can't block the SIGKILL that's coming.
Respond to the preceding SIGXCPUs? What response could the application
possibly make? I'm guessing drmModeAtomicCommit isn't going to EINTR
because it's busy looping waiting on hardware in the process context.
And the kernel doesn't even guarantee SIGXCPU is going to go to the
thread with the stuck syscall, so even if there was a legitimate
response (or even "pthread_cancel" or some wreckless nonsense like
that), getting the notification to the right part of the program is an
exercise in gymnastics.

> From the RLIMIT_RTTIME documentation: "The intended use of this limit
> is to stop a runaway real-time process from locking up the system."
>
> And when drmAtomicCommit() is triggering this then we either have a
> problem with the application doing something it is not supposed to do
> (like blocking for vblank while it should listen to network traffic) or
> the driver is somehow buggy.

drmModeAtomicCommit() is used by display servers. If drmModeAtomicCommit
runs away in e.g. a set of 100ms busy loops responding to a confused or
slow responding GPU, the system will seemingly lock up to the user. That
is an intractable problem that we can not get away from. It doesn't
matter if the kernel is stuck in process context or on a workqueue. And,
regardless, it's not reasonable to expect userspace to craft elaborate
workarounds for driver bugs. We just have to fix the bugs.

> No when you disable everything there is of course no fence allocated.
Okay, so it's not actually an answer

> But then you also should never see anything waiting for to long to
> actually be able to trigger the RLIMI

Re: [PATCH] drm/atomic: Perform blocking commits on workqueue

2023-09-27 Thread Ray Strode
Hi,

On Wed, Sep 27, 2023 at 4:05 AM Christian König
 wrote:
> I'm not an expert for that stuff, but as far as I know the whole purpose
> of the blocking functionality is to make sure that the CPU overhead
> caused by the commit is accounted to the right process.
I'm not an expert either, but that's clearly wrong.

The point of blocking functionality in drmAtomicCommit is for the
ioctl to block until the operation is completed, so userspace knows
that they can commit again after it returns without getting EBUSY, and
they can be sure, e.g., the mode is set or the displays are disabled
or whatever.

To say the "whole point" is about CPU overhead accounting sounds
rather absurd to me. Is that really what you meant?

You could try to make the argument that one of the points of the
blocking call is about CPU accounting (instead of the whole point),
maybe that's what you meant? That seems likely wrong to me too. I mean
just check the docs:

   RLIMIT_RTTIME (since Linux 2.6.25)
  This is a limit (in microseconds) on the amount of CPU
time that a process scheduled under a real‐time scheduling policy may
consume without making a blocking system call.  For  the  purpose  of
  this limit, each time a process makes a blocking system
call, the count of its consumed CPU time is reset to zero.

drmAtomicCommit() is making a blocking system call so the limit should
be reset, in my opinion.

Furthermore, a lot happens under the covers as part of drmAtomicCommit.

Are you telling me that in all the drivers handling drmAtomicCommit,
none of the work from those drivers gets deferred outside of process
context if DRM_MODE_ATOMIC_NONBLOCK isn't set? I find that very hard
to believe. But it would have to be true, if one of the main points of
the blocking call is about CPU accounting, right? You can't say the
point is about CPU accounting if some of the work is handled in a
helper thread or work queue or whatever.

╎❯ git grep -E 'INIT_WORK|DECLARE_TASKLET|open_softirq|timer_setup|kthread_run'
| wc -l
182

There seems to be a lot of non-process context execution going on in the tree...

> So what you are suggesting here is to actually break that functionality
> and that doesn't seem to make sense.

What's the use case here that could break? Walk me through a
real-world, sensible example where a program depends on
drmAtomicCommit() blocking and continuing to eat into process cpu time
while it blocks. I just want to see where you are coming from. Maybe
I'm being unimaginative but I just don't see it. I do see actual
main-stream display servers breaking today because the current
behavior defies expectations.

> When it's really not desirable to account the CPU overhead to the
> process initiating it then you probably rather want to use an non
> blocking commit plus a dma_fence to wait for the work to end from userspace.

Well, first I don't think that's very convenient. You're talking about
a per-plane property, so there would need to be a separate file
descriptor allocated for every plane, right? and user-space would have
to block on all of them before proceeding? Also, are you sure that
works in all cases? The problematic case we're facing right now is
when all planes and all crtcs are getting disabled. I'm just skimming
over the code here, but I see this:

→   for_each_new_crtc_in_state(state, crtc, crtc_state, i) {•
...
→   →   if (arg->flags & DRM_MODE_PAGE_FLIP_EVENT || fence_ptr) {•
...
→   →   →   e = create_vblank_event(crtc, arg->user_data);•
...
→   →   →   crtc_state->event = e;•
→   →   }•
...
→   →   if (fence_ptr) {•
...
→   →   →   fence = drm_crtc_create_fence(crtc);•
...
→   →   →   ret = setup_out_fence(&f[(*num_fences)++], fence);•
...
→   →   →   crtc_state->event->base.fence = fence;•
→   →   }•

Is the code really going to allocate a vblank_event when all the
crtc's are disabled ? I guess it's possible, but that's
counterintuitive to me. I really don't know. You're confident a set of
fences will actually work?

Regardless, this seems like a roundabout way to address a problem that
we could just ... fix.

--Ray


[PATCH] drm/atomic: Perform blocking commits on workqueue

2023-09-26 Thread Ray Strode
From: Ray Strode 

A drm atomic commit can be quite slow on some hardware. It can lead
to a lengthy queue of commands that need to get processed and waited
on before control can go back to user space.

If user space is a real-time thread, that delay can have severe
consequences, leading to the process getting killed for exceeding
rlimits.

This commit addresses the problem by always running the slow part of
a commit on a workqueue, separated from the task initiating the
commit.

This change makes the nonblocking and blocking paths work in the same way,
and as a result allows the task to sleep and not use up its
RLIMIT_RTTIME allocation.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2861
Signed-off-by: Ray Strode 
---
 drivers/gpu/drm/drm_atomic_helper.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 292e38eb6218..1a1e68d98d38 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -2028,64 +2028,63 @@ int drm_atomic_helper_commit(struct drm_device *dev,
 * This is the point of no return - everything below never fails except
 * when the hw goes bonghits. Which means we can commit the new state on
 * the software side now.
 */
 
ret = drm_atomic_helper_swap_state(state, true);
if (ret)
goto err;
 
/*
 * Everything below can be run asynchronously without the need to grab
 * any modeset locks at all under one condition: It must be guaranteed
 * that the asynchronous work has either been cancelled (if the driver
 * supports it, which at least requires that the framebuffers get
 * cleaned up with drm_atomic_helper_cleanup_planes()) or completed
 * before the new state gets committed on the software side with
 * drm_atomic_helper_swap_state().
 *
 * This scheme allows new atomic state updates to be prepared and
 * checked in parallel to the asynchronous completion of the previous
 * update. Which is important since compositors need to figure out the
 * composition of the next frame right after having submitted the
 * current layout.
 *
 * NOTE: Commit work has multiple phases, first hardware commit, then
 * cleanup. We want them to overlap, hence need system_unbound_wq to
 * make sure work items don't artificially stall on each another.
 */
 
drm_atomic_state_get(state);
-   if (nonblock)
-   queue_work(system_unbound_wq, &state->commit_work);
-   else
-   commit_tail(state);
+   queue_work(system_unbound_wq, &state->commit_work);
+   if (!nonblock)
+   flush_work(&state->commit_work);
 
return 0;
 
 err:
drm_atomic_helper_cleanup_planes(dev, state);
return ret;
 }
 EXPORT_SYMBOL(drm_atomic_helper_commit);
 
 /**
  * DOC: implementing nonblocking commit
  *
  * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence
  * different operations against each another. Locks, especially struct
  * &drm_modeset_lock, should not be held in worker threads or any other
  * asynchronous context used to commit the hardware state.
  *
  * drm_atomic_helper_commit() implements the recommended sequence for
  * nonblocking commits, using drm_atomic_helper_setup_commit() internally:
  *
  * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we
  * need to propagate out of memory/VRAM errors to userspace, it must be called
  * synchronously.
  *
  * 2. Synchronize with any outstanding nonblocking commit worker threads which
  * might be affected by the new state update. This is handled by
  * drm_atomic_helper_setup_commit().
  *
  * Asynchronous workers need to have sufficient parallelism to be able to run
  * different atomic commits on different CRTCs in parallel. The simplest way to
-- 
2.41.0



Re: [RFC PATCH v2 00/13] Kernel based bootsplash

2017-12-21 Thread Ray Strode
Hi,

On Wed, Dec 20, 2017 at 11:44 AM Max Staudt  wrote:
> It'd be nice to see this bug fixed, as it happens only occasionally (as is 
> the nature of a
> race condition), and was thus really hard to debug. I'm sure it can drive 
> people insane,
> as they try to find out whether they've disabled Ctrl-Alt-Fx in their 
> xorg.conf, but really
> it's Plymouth getting the system into a bad state. I probably owe a bald 
> patch on my
> head to this bug.
Okay, reading through the code I do see how what you're describing could happen.

I sketched out a patch here:

https://cgit.freedesktop.org/plymouth/commit/?h=fix-vt-management-on-deactivate&id=0b5b790d628f4c96e3ab9fbc0daba67a29b850da

that I think should fix it.  I need to do some testing with it (ideally rig up
a reproducer) before I push it.

> This is exactly where the kernel bootsplash is useful. Since it starts even 
> before any
> userspace program is loaded, it can close this gap.
>
> I've even tried it in combination with Plymouth: Plymouth is just another 
> graphical
> application, so it simply pops up "on top", just like X would. The two 
> splashes
> integrate flawlessly.
I just wish it used our modern graphics platform instead of the
deprecated subsystem.

> One could argue that one could put all DRM drivers into the initrd. Ubuntu 
> does this,
> and the initrd is ~40 MB in size. Not nice.
well, that 40mb isn't just graphics drivers...
╎❯ du -sh /lib/modules/`uname -r`/kernel/drivers/gpu
2.7M /lib/modules/4.14.6-300.fc27.x86_64/kernel/drivers/gpu

3M isn't too awful.

But really you have two choices as I see it:

1) make the initrd support new hardware
2) make the initrd be taylored to a specific configuration.

I actually think ubuntu has it right by doing 1. it's going to give
the best user experience.
(not just with graphics but other new hardware too).

But if you do 2) then it's not unreasonable if things break with new
hardware. Now
ideally, the breakage would be as isolated as possible.  I mean maybe
it's okay if the
boot splash breaks (or shows a text splash), but it's not okay if the
bootsplash sort of
works using /dev/fb, but then causes Xorg to break.  So we should
probably change
plymouth to avoid falling back to /dev/fb in the case where new
hardware got added.
Could probably fix this by detecting if kms is around when the initrd
is generated,
and adding a config option to skip fb renderer in that case.  or
something like that.

But the easy answer is to just fix the initrd to have the graphics drivers.

> And even then, the initrd could be outdated for some reason. Maybe it's a 
> developer
> machine. Nobody would expect the boot to hang/fail because of this problem.
Right, ideally boot splash problems should never stop boot from proceeding.

> So let's take SUSE. They don't have a finishing transition, the splash simply 
> stops
> and is hidden at once. Such a splash makes sense to be shown instantly, right?
I don't think it makes sense for animations to lack transitions.
animations without
transitions look buggy or unfinished. they should fade out or finish
the loop, or
whatever.  If it's a static image it should fade to black or the
background color.

(going to be away from the computer for a few days after this message
so probably won't reply for a while to further discussion)
--Ray
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC PATCH v2 03/13] bootsplash: Flush framebuffer after drawing

2017-12-20 Thread Ray Strode
Hi,

> Actually, I don't want that :)
>
> This was a design decision that I've made to keep the code small and simple 
> to audit.
> As it is, the simple bootsplash code will make 99% of people happy.
You think only 1% of linux users have more than one monitor or a 4k screen?

> I've made this decision from the point of view of someone who wants to ship a 
> general
> purpose distribution. If you have a look around and compare this to e.g. the 
> Windows or
> Mac OS X bootsplashes, the possibilities that my kernel code provides already
> surpasses them.
I haven't looked specifically, but I don't believe you :-)  You're
telling me the apple boot
splash isn't scaled up on machines with retina displays?  I don't use
OSX (or windows),
so I don't know, but I'd be really surprised.

> If you really want such sophisticated features, supplanting the initial 
> bootsplash  with
> Plymouth (or not using it at all) is a better solution. In my opinion, it is 
> overengineering,
> at least for kernel code.
disagree..it's support for basic, commodity hardware these days.

> As for HiDPI, if you're working on an embedded device with a fixed screen 
> size -
> say a phone - you can easily include a huge picture the size of the screen in 
> the
> bootsplash.
I'm talking about a situation where you have a dell xps or whatever
with an external
monitor attached.  Each monitor should be able to show the splash
without deforming
it, and without making it huge or microscopic. pretty basic stuff...

--Ray
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC PATCH v2 00/13] Kernel based bootsplash

2017-12-20 Thread Ray Strode
Hi,

> The problem that I am stumbling upon is different:
>  - the system starts with an FB driver
>  - after the ShowDelay time, Plymouth opens /dev/fb0
>  - the system finally loads the DRM driver, which tries to kick the previous 
> FB driver
>  - loading the DRM driver fails because Plymouth still has the previous 
> /dev/fb0 open
So the thing to realize is, that using /dev/fb is a last ditch effort
by plymouth to make
things work. It's basically a compat hack to keep vga=0x318 working
and also a nod
to embedded systems that just have /dev/fb and don't have a kms driver.  If we
fall back to /dev/fb we lose, as mentioned before, multi-monitor
support. And it's a
legacy, deprecated api.

If we've reached the scenario you're discussing above, the real
failure is that the KMS
driver took too long to load. DRM is the platform graphics api.  If
it's not loading
timely enough to show graphics then that's the problem!  It sounds
like maybe in the
above bug, you're just failing to load the drm driver in the initrd ?

> If you have a better way of calling it, I'd be glad to learn.
> Maybe "grabbing the VT", "taking ownership of the VT", ...?
I don't care what we call it, I just didn't understand what you were
saying before.
I think i'd say "manages vt switching", but whatever.

> And then, if something causes Plymouth to sense a new device (such as Plymouth
>  thinking that udev coldplug is complete), it will open the device, and as 
> part of that,
> call VT_SETMODE. This is unexpected, since "plymouth deactivate" should keep 
> it
> from doing this. And Plymouth's code architecture is such that this bug is 
> hard to fix.
If what you're describing is happening, this does sound like a bug. I
don't think it
should be hard to fix, if it's a problem. I'll look into it.

> [I] have decided to write a kernel-based replacement to simplify things and 
> to show a
> splash as early as possible. It just avoids all of this complexity.
So, for the record, I don't actually have a problem with you doing a
kernel based splash.
(though it should use drm subsystem apis not graphics subsystem apis,
/dev/fb is going
the way of the dodo)

> This is the sleep that I mean.
>
> On the one hand, it is this delay that makes most users not notice the
> "busy VRAM bug". If the DRM driver that replaces the FB driver is included in 
> the
> initramfs, then in most cases, it will be loaded before the 5 seconds are up. 
> However,
> if the driver is loaded after these 5 seconds have elapsed, then Plymouth 
> will have
> opened /dev/fb0 and the modprobe fails.
Think of this from a user perspective.  If the screen is black for 15 seconds
(or something) before a splash is shown, then we've already hit a
problem! That's like 15
seconds of time where the user is wondering if their system is broken.
But I don't think that actually happens in practice.  I think (maybe?)
the situation you're
hitting is your drm driver isn't starting to get loaded until N
seconds after boot has started,
because it's not in the initrd.  So the fix is to put it in the initrd.

> On the other hand, what is the motivation for this delay?
As I said earlier, the motivation for the delay is to avoid showing a
splash for systems that
boot in 4 seconds or something. At that point a splash is just getting
in the way.

>  If Plymouth were to display the splash instantly on a system that needs 0.5 
> seconds to
> boot, then the splash would flash for 0.5 seconds.
No, flashing a splash for half a second would be a bug. (again think
of things from a user
perpective).  Plymouth splashes have animations at the end to
transition the user to the
 login screen.  Normally those animations don't contribute to boot
time, because we know
when boot will finish from prior boot data.  But if boot were 0.5
seconds long, then those
 animations would contribute 2 to 3 seconds to boot time, and if boot
is 0.5 seconds long
showing a splash is pointless.
> But with the delay, a system that needs 5.5 seconds to boot will also flash 
> it for 0.5 seconds.
> Either way, the splash will just flash for a moment.
again, we don't blink the splash on and off. we have transition animations.

> The delay only changes which systems are affected. However, if you set the 
> delay to 0,
> you'll run into the bug I described above.
Then put the drm driver in the initramfs so you fix your bug !

> This is a design problem, hidden by a needless delay.
really don't see how it is.

--Ray
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC PATCH v2 03/13] bootsplash: Flush framebuffer after drawing

2017-12-19 Thread Ray Strode
Hi,

On Tue, Dec 19, 2017 at 10:41 AM, Max Staudt  wrote:
> I'm hooking into the in-kernel terminal emulator, because the bootsplash is a
> functional extension of that. It just happens that fbcon sits on top of FB, 
> so I
> work with what I get.
>
> And the console in turn happens to work on all FB and KMS drivers, so it
> makes users of all kinds of drivers happy. In fact, that's why the FB 
> emulation
> in KMS drivers came to be in the first place, if I remember right - to ensure
> fbcon continues to work.

But what about multi-monitor? what about hidpi screens?  Ideally you want
each monitor to show the splash in a way that best fits that monitor.

You can't do that if you're drawing all over fbcon... and it's not like multiple
monitors and 4k screens are niche these days.

--Ray
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC PATCH v2 00/13] Kernel based bootsplash

2017-12-19 Thread Ray Strode
Hi,

> For example, having a userspace splash that starts as early as it can
> (thus on vesafb/efifb on a PC) will cause the KMS driver to fail
> reserving the entirety of video RAM, and thus fail loading. This cannot be 
> fixed.
well the fix there is to use drm devices... like this:

https://cgit.freedesktop.org/plymouth/commit/?id=97f02ee959374afb1b8a78dc67116cd880cf2d23

> Furthermore, Plymouth is quite broken. For example, it may lock
> (via VT_SETMODE) the VT even though Plymouth is in "disabled"
> state and X has already taken control of the VT.
What do you mean by "disabled" (plymouth.enable=0?) ?  if it's
disabled, it's not going to call VT_SETMODE ...

Why do you refer to VT_SETMODE as locking ?  VT_SETMODE sets
whether a process handles VT changes or the kernel does.  There is a
long standing kernel issue where a mode of VT_AUTO (kernel handles
vt switching) + KDGRAPHICS means VTs can't be changed. is that what
you're talking about?

Anyway plymouth is only going to step on X's toes, if the distro erroneously
asks it to.  Normally, a distro would run "plymouth deactivate" before
starting X, instead of trying to run them at the same time...

> This causes the kernel to throw away X's PID as the VT owner, and thus
> chvt and Ctrl-Alt-Fx no longer work because X can neither release the
> console (VT_RELDISP fails), nor does the kernel send it the signal to do
> so. This is hard to impossible to fix.
Unless i'm missing something, this is totally just a problem with startup
scripts not doing the right thing?  Plymouth shouldn't be doing anything
once X is started.  If it is, that's either a plymouth bug (or more likely a
distro integration problem)

> A third reason is that in practice, Plymouth's start is delayed for reasons
> such as the above. Yes, race conditions are being worked around with
> sleeps.
??? that's not true.  We don't have any sleep statements in the code to work
around race conditions with X.

We do have two configurable delays in the code, are you talking about one of
them?

1) The first is a ShowDelay option.  The point of this option is,
"If boot takes 5 seconds or less, it's essentially instant and we
shouldn't show a splash at all".  Nothing to do with race conditions.
You can set it to 0 if you want.

2) The second is DeviceTimeout option.  The point of this option is to
decide how long to wait for udev coldplug to finish.  It's mostly
relevant for systems that don't have kms drivers.  The point is at
somepoint during boot we need to decide to stop waiting for a drm
device to show up and just fallback to showing graphics using
legacy interfaces (like /dev/fb).  We used to wait until the udev
queue went empty, but that's error prone since it gets cleared when
the root is switched.  See

https://lists.freedesktop.org/archives/systemd-devel/2015-March/029184.html


> So some issues are hard to fix, others are impossible to fix in userspace.
I'm not convinced there are any insurmountable problems here...

One thing i'd like to do is change boot to not map fbcon at first, and
only map it in on the fly when the user hits escape, or after boot
finishes.  like, for instance, try booting with fbcon=vc:2 or
fbcon=map:9 to see how it improves the boot experience.

--Ray
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v2 0/2] qxl cursor fixes

2017-11-27 Thread Ray Strode
From: Ray Strode 

This series includes a small leak fix and a fix for an initial invisible
cursor on wayland sessions.

Ray Strode (2):
  drm/qxl: unref cursor bo when finished with it
  drm/qxl: reapply cursor after resetting primary

 drivers/gpu/drm/qxl/qxl_display.c | 63 ++-
 drivers/gpu/drm/qxl/qxl_drv.h |  2 ++
 2 files changed, 64 insertions(+), 1 deletion(-)

-- 
2.14.3

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 1/2] drm/qxl: unref cursor bo when finished with it

2017-11-27 Thread Ray Strode
From: Ray Strode 

qxl_cursor_atomic_update allocs a bo for the cursor that
it never frees up at the end of the function.

This commit fixes that.

Signed-off-by: Ray Strode 
---
 drivers/gpu/drm/qxl/qxl_display.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index 4756b3c9bf2c..7335d99244d5 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -548,61 +548,61 @@ static void qxl_primary_atomic_disable(struct drm_plane 
*plane,
struct qxl_device *qdev = plane->dev->dev_private;
 
if (old_state->fb) {
struct qxl_framebuffer *qfb =
to_qxl_framebuffer(old_state->fb);
struct qxl_bo *bo = gem_to_qxl_bo(qfb->obj);
 
if (bo->is_primary) {
qxl_io_destroy_primary(qdev);
bo->is_primary = false;
}
}
 }
 
 static int qxl_plane_atomic_check(struct drm_plane *plane,
  struct drm_plane_state *state)
 {
return 0;
 }
 
 static void qxl_cursor_atomic_update(struct drm_plane *plane,
 struct drm_plane_state *old_state)
 {
struct drm_device *dev = plane->dev;
struct qxl_device *qdev = dev->dev_private;
struct drm_framebuffer *fb = plane->state->fb;
struct qxl_release *release;
struct qxl_cursor_cmd *cmd;
struct qxl_cursor *cursor;
struct drm_gem_object *obj;
-   struct qxl_bo *cursor_bo, *user_bo = NULL;
+   struct qxl_bo *cursor_bo = NULL, *user_bo = NULL;
int ret;
void *user_ptr;
int size = 64*64*4;
 
ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd),
 QXL_RELEASE_CURSOR_CMD,
 &release, NULL);
if (ret)
return;
 
if (fb != old_state->fb) {
obj = to_qxl_framebuffer(fb)->obj;
user_bo = gem_to_qxl_bo(obj);
 
/* pinning is done in the prepare/cleanup framevbuffer */
ret = qxl_bo_kmap(user_bo, &user_ptr);
if (ret)
goto out_free_release;
 
ret = qxl_alloc_bo_reserved(qdev, release,
sizeof(struct qxl_cursor) + size,
&cursor_bo);
if (ret)
goto out_kunmap;
 
ret = qxl_release_reserve_list(release, true);
if (ret)
goto out_free_bo;
 
ret = qxl_bo_kmap(cursor_bo, (void **)&cursor);
@@ -618,60 +618,62 @@ static void qxl_cursor_atomic_update(struct drm_plane 
*plane,
cursor->data_size = size;
cursor->chunk.next_chunk = 0;
cursor->chunk.prev_chunk = 0;
cursor->chunk.data_size = size;
memcpy(cursor->chunk.data, user_ptr, size);
qxl_bo_kunmap(cursor_bo);
qxl_bo_kunmap(user_bo);
 
cmd = (struct qxl_cursor_cmd *) qxl_release_map(qdev, release);
cmd->u.set.visible = 1;
cmd->u.set.shape = qxl_bo_physical_address(qdev,
   cursor_bo, 0);
cmd->type = QXL_CURSOR_SET;
} else {
 
ret = qxl_release_reserve_list(release, true);
if (ret)
goto out_free_release;
 
cmd = (struct qxl_cursor_cmd *) qxl_release_map(qdev, release);
cmd->type = QXL_CURSOR_MOVE;
}
 
cmd->u.position.x = plane->state->crtc_x + fb->hot_x;
cmd->u.position.y = plane->state->crtc_y + fb->hot_y;
 
qxl_release_unmap(qdev, release, &cmd->release_info);
qxl_push_cursor_ring_release(qdev, release, QXL_CMD_CURSOR, false);
qxl_release_fence_buffer_objects(release);
 
+   qxl_bo_unref(&cursor_bo);
+
return;
 
 out_backoff:
qxl_release_backoff_reserve_list(release);
 out_free_bo:
qxl_bo_unref(&cursor_bo);
 out_kunmap:
qxl_bo_kunmap(user_bo);
 out_free_release:
qxl_release_free(qdev, release);
return;
 
 }
 
 static void qxl_cursor_atomic_disable(struct drm_plane *plane,
  struct drm_plane_state *old_state)
 {
struct qxl_device *qdev = plane->dev->dev_private;
struct qxl_release *release;
struct qxl_cursor_cmd *cmd;
int ret;
 
ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd),
 QXL_RELEASE_CURSOR_CMD,
 &am

[PATCH 2/2] drm/qxl: reapply cursor after resetting primary

2017-11-27 Thread Ray Strode
From: Ray Strode 

QXL associates mouse state with its primary plane.

Destroying a primary plane and putting a new one in place has the side
effect of destroying the cursor as well.

This commit changes the driver to reapply the cursor any time a new
primary is created. It achieves this by keeping a reference to the
cursor bo on the qxl_crtc struct.

This fix is very similar to

commit 4532b241a4b7 ("drm/qxl: reapply cursor after SetCrtc calls")

which got implicitly reverted as part of implementing the atomic
modeset feature.

Cc: Gerd Hoffmann 
Cc: Dave Airlie 
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1512097
Fixes: 1277eed5fecb ("drm: qxl: Atomic phase 1: convert cursor to universal 
plane")
Signed-off-by: Ray Strode 
---
 drivers/gpu/drm/qxl/qxl_display.c | 59 +++
 drivers/gpu/drm/qxl/qxl_drv.h |  2 ++
 2 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index 7335d99244d5..9a9214ae0fb5 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -262,60 +262,61 @@ static int qxl_add_common_modes(struct drm_connector 
*connector,
mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h,
60, false, false, false);
if (common_modes[i].w == pwidth && common_modes[i].h == pheight)
mode->type |= DRM_MODE_TYPE_PREFERRED;
drm_mode_probed_add(connector, mode);
}
return i - 1;
 }
 
 static void qxl_crtc_atomic_flush(struct drm_crtc *crtc,
  struct drm_crtc_state *old_crtc_state)
 {
struct drm_device *dev = crtc->dev;
struct drm_pending_vblank_event *event;
unsigned long flags;
 
if (crtc->state && crtc->state->event) {
event = crtc->state->event;
crtc->state->event = NULL;
 
spin_lock_irqsave(&dev->event_lock, flags);
drm_crtc_send_vblank_event(crtc, event);
spin_unlock_irqrestore(&dev->event_lock, flags);
}
 }
 
 static void qxl_crtc_destroy(struct drm_crtc *crtc)
 {
struct qxl_crtc *qxl_crtc = to_qxl_crtc(crtc);
 
+   qxl_bo_unref(&qxl_crtc->cursor_bo);
drm_crtc_cleanup(crtc);
kfree(qxl_crtc);
 }
 
 static const struct drm_crtc_funcs qxl_crtc_funcs = {
.set_config = drm_atomic_helper_set_config,
.destroy = qxl_crtc_destroy,
.page_flip = drm_atomic_helper_page_flip,
.reset = drm_atomic_helper_crtc_reset,
.atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
.atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
 };
 
 void qxl_user_framebuffer_destroy(struct drm_framebuffer *fb)
 {
struct qxl_framebuffer *qxl_fb = to_qxl_framebuffer(fb);
struct qxl_bo *bo = gem_to_qxl_bo(qxl_fb->obj);
 
WARN_ON(bo->shadow);
drm_gem_object_unreference_unlocked(qxl_fb->obj);
drm_framebuffer_cleanup(fb);
kfree(qxl_fb);
 }
 
 static int qxl_framebuffer_surface_dirty(struct drm_framebuffer *fb,
 struct drm_file *file_priv,
 unsigned flags, unsigned color,
 struct drm_clip_rect *clips,
 unsigned num_clips)
 {
@@ -468,193 +469,251 @@ static void qxl_crtc_atomic_disable(struct drm_crtc 
*crtc,
 
 static const struct drm_crtc_helper_funcs qxl_crtc_helper_funcs = {
.mode_fixup = qxl_crtc_mode_fixup,
.mode_set_nofb = qxl_mode_set_nofb,
.atomic_flush = qxl_crtc_atomic_flush,
.atomic_enable = qxl_crtc_atomic_enable,
.atomic_disable = qxl_crtc_atomic_disable,
 };
 
 static int qxl_primary_atomic_check(struct drm_plane *plane,
struct drm_plane_state *state)
 {
struct qxl_device *qdev = plane->dev->dev_private;
struct qxl_framebuffer *qfb;
struct qxl_bo *bo;
 
if (!state->crtc || !state->fb)
return 0;
 
qfb = to_qxl_framebuffer(state->fb);
bo = gem_to_qxl_bo(qfb->obj);
 
if (bo->surf.stride * bo->surf.height > qdev->vram_size) {
DRM_ERROR("Mode doesn't fit in vram size (vgamem)");
return -EINVAL;
}
 
return 0;
 }
 
+static int qxl_primary_apply_cursor(struct drm_plane *plane)
+{
+   struct drm_device *dev = plane->dev;
+   struct qxl_device *qdev = dev->dev_private;
+   struct drm_framebuffer *fb = plane->state->fb;
+   struct qxl_crtc *qcrtc = to_qxl_crtc(plane->state->crtc);
+   struct qxl_cursor_cmd *cmd;
+   struct qxl_re

Re: [PATCH] drm/qxl: reapply cursor after resetting primary

2017-11-27 Thread Ray Strode
Hi,

On Mon, Nov 27, 2017 at 4:07 PM Ray Strode  wrote:
...
> This commit changes the driver to reapply the cursor any time a new
> primary is created. It achieves this by keeping a reference to the
> cursor bo on the qxl_crtc struct.
So i forgot this is on top of a small leak fix, too.

For clarity, I'll resend with both patches in the series.

--Ray
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] drm/qxl: reapply cursor after resetting primary

2017-11-27 Thread Ray Strode
From: Ray Strode 

QXL associates mouse state with its primary plane.

Destroying a primary plane and putting a new one in place has the side
effect of destroying the cursor as well.

This commit changes the driver to reapply the cursor any time a new
primary is created. It achieves this by keeping a reference to the
cursor bo on the qxl_crtc struct.

This fix is very similar to

commit 4532b241a4b7 ("drm/qxl: reapply cursor after SetCrtc calls")

which got implicitly reverted as part of implementing the atomic
modeset feature.

Cc: Gerd Hoffmann 
Cc: Dave Airlie 
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1512097
Fixes: 1277eed5fecb ("drm: qxl: Atomic phase 1: convert cursor to universal 
plane")
Signed-off-by: Ray Strode 
---
 drivers/gpu/drm/qxl/qxl_display.c | 59 +++
 drivers/gpu/drm/qxl/qxl_drv.h |  2 ++
 2 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index fc9e86f3811f..a51a5abb977d 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -262,60 +262,61 @@ static int qxl_add_common_modes(struct drm_connector 
*connector,
mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h,
60, false, false, false);
if (common_modes[i].w == pwidth && common_modes[i].h == pheight)
mode->type |= DRM_MODE_TYPE_PREFERRED;
drm_mode_probed_add(connector, mode);
}
return i - 1;
 }
 
 static void qxl_crtc_atomic_flush(struct drm_crtc *crtc,
  struct drm_crtc_state *old_crtc_state)
 {
struct drm_device *dev = crtc->dev;
struct drm_pending_vblank_event *event;
unsigned long flags;
 
if (crtc->state && crtc->state->event) {
event = crtc->state->event;
crtc->state->event = NULL;
 
spin_lock_irqsave(&dev->event_lock, flags);
drm_crtc_send_vblank_event(crtc, event);
spin_unlock_irqrestore(&dev->event_lock, flags);
}
 }
 
 static void qxl_crtc_destroy(struct drm_crtc *crtc)
 {
struct qxl_crtc *qxl_crtc = to_qxl_crtc(crtc);
 
+   qxl_bo_unref(&qxl_crtc->cursor_bo);
drm_crtc_cleanup(crtc);
kfree(qxl_crtc);
 }
 
 static const struct drm_crtc_funcs qxl_crtc_funcs = {
.set_config = drm_atomic_helper_set_config,
.destroy = qxl_crtc_destroy,
.page_flip = drm_atomic_helper_page_flip,
.reset = drm_atomic_helper_crtc_reset,
.atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
.atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
 };
 
 void qxl_user_framebuffer_destroy(struct drm_framebuffer *fb)
 {
struct qxl_framebuffer *qxl_fb = to_qxl_framebuffer(fb);
struct qxl_bo *bo = gem_to_qxl_bo(qxl_fb->obj);
 
WARN_ON(bo->shadow);
drm_gem_object_unreference_unlocked(qxl_fb->obj);
drm_framebuffer_cleanup(fb);
kfree(qxl_fb);
 }
 
 static int qxl_framebuffer_surface_dirty(struct drm_framebuffer *fb,
 struct drm_file *file_priv,
 unsigned flags, unsigned color,
 struct drm_clip_rect *clips,
 unsigned num_clips)
 {
@@ -468,193 +469,251 @@ static void qxl_crtc_atomic_disable(struct drm_crtc 
*crtc,
 
 static const struct drm_crtc_helper_funcs qxl_crtc_helper_funcs = {
.mode_fixup = qxl_crtc_mode_fixup,
.mode_set_nofb = qxl_mode_set_nofb,
.atomic_flush = qxl_crtc_atomic_flush,
.atomic_enable = qxl_crtc_atomic_enable,
.atomic_disable = qxl_crtc_atomic_disable,
 };
 
 static int qxl_primary_atomic_check(struct drm_plane *plane,
struct drm_plane_state *state)
 {
struct qxl_device *qdev = plane->dev->dev_private;
struct qxl_framebuffer *qfb;
struct qxl_bo *bo;
 
if (!state->crtc || !state->fb)
return 0;
 
qfb = to_qxl_framebuffer(state->fb);
bo = gem_to_qxl_bo(qfb->obj);
 
if (bo->surf.stride * bo->surf.height > qdev->vram_size) {
DRM_ERROR("Mode doesn't fit in vram size (vgamem)");
return -EINVAL;
}
 
return 0;
 }
 
+static int qxl_primary_apply_cursor(struct drm_plane *plane)
+{
+   struct drm_device *dev = plane->dev;
+   struct qxl_device *qdev = dev->dev_private;
+   struct drm_framebuffer *fb = plane->state->fb;
+   struct qxl_crtc *qcrtc = to_qxl_crtc(plane->state->crtc);
+   struct qxl_cursor_cmd *cmd;
+   struct qxl_re

[PATCH] drm/qxl: reapply cursor after SetCrtc calls

2016-09-09 Thread Ray Strode
From: Ray Strode 

The qxl driver currently destroys and recreates the
qxl "primary" any time the first crtc is set.

A side-effect of destroying the primary is mouse state
associated with the crtc is lost, which leads to
disappearing mouse cursors on wayland sessions.

This commit changes the driver to reapply the cursor
any time SetCrtc is called. It achieves this by keeping
a reference to the cursor bo on the qxl_crtc struct.

Signed-off-by: Ray Strode 

https://bugzilla.redhat.com/show_bug.cgi?id=1200901
---
 drivers/gpu/drm/qxl/qxl_display.c | 56 ++-
 drivers/gpu/drm/qxl/qxl_drv.h |  1 +
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index 3aef127..a61c0d4 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -184,60 +184,61 @@ static struct mode_size {
{1440,  900},
{1400, 1050},
{1680, 1050},
{1600, 1200},
{1920, 1080},
{1920, 1200}
 };

 static int qxl_add_common_modes(struct drm_connector *connector,
 unsigned pwidth,
 unsigned pheight)
 {
struct drm_device *dev = connector->dev;
struct drm_display_mode *mode = NULL;
int i;
for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h,
60, false, false, false);
if (common_modes[i].w == pwidth && common_modes[i].h == pheight)
mode->type |= DRM_MODE_TYPE_PREFERRED;
drm_mode_probed_add(connector, mode);
}
return i - 1;
 }

 static void qxl_crtc_destroy(struct drm_crtc *crtc)
 {
struct qxl_crtc *qxl_crtc = to_qxl_crtc(crtc);

drm_crtc_cleanup(crtc);
+   qxl_bo_unref(&qxl_crtc->cursor_bo);
kfree(qxl_crtc);
 }

 static int qxl_crtc_page_flip(struct drm_crtc *crtc,
   struct drm_framebuffer *fb,
   struct drm_pending_vblank_event *event,
   uint32_t page_flip_flags)
 {
struct drm_device *dev = crtc->dev;
struct qxl_device *qdev = dev->dev_private;
struct qxl_framebuffer *qfb_src = to_qxl_framebuffer(fb);
struct qxl_framebuffer *qfb_old = to_qxl_framebuffer(crtc->primary->fb);
struct qxl_bo *bo_old = gem_to_qxl_bo(qfb_old->obj);
struct qxl_bo *bo = gem_to_qxl_bo(qfb_src->obj);
unsigned long flags;
struct drm_clip_rect norect = {
.x1 = 0,
.y1 = 0,
.x2 = fb->width,
.y2 = fb->height
};
int inc = 1;
int one_clip_rect = 1;
int ret = 0;

crtc->primary->fb = fb;
bo_old->is_primary = false;
bo->is_primary = true;

ret = qxl_bo_reserve(bo, false);
@@ -269,60 +270,106 @@ static int qxl_crtc_page_flip(struct drm_crtc *crtc,
return 0;
 }

 static int
 qxl_hide_cursor(struct qxl_device *qdev)
 {
struct qxl_release *release;
struct qxl_cursor_cmd *cmd;
int ret;

ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd), 
QXL_RELEASE_CURSOR_CMD,
 &release, NULL);
if (ret)
return ret;

ret = qxl_release_reserve_list(release, true);
if (ret) {
qxl_release_free(qdev, release);
return ret;
}

cmd = (struct qxl_cursor_cmd *)qxl_release_map(qdev, release);
cmd->type = QXL_CURSOR_HIDE;
qxl_release_unmap(qdev, release, &cmd->release_info);

qxl_push_cursor_ring_release(qdev, release, QXL_CMD_CURSOR, false);
qxl_release_fence_buffer_objects(release);
return 0;
 }

+static int qxl_crtc_apply_cursor(struct drm_crtc *crtc)
+{
+   struct qxl_crtc *qcrtc = to_qxl_crtc(crtc);
+   struct drm_device *dev = crtc->dev;
+   struct qxl_device *qdev = dev->dev_private;
+   struct qxl_cursor_cmd *cmd;
+   struct qxl_release *release;
+   int ret = 0;
+
+   if (!qcrtc->cursor_bo)
+   return 0;
+
+   ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd),
+QXL_RELEASE_CURSOR_CMD,
+&release, NULL);
+   if (ret)
+   return ret;
+
+   ret = qxl_release_list_add(release, qcrtc->cursor_bo);
+   if (ret)
+   goto out_free_release;
+
+   ret = qxl_release_reserve_list(release, false);
+   if (ret)
+   goto out_free_release;
+
+   cmd = (struct qxl_cursor_cmd *)qxl_release_map(qdev, release);
+   cmd->type = QXL_CURSOR_SET;
+   cmd->u.set.position.x = qcrtc-&