Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Fri, Oct 13, 2023 at 5:41 AM Daniel Vetter wrote:
> > I mean we're not talking about scientific computing, or code
> > compilation, or seti@home. We're talking about nearly the equivalent
> > of `while (1) __asm__ ("nop");`
>
> I don't think anyone said this shouldn't be fixed or improved.

Yeah, but we apparently disagree that it would be an improvement to stop discrediting user space for driver problems.

You see, to me, there are two problems:

1) The behavior itself is not nice and should be fixed

2) The blame/accounting/attribution for a driver problem is getting directed to user space.

I think both issues should be fixed. One brought the other issue to light, but both problems, in my mind, are legitimate and should be addressed.

You think fixing the second problem is some tactic to paper over the first problem. Christian thinks the second problem isn't a problem at all but the correct design. So none of us are completely aligned and I don't see anyone changing their mind anytime soon.

> What I'm saying is that the atomic ioctl is not going to make guarantees
> that it will not take up to much cpu time (for some extremely vague value
> of "too much") to the point that userspace can configure it's compositor
> in a way that it _will_ get killed if we _ever_ violate this rule.

Yeah, I find that strange; all kernel tasks have a certain implied baseline responsibility to be good citizens to the system. And how user space is configured seems nearly immaterial. But we're talking in circles.

> Fixing the worst offenders I don't think will see any pushback at all.

Yeah, we all agree this one busy loop needs fixing, but we don't agree on where the cpu time blame should go.

> > But *this* feels like duct tape: You've already said there's no
> > guarantee the problem won't also happen during preliminary computation
> > during non-blocking commits or via other drm entry points. So it
> > really does seem like a fix that won't age well. I won't be surprised
> > if in ~3 years (or whatever) in some RHEL release there's a customer
> > bug leading to the real-time thread getting blocklisted for obscure
> > server display hardware because it's causing the session to tank on a
> > production machine.
>
> Yes the atomic ioctl makes no guarantees across drivers and hw platforms
> of guaranteed "we will never violate, you can kill your compositor if you
> do" cpu bound limits. We'll try to not suck too badly, and we'll try to
> focus on fixing the suck where it upsets the people the most.
>
> But there is fundamentally no hard real time guarantee in atomic. At least
> not with the current uapi.

So in your mind mutter should get out of the real-time thread business entirely?

> The problem isn't about wasting cpu time. It's about you having set up the
> system in a way so that mutter gets killed if we ever waste too much cpu
> time, ever.

mutter is not set up in a way to kill itself if the driver wastes too much cpu time, ever. mutter is set up in a way to kill itself if it, itself, wastes too much cpu time, ever. The fact that the kernel is killing it when a kernel driver is wasting cpu time is the bug that led to the patch in the first place!

> The latter is flat out not a guarantee the kernel currently makes, and I
> see no practical feasible way to make that happen. And pretending we do
> make this guarantee will just result in frustrated users filling bugs that
> they desktop session died and developers closing them as "too hard to
> fix".

I think all three of us agree busy loops are not nice (though maybe we aren't completely aligned on severity). And I'll certainly concede that fixing all the busy loops can be hard. Some of the cpu-bound code paths may even be doing legitimate computation.

I still assert that if the uapi calls into driver code that might potentially be doing something slow where it can't sleep, it should be doing it on a workqueue or thread. That seems orthogonal to fixing all the places where the drivers aren't acting nicely.

> Maybe as a bit more useful outcome of this entire discussion: Could you
> sign up to write a documentation patch for the atomic ioctl that adds a
> paragraph stating extremely clearly that this ioctl does not make hard
> real time guarantees, but only best effort soft realtime, and even then
> priority of the effort is focused entirely on the !ALLOW_MODESET case,
> because that is the one users care about the most.

I don't think I'm the best positioned to write such documentation. I still think what the kernel is doing is wrong here and I don't even fully grok what you mean by best effort.

--Ray
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Mon, Oct 09, 2023 at 02:36:17PM +0200, Christian König wrote:
> > > > To be clear, my take is, if driver code is running in process context
> > > > and needs to wait for periods of time on the order of or in excess of
> > > > a typical process time slice it should be sleeping during the waiting.
> > > > If the operation is at a point where it can be cancelled without side
> > > > effects, the sleeping should be INTERRUPTIBLE. If it's past the point
> > > > of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my
> > > > opinion, should kernel code busy block a typical process for dozens of
> > > > milliseconds while keeping the process RUNNING. I don't think this is
> > > > a controversial take.
>
> Exactly that's what I completely disagree on.

Okay, if we can't agree that it's not okay for user space (or the kernel running in the context of user space) to busy loop a cpu core at 100% utilization throughout and beyond the process's entire scheduled time slice, then we really are at an impasse.

I gotta say I'm astonished that this seemingly indefensible behavior is somehow a point of contention, but I'm not going to keep arguing about it beyond this email.

I mean we're not talking about scientific computing, or code compilation, or seti@home. We're talking about nearly the equivalent of `while (1) __asm__ ("nop");`

> > The key point here is that the patch puts the work into the background just
> > to avoid that it is accounted to the thread issuing it, and that in turn is
> > not valid as far as I can see.
>
> Yeah it's that aspect I'm really worried about, because we essentially
> start to support some gurantees that a) most drivers can't uphold without
> a huge amount of work, some of the DC state recomputations are _really_
> expensive b) without actually making the semantics clear, it's just
> duct-tape.

If DC plane state computation (or whatever) is really taking 50ms or 200ms, then it probably should be done in chunks so it doesn't get preempted at an inopportune point? Look, this is not my wheelhouse, this is your wheelhouse, and I don't want to keep debating forever. It seems there is a discrepancy between our understandings of implied acceptable behavior.

> Yes compositors want to run kms in real-time, and yes that results in fun
> if you try to strictly account for cpu time spent. Especially if your
> policy is to just nuke the real time thread instead of demoting it to
> SCHED_NORMAL for a time.

So I ended up going with this suggestion for blocking modesets:

https://gitlab.gnome.org/GNOME/mutter/-/commit/5d3e31a49968fc0da04e98c0f9d624ea5095c9e0

But *this* feels like duct tape: You've already said there's no guarantee the problem won't also happen during preliminary computation during non-blocking commits or via other drm entry points. So it really does seem like a fix that won't age well. I won't be surprised if in ~3 years (or whatever) in some RHEL release there's a customer bug leading to the real-time thread getting blocklisted for obscure server display hardware because it's causing the session to tank on a production machine.

> I think if we want more than hacks here we need to answer two questions:
> - which parts of the kms api are real time
> - what exactly do we guarantee with that

imo, this isn't just about real-time versus non-real-time. It's no more acceptable for non-real-time mutter to be using 100% CPU doing busywaits than it is for real-time mutter to be using 100% cpu doing busywaits.

Also, both you and Christian have suggested using the non-blocking modeset api with a fence fd to poll on is equivalent to the blocking api flushing the commit_tail work before returning from the ioctl, but that's not actually true. I think we all now agree the EBUSY problem you mentioned as an issue with my proposed patch wasn't actually a problem for blocking commits, but that very same issue is a problem with non-blocking commits that then block on a fence fd, right? I guess we'd need to block on a fence fd from the prior non-blocking commit first before starting the blocking commit (or something).

--Ray
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Fri, Oct 6, 2023 at 3:12 AM Christian König wrote:
> When the operation busy waits then that *should* get accounted to the
> CPU time of the current process. When the operation sleeps and waits for
> some interrupt for example it should not get accounted.
> What you suggest is to put the parts of the operation which busy wait
> into a background task and then sleep for that task to complete. This is
> not a solution to the problem, but just hides it.

Actually, I think we both probably agree that there shouldn't be long busy waits in the context of the current process. After all, we both agree what the AMD DC driver code is doing is wrong.

To be clear, my take is, if driver code is running in process context and needs to wait for periods of time on the order of or in excess of a typical process time slice, it should be sleeping during the waiting. If the operation is at a point where it can be cancelled without side effects, the sleeping should be INTERRUPTIBLE. If it's past the point of no return, sleeping should be UNINTERRUPTIBLE. At no point, in my opinion, should kernel code busy block a typical process for dozens of milliseconds while keeping the process RUNNING. I don't think this is a controversial take.

Actually, I think (maybe?) you might even agree with that, but you're also saying: user space processes aren't special here. While it's not okay to busy block them, it's also not okay to busy block on the system unbound workqueue either. If that's your sentiment, I don't disagree with it. So I think we both agree the busy waiting is a problem, but maybe we disagree on the best place for the problem to manifest when it happens.

One thought re the DC code is, regardless of where the code is running, the scheduler is going to forcefully preempt it at some point, right? Any writereg/udelay(1)/readreg loop is going to get disrupted by a much bigger than 1us delay by the kernel if the loop goes on long enough. I'm not wrong about that?
If that's true, the code might as well switch out the udelay(1) for a usleep(1) and call it a day (well, modulo the fact I think it can be called from an interrupt handler; at least "git grep" says there's a link_set_dpms_off in link_dp_irq_handler.c).

> Stuff like that is not a valid justification for the change. Ville
> changes on the other hand tried to prevent lock contention which is a
> valid goal here.

Okay, so let's land his patchset! (assuming it's ready to go in). Ville, is that something you'd want to resend for review?

Note, a point that I don't think has been brought up yet, too, is the system unbound workqueue doesn't run with real time priority. Given the lion's share of mutter's drmModeAtomicCommit calls are nonblock, and so are using the system unbound workqueue for handling the commits even without this patch, that somewhat diminishes the utility of using a realtime thread anyway.

I believe the original point of the realtime thread was to make sure mouse cursor motion updates are as responsive as possible, because any latency there is something the user really feels. Maybe there needs to be a different mechanism in place to make sure user perceived interactivity is given priority.

--Ray
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Thu, Oct 5, 2023 at 5:57 AM Daniel Vetter wrote:
> So imo the trouble with this is that we suddenly start to make
> realtime/cpu usage guarantees in the atomic ioctl. That's a _huge_ uapi
> change, because even limited to the case of !ALLOW_MODESET we do best
> effort guarantees at best.

So you're saying there's a best effort to not use up process CPU time, but Christian was trying to argue nearly the opposite, that any needed CPU time for the operation should get accounted toward the process. What you're saying makes more sense to me and it tracks with what I'm getting skimming over the code.

I see it potentially sleeps during blocking drmModeAtomicCommit() calls in several places:

- copy_from_user when copying properties
- fence and vblank event allocation
- waiting on modeset locks
- And even plane changes:

	for_each_new_plane_in_state(state, plane, new_plane_state, i) {
		...
		/*
		 * If waiting for fences pre-swap (ie: nonblock), userspace can
		 * still interrupt the operation. Instead of blocking until the
		 * timer expires, make the wait interruptible.
		 */
		ret = dma_fence_wait(new_plane_state->fence, pre_swap);
		...
	}

(Ignore the typo where it says "nonblock" instead of "!nonblock"; the point is that while waiting on plane state changes it sleeps.)

So, in general, the ioctl sleeps when it needs to. It's not trying to be some non-preemptible RT task with a dedicated cpu budget that it strictly adheres to. That makes sense to me.

> And that state recomputation has to happen synchronously, because
> it always influences the ioctl errno return value.

Not sure I'm following. The task doesn't have to stay in RUNNING the entire time the ioctl is getting carried out. It can get preempted, and rescheduled later, all before returning from the ioctl and delivering the errno back to userspace. What am I missing? The problem isn't that the ioctl blocks; the problem is that when it blocks it shouldn't be leaving the task state in RUNNING.

> My take is that you're papering over a performance problem here of the
> "the driver is too slow/wastes too much cpu time". We should fix the
> driver, if that's possible.

I think everyone agrees the amdgpu DC code doing several 100ms busy loops in a row with no break in between is buggy behavior, and getting that fixed is important. Also, there's no dispute that the impetus for my proposed change was that misbehavior in the driver code. Neither of those facts means we shouldn't improve drm_atomic_helper_commit too.

Making the change is a good idea. It was even independently proposed a year ago, for reasons unrelated to the current situation. It makes the nonblock and the !nonblock code paths behave closer to the same. It makes the userspace experience better in the face of driver bugs. You said best effort earlier; this change helps provide a best effort. Realistically, if it was done this way from the start, no one would be batting an eye, right? It just improves things all around. I think maybe you're worried about a slippery slope?

> We can also try to change the atomic uapi to give some hard real-time
> guarantees so that running compositors as SCHED_RT is possible, but that
> - means a very serious stream of bugs to fix all over
> - therefore needs some very wide buy-in from drivers that they're willing
>   to make this guarantee
> - probably needs some really carefully carved out limitations, because
>   there's imo flat-out no way we'll make all atomic ioctl hard time limit
>   bound

We don't need a guarantee that reprogramming ast BMC framebuffer offsets or displayport link training (or whatever) takes less than 200ms. If a driver has to sit and wait 32ms for vblank before twiddling things in hardware to keep crud from showing up on screen or something, that's fine. But in none of these cases should the userspace process be kept in RUNNING while it does it! I take your point that there are a lot of drivers that may be doing slow things, but surely they should let the process nap while they do their work?

> Also, as König has pointed out, you can roll this duct-tape out in
> userspace by making the commit non-blocking and immediately waiting for
> the fences.

Sure, userspace can do that (even, it turns out, in the all-crtcs-disabled case, which was an open question earlier in the thread). That's not really an argument against fixing the !nonblock case though.

> One thing I didn't see mention is that there's a very subtle uapi
> difference between non-blocking and blocking:
> - non-blocking is not allowed to get ahead of the previous commit, and
>   will return EBUSY in that case. See the comment in
>   drm_atomic_helper_commit()
> - blocking otoh will just block until any previous pending commit has
>   finished
>
> Not taking that into account in your patch here breaks uapi because
> userspace will suddenly get EBUSY when they don't expect that.

I don't think that's right, drm_atomic_helper_setup_co
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Wed, Oct 4, 2023 at 1:28 PM Ville Syrjälä wrote:
> No one really seemed all that interested in it. I'd still like to get
> it in, if for no other reason than to make things operate more uniformly.
> Though there are lots of legacy codepaths left that still hold the locks
> over the whole commit, but those could be fixed up as a followup.

Ah, I see what you're saying.

> > > execpt I went further and moved the flush past the unlock in the end.
> >
> > Is that necessary? I was wondering about that but there's this comment
> > in the code:
>
> I'm not really sure there is any point in doing this otherwise.
> It would just change which thread executes the code but everything
> else would still get blocked on the locks the same as before.

Well, in my case, I just want driver work to not charge the process cpu time. But I get what you're saying. Checking if new configurations are valid shouldn't block on existing configurations getting applied. Doing it outside the locks isn't necessary to prevent deadlocks; it's necessary because you're trying to avoid contention when there doesn't need to be contention.

Your patchset sort of has a different goal, but it solves the issue I care about at the same time. I'd be happy if it went in... Daniel, Dave, what are your takes on this?

--Ray
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Thu, Sep 28, 2023 at 11:05 AM Ville Syrjälä wrote:
> Here's my earlier take on this:
> https://patchwork.freedesktop.org/series/108668/

Nice. Was there push back? Why didn't it go in?

> execpt I went further and moved the flush past the unlock in the end.

Is that necessary? I was wondering about that but there's this comment in the code:

 * ... Locks, especially struct
 * &drm_modeset_lock, should not be held in worker threads or any other
 * asynchronous context used to commit the hardware state.

So it made me think there would be no locks held, and then I threw a scratch build at someone who reported it didn't deadlock and solved their issue.

--Ray
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Thu, Sep 28, 2023 at 9:24 AM Christian König wrote:
> If you see a large delay in the dpms off case then we probably have a driver
> bug somewhere.

This is something we both agree on, I think.

> > I'm getting the idea that you think there is some big bucket of kernel
> > syscalls that block for a large fraction of a second by design and are
> > not meant to reset RLIMIT_RTTIME.
>
> Yes, exactly that's the case as far as I know.

Okay, well I'm willing to be proven wrong. My previously stated list is: futex, sched_yield, and maybe a few obscure ioctls. If you have a big bucket of syscalls in your mind, I'm all ears. I think I'm probably right and the lion's share of all syscalls that can block for a long time don't leave the process RUNNABLE, though.

> The purpose of RLIMIT_RTTIME is to have a watchdog if an application is still
> responsive. Only things which make the application look for outside events
> are considered a reset for this whatdog.
>
> So for example calling select() and poll() will reset the watchdog, but not
> calling any DRM modeset functions or an power management IOCTL for example.

What power management ioctls are you talking about?

> There are only two possibilities I can see how a DRM modeset is triggering
> this:
>
> 1. We create enough CPU overhead to actually exceed the limit. In this case
> triggering it is certainly intentional.

As I said before, I suspect that not all modeset work for all drivers is happening in the process context already anyway. So if this is intentional, I suspect it's not actually working as designed (but I haven't confirmed that, aside from my `git grep -E 'INIT_WORK|DECLARE_TASKLET|open_softirq|timer_setup|kthread_run' | wc -l` litmus test). I can't come up with a good reason it would be designed like that, however, and you haven't provided one, so I suspect it's not actually intentional.

> > I don't think you really thought through what you're saying here. It
> > just flatly doesn't apply for drmModeAtomicCommit. What is an
> > application supposed to do?
>
> Get terminated and restart. That's what the whole functionality is good for.
>
> If you don't desire that than don't use the RLIMIT_RTTIME functionality.

No, getting terminated and restarting the session because drmModeAtomicCommit is taking too long is not really acceptable behavior for a display server. In general, crashing the session is maybe better than locking up the entire system hard, so RLIMIT_RTTIME is still a good idea, but in either case we're in bug land here; each such bug needs to be fixed. In this case, the bug is that drmModeAtomicCommit isn't sleeping like nearly every other syscall does.

> From the RLIMIT_RTTIME documentation: "The intended use of this limit
> is to stop a runaway real-time process from locking up the system."
>
> And when drmAtomicCommit() is triggering this then we either have a
> problem with the application doing something it is not supposed to do
> (like blocking for vblank while it should listen to network traffic) or
> the driver is somehow buggy.

Yes! This is about bugs.

> > drmModeAtomicCommit() is used by display servers. If drmModeAtomicCommit
> > runs away in e.g. a set of 100ms busy loops responding to a confused or
> > slow responding GPU, the system will seemingly lock up to the user. That
> > is an intractable problem that we can not get away from. It doesn't
> > matter if the kernel is stuck in process context or on a workqueue. And,
> > regardless, it's not reasonable to expect userspace to craft elaborate
> > workarounds for driver bugs. We just have to fix the bugs.
>
> Exactly that's the reason why I'm pointing out that this isn't a good idea.

Not following your logic here.

> That just papers over the problem. The process doesn't become more responsive
> by hiding the problem.

It's not about the process being responsive. In meat-space, the sub-second delay is imperceptible because the screen is turned off. It's about not killing an innocent display server because of a driver bug.

> What you need to do here is to report those problems to the driver teams and
> not try to hide them this way.

There is already a bug; it's mentioned in the merge request. The reason I cc'd you (vs say just Dave and Daniel) is because the driver bug is in amdgpu code.

--Ray
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Thu, Sep 28, 2023 at 5:43 AM Michel Dänzer wrote:
> >>> When it's really not desirable to account the CPU overhead to the
> >>> process initiating it then you probably rather want to use an non
> >>> blocking commit plus a dma_fence to wait for the work to end from
> >>> userspace.
> >>
> >> Well, first I don't think that's very convenient. You're talking about
> >> a per-plane property, so there would need to be a separate file
> >> descriptor allocated for every plane, right? and user-space would have
> >> to block on all of them before proceeding?
>
> OUT_FENCE_PTR is a per-CRTC property, not per-plane.

Okay, sure.

> Also, at least in this particular case, a single sync file (not dma_fence)
> for any CRTC might suffice.

I don't see how we could rely on that given the provided api and multitude of drivers. It might work and then break randomly.

> >> Is the code really going to allocate a vblank_event when all the
> >> crtc's are disabled? I guess it's possible, but that's
> >> counterintuitive to me. I really don't know. You're confident a set of
> >> fences will actually work?
> >
> > No when you disable everything there is of course no fence allocated.
>
> I don't see why not. "new_crtc_in_state" in this code means the atomic
> commit contains some state for the CRTC (such as setting the
> OUT_FENCE_PTR property), it could disable or leave it disabled though.
> I can't see any other code which would prevent this from working with a
> disabled CRTC, I could be missing something though.

So I'll concede it's possible it works, and the fact it's using a vblank type event is just technical debt (though Christian says he doesn't think it works, so I think it's possible it doesn't work as well).

Having said that, this whole argument of "the dysfunctional blocking api is actually functioning as designed, but don't use it, since it doesn't work, use the non-blocking api instead, because maybe it might work or it might not, not sure" is pretty non-convincing to me.

> Handling modesets asynchronously does seem desirable in mutter to me.

Sure, fine. That doesn't mean we shouldn't fix the blocking call to behave like almost all other blocking system calls.

--Ray
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi,

On Thu, Sep 28, 2023 at 2:56 AM Christian König wrote:
> > To say the "whole point" is about CPU overhead accounting sounds
> > rather absurd to me. Is that really what you meant?
>
> Yes, absolutely. See the functionality you try to implement already exists.

You say lower in this same message that you don't believe the functionality actually works for the dpms off case I mentioned.

> After making a non blocking commit userspace can still wait on the
> commit to finish by looking at the out fence.

Fences, not fence, fences. drmModeAtomicCommit works on multiple objects at the same time. To follow the spirit of such an api, we would need a separate fd allocated for each crtc and would have to wait on all of them. Surely you can see how that is much less straightforward than using a blocking api.

But mutter already uses non-blocking apis for the lion's share of cases. It doesn't need fences for those cases, though, because it can just use page flip events. The main reason it uses blocking apis is for modesets and when doing dpms off. The latter case you said you don't think can use fences, and it certainly can't use page flip events. So if you're right that fences can't be used for the dpms off case, it's not a workable answer. If you're wrong, and fences can be used for the dpms off case, then it's a messy answer.

> A blocking system call in the sense of RLIMIT_RTTIME means something
> which results in the process listening for external events, e.g. calling
> select(), epoll() or read() on (for example) a network socket etc...
>
> As far as I can see drmAtomicCommit() is *not* meant with that what
> similar to for example yield() also doesn't reset the RLIMIT_RTTIME counter.

No no no, drmModeAtomicCommit() is not like sched_yield(). That's a really strange thing to say (you do mean sched_yield(), right?). sched_yield() is an oddball because it's specifically for giving other threads a turn if they need it, without causing the current thread to sleep if they don't. It's a niche api that's meant for high performance use cases. It's a way to reduce scheduling latency and increase running time predictability. drmModeAtomicCommit() using up rt time, busy looping while waiting on the hardware to respond, eating into userspace RLIMIT_RTTIME, is nothing like that.

I'm getting the idea that you think there is some big bucket of kernel syscalls that block for a large fraction of a second by design and are not meant to reset RLIMIT_RTTIME. I could be wrong, but I don't think that's true. Off the top of my head, the only ones I can think of that might reasonably do that are futex() (which obviously can't sleep), sched_yield() (whose whole point is to not sleep), and maybe some obscure ioctls (some probably legitimately, some probably illegitimately and no one has noticed). I'm willing to be proven wrong here, and I might be, but right now from thinking about it, my guess is the above list is pretty close to complete.

> Well you are breaking the RLIMIT_RTTIME functionality.
>
> The purpose of that functionality is to allow debugging and monitoring
> applications to make sure that they keep alive and can react to external
> signals.

I don't think you really thought through what you're saying here. It just flatly doesn't apply for drmModeAtomicCommit. What is an application supposed to do? It can't block the SIGKILL that's coming. Respond to the preceding SIGXCPUs? What response could the application possibly make? I'm guessing drmModeAtomicCommit isn't going to EINTR because it's busy looping waiting on hardware in the process context. And the kernel doesn't even guarantee SIGXCPU is going to go to the thread with the stuck syscall, so even if there was a legitimate response (or even "pthread_cancel" or some reckless nonsense like that), getting the notification to the right part of the program is an exercise in gymnastics.
> From the RLIMIT_RTTIME documentation: "The intended use of this limit
> is to stop a runaway real-time process from locking up the system."
>
> And when drmAtomicCommit() is triggering this then we either have a
> problem with the application doing something it is not supposed to do
> (like blocking for vblank while it should listen to network traffic) or
> the driver is somehow buggy.

drmModeAtomicCommit() is used by display servers. If drmModeAtomicCommit runs away in e.g. a set of 100ms busy loops responding to a confused or slow responding GPU, the system will seemingly lock up to the user. That is an intractable problem that we can not get away from. It doesn't matter if the kernel is stuck in process context or on a workqueue. And, regardless, it's not reasonable to expect userspace to craft elaborate workarounds for driver bugs. We just have to fix the bugs.

> No when you disable everything there is of course no fence allocated.

Okay, so it's not actually an answer.

> But then you also should never see anything waiting for to long to
> actually be able to trigger the RLIMI
Re: [PATCH] drm/atomic: Perform blocking commits on workqueue
Hi, On Wed, Sep 27, 2023 at 4:05 AM Christian König wrote: > I'm not an expert for that stuff, but as far as I know the whole purpose > of the blocking functionality is to make sure that the CPU overhead > caused by the commit is accounted to the right process. I'm not an expert either, but that's clearly wrong. The point of blocking functionality in drmAtomicCommit is for the ioctl to block until the operation is completed, so userspace knows that they can commit again after it returns without getting EBUSY, and they can be sure, e.g., the mode is set or the displays are disabled or whatever. To say the "whole point" is about CPU overhead accounting sounds rather absurd to me. Is that really what you meant? You could try to make the argument that one of the points of the blocking call is about CPU accounting (instead of the whole point), maybe that's what you meant? That seems likely wrong to me too. I mean just check the docs: RLIMIT_RTTIME (since Linux 2.6.25) This is a limit (in microseconds) on the amount of CPU time that a process scheduled under a real‐time scheduling policy may consume without making a blocking system call. For the purpose of this limit, each time a process makes a blocking system call, the count of its consumed CPU time is reset to zero. drmAtomicCommit() is making a blocking system call so the limit should be reset, in my opinion. Furthermore, a lot happens under the covers as part of drmAtomicCommit. Are you telling me that in all the drivers handling drmAtomicCommit, none of the work from those drivers gets deferred outside of process context if DRM_MODE_ATOMIC_NONBLOCK isn't set? I find that very hard to believe. But it would have to be true, if one of the main points of the blocking call is about CPU accounting, right? You can't say the point is about CPU accounting if some of the work is handled in a helper thread or work queue or whatever. 
❯ git grep -E 'INIT_WORK|DECLARE_TASKLET|open_softirq|timer_setup|kthread_run' | wc -l
182

There seems to be a lot of non-process-context execution going on in the tree... > So what you are suggesting here is to actually break that functionality > and that doesn't seem to make sense. What's the use case here that could break? Walk me through a real-world, sensible example where a program depends on drmAtomicCommit() blocking and continuing to eat into process cpu time while it blocks. I just want to see where you are coming from. Maybe I'm being unimaginative, but I just don't see it. I do see actual mainstream display servers breaking today because the current behavior defies expectations. > When it's really not desirable to account the CPU overhead to the > process initiating it then you probably rather want to use an non > blocking commit plus a dma_fence to wait for the work to end from userspace. Well, first, I don't think that's very convenient. You're talking about a per-plane property, so there would need to be a separate file descriptor allocated for every plane, right? And user-space would have to block on all of them before proceeding? Also, are you sure that works in all cases? The problematic case we're facing right now is when all planes and all crtcs are getting disabled. I'm just skimming over the code here, but I see this:

	for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
		...
		if (arg->flags & DRM_MODE_PAGE_FLIP_EVENT || fence_ptr) {
			...
			e = create_vblank_event(crtc, arg->user_data);
			...
			crtc_state->event = e;
		}
		...
		if (fence_ptr) {
			...
			fence = drm_crtc_create_fence(crtc);
			...
			ret = setup_out_fence(&f[(*num_fences)++], fence);
			...
			crtc_state->event->base.fence = fence;
		}

Is the code really going to allocate a vblank_event when all the crtcs are disabled? I guess it's possible, but that's counterintuitive to me. I really don't know. You're confident a set of fences will actually work?
Regardless, this seems like a roundabout way to address a problem that we could just ... fix. --Ray
[PATCH] drm/atomic: Perform blocking commits on workqueue
From: Ray Strode A drm atomic commit can be quite slow on some hardware. It can lead to a lengthy queue of commands that need to get processed and waited on before control can go back to user space. If user space is a real-time thread, that delay can have severe consequences, leading to the process getting killed for exceeding rlimits. This commit addresses the problem by always running the slow part of a commit on a workqueue, separated from the task initiating the commit. This change makes the nonblocking and blocking paths work in the same way, and as a result allows the task to sleep and not use up its RLIMIT_RTTIME allocation. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2861 Signed-off-by: Ray Strode --- drivers/gpu/drm/drm_atomic_helper.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 292e38eb6218..1a1e68d98d38 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -2028,64 +2028,63 @@ int drm_atomic_helper_commit(struct drm_device *dev, * This is the point of no return - everything below never fails except * when the hw goes bonghits. Which means we can commit the new state on * the software side now. */ ret = drm_atomic_helper_swap_state(state, true); if (ret) goto err; /* * Everything below can be run asynchronously without the need to grab * any modeset locks at all under one condition: It must be guaranteed * that the asynchronous work has either been cancelled (if the driver * supports it, which at least requires that the framebuffers get * cleaned up with drm_atomic_helper_cleanup_planes()) or completed * before the new state gets committed on the software side with * drm_atomic_helper_swap_state(). * * This scheme allows new atomic state updates to be prepared and * checked in parallel to the asynchronous completion of the previous * update. 
Which is important since compositors need to figure out the * composition of the next frame right after having submitted the * current layout. * * NOTE: Commit work has multiple phases, first hardware commit, then * cleanup. We want them to overlap, hence need system_unbound_wq to * make sure work items don't artificially stall on each another. */ drm_atomic_state_get(state); - if (nonblock) - queue_work(system_unbound_wq, &state->commit_work); - else - commit_tail(state); + queue_work(system_unbound_wq, &state->commit_work); + if (!nonblock) + flush_work(&state->commit_work); return 0; err: drm_atomic_helper_cleanup_planes(dev, state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_commit); /** * DOC: implementing nonblocking commit * * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence * different operations against each another. Locks, especially struct * &drm_modeset_lock, should not be held in worker threads or any other * asynchronous context used to commit the hardware state. * * drm_atomic_helper_commit() implements the recommended sequence for * nonblocking commits, using drm_atomic_helper_setup_commit() internally: * * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we * need to propagate out of memory/VRAM errors to userspace, it must be called * synchronously. * * 2. Synchronize with any outstanding nonblocking commit worker threads which * might be affected by the new state update. This is handled by * drm_atomic_helper_setup_commit(). * * Asynchronous workers need to have sufficient parallelism to be able to run * different atomic commits on different CRTCs in parallel. The simplest way to -- 2.41.0
Re: [RFC PATCH v2 00/13] Kernel based bootsplash
Hi, On Wed, Dec 20, 2017 at 11:44 AM Max Staudt wrote: > It'd be nice to see this bug fixed, as it happens only occasionally (as is > the nature of a > race condition), and was thus really hard to debug. I'm sure it can drive > people insane, > as they try to find out whether they've disabled Ctrl-Alt-Fx in their > xorg.conf, but really > it's Plymouth getting the system into a bad state. I probably owe a bald > patch on my > head to this bug. Okay, reading through the code I do see how what you're describing could happen. I sketched out a patch here: https://cgit.freedesktop.org/plymouth/commit/?h=fix-vt-management-on-deactivate&id=0b5b790d628f4c96e3ab9fbc0daba67a29b850da that I think should fix it. I need to do some testing with it (ideally rig up a reproducer) before I push it. > This is exactly where the kernel bootsplash is useful. Since it starts even > before any > userspace program is loaded, it can close this gap. > > I've even tried it in combination with Plymouth: Plymouth is just another > graphical > application, so it simply pops up "on top", just like X would. The two > splashes > integrate flawlessly. I just wish it used our modern graphics platform instead of the deprecated subsystem. > One could argue that one could put all DRM drivers into the initrd. Ubuntu > does this, > and the initrd is ~40 MB in size. Not nice. Well, that 40MB isn't just graphics drivers...

❯ du -sh /lib/modules/`uname -r`/kernel/drivers/gpu
2.7M	/lib/modules/4.14.6-300.fc27.x86_64/kernel/drivers/gpu

3M isn't too awful. But really you have two choices as I see it: 1) make the initrd support new hardware, or 2) make the initrd be tailored to a specific configuration. I actually think Ubuntu has it right by doing 1). It's going to give the best user experience (not just with graphics but other new hardware too). But if you do 2) then it's not unreasonable if things break with new hardware. Now ideally, the breakage would be as isolated as possible.
I mean maybe it's okay if the boot splash breaks (or shows a text splash), but it's not okay if the bootsplash sort of works using /dev/fb, but then causes Xorg to break. So we should probably change plymouth to avoid falling back to /dev/fb in the case where new hardware got added. Could probably fix this by detecting if kms is around when the initrd is generated, and adding a config option to skip fb renderer in that case. or something like that. But the easy answer is to just fix the initrd to have the graphics drivers. > And even then, the initrd could be outdated for some reason. Maybe it's a > developer > machine. Nobody would expect the boot to hang/fail because of this problem. Right, ideally boot splash problems should never stop boot from proceeding. > So let's take SUSE. They don't have a finishing transition, the splash simply > stops > and is hidden at once. Such a splash makes sense to be shown instantly, right? I don't think it makes sense for animations to lack transitions. animations without transitions look buggy or unfinished. they should fade out or finish the loop, or whatever. If it's a static image it should fade to black or the background color. (going to be away from the computer for a few days after this message so probably won't reply for a while to further discussion) --Ray ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC PATCH v2 03/13] bootsplash: Flush framebuffer after drawing
Hi, > Actually, I don't want that :) > > This was a design decision that I've made to keep the code small and simple > to audit. > As it is, the simple bootsplash code will make 99% of people happy. You think only 1% of linux users have more than one monitor or a 4K screen? > I've made this decision from the point of view of someone who wants to ship a > general > purpose distribution. If you have a look around and compare this to e.g. the > Windows or > Mac OS X bootsplashes, the possibilities that my kernel code provides already > surpasses them. I haven't looked specifically, but I don't believe you :-) You're telling me the Apple boot splash isn't scaled up on machines with retina displays? I don't use OSX (or Windows), so I don't know, but I'd be really surprised. > If you really want such sophisticated features, supplanting the initial > bootsplash with > Plymouth (or not using it at all) is a better solution. In my opinion, it is > overengineering, > at least for kernel code. Disagree. It's support for basic, commodity hardware these days. > As for HiDPI, if you're working on an embedded device with a fixed screen > size - > say a phone - you can easily include a huge picture the size of the screen in > the > bootsplash. I'm talking about a situation where you have a Dell XPS or whatever with an external monitor attached. Each monitor should be able to show the splash without deforming it, and without making it huge or microscopic. Pretty basic stuff... --Ray
Re: [RFC PATCH v2 00/13] Kernel based bootsplash
Hi, > The problem that I am stumbling upon is different: > - the system starts with an FB driver > - after the ShowDelay time, Plymouth opens /dev/fb0 > - the system finally loads the DRM driver, which tries to kick the previous > FB driver > - loading the DRM driver fails because Plymouth still has the previous > /dev/fb0 open So the thing to realize is that using /dev/fb is a last-ditch effort by plymouth to make things work. It's basically a compat hack to keep vga=0x318 working and also a nod to embedded systems that just have /dev/fb and don't have a kms driver. If we fall back to /dev/fb we lose, as mentioned before, multi-monitor support. And it's a legacy, deprecated api. If we've reached the scenario you're discussing above, the real failure is that the KMS driver took too long to load. DRM is the platform graphics api. If it's not loading quickly enough to show graphics, then that's the problem! It sounds like maybe in the above bug, you're just failing to load the drm driver in the initrd? > If you have a better way of calling it, I'd be glad to learn. > Maybe "grabbing the VT", "taking ownership of the VT", ...? I don't care what we call it, I just didn't understand what you were saying before. I think I'd say "manages vt switching", but whatever. > And then, if something causes Plymouth to sense a new device (such as Plymouth > thinking that udev coldplug is complete), it will open the device, and as > part of that, > call VT_SETMODE. This is unexpected, since "plymouth deactivate" should keep > it > from doing this. And Plymouth's code architecture is such that this bug is > hard to fix. If what you're describing is happening, this does sound like a bug. I don't think it should be hard to fix, if it's a problem. I'll look into it. > [I] have decided to write a kernel-based replacement to simplify things and > to show a > splash as early as possible. It just avoids all of this complexity.
So, for the record, I don't actually have a problem with you doing a kernel based splash. (Though it should use drm subsystem apis, not fb subsystem apis; /dev/fb is going the way of the dodo.) > This is the sleep that I mean. > > On the one hand, it is this delay that makes most users not notice the > "busy VRAM bug". If the DRM driver that replaces the FB driver is included in > the > initramfs, then in most cases, it will be loaded before the 5 seconds are up. > However, > if the driver is loaded after these 5 seconds have elapsed, then Plymouth > will have > opened /dev/fb0 and the modprobe fails. Think of this from a user perspective. If the screen is black for 15 seconds (or something) before a splash is shown, then we've already hit a problem! That's like 15 seconds of time where the user is wondering if their system is broken. But I don't think that actually happens in practice. I think (maybe?) the situation you're hitting is that your drm driver isn't starting to get loaded until N seconds after boot has started, because it's not in the initrd. So the fix is to put it in the initrd. > On the other hand, what is the motivation for this delay? As I said earlier, the motivation for the delay is to avoid showing a splash for systems that boot in 4 seconds or something. At that point a splash is just getting in the way. > If Plymouth were to display the splash instantly on a system that needs 0.5 > seconds to > boot, then the splash would flash for 0.5 seconds. No, flashing a splash for half a second would be a bug. (Again, think of things from a user perspective.) Plymouth splashes have animations at the end to transition the user to the login screen. Normally those animations don't contribute to boot time, because we know when boot will finish from prior boot data. But if boot were 0.5 seconds long, then those animations would contribute 2 to 3 seconds to boot time, and if boot is 0.5 seconds long showing a splash is pointless.
> But with the delay, a system that needs 5.5 seconds to boot will also flash > it for 0.5 seconds. > Either way, the splash will just flash for a moment. Again, we don't blink the splash on and off. We have transition animations. > The delay only changes which systems are affected. However, if you set the > delay to 0, > you'll run into the bug I described above. Then put the drm driver in the initramfs so you fix your bug! > This is a design problem, hidden by a needless delay. I really don't see how it is. --Ray
Re: [RFC PATCH v2 03/13] bootsplash: Flush framebuffer after drawing
Hi, On Tue, Dec 19, 2017 at 10:41 AM, Max Staudt wrote: > I'm hooking into the in-kernel terminal emulator, because the bootsplash is a > functional extension of that. It just happens that fbcon sits on top of FB, > so I > work with what I get. > > And the console in turn happens to work on all FB and KMS drivers, so it > makes users of all kinds of drivers happy. In fact, that's why the FB > emulation > in KMS drivers came to be in the first place, if I remember right - to ensure > fbcon continues to work. But what about multi-monitor? What about HiDPI screens? Ideally you want each monitor to show the splash in a way that best fits that monitor. You can't do that if you're drawing all over fbcon... and it's not like multiple monitors and 4K screens are niche these days. --Ray
Re: [RFC PATCH v2 00/13] Kernel based bootsplash
Hi, > For example, having a userspace splash that starts as early as it can > (thus on vesafb/efifb on a PC) will cause the KMS driver to fail > reserving the entirety of video RAM, and thus fail loading. This cannot be > fixed. Well, the fix there is to use drm devices... like this: https://cgit.freedesktop.org/plymouth/commit/?id=97f02ee959374afb1b8a78dc67116cd880cf2d23 > Furthermore, Plymouth is quite broken. For example, it may lock > (via VT_SETMODE) the VT even though Plymouth is in "disabled" > state and X has already taken control of the VT. What do you mean by "disabled" (plymouth.enable=0?)? If it's disabled, it's not going to call VT_SETMODE... Why do you refer to VT_SETMODE as locking? VT_SETMODE sets whether a process handles VT changes or the kernel does. There is a long-standing kernel issue where a mode of VT_AUTO (kernel handles vt switching) + KDGRAPHICS means VTs can't be changed. Is that what you're talking about? Anyway, plymouth is only going to step on X's toes if the distro erroneously asks it to. Normally, a distro would run "plymouth deactivate" before starting X, instead of trying to run them at the same time... > This causes the kernel to throw away X's PID as the VT owner, and thus > chvt and Ctrl-Alt-Fx no longer work because X can neither release the > console (VT_RELDISP fails), nor does the kernel send it the signal to do > so. This is hard to impossible to fix. Unless I'm missing something, this is totally just a problem with startup scripts not doing the right thing? Plymouth shouldn't be doing anything once X is started. If it is, that's either a plymouth bug or, more likely, a distro integration problem. > A third reason is that in practice, Plymouth's start is delayed for reasons > such as the above. Yes, race conditions are being worked around with > sleeps. ??? That's not true. We don't have any sleep statements in the code to work around race conditions with X.
We do have two configurable delays in the code; are you talking about one of them? 1) The first is a ShowDelay option. The point of this option is, "If boot takes 5 seconds or less, it's essentially instant and we shouldn't show a splash at all". Nothing to do with race conditions. You can set it to 0 if you want. 2) The second is a DeviceTimeout option. The point of this option is to decide how long to wait for udev coldplug to finish. It's mostly relevant for systems that don't have kms drivers. The point is that at some point during boot we need to decide to stop waiting for a drm device to show up and just fall back to showing graphics using legacy interfaces (like /dev/fb). We used to wait until the udev queue went empty, but that's error-prone since it gets cleared when the root is switched. See https://lists.freedesktop.org/archives/systemd-devel/2015-March/029184.html > So some issues are hard to fix, others are impossible to fix in userspace. I'm not convinced there are any insurmountable problems here... One thing I'd like to do is change boot to not map fbcon at first, and only map it in on the fly when the user hits escape, or after boot finishes. Like, for instance, try booting with fbcon=vc:2 or fbcon=map:9 to see how it improves the boot experience. --Ray
[PATCH v2 0/2] qxl cursor fixes
From: Ray Strode This series includes a small leak fix and a fix for an initial invisible cursor on wayland sessions. Ray Strode (2): drm/qxl: unref cursor bo when finished with it drm/qxl: reapply cursor after resetting primary drivers/gpu/drm/qxl/qxl_display.c | 63 ++- drivers/gpu/drm/qxl/qxl_drv.h | 2 ++ 2 files changed, 64 insertions(+), 1 deletion(-) -- 2.14.3
[PATCH 1/2] drm/qxl: unref cursor bo when finished with it
From: Ray Strode qxl_cursor_atomic_update allocs a bo for the cursor that it never frees up at the end of the function. This commit fixes that. Signed-off-by: Ray Strode --- drivers/gpu/drm/qxl/qxl_display.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index 4756b3c9bf2c..7335d99244d5 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -548,61 +548,61 @@ static void qxl_primary_atomic_disable(struct drm_plane *plane, struct qxl_device *qdev = plane->dev->dev_private; if (old_state->fb) { struct qxl_framebuffer *qfb = to_qxl_framebuffer(old_state->fb); struct qxl_bo *bo = gem_to_qxl_bo(qfb->obj); if (bo->is_primary) { qxl_io_destroy_primary(qdev); bo->is_primary = false; } } } static int qxl_plane_atomic_check(struct drm_plane *plane, struct drm_plane_state *state) { return 0; } static void qxl_cursor_atomic_update(struct drm_plane *plane, struct drm_plane_state *old_state) { struct drm_device *dev = plane->dev; struct qxl_device *qdev = dev->dev_private; struct drm_framebuffer *fb = plane->state->fb; struct qxl_release *release; struct qxl_cursor_cmd *cmd; struct qxl_cursor *cursor; struct drm_gem_object *obj; - struct qxl_bo *cursor_bo, *user_bo = NULL; + struct qxl_bo *cursor_bo = NULL, *user_bo = NULL; int ret; void *user_ptr; int size = 64*64*4; ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd), QXL_RELEASE_CURSOR_CMD, &release, NULL); if (ret) return; if (fb != old_state->fb) { obj = to_qxl_framebuffer(fb)->obj; user_bo = gem_to_qxl_bo(obj); /* pinning is done in the prepare/cleanup framevbuffer */ ret = qxl_bo_kmap(user_bo, &user_ptr); if (ret) goto out_free_release; ret = qxl_alloc_bo_reserved(qdev, release, sizeof(struct qxl_cursor) + size, &cursor_bo); if (ret) goto out_kunmap; ret = qxl_release_reserve_list(release, true); if (ret) goto out_free_bo; ret = qxl_bo_kmap(cursor_bo, (void **)&cursor); @@ -618,60 +618,62 @@ 
static void qxl_cursor_atomic_update(struct drm_plane *plane, cursor->data_size = size; cursor->chunk.next_chunk = 0; cursor->chunk.prev_chunk = 0; cursor->chunk.data_size = size; memcpy(cursor->chunk.data, user_ptr, size); qxl_bo_kunmap(cursor_bo); qxl_bo_kunmap(user_bo); cmd = (struct qxl_cursor_cmd *) qxl_release_map(qdev, release); cmd->u.set.visible = 1; cmd->u.set.shape = qxl_bo_physical_address(qdev, cursor_bo, 0); cmd->type = QXL_CURSOR_SET; } else { ret = qxl_release_reserve_list(release, true); if (ret) goto out_free_release; cmd = (struct qxl_cursor_cmd *) qxl_release_map(qdev, release); cmd->type = QXL_CURSOR_MOVE; } cmd->u.position.x = plane->state->crtc_x + fb->hot_x; cmd->u.position.y = plane->state->crtc_y + fb->hot_y; qxl_release_unmap(qdev, release, &cmd->release_info); qxl_push_cursor_ring_release(qdev, release, QXL_CMD_CURSOR, false); qxl_release_fence_buffer_objects(release); + qxl_bo_unref(&cursor_bo); + return; out_backoff: qxl_release_backoff_reserve_list(release); out_free_bo: qxl_bo_unref(&cursor_bo); out_kunmap: qxl_bo_kunmap(user_bo); out_free_release: qxl_release_free(qdev, release); return; } static void qxl_cursor_atomic_disable(struct drm_plane *plane, struct drm_plane_state *old_state) { struct qxl_device *qdev = plane->dev->dev_private; struct qxl_release *release; struct qxl_cursor_cmd *cmd; int ret; ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd), QXL_RELEASE_CURSOR_CMD, &am
[PATCH 2/2] drm/qxl: reapply cursor after resetting primary
From: Ray Strode QXL associates mouse state with its primary plane. Destroying a primary plane and putting a new one in place has the side effect of destroying the cursor as well. This commit changes the driver to reapply the cursor any time a new primary is created. It achieves this by keeping a reference to the cursor bo on the qxl_crtc struct. This fix is very similar to commit 4532b241a4b7 ("drm/qxl: reapply cursor after SetCrtc calls") which got implicitly reverted as part of implementing the atomic modeset feature. Cc: Gerd Hoffmann Cc: Dave Airlie Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1512097 Fixes: 1277eed5fecb ("drm: qxl: Atomic phase 1: convert cursor to universal plane") Signed-off-by: Ray Strode --- drivers/gpu/drm/qxl/qxl_display.c | 59 +++ drivers/gpu/drm/qxl/qxl_drv.h | 2 ++ 2 files changed, 61 insertions(+) diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index 7335d99244d5..9a9214ae0fb5 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -262,60 +262,61 @@ static int qxl_add_common_modes(struct drm_connector *connector, mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false); if (common_modes[i].w == pwidth && common_modes[i].h == pheight) mode->type |= DRM_MODE_TYPE_PREFERRED; drm_mode_probed_add(connector, mode); } return i - 1; } static void qxl_crtc_atomic_flush(struct drm_crtc *crtc, struct drm_crtc_state *old_crtc_state) { struct drm_device *dev = crtc->dev; struct drm_pending_vblank_event *event; unsigned long flags; if (crtc->state && crtc->state->event) { event = crtc->state->event; crtc->state->event = NULL; spin_lock_irqsave(&dev->event_lock, flags); drm_crtc_send_vblank_event(crtc, event); spin_unlock_irqrestore(&dev->event_lock, flags); } } static void qxl_crtc_destroy(struct drm_crtc *crtc) { struct qxl_crtc *qxl_crtc = to_qxl_crtc(crtc); + qxl_bo_unref(&qxl_crtc->cursor_bo); drm_crtc_cleanup(crtc); kfree(qxl_crtc); 
} static const struct drm_crtc_funcs qxl_crtc_funcs = { .set_config = drm_atomic_helper_set_config, .destroy = qxl_crtc_destroy, .page_flip = drm_atomic_helper_page_flip, .reset = drm_atomic_helper_crtc_reset, .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state, .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state, }; void qxl_user_framebuffer_destroy(struct drm_framebuffer *fb) { struct qxl_framebuffer *qxl_fb = to_qxl_framebuffer(fb); struct qxl_bo *bo = gem_to_qxl_bo(qxl_fb->obj); WARN_ON(bo->shadow); drm_gem_object_unreference_unlocked(qxl_fb->obj); drm_framebuffer_cleanup(fb); kfree(qxl_fb); } static int qxl_framebuffer_surface_dirty(struct drm_framebuffer *fb, struct drm_file *file_priv, unsigned flags, unsigned color, struct drm_clip_rect *clips, unsigned num_clips) { @@ -468,193 +469,251 @@ static void qxl_crtc_atomic_disable(struct drm_crtc *crtc, static const struct drm_crtc_helper_funcs qxl_crtc_helper_funcs = { .mode_fixup = qxl_crtc_mode_fixup, .mode_set_nofb = qxl_mode_set_nofb, .atomic_flush = qxl_crtc_atomic_flush, .atomic_enable = qxl_crtc_atomic_enable, .atomic_disable = qxl_crtc_atomic_disable, }; static int qxl_primary_atomic_check(struct drm_plane *plane, struct drm_plane_state *state) { struct qxl_device *qdev = plane->dev->dev_private; struct qxl_framebuffer *qfb; struct qxl_bo *bo; if (!state->crtc || !state->fb) return 0; qfb = to_qxl_framebuffer(state->fb); bo = gem_to_qxl_bo(qfb->obj); if (bo->surf.stride * bo->surf.height > qdev->vram_size) { DRM_ERROR("Mode doesn't fit in vram size (vgamem)"); return -EINVAL; } return 0; } +static int qxl_primary_apply_cursor(struct drm_plane *plane) +{ + struct drm_device *dev = plane->dev; + struct qxl_device *qdev = dev->dev_private; + struct drm_framebuffer *fb = plane->state->fb; + struct qxl_crtc *qcrtc = to_qxl_crtc(plane->state->crtc); + struct qxl_cursor_cmd *cmd; + struct qxl_re
Re: [PATCH] drm/qxl: reapply cursor after resetting primary
Hi, On Mon, Nov 27, 2017 at 4:07 PM Ray Strode wrote: ... > This commit changes the driver to reapply the cursor any time a new > primary is created. It achieves this by keeping a reference to the > cursor bo on the qxl_crtc struct. So I forgot this is on top of a small leak fix, too. For clarity, I'll resend with both patches in the series. --Ray
[PATCH] drm/qxl: reapply cursor after resetting primary
From: Ray Strode QXL associates mouse state with its primary plane. Destroying a primary plane and putting a new one in place has the side effect of destroying the cursor as well. This commit changes the driver to reapply the cursor any time a new primary is created. It achieves this by keeping a reference to the cursor bo on the qxl_crtc struct. This fix is very similar to commit 4532b241a4b7 ("drm/qxl: reapply cursor after SetCrtc calls") which got implicitly reverted as part of implementing the atomic modeset feature. Cc: Gerd Hoffmann Cc: Dave Airlie Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1512097 Fixes: 1277eed5fecb ("drm: qxl: Atomic phase 1: convert cursor to universal plane") Signed-off-by: Ray Strode --- drivers/gpu/drm/qxl/qxl_display.c | 59 +++ drivers/gpu/drm/qxl/qxl_drv.h | 2 ++ 2 files changed, 61 insertions(+) diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index fc9e86f3811f..a51a5abb977d 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -262,60 +262,61 @@ static int qxl_add_common_modes(struct drm_connector *connector, mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false); if (common_modes[i].w == pwidth && common_modes[i].h == pheight) mode->type |= DRM_MODE_TYPE_PREFERRED; drm_mode_probed_add(connector, mode); } return i - 1; } static void qxl_crtc_atomic_flush(struct drm_crtc *crtc, struct drm_crtc_state *old_crtc_state) { struct drm_device *dev = crtc->dev; struct drm_pending_vblank_event *event; unsigned long flags; if (crtc->state && crtc->state->event) { event = crtc->state->event; crtc->state->event = NULL; spin_lock_irqsave(&dev->event_lock, flags); drm_crtc_send_vblank_event(crtc, event); spin_unlock_irqrestore(&dev->event_lock, flags); } } static void qxl_crtc_destroy(struct drm_crtc *crtc) { struct qxl_crtc *qxl_crtc = to_qxl_crtc(crtc); + qxl_bo_unref(&qxl_crtc->cursor_bo); drm_crtc_cleanup(crtc); kfree(qxl_crtc); 
} static const struct drm_crtc_funcs qxl_crtc_funcs = { .set_config = drm_atomic_helper_set_config, .destroy = qxl_crtc_destroy, .page_flip = drm_atomic_helper_page_flip, .reset = drm_atomic_helper_crtc_reset, .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state, .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state, }; void qxl_user_framebuffer_destroy(struct drm_framebuffer *fb) { struct qxl_framebuffer *qxl_fb = to_qxl_framebuffer(fb); struct qxl_bo *bo = gem_to_qxl_bo(qxl_fb->obj); WARN_ON(bo->shadow); drm_gem_object_unreference_unlocked(qxl_fb->obj); drm_framebuffer_cleanup(fb); kfree(qxl_fb); } static int qxl_framebuffer_surface_dirty(struct drm_framebuffer *fb, struct drm_file *file_priv, unsigned flags, unsigned color, struct drm_clip_rect *clips, unsigned num_clips) { @@ -468,193 +469,251 @@ static void qxl_crtc_atomic_disable(struct drm_crtc *crtc, static const struct drm_crtc_helper_funcs qxl_crtc_helper_funcs = { .mode_fixup = qxl_crtc_mode_fixup, .mode_set_nofb = qxl_mode_set_nofb, .atomic_flush = qxl_crtc_atomic_flush, .atomic_enable = qxl_crtc_atomic_enable, .atomic_disable = qxl_crtc_atomic_disable, }; static int qxl_primary_atomic_check(struct drm_plane *plane, struct drm_plane_state *state) { struct qxl_device *qdev = plane->dev->dev_private; struct qxl_framebuffer *qfb; struct qxl_bo *bo; if (!state->crtc || !state->fb) return 0; qfb = to_qxl_framebuffer(state->fb); bo = gem_to_qxl_bo(qfb->obj); if (bo->surf.stride * bo->surf.height > qdev->vram_size) { DRM_ERROR("Mode doesn't fit in vram size (vgamem)"); return -EINVAL; } return 0; } +static int qxl_primary_apply_cursor(struct drm_plane *plane) +{ + struct drm_device *dev = plane->dev; + struct qxl_device *qdev = dev->dev_private; + struct drm_framebuffer *fb = plane->state->fb; + struct qxl_crtc *qcrtc = to_qxl_crtc(plane->state->crtc); + struct qxl_cursor_cmd *cmd; + struct qxl_re
[PATCH] drm/qxl: reapply cursor after SetCrtc calls
From: Ray Strode

The qxl driver currently destroys and recreates the qxl "primary" any
time the first crtc is set. A side effect of destroying the primary is
that the mouse state associated with the crtc is lost, which leads to
disappearing mouse cursors on wayland sessions.

This commit changes the driver to reapply the cursor any time SetCrtc
is called. It achieves this by keeping a reference to the cursor bo on
the qxl_crtc struct.

Signed-off-by: Ray Strode
https://bugzilla.redhat.com/show_bug.cgi?id=1200901
---
 drivers/gpu/drm/qxl/qxl_display.c | 56 ++-
 drivers/gpu/drm/qxl/qxl_drv.h     |  1 +
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 3aef127..a61c0d4 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -184,60 +184,61 @@ static struct mode_size {
 	{1440, 900},
 	{1400, 1050},
 	{1680, 1050},
 	{1600, 1200},
 	{1920, 1080},
 	{1920, 1200}
 };
 
 static int qxl_add_common_modes(struct drm_connector *connector,
 				unsigned pwidth, unsigned pheight)
 {
 	struct drm_device *dev = connector->dev;
 	struct drm_display_mode *mode = NULL;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
 		mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h,
 				    60, false, false, false);
 		if (common_modes[i].w == pwidth && common_modes[i].h == pheight)
 			mode->type |= DRM_MODE_TYPE_PREFERRED;
 		drm_mode_probed_add(connector, mode);
 	}
 	return i - 1;
 }
 
 static void qxl_crtc_destroy(struct drm_crtc *crtc)
 {
 	struct qxl_crtc *qxl_crtc = to_qxl_crtc(crtc);
 
 	drm_crtc_cleanup(crtc);
+	qxl_bo_unref(&qxl_crtc->cursor_bo);
 	kfree(qxl_crtc);
 }
 
 static int qxl_crtc_page_flip(struct drm_crtc *crtc,
 			      struct drm_framebuffer *fb,
 			      struct drm_pending_vblank_event *event,
 			      uint32_t page_flip_flags)
 {
 	struct drm_device *dev = crtc->dev;
 	struct qxl_device *qdev = dev->dev_private;
 	struct qxl_framebuffer *qfb_src = to_qxl_framebuffer(fb);
 	struct qxl_framebuffer *qfb_old = to_qxl_framebuffer(crtc->primary->fb);
 	struct qxl_bo *bo_old = gem_to_qxl_bo(qfb_old->obj);
 	struct qxl_bo *bo = gem_to_qxl_bo(qfb_src->obj);
 	unsigned long flags;
 	struct drm_clip_rect norect = {
 		.x1 = 0,
 		.y1 = 0,
 		.x2 = fb->width,
 		.y2 = fb->height
 	};
 	int inc = 1;
 	int one_clip_rect = 1;
 	int ret = 0;
 
 	crtc->primary->fb = fb;
 	bo_old->is_primary = false;
 	bo->is_primary = true;
 
 	ret = qxl_bo_reserve(bo, false);
@@ -269,60 +270,106 @@ static int qxl_crtc_page_flip(struct drm_crtc *crtc,
 	return 0;
 }
 
 static int qxl_hide_cursor(struct qxl_device *qdev)
 {
 	struct qxl_release *release;
 	struct qxl_cursor_cmd *cmd;
 	int ret;
 
 	ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd), QXL_RELEASE_CURSOR_CMD,
 					 &release, NULL);
 	if (ret)
 		return ret;
 
 	ret = qxl_release_reserve_list(release, true);
 	if (ret) {
 		qxl_release_free(qdev, release);
 		return ret;
 	}
 
 	cmd = (struct qxl_cursor_cmd *)qxl_release_map(qdev, release);
 	cmd->type = QXL_CURSOR_HIDE;
 	qxl_release_unmap(qdev, release, &cmd->release_info);
 
 	qxl_push_cursor_ring_release(qdev, release, QXL_CMD_CURSOR, false);
 	qxl_release_fence_buffer_objects(release);
 	return 0;
 }
 
+static int qxl_crtc_apply_cursor(struct drm_crtc *crtc)
+{
+	struct qxl_crtc *qcrtc = to_qxl_crtc(crtc);
+	struct drm_device *dev = crtc->dev;
+	struct qxl_device *qdev = dev->dev_private;
+	struct qxl_cursor_cmd *cmd;
+	struct qxl_release *release;
+	int ret = 0;
+
+	if (!qcrtc->cursor_bo)
+		return 0;
+
+	ret = qxl_alloc_release_reserved(qdev, sizeof(*cmd),
+					 QXL_RELEASE_CURSOR_CMD,
+					 &release, NULL);
+	if (ret)
+		return ret;
+
+	ret = qxl_release_list_add(release, qcrtc->cursor_bo);
+	if (ret)
+		goto out_free_release;
+
+	ret = qxl_release_reserve_list(release, false);
+	if (ret)
+		goto out_free_release;
+
+	cmd = (struct qxl_cursor_cmd *)qxl_release_map(qdev, release);
+	cmd->type = QXL_CURSOR_SET;
+	cmd->u.set.position.x = qcrtc->