Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()
On 24/07/2024 12:16, Christian König wrote: Am 24.07.24 um 10:16 schrieb Tvrtko Ursulin:

[SNIP]

Absolutely. Absolutely good and absolutely me, or absolutely you? :)

You, I don't even have time to finish all the stuff I already started :/

Okay, I think I can squeeze it in. These are the TODO points and their opens:

- Adjust amdgpu_ctx_set_entity_priority() to call drm_sched_entity_modify_sched() regardless of the hw_type - to fix priority changes on a single sched other than gfx or compute. Either that, or stop using the scheduler priority to implement userspace priorities and always use different HW queues for that.

- Document that the sched_list array lifetime must align with the entity, and adjust the callers. Open: Do you still oppose keeping sched_list for num_scheds == 1?

Not if you can fix up all callers.

If so, do you propose drm_sched_entity_modify_sched() keeps disagreeing with drm_sched_entity_init() on this detail, and keeps the "one shot single sched_list" quirk in? Why is that nicer than simply keeping the list and removing that quirk? Once lifetime rules are clear it is IMO okay to always keep the list.

Yeah, if every caller of drm_sched_entity_init() can be fixed I'm fine with that as well.

Okay, so I will tackle the above few first.

- Remove drm_sched_entity_set_priority(). Open: Should we at this point also modify amdgpu_device_init_schedulers() to stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?

One step at a time. And leave this for later.

Regards, Tvrtko
[PULL] drm-intel-next-fixes
Hi Dave, Sima,

Two fixes for the merge window - turning off preemption on Gen8, since it apparently just doesn't work reliably enough, and a fix for a potential NULL pointer dereference when stolen memory probing failed.

Regards, Tvrtko

drm-intel-next-fixes-2024-07-25:
- Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote)
- Allow NULL memory region (Jonathan Cavitt)

The following changes since commit 509580fad7323b6a5da27e8365cd488f3b57210e:

  drm/i915/dp: Don't switch the LTTPR mode on an active link (2024-07-16 08:14:29 +)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-next-fixes-2024-07-25

for you to fetch changes up to 26720dd2b5a1d088bff8f7e6355fca021c83718f:

  drm/i915: Allow NULL memory region (2024-07-23 09:34:13 +)

- Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote)
- Allow NULL memory region (Jonathan Cavitt)

Jonathan Cavitt (1):
      drm/i915: Allow NULL memory region

Nitin Gote (1):
      drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 +-
 drivers/gpu/drm/i915/intel_memory_region.c | 6 --
 2 files changed, 5 insertions(+), 7 deletions(-)
Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()
On 22/07/2024 16:13, Christian König wrote: Am 22.07.24 um 16:43 schrieb Tvrtko Ursulin: On 22/07/2024 15:06, Christian König wrote: Am 22.07.24 um 15:52 schrieb Tvrtko Ursulin: On 19/07/2024 16:18, Christian König wrote: Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin

A long time ago, in commit b3ac17667f11 ("drm/scheduler: rework entity creation"), a change was made which prevented priority changes for entities with only one assigned scheduler. The commit reduced drm_sched_entity_set_priority() to simply updating the entity's priority, but the run queue selection logic in drm_sched_entity_select_rq() was never able to actually change the originally assigned run queue.

In practice that only affected amdgpu, being the only driver which can do dynamic priority changes, and an attempt to rectify it there appears to have been made in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority override"). A few problems however remained unresolved. Firstly, that only fixed drm_sched_entity_set_priority() *if* drm_sched_entity_modify_sched() was called first, which was not documented anywhere. Secondly, it only works if drm_sched_entity_modify_sched() is actually called, which in amdgpu's case today is true only for gfx and compute. Priority changes for other engines with only one scheduler assigned, such as jpeg and video decode, will still not work. Note that this was also noticed in 981b04d96856 ("drm/sched: improve docs around drm_sched_entity").

A completely different set of non-obvious confusion was that, whereas drm_sched_entity_init() was not keeping the passed-in list of schedulers (courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched list")), drm_sched_entity_modify_sched() disagreed and would simply assign the single-item list. That inconsistency appears to have been semi-silently fixed in ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3").
What was also not documented is why it was important not to keep the list of schedulers when there is only one. I suspect it could have something to do with the fact that the passed-in array is on the stack for many callers with just one scheduler. With more than one scheduler amdgpu is the only caller, and there the container is not on the stack. Keeping a stack-backed list in the entity would obviously be undefined behaviour *if* the list was kept. Amdgpu however only stopped passing in a stack-backed container for the more-than-one-scheduler case in 977f7e1068be ("drm/amdgpu: allocate entities on demand"). Until then I suspect dereferencing a freed stack from drm_sched_entity_select_rq() was still possible.

In order to untangle all that and fix priority changes, this patch brings back the entity-owned container for storing the passed-in scheduler list.

Please don't. That makes the mess just more horrible. The background of not keeping the array is to intentionally prevent the priority override from working. The bug is rather that adding drm_sched_entity_modify_sched() messed this up.

To give more background: amdgpu has two different ways of handling priority:

1. The priority in the DRM scheduler.
2. Different HW rings with different priorities.

Your analysis is correct that drm_sched_entity_init() initially dropped the scheduler list to avoid using a stack allocated list, and that functionality is still used in amdgpu_ctx_init_entity() for example. Setting the scheduler priority was basically just a workaround because we didn't have the hw priorities at that time. Since that is no longer the case I suggest just completely dropping the drm_sched_entity_set_priority() function instead.

Removing drm_sched_entity_set_priority() is one thing, but we also need to clear up the sched_list container ownership issue. It is neither documented, nor robustly handled in the code. The "num_scheds == 1" special casing throughout IMHO has to go too.

I disagree.
Keeping the scheduler list in the entity is only useful for load balancing. As long as only one scheduler is provided and we don't load balance, the entity doesn't need the scheduler list in the first place.

Once set_priority is removed then indeed it doesn't. But even when it is removed, it needs documenting who owns the passed-in container. Today drivers are okay to pass a stack array when it is one element, but if they did it with more than one they would be in for a nasty surprise.

Yes, completely agree. But instead of copying the array I would rather go in the direction of cleaning up all callers and making it mandatory for the scheduler list to stay around as long as the scheduler lives. The whole thing of one calling convention here and another one at a different place really sucks.

Ok, let's scroll a bit down to formulate a plan.

Another thing if you want to get rid of
Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
On 23/07/2024 16:30, Lucas De Marchi wrote: On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote: On 22/07/2024 22:06, Lucas De Marchi wrote:

Instead of calling perf_pmu_unregister() when unbinding, defer that to the destruction of the i915 object. Since perf itself holds a reference in the event, this only happens when all events are gone, which guarantees i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after ~2 tries:

1) bind device to i915
2) wait for events to show up on sysfs
3) start perf stat -I 1000 -e i915/rcs0-busy/
4) unbind driver
5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the percpu pmu_disable_count. This happens because perf_pmu_unregister() destroys it with free_percpu(pmu->pmu_disable_count). With a lazy unbind, the pmu is only unregistered after (5) as opposed to after (4). The downside is that if a new bind operation is attempted for the same device/driver without killing the perf process, i915 will fail to register the pmu (but still load successfully). This seems better than completely crashing the system.

So effectively this allows unbind to succeed without fully unbinding the driver from the device? That sounds like a significant drawback, and if so, I wonder if a more complicated solution wouldn't be better after all. Or is there precedent for allowing userspace to keep its paws on unbound devices in this way?

Keeping the resources alive but "unplugged" while the hardware disappeared is a common thing to do... it's the whole point of the drmm-managed resources, for example. If you bind the driver and then unbind it while userspace is holding a ref, next time you try to bind it will come up with a different card number. A similar thing that could be done is to adjust the name of the event - currently we add the mangled pci slot.

Yes...
but my point was about this from your commit message:

"""
The downside is that if a new bind operation is attempted for the same device/driver without killing the perf process, i915 will fail to register the pmu (but still load successfully).
"""

So the subsequent bind does not "come up with a different card number". The statement is that it will come up with an error, if we look at the PMU subset of functionality. I was wondering if there was precedent for that kind of situation. Mangling the PMU driver name probably also wouldn't be great.

That said, I agree a better approach would be to allow perf_pmu_unregister() to do its job even when there are open events. On top of that (or as a way to help achieve that), make perf core replace the callbacks with stubs when a pmu is unregistered - that would even kill the need for i915's checks on pmu->closed (and fix the lack thereof in other drivers). It can be a can of worms though and may be pushed back on by perf core maintainers, so it'd be good to have their feedback.

Yeah, that would definitely be essential.

Regards, Tvrtko

Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
 	struct i915_pmu *pmu = res;
 	struct drm_i915_private *i915 = pmu_to_i915(pmu);
 
+	perf_pmu_unregister(&pmu->base);
 	free_event_attributes(pmu);
 	kfree(pmu->base.attr_groups);
 	if (IS_DGFX(i915))
 		kfree(pmu->name);
+
+	/*
+	 * Make sure all currently running (but shortcut on pmu->closed) are
+	 * gone before proceeding with free'ing the pmu object embedded in i915.
+	 */
+	synchronize_rcu();
 }
 
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-
-	GEM_BUG_ON(!pmu->base.event_init);
-
 	/* Select the first online CPU as a designated reader. */
 	if (cpumask_empty(&i915_pmu_cpumask))
 		cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
 	unsigned int target = i915_pmu_target_cpu;
 
-	GEM_BUG_ON(!pmu->base.event_init);
-
 	/*
 	 * Unregistering an instance generates a CPU offline event which we must
 	 * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 {
 	struct i915_pmu *pmu = &i915->pmu;
 
-	if (!pmu->base.event_init)
-		return;
-
 	/*
-	 * "Disconnect" the PMU callbacks - since all are atomic
Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
On 22/07/2024 22:06, Lucas De Marchi wrote:

Instead of calling perf_pmu_unregister() when unbinding, defer that to the destruction of the i915 object. Since perf itself holds a reference in the event, this only happens when all events are gone, which guarantees i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after ~2 tries:

1) bind device to i915
2) wait for events to show up on sysfs
3) start perf stat -I 1000 -e i915/rcs0-busy/
4) unbind driver
5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the percpu pmu_disable_count. This happens because perf_pmu_unregister() destroys it with free_percpu(pmu->pmu_disable_count). With a lazy unbind, the pmu is only unregistered after (5) as opposed to after (4). The downside is that if a new bind operation is attempted for the same device/driver without killing the perf process, i915 will fail to register the pmu (but still load successfully). This seems better than completely crashing the system.

So effectively this allows unbind to succeed without fully unbinding the driver from the device? That sounds like a significant drawback, and if so, I wonder if a more complicated solution wouldn't be better after all. Or is there precedent for allowing userspace to keep its paws on unbound devices in this way?
Regards, Tvrtko

Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
 	struct i915_pmu *pmu = res;
 	struct drm_i915_private *i915 = pmu_to_i915(pmu);
 
+	perf_pmu_unregister(&pmu->base);
 	free_event_attributes(pmu);
 	kfree(pmu->base.attr_groups);
 	if (IS_DGFX(i915))
 		kfree(pmu->name);
+
+	/*
+	 * Make sure all currently running (but shortcut on pmu->closed) are
+	 * gone before proceeding with free'ing the pmu object embedded in i915.
+	 */
+	synchronize_rcu();
 }
 
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-
-	GEM_BUG_ON(!pmu->base.event_init);
-
 	/* Select the first online CPU as a designated reader. */
 	if (cpumask_empty(&i915_pmu_cpumask))
 		cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
 	unsigned int target = i915_pmu_target_cpu;
 
-	GEM_BUG_ON(!pmu->base.event_init);
-
 	/*
 	 * Unregistering an instance generates a CPU offline event which we must
 	 * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 {
 	struct i915_pmu *pmu = &i915->pmu;
 
-	if (!pmu->base.event_init)
-		return;
-
 	/*
-	 * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
-	 * ensures all currently executing ones will have exited before we
-	 * proceed with unregistration.
+	 * "Disconnect" the PMU callbacks - unregistering the pmu will be done
+	 * later when all currently open events are gone
 	 */
 	pmu->closed = true;
-	synchronize_rcu();
 
 	hrtimer_cancel(&pmu->timer);
 
-	i915_pmu_unregister_cpuhp_state(pmu);
-	perf_pmu_unregister(&pmu->base);
 	pmu->base.event_init = NULL;
 }
Re: [PATCH 5/7] drm/i915/pmu: Let resource survive unbind
On 22/07/2024 22:06, Lucas De Marchi wrote:

There's no need to free the resources during unbind. Since perf events may still access them due to open events, it's safer to free them when dropping the last i915 reference.

Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/i915/i915_pmu.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b5d14dd318e4..8708f905f4f4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -5,6 +5,7 @@
  */
 
 #include
+#include
 
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_pm.h"
@@ -1152,6 +1153,17 @@ static void free_event_attributes(struct i915_pmu *pmu)
 	pmu->pmu_attr = NULL;
 }
 
+static void free_pmu(struct drm_device *dev, void *res)
+{
+	struct i915_pmu *pmu = res;
+	struct drm_i915_private *i915 = pmu_to_i915(pmu);
+
+	free_event_attributes(pmu);
+	kfree(pmu->base.attr_groups);
+	if (IS_DGFX(i915))
+		kfree(pmu->name);
+}
+
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
 	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
@@ -1302,6 +1314,9 @@ void i915_pmu_register(struct drm_i915_private *i915)
 	if (ret)
 		goto err_unreg;
 
+	if (drmm_add_action_or_reset(&i915->drm, free_pmu, pmu))
+		goto err_unreg;

Is i915_pmu_unregister_cpuhp_state() missing on this error path?

Regards, Tvrtko

+
 	return;
 
 err_unreg:
@@ -1336,11 +1351,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 	hrtimer_cancel(&pmu->timer);
 
 	i915_pmu_unregister_cpuhp_state(pmu);
-
 	perf_pmu_unregister(&pmu->base);
+
 	pmu->base.event_init = NULL;
-	kfree(pmu->base.attr_groups);
-	if (IS_DGFX(i915))
-		kfree(pmu->name);
-	free_event_attributes(pmu);
 }
Re: [PATCH 4/7] drm/i915/pmu: Drop is_igp()
On 22/07/2024 22:06, Lucas De Marchi wrote:

There's no reason to hardcode checking for integrated graphics on a specific pci slot. That information is already available per platform and can be checked with IS_DGFX().

Hmm, probably the reason was this; is_igp was added in:

commit 05488673a4d41383f9dd537f298e525e6b00fb93
Author:     Tvrtko Ursulin
AuthorDate: Wed Oct 16 10:38:02 2019 +0100
Commit:     Tvrtko Ursulin
CommitDate: Thu Oct 17 10:50:47 2019 +0100

    drm/i915/pmu: Support multiple GPUs

while IS_DGFX was added in:

commit dc90fe3fd219c7693617ba09a9467e4aadc2e039
Author:     José Roberto de Souza
AuthorDate: Thu Oct 24 12:51:19 2019 -0700
Commit:     Lucas De Marchi
CommitDate: Fri Oct 25 13:53:51 2019 -0700

    drm/i915: Add is_dgfx to device info

So is_igp innocently arrived just a bit before it.

Regards, Tvrtko

Signed-off-by: Lucas De Marchi
---
 drivers/gpu/drm/i915/i915_pmu.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 3a8bd11b87e7..b5d14dd318e4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1235,17 +1235,6 @@ static void i915_pmu_unregister_cpuhp_state(struct i915_pmu *pmu)
 	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
 }
 
-static bool is_igp(struct drm_i915_private *i915)
-{
-	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
-
-	/* IGP is 0000:00:02.0 */
-	return pci_domain_nr(pdev->bus) == 0 &&
-	       pdev->bus->number == 0 &&
-	       PCI_SLOT(pdev->devfn) == 2 &&
-	       PCI_FUNC(pdev->devfn) == 0;
-}
-
 void i915_pmu_register(struct drm_i915_private *i915)
 {
 	struct i915_pmu *pmu = &i915->pmu;
@@ -1269,7 +1258,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
 	pmu->cpuhp.cpu = -1;
 	init_rc6(pmu);
 
-	if (!is_igp(i915)) {
+	if (IS_DGFX(i915)) {
 		pmu->name = kasprintf(GFP_KERNEL,
 				      "i915_%s",
 				      dev_name(i915->drm.dev));
@@ -1323,7 +1312,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
 	pmu->base.event_init = NULL;
 	free_event_attributes(pmu);
 err_name:
-	if (!is_igp(i915))
+	if (IS_DGFX(i915))
 		kfree(pmu->name);
 err:
 	drm_notice(&i915->drm, "Failed to register PMU!\n");
@@ -1351,7 +1340,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 	perf_pmu_unregister(&pmu->base);
 	pmu->base.event_init = NULL;
 	kfree(pmu->base.attr_groups);
-	if (!is_igp(i915))
+	if (IS_DGFX(i915))
 		kfree(pmu->name);
 	free_event_attributes(pmu);
 }
Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()
On 22/07/2024 15:06, Christian König wrote: Am 22.07.24 um 15:52 schrieb Tvrtko Ursulin: On 19/07/2024 16:18, Christian König wrote: Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin

A long time ago, in commit b3ac17667f11 ("drm/scheduler: rework entity creation"), a change was made which prevented priority changes for entities with only one assigned scheduler. The commit reduced drm_sched_entity_set_priority() to simply updating the entity's priority, but the run queue selection logic in drm_sched_entity_select_rq() was never able to actually change the originally assigned run queue.

In practice that only affected amdgpu, being the only driver which can do dynamic priority changes, and an attempt to rectify it there appears to have been made in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority override"). A few problems however remained unresolved. Firstly, that only fixed drm_sched_entity_set_priority() *if* drm_sched_entity_modify_sched() was called first, which was not documented anywhere. Secondly, it only works if drm_sched_entity_modify_sched() is actually called, which in amdgpu's case today is true only for gfx and compute. Priority changes for other engines with only one scheduler assigned, such as jpeg and video decode, will still not work. Note that this was also noticed in 981b04d96856 ("drm/sched: improve docs around drm_sched_entity").

A completely different set of non-obvious confusion was that, whereas drm_sched_entity_init() was not keeping the passed-in list of schedulers (courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched list")), drm_sched_entity_modify_sched() disagreed and would simply assign the single-item list. That inconsistency appears to have been semi-silently fixed in ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3").

What was also not documented is why it was important not to keep the list of schedulers when there is only one.
I suspect it could have something to do with the fact that the passed-in array is on the stack for many callers with just one scheduler. With more than one scheduler amdgpu is the only caller, and there the container is not on the stack. Keeping a stack-backed list in the entity would obviously be undefined behaviour *if* the list was kept. Amdgpu however only stopped passing in a stack-backed container for the more-than-one-scheduler case in 977f7e1068be ("drm/amdgpu: allocate entities on demand"). Until then I suspect dereferencing a freed stack from drm_sched_entity_select_rq() was still possible.

In order to untangle all that and fix priority changes, this patch brings back the entity-owned container for storing the passed-in scheduler list.

Please don't. That makes the mess just more horrible. The background of not keeping the array is to intentionally prevent the priority override from working. The bug is rather that adding drm_sched_entity_modify_sched() messed this up.

To give more background: amdgpu has two different ways of handling priority:

1. The priority in the DRM scheduler.
2. Different HW rings with different priorities.

Your analysis is correct that drm_sched_entity_init() initially dropped the scheduler list to avoid using a stack allocated list, and that functionality is still used in amdgpu_ctx_init_entity() for example. Setting the scheduler priority was basically just a workaround because we didn't have the hw priorities at that time. Since that is no longer the case I suggest just completely dropping the drm_sched_entity_set_priority() function instead.

Removing drm_sched_entity_set_priority() is one thing, but we also need to clear up the sched_list container ownership issue. It is neither documented, nor robustly handled in the code. The "num_scheds == 1" special casing throughout IMHO has to go too.

I disagree.

Keeping the scheduler list in the entity is only useful for load balancing.
As long as only one scheduler is provided and we don't load balance, the entity doesn't need the scheduler list in the first place.

Once set_priority is removed then indeed it doesn't. But even when it is removed, it needs documenting who owns the passed-in container. Today drivers are okay to pass a stack array when it is one element, but if they did it with more than one they would be in for a nasty surprise.

Another thing if you want to get rid of frontend priority handling is to stop configuring scheduler instances with DRM_SCHED_PRIORITY_COUNT priority levels, to avoid wasting memory on pointless run queues.

I would rather like to completely drop the RR with the runlists altogether and keep only the FIFO approach around. This way priority can be implemented by boosting the score of submissions by a certain degree.

You mean a larger refactoring of the scheduler, removing the 1:N between drm_sch
Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()
On 19/07/2024 16:18, Christian König wrote: Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin

A long time ago, in commit b3ac17667f11 ("drm/scheduler: rework entity creation"), a change was made which prevented priority changes for entities with only one assigned scheduler. The commit reduced drm_sched_entity_set_priority() to simply updating the entity's priority, but the run queue selection logic in drm_sched_entity_select_rq() was never able to actually change the originally assigned run queue.

In practice that only affected amdgpu, being the only driver which can do dynamic priority changes, and an attempt to rectify it there appears to have been made in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority override"). A few problems however remained unresolved. Firstly, that only fixed drm_sched_entity_set_priority() *if* drm_sched_entity_modify_sched() was called first, which was not documented anywhere. Secondly, it only works if drm_sched_entity_modify_sched() is actually called, which in amdgpu's case today is true only for gfx and compute. Priority changes for other engines with only one scheduler assigned, such as jpeg and video decode, will still not work. Note that this was also noticed in 981b04d96856 ("drm/sched: improve docs around drm_sched_entity").

A completely different set of non-obvious confusion was that, whereas drm_sched_entity_init() was not keeping the passed-in list of schedulers (courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched list")), drm_sched_entity_modify_sched() disagreed and would simply assign the single-item list. That inconsistency appears to have been semi-silently fixed in ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3").

What was also not documented is why it was important not to keep the list of schedulers when there is only one.
I suspect it could have something to do with the fact that the passed-in array is on the stack for many callers with just one scheduler. With more than one scheduler amdgpu is the only caller, and there the container is not on the stack. Keeping a stack-backed list in the entity would obviously be undefined behaviour *if* the list was kept. Amdgpu however only stopped passing in a stack-backed container for the more-than-one-scheduler case in 977f7e1068be ("drm/amdgpu: allocate entities on demand"). Until then I suspect dereferencing a freed stack from drm_sched_entity_select_rq() was still possible.

In order to untangle all that and fix priority changes, this patch brings back the entity-owned container for storing the passed-in scheduler list.

Please don't. That makes the mess just more horrible. The background of not keeping the array is to intentionally prevent the priority override from working. The bug is rather that adding drm_sched_entity_modify_sched() messed this up.

To give more background: amdgpu has two different ways of handling priority:

1. The priority in the DRM scheduler.
2. Different HW rings with different priorities.

Your analysis is correct that drm_sched_entity_init() initially dropped the scheduler list to avoid using a stack allocated list, and that functionality is still used in amdgpu_ctx_init_entity() for example. Setting the scheduler priority was basically just a workaround because we didn't have the hw priorities at that time. Since that is no longer the case I suggest just completely dropping the drm_sched_entity_set_priority() function instead.

Removing drm_sched_entity_set_priority() is one thing, but we also need to clear up the sched_list container ownership issue. It is neither documented, nor robustly handled in the code. The "num_scheds == 1" special casing throughout IMHO has to go too.
Another thing if you want to get rid of frontend priority handling is to stop configuring scheduler instances with DRM_SCHED_PRIORITY_COUNT priority levels, to avoid wasting memory on pointless run queues.

And a final thing is to check whether the locking in drm_sched_entity_modify_sched() is okay, because according to the kerneldoc:

 * Note that this must be called under the same common lock for @entity as
 * drm_sched_job_arm() and drm_sched_entity_push_job(), or the driver needs to
 * guarantee through some other means that this is never called while new jobs
 * can be pushed to @entity.

I don't see that this is the case. Priority override is under amdgpu_ctx_mgr->lock, while job arm and push appear not to be. I also cannot spot anything else preventing amdgpu_sched_ioctl() running in parallel to everything else.

In general scheduler priorities were meant to be used for things like kernel queues which would always have higher priority than user space submissions, and using them for userspace turned out to be not such a good idea.

Out of curiosity, what were the problems? I cannot think of anythi
[PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()
From: Tvrtko Ursulin

A long time ago, in commit b3ac17667f11 ("drm/scheduler: rework entity creation"), a change was made which prevented priority changes for entities with only one assigned scheduler. The commit reduced drm_sched_entity_set_priority() to simply updating the entity's priority, but the run queue selection logic in drm_sched_entity_select_rq() was never able to actually change the originally assigned run queue.

In practice that only affected amdgpu, being the only driver which can do dynamic priority changes, and an attempt to rectify it there appears to have been made in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority override"). A few problems however remained unresolved. Firstly, that only fixed drm_sched_entity_set_priority() *if* drm_sched_entity_modify_sched() was called first, which was not documented anywhere. Secondly, it only works if drm_sched_entity_modify_sched() is actually called, which in amdgpu's case today is true only for gfx and compute. Priority changes for other engines with only one scheduler assigned, such as jpeg and video decode, will still not work. Note that this was also noticed in 981b04d96856 ("drm/sched: improve docs around drm_sched_entity").

A completely different set of non-obvious confusion was that, whereas drm_sched_entity_init() was not keeping the passed-in list of schedulers (courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched list")), drm_sched_entity_modify_sched() disagreed and would simply assign the single-item list. That inconsistency appears to have been semi-silently fixed in ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3").

What was also not documented is why it was important not to keep the list of schedulers when there is only one. I suspect it could have something to do with the fact that the passed-in array is on the stack for many callers with just one scheduler. With more than one scheduler amdgpu is the only caller, and there the container is not on the stack.
Keeping a stack-backed list in the entity would obviously be undefined behaviour *if* the list was kept. Amdgpu however only stopped passing in a stack-backed container for the more-than-one-scheduler case in 977f7e1068be ("drm/amdgpu: allocate entities on demand"). Until then I suspect dereferencing a freed stack from drm_sched_entity_select_rq() was still possible.

In order to untangle all that and fix priority changes, this patch brings back the entity-owned container for storing the passed-in scheduler list. The container is now owned by the entity and the pointers are owned by the drivers. The list of schedulers is always kept, including for the one-scheduler case. The patch can therefore also remove the single-scheduler special case, which means that priority changes should now work (be able to change the selected run queue) for all drivers and engines. In other words, drm_sched_entity_set_priority() should now just work for all cases.

To enable the entity maintaining its own container, some API calls needed to grow the capability of returning success/failure, a change which percolates mostly through the amdgpu sources.
Signed-off-by: Tvrtko Ursulin Fixes: b3ac17667f11 ("drm/scheduler: rework entity creation") References: 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched list") References: 977f7e1068be ("drm/amdgpu: allocate entities on demand") References: 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority override") References: ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3") References: 981b04d96856 ("drm/sched: improve docs around drm_sched_entity") Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Daniel Vetter Cc: amd-...@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: # v5.6+ --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 31 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 13 +-- drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 3 +- drivers/gpu/drm/scheduler/sched_entity.c | 96 --- include/drm/gpu_scheduler.h | 16 ++-- 6 files changed, 100 insertions(+), 61 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c index 5cb33ac99f70..387247f8307e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -802,15 +802,15 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx, return fence; } -static void amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx, - struct amdgpu_ctx_entity *aentity, - int hw_ip, - int32_t priority) +static int amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx, +
[PULL] drm-intel-next-fixes
Hi Dave, Sima, One display fix for the merge window relating to DisplayPort LTTPR. It fixes at least Dell UD22 dock when used on Intel N100 systems. Regards, Tvrtko drm-intel-next-fixes-2024-07-18: - Reset intel_dp->link_trained before retraining the link [dp] (Imre Deak) - Don't switch the LTTPR mode on an active link [dp] (Imre Deak) The following changes since commit c58c39163a7e2c4c8885c57e4e74931c7b482e53: drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB (2024-07-12 13:13:15 +1000) are available in the Git repository at: https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-next-fixes-2024-07-18 for you to fetch changes up to 509580fad7323b6a5da27e8365cd488f3b57210e: drm/i915/dp: Don't switch the LTTPR mode on an active link (2024-07-16 08:14:29 +) - Reset intel_dp->link_trained before retraining the link [dp] (Imre Deak) - Don't switch the LTTPR mode on an active link [dp] (Imre Deak) Imre Deak (2): drm/i915/dp: Reset intel_dp->link_trained before retraining the link drm/i915/dp: Don't switch the LTTPR mode on an active link drivers/gpu/drm/i915/display/intel_dp.c| 2 + .../gpu/drm/i915/display/intel_dp_link_training.c | 55 +++--- 2 files changed, 50 insertions(+), 7 deletions(-)
Re: [PATCH] drm/v3d: Expose memory stats through fdinfo
On 11/07/2024 15:25, Maíra Canal wrote: Use the common DRM function `drm_show_memory_stats()` to expose standard fdinfo memory stats. V3D exposes global GPU memory stats through debugfs. Those stats will be preserved while the DRM subsystem doesn't have a standard solution to expose global GPU stats. Signed-off-by: Maíra Canal --- * Example fdinfo output: $ cat /proc/10100/fdinfo/19 pos:0 flags: 0242 mnt_id: 25 ino:521 drm-driver: v3d drm-client-id: 81 drm-engine-bin: 4916187 ns v3d-jobs-bin: 98 jobs drm-engine-render: 154563573 ns v3d-jobs-render:98 jobs drm-engine-tfu: 10574 ns v3d-jobs-tfu: 1 jobs drm-engine-csd: 0 ns v3d-jobs-csd: 0 jobs drm-engine-cache_clean: 0 ns v3d-jobs-cache_clean: 0 jobs drm-engine-cpu: 0 ns v3d-jobs-cpu: 0 jobs drm-total-memory: 15168 KiB drm-shared-memory: 9336 KiB drm-active-memory: 0 * Example gputop output: DRM minor 128 PID MEM RSS bin render tfucsd cache_cleancpu NAME 10257 19M 19M | 3.6% ▎ || 43.2% ██▋ || 0.0% || 0.0% || 0.0% || 0.0% | glmark2 9963 3M 3M | 0.3% ▏ || 2.6% ▎ || 0.0% || 0.0% || 0.0% || 0.0% | glxgears 9965 10M 10M | 0.0% || 0.0% || 0.0% || 0.0% || 0.0% || 0.0% | Xwayland 10100 14M 14M | 0.0% || 0.0% || 0.0% || 0.0% || 0.0% || 0.0% | chromium-browse Best Regards, - Maíra drivers/gpu/drm/v3d/v3d_bo.c | 12 drivers/gpu/drm/v3d/v3d_drv.c | 2 ++ 2 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c index a165cbcdd27b..ecb80fd75b1a 100644 --- a/drivers/gpu/drm/v3d/v3d_bo.c +++ b/drivers/gpu/drm/v3d/v3d_bo.c @@ -26,6 +26,17 @@ #include "v3d_drv.h" #include "uapi/drm/v3d_drm.h" +static enum drm_gem_object_status v3d_gem_status(struct drm_gem_object *obj) +{ + struct v3d_bo *bo = to_v3d_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; To check my understanding of v3d - pages are actually always there for the lifetime of the object? 
If so this could be just "return DRM_GEM_OBJECT_RESIDENT", although granted, like you have it is more future proof. Either way: Reviewed-by: Tvrtko Ursulin Regards, Tvrtko + + return res; +} + /* Called DRM core on the last userspace/kernel unreference of the * BO. */ @@ -63,6 +74,7 @@ static const struct drm_gem_object_funcs v3d_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = v3d_gem_status, .vm_ops = _gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index a47f00b443d3..e883f405f26a 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -184,6 +184,8 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n", v3d_queue_to_string(queue), jobs_completed); } + + drm_show_memory_stats(p, file); } static const struct file_operations v3d_drm_fops = {
[PATCH 11/11] drm/v3d: Add some local variables in queries/extensions
From: Tvrtko Ursulin Add some local variables to make the code a bit less verbose, with the main benefit being pulling some lines to under 80 columns wide. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 88 ++-- 1 file changed, 49 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index b282d12571b5..d607aa9c4ec2 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + struct v3d_timestamp_query_info *query_info = >timestamp_query; unsigned int i; int err; @@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(timestamp.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + query_info->queries = kvmalloc_array(timestamp.count, +sizeof(struct v3d_timestamp_query), +GFP_KERNEL); + if (!query_info->queries) return -ENOMEM; offsets = u64_to_user_ptr(timestamp.offsets); @@ -490,20 +491,21 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, goto error; } - job->timestamp_query.queries[i].offset = offset; + query_info->queries[i].offset = offset; if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->timestamp_query.queries[i].syncobj) { + query_info->queries[i].syncobj = drm_syncobj_find(file_priv, + sync); + if (!query_info->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = timestamp.count; + query_info->count = timestamp.count; return 0; @@ -519,6 +521,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct 
drm_v3d_reset_timestamp_query reset; + struct v3d_timestamp_query_info *query_info = >timestamp_query; unsigned int i; int err; @@ -537,10 +540,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(reset.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + query_info->queries = kvmalloc_array(reset.count, +sizeof(struct v3d_timestamp_query), +GFP_KERNEL); + if (!query_info->queries) return -ENOMEM; syncs = u64_to_user_ptr(reset.syncs); @@ -548,20 +551,21 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, for (i = 0; i < reset.count; i++) { u32 sync; - job->timestamp_query.queries[i].offset = reset.offset + 8 * i; + query_info->queries[i].offset = reset.offset + 8 * i; if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->timestamp_query.queries[i].syncobj) { + query_info->queries[i].syncobj = drm_syncobj_find(file_priv, + sync); + if (!query_info->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = reset.count; + query_info->count = reset.count; return 0; @@ -578,6 +582,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_copy_timestamp_query copy; + struct v3d_timestamp_query_info *query_info = >timestamp_query; unsigned int i; int err;
[PATCH 05/11] drm/v3d: Validate passed in drm syncobj handles in the performance extension
From: Tvrtko Ursulin If userspace provides an unknown or invalid handle anywhere in the handle array the rest of the driver will not handle that well. Fix it by checking handle was looked up successfully or otherwise fail the extension by jumping into the existing unwind. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 9a3e32075ebe..4cdfabbf4964 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -710,6 +710,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->performance_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; @@ -790,6 +794,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->performance_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->performance_query.count = copy.count; job->performance_query.nperfmons = copy.nperfmons; -- 2.44.0
[PATCH 09/11] drm/v3d: Move perfmon init completely into own unit
From: Tvrtko Ursulin Now that the build time dependencies on various array sizes have been removed, we can move the perfmon init completely into its own compilation unit and remove the hardcoded defines. This improves on the temporary fix quickly delivered in commit 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning"). Signed-off-by: Tvrtko Ursulin References: 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning") Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.c | 9 +--- drivers/gpu/drm/v3d/v3d_drv.h | 6 +-- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++ .../gpu/drm/v3d/v3d_performance_counters.h| 16 --- 4 files changed, 40 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index a47f00b443d3..491c638a4d74 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data, args->value = 1; return 0; case DRM_V3D_PARAM_MAX_PERF_COUNTERS: - args->value = v3d->max_counters; + args->value = v3d->perfmon_info.max_counters; return 0; default: DRM_DEBUG("Unknown parameter %d\n", args->param); @@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES); WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ - if (v3d->ver >= 71) - v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS; - else if (v3d->ver >= 42) - v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS; - else - v3d->max_counters = 0; + v3d_perfmon_init(v3d); v3d->reset = devm_reset_control_get_exclusive(dev, NULL); if (IS_ERR(v3d->reset)) { diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index b1dfec49ba7d..8524761bc62d 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -104,10 +104,7 @@ struct v3d_dev { int ver; bool single_irq_line; - /* Different revisions of V3D have different total number of performance -* counters 
-*/ - unsigned int max_counters; + struct v3d_perfmon_info perfmon_info; void __iomem *hub_regs; void __iomem *core_regs[3]; @@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); /* v3d_perfmon.c */ +void v3d_perfmon_init(struct v3d_dev *v3d); void v3d_perfmon_get(struct v3d_perfmon *perfmon); void v3d_perfmon_put(struct v3d_perfmon *perfmon); void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon); diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index b7d0b02e1a95..cd7f1eedf17f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = { {"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any other reason (vary/W/Z)"}, }; +void v3d_perfmon_init(struct v3d_dev *v3d) +{ + const struct v3d_perf_counter_desc *counters = NULL; + unsigned int max = 0; + + if (v3d->ver >= 71) { + counters = v3d_v71_performance_counters; + max = ARRAY_SIZE(v3d_v71_performance_counters); + } else if (v3d->ver >= 42) { + counters = v3d_v42_performance_counters; + max = ARRAY_SIZE(v3d_v42_performance_counters); + } + + v3d->perfmon_info.max_counters = max; + v3d->perfmon_info.counters = counters; +} + void v3d_perfmon_get(struct v3d_perfmon *perfmon) { if (perfmon) @@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, /* Make sure all counters are valid. 
*/ for (i = 0; i < req->ncounters; i++) { - if (req->counters[i] >= v3d->max_counters) + if (req->counters[i] >= v3d->perfmon_info.max_counters) return -EINVAL; } @@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, return -EINVAL; } - /* Make sure that the counter ID is valid */ - if (req->counter >= v3d->max_counters) - return -EINVAL; - - BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) != -V3D_V42_NUM_PERFCOUNTERS); - BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) != -V3D_V71_NUM_PERFCOUNTERS); - BUILD_BUG_ON(V3D_MAX_COUNTE
[PATCH 08/11] drm/v3d: Do not use intermediate storage when copying performance query results
From: Tvrtko Ursulin Removing the intermediate buffer removes the last use of the V3D_MAX_COUNTERS define, which will enable further driver cleanup. While at it pull the 32 vs 64 bit copying decision outside the loop in order to reduce the number of conditional instructions. Signed-off-by: Tvrtko Ursulin Reviewed-by: Iago Toral Quiroga Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_sched.c | 59 + 1 file changed, 37 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 7b2195ba4248..d193072703f3 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job) v3d_put_bo_vaddr(bo); } +static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value) +{ + dst[idx] = value; +} + +static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value) +{ + dst[idx] = value; +} + static void -write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value) +write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value) { - if (do_64bit) { - u64 *dst64 = (u64 *)dst; - - dst64[idx] = value; - } else { - u32 *dst32 = (u32 *)dst; - - dst32[idx] = (u32)value; - } + if (do_64bit) + write_to_buffer_64(dst, idx, value); + else + write_to_buffer_32(dst, idx, value); } static void @@ -505,18 +510,24 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job) } static void -v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query) +v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, + unsigned int query) { - struct v3d_performance_query_info *performance_query = >performance_query; - struct v3d_copy_query_results_info *copy = >copy; + struct v3d_performance_query_info *performance_query = + >performance_query; struct v3d_file_priv *v3d_priv = job->base.file->driver_priv; + struct v3d_performance_query *perf_query = + _query->queries[query]; struct v3d_dev *v3d = 
job->base.v3d; - struct v3d_perfmon *perfmon; - u64 counter_values[V3D_MAX_COUNTERS]; + unsigned int i, j, offset; + + for (i = 0, offset = 0; +i < performance_query->nperfmons; +i++, offset += DRM_V3D_MAX_PERF_COUNTERS) { + struct v3d_perfmon *perfmon; - for (int i = 0; i < performance_query->nperfmons; i++) { perfmon = v3d_perfmon_find(v3d_priv, - performance_query->queries[query].kperfmon_ids[i]); + perf_query->kperfmon_ids[i]); if (!perfmon) { DRM_DEBUG("Failed to find perfmon."); continue; @@ -524,14 +535,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 quer v3d_perfmon_stop(v3d, perfmon, true); - memcpy(_values[i * DRM_V3D_MAX_PERF_COUNTERS], perfmon->values, - perfmon->ncounters * sizeof(u64)); + if (job->copy.do_64bit) { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_64(data, offset + j, + perfmon->values[j]); + } else { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_32(data, offset + j, + perfmon->values[j]); + } v3d_perfmon_put(perfmon); } - - for (int i = 0; i < performance_query->ncounters; i++) - write_to_buffer(data, i, copy->do_64bit, counter_values[i]); } static void -- 2.44.0
[PATCH 07/11] drm/v3d: Size the kperfmon_ids array at runtime
From: Tvrtko Ursulin Instead of statically reserving pessimistic space for the kperfmon_ids array, make the userspace extension code allocate the exactly required amount of space. Apart from saving some memory at runtime, this also removes the need for the V3D_MAX_PERFMONS macro whose removal will benefit further driver cleanup. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.h| 6 +- drivers/gpu/drm/v3d/v3d_sched.c | 4 +++- drivers/gpu/drm/v3d/v3d_submit.c | 17 +++-- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index dd3ead4cb8bd..b1dfec49ba7d 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,13 +351,9 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; -/* Number of perfmons required to handle all supported performance counters */ -#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ - DRM_V3D_MAX_PERF_COUNTERS) - struct v3d_performance_query { /* Performance monitor IDs for this query */ - u32 kperfmon_ids[V3D_MAX_PERFMONS]; + u32 *kperfmon_ids; /* Syncobj that indicates the query availability */ struct drm_syncobj *syncobj; diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 5fbbee47c6b7..7b2195ba4248 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -94,8 +94,10 @@ v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, if (query_info->queries) { unsigned int i; - for (i = 0; i < count; i++) + for (i = 0; i < count; i++) { drm_syncobj_put(query_info->queries[i].syncobj); + kvfree(query_info->queries[i].kperfmon_ids); + } kvfree(query_info->queries); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index ce56e31a027d..d1060e60aafa 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -671,10 +671,20 @@ v3d_copy_query_info(struct 
v3d_performance_query_info *query_info, goto error; } + query->kperfmon_ids = + kvmalloc_array(nperfmons, + sizeof(struct v3d_performance_query *), + GFP_KERNEL); + if (!query->kperfmon_ids) { + err = -ENOMEM; + goto error; + } + ids_pointer = u64_to_user_ptr(ids); for (j = 0; j < nperfmons; j++) { if (get_user(id, ids_pointer++)) { + kvfree(query->kperfmon_ids); err = -EFAULT; goto error; } @@ -684,6 +694,7 @@ v3d_copy_query_info(struct v3d_performance_query_info *query_info, query->syncobj = drm_syncobj_find(file_priv, sync); if (!query->syncobj) { + kvfree(query->kperfmon_ids); err = -ENOENT; goto error; } @@ -717,9 +728,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(, ext, sizeof(reset))) return -EFAULT; - if (reset.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -767,9 +775,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; - if (copy.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count, -- 2.44.0
[PATCH 06/11] drm/v3d: Move part of copying of reset/copy performance extension to a helper
From: Tvrtko Ursulin The loop which looks up the syncobj and copies the kperfmon ids is identical so lets move it to a helper. The only change is replacing copy_from_user with get_user when copying a scalar. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 152 ++- 1 file changed, 68 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 4cdfabbf4964..ce56e31a027d 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -644,15 +644,64 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, return err; } +static int +v3d_copy_query_info(struct v3d_performance_query_info *query_info, + unsigned int count, + unsigned int nperfmons, + u32 __user *syncs, + u64 __user *kperfmon_ids, + struct drm_file *file_priv) +{ + unsigned int i, j; + int err; + + for (i = 0; i < count; i++) { + struct v3d_performance_query *query = _info->queries[i]; + u32 __user *ids_pointer; + u32 sync, id; + u64 ids; + + if (get_user(sync, syncs++)) { + err = -EFAULT; + goto error; + } + + if (get_user(ids, kperfmon_ids++)) { + err = -EFAULT; + goto error; + } + + ids_pointer = u64_to_user_ptr(ids); + + for (j = 0; j < nperfmons; j++) { + if (get_user(id, ids_pointer++)) { + err = -EFAULT; + goto error; + } + + query->kperfmon_ids[j] = id; + } + + query->syncobj = drm_syncobj_find(file_priv, sync); + if (!query->syncobj) { + err = -ENOENT; + goto error; + } + } + + return 0; + +error: + v3d_performance_query_info_free(query_info, i); + return err; +} + static int v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs; - u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; - unsigned int i, j; int err; if (!job) { @@ -679,50 +728,19 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (!job->performance_query.queries) 
return -ENOMEM; - syncs = u64_to_user_ptr(reset.syncs); - kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids); + err = v3d_copy_query_info(>performance_query, + reset.count, + reset.nperfmons, + u64_to_user_ptr(reset.syncs), + u64_to_user_ptr(reset.kperfmon_ids), + file_priv); + if (err) + return err; - for (i = 0; i < reset.count; i++) { - u32 sync; - u64 ids; - u32 __user *ids_pointer; - u32 id; - - if (copy_from_user(, syncs++, sizeof(sync))) { - err = -EFAULT; - goto error; - } - - if (copy_from_user(, kperfmon_ids++, sizeof(ids))) { - err = -EFAULT; - goto error; - } - - ids_pointer = u64_to_user_ptr(ids); - - for (j = 0; j < reset.nperfmons; j++) { - if (copy_from_user(, ids_pointer++, sizeof(id))) { - err = -EFAULT; - goto error; - } - - job->performance_query.queries[i].kperfmon_ids[j] = id; - } - - job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->performance_query.queries[i].syncobj) { - err = -ENOENT; - goto error; - } - } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; return 0; - -error: - v3d_performance_query_info_free(>performance_query, i); - return err; } static int @@ -730,10 +748,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs;
[PATCH 04/11] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension
From: Tvrtko Ursulin If userspace provides an unknown or invalid handle anywhere in the handle array the rest of the driver will not handle that well. Fix it by checking handle was looked up successfully or otherwise fail the extension by jumping into the existing unwind. Signed-off-by: Tvrtko Ursulin Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 50be4e8a7512..9a3e32075ebe 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -498,6 +498,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = timestamp.count; @@ -552,6 +556,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = reset.count; @@ -616,6 +624,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = copy.count; -- 2.44.0
[PATCH 10/11] drm/v3d: Prefer get_user for scalar types
From: Tvrtko Ursulin It makes it just a tiny bit more obvious what is going on. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index d1060e60aafa..b282d12571b5 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -485,14 +485,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, for (i = 0; i < timestamp.count; i++) { u32 offset, sync; - if (copy_from_user(, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { err = -EFAULT; goto error; } job->timestamp_query.queries[i].offset = offset; - if (copy_from_user(, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } @@ -550,7 +550,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->timestamp_query.queries[i].offset = reset.offset + 8 * i; - if (copy_from_user(, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } @@ -611,14 +611,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, for (i = 0; i < copy.count; i++) { u32 offset, sync; - if (copy_from_user(, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { err = -EFAULT; goto error; } job->timestamp_query.queries[i].offset = offset; - if (copy_from_user(, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } -- 2.44.0
[PATCH 03/11] drm/v3d: Fix potential memory leak in the performance extension
From: Tvrtko Ursulin If fetching of userspace memory fails during the main loop, all drm sync objs looked up until that point will be leaked because of the missing drm_syncobj_put. Fix it by exporting and using a common cleanup helper. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_drv.h| 2 ++ drivers/gpu/drm/v3d/v3d_sched.c | 22 ++ drivers/gpu/drm/v3d/v3d_submit.c | 52 3 files changed, 50 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index e208ffdfba32..dd3ead4cb8bd 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, unsigned int count); +void v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, +unsigned int count); void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 59dc0287dab9..5fbbee47c6b7 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -87,20 +87,30 @@ v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, } } +void +v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, + unsigned int count) +{ + if (query_info->queries) { + unsigned int i; + + for (i = 0; i < count; i++) + drm_syncobj_put(query_info->queries[i].syncobj); + + kvfree(query_info->queries); + } +} + static void v3d_cpu_job_free(struct drm_sched_job *sched_job) { struct v3d_cpu_job *job = to_cpu_job(sched_job); - struct v3d_performance_query_info *performance_query = >performance_query; 
v3d_timestamp_query_info_free(>timestamp_query, job->timestamp_query.count); - if (performance_query->queries) { - for (int i = 0; i < performance_query->count; i++) - drm_syncobj_put(performance_query->queries[i].syncobj); - kvfree(performance_query->queries); - } + v3d_performance_query_info_free(>performance_query, + job->performance_query.count); v3d_job_cleanup(>base); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 121bf1314b80..50be4e8a7512 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -640,6 +640,8 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, u32 __user *syncs; u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; + unsigned int i, j; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -668,39 +670,43 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, syncs = u64_to_user_ptr(reset.syncs); kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids); - for (int i = 0; i < reset.count; i++) { + for (i = 0; i < reset.count; i++) { u32 sync; u64 ids; u32 __user *ids_pointer; u32 id; if (copy_from_user(, syncs++, sizeof(sync))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } - job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (copy_from_user(, kperfmon_ids++, sizeof(ids))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } ids_pointer = u64_to_user_ptr(ids); - for (int j = 0; j < reset.nperfmons; j++) { + for (j = 0; j < reset.nperfmons; j++) { if (copy_from_user(, ids_pointer++, sizeof(id))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job-&g
[PATCH 02/11] drm/v3d: Fix potential memory leak in the timestamp extension
From: Tvrtko Ursulin If fetching of userspace memory fails during the main loop, all drm sync objs looked up until that point will be leaked because of the missing drm_syncobj_put. Fix it by exporting and using a common cleanup helper. Signed-off-by: Tvrtko Ursulin Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_drv.h| 2 ++ drivers/gpu/drm/v3d/v3d_sched.c | 22 +++- drivers/gpu/drm/v3d/v3d_submit.c | 43 ++-- 3 files changed, 48 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 099b962bdfde..e208ffdfba32 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo); void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ +void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, + unsigned int count); void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 03df37a3acf5..59dc0287dab9 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job) v3d_job_cleanup(job); } +void +v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, + unsigned int count) +{ + if (query_info->queries) { + unsigned int i; + + for (i = 0; i < count; i++) + drm_syncobj_put(query_info->queries[i].syncobj); + + kvfree(query_info->queries); + } +} + static void v3d_cpu_job_free(struct drm_sched_job *sched_job) { struct v3d_cpu_job *job = to_cpu_job(sched_job); - struct v3d_timestamp_query_info *timestamp_query = >timestamp_query; struct v3d_performance_query_info *performance_query = 
>performance_query; - if (timestamp_query->queries) { - for (int i = 0; i < timestamp_query->count; i++) - drm_syncobj_put(timestamp_query->queries[i].syncobj); - kvfree(timestamp_query->queries); - } + v3d_timestamp_query_info_free(>timestamp_query, + job->timestamp_query.count); if (performance_query->queries) { for (int i = 0; i < performance_query->count; i++) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 263fefc1d04f..121bf1314b80 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,8 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + unsigned int i; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -480,19 +482,19 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, offsets = u64_to_user_ptr(timestamp.offsets); syncs = u64_to_user_ptr(timestamp.syncs); - for (int i = 0; i < timestamp.count; i++) { + for (i = 0; i < timestamp.count; i++) { u32 offset, sync; if (copy_from_user(, offsets++, sizeof(offset))) { - kvfree(job->timestamp_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->timestamp_query.queries[i].offset = offset; if (copy_from_user(, syncs++, sizeof(sync))) { - kvfree(job->timestamp_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); @@ -500,6 +502,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->timestamp_query.count = timestamp.count; return 0; + +error: + v3d_timestamp_query_info_free(>timestamp_query, i); + return err; } static int @@ -509,6 +515,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; + unsigned int i; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ 
-533,14 +541,14 @@ v3d_get_cpu_reset_timestamp_params(st
[PATCH 01/11] drm/v3d: Prevent out of bounds access in performance query extensions
From: Tvrtko Ursulin Check that the number of perfmons userspace is passing in the copy and reset extensions is not greater than the internal kernel storage where the ids will be copied into. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ Reviewed-by: Iago Toral Quiroga Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 88f63d526b22..263fefc1d04f 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(, ext, sizeof(reset))) return -EFAULT; + if (reset.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; + if (copy.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count, -- 2.44.0
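The essence of the bounds fix is rejecting a user-supplied `nperfmons` before any code indexes the fixed-size `kperfmon_ids[]` array. A minimal sketch, with an illustrative limit (the real `V3D_MAX_PERFMONS` is derived from the number of supported HW counters):

```c
#include <assert.h>
#include <errno.h>

/* Illustrative limit; not the driver's real value. */
#define MAX_PERFMONS 8

/* Validate the count coming from userspace up front, so the later copy
 * loop 'for (j = 0; j < nperfmons; j++) ids[j] = ...' can never write
 * past the kernel-side array. */
static int validate_nperfmons(unsigned int nperfmons)
{
	if (nperfmons > MAX_PERFMONS)
		return -EINVAL;
	return 0;
}
```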
[PATCH v4 00/11] v3d: Perfmon cleanup
From: Tvrtko Ursulin When we had to quickly deal with a tree build issue via merging 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we promised to follow up with a nicer solution. As in the process of eliminating the hardcoded defines we have discovered a few issues in handling of corner cases and userspace input validation, the fix has turned into a larger series, but hopefully the end result is a justifiable cleanup. v2: * Re-order the patches so fixes come first while last three are optional cleanups. v3: * Fixed a bunch of rebase errors I made when re-ordering patches from v1 to v2. * Dropped the double underscore from __v3d_timestamp_query_info_free. * Added v3d prefix to v3d_copy_query_info. * Renamed qinfo to query_info. * Fixed some spelling errors and bad patch references. * Added mention to get_user to one commit message. * Dropped one patch from the series which became redundant due other re-ordering. * Re-ordered last two patches with the view of dropping the last. v4: * Fixed more rebase errors and details in commit messages. 
Cc: Maíra Canal Tvrtko Ursulin (11): drm/v3d: Prevent out of bounds access in performance query extensions drm/v3d: Fix potential memory leak in the timestamp extension drm/v3d: Fix potential memory leak in the performance extension drm/v3d: Validate passed in drm syncobj handles in the timestamp extension drm/v3d: Validate passed in drm syncobj handles in the performance extension drm/v3d: Move part of copying of reset/copy performance extension to a helper drm/v3d: Size the kperfmon_ids array at runtime drm/v3d: Do not use intermediate storage when copying performance query results drm/v3d: Move perfmon init completely into own unit drm/v3d: Prefer get_user for scalar types drm/v3d: Add some local variables in queries/extensions drivers/gpu/drm/v3d/v3d_drv.c | 9 +- drivers/gpu/drm/v3d/v3d_drv.h | 16 +- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +-- .../gpu/drm/v3d/v3d_performance_counters.h| 16 +- drivers/gpu/drm/v3d/v3d_sched.c | 105 +-- drivers/gpu/drm/v3d/v3d_submit.c | 294 +++--- 6 files changed, 290 insertions(+), 194 deletions(-) -- 2.44.0
Re: [PATCH 03/11] drm/v3d: Fix potential memory leak in the performance extension
On 11/07/2024 14:00, Maíra Canal wrote: On 7/11/24 06:15, Tvrtko Ursulin wrote: From: Tvrtko Ursulin If fetching of userspace memory fails during the main loop, all drm sync objs looked up until that point will be leaked because of the missing drm_syncobj_put. Fix it by exporting and using a common cleanup helper. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_drv.h | 2 ++ drivers/gpu/drm/v3d/v3d_sched.c | 22 ++ drivers/gpu/drm/v3d/v3d_submit.c | 50 3 files changed, 49 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index e208ffdfba32..dd3ead4cb8bd 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, unsigned int count); +void v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, + unsigned int count); void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 59dc0287dab9..5fbbee47c6b7 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -87,20 +87,30 @@ v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, } } +void +v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, + unsigned int count) +{ + if (query_info->queries) { + unsigned int i; + + for (i = 0; i < count; i++) + drm_syncobj_put(query_info->queries[i].syncobj); + + kvfree(query_info->queries); + } +} + static void v3d_cpu_job_free(struct drm_sched_job *sched_job) { struct v3d_cpu_job *job = to_cpu_job(sched_job); - struct 
v3d_performance_query_info *performance_query = >performance_query; v3d_timestamp_query_info_free(>timestamp_query, job->timestamp_query.count); - if (performance_query->queries) { - for (int i = 0; i < performance_query->count; i++) - drm_syncobj_put(performance_query->queries[i].syncobj); - kvfree(performance_query->queries); - } + v3d_performance_query_info_free(>performance_query, + job->performance_query.count); v3d_job_cleanup(>base); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 121bf1314b80..d626c8539b04 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -640,6 +640,8 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, u32 __user *syncs; u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; + unsigned int i, j; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -668,39 +670,43 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, syncs = u64_to_user_ptr(reset.syncs); kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids); - for (int i = 0; i < reset.count; i++) { + for (i = 0; i < reset.count; i++) { u32 sync; u64 ids; u32 __user *ids_pointer; u32 id; if (copy_from_user(, syncs++, sizeof(sync))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } - job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (copy_from_user(, kperfmon_ids++, sizeof(ids))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } ids_pointer = u64_to_user_ptr(ids); - for (int j = 0; j < reset.nperfmons; j++) { + for (j = 0; j < reset.nperfmons; j++) { if (copy_from_user(, ids_pointer++, sizeof(id))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->performance_query.queries[i].kperfmon_ids[j] = id; } + + job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync); } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; return 0; + +error: + v3d_performance_query_info_free(>performance_qu
Re: [PATCH 11/12] drm/v3d: Do not use intermediate storage when copying performance query results
On 11/07/2024 13:31, Iago Toral wrote: El mar, 09-07-2024 a las 17:34 +0100, Tvrtko Ursulin escribió: From: Tvrtko Ursulin Removing the intermediate buffer removes the last use of the V3D_MAX_COUNTERS define, which will enable further driver cleanup. While at it pull the 32 vs 64 bit copying decision outside the loop in order to reduce the number of conditional instructions. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_sched.c | 60 --- -- 1 file changed, 37 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index fc8730264386..77f795e38fad 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job) v3d_put_bo_vaddr(bo); } +static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value) +{ + dst[idx] = value; +} + +static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value) +{ + dst[idx] = value; +} + static void -write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value) +write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value) { - if (do_64bit) { - u64 *dst64 = (u64 *)dst; - - dst64[idx] = value; - } else { - u32 *dst32 = (u32 *)dst; - - dst32[idx] = (u32)value; - } + if (do_64bit) + write_to_buffer_64(dst, idx, value); + else + write_to_buffer_32(dst, idx, value); } static void @@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job) } static void -v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query) +v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, + unsigned int query) { - struct v3d_performance_query_info *performance_query = performance_query; - struct v3d_copy_query_results_info *copy = >copy; + struct v3d_performance_query_info *performance_query = + performance_query; struct v3d_file_priv *v3d_priv = job->base.file- driver_priv; struct v3d_dev *v3d = job->base.v3d; - struct v3d_perfmon 
*perfmon; - u64 counter_values[V3D_MAX_COUNTERS]; + unsigned int i, j, offset; - for (int i = 0; i < performance_query->nperfmons; i++) { - perfmon = v3d_perfmon_find(v3d_priv, - performance_query- queries[query].kperfmon_ids[i]); + for (i = 0, offset = 0; + i < performance_query->nperfmons; + i++, offset += DRM_V3D_MAX_PERF_COUNTERS) { + struct v3d_performance_query *q = + _query->queries[query]; Looks like we could move this before the loop. Indeed! I will change it and re-send, either for v4 of the series, or single update if there will not be any other changes required. Otherwise this patch is: Reviewed-by: Iago Toral Quiroga Thanks! Regards, Tvrtko + struct v3d_perfmon *perfmon; + + perfmon = v3d_perfmon_find(v3d_priv, q- kperfmon_ids[i]); if (!perfmon) { DRM_DEBUG("Failed to find perfmon."); continue; @@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 quer v3d_perfmon_stop(v3d, perfmon, true); - memcpy(_values[i * DRM_V3D_MAX_PERF_COUNTERS], perfmon->values, - perfmon->ncounters * sizeof(u64)); + if (job->copy.do_64bit) { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_64(data, offset + j, + perfmon- values[j]); + } else { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_32(data, offset + j, + perfmon- values[j]); + } v3d_perfmon_put(perfmon); } - - for (int i = 0; i < performance_query->ncounters; i++) - write_to_buffer(data, i, copy->do_64bit, counter_values[i]); } static void
Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8
On 11/07/2024 06:12, Nitin Gote wrote: We're seeing a GPU HANG issue on a CHV platform, which was caused by bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8"). Gen8 platforms only support timeslicing and do not have a preemption mechanism, as their engines lack a preemption timer and do not send an irq when the preemption timeout expires. So, add a fix to not consider preemption during dequeuing for gen8 platforms. Also move can_preempt() above need_preempt() to resolve the implicit declaration of function ‘can_preempt' error, and make the can_preempt() parameter const to resolve the error: passing argument 1 of ‘can_preempt' discards ‘const' qualifier from the pointer target type. v2: Simplify can_preempt() function (Tvrtko Ursulin) Yeah, sorry for that yesterday when I thought the gen8 emit bb was dead code; somehow I thought there was a gen9 emit_bb flavour. Looks like I confused it with something else. Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396 Suggested-by: Andi Shyti Signed-off-by: Nitin Gote Cc: Chris Wilson CC: # v5.2+ --- .../drm/i915/gt/intel_execlists_submission.c| 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 21829439e686..59885d7721e4 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -294,11 +294,19 @@ static int virtual_prio(const struct intel_engine_execlists *el) return rb ?
rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; } +static bool can_preempt(const struct intel_engine_cs *engine) +{ + return GRAPHICS_VER(engine->i915) > 8; +} + static bool need_preempt(const struct intel_engine_cs *engine, const struct i915_request *rq) { int last_prio; + if (!can_preempt(engine)) + return false; + if (!intel_engine_has_semaphores(engine)) Patch looks clean now. Hmmm one new observation is whether the "has semaphores" check is now redundant? Looks preemption depends on semaphore support in logical_ring_default_vfuncs(). Regards, Tvrtko return false; @@ -3313,15 +3321,6 @@ static void remove_from_engine(struct i915_request *rq) i915_request_notify_execute_cb_imm(rq); } -static bool can_preempt(struct intel_engine_cs *engine) -{ - if (GRAPHICS_VER(engine->i915) > 8) - return true; - - /* GPGPU on bdw requires extra w/a; not implemented */ - return engine->class != RENDER_CLASS; -} - static void kick_execlists(const struct i915_request *rq, int prio) { struct intel_engine_cs *engine = rq->engine;
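The shape of the fix is an early hardware-version gate at the top of the preemption decision. A simplified user-space sketch (the `struct engine` fields are stand-ins for the i915 structures, and the priority comparison is reduced to its essence):

```c
#include <assert.h>

struct engine {
	int graphics_ver;	/* stand-in for GRAPHICS_VER(engine->i915) */
	int has_semaphores;	/* stand-in for intel_engine_has_semaphores() */
};

/* Preemption is only supported on Gen9+; Gen8 has no preemption timer
 * and no timeout irq, so it must never be considered there. */
static int can_preempt(const struct engine *engine)
{
	return engine->graphics_ver > 8;
}

static int need_preempt(const struct engine *engine,
			int queue_prio, int active_prio)
{
	if (!can_preempt(engine))
		return 0;

	if (!engine->has_semaphores)
		return 0;

	return queue_prio > active_prio;
}
```

Note Tvrtko's follow-up question in the review: since preemption support already depends on semaphore support in logical_ring_default_vfuncs(), the `has_semaphores` check may now be redundant; the sketch keeps it to match the patch as posted.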
[PATCH 10/11] drm/v3d: Prefer get_user for scalar types
From: Tvrtko Ursulin It makes it just a tiny bit more obvious what is going on. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index d1060e60aafa..b282d12571b5 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -485,14 +485,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, for (i = 0; i < timestamp.count; i++) { u32 offset, sync; - if (copy_from_user(, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { err = -EFAULT; goto error; } job->timestamp_query.queries[i].offset = offset; - if (copy_from_user(, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } @@ -550,7 +550,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->timestamp_query.queries[i].offset = reset.offset + 8 * i; - if (copy_from_user(, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } @@ -611,14 +611,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, for (i = 0; i < copy.count; i++) { u32 offset, sync; - if (copy_from_user(, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { err = -EFAULT; goto error; } job->timestamp_query.queries[i].offset = offset; - if (copy_from_user(, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } -- 2.44.0
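The readability argument for get_user() over copy_from_user() is that the copy size is taken from the pointer's type rather than spelled out at every call site. A user-space sketch of the idiom, using a hypothetical `get_scalar()` macro (real get_user() also performs access checking and can return -EFAULT, which cannot be simulated here, so this version always succeeds; the statement-expression syntax is GNU C, as used in the kernel):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical user-space analogue of get_user(): sizeof(*(src_ptr))
 * is derived from the pointer, so a mismatched explicit size cannot
 * creep in as it can with copy_from_user(&x, p, sizeof(x)). */
#define get_scalar(dst, src_ptr) ({				\
	memcpy(&(dst), (src_ptr), sizeof(*(src_ptr)));		\
	0;	/* no fault path in this sketch */		\
})

/* Mirrors the per-query loop body: fetch one offset and one sync
 * handle, bailing out on a (here unreachable) fault. */
static int read_pair(const uint32_t *offsets, const uint32_t *syncs,
		     uint32_t *offset, uint32_t *sync)
{
	if (get_scalar(*offset, offsets))
		return -1;
	if (get_scalar(*sync, syncs))
		return -1;
	return 0;
}
```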
[PATCH 11/11] drm/v3d: Add some local variables in queries/extensions
From: Tvrtko Ursulin Add some local variables to make the code a bit less verbose, with the main benefit being pulling some lines to under 80 columns wide. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 88 ++-- 1 file changed, 49 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index b282d12571b5..d607aa9c4ec2 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + struct v3d_timestamp_query_info *query_info = >timestamp_query; unsigned int i; int err; @@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(timestamp.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + query_info->queries = kvmalloc_array(timestamp.count, +sizeof(struct v3d_timestamp_query), +GFP_KERNEL); + if (!query_info->queries) return -ENOMEM; offsets = u64_to_user_ptr(timestamp.offsets); @@ -490,20 +491,21 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, goto error; } - job->timestamp_query.queries[i].offset = offset; + query_info->queries[i].offset = offset; if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->timestamp_query.queries[i].syncobj) { + query_info->queries[i].syncobj = drm_syncobj_find(file_priv, + sync); + if (!query_info->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = timestamp.count; + query_info->count = timestamp.count; return 0; @@ -519,6 +521,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct 
drm_v3d_reset_timestamp_query reset; + struct v3d_timestamp_query_info *query_info = >timestamp_query; unsigned int i; int err; @@ -537,10 +540,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(reset.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + query_info->queries = kvmalloc_array(reset.count, +sizeof(struct v3d_timestamp_query), +GFP_KERNEL); + if (!query_info->queries) return -ENOMEM; syncs = u64_to_user_ptr(reset.syncs); @@ -548,20 +551,21 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, for (i = 0; i < reset.count; i++) { u32 sync; - job->timestamp_query.queries[i].offset = reset.offset + 8 * i; + query_info->queries[i].offset = reset.offset + 8 * i; if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->timestamp_query.queries[i].syncobj) { + query_info->queries[i].syncobj = drm_syncobj_find(file_priv, + sync); + if (!query_info->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = reset.count; + query_info->count = reset.count; return 0; @@ -578,6 +582,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_copy_timestamp_query copy; + struct v3d_timestamp_query_info *query_info = >timestamp_query; unsigned int i; int err;
[PATCH 06/11] drm/v3d: Move part of copying of reset/copy performance extension to a helper
From: Tvrtko Ursulin The loop which looks up the syncobj and copies the kperfmon ids is identical so lets move it to a helper. The only change is replacing copy_from_user with get_user when copying a scalar. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 152 ++- 1 file changed, 68 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 3838ebade45d..ce56e31a027d 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -644,15 +644,64 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, return err; } +static int +v3d_copy_query_info(struct v3d_performance_query_info *query_info, + unsigned int count, + unsigned int nperfmons, + u32 __user *syncs, + u64 __user *kperfmon_ids, + struct drm_file *file_priv) +{ + unsigned int i, j; + int err; + + for (i = 0; i < count; i++) { + struct v3d_performance_query *query = _info->queries[i]; + u32 __user *ids_pointer; + u32 sync, id; + u64 ids; + + if (get_user(sync, syncs++)) { + err = -EFAULT; + goto error; + } + + if (get_user(ids, kperfmon_ids++)) { + err = -EFAULT; + goto error; + } + + ids_pointer = u64_to_user_ptr(ids); + + for (j = 0; j < nperfmons; j++) { + if (get_user(id, ids_pointer++)) { + err = -EFAULT; + goto error; + } + + query->kperfmon_ids[j] = id; + } + + query->syncobj = drm_syncobj_find(file_priv, sync); + if (!query->syncobj) { + err = -ENOENT; + goto error; + } + } + + return 0; + +error: + v3d_performance_query_info_free(query_info, i); + return err; +} + static int v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs; - u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; - unsigned int i, j; int err; if (!job) { @@ -679,50 +728,19 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (!job->performance_query.queries) return -ENOMEM; - syncs = 
u64_to_user_ptr(reset.syncs); - kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids); + err = v3d_copy_query_info(>performance_query, + reset.count, + reset.nperfmons, + u64_to_user_ptr(reset.syncs), + u64_to_user_ptr(reset.kperfmon_ids), + file_priv); + if (err) + return err; - for (i = 0; i < reset.count; i++) { - u32 sync; - u64 ids; - u32 __user *ids_pointer; - u32 id; - - if (copy_from_user(, syncs++, sizeof(sync))) { - err = -EFAULT; - goto error; - } - - if (copy_from_user(, kperfmon_ids++, sizeof(ids))) { - err = -EFAULT; - goto error; - } - - ids_pointer = u64_to_user_ptr(ids); - - for (j = 0; j < reset.nperfmons; j++) { - if (copy_from_user(, ids_pointer++, sizeof(id))) { - err = -EFAULT; - goto error; - } - - job->performance_query.queries[i].kperfmon_ids[j] = id; - } - - job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->performance_query.queries[i].syncobj) { - err = -ENOENT; - goto error; - } - } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; return 0; - -error: - v3d_performance_query_info_free(>performance_query, i); - return err; } static int @@ -730,10 +748,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs;
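The point of the patch above is that the reset and copy extension parsers shared an identical copy loop, so it moves into one helper both can call. A simplified in-memory sketch of that shared loop (user pointers, get_user() and the error unwind are elided; the fixed `NPERFMONS`/`COUNT` sizes are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

#define NPERFMONS 2
#define COUNT 3

struct query {
	uint32_t sync;
	uint32_t kperfmon_ids[NPERFMONS];
};

/* One helper serves both callers: each hands in its own sync and
 * kperfmon-id arrays, so the nested copy loop exists only once. */
static int copy_query_info(struct query *queries, unsigned int count,
			   unsigned int nperfmons,
			   const uint32_t *syncs, const uint32_t *ids)
{
	unsigned int i, j;

	for (i = 0; i < count; i++) {
		queries[i].sync = *syncs++;

		for (j = 0; j < nperfmons; j++)
			queries[i].kperfmon_ids[j] = *ids++;
	}

	return 0;
}
```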
[PATCH 09/11] drm/v3d: Move perfmon init completely into own unit
From: Tvrtko Ursulin Now that the build time dependencies on various array sizes have been removed, we can move the perfmon init completely into its own compilation unit and remove the hardcoded defines. This improves on the temporary fix quickly delivered in 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning"). Signed-off-by: Tvrtko Ursulin References: 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning") --- drivers/gpu/drm/v3d/v3d_drv.c | 9 +--- drivers/gpu/drm/v3d/v3d_drv.h | 6 +-- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++ .../gpu/drm/v3d/v3d_performance_counters.h| 16 --- 4 files changed, 40 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index a47f00b443d3..491c638a4d74 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data, args->value = 1; return 0; case DRM_V3D_PARAM_MAX_PERF_COUNTERS: - args->value = v3d->max_counters; + args->value = v3d->perfmon_info.max_counters; return 0; default: DRM_DEBUG("Unknown parameter %d\n", args->param); @@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES); WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ - if (v3d->ver >= 71) - v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS; - else if (v3d->ver >= 42) - v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS; - else - v3d->max_counters = 0; + v3d_perfmon_init(v3d); v3d->reset = devm_reset_control_get_exclusive(dev, NULL); if (IS_ERR(v3d->reset)) { diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index b1dfec49ba7d..8524761bc62d 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -104,10 +104,7 @@ struct v3d_dev { int ver; bool single_irq_line; - /* Different revisions of V3D have different total number of performance -* counters -*/ - unsigned int 
max_counters; + struct v3d_perfmon_info perfmon_info; void __iomem *hub_regs; void __iomem *core_regs[3]; @@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); /* v3d_perfmon.c */ +void v3d_perfmon_init(struct v3d_dev *v3d); void v3d_perfmon_get(struct v3d_perfmon *perfmon); void v3d_perfmon_put(struct v3d_perfmon *perfmon); void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon); diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index b7d0b02e1a95..cd7f1eedf17f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = { {"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any other reason (vary/W/Z)"}, }; +void v3d_perfmon_init(struct v3d_dev *v3d) +{ + const struct v3d_perf_counter_desc *counters = NULL; + unsigned int max = 0; + + if (v3d->ver >= 71) { + counters = v3d_v71_performance_counters; + max = ARRAY_SIZE(v3d_v71_performance_counters); + } else if (v3d->ver >= 42) { + counters = v3d_v42_performance_counters; + max = ARRAY_SIZE(v3d_v42_performance_counters); + } + + v3d->perfmon_info.max_counters = max; + v3d->perfmon_info.counters = counters; +} + void v3d_perfmon_get(struct v3d_perfmon *perfmon) { if (perfmon) @@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, /* Make sure all counters are valid. 
*/ for (i = 0; i < req->ncounters; i++) { - if (req->counters[i] >= v3d->max_counters) + if (req->counters[i] >= v3d->perfmon_info.max_counters) return -EINVAL; } @@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, return -EINVAL; } - /* Make sure that the counter ID is valid */ - if (req->counter >= v3d->max_counters) - return -EINVAL; - - BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) != -V3D_V42_NUM_PERFCOUNTERS); - BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) != -V3D_V71_NUM_PERFCOUNTERS); - BUILD_BUG_ON(V3D_MAX_COUNTERS < V3D_V42_NUM_PERFCOUNTERS); - BUILD_BUG_ON(V3D_MAX_COUNTER
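The init function introduced by the patch picks the counter table for the detected HW version and records its size, replacing the hardcoded `V3D_V42_NUM_PERFCOUNTERS`/`V3D_V71_NUM_PERFCOUNTERS` defines. A sketch of that selection, with made-up two- and three-entry tables standing in for the real V3D counter lists:

```c
#include <assert.h>
#include <stddef.h>

struct counter_desc { const char *name; };

/* Illustrative tables only; the real lists are much longer. */
static const struct counter_desc v42_counters[] = {
	{ "core-cycles" }, { "qpu-cycles" },
};

static const struct counter_desc v71_counters[] = {
	{ "core-cycles" }, { "qpu-cycles" }, { "qpu-stalls" },
};

struct perfmon_info {
	const struct counter_desc *counters;
	unsigned int max_counters;
};

/* Mirrors v3d_perfmon_init(): select the table by version and take the
 * count from the array itself (ARRAY_SIZE() in the kernel), so no
 * shared header needs to duplicate the numbers. */
static void perfmon_init(struct perfmon_info *info, int ver)
{
	const struct counter_desc *counters = NULL;
	unsigned int max = 0;

	if (ver >= 71) {
		counters = v71_counters;
		max = sizeof(v71_counters) / sizeof(v71_counters[0]);
	} else if (ver >= 42) {
		counters = v42_counters;
		max = sizeof(v42_counters) / sizeof(v42_counters[0]);
	}

	info->counters = counters;
	info->max_counters = max;
}
```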
[PATCH 08/11] drm/v3d: Do not use intermediate storage when copying performance query results
From: Tvrtko Ursulin Removing the intermediate buffer removes the last use of the V3D_MAX_COUNTERS define, which will enable further driver cleanup. While at it pull the 32 vs 64 bit copying decision outside the loop in order to reduce the number of conditional instructions. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_sched.c | 60 - 1 file changed, 37 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 7b2195ba4248..2564467735fc 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job) v3d_put_bo_vaddr(bo); } +static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value) +{ + dst[idx] = value; +} + +static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value) +{ + dst[idx] = value; +} + static void -write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value) +write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value) { - if (do_64bit) { - u64 *dst64 = (u64 *)dst; - - dst64[idx] = value; - } else { - u32 *dst32 = (u32 *)dst; - - dst32[idx] = (u32)value; - } + if (do_64bit) + write_to_buffer_64(dst, idx, value); + else + write_to_buffer_32(dst, idx, value); } static void @@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job) } static void -v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query) +v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, + unsigned int query) { - struct v3d_performance_query_info *performance_query = >performance_query; - struct v3d_copy_query_results_info *copy = >copy; + struct v3d_performance_query_info *performance_query = + >performance_query; struct v3d_file_priv *v3d_priv = job->base.file->driver_priv; struct v3d_dev *v3d = job->base.v3d; - struct v3d_perfmon *perfmon; - u64 counter_values[V3D_MAX_COUNTERS]; + unsigned int i, j, offset; - for (int i = 0; i < 
performance_query->nperfmons; i++) { - perfmon = v3d_perfmon_find(v3d_priv, - performance_query->queries[query].kperfmon_ids[i]); + for (i = 0, offset = 0; + i < performance_query->nperfmons; + i++, offset += DRM_V3D_MAX_PERF_COUNTERS) { + struct v3d_performance_query *q = + &performance_query->queries[query]; + struct v3d_perfmon *perfmon; + + perfmon = v3d_perfmon_find(v3d_priv, q->kperfmon_ids[i]); if (!perfmon) { DRM_DEBUG("Failed to find perfmon."); continue; @@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 quer v3d_perfmon_stop(v3d, perfmon, true); - memcpy(&counter_values[i * DRM_V3D_MAX_PERF_COUNTERS], perfmon->values, - perfmon->ncounters * sizeof(u64)); + if (job->copy.do_64bit) { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_64(data, offset + j, + perfmon->values[j]); + } else { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_32(data, offset + j, + perfmon->values[j]); + } v3d_perfmon_put(perfmon); } - - for (int i = 0; i < performance_query->ncounters; i++) - write_to_buffer(data, i, copy->do_64bit, counter_values[i]); } static void -- 2.44.0
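To illustrate the restructuring outside the driver, here is a minimal userspace sketch of the same copy-out scheme. The `write_to_buffer_32/64` names mirror the helpers the patch introduces, but `copy_counters` and everything else here are illustrative stand-ins, not v3d code: the element width is decided once, before the loop, instead of testing `do_64bit` per counter value.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins mirroring the patch's typed helpers. */
static void write_to_buffer_32(uint32_t *dst, unsigned int idx, uint32_t value)
{
	dst[idx] = value;
}

static void write_to_buffer_64(uint64_t *dst, unsigned int idx, uint64_t value)
{
	dst[idx] = value;
}

/* Decide the element width once, outside the loop, then copy n values.
 * The 32-bit path truncates, matching the original (u32)value cast. */
static void copy_counters(void *dst, const uint64_t *values, size_t n,
			  int do_64bit)
{
	size_t i;

	if (do_64bit) {
		for (i = 0; i < n; i++)
			write_to_buffer_64(dst, i, values[i]);
	} else {
		for (i = 0; i < n; i++)
			write_to_buffer_32(dst, i, (uint32_t)values[i]);
	}
}
```

The hoisted branch is the whole point of the change: one conditional per query rather than one per counter.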
[PATCH 07/11] drm/v3d: Size the kperfmon_ids array at runtime
From: Tvrtko Ursulin Instead of statically reserving pessimistic space for the kperfmon_ids array, make the userspace extension code allocate the exactly required amount of space. Apart from saving some memory at runtime, this also removes the need for the V3D_MAX_PERFMONS macro whose removal will benefit further driver cleanup. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_drv.h| 6 +- drivers/gpu/drm/v3d/v3d_sched.c | 4 +++- drivers/gpu/drm/v3d/v3d_submit.c | 17 +++-- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index dd3ead4cb8bd..b1dfec49ba7d 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,13 +351,9 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; -/* Number of perfmons required to handle all supported performance counters */ -#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ - DRM_V3D_MAX_PERF_COUNTERS) - struct v3d_performance_query { /* Performance monitor IDs for this query */ - u32 kperfmon_ids[V3D_MAX_PERFMONS]; + u32 *kperfmon_ids; /* Syncobj that indicates the query availability */ struct drm_syncobj *syncobj; diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 5fbbee47c6b7..7b2195ba4248 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -94,8 +94,10 @@ v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, if (query_info->queries) { unsigned int i; - for (i = 0; i < count; i++) + for (i = 0; i < count; i++) { drm_syncobj_put(query_info->queries[i].syncobj); + kvfree(query_info->queries[i].kperfmon_ids); + } kvfree(query_info->queries); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index ce56e31a027d..d1060e60aafa 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -671,10 +671,20 @@ v3d_copy_query_info(struct v3d_performance_query_info 
*query_info, goto error; } + query->kperfmon_ids = + kvmalloc_array(nperfmons, + sizeof(struct v3d_performance_query *), + GFP_KERNEL); + if (!query->kperfmon_ids) { + err = -ENOMEM; + goto error; + } + ids_pointer = u64_to_user_ptr(ids); for (j = 0; j < nperfmons; j++) { if (get_user(id, ids_pointer++)) { + kvfree(query->kperfmon_ids); err = -EFAULT; goto error; } @@ -684,6 +694,7 @@ v3d_copy_query_info(struct v3d_performance_query_info *query_info, query->syncobj = drm_syncobj_find(file_priv, sync); if (!query->syncobj) { + kvfree(query->kperfmon_ids); err = -ENOENT; goto error; } @@ -717,9 +728,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(&reset, ext, sizeof(reset))) return -EFAULT; - if (reset.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -767,9 +775,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; - if (copy.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count, -- 2.44.0
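The ownership scheme this patch sets up — each query owning a runtime-sized id array, with one free helper that releases only the fully-initialized entries — can be sketched in plain userspace C. All names here (`query`, `query_info`, the init/free pair) are illustrative stand-ins for the driver's types, not v3d code:

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

/* Each query owns a runtime-sized id array instead of a fixed
 * kperfmon_ids[V3D_MAX_PERFMONS]. */
struct query {
	uint32_t *kperfmon_ids;
};

struct query_info {
	struct query *queries;
};

/* Release 'count' fully-initialized entries plus the array itself,
 * so partial-failure unwind and normal teardown share one path. */
static void query_info_free(struct query_info *info, size_t count)
{
	if (info->queries) {
		for (size_t i = 0; i < count; i++)
			free(info->queries[i].kperfmon_ids);
		free(info->queries);
		info->queries = NULL;
	}
}

static int query_info_init(struct query_info *info, size_t count,
			   size_t nperfmons)
{
	info->queries = calloc(count, sizeof(*info->queries));
	if (!info->queries)
		return -1;

	for (size_t i = 0; i < count; i++) {
		info->queries[i].kperfmon_ids =
			calloc(nperfmons, sizeof(uint32_t));
		if (!info->queries[i].kperfmon_ids) {
			/* Unwind only the i entries initialized so far. */
			query_info_free(info, i);
			return -1;
		}
	}
	return 0;
}
```

Passing the number of successfully initialized entries into the free helper is what lets mid-loop failures reuse the common cleanup without double-freeing.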
[PATCH v3 00/11] v3d: Perfmon cleanup
From: Tvrtko Ursulin When we had to quickly deal with a tree build issue via merging 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we promised to follow up with a nicer solution. As in the process of eliminating the hardcoded defines we have discovered a few issues in handling of corner cases and userspace input validation, the fix has turned into a larger series, but hopefully the end result is a justifiable cleanup. v2: * Re-order the patches so fixes come first while last three are optional cleanups. v3: * Fixed a bunch of rebase errors I made when re-ordering patches from v1 to v2. * Dropped the double underscore from __v3d_timestamp_query_info_free. * Added v3d prefix to v3d_copy_query_info. * Renamed qinfo to query_info. * Fixed some spelling errors and bad patch references. * Added a mention of get_user to one commit message. * Dropped one patch from the series which became redundant due to other re-ordering. * Re-ordered last two patches with the view of dropping the last. 
Cc: Maíra Canal Tvrtko Ursulin (11): drm/v3d: Prevent out of bounds access in performance query extensions drm/v3d: Fix potential memory leak in the timestamp extension drm/v3d: Fix potential memory leak in the performance extension drm/v3d: Validate passed in drm syncobj handles in the timestamp extension drm/v3d: Validate passed in drm syncobj handles in the performance extension drm/v3d: Move part of copying of reset/copy performance extension to a helper drm/v3d: Size the kperfmon_ids array at runtime drm/v3d: Do not use intermediate storage when copying performance query results drm/v3d: Move perfmon init completely into own unit drm/v3d: Prefer get_user for scalar types drm/v3d: Add some local variables in queries/extensions drivers/gpu/drm/v3d/v3d_drv.c | 9 +- drivers/gpu/drm/v3d/v3d_drv.h | 16 +- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +-- .../gpu/drm/v3d/v3d_performance_counters.h| 16 +- drivers/gpu/drm/v3d/v3d_sched.c | 106 --- drivers/gpu/drm/v3d/v3d_submit.c | 294 +++--- 6 files changed, 290 insertions(+), 195 deletions(-) -- 2.44.0
[PATCH 05/11] drm/v3d: Validate passed in drm syncobj handles in the performance extension
From: Tvrtko Ursulin If userspace provides an unknown or invalid handle anywhere in the handle array the rest of the driver will not handle that well. Fix it by checking handle was looked up successfully or otherwise fail the extension by jumping into the existing unwind. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_submit.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index e3a00c8394a5..3838ebade45d 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -710,6 +710,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->performance_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; @@ -790,6 +794,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->performance_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->performance_query.count = copy.count; job->performance_query.nperfmons = copy.nperfmons; -- 2.44.0
[PATCH 03/11] drm/v3d: Fix potential memory leak in the performance extension
From: Tvrtko Ursulin If fetching of userspace memory fails during the main loop, all drm sync objs looked up until that point will be leaked because of the missing drm_syncobj_put. Fix it by exporting and using a common cleanup helper. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_drv.h | 2 ++ drivers/gpu/drm/v3d/v3d_sched.c | 22 ++ drivers/gpu/drm/v3d/v3d_submit.c | 50 3 files changed, 49 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index e208ffdfba32..dd3ead4cb8bd 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, unsigned int count); +void v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, + unsigned int count); void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 59dc0287dab9..5fbbee47c6b7 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -87,20 +87,30 @@ v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, } } +void +v3d_performance_query_info_free(struct v3d_performance_query_info *query_info, + unsigned int count) +{ + if (query_info->queries) { + unsigned int i; + + for (i = 0; i < count; i++) + drm_syncobj_put(query_info->queries[i].syncobj); + + kvfree(query_info->queries); + } +} + static void v3d_cpu_job_free(struct drm_sched_job *sched_job) { struct v3d_cpu_job *job = to_cpu_job(sched_job); - struct v3d_performance_query_info *performance_query = &job->performance_query; 
v3d_timestamp_query_info_free(&job->timestamp_query, job->timestamp_query.count); - if (performance_query->queries) { - for (int i = 0; i < performance_query->count; i++) - drm_syncobj_put(performance_query->queries[i].syncobj); - kvfree(performance_query->queries); - } + v3d_performance_query_info_free(&job->performance_query, + job->performance_query.count); v3d_job_cleanup(&job->base); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 121bf1314b80..d626c8539b04 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -640,6 +640,8 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, u32 __user *syncs; u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; + unsigned int i, j; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -668,39 +670,43 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, syncs = u64_to_user_ptr(reset.syncs); kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids); - for (int i = 0; i < reset.count; i++) { + for (i = 0; i < reset.count; i++) { u32 sync; u64 ids; u32 __user *ids_pointer; u32 id; if (copy_from_user(&sync, syncs++, sizeof(sync))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } - job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } ids_pointer = u64_to_user_ptr(ids); - for (int j = 0; j < reset.nperfmons; j++) { + for (j = 0; j < reset.nperfmons; j++) { if (copy_from_user(&id, ids_pointer++, sizeof(id))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->
[PATCH 02/11] drm/v3d: Fix potential memory leak in the timestamp extension
From: Tvrtko Ursulin If fetching of userspace memory fails during the main loop, all drm sync objs looked up until that point will be leaked because of the missing drm_syncobj_put. Fix it by exporting and using a common cleanup helper. Signed-off-by: Tvrtko Ursulin Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_drv.h | 2 ++ drivers/gpu/drm/v3d/v3d_sched.c | 22 +++- drivers/gpu/drm/v3d/v3d_submit.c | 43 ++-- 3 files changed, 48 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 099b962bdfde..e208ffdfba32 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo); void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ +void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, + unsigned int count); void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 03df37a3acf5..59dc0287dab9 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job) v3d_job_cleanup(job); } +void +v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info, + unsigned int count) +{ + if (query_info->queries) { + unsigned int i; + + for (i = 0; i < count; i++) + drm_syncobj_put(query_info->queries[i].syncobj); + + kvfree(query_info->queries); + } +} + static void v3d_cpu_job_free(struct drm_sched_job *sched_job) { struct v3d_cpu_job *job = to_cpu_job(sched_job); - struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query; struct v3d_performance_query_info *performance_query = &job->performance_query; - if 
(timestamp_query->queries) { - for (int i = 0; i < timestamp_query->count; i++) - drm_syncobj_put(timestamp_query->queries[i].syncobj); - kvfree(timestamp_query->queries); - } + v3d_timestamp_query_info_free(&job->timestamp_query, + job->timestamp_query.count); if (performance_query->queries) { for (int i = 0; i < performance_query->count; i++) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 263fefc1d04f..121bf1314b80 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,8 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + unsigned int i; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -480,19 +482,19 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, offsets = u64_to_user_ptr(timestamp.offsets); syncs = u64_to_user_ptr(timestamp.syncs); - for (int i = 0; i < timestamp.count; i++) { + for (i = 0; i < timestamp.count; i++) { u32 offset, sync; if (copy_from_user(&offset, offsets++, sizeof(offset))) { - kvfree(job->timestamp_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->timestamp_query.queries[i].offset = offset; if (copy_from_user(&sync, syncs++, sizeof(sync))) { - kvfree(job->timestamp_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); @@ -500,6 +502,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->timestamp_query.count = timestamp.count; return 0; + +error: + v3d_timestamp_query_info_free(&job->timestamp_query, i); + return err; } static int @@ -509,6 +515,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; + unsigned int i; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -533,14 +541,14 @@ 
v3d_get_cpu_reset_timestamp_params(struct drm_file *f
[PATCH 04/11] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension
From: Tvrtko Ursulin If userspace provides an unknown or invalid handle anywhere in the handle array the rest of the driver will not handle that well. Fix it by checking handle was looked up successfully or otherwise fail the extension by jumping into the existing unwind. Signed-off-by: Tvrtko Ursulin Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_submit.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index d626c8539b04..e3a00c8394a5 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -498,6 +498,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = timestamp.count; @@ -552,6 +556,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = reset.count; @@ -616,6 +624,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = copy.count; -- 2.44.0
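The validate-and-unwind pattern these two fixes introduce can be shown in isolation. In this minimal sketch, `lookup()` and `release()` are hypothetical stand-ins for `drm_syncobj_find()` and `drm_syncobj_put()`, and the refcount array simulates object lifetimes; the point is that a failed lookup mid-array drops every reference taken before it:

```c
#include <errno.h>
#include <stddef.h>

#define NOBJ 8

static int refs[NOBJ];          /* reference counts of known objects */

/* drm_syncobj_find() stand-in: 1 on success, 0 for an unknown handle. */
static int lookup(int handle)
{
	if (handle < 0 || handle >= NOBJ)
		return 0;
	refs[handle]++;
	return 1;
}

/* drm_syncobj_put() stand-in. */
static void release(int handle)
{
	refs[handle]--;
}

/* Acquire every handle, or fail with -ENOENT having leaked nothing. */
static int acquire_all(const int *handles, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (!lookup(handles[i]))
			goto error;
	}
	return 0;

error:
	while (i--)              /* unwind the i lookups that succeeded */
		release(handles[i]);
	return -ENOENT;
}
```

This mirrors why the fixes jump into the existing unwind rather than returning directly: the references already taken in earlier loop iterations must be put back before the ioctl fails.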
[PATCH 01/11] drm/v3d: Prevent out of bounds access in performance query extensions
From: Tvrtko Ursulin Check that the number of perfmons userspace is passing in the copy and reset extensions is not greater than the internal kernel storage where the ids will be copied into. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ Reviewed-by: Iago Toral Quiroga Reviewed-by: Maíra Canal --- drivers/gpu/drm/v3d/v3d_submit.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 88f63d526b22..263fefc1d04f 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(&reset, ext, sizeof(reset))) return -EFAULT; + if (reset.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; + if (copy.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count, -- 2.44.0
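The fix in miniature: reject a userspace-supplied element count that exceeds the fixed destination before anything is copied. The struct layout and `MAX_PERFMONS` value below are illustrative stand-ins (the real bound is `V3D_MAX_PERFMONS`), not the driver's actual types:

```c
#include <errno.h>
#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_PERFMONS 4		/* stands in for V3D_MAX_PERFMONS */

struct perf_query {
	uint32_t kperfmon_ids[MAX_PERFMONS];
};

/* Validate the count against the destination capacity up front, the
 * same way the patch checks nperfmons before the copy loops run. */
static int copy_ids(struct perf_query *q, const uint32_t *src,
		    size_t nperfmons)
{
	if (nperfmons > MAX_PERFMONS)
		return -EINVAL;	/* would write past kperfmon_ids[] */

	memcpy(q->kperfmon_ids, src, nperfmons * sizeof(uint32_t));
	return 0;
}
```

Without the bound check, a hostile `nperfmons` makes the later per-perfmon copy write past the fixed array, which is exactly the out-of-bounds access the patch closes.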
Re: [PATCH 11/12] drm/v3d: Add some local variables in queries/extensions
On 10/07/2024 18:43, Maíra Canal wrote: On 7/10/24 10:41, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Add some local variables to make the code a bit less verbose, with the main benefit being pulling some lines to under 80 columns wide. Signed-off-by: Tvrtko Ursulin I'd prefer `query_info`, but anyway: Yeah it does look nicer - done throughout the series. I also bumped this patch to be last in the series since I don't "believe" in it that much any more. We probably should drop it. Regards, Tvrtko Reviewed-by: Maíra Canal Best Regards, - Maíra --- drivers/gpu/drm/v3d/v3d_submit.c | 79 +--- 1 file changed, 42 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 34ecd844f16a..b0c2a8e9cb06 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + struct v3d_timestamp_query_info *qinfo = &job->timestamp_query; unsigned int i; int err; @@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(timestamp.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + qinfo->queries = kvmalloc_array(timestamp.count, + sizeof(struct v3d_timestamp_query), + GFP_KERNEL); + if (!qinfo->queries) return -ENOMEM; offsets = u64_to_user_ptr(timestamp.offsets); @@ -490,20 +491,20 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, goto error; } - job->timestamp_query.queries[i].offset = offset; + qinfo->queries[i].offset = offset; if (copy_from_user(&sync, syncs++, sizeof(sync))) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync); + qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!qinfo->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = timestamp.count; + qinfo->count = timestamp.count; return 0; @@ -519,6 +520,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; + struct v3d_timestamp_query_info *qinfo = &job->timestamp_query; unsigned int i; int err; @@ -537,10 +539,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(reset.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries + qinfo->queries = kvmalloc_array(reset.count, + sizeof(struct v3d_timestamp_query), + GFP_KERNEL); + if (!qinfo->queries) return -ENOMEM; syncs = u64_to_user_ptr(reset.syncs); @@ -548,20 +550,20 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, for (i = 0; i < reset.count; i++) { u32 sync; - job->timestamp_query.queries[i].offset = reset.offset + 8 * i; + qinfo->queries[i].offset = reset.offset + 8 * i; if (copy_from_user(&sync, syncs++, sizeof(sync))) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->timestamp_query.queries[i].syncobj) { + qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!qinfo->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = reset.count; + qinfo->count = reset.count; return 0; @@ -578,6 +580,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_copy_timestamp_query copy; + struct v3d_timestamp_query_info *qinfo = &job->timestamp_query; unsigned int i; int err; @@ -599,10 +602,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY; - job->timestamp_query.queries = 
kvmalloc_array(copy.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!
Re: [PATCH 09/12] drm/v3d: Move perfmon init completely into own unit
On 10/07/2024 18:38, Maíra Canal wrote: On 7/10/24 10:41, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Now that the build time dependencies on various array sizes have been removed, we can move the perfmon init completely into its own compilation unit and remove the hardcoded defines. This improves on the temporary fix quickly delivered in 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"). I believe you mean: 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning") Currently, it references the current patch. Well that was a hilarious mistake, well spotted! Regards, Tvrtko Apart from this fix, this is Reviewed-by: Maíra Canal Best Regards, - Maíra Signed-off-by: Tvrtko Ursulin References: 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit") --- drivers/gpu/drm/v3d/v3d_drv.c | 9 +--- drivers/gpu/drm/v3d/v3d_drv.h | 6 +-- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++ .../gpu/drm/v3d/v3d_performance_counters.h| 16 --- 4 files changed, 40 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index a47f00b443d3..491c638a4d74 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data, args->value = 1; return 0; case DRM_V3D_PARAM_MAX_PERF_COUNTERS: - args->value = v3d->max_counters; + args->value = v3d->perfmon_info.max_counters; return 0; default: DRM_DEBUG("Unknown parameter %d\n", args->param); @@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES); WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ - if (v3d->ver >= 71) - v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS; - else if (v3d->ver >= 42) - v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS; - else - v3d->max_counters = 0; + v3d_perfmon_init(v3d); v3d->reset = devm_reset_control_get_exclusive(dev, NULL); if (IS_ERR(v3d->reset)) { diff
--git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 00fe5d993175..6d2d34cd135c 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -104,10 +104,7 @@ struct v3d_dev { int ver; bool single_irq_line; - /* Different revisions of V3D have different total number of performance - * counters - */ - unsigned int max_counters; + struct v3d_perfmon_info perfmon_info; void __iomem *hub_regs; void __iomem *core_regs[3]; @@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); /* v3d_perfmon.c */ +void v3d_perfmon_init(struct v3d_dev *v3d); void v3d_perfmon_get(struct v3d_perfmon *perfmon); void v3d_perfmon_put(struct v3d_perfmon *perfmon); void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon); diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index b7d0b02e1a95..cd7f1eedf17f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = { {"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any other reason (vary/W/Z)"}, }; +void v3d_perfmon_init(struct v3d_dev *v3d) +{ + const struct v3d_perf_counter_desc *counters = NULL; + unsigned int max = 0; + + if (v3d->ver >= 71) { + counters = v3d_v71_performance_counters; + max = ARRAY_SIZE(v3d_v71_performance_counters); + } else if (v3d->ver >= 42) { + counters = v3d_v42_performance_counters; + max = ARRAY_SIZE(v3d_v42_performance_counters); + } + + v3d->perfmon_info.max_counters = max; + v3d->perfmon_info.counters = counters; +} + void v3d_perfmon_get(struct v3d_perfmon *perfmon) { if (perfmon) @@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, /* Make sure all counters are valid. 
*/ for (i = 0; i < req->ncounters; i++) { - if (req->counters[i] >= v3d->max_counters) + if (req->counters[i] >= v3d->perfmon_info.max_counters) return -EINVAL; } @@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, return -EINVAL; } - /* Make sure that the counter ID is valid */ - if (req->counter >= v3d->max_counters) - return -EINVAL; - - BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) != - V3D_V42_NUM_PERFCOUNTERS); - BUILD_BUG_ON(ARRAY_SI
Re: [PATCH 04/12] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension
On 10/07/2024 18:06, Maíra Canal wrote: On 7/10/24 10:41, Tvrtko Ursulin wrote: From: Tvrtko Ursulin If userspace provides an unknown or invalid handle anywhere in the handle array the rest of the driver will not handle that well. Fix it by checking handle was looked up successfuly or otherwise fail the I believe you mean "Fix it by checking if the handle..." Also s/successfuly/successfully Oops, thank you! extension by jumping into the existing unwind. Signed-off-by: Tvrtko Ursulin Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_submit.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index ca1b1ad0a75c..3313423080e7 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -497,6 +497,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; I'm not sure if err should be -ENOENT or -EINVAL, but based on other drivers, I believe it should be -EINVAL. After a quick grep I am inclined to think ENOENT is correct. DRM core uses that, and drivers seem generally confused (split between ENOENT and EINVAL). With one even going for ENODEV! 
Regards, Tvrtko + goto error; + } } job->timestamp_query.count = timestamp.count; @@ -550,6 +554,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = reset.count; @@ -613,6 +621,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = copy.count;
Re: [PATCH 01/12] drm/v3d: Prevent out of bounds access in performance query extensions
On 10/07/2024 14:41, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Check that the number of perfmons userspace is passing in the copy and reset extensions is not greater than the internal kernel storage where the ids will be copied into. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ On this one I forgot to carry over from v1: Reviewed-by: Iago Toral Quiroga Regards, Tvrtko --- drivers/gpu/drm/v3d/v3d_submit.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 88f63d526b22..263fefc1d04f 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(&reset, ext, sizeof(reset))) return -EFAULT; + if (reset.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; + if (copy.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count,
Re: [PATCH 00/12] v3d: Perfmon cleanup
Hi Iago, On 10/07/2024 07:06, Iago Toral wrote: El mar, 09-07-2024 a las 17:34 +0100, Tvrtko Ursulin escribió: From: Tvrtko Ursulin When we had to quickly deal with a tree build issue via merging 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we promised to follow up with a nicer solution. As in the process of eliminating the hardcoded defines we have discovered a few issues in handling of corner cases and userspace input validation, the fix has turned into a larger series, but hopefully the end result is a justifiable cleanup. Thanks for going the extra mile with this :) Patches 1 and 5-8 are: Reviewed-by: Iago Toral Quiroga Thank you! Unfortunately I had to re-order the patches in the series so fixes come first, and as that caused a lot of churn in each patch I did not apply your r-b's when re-sending. Hmmm actually I should have for the first patch, that one is unchanged. I will fix that one. Regards, Tvrtko Tvrtko Ursulin (12): drm/v3d: Prevent out of bounds access in performance query extensions drm/v3d: Prefer get_user for scalar types drm/v3d: Add some local variables in queries/extensions drm/v3d: Align data types of internal and uapi counts drm/v3d: Fix potential memory leak in the timestamp extension drm/v3d: Fix potential memory leak in the performance extension drm/v3d: Validate passed in drm syncobj handles in the timestamp extension drm/v3d: Validate passed in drm syncobj handles in the performance extension drm/v3d: Move part of copying of reset/copy performance extension to a helper drm/v3d: Size the kperfmon_ids array at runtime drm/v3d: Do not use intermediate storage when copying performance query results drm/v3d: Move perfmon init completely into own unit drivers/gpu/drm/v3d/v3d_drv.c | 9 +- drivers/gpu/drm/v3d/v3d_drv.h | 16 +- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +-- .../gpu/drm/v3d/v3d_performance_counters.h | 16 +- drivers/gpu/drm/v3d/v3d_sched.c | 106 --- drivers/gpu/drm/v3d/v3d_submit.c | 285 ++-- -- 6 files 
changed, 281 insertions(+), 195 deletions(-)
[PATCH 09/12] drm/v3d: Move perfmon init completely into own unit
From: Tvrtko Ursulin Now that the build time dependencies on various array sizes have been removed, we can move the perfmon init completely into its own compilation unit and remove the hardcoded defines. This improves on the temporary fix quickly delivered in 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"). Signed-off-by: Tvrtko Ursulin References: 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit") --- drivers/gpu/drm/v3d/v3d_drv.c | 9 +--- drivers/gpu/drm/v3d/v3d_drv.h | 6 +-- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++ .../gpu/drm/v3d/v3d_performance_counters.h| 16 --- 4 files changed, 40 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index a47f00b443d3..491c638a4d74 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data, args->value = 1; return 0; case DRM_V3D_PARAM_MAX_PERF_COUNTERS: - args->value = v3d->max_counters; + args->value = v3d->perfmon_info.max_counters; return 0; default: DRM_DEBUG("Unknown parameter %d\n", args->param); @@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES); WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ - if (v3d->ver >= 71) - v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS; - else if (v3d->ver >= 42) - v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS; - else - v3d->max_counters = 0; + v3d_perfmon_init(v3d); v3d->reset = devm_reset_control_get_exclusive(dev, NULL); if (IS_ERR(v3d->reset)) { diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 00fe5d993175..6d2d34cd135c 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -104,10 +104,7 @@ struct v3d_dev { int ver; bool single_irq_line; - /* Different revisions of V3D have different total number of performance -* counters -*/ - 
unsigned int max_counters; + struct v3d_perfmon_info perfmon_info; void __iomem *hub_regs; void __iomem *core_regs[3]; @@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); /* v3d_perfmon.c */ +void v3d_perfmon_init(struct v3d_dev *v3d); void v3d_perfmon_get(struct v3d_perfmon *perfmon); void v3d_perfmon_put(struct v3d_perfmon *perfmon); void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon); diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index b7d0b02e1a95..cd7f1eedf17f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = { {"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any other reason (vary/W/Z)"}, }; +void v3d_perfmon_init(struct v3d_dev *v3d) +{ + const struct v3d_perf_counter_desc *counters = NULL; + unsigned int max = 0; + + if (v3d->ver >= 71) { + counters = v3d_v71_performance_counters; + max = ARRAY_SIZE(v3d_v71_performance_counters); + } else if (v3d->ver >= 42) { + counters = v3d_v42_performance_counters; + max = ARRAY_SIZE(v3d_v42_performance_counters); + } + + v3d->perfmon_info.max_counters = max; + v3d->perfmon_info.counters = counters; +} + void v3d_perfmon_get(struct v3d_perfmon *perfmon) { if (perfmon) @@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, /* Make sure all counters are valid. 
*/ for (i = 0; i < req->ncounters; i++) { - if (req->counters[i] >= v3d->max_counters) + if (req->counters[i] >= v3d->perfmon_info.max_counters) return -EINVAL; } @@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, return -EINVAL; } - /* Make sure that the counter ID is valid */ - if (req->counter >= v3d->max_counters) - return -EINVAL; - - BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) != -V3D_V42_NUM_PERFCOUNTERS); - BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) != -V3D_V71_NUM_PERFCOUNTERS); - BUILD_BUG_ON(V3D_MAX_COUNTERS < V3D_V42_NUM_PERFCOUNTERS); -
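The heart of the new v3d_perfmon_init() is picking a counter table by hardware version, with a "no counters" default for older parts. A minimal userspace C sketch of the same pattern (all names and table contents here are hypothetical stand-ins, not the driver's real tables):

```c
#include <stddef.h>

struct perf_counter_desc { const char *category, *name, *description; };
struct perfmon_info {
	const struct perf_counter_desc *counters;
	unsigned int max_counters;
};

/* Hypothetical stand-ins for the per-version tables in v3d_perfmon.c. */
static const struct perf_counter_desc v42_counters[] = {
	{ "CORE", "cycle-count", "[CORE] Cycle counter" },
};
static const struct perf_counter_desc v71_counters[] = {
	{ "CORE", "cycle-count", "[CORE] Cycle counter" },
	{ "QPU", "qpu-stalls", "[QPU] Stalled qcycles" },
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Mirrors the shape of v3d_perfmon_init(): select the table by version,
 * defaulting to zero counters on unknown/older hardware. */
void perfmon_init(struct perfmon_info *info, int ver)
{
	const struct perf_counter_desc *counters = NULL;
	unsigned int max = 0;

	if (ver >= 71) {
		counters = v71_counters;
		max = ARRAY_SIZE(v71_counters);
	} else if (ver >= 42) {
		counters = v42_counters;
		max = ARRAY_SIZE(v42_counters);
	}

	info->counters = counters;
	info->max_counters = max;
}
```

Because ARRAY_SIZE() is evaluated on the tables themselves, adding a counter to a table automatically updates max_counters, which is what lets the hardcoded V3D_V42/V71_NUM_PERFCOUNTERS defines go away.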
[PATCH 12/12] drm/v3d: Prefer get_user for scalar types
From: Tvrtko Ursulin It makes it just a tiny bit more obvious what is going on. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index b0c2a8e9cb06..9273b0aadb79 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -486,14 +486,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, for (i = 0; i < timestamp.count; i++) { u32 offset, sync; - if (copy_from_user(&offset, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { err = -EFAULT; goto error; } qinfo->queries[i].offset = offset; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } @@ -552,7 +552,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, qinfo->queries[i].offset = reset.offset + 8 * i; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } @@ -614,14 +614,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, for (i = 0; i < copy.count; i++) { u32 offset, sync; - if (copy_from_user(&offset, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { err = -EFAULT; goto error; } qinfo->queries[i].offset = offset; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { err = -EFAULT; goto error; } -- 2.44.0
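For readers less familiar with the kernel's user-copy helpers: get_user() reads a single scalar whose size is inferred from the destination variable, whereas copy_from_user() takes an explicit byte count, so get_user() states the intent more directly for scalars. A rough userspace analogue of the loop shape (fake_get_user_u32() is a made-up stand-in, not a kernel API):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for get_user(): read one scalar from "user" memory, failing
 * with -EFAULT when the source pointer is bad (modelled here as NULL). */
static int fake_get_user_u32(uint32_t *dst, const uint32_t *src)
{
	if (!src)
		return -EFAULT;
	memcpy(dst, src, sizeof(*dst));
	return 0;
}

/* Copy `count` offsets the way the rewritten loops do: one scalar at a
 * time, stopping at the first fault. */
int copy_offsets(uint32_t *out, const uint32_t *user, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		uint32_t offset;

		if (fake_get_user_u32(&offset, user ? user + i : NULL))
			return -EFAULT;
		out[i] = offset;
	}
	return 0;
}
```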
[PATCH 05/12] drm/v3d: Validate passed in drm syncobj handles in the performance extension
From: Tvrtko Ursulin If userspace provides an unknown or invalid handle anywhere in the handle array the rest of the driver will not handle that well. Fix it by checking that each handle was looked up successfully, otherwise failing the extension by jumping into the existing unwind. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_submit.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 3313423080e7..b51600e236c8 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -706,6 +706,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->performance_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; @@ -787,6 +791,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->performance_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->performance_query.count = copy.count; job->performance_query.nperfmons = copy.nperfmons; -- 2.44.0
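The fix works because the shared unwind label releases every reference taken before the failed lookup. A standalone C sketch of that acquire-or-unwind shape, with hypothetical syncobj_find()/syncobj_put() stand-ins in place of the DRM helpers:

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-ins for drm_syncobj_find()/drm_syncobj_put(); handle 0 is
 * treated as invalid, mirroring a failed lookup. */
struct syncobj { unsigned int handle; };

static int live_syncobjs; /* outstanding references, for the test */

static struct syncobj *syncobj_find(unsigned int handle)
{
	struct syncobj *s;

	if (!handle)
		return NULL;
	s = malloc(sizeof(*s));
	if (s) {
		s->handle = handle;
		live_syncobjs++;
	}
	return s;
}

static void syncobj_put(struct syncobj *s)
{
	free(s);
	live_syncobjs--;
}

/* Look up an array of handles; on the first failure, unwind everything
 * acquired so far -- the shape of the fix in this patch. */
int lookup_all(struct syncobj **out, const unsigned int *handles,
	       unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		out[i] = syncobj_find(handles[i]);
		if (!out[i])
			goto error;
	}
	return 0;

error:
	while (i--)
		syncobj_put(out[i]);
	return -ENOENT;
}
```

The important invariant is that `i` names exactly the number of successfully acquired references at the moment of failure, so `while (i--)` puts precisely those and nothing else.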
[PATCH 10/12] drm/v3d: Align data types of internal and uapi counts
From: Tvrtko Ursulin In the timestamp and performance extensions the userspace type for counts is u32, so let's use unsigned int in the kernel too. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 8dae3ab5f936..34ecd844f16a 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + unsigned int i; int err; if (!job) { @@ -481,7 +482,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, offsets = u64_to_user_ptr(timestamp.offsets); syncs = u64_to_user_ptr(timestamp.syncs); - for (int i = 0; i < timestamp.count; i++) { + for (i = 0; i < timestamp.count; i++) { u32 offset, sync; if (copy_from_user(&offset, offsets++, sizeof(offset))) { @@ -518,6 +519,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; + unsigned int i; int err; if (!job) { @@ -543,7 +545,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, syncs = u64_to_user_ptr(reset.syncs); - for (int i = 0; i < reset.count; i++) { + for (i = 0; i < reset.count; i++) { u32 sync; job->timestamp_query.queries[i].offset = reset.offset + 8 * i; @@ -576,7 +578,8 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_copy_timestamp_query copy; - int i, err; + unsigned int i; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); -- 2.44.0
[PATCH 08/12] drm/v3d: Do not use intermediate storage when copying performance query results
From: Tvrtko Ursulin Removing the intermediate buffer removes the last use of the V3D_MAX_COUNTERS define, which will enable further driver cleanup. While at it, pull the 32 vs 64 bit copying decision outside the loop in order to reduce the number of conditional instructions. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_sched.c | 60 +++++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 37 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index fc8730264386..77f795e38fad 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job) v3d_put_bo_vaddr(bo); } +static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value) +{ + dst[idx] = value; +} + +static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value) +{ + dst[idx] = value; +} + static void -write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value) +write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value) { - if (do_64bit) { - u64 *dst64 = (u64 *)dst; - - dst64[idx] = value; - } else { - u32 *dst32 = (u32 *)dst; - - dst32[idx] = (u32)value; - } + if (do_64bit) + write_to_buffer_64(dst, idx, value); + else + write_to_buffer_32(dst, idx, value); } static void @@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job) } static void -v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query) +v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, + unsigned int query) { - struct v3d_performance_query_info *performance_query = &job->performance_query; - struct v3d_copy_query_results_info *copy = &job->copy; + struct v3d_performance_query_info *performance_query = + &job->performance_query; struct v3d_file_priv *v3d_priv = job->base.file->driver_priv; struct v3d_dev *v3d = job->base.v3d; - struct v3d_perfmon *perfmon; - u64 counter_values[V3D_MAX_COUNTERS]; + unsigned int i, j, offset; - for (int i = 0; i < performance_query->nperfmons; i++) { - perfmon = v3d_perfmon_find(v3d_priv, - performance_query->queries[query].kperfmon_ids[i]); + for (i = 0, offset = 0; +i < performance_query->nperfmons; +i++, offset += DRM_V3D_MAX_PERF_COUNTERS) { + struct v3d_performance_query *q = + &performance_query->queries[query]; + struct v3d_perfmon *perfmon; + + perfmon = v3d_perfmon_find(v3d_priv, q->kperfmon_ids[i]); if (!perfmon) { DRM_DEBUG("Failed to find perfmon."); continue; } @@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query) v3d_perfmon_stop(v3d, perfmon, true); - memcpy(&counter_values[i * DRM_V3D_MAX_PERF_COUNTERS], perfmon->values, - perfmon->ncounters * sizeof(u64)); + if (job->copy.do_64bit) { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_64(data, offset + j, + perfmon->values[j]); + } else { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_32(data, offset + j, + perfmon->values[j]); + } v3d_perfmon_put(perfmon); } - - for (int i = 0; i < performance_query->ncounters; i++) - write_to_buffer(data, i, copy->do_64bit, counter_values[i]); } static void -- 2.44.0
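The split into _32/_64 helpers works because the element width is carried by the destination pointer type, so the cast from void * happens in one place and each callee indexes with the correct stride. A minimal sketch with fixed-width stdint types in place of the kernel's u32/u64:

```c
#include <stdbool.h>
#include <stdint.h>

static void write_to_buffer_32(uint32_t *dst, unsigned int idx, uint32_t value)
{
	dst[idx] = value;
}

static void write_to_buffer_64(uint64_t *dst, unsigned int idx, uint64_t value)
{
	dst[idx] = value;
}

/* Dispatch on element width once; the typed helpers then index with the
 * right stride. A 64-bit value is truncated when writing a 32-bit slot. */
void write_to_buffer(void *dst, unsigned int idx, bool do_64bit,
		     uint64_t value)
{
	if (do_64bit)
		write_to_buffer_64(dst, idx, value);
	else
		write_to_buffer_32(dst, idx, (uint32_t)value);
}
```

Hoisting the do_64bit test out of the per-counter loop, as the patch does, then follows naturally: the branch is per-query, not per-element.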
[PATCH 06/12] drm/v3d: Move part of copying of reset/copy performance extension to a helper
From: Tvrtko Ursulin The loop which looks up the syncobj and copies the kperfmon ids is identical so lets move it to a helper. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 148 +-- 1 file changed, 64 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index b51600e236c8..35682433f75b 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -641,13 +641,63 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, return err; } +static int +copy_query_info(struct v3d_performance_query_info *qinfo, + unsigned int count, + unsigned int nperfmons, + u32 __user *syncs, + u64 __user *kperfmon_ids, + struct drm_file *fpriv) +{ + unsigned int i, j; + int err; + + for (i = 0; i < count; i++) { + struct v3d_performance_query *query = >queries[i]; + u32 __user *ids_pointer; + u32 sync, id; + u64 ids; + + if (get_user(sync, syncs++)) { + err = -EFAULT; + goto error; + } + + if (get_user(ids, kperfmon_ids++)) { + err = -EFAULT; + goto error; + } + + ids_pointer = u64_to_user_ptr(ids); + + for (j = 0; j < nperfmons; j++) { + if (get_user(id, ids_pointer++)) { + err = -EFAULT; + goto error; + } + + query->kperfmon_ids[j] = id; + } + + query->syncobj = drm_syncobj_find(fpriv, sync); + if (!query->syncobj) { + err = -ENOENT; + goto error; + } + } + + return 0; + +error: + __v3d_performance_query_info_free(qinfo, i); + return err; +} + static int v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs; - u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; int err; @@ -675,50 +725,17 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (!job->performance_query.queries) return -ENOMEM; - syncs = u64_to_user_ptr(reset.syncs); - kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids); + err = copy_query_info(qinfo, reset.count, 
reset.nperfmons, + u64_to_user_ptr(reset.syncs), + u64_to_user_ptr(reset.kperfmon_ids), + file_priv); + if (err) + return err; - for (int i = 0; i < reset.count; i++) { - u32 sync; - u64 ids; - u32 __user *ids_pointer; - u32 id; - - if (copy_from_user(, syncs++, sizeof(sync))) { - err = -EFAULT; - goto error; - } - - if (copy_from_user(, kperfmon_ids++, sizeof(ids))) { - err = -EFAULT; - goto error; - } - - ids_pointer = u64_to_user_ptr(ids); - - for (int j = 0; j < reset.nperfmons; j++) { - if (copy_from_user(, ids_pointer++, sizeof(id))) { - err = -EFAULT; - goto error; - } - - job->performance_query.queries[i].kperfmon_ids[j] = id; - } - - job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->performance_query.queries[i].syncobj) { - err = -ENOENT; - goto error; - } - } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; return 0; - -error: - __v3d_performance_query_info_free(qinfo, i); - return err; } static int @@ -726,8 +743,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs; - u64 __user *kperfmon_ids; struct drm_v3d_copy_performance_query copy; int err; @@ -758,44 +773,13 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (!job->performance_query.queries) return -ENOMEM; - syncs = u64_
[PATCH 07/12] drm/v3d: Size the kperfmon_ids array at runtime
From: Tvrtko Ursulin Instead of statically reserving pessimistic space for the kperfmon_ids array, make the userspace extension code allocate the exactly required amount of space. Apart from saving some memory at runtime, this also removes the need for the V3D_MAX_PERFMONS macro whose removal will benefit further driver cleanup. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_drv.h| 6 +- drivers/gpu/drm/v3d/v3d_sched.c | 4 +++- drivers/gpu/drm/v3d/v3d_submit.c | 17 +++-- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 38c80168da51..00fe5d993175 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,13 +351,9 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; -/* Number of perfmons required to handle all supported performance counters */ -#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ - DRM_V3D_MAX_PERF_COUNTERS) - struct v3d_performance_query { /* Performance monitor IDs for this query */ - u32 kperfmon_ids[V3D_MAX_PERFMONS]; + u32 *kperfmon_ids; /* Syncobj that indicates the query availability */ struct drm_syncobj *syncobj; diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 173801aa54ee..fc8730264386 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -94,8 +94,10 @@ __v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo, if (qinfo->queries) { unsigned int i; - for (i = 0; i < count; i++) + for (i = 0; i < count; i++) { drm_syncobj_put(qinfo->queries[i].syncobj); + kvfree(qinfo->queries[i].kperfmon_ids); + } kvfree(qinfo->queries); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 35682433f75b..8dae3ab5f936 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -668,10 +668,20 @@ copy_query_info(struct v3d_performance_query_info *qinfo, goto error; } + 
query->kperfmon_ids = + kvmalloc_array(nperfmons, + sizeof(struct v3d_performance_query *), + GFP_KERNEL); + if (!query->kperfmon_ids) { + err = -ENOMEM; + goto error; + } + ids_pointer = u64_to_user_ptr(ids); for (j = 0; j < nperfmons; j++) { if (get_user(id, ids_pointer++)) { + kvfree(query->kperfmon_ids); err = -EFAULT; goto error; } @@ -681,6 +691,7 @@ copy_query_info(struct v3d_performance_query_info *qinfo, query->syncobj = drm_syncobj_find(fpriv, sync); if (!query->syncobj) { + kvfree(query->kperfmon_ids); err = -ENOENT; goto error; } @@ -714,9 +725,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(, ext, sizeof(reset))) return -EFAULT; - if (reset.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -762,9 +770,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; - if (copy.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count, -- 2.44.0
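kvmalloc_array() is used for the runtime-sized allocations because, unlike a plain multiply-and-alloc, it rejects an n * size product that would overflow when the count comes from userspace. A userspace approximation of that guard:

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace analogue of kvmalloc_array(): check the multiplication for
 * overflow so a huge user-supplied count cannot wrap into a small
 * allocation that is later overrun. */
void *alloc_array(size_t n, size_t size)
{
	if (size && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size); /* calloc also checks; being explicit here */
}
```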
[PATCH 11/12] drm/v3d: Add some local variables in queries/extensions
From: Tvrtko Ursulin Add some local variables to make the code a bit less verbose, with the main benefit being pulling some lines to under 80 columns wide. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 79 +--- 1 file changed, 42 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 34ecd844f16a..b0c2a8e9cb06 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + struct v3d_timestamp_query_info *qinfo = >timestamp_query; unsigned int i; int err; @@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(timestamp.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + qinfo->queries = kvmalloc_array(timestamp.count, + sizeof(struct v3d_timestamp_query), + GFP_KERNEL); + if (!qinfo->queries) return -ENOMEM; offsets = u64_to_user_ptr(timestamp.offsets); @@ -490,20 +491,20 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, goto error; } - job->timestamp_query.queries[i].offset = offset; + qinfo->queries[i].offset = offset; if (copy_from_user(, syncs++, sizeof(sync))) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->timestamp_query.queries[i].syncobj) { + qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!qinfo->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = timestamp.count; + qinfo->count = timestamp.count; return 0; @@ -519,6 +520,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; + struct 
v3d_timestamp_query_info *qinfo = >timestamp_query; unsigned int i; int err; @@ -537,10 +539,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(reset.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + qinfo->queries = kvmalloc_array(reset.count, + sizeof(struct v3d_timestamp_query), + GFP_KERNEL); + if (!qinfo->queries) return -ENOMEM; syncs = u64_to_user_ptr(reset.syncs); @@ -548,20 +550,20 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, for (i = 0; i < reset.count; i++) { u32 sync; - job->timestamp_query.queries[i].offset = reset.offset + 8 * i; + qinfo->queries[i].offset = reset.offset + 8 * i; if (copy_from_user(, syncs++, sizeof(sync))) { err = -EFAULT; goto error; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!job->timestamp_query.queries[i].syncobj) { + qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!qinfo->queries[i].syncobj) { err = -ENOENT; goto error; } } - job->timestamp_query.count = reset.count; + qinfo->count = reset.count; return 0; @@ -578,6 +580,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_copy_timestamp_query copy; + struct v3d_timestamp_query_info *qinfo = >timestamp_query; unsigned int i; int err; @@ -599,10 +602,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(copy
[PATCH 04/12] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension
From: Tvrtko Ursulin If userspace provides an unknown or invalid handle anywhere in the handle array the rest of the driver will not handle that well. Fix it by checking that each handle was looked up successfully, otherwise failing the extension by jumping into the existing unwind. Signed-off-by: Tvrtko Ursulin Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_submit.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index ca1b1ad0a75c..3313423080e7 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -497,6 +497,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = timestamp.count; @@ -550,6 +554,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = reset.count; @@ -613,6 +621,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + if (!job->timestamp_query.queries[i].syncobj) { + err = -ENOENT; + goto error; + } } job->timestamp_query.count = copy.count; -- 2.44.0
[PATCH v2 00/12] v3d: Perfmon cleanup
From: Tvrtko Ursulin When we had to quickly deal with a tree build issue via merging 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we promised to follow up with a nicer solution. As in the process of eliminating the hardcoded defines we have discovered a few issues in handling of corner cases and userspace input validation, the fix has turned into a larger series, but hopefully the end result is a justifiable cleanup. v2: * Re-order the patches so fixes come first while last three are optional cleanups. Tvrtko Ursulin (12): drm/v3d: Prevent out of bounds access in performance query extensions drm/v3d: Fix potential memory leak in the timestamp extension drm/v3d: Fix potential memory leak in the performance extension drm/v3d: Validate passed in drm syncobj handles in the timestamp extension drm/v3d: Validate passed in drm syncobj handles in the performance extension drm/v3d: Move part of copying of reset/copy performance extension to a helper drm/v3d: Size the kperfmon_ids array at runtime drm/v3d: Do not use intermediate storage when copying performance query results drm/v3d: Move perfmon init completely into own unit drm/v3d: Align data types of internal and uapi counts drm/v3d: Add some local variables in queries/extensions drm/v3d: Prefer get_user for scalar types drivers/gpu/drm/v3d/v3d_drv.c | 9 +- drivers/gpu/drm/v3d/v3d_drv.h | 16 +- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +-- .../gpu/drm/v3d/v3d_performance_counters.h| 16 +- drivers/gpu/drm/v3d/v3d_sched.c | 106 --- drivers/gpu/drm/v3d/v3d_submit.c | 285 ++ 6 files changed, 281 insertions(+), 195 deletions(-) -- 2.44.0
[PATCH 02/12] drm/v3d: Fix potential memory leak in the timestamp extension
From: Tvrtko Ursulin If fetching of userspace memory fails during the main loop, all drm sync objs looked up until that point will be leaked because of the missing drm_syncobj_put. Fix it by exporting and using a common cleanup helper. Signed-off-by: Tvrtko Ursulin Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_drv.h| 2 ++ drivers/gpu/drm/v3d/v3d_sched.c | 22 +-- drivers/gpu/drm/v3d/v3d_submit.c | 36 ++-- 3 files changed, 43 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 099b962bdfde..95651c3c926f 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo); void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ +void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo, +unsigned int count); void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 03df37a3acf5..e45d3ddc6f82 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job) v3d_job_cleanup(job); } +void +__v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo, + unsigned int count) +{ + if (qinfo->queries) { + unsigned int i; + + for (i = 0; i < count; i++) + drm_syncobj_put(qinfo->queries[i].syncobj); + + kvfree(qinfo->queries); + } +} + static void v3d_cpu_job_free(struct drm_sched_job *sched_job) { struct v3d_cpu_job *job = to_cpu_job(sched_job); - struct v3d_timestamp_query_info *timestamp_query = >timestamp_query; struct v3d_performance_query_info *performance_query = >performance_query; - if (timestamp_query->queries) { - for 
(int i = 0; i < timestamp_query->count; i++) - drm_syncobj_put(timestamp_query->queries[i].syncobj); - kvfree(timestamp_query->queries); - } + __v3d_timestamp_query_info_free(>timestamp_query, + job->timestamp_query.count); if (performance_query->queries) { for (int i = 0; i < performance_query->count; i++) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 263fefc1d04f..2818afdd4807 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -484,15 +485,15 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, u32 offset, sync; if (copy_from_user(, offsets++, sizeof(offset))) { - kvfree(job->timestamp_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->timestamp_query.queries[i].offset = offset; if (copy_from_user(, syncs++, sizeof(sync))) { - kvfree(job->timestamp_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); @@ -500,6 +501,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->timestamp_query.count = timestamp.count; return 0; + +error: + __v3d_timestamp_query_info_free(qinfo, i); + return err; } static int @@ -509,6 +514,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, { u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -539,8 +545,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->timestamp_query.queries[i].offset = reset.offset + 8 * i; if (copy_from_user(, syncs++, sizeof(sync))) { - kvfree(job->timestamp_query.queries); - return -EFAULT; + err = -EFAULT; +
[PATCH 01/12] drm/v3d: Prevent out of bounds access in performance query extensions
From: Tvrtko Ursulin Check that the number of perfmons userspace is passing in the copy and reset extensions is not greater than the internal kernel storage the ids will be copied into. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_submit.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 88f63d526b22..263fefc1d04f 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(&reset, ext, sizeof(reset))) return -EFAULT; + if (reset.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; + if (copy.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count, -- 2.44.0
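The essence of the fix is a range check on the user-supplied count before it is used to index fixed-size storage. Sketched standalone (MAX_PERFMONS is a hypothetical stand-in for V3D_MAX_PERFMONS; like the patch, only the upper bound is checked):

```c
#include <errno.h>

#define MAX_PERFMONS 4 /* hypothetical cap, stands in for V3D_MAX_PERFMONS */

/* Reject a request whose nperfmons would overflow the fixed-size
 * kperfmon_ids[] storage it is later copied into. */
int validate_nperfmons(unsigned int nperfmons)
{
	if (nperfmons > MAX_PERFMONS)
		return -EINVAL;
	return 0;
}
```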
[PATCH 03/12] drm/v3d: Fix potential memory leak in the performance extension
From: Tvrtko Ursulin If fetching of userspace memory fails during the main loop, all drm sync objs looked up until that point will be leaked because of the missing drm_syncobj_put. Fix it by exporting and using a common cleanup helper. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_drv.h | 2 ++ drivers/gpu/drm/v3d/v3d_sched.c | 22 +- drivers/gpu/drm/v3d/v3d_submit.c | 40 +--- 3 files changed, 44 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 95651c3c926f..38c80168da51 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo); /* v3d_sched.c */ void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo, unsigned int count); +void __v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo, + unsigned int count); void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue); int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index e45d3ddc6f82..173801aa54ee 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -87,20 +87,30 @@ __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo, } } +void +__v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo, + unsigned int count) +{ + if (qinfo->queries) { + unsigned int i; + + for (i = 0; i < count; i++) + drm_syncobj_put(qinfo->queries[i].syncobj); + + kvfree(qinfo->queries); + } +} + static void v3d_cpu_job_free(struct drm_sched_job *sched_job) { struct v3d_cpu_job *job = to_cpu_job(sched_job); - struct v3d_performance_query_info *performance_query = &job->performance_query; __v3d_timestamp_query_info_free(&job->timestamp_query, job->timestamp_query.count); - if (performance_query->queries) { - for (int i = 0; i < performance_query->count; i++) - drm_syncobj_put(performance_query->queries[i].syncobj); - kvfree(performance_query->queries); - } + __v3d_performance_query_info_free(&job->performance_query, + job->performance_query.count); v3d_job_cleanup(&job->base); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 2818afdd4807..ca1b1ad0a75c 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -637,6 +637,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, u32 __user *syncs; u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; + int err; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -672,32 +673,36 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, u32 id; if (copy_from_user(&sync, syncs++, sizeof(sync))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } - job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } ids_pointer = u64_to_user_ptr(ids); for (int j = 0; j < reset.nperfmons; j++) { if (copy_from_user(&id, ids_pointer++, sizeof(id))) { - kvfree(job->performance_query.queries); - return -EFAULT; + err = -EFAULT; + goto error; } job->performance_query.queries[i].kperfmon_ids[j] = id; } + + job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); } job->performance_query.count = reset.count; job->performance_query.nperfmons = reset.nperfmons; return 0; + +error: + __v3d_performance_query_info_free(qinfo, i); + return err; }
Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8
On 09/07/2024 15:02, Tvrtko Ursulin wrote:
> On 09/07/2024 13:53, Nitin Gote wrote:
>> We're seeing a GPU HANG issue on a CHV platform, which was caused by
>> bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries
>> for gen8").
>>
>> The Gen8 platform only has timeslicing and doesn't support a preemption
>> mechanism, as its engines do not have a preemption timer and do not send
>> an irq if the preemption timeout expires. So, add a fix to not consider
>> preemption during dequeuing for gen8 platforms.
>>
>> Also move can_preempt() above need_preempt() to resolve the implicit
>> declaration of function 'can_preempt' error, and make the can_preempt()
>> function parameter const to resolve the error: passing argument 1 of
>> 'can_preempt' discards 'const' qualifier from the pointer target type.
>>
>> Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8")
>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
>> Suggested-by: Andi Shyti
>> Signed-off-by: Nitin Gote
>> Cc: Chris Wilson
>> Cc: <stable@vger.kernel.org> # v5.2+
>> ---
>>  .../drm/i915/gt/intel_execlists_submission.c | 24 +++++++++++++++------
>>  1 file changed, 15 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> index 21829439e686..30631cc690f2 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> @@ -294,11 +294,26 @@ static int virtual_prio(const struct intel_engine_execlists *el)
>>  	return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
>>  }
>>
>> +static bool can_preempt(const struct intel_engine_cs *engine)
>> +{
>> +	if (GRAPHICS_VER(engine->i915) > 8)
>> +		return true;
>> +
>> +	if (IS_CHERRYVIEW(engine->i915) || IS_BROADWELL(engine->i915))
>> +		return false;
>> +
>> +	/* GPGPU on bdw requires extra w/a; not implemented */
>> +	return engine->class != RENDER_CLASS;
>
> Aren't BDW and CHV the only Gen8 platforms, in which case this function
> can be simplified as:
>
> ...
> {
> 	return GRAPHICS_VER(engine->i915) > 8;
> }
>
> ?
>
>> +}
>> +
>>  static bool need_preempt(const struct intel_engine_cs *engine,
>>  			 const struct i915_request *rq)
>>  {
>>  	int last_prio;
>>
>> +	if ((GRAPHICS_VER(engine->i915) <= 8) && can_preempt(engine))
>
> The GRAPHICS_VER check here looks redundant with the one inside
> can_preempt().

One more thing - I think gen8_emit_bb_start() becomes dead code after
this and can be removed.

Regards,

Tvrtko

>> +		return false;
>> +
>>  	if (!intel_engine_has_semaphores(engine))
>>  		return false;
>>
>> @@ -3313,15 +3328,6 @@ static void remove_from_engine(struct i915_request *rq)
>>  	i915_request_notify_execute_cb_imm(rq);
>>  }
>>
>> -static bool can_preempt(struct intel_engine_cs *engine)
>> -{
>> -	if (GRAPHICS_VER(engine->i915) > 8)
>> -		return true;
>> -
>> -	/* GPGPU on bdw requires extra w/a; not implemented */
>> -	return engine->class != RENDER_CLASS;
>> -}
>> -
>>  static void kick_execlists(const struct i915_request *rq, int prio)
>>  {
>>  	struct intel_engine_cs *engine = rq->engine;
[PATCH 12/12] drm/v3d: Move perfmon init completely into own unit
From: Tvrtko Ursulin Now that the build time dependencies on various array sizes have been removed, we can move the perfmon init completely into its own compilation unit and remove the hardcoded defines. This improves on the temporary fix quickly delivered in 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"). Signed-off-by: Tvrtko Ursulin References: 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit") --- drivers/gpu/drm/v3d/v3d_drv.c | 9 +--- drivers/gpu/drm/v3d/v3d_drv.h | 6 +-- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++ .../gpu/drm/v3d/v3d_performance_counters.h| 16 --- 4 files changed, 40 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index a47f00b443d3..491c638a4d74 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data, args->value = 1; return 0; case DRM_V3D_PARAM_MAX_PERF_COUNTERS: - args->value = v3d->max_counters; + args->value = v3d->perfmon_info.max_counters; return 0; default: DRM_DEBUG("Unknown parameter %d\n", args->param); @@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES); WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ - if (v3d->ver >= 71) - v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS; - else if (v3d->ver >= 42) - v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS; - else - v3d->max_counters = 0; + v3d_perfmon_init(v3d); v3d->reset = devm_reset_control_get_exclusive(dev, NULL); if (IS_ERR(v3d->reset)) { diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 00fe5d993175..6d2d34cd135c 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -104,10 +104,7 @@ struct v3d_dev { int ver; bool single_irq_line; - /* Different revisions of V3D have different total number of performance -* counters -*/ - 
unsigned int max_counters; + struct v3d_perfmon_info perfmon_info; void __iomem *hub_regs; void __iomem *core_regs[3]; @@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d); void v3d_sched_fini(struct v3d_dev *v3d); /* v3d_perfmon.c */ +void v3d_perfmon_init(struct v3d_dev *v3d); void v3d_perfmon_get(struct v3d_perfmon *perfmon); void v3d_perfmon_put(struct v3d_perfmon *perfmon); void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon); diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index b7d0b02e1a95..cd7f1eedf17f 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = { {"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any other reason (vary/W/Z)"}, }; +void v3d_perfmon_init(struct v3d_dev *v3d) +{ + const struct v3d_perf_counter_desc *counters = NULL; + unsigned int max = 0; + + if (v3d->ver >= 71) { + counters = v3d_v71_performance_counters; + max = ARRAY_SIZE(v3d_v71_performance_counters); + } else if (v3d->ver >= 42) { + counters = v3d_v42_performance_counters; + max = ARRAY_SIZE(v3d_v42_performance_counters); + } + + v3d->perfmon_info.max_counters = max; + v3d->perfmon_info.counters = counters; +} + void v3d_perfmon_get(struct v3d_perfmon *perfmon) { if (perfmon) @@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, /* Make sure all counters are valid. 
*/ for (i = 0; i < req->ncounters; i++) { - if (req->counters[i] >= v3d->max_counters) + if (req->counters[i] >= v3d->perfmon_info.max_counters) return -EINVAL; } @@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, return -EINVAL; } - /* Make sure that the counter ID is valid */ - if (req->counter >= v3d->max_counters) - return -EINVAL; - - BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) != -V3D_V42_NUM_PERFCOUNTERS); - BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) != -V3D_V71_NUM_PERFCOUNTERS); - BUILD_BUG_ON(V3D_MAX_COUNTERS < V3D_V42_NUM_PERFCOUNTERS); -
[PATCH 11/12] drm/v3d: Do not use intermediate storage when copying performance query results
From: Tvrtko Ursulin Removing the intermediate buffer removes the last use of the V3D_MAX_COUNTERS define, which will enable further driver cleanup. While at it pull the 32 vs 64 bit copying decision outside the loop in order to reduce the number of conditional instructions. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_sched.c | 60 - 1 file changed, 37 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index fc8730264386..77f795e38fad 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job) v3d_put_bo_vaddr(bo); } +static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value) +{ + dst[idx] = value; +} + +static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value) +{ + dst[idx] = value; +} + static void -write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value) +write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value) { - if (do_64bit) { - u64 *dst64 = (u64 *)dst; - - dst64[idx] = value; - } else { - u32 *dst32 = (u32 *)dst; - - dst32[idx] = (u32)value; - } + if (do_64bit) + write_to_buffer_64(dst, idx, value); + else + write_to_buffer_32(dst, idx, value); } static void @@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job) } static void -v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query) +v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, + unsigned int query) { - struct v3d_performance_query_info *performance_query = >performance_query; - struct v3d_copy_query_results_info *copy = >copy; + struct v3d_performance_query_info *performance_query = + >performance_query; struct v3d_file_priv *v3d_priv = job->base.file->driver_priv; struct v3d_dev *v3d = job->base.v3d; - struct v3d_perfmon *perfmon; - u64 counter_values[V3D_MAX_COUNTERS]; + unsigned int i, j, offset; - for (int i = 0; i < 
performance_query->nperfmons; i++) { - perfmon = v3d_perfmon_find(v3d_priv, - performance_query->queries[query].kperfmon_ids[i]); + for (i = 0, offset = 0; +i < performance_query->nperfmons; +i++, offset += DRM_V3D_MAX_PERF_COUNTERS) { + struct v3d_performance_query *q = + _query->queries[query]; + struct v3d_perfmon *perfmon; + + perfmon = v3d_perfmon_find(v3d_priv, q->kperfmon_ids[i]); if (!perfmon) { DRM_DEBUG("Failed to find perfmon."); continue; @@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 quer v3d_perfmon_stop(v3d, perfmon, true); - memcpy(_values[i * DRM_V3D_MAX_PERF_COUNTERS], perfmon->values, - perfmon->ncounters * sizeof(u64)); + if (job->copy.do_64bit) { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_64(data, offset + j, + perfmon->values[j]); + } else { + for (j = 0; j < perfmon->ncounters; j++) + write_to_buffer_32(data, offset + j, + perfmon->values[j]); + } v3d_perfmon_put(perfmon); } - - for (int i = 0; i < performance_query->ncounters; i++) - write_to_buffer(data, i, copy->do_64bit, counter_values[i]); } static void -- 2.44.0
[PATCH 10/12] drm/v3d: Size the kperfmon_ids array at runtime
From: Tvrtko Ursulin Instead of statically reserving pessimistic space for the kperfmon_ids array, make the userspace extension code allocate the exactly required amount of space. Apart from saving some memory at runtime, this also removes the need for the V3D_MAX_PERFMONS macro whose removal will benefit further driver cleanup. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_drv.h| 6 +- drivers/gpu/drm/v3d/v3d_sched.c | 4 +++- drivers/gpu/drm/v3d/v3d_submit.c | 17 +++-- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 38c80168da51..00fe5d993175 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,13 +351,9 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; -/* Number of perfmons required to handle all supported performance counters */ -#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ - DRM_V3D_MAX_PERF_COUNTERS) - struct v3d_performance_query { /* Performance monitor IDs for this query */ - u32 kperfmon_ids[V3D_MAX_PERFMONS]; + u32 *kperfmon_ids; /* Syncobj that indicates the query availability */ struct drm_syncobj *syncobj; diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 173801aa54ee..fc8730264386 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -94,8 +94,10 @@ __v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo, if (qinfo->queries) { unsigned int i; - for (i = 0; i < count; i++) + for (i = 0; i < count; i++) { drm_syncobj_put(qinfo->queries[i].syncobj); + kvfree(qinfo->queries[i].kperfmon_ids); + } kvfree(qinfo->queries); } diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index a2e55ba8222b..e1a7622a43f9 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -674,10 +674,20 @@ copy_query_info(struct v3d_performance_query_info *qinfo, goto error; } + 
query->kperfmon_ids = + kvmalloc_array(nperfmons, + sizeof(struct v3d_performance_query *), + GFP_KERNEL); + if (!query->kperfmon_ids) { + err = -ENOMEM; + goto error; + } + ids_pointer = u64_to_user_ptr(ids); for (j = 0; j < nperfmons; j++) { if (get_user(id, ids_pointer++)) { + kvfree(query->kperfmon_ids); err = -EFAULT; goto error; } @@ -687,6 +697,7 @@ copy_query_info(struct v3d_performance_query_info *qinfo, query->syncobj = drm_syncobj_find(fpriv, sync); if (!query->syncobj) { + kvfree(query->kperfmon_ids); err = -ENOENT; goto error; } @@ -721,9 +732,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(, ext, sizeof(reset))) return -EFAULT; - if (reset.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; qinfo->queries = kvmalloc_array(reset.count, @@ -770,9 +778,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; - if (copy.nperfmons > V3D_MAX_PERFMONS) - return -EINVAL; - job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; qinfo->queries = kvmalloc_array(copy.count, -- 2.44.0
[PATCH 09/12] drm/v3d: Move part of copying of reset/copy performance extension to a helper
From: Tvrtko Ursulin The loop which looks up the syncobj and copies the kperfmon ids is identical so lets move it to a helper. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 148 +-- 1 file changed, 64 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 2c4bb39c9ac6..a2e55ba8222b 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -647,16 +647,65 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, return err; } +static int +copy_query_info(struct v3d_performance_query_info *qinfo, + unsigned int count, + unsigned int nperfmons, + u32 __user *syncs, + u64 __user *kperfmon_ids, + struct drm_file *fpriv) +{ + unsigned int i, j; + int err; + + for (i = 0; i < count; i++) { + struct v3d_performance_query *query = >queries[i]; + u32 __user *ids_pointer; + u32 sync, id; + u64 ids; + + if (get_user(sync, syncs++)) { + err = -EFAULT; + goto error; + } + + if (get_user(ids, kperfmon_ids++)) { + err = -EFAULT; + goto error; + } + + ids_pointer = u64_to_user_ptr(ids); + + for (j = 0; j < nperfmons; j++) { + if (get_user(id, ids_pointer++)) { + err = -EFAULT; + goto error; + } + + query->kperfmon_ids[j] = id; + } + + query->syncobj = drm_syncobj_find(fpriv, sync); + if (!query->syncobj) { + err = -ENOENT; + goto error; + } + } + + return 0; + +error: + __v3d_performance_query_info_free(qinfo, i); + return err; +} + static int v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs; - u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; struct v3d_performance_query_info *qinfo = >performance_query; - unsigned int i, j; int err; if (!job) { @@ -683,50 +732,17 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (!qinfo->queries) return -ENOMEM; - syncs = u64_to_user_ptr(reset.syncs); - kperfmon_ids = 
u64_to_user_ptr(reset.kperfmon_ids); + err = copy_query_info(qinfo, reset.count, reset.nperfmons, + u64_to_user_ptr(reset.syncs), + u64_to_user_ptr(reset.kperfmon_ids), + file_priv); + if (err) + return err; - for (i = 0; i < reset.count; i++) { - u32 sync; - u64 ids; - u32 __user *ids_pointer; - u32 id; - - if (get_user(sync, syncs++)) { - err = -EFAULT; - goto error; - } - - if (get_user(ids, kperfmon_ids++)) { - err = -EFAULT; - goto error; - } - - ids_pointer = u64_to_user_ptr(ids); - - for (j = 0; j < reset.nperfmons; j++) { - if (get_user(id, ids_pointer++)) { - err = -EFAULT; - goto error; - } - - qinfo->queries[i].kperfmon_ids[j] = id; - } - - qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (!qinfo->queries[i].syncobj) { - err = -ENOENT; - goto error; - } - } qinfo->count = reset.count; qinfo->nperfmons = reset.nperfmons; return 0; - -error: - __v3d_performance_query_info_free(qinfo, i); - return err; } static int @@ -734,11 +750,8 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, struct drm_v3d_extension __user *ext, struct v3d_cpu_job *job) { - u32 __user *syncs; - u64 __user *kperfmon_ids; struct drm_v3d_copy_performance_query copy; struct v3d_performance_query_info *qinfo = >performance_query; - unsigned int i, j; int err; if (!job) { @@ -768,42 +781,13 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if
[PATCH 07/12] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension
From: Tvrtko Ursulin

If userspace provides an unknown or invalid handle anywhere in the
handle array the rest of the driver will not handle that well. Fix it
by checking that each handle was looked up successfully, or otherwise
failing the extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal
Cc: Iago Toral Quiroga
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 81afcfccc6bb..a408db3d3e32 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -499,6 +499,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
 		}
 
 		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+		if (!qinfo->queries[i].syncobj) {
+			err = -ENOENT;
+			goto error;
+		}
 	}
 
 	qinfo->count = timestamp.count;
@@ -554,6 +558,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,
 		}
 
 		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+		if (!qinfo->queries[i].syncobj) {
+			err = -ENOENT;
+			goto error;
+		}
 	}
 
 	qinfo->count = reset.count;
@@ -619,6 +627,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv,
 		}
 
 		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+		if (!qinfo->queries[i].syncobj) {
+			err = -ENOENT;
+			goto error;
+		}
 	}
 
 	qinfo->count = copy.count;
-- 
2.44.0
[PATCH 05/12] drm/v3d: Fix potential memory leak in the timestamp extension
From: Tvrtko Ursulin

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put. Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal
Cc: Iago Toral Quiroga
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h    |  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 ++--
 drivers/gpu/drm/v3d/v3d_submit.c | 35 +++-
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 099b962bdfde..95651c3c926f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+				     unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 03df37a3acf5..e45d3ddc6f82 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
 	v3d_job_cleanup(job);
 }
 
+void
+__v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+				unsigned int count)
+{
+	if (qinfo->queries) {
+		unsigned int i;
+
+		for (i = 0; i < count; i++)
+			drm_syncobj_put(qinfo->queries[i].syncobj);
+
+		kvfree(qinfo->queries);
+	}
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
 	struct v3d_cpu_job *job = to_cpu_job(sched_job);
-	struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
 	struct v3d_performance_query_info *performance_query = &job->performance_query;
 
-	if (timestamp_query->queries) {
-		for (int i = 0; i < timestamp_query->count; i++)
-			drm_syncobj_put(timestamp_query->queries[i].syncobj);
-		kvfree(timestamp_query->queries);
-	}
+	__v3d_timestamp_query_info_free(&job->timestamp_query,
+					job->timestamp_query.count);
 
 	if (performance_query->queries) {
 		for (int i = 0; i < performance_query->count; i++)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index c960bc6ca32d..0f1c900c7d35 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -454,6 +454,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
 	struct drm_v3d_timestamp_query timestamp;
 	struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
 	unsigned int i;
+	int err;
 
 	if (!job) {
 		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -486,15 +487,15 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
 		u32 offset, sync;
 
 		if (get_user(offset, offsets++)) {
-			kvfree(qinfo->queries);
-			return -EFAULT;
+			err = -EFAULT;
+			goto error;
 		}
 
 		qinfo->queries[i].offset = offset;
 
 		if (get_user(sync, syncs++)) {
-			kvfree(qinfo->queries);
-			return -EFAULT;
+			err = -EFAULT;
+			goto error;
 		}
 
 		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
@@ -502,6 +503,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
 	qinfo->count = timestamp.count;
 
 	return 0;
+
+error:
+	__v3d_timestamp_query_info_free(qinfo, i);
+	return err;
 }
 
 static int
@@ -513,6 +518,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,
 	struct drm_v3d_reset_timestamp_query reset;
 	struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
 	unsigned int i;
+	int err;
 
 	if (!job) {
 		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -543,8 +549,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,
 		qinfo->queries[i].offset = reset.offset + 8 * i;
 
 		if (get_user(sync, syncs++)) {
-			kvfree(qinfo->queries);
-			return -EFAULT;
+			err = -EFAULT;
+			goto error;
[PATCH 08/12] drm/v3d: Validate passed in drm syncobj handles in the performance extension
From: Tvrtko Ursulin

If userspace provides an unknown or invalid handle anywhere in the
handle array the rest of the driver will not handle that well. Fix it
by checking that each handle was looked up successfully, or otherwise
failing the extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal
Cc: Iago Toral Quiroga
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index a408db3d3e32..2c4bb39c9ac6 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -714,6 +714,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 		}
 
 		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+		if (!qinfo->queries[i].syncobj) {
+			err = -ENOENT;
+			goto error;
+		}
 	}
 
 	qinfo->count = reset.count;
 	qinfo->nperfmons = reset.nperfmons;
@@ -795,6 +799,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv,
 		}
 
 		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+		if (!qinfo->queries[i].syncobj) {
+			err = -ENOENT;
+			goto error;
+		}
 	}
 
 	qinfo->count = copy.count;
 	qinfo->nperfmons = copy.nperfmons;
-- 
2.44.0
[PATCH 06/12] drm/v3d: Fix potential memory leak in the performance extension
From: Tvrtko Ursulin

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put. Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal
Cc: Iago Toral Quiroga
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h    |  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 -
 drivers/gpu/drm/v3d/v3d_submit.c | 42
 3 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 95651c3c926f..38c80168da51 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 /* v3d_sched.c */
 void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
                                     unsigned int count);
+void __v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo,
+                                      unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e45d3ddc6f82..173801aa54ee 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
 	}
 }
 
+void
+__v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo,
+				  unsigned int count)
+{
+	if (qinfo->queries) {
+		unsigned int i;
+
+		for (i = 0; i < count; i++)
+			drm_syncobj_put(qinfo->queries[i].syncobj);
+
+		kvfree(qinfo->queries);
+	}
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
 	struct v3d_cpu_job *job = to_cpu_job(sched_job);
-	struct v3d_performance_query_info *performance_query = &job->performance_query;
 
 	__v3d_timestamp_query_info_free(&job->timestamp_query,
					job->timestamp_query.count);
 
-	if (performance_query->queries) {
-		for (int i = 0; i < performance_query->count; i++)
-			drm_syncobj_put(performance_query->queries[i].syncobj);
-		kvfree(performance_query->queries);
-	}
+	__v3d_performance_query_info_free(&job->performance_query,
+					  job->performance_query.count);
 
 	v3d_job_cleanup(&job->base);
 }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 0f1c900c7d35..81afcfccc6bb 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -645,6 +645,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 	struct drm_v3d_reset_performance_query reset;
 	struct v3d_performance_query_info *qinfo = &job->performance_query;
 	unsigned int i, j;
+	int err;
 
 	if (!job) {
 		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -680,32 +681,36 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 		u32 id;
 
 		if (get_user(sync, syncs++)) {
-			kvfree(qinfo->queries);
-			return -EFAULT;
+			err = -EFAULT;
+			goto error;
 		}
 
-		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
-
 		if (get_user(ids, kperfmon_ids++)) {
-			kvfree(qinfo->queries);
-			return -EFAULT;
+			err = -EFAULT;
+			goto error;
 		}
 
 		ids_pointer = u64_to_user_ptr(ids);
 
 		for (j = 0; j < reset.nperfmons; j++) {
 			if (get_user(id, ids_pointer++)) {
-				kvfree(qinfo->queries);
-				return -EFAULT;
+				err = -EFAULT;
+				goto error;
 			}
 
 			qinfo->queries[i].kperfmon_ids[j] = id;
 		}
+
+		qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
 	}
 
 	qinfo->count = reset.count;
 	qinfo->nperfmons = reset.nperfmons;
 
 	return 0;
+
+error:
+	__v3d_performance_query_info_free(qinfo, i);
+	return err;
 }
 
 static int
@@ -718,6 +723,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv,
 	u64 __user *kperfmon_ids;
 	struct drm_v3d_copy_performance_query copy;
[PATCH 04/12] drm/v3d: Align data types of internal and uapi counts
From: Tvrtko Ursulin In the timestamp and performance extensions userspace type for counts is u32 so lets use unsigned in the kernel too. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index f99cd61a3e65..c960bc6ca32d 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -453,6 +453,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; struct v3d_timestamp_query_info *qinfo = >timestamp_query; + unsigned int i; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -481,7 +482,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, offsets = u64_to_user_ptr(timestamp.offsets); syncs = u64_to_user_ptr(timestamp.syncs); - for (int i = 0; i < timestamp.count; i++) { + for (i = 0; i < timestamp.count; i++) { u32 offset, sync; if (get_user(offset, offsets++)) { @@ -511,6 +512,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; struct v3d_timestamp_query_info *qinfo = >timestamp_query; + unsigned int i; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -535,7 +537,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, syncs = u64_to_user_ptr(reset.syncs); - for (int i = 0; i < reset.count; i++) { + for (i = 0; i < reset.count; i++) { u32 sync; qinfo->queries[i].offset = reset.offset + 8 * i; @@ -561,7 +563,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, u32 __user *offsets, *syncs; struct drm_v3d_copy_timestamp_query copy; struct v3d_timestamp_query_info *qinfo = >timestamp_query; - int i; + unsigned int i; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -627,6 +629,7 @@ 
v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, u64 __user *kperfmon_ids; struct drm_v3d_reset_performance_query reset; struct v3d_performance_query_info *qinfo = >performance_query; + unsigned int i, j; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -655,7 +658,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, syncs = u64_to_user_ptr(reset.syncs); kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids); - for (int i = 0; i < reset.count; i++) { + for (i = 0; i < reset.count; i++) { u32 sync; u64 ids; u32 __user *ids_pointer; @@ -675,7 +678,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, ids_pointer = u64_to_user_ptr(ids); - for (int j = 0; j < reset.nperfmons; j++) { + for (j = 0; j < reset.nperfmons; j++) { if (get_user(id, ids_pointer++)) { kvfree(qinfo->queries); return -EFAULT; @@ -699,6 +702,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, u64 __user *kperfmon_ids; struct drm_v3d_copy_performance_query copy; struct v3d_performance_query_info *qinfo = >performance_query; + unsigned int i, j; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -730,7 +734,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, syncs = u64_to_user_ptr(copy.syncs); kperfmon_ids = u64_to_user_ptr(copy.kperfmon_ids); - for (int i = 0; i < copy.count; i++) { + for (i = 0; i < copy.count; i++) { u32 sync; u64 ids; u32 __user *ids_pointer; @@ -750,7 +754,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, ids_pointer = u64_to_user_ptr(ids); - for (int j = 0; j < copy.nperfmons; j++) { + for (j = 0; j < copy.nperfmons; j++) { if (get_user(id, ids_pointer++)) { kvfree(qinfo->queries); return -EFAULT; -- 2.44.0
[PATCH 03/12] drm/v3d: Add some local variables in queries/extensions
From: Tvrtko Ursulin Add some local variables to make the code a bit less verbose, with the main benefit being pulling some lines to under 80 columns wide. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 103 --- 1 file changed, 54 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 5c71e9adfc65..f99cd61a3e65 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_timestamp_query timestamp; + struct v3d_timestamp_query_info *qinfo = &job->timestamp_query; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -471,10 +472,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(timestamp.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + qinfo->queries = kvmalloc_array(timestamp.count, + sizeof(struct v3d_timestamp_query), + GFP_KERNEL); + if (!qinfo->queries) return -ENOMEM; offsets = u64_to_user_ptr(timestamp.offsets); @@ -484,20 +485,20 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, u32 offset, sync; if (get_user(offset, offsets++)) { - kvfree(job->timestamp_query.queries); + kvfree(qinfo->queries); return -EFAULT; } - job->timestamp_query.queries[i].offset = offset; + qinfo->queries[i].offset = offset; if (get_user(sync, syncs++)) { - kvfree(job->timestamp_query.queries); + kvfree(qinfo->queries); return -EFAULT; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync); } - job->timestamp_query.count = timestamp.count; + qinfo->count = timestamp.count; return 0; } @@ -509,6 +510,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, 
{ u32 __user *syncs; struct drm_v3d_reset_timestamp_query reset; + struct v3d_timestamp_query_info *qinfo = &job->timestamp_query; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -525,10 +527,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY; - job->timestamp_query.queries = kvmalloc_array(reset.count, - sizeof(struct v3d_timestamp_query), - GFP_KERNEL); - if (!job->timestamp_query.queries) + qinfo->queries = kvmalloc_array(reset.count, + sizeof(struct v3d_timestamp_query), + GFP_KERNEL); + if (!qinfo->queries) return -ENOMEM; syncs = u64_to_user_ptr(reset.syncs); @@ -536,16 +538,16 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, for (int i = 0; i < reset.count; i++) { u32 sync; - job->timestamp_query.queries[i].offset = reset.offset + 8 * i; + qinfo->queries[i].offset = reset.offset + 8 * i; if (get_user(sync, syncs++)) { - kvfree(job->timestamp_query.queries); + kvfree(qinfo->queries); return -EFAULT; } - job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); + qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync); } - job->timestamp_query.count = reset.count; + qinfo->count = reset.count; return 0; } @@ -558,6 +560,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, { u32 __user *offsets, *syncs; struct drm_v3d_copy_timestamp_query copy; + struct v3d_timestamp_query_info *qinfo = &job->timestamp_query; int i; if (!job) { DRM_DEBUG("CPU job extension was attached to a GPU job.\n"); @@ -578,10 +581,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, job->job_type = V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY; - job->
[PATCH 02/12] drm/v3d: Prefer get_user for scalar types
From: Tvrtko Ursulin It makes it just a tiny bit more obvious what is going on. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/v3d/v3d_submit.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 263fefc1d04f..5c71e9adfc65 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -483,14 +483,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv, for (int i = 0; i < timestamp.count; i++) { u32 offset, sync; - if (copy_from_user(&offset, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { kvfree(job->timestamp_query.queries); return -EFAULT; } job->timestamp_query.queries[i].offset = offset; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { kvfree(job->timestamp_query.queries); return -EFAULT; } @@ -538,7 +538,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv, job->timestamp_query.queries[i].offset = reset.offset + 8 * i; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { kvfree(job->timestamp_query.queries); return -EFAULT; } @@ -590,14 +590,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv, for (i = 0; i < copy.count; i++) { u32 offset, sync; - if (copy_from_user(&offset, offsets++, sizeof(offset))) { + if (get_user(offset, offsets++)) { kvfree(job->timestamp_query.queries); return -EFAULT; } job->timestamp_query.queries[i].offset = offset; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { kvfree(job->timestamp_query.queries); return -EFAULT; } @@ -657,14 +657,14 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, u32 __user *ids_pointer; u32 id; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { kvfree(job->performance_query.queries); return -EFAULT; } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if 
(copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) { + if (get_user(ids, kperfmon_ids++)) { kvfree(job->performance_query.queries); return -EFAULT; } @@ -672,7 +672,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, ids_pointer = u64_to_user_ptr(ids); for (int j = 0; j < reset.nperfmons; j++) { - if (copy_from_user(&id, ids_pointer++, sizeof(id))) { + if (get_user(id, ids_pointer++)) { kvfree(job->performance_query.queries); return -EFAULT; } @@ -731,14 +731,14 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, u32 __user *ids_pointer; u32 id; - if (copy_from_user(&sync, syncs++, sizeof(sync))) { + if (get_user(sync, syncs++)) { kvfree(job->performance_query.queries); return -EFAULT; } job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync); - if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) { + if (get_user(ids, kperfmon_ids++)) { kvfree(job->performance_query.queries); return -EFAULT; } @@ -746,7 +746,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, ids_pointer = u64_to_user_ptr(ids); for (int j = 0; j < copy.nperfmons; j++) { - if (copy_from_user(&id, ids_pointer++, sizeof(id))) { + if (get_user(id, ids_pointer++)) { kvfree(job->performance_query.queries); return -EFAULT; } -- 2.44.0
[PATCH 00/12] v3d: Perfmon cleanup
From: Tvrtko Ursulin When we had to quickly deal with a tree build issue via merging 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we promised to follow up with a nicer solution. As in the process of eliminating the hardcoded defines we have discovered a few issues in handling of corner cases and userspace input validation, the fix has turned into a larger series, but hopefully the end result is a justifiable cleanup. Tvrtko Ursulin (12): drm/v3d: Prevent out of bounds access in performance query extensions drm/v3d: Prefer get_user for scalar types drm/v3d: Add some local variables in queries/extensions drm/v3d: Align data types of internal and uapi counts drm/v3d: Fix potential memory leak in the timestamp extension drm/v3d: Fix potential memory leak in the performance extension drm/v3d: Validate passed in drm syncobj handles in the timestamp extension drm/v3d: Validate passed in drm syncobj handles in the performance extension drm/v3d: Move part of copying of reset/copy performance extension to a helper drm/v3d: Size the kperfmon_ids array at runtime drm/v3d: Do not use intermediate storage when copying performance query results drm/v3d: Move perfmon init completely into own unit drivers/gpu/drm/v3d/v3d_drv.c | 9 +- drivers/gpu/drm/v3d/v3d_drv.h | 16 +- drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +-- .../gpu/drm/v3d/v3d_performance_counters.h| 16 +- drivers/gpu/drm/v3d/v3d_sched.c | 106 --- drivers/gpu/drm/v3d/v3d_submit.c | 285 ++ 6 files changed, 281 insertions(+), 195 deletions(-) -- 2.44.0
[PATCH 01/12] drm/v3d: Prevent out of bounds access in performance query extensions
From: Tvrtko Ursulin Check that the number of perfmons userspace is passing in the copy and reset extensions is not greater than the internal kernel storage where the ids will be copied into. Signed-off-by: Tvrtko Ursulin Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job") Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: # v6.8+ --- drivers/gpu/drm/v3d/v3d_submit.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index 88f63d526b22..263fefc1d04f 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv, if (copy_from_user(&reset, ext, sizeof(reset))) return -EFAULT; + if (reset.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(reset.count, @@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv, if (copy.pad) return -EINVAL; + if (copy.nperfmons > V3D_MAX_PERFMONS) + return -EINVAL; + job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY; job->performance_query.queries = kvmalloc_array(copy.count, -- 2.44.0
Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8
On 09/07/2024 13:53, Nitin Gote wrote: We're seeing a GPU HANG issue on a CHV platform, which was caused by bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8"). Gen8 platform has only timeslice and doesn't support a preemption mechanism as engines do not have a preemption timer and do not send an irq if the preemption timeout expires. So, add a fix to not consider preemption during dequeuing for gen8 platforms. Also move can_preempt() above need_preempt() function to resolve implicit declaration of function 'can_preempt' error and make can_preempt() function param as const to resolve error: passing argument 1 of 'can_preempt' discards 'const' qualifier from the pointer target type. Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396 Suggested-by: Andi Shyti Signed-off-by: Nitin Gote Cc: Chris Wilson CC: # v5.2+ --- .../drm/i915/gt/intel_execlists_submission.c | 24 --- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 21829439e686..30631cc690f2 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -294,11 +294,26 @@ static int virtual_prio(const struct intel_engine_execlists *el) return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; } +static bool can_preempt(const struct intel_engine_cs *engine) +{ + if (GRAPHICS_VER(engine->i915) > 8) + return true; + + if (IS_CHERRYVIEW(engine->i915) || IS_BROADWELL(engine->i915)) + return false; + + /* GPGPU on bdw requires extra w/a; not implemented */ + return engine->class != RENDER_CLASS; Aren't BDW and CHV the only Gen8 platforms, in which case this function can be simplified as: ... { return GRAPHICS_VER(engine->i915) > 8; } ? 
+} + static bool need_preempt(const struct intel_engine_cs *engine, const struct i915_request *rq) { int last_prio; + if ((GRAPHICS_VER(engine->i915) <= 8) && can_preempt(engine)) The GRAPHICS_VER check here looks redundant with the one inside can_preempt(). Regards, Tvrtko + return false; + if (!intel_engine_has_semaphores(engine)) return false; @@ -3313,15 +3328,6 @@ static void remove_from_engine(struct i915_request *rq) i915_request_notify_execute_cb_imm(rq); } -static bool can_preempt(struct intel_engine_cs *engine) -{ - if (GRAPHICS_VER(engine->i915) > 8) - return true; - - /* GPGPU on bdw requires extra w/a; not implemented */ - return engine->class != RENDER_CLASS; -} - static void kick_execlists(const struct i915_request *rq, int prio) { struct intel_engine_cs *engine = rq->engine;
[PULL] drm-intel-gt-next
Hi Dave, Sima, The final pull for 6.11 is quite small and only contains a handful of fixes in areas such as stolen memory probing on ATS-M, GuC priority handling, out of memory reporting noise downgrade and a fence register handling race condition reported by CI. Regards, Tvrtko drm-intel-gt-next-2024-07-04: Driver Changes: Fixes/improvements/new stuff: - Downgrade stolen lmem setup warning [gem] (Jonathan Cavitt) - Evaluate GuC priority within locks [gt/uc] (Andi Shyti) - Fix potential UAF by revoke of fence registers [gt] (Janusz Krzysztofik) - Return NULL instead of '0' [gem] (Andi Shyti) - Use the correct format specifier for resource_size_t [gem] (Andi Shyti) - Suppress oom warning in favour of ENOMEM to userspace [gem] (Nirmoy Das) Miscellaneous: - Evaluate forcewake usage within locks [gt] (Andi Shyti) - Fix typo in comment [gt/uc] (Andi Shyti) The following changes since commit 79655e867ad6dfde2734c67c7704c0dd5bf1e777: drm/i915/mtl: Update workaround 14018575942 (2024-06-11 16:06:20 +0200) are available in the Git repository at: https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-gt-next-2024-07-04 for you to fetch changes up to 3b85152cb167bd24fe84ceb91b719b5904ca354f: drm/i915/gem: Suppress oom warning in favour of ENOMEM to userspace (2024-06-28 00:11:01 +0200) Driver Changes: Fixes/improvements/new stuff: - Downgrade stolen lmem setup warning [gem] (Jonathan Cavitt) - Evaluate GuC priority within locks [gt/uc] (Andi Shyti) - Fix potential UAF by revoke of fence registers [gt] (Janusz Krzysztofik) - Return NULL instead of '0' [gem] (Andi Shyti) - Use the correct format specifier for resource_size_t [gem] (Andi Shyti) - Suppress oom warning in favour of ENOMEM to userspace [gem] (Nirmoy Das) Miscellaneous: - Evaluate forcewake usage within locks [gt] (Andi Shyti) - Fix typo in comment [gt/uc] (Andi Shyti) Andi Shyti (5): drm/i915/gt: debugfs: Evaluate forcewake usage within locks drm/i915/gt/uc: Fix typo in comment drm/i915/gt/uc: Evaluate GuC 
priority within locks drm/i915/gem: Return NULL instead of '0' drm/i915/gem: Use the correct format specifier for resource_size_t Janusz Krzysztofik (1): drm/i915/gt: Fix potential UAF by revoke of fence registers Jonathan Cavitt (1): drm/i915/gem: Downgrade stolen lmem setup warning Nirmoy Das (1): drm/i915/gem: Suppress oom warning in favour of ENOMEM to userspace drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 8 +-- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 1 + drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 4 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 2 +- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 27 ++- drivers/gpu/drm/i915/i915_scatterlist.c | 8 +++ 6 files changed, 32 insertions(+), 18 deletions(-)
Re: [PATCH 1/4] drm/scheduler: implement hardware time accounting
Hi, I few questions below. On 01/07/2024 18:14, Lucas Stach wrote: From: Christian König Multiple drivers came up with the requirement to measure how much runtime each entity accumulated on the HW. A previous attempt of accounting this had to be reverted because HW submissions can have a lifetime exceeding that of the entity originally issuing them. Amdgpu on the other hand solves this task by keeping track of all the submissions and calculating how much time they have used on demand. Move this approach over into the scheduler to provide an easy to use interface for all drivers. Signed-off-by: Christian König Signed-off-by: Lucas Stach --- v2: - rebase to v6.10-rc1 - fix for non-power-of-two number of HW submission - add comment explaining the logic behind the fence tracking array - rename some function and fix documentation --- drivers/gpu/drm/scheduler/sched_entity.c | 82 +++- drivers/gpu/drm/scheduler/sched_fence.c | 19 ++ include/drm/gpu_scheduler.h | 31 + 3 files changed, 131 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 58c8161289fe..d678d0b9b29e 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -62,7 +62,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, unsigned int num_sched_list, atomic_t *guilty) { - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0]))) + unsigned int i, num_submissions = 0; + + if (!entity || !sched_list) return -EINVAL; memset(entity, 0, sizeof(struct drm_sched_entity)); @@ -98,6 +100,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, (s32) DRM_SCHED_PRIORITY_KERNEL); } entity->rq = sched_list[0]->sched_rq[entity->priority]; + + for (i = 0; i < num_sched_list; ++i) { + num_submissions = max(num_submissions, + sched_list[i]->credit_limit); + } Does this work (in concept and naming) for all drivers if introduction of credits broke the 1:1 between jobs and 
hw "ring" capacity? How big is the array for different drivers? } init_completion(&entity->entity_idle); @@ -110,11 +117,52 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, atomic_set(&entity->fence_seq, 0); entity->fence_context = dma_fence_context_alloc(2); + spin_lock_init(&entity->accounting_lock); + + if (!num_submissions) + return 0; + + entity->max_hw_submissions = num_submissions; + entity->hw_submissions = kcalloc(num_submissions, sizeof(void *), +GFP_KERNEL); + if (!entity->hw_submissions) + return -ENOMEM; return 0; } EXPORT_SYMBOL(drm_sched_entity_init); +/** + * drm_sched_entity_time_spent - Accumulated HW runtime used by this entity + * @entity: scheduler entity to check + * + * Get the current accumulated HW runtime used by all submissions made through + * this entity. + */ +ktime_t drm_sched_entity_time_spent(struct drm_sched_entity *entity) +{ + ktime_t result; + unsigned int i; + + if (!entity->max_hw_submissions) + return ns_to_ktime(0); + + spin_lock(&entity->accounting_lock); + result = entity->hw_time_used; + for (i = 0; i < entity->max_hw_submissions; ++i) { + struct drm_sched_fence *fence = entity->hw_submissions[i]; + + if (!fence) + continue; + + result = ktime_add(result, drm_sched_fence_get_runtime(fence)); Does this end up counting from when jobs have been submitted to the hw backend and may not be actually executing? Say if a driver configures a backend N deep and is filled with N jobs, while in actuality they are executed sequentially one at a time, the time as reported here would over-account by some series such as (job[0].finish - job[0].submit) + ... + (job[N].finish - job[N].submit)? Or in other words if one submits N jobs to a ring serving a 1-wide hw backend, will we see "N*100%" utilisation instead of "100%" if sampling while first job is still executing, the rest queued behind it? 
Regards, Tvrtko + } + spin_unlock(&entity->accounting_lock); + + return result; +} +EXPORT_SYMBOL(drm_sched_entity_time_spent); + /** * drm_sched_entity_modify_sched - Modify sched of an entity * @entity: scheduler entity to init @@ -326,6 +374,8 @@ EXPORT_SYMBOL(drm_sched_entity_flush); */ void drm_sched_entity_fini(struct drm_sched_entity *entity) { + unsigned int i; + /* * If consumption of existing IBs wasn't completed. Forcefully remove * them here. Also makes sure that the
Re: [RFC PATCH 2/6] drm/cgroup: Add memory accounting DRM cgroup
On 01/07/2024 10:25, Maarten Lankhorst wrote: Den 2024-06-28 kl. 16:04, skrev Maxime Ripard: Hi, On Thu, Jun 27, 2024 at 09:22:56PM GMT, Maarten Lankhorst wrote: Den 2024-06-27 kl. 19:16, skrev Maxime Ripard: Hi, Thanks for working on this! On Thu, Jun 27, 2024 at 05:47:21PM GMT, Maarten Lankhorst wrote: The initial version was based roughly on the rdma and misc cgroup controllers, with a lot of the accounting code borrowed from rdma. The current version is a complete rewrite with page counter; it uses the same min/low/max semantics as the memory cgroup as a result. There's a small mismatch as TTM uses u64, and page_counter long pages. In practice it's not a problem. 32-bit systems don't really come with >=4GB cards and as long as we're consistently wrong with units, it's fine. The device page size may not be in the same units as kernel page size, and each region might also have a different page size (VRAM vs GART for example). The interface is simple: - populate drmcgroup_device->regions[..] name and size for each active region, set num_regions accordingly. - Call drm(m)cg_register_device() - Use drmcg_try_charge to check if you can allocate a chunk of memory, use drmcg_uncharge when freeing it. This may return an error code, or -EAGAIN when the cgroup limit is reached. In that case a reference to the limiting pool is returned. - The limiting cs can be used as compare function for drmcs_evict_valuable. - After having evicted enough, drop reference to limiting cs with drmcs_pool_put. This API allows you to limit device resources with cgroups. You can see the supported cards in /sys/fs/cgroup/drm.capacity You need to echo +drm to cgroup.subtree_control, and then you can partition memory. Signed-off-by: Maarten Lankhorst Co-developed-by: Friedrich Vock I'm sorry, I should have written minutes on the discussion we had with TJ and Tvrtko the other day. We're all very interested in making this happen, but doing a "DRM" cgroup doesn't look like the right path to us. 
Indeed, we have a significant number of drivers that won't have a dedicated memory but will depend on DMA allocations one way or the other, and those pools are shared between multiple frameworks (DRM, V4L2, DMA-Buf Heaps, at least). This was also pointed out by Sima some time ago here: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/ So we'll want that cgroup subsystem to be cross-framework. We settled on a "device" cgroup during the discussion, but I'm sure we'll have plenty of bikeshedding. The other thing we agreed on, based on the feedback TJ got on the last iterations of his series was to go for memcg for drivers not using DMA allocations. It's the part where I expect some discussion there too :) So we went back to a previous version of TJ's work, and I've started to work on: - Integration of the cgroup in the GEM DMA and GEM VRAM helpers (this works on tidss right now) - Integration of all heaps into that cgroup but the system one (working on this at the moment) Should be similar to what I have then. I think you could use my work to continue it. I made nothing DRM specific except the name, if you renamed it the device resource management cgroup and changed the init function signature to take a name instead of a drm pointer, nothing would change. This is exactly what I'm hoping to accomplish, including reserving memory. I've started to work on rebasing my current work onto your series today, and I'm not entirely sure how what I described would best fit. Let's assume we have two KMS device, one using shmem, one using DMA allocations, two heaps, one using the page allocator, the other using CMA, and one v4l2 device using dma allocations. So we would have one KMS device and one heap using the page allocator, and one KMS device, one heap, and one v4l2 driver using the DMA allocator. Would these make different cgroup devices, or different cgroup regions? 
Each driver would register a device, whatever feels most logical for that device I suppose. My guess is that a prefix would also be nice here, so register a device with name of drm/$name or v4l2/$name, heap/$name. I didn't give it much thought and we're still experimenting, so just try something. :) There's no limit to amount of devices, I only fixed amount of pools to match TTM, but even that could be increased arbitrarily. I just don't think there is a point in doing so. Do we need a plan for top level controls which do not include region names? If the latter will be driver specific then I am thinking of ease of configuring it all from the outside. Especially considering that one cgroup can have multiple devices in it. Second question is about double accounting for shmem backed objects. I think they will be seen, for drivers which allocate backing store at buffer objects creation time, under the cgroup of process doing the creation, in the existing memory controller. Right? Is
Re: [PATCH 5/6] drm/amdgpu: always enable move threshold for BOs
Hey Christian, Any thoughts on the below reply? Did I get it wrong or I found a legitimate issue? Regards, Tvrtko On 14/06/2024 17:06, Tvrtko Ursulin wrote: On 14/06/2024 10:53, Christian König wrote: if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM && - !(adev->flags & AMD_IS_APU)) - places[c].flags |= TTM_PL_FLAG_FALLBACK; + !(adev->flags & AMD_IS_APU)) { + /* + * When GTT is just an alternative to VRAM make sure that we + * only use it as fallback and still try to fill up VRAM first. + */ + if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT) + places[c].flags |= TTM_PL_FLAG_FALLBACK; + + /* + * Enable GTT when the threshold of moved bytes is + * reached. This prevents any non essential buffer move + * when the links are already saturated. + */ + places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD; + } For the APU case I *think* this works, but for discrete I am not sure yet. Agree, APUs are basically already fine as they are. VRAM is just used so that it isn't wasted there. Well yeah it works, but because re-validation is broken so it cannot hit the broken migration budget. ;) As a side note and disclaimer, the TTM "resource compatible" logic has a half-life of about one week in my brain until I need to almost re-figure it all out. I don't know if it just me, but I find it really non-intuitive and almost like double, triple, or even quadruple negation way of thinking about things. Yeah I was also going back and forth between the different approaches multiple times and just ended up in this implementation because it seemed to do what I wanted to have. It's certainly not very intuitive what's going on here. It is not helping that with this proposal you set threshold on just one of the possible object placements which further increases the asymmetry. For me intuitive thing would be that thresholds apply to the act of changing the current placement directly. Not indirectly via playing with one of the placement flags dynamically. 
Interesting idea, how would the handling then be? Currently we have only the stages - 'don't evict' and 'evict'. Should we make it something more like 'don't move', 'move', 'evict' ? Intuitively I would think "don't move" aligns with the "out of migration budget" concept. Since in this patch you add move_threshold to ttm_operation_context could it simply be used as the overall criteria if it is set? In a way like: 1. If the current placement is from the list of userspace supplied valid ones, and 2. Migration limit has been set, and 3. It is spent. -> Then just don't migrate, return "all is good" from ttm_bo_validate. Though I am not sure at the moment how that would interact with the amdgpu_evict_flags and placements userspace did not specify. Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it will be considered compatible while under the migration budget? (Side note, the fact both flags are set I also find very difficult to mentally model.) Say a buffer was evicted to GTT already. What then brings it back to VRAM? The first subsequent ttm_bo_validate pass (!evicting) says GTT is fine (applicable) while ctx->bytes_moved < ctx->move_threshold, no? Isn't that the opposite of what would be required and causes nothing to be migrated back in? What am I missing? The flag says that GTT is fine when ctx->bytes_moved >= ctx->move_threshold. The logic is exactly inverted to what you described. This way a BO will be moved back into VRAM as long as bytes moved doesn't exceed the threshold. I'm afraid I need to sketch it out... If buffer is currently in GTT and placements are VRAM+GTT. ttm_bo_validate(evicting=false) 1st iteration: res=GTT != place=VRAM continue 2nd iteration: res=GTT == place=GTT+FALLBACK+THRESHOLD ttm_place_applicable(GTT) moved < threshold return true Buffer stays in GTT while under migration budget -> wrong, no? Or am I still confused? 
Regards, Tvrtko Setting both flags has the effect of saying: It's ok that the BO stays in GTT when you either above the move threshold or would have to evict something. Regards, Christian. Regards, Tvrtko
Re: [PATCH] dma-buf/sw_sync: Add a reference when adding fence to timeline list
On 14/06/2024 19:00, Thadeu Lima de Souza Cascardo wrote: On Fri, Jun 14, 2024 at 11:52:03AM +0100, Tvrtko Ursulin wrote: On 24/03/2024 10:15, Thadeu Lima de Souza Cascardo wrote: commit e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence signal") fixed a recursive locking when a signal callback released a fence. It did it by taking an extra reference while traversing it on the list and holding the timeline lock. However, this is racy and may end up adding to a kref that is 0, triggering a well deserved warning, as later that reference would be put again. CPU 0 CPU 1 sync_file_release sync_timeline_signal dma_fence_put timeline_fence_release spin_lock_irq(&obj->lock) dma_fence_get(&pt->base) spin_lock_irqsave(fence->lock, flags) As shown above, it is possible for the last reference to be dropped, but sync_timeline_signal takes the lock before timeline_fence_release, which will lead to a 0->1 kref transition, which is not allowed. This is because there is still a pointer to the fence object in the list, which should be accounted as a reference. In previous discussions about this [3], it was called out that keeping such a reference was not a good idea because the fence also holds a reference to the timeline, hence leading to a loop. However, accounting for that reference doesn't change that the loop already exists. And userspace holds references in the form of file descriptors, so it is still possible to avoid potential memory leaks. This fix also avoids other issues. The nested locking is still possible to trigger when closing the timeline, as sw_sync_debugfs_release also calls dma_fence_signal_locked while holding the lock. By holding a reference and releasing it only after doing the signal, that nested locking is avoided. There are a few quirks about the reference counting here, though. In the simple case when sync_pt_create adds a new fence to the list, it returns with 2 references instead of 1. 
That is dealt with as sw_sync_ioctl_create_fence always puts a reference after calling sync_file_create. That is necessary for multiple reasons. One is that it takes care of the error case when sync_file_create fails. The extra reference is put, while the fence is still held on the list, so its last reference will be put when it is removed from the list either in sync_timeline_signal or sw_sync_debugfs_release. So any fences where sync_file_create failed linger around until sw_sync_debugfs_release? Okay-ish I guess since it is a pathological case. The challenge here is to determine which one of the multiple cases we are dealing with. Since we don't hold the lock while sync_file_create is called, we are left with this situation. An alternative would be to fold sync_pt_create into sw_sync_ioctl_create_fence, so at least we can determine which case is which. That would also fix the case where we handle userspace a file descriptor with a fence that is not even on the list. Since sync_pt_create is local and has only this single caller it could be worth exploring this option to see if it could simplify things and get rid of this lingering objects corner case. It also avoids the race when a signal may come in between sync_pt_create and sync_file_create as the lock is dropped. If that happens, the fence will be removed from the list, but a reference will still be kept as sync_file_create takes a reference. Then, there is the case when a fence with the given seqno already exists. sync_pt_create returns with an extra reference to it, that we later put. Similar reasoning can be applied here. That one extra reference is necessary to avoid a race with signaling (and release), and we later put that extra reference. Finally, there is the case when the fence is already signaled and not added to the list. In such case, sync_pt_create must return with a single reference as this fence has not been added to the timeline list. 
It will either be freed in case sync_file_create fails or the file will keep its reference, which is later put when the file is released. This is based on Chris Wilson's attempt [2] to fix recursive locking during timeline signal. Hence, their signoff. Link: https://lore.kernel.org/all/20200714154102.450826-1-...@basnieuwenhuizen.nl/ [1] Link: https://lore.kernel.org/all/20200715100432.13928-2-ch...@chris-wilson.co.uk/ [2] Link: https://lore.kernel.org/all/20230817213729.110087-1-robdcl...@gmail.com/T/ [3] Fixes: e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence signal") Signed-off-by: Chris Wilson Signed-off-by: Thadeu Lima de Souza Cascardo Cc: Chris Wilson Cc: Bas Nieuwenhuizen Cc: Rob Clark --- drivers/dma-buf/sw_sync.c | 42 --- 1 file changed, 17 insertions(+), 25 deletions(-) diff --git a/drivers/dma-buf/sw_sync.c b
Re: [PATCH 5/6] drm/amdgpu: always enable move threshold for BOs
On 14/06/2024 10:53, Christian König wrote: if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM && - !(adev->flags & AMD_IS_APU)) - places[c].flags |= TTM_PL_FLAG_FALLBACK; + !(adev->flags & AMD_IS_APU)) { + /* + * When GTT is just an alternative to VRAM make sure that we + * only use it as fallback and still try to fill up VRAM first. + */ + if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT) + places[c].flags |= TTM_PL_FLAG_FALLBACK; + + /* + * Enable GTT when the threshold of moved bytes is + * reached. This prevents any non essential buffer move + * when the links are already saturated. + */ + places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD; + } For the APU case I *think* this works, but for discrete I am not sure yet. Agree, APUs are basically already fine as they are. VRAM is just used so that it isn't wasted there. Well yeah, it works, but only because re-validation is broken, so it cannot hit the broken migration budget. ;) As a side note and disclaimer, the TTM "resource compatible" logic has a half-life of about one week in my brain until I need to almost re-figure it all out. I don't know if it is just me, but I find it really non-intuitive and almost a double-, triple-, or even quadruple-negation way of thinking about things. Yeah, I was also going back and forth between the different approaches multiple times and just ended up with this implementation because it seemed to do what I wanted to have. It's certainly not very intuitive what's going on here. It is not helping that with this proposal you set the threshold on just one of the possible object placements, which further increases the asymmetry. For me the intuitive thing would be that thresholds apply to the act of changing the current placement directly. Not indirectly via playing with one of the placement flags dynamically. Interesting idea, how would the handling then be? Currently we have only the stages - 'don't evict' and 'evict'. Should we make it something more like 'don't move', 'move', 'evict' ?
Intuitively I would think "don't move" aligns with the "out of migration budget" concept. Since in this patch you add move_threshold to ttm_operation_context, could it simply be used as the overall criterion if it is set? In a way like:

1. If the current placement is from the list of userspace supplied valid ones, and
2. Migration limit has been set, and
3. It is spent.
-> Then just don't migrate, return "all is good" from ttm_bo_validate.

Though I am not sure at the moment how that would interact with the amdgpu_evict_flags and placements userspace did not specify. Anyway, let's see... So you set TTM_PL_FLAG_MOVE_THRESHOLD and TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it will be considered compatible while under the migration budget? (Side note, the fact both flags are set I also find very difficult to mentally model.) Say a buffer was evicted to GTT already. What then brings it back to VRAM? The first subsequent ttm_bo_validate pass (!evicting) says GTT is fine (applicable) while ctx->bytes_moved < ctx->move_threshold, no? Isn't that the opposite of what would be required and causes nothing to be migrated back in? What am I missing? The flag says that GTT is fine when ctx->bytes_moved >= ctx->move_threshold. The logic is exactly inverted to what you described. This way a BO will be moved back into VRAM as long as bytes moved doesn't exceed the threshold. I'm afraid I need to sketch it out...

If buffer is currently in GTT and placements are VRAM+GTT:

ttm_bo_validate(evicting=false)
  1st iteration: res=GTT != place=VRAM -> continue
  2nd iteration: res=GTT == place=GTT+FALLBACK+THRESHOLD
    ttm_place_applicable(GTT)
      moved < threshold -> return true

Buffer stays in GTT while under migration budget -> wrong, no? Or am I still confused? Regards, Tvrtko Setting both flags has the effect of saying: It's ok that the BO stays in GTT when you are either above the move threshold or would have to evict something. Regards, Christian. Regards, Tvrtko
Re: [PATCH] dma-buf/sw_sync: Add a reference when adding fence to timeline list
On 24/03/2024 10:15, Thadeu Lima de Souza Cascardo wrote: commit e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence signal") fixed recursive locking when a signal callback released a fence. It did it by taking an extra reference while traversing it on the list and holding the timeline lock. However, this is racy and may end up adding to a kref that is 0, triggering a well-deserved warning, as later that reference would be put again.

CPU 0                                     CPU 1
sync_file_release                         sync_timeline_signal
  dma_fence_put                             spin_lock_irq(&obj->lock)
    timeline_fence_release                  dma_fence_get(&pt->base)
      spin_lock_irqsave(fence->lock, flags)

As shown above, it is possible for the last reference to be dropped, but sync_timeline_signal takes the lock before timeline_fence_release, which will lead to a 0->1 kref transition, which is not allowed. This is because there is still a pointer to the fence object in the list, which should be accounted as a reference. In previous discussions about this [3], it was called out that keeping such a reference was not a good idea because the fence also holds a reference to the timeline, hence leading to a loop. However, accounting for that reference doesn't change that the loop already exists. And userspace holds references in the form of file descriptors, so it is still possible to avoid potential memory leaks. This fix also avoids other issues. The nested locking is still possible to trigger when closing the timeline, as sw_sync_debugfs_release also calls dma_fence_signal_locked while holding the lock. By holding a reference and releasing it only after doing the signal, that nested locking is avoided. There are a few quirks about the reference counting here, though. In the simple case when sync_pt_create adds a new fence to the list, it returns with 2 references instead of 1. That is dealt with as sw_sync_ioctl_create_fence always puts a reference after calling sync_file_create. That is necessary for multiple reasons.
One is that it takes care of the error case when sync_file_create fails. The extra reference is put, while the fence is still held on the list, so its last reference will be put when it is removed from the list either in sync_timeline_signal or sw_sync_debugfs_release. So any fences where sync_file_create failed linger around until sw_sync_debugfs_release? Okay-ish I guess since it is a pathological case. It also avoids the race when a signal may come in between sync_pt_create and sync_file_create as the lock is dropped. If that happens, the fence will be removed from the list, but a reference will still be kept as sync_file_create takes a reference. Then, there is the case when a fence with the given seqno already exists. sync_pt_create returns with an extra reference to it, that we later put. Similar reasoning can be applied here. That one extra reference is necessary to avoid a race with signaling (and release), and we later put that extra reference. Finally, there is the case when the fence is already signaled and not added to the list. In that case, sync_pt_create must return with a single reference as this fence has not been added to the timeline list. It will either be freed in case sync_file_create fails or the file will keep its reference, which is later put when the file is released. This is based on Chris Wilson's attempt [2] to fix recursive locking during timeline signal. Hence, their signoff.
Link: https://lore.kernel.org/all/20200714154102.450826-1-...@basnieuwenhuizen.nl/ [1] Link: https://lore.kernel.org/all/20200715100432.13928-2-ch...@chris-wilson.co.uk/ [2] Link: https://lore.kernel.org/all/20230817213729.110087-1-robdcl...@gmail.com/T/ [3] Fixes: e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence signal") Signed-off-by: Chris Wilson Signed-off-by: Thadeu Lima de Souza Cascardo Cc: Chris Wilson Cc: Bas Nieuwenhuizen Cc: Rob Clark --- drivers/dma-buf/sw_sync.c | 42 --- 1 file changed, 17 insertions(+), 25 deletions(-) diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index c353029789cf..83b624ac4faa 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -151,16 +151,7 @@ static const char *timeline_fence_get_timeline_name(struct dma_fence *fence) static void timeline_fence_release(struct dma_fence *fence) { - struct sync_pt *pt = dma_fence_to_sync_pt(fence); struct sync_timeline *parent = dma_fence_parent(fence); - unsigned long flags; - - spin_lock_irqsave(fence->lock, flags); - if (!list_empty(&pt->link)) { - list_del(&pt->link); - rb_erase(&pt->node, &parent->pt_tree); - } - spin_unlock_irqrestore(fence->lock, flags); sync_timeline_put(parent); dma_fence_free(fence); @@ -229,7 +220,6 @@ static const struct
[PULL] drm-intel-gt-next
Hi Dave, Sima, Here is the main pull request for drm-intel-gt-next targeting 6.11. First is the new userspace API for allowing upload of custom context state used for replaying GPU hang error state captures. This will be used by Mesa (see https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594) for debugging GPU hangs captured in the wild on real hardware. So far that was only possible under simulation, and even then via some hacks. Also, simulation in general has certain limitations on what hangs it can reproduce. As the UAPI is intended for Mesa developers only, it is hidden behind a kconfig option and runtime enablement switches. Then there are fixes for hangs on Meteorlake due to an incorrect reduced CCS configuration and a missing video engine workaround. Then fixes for a couple of race conditions in multi-GT and breadcrumb handling, and a more robust function level reset (FLR) achieved by extending the timeout used. A couple of tiny cleanups here and there, and finally one back-merge which was required to land some display code base refactoring.
Regards, Tvrtko drm-intel-gt-next-2024-06-12: UAPI Changes: - Support replaying GPU hangs with captured context image (Tvrtko Ursulin) Driver Changes: Fixes/improvements/new stuff: - Automate CCS Mode setting during engine resets [gt] (Andi Shyti) - Revert "drm/i915: Remove extra multi-gt pm-references" (Janusz Krzysztofik) - Fix HAS_REGION() usage in intel_gt_probe_lmem() (Ville Syrjälä) - Disarm breadcrumbs if engines are already idle [gt] (Chris Wilson) - Shadow default engine context image in the context (Tvrtko Ursulin) - Support replaying GPU hangs with captured context image (Tvrtko Ursulin) - avoid FIELD_PREP warning [guc] (Arnd Bergmann) - Fix CCS id's calculation for CCS mode setting [gt] (Andi Shyti) - Increase FLR timeout from 3s to 9s (Andi Shyti) - Update workaround 14018575942 [mtl] (Angus Chen) Future platform enablement: - Enable w/a 16021333562 for DG2, MTL and ARL [guc] (John Harrison) Miscellaneous: - Pass the region ID rather than a bitmask to HAS_REGION() (Ville Syrjälä) - Remove counter productive REGION_* wrappers (Ville Syrjälä) - Fix typo [gem/i915_gem_ttm_move] (Deming Wang) - Delete the live_hearbeat_fast selftest [gt] (Krzysztof Niemiec) The following changes since commit 431c590c3ab0469dfedad3a832fe73556396ee52: drm/tests: Add a unit test for range bias allocation (2024-05-16 12:50:14 +1000) are available in the Git repository at: https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-gt-next-2024-06-12 for you to fetch changes up to 79655e867ad6dfde2734c67c7704c0dd5bf1e777: drm/i915/mtl: Update workaround 14018575942 (2024-06-11 16:06:20 +0200) UAPI Changes: - Support replaying GPU hangs with captured context image (Tvrtko Ursulin) Driver Changes: Fixes/improvements/new stuff: - Automate CCS Mode setting during engine resets [gt] (Andi Shyti) - Revert "drm/i915: Remove extra multi-gt pm-references" (Janusz Krzysztofik) - Fix HAS_REGION() usage in intel_gt_probe_lmem() (Ville Syrjälä) - Disarm breadcrumbs if engines are 
already idle [gt] (Chris Wilson) - Shadow default engine context image in the context (Tvrtko Ursulin) - Support replaying GPU hangs with captured context image (Tvrtko Ursulin) - avoid FIELD_PREP warning [guc] (Arnd Bergmann) - Fix CCS id's calculation for CCS mode setting [gt] (Andi Shyti) - Increase FLR timeout from 3s to 9s (Andi Shyti) - Update workaround 14018575942 [mtl] (Angus Chen) Future platform enablement: - Enable w/a 16021333562 for DG2, MTL and ARL [guc] (John Harrison) Miscellaneous: - Pass the region ID rather than a bitmask to HAS_REGION() (Ville Syrjälä) - Remove counter productive REGION_* wrappers (Ville Syrjälä) - Fix typo [gem/i915_gem_ttm_move] (Deming Wang) - Delete the live_hearbeat_fast selftest [gt] (Krzysztof Niemiec) Andi Shyti (3): drm/i915/gt: Automate CCS Mode setting during engine resets drm/i915/gt: Fix CCS id's calculation for CCS mode setting drm/i915: Increase FLR timeout from 3s to 9s Angus Chen (1): drm/i915/mtl: Update workaround 14018575942 Arnd Bergmann (1): drm/i915/guc: avoid FIELD_PREP warning Chris Wilson (1): drm/i915/gt: Disarm breadcrumbs if engines are already idle Deming Wang (1): drm/i915/gem/i915_gem_ttm_move: Fix typo Janusz Krzysztofik (1): Revert "drm/i915: Remove extra multi-gt pm-references" John Harrison (1): drm/i915/guc: Enable w/a 16021333562 for DG2, MTL and ARL Niemiec, Krzysztof (1): drm/i915/gt: Delete the live_hearbeat_fast selftest Tvrtko Ursulin (3): Merge drm/drm-next into drm-intel-gt-next drm/i915: Shadow default engine context image in the context drm/i915: Support replaying GPU hangs with captured context image Ville Syrjälä (3): drm/i915: Fix HAS_REGION() usage in i
Re: [PATCH 5/6] drm/amdgpu: always enable move threshold for BOs
Hi Christian, On 04/06/2024 17:05, Christian König wrote: This should prevent buffer moves when the threshold is reached during CS. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 36 -- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 + 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index ec888fc6ead8..9a217932a4fc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -784,7 +784,6 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo) .no_wait_gpu = false, .resv = bo->tbo.base.resv }; - uint32_t domain; int r; if (bo->tbo.pin_count) @@ -796,37 +795,28 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo) if (p->bytes_moved < p->bytes_moved_threshold && (!bo->tbo.base.dma_buf || list_empty(&bo->tbo.base.dma_buf->attachments))) { + + /* And don't move a CPU_ACCESS_REQUIRED BO to limited +* visible VRAM if we've depleted our allowance to do +* that. +*/ if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && - (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) { - /* And don't move a CPU_ACCESS_REQUIRED BO to limited -* visible VRAM if we've depleted our allowance to do -* that.
-*/ - if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) - domain = bo->preferred_domains; - else - domain = bo->allowed_domains; - } else { - domain = bo->preferred_domains; - } - } else { - domain = bo->allowed_domains; + (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && + p->bytes_moved_vis < p->bytes_moved_vis_threshold) + ctx.move_threshold = p->bytes_moved_vis_threshold - + p->bytes_moved_vis; + else + ctx.move_threshold = p->bytes_moved_vis_threshold - + p->bytes_moved; } -retry: - amdgpu_bo_placement_from_domain(bo, domain); + amdgpu_bo_placement_from_domain(bo, bo->allowed_domains); r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); p->bytes_moved += ctx.bytes_moved; if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && amdgpu_res_cpu_visible(adev, bo->tbo.resource)) p->bytes_moved_vis += ctx.bytes_moved; - - if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) { - domain = bo->allowed_domains; - goto retry; - } - return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8c92065c2d52..cae1a5420c58 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -168,13 +168,23 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; - /* -* When GTT is just an alternative to VRAM make sure that we -* only use it as fallback and still try to fill up VRAM first. -*/ + if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM && - !(adev->flags & AMD_IS_APU)) - places[c].flags |= TTM_PL_FLAG_FALLBACK; + !(adev->flags & AMD_IS_APU)) { + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. + */ + if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT) + places[c].flags |= TTM_PL_FLAG_FALLBACK; + + /* +* Enable GTT when the threshold of moved bytes is +* reached.
This prevents any non essential buffer move +* when the links are already saturated. +*/ + places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD; + } For the APU case I *think* this works, but for discrete I am not sure yet. As a side note and disclaimer, the TTM "resource compatible" logic has a half-life of about one week in my brain until I need to almost re-figure it all out. I don't know if it is just me, but I find it really
Re: [PATCH] drm/i915/gt: debugfs: Evaluate forcewake usage within locks
On 10/06/2024 10:24, Nirmoy Das wrote: Hi Andi, On 6/7/2024 4:51 PM, Andi Shyti wrote: The forcewake count and domains listing is multi process critical and the uncore provides a spinlock for such cases. Lock the forcewake evaluation section in the fw_domains_show() debugfs interface. Signed-off-by: Andi Shyti Needs a Fixes tag, below seems to be the correct one. Fixes: 9dd4b065446a ("drm/i915/gt: Move pm debug files into a gt aware debugfs") Cc: # v5.6+ Reviewed-by: Nirmoy Das What is the back story here and why would it need backporting? IGT cares about the atomic view of user_forcewake_count and individual domains or what? Regards, Tvrtko Regards, Nirmoy --- drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c index 4fcba42cfe34..0437fd8217e0 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c @@ -71,6 +71,8 @@ static int fw_domains_show(struct seq_file *m, void *data) struct intel_uncore_forcewake_domain *fw_domain; unsigned int tmp; + spin_lock_irq(&uncore->lock); + seq_printf(m, "user.bypass_count = %u\n", uncore->user_forcewake_count); @@ -79,6 +81,8 @@ static int fw_domains_show(struct seq_file *m, void *data) intel_uncore_forcewake_domain_to_str(fw_domain->id), READ_ONCE(fw_domain->wake_count)); + spin_unlock_irq(&uncore->lock); + return 0; } DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(fw_domains);
Re: [PATCH] drm/i915/gt: Delete the live_hearbeat_fast selftest
Hi Andi, On 10/06/2024 13:10, Andi Shyti wrote: Hi Tvrtko, On Mon, Jun 10, 2024 at 12:42:31PM +0100, Tvrtko Ursulin wrote: On 03/06/2024 17:20, Niemiec, Krzysztof wrote: The test is trying to push the heartbeat frequency to the limit, which might sometimes fail. Such a failure does not provide valuable information, because it does not indicate that there is something necessarily wrong with either the driver or the hardware. Remove the test to prevent random, unnecessary failures from appearing in CI. Suggested-by: Chris Wilson Signed-off-by: Niemiec, Krzysztof Just a note in passing that a comma in the email display name is, I believe, not RFC 5322 compliant and there might be tools which barf on it(*). If you can put it in double quotes, it would be advisable. yes, we discussed it with Krzysztof, I noticed it right after I submitted the code. Regards, Tvrtko *) Such as my internal pull request generator which uses CPAN's Email::Address::XS. :) If we are in time, we can fix it as Krzysztof Niemiec Sorry about this oversight. It's not a big deal (it isn't the first and only occurrence) and no need to do anything more than correct the display name going forward. Regards, Tvrtko
Re: [PATCH] drm/i915/gt: Delete the live_hearbeat_fast selftest
On 03/06/2024 17:20, Niemiec, Krzysztof wrote: The test is trying to push the heartbeat frequency to the limit, which might sometimes fail. Such a failure does not provide valuable information, because it does not indicate that there is something necessarily wrong with either the driver or the hardware. Remove the test to prevent random, unnecessary failures from appearing in CI. Suggested-by: Chris Wilson Signed-off-by: Niemiec, Krzysztof Just a note in passing that comma in the email display name is I believe not RFC 5322 compliant and there might be tools which barf on it(*). If you can put it in double quotes, it would be advisable. Regards, Tvrtko *) Such as my internal pull request generator which uses CPAN's Email::Address::XS. :) --- .../drm/i915/gt/selftest_engine_heartbeat.c | 110 -- 1 file changed, 110 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c index ef014df4c4fc..9e4f0e417b3b 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c @@ -193,115 +193,6 @@ static int live_idle_pulse(void *arg) return err; } -static int cmp_u32(const void *_a, const void *_b) -{ - const u32 *a = _a, *b = _b; - - return *a - *b; -} - -static int __live_heartbeat_fast(struct intel_engine_cs *engine) -{ - const unsigned int error_threshold = max(2u, jiffies_to_usecs(6)); - struct intel_context *ce; - struct i915_request *rq; - ktime_t t0, t1; - u32 times[5]; - int err; - int i; - - ce = intel_context_create(engine); - if (IS_ERR(ce)) - return PTR_ERR(ce); - - intel_engine_pm_get(engine); - - err = intel_engine_set_heartbeat(engine, 1); - if (err) - goto err_pm; - - for (i = 0; i < ARRAY_SIZE(times); i++) { - do { - /* Manufacture a tick */ - intel_engine_park_heartbeat(engine); - GEM_BUG_ON(engine->heartbeat.systole); - engine->serial++; /* pretend we are not idle! 
*/ - intel_engine_unpark_heartbeat(engine); - - flush_delayed_work(&engine->heartbeat.work); - if (!delayed_work_pending(&engine->heartbeat.work)) { - pr_err("%s: heartbeat %d did not start\n", - engine->name, i); - err = -EINVAL; - goto err_pm; - } - - rcu_read_lock(); - rq = READ_ONCE(engine->heartbeat.systole); - if (rq) - rq = i915_request_get_rcu(rq); - rcu_read_unlock(); - } while (!rq); - - t0 = ktime_get(); - while (rq == READ_ONCE(engine->heartbeat.systole)) - yield(); /* work is on the local cpu! */ - t1 = ktime_get(); - - i915_request_put(rq); - times[i] = ktime_us_delta(t1, t0); - } - - sort(times, ARRAY_SIZE(times), sizeof(times[0]), cmp_u32, NULL); - - pr_info("%s: Heartbeat delay: %uus [%u, %u]\n", - engine->name, - times[ARRAY_SIZE(times) / 2], - times[0], - times[ARRAY_SIZE(times) - 1]); - - /* -* Ideally, the upper bound on min work delay would be something like -* 2 * 2 (worst), +1 for scheduling, +1 for slack. In practice, we -* are, even with system_wq_highpri, at the mercy of the CPU scheduler -* and may be stuck behind some slow work for many millisecond. Such -* as our very own display workers. -*/ - if (times[ARRAY_SIZE(times) / 2] > error_threshold) { - pr_err("%s: Heartbeat delay was %uus, expected less than %dus\n", - engine->name, - times[ARRAY_SIZE(times) / 2], - error_threshold); - err = -EINVAL; - } - - reset_heartbeat(engine); -err_pm: - intel_engine_pm_put(engine); - intel_context_put(ce); - return err; -} - -static int live_heartbeat_fast(void *arg) -{ - struct intel_gt *gt = arg; - struct intel_engine_cs *engine; - enum intel_engine_id id; - int err = 0; - - /* Check that the heartbeat ticks at the desired rate. */ - if (!CONFIG_DRM_I915_HEARTBEAT_INTERVAL) - return 0; - - for_each_engine(engine, gt, id) { - err = __live_heartbeat_fast(engine); - if (err) - break; - } - - return err; -} - static int __live_heartbeat_off(struct intel_engine_cs *engine) { int err; @@ -372,7 +263,6 @@ int
Re: [PATCH] drm/v3d: Fix perfmon build error/warning
On 05/06/2024 08:19, Iago Toral wrote: Thanks for looking at fixing this Tvrtko. On Tue, 04-06-2024 at 17:02 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Move static const array into the source file to fix the "defined but not used" errors. The fix is perhaps not the prettiest due to hand-crafting the array sizes in v3d_performance_counters.h, but I did add some build time asserts to validate the counts look sensible, so hopefully it is good enough for a quick fix. If we need this to go in ASAP I am fine with this patch as-is, so: Reviewed-by: Iago Toral Quiroga With that said, if we are still in time for a bit of iteration may I suggest that instead of hard-coding the counters we instead add helper functions in drivers/gpu/drm/v3d/v3d_perfmon.c that call ARRAY_SIZE on the corresponding array based on v3d->ver? It is fine if we prefer to merge this as-is and do this change later as a follow-up patch. I agree it isn't pretty and I was (and am) planning to see if things can be improved. The reason I gave up on a prettier solution in the original attempt is the fact that one array is statically sized (at build time) based on the max number of counters: /* Number of perfmons required to handle all supported performance counters */ #define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ DRM_V3D_MAX_PERF_COUNTERS) struct v3d_performance_query { /* Performance monitor IDs for this query */ u32 kperfmon_ids[V3D_MAX_PERFMONS]; So I need to see how to untangle that, and then perhaps even go a step further than the getters and move the whole perfmon init into v3d_perfmon.c.
Regards, Tvrtko Iago Signed-off-by: Tvrtko Ursulin Fixes: 3cbcbe016c31 ("drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202405211137.huefklkg-...@intel.com/ Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: Jani Nikula Cc: Ashutosh Dixit --- drivers/gpu/drm/v3d/v3d_drv.c | 4 +- drivers/gpu/drm/v3d/v3d_drv.h | 3 - drivers/gpu/drm/v3d/v3d_perfmon.c | 204 +- .../gpu/drm/v3d/v3d_performance_counters.h | 189 +--- 4 files changed, 205 insertions(+), 195 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index f7477488b1cc..a47f00b443d3 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -299,9 +299,9 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ if (v3d->ver >= 71) - v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters); + v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS; else if (v3d->ver >= 42) - v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters); + v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS; else v3d->max_counters = 0; diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 556cbb400ba0..099b962bdfde 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,9 +351,6 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; -/* Maximum number of performance counters supported by any version of V3D */ -#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters) - /* Number of perfmons required to handle all supported performance counters */ #define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ DRM_V3D_MAX_PERF_COUNTERS) diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index 73e2bb8bdb7f..b7d0b02e1a95 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -9,6 +9,192 @@ #define
V3D_PERFMONID_MIN 1 #define V3D_PERFMONID_MAX U32_MAX +static const struct v3d_perf_counter_desc v3d_v42_performance_counters[] = { + {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid primitives that result in no rendered pixels, for all rendered tiles"}, + {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives for all rendered tiles (primitives may be counted in more than one tile)"}, + {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"}, + {"FEP", "FEP-valid-quads", "[FEP] Valid quads"}, + {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no pixels passing the stencil test"}, + {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with no pixels passing the Z and stencil tests"}, + {"TLB", "TL
Re: [PATCH v3 6/7] drm/drm_file: add display of driver's internal memory size
On 06/06/2024 02:49, Adrián Larumbe wrote: Some drivers must allocate a considerable amount of memory for bookkeeping structures and GPU's MCU-kernel shared communication regions. These are often created as a result of the invocation of the driver's ioctl() interface functions, so it is sensible to consider them as being owned by the render context associated with an open drm file. However, at the moment drm_show_memory_stats only traverses the UM-exposed drm objects for which a handle exists. Private driver objects and memory regions, though connected to a render context, are unaccounted for in their fdinfo numbers. Add a new drm_memory_stats 'internal' memory category. Because deciding what constitutes an 'internal' object and where to find these are driver-dependent, calculation of this size must be done through a driver-provided function pointer, which becomes the third argument of drm_show_memory_stats. Drivers which have no interest in exposing the size of internal memory objects can keep passing NULL for unaltered behaviour. Signed-off-by: Adrián Larumbe Please Cc people who were previously involved in defining drm-usage-stats.rst. I added Rob, but off the top of my head I am not sure if I forgot someone. Internal as a category sounds potentially useful. One reservation I have though is it does not necessarily fit with the others but is something semantically different from them. In i915 I had a similar desire to account for internal objects and approached it by similarly tracking them outside the DRM idr but counting them under the existing respective categories and memory regions. Ie. internal objects can also be purgeable or not, etc, and can be backed by either system memory or device local memory. The advantage is it is more accurate in those aspects and does not require adding a new category. The downside is that 'internal' is bunched with the explicit userspace objects, so perhaps less accurate in this other aspect.
Regards, Tvrtko --- Documentation/gpu/drm-usage-stats.rst | 4 drivers/gpu/drm/drm_file.c | 9 +++-- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- include/drm/drm_file.h | 7 ++- 5 files changed, 19 insertions(+), 5 deletions(-) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..0da5ebecd232 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -157,6 +157,10 @@ The total size of buffers that are purgeable. The total size of buffers that are active on one or more engines. +- drm-internal-: [KiB|MiB] + +The total size of GEM objects that aren't exposed to user space. + Implementation Details == diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 638ffaf5..d1c13eed8d34 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -874,9 +874,10 @@ void drm_print_memory_stats(struct drm_printer *p, enum drm_gem_object_status supported_status, const char *region) { - print_size(p, "total", region, stats->private + stats->shared); + print_size(p, "total", region, stats->private + stats->shared + stats->internal); print_size(p, "shared", region, stats->shared); print_size(p, "active", region, stats->active); + print_size(p, "internal", region, stats->internal); if (supported_status & DRM_GEM_OBJECT_RESIDENT) print_size(p, "resident", region, stats->resident); @@ -890,11 +891,12 @@ EXPORT_SYMBOL(drm_print_memory_stats); * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats * @p: the printer to print output to * @file: the DRM file + * @func: driver-specific function pointer to count the size of internal objects * * Helper to iterate over GEM objects with a handle allocated in the specified * file. 
*/ -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, internal_bos func) { struct drm_gem_object *obj; struct drm_memory_stats status = {}; @@ -940,6 +942,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } spin_unlock(&file->table_lock); + if (func) + func(&status, file); + drm_print_memory_stats(p, &status, supported_status, "memory"); } EXPORT_SYMBOL(drm_show_memory_stats); diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 9c33f4e3f822..f97d3cdc4f50 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file) msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p); - drm_show_memory_stats(p,
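To illustrate the proposed hook, here is a minimal userspace-buildable sketch (not kernel code; the struct fields and callback shape mirror the patch above, but the driver callback and the sizes are made up) of how a driver-provided internal_bos function would fold handle-less objects into the fdinfo stats:

```c
#include <assert.h>
#include <stdio.h>

/* Mock of the drm_memory_stats fields relevant to the proposed change;
 * the real struct lives in include/drm/drm_file.h. */
struct drm_memory_stats {
	unsigned long long shared;
	unsigned long long private;
	unsigned long long internal;	/* new category proposed above */
};

struct drm_file;			/* opaque for this sketch */

/* Shape of the proposed third argument to drm_show_memory_stats(). */
typedef void (*internal_bos)(struct drm_memory_stats *stats,
			     struct drm_file *file);

/* Hypothetical driver callback: pretend this context owns two 64 KiB
 * kernel-private (handle-less) buffer objects. */
static void example_driver_internal(struct drm_memory_stats *stats,
				    struct drm_file *file)
{
	(void)file;
	stats->internal += 2 * 64 * 1024;
}

/* Tail of the helper after the patch: invoke the callback (if any) before
 * printing, so 'internal' contributes to the printed totals. */
static void show_memory_stats(struct drm_memory_stats *stats,
			      internal_bos func, struct drm_file *file)
{
	if (func)
		func(stats, file);
	printf("drm-total-memory:\t%llu KiB\n",
	       (stats->private + stats->shared + stats->internal) / 1024);
	printf("drm-internal-memory:\t%llu KiB\n", stats->internal / 1024);
}
```

Drivers with no interest in the category keep passing NULL and the output is unchanged.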
[PATCH] drm/v3d: Fix perfmon build error/warning
From: Tvrtko Ursulin Move static const array into the source file to fix the "defined but not used" errors. The fix is perhaps not the prettiest due to hand crafting the array sizes in v3d_performance_counters.h, but I did add some build time asserts to validate the counts look sensible, so hopefully it is good enough for a quick fix. Signed-off-by: Tvrtko Ursulin Fixes: 3cbcbe016c31 ("drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202405211137.huefklkg-...@intel.com/ Cc: Maíra Canal Cc: Iago Toral Quiroga Cc: Jani Nikula Cc: Ashutosh Dixit --- drivers/gpu/drm/v3d/v3d_drv.c | 4 +- drivers/gpu/drm/v3d/v3d_drv.h | 3 - drivers/gpu/drm/v3d/v3d_perfmon.c | 204 +- .../gpu/drm/v3d/v3d_performance_counters.h| 189 +--- 4 files changed, 205 insertions(+), 195 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index f7477488b1cc..a47f00b443d3 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -299,9 +299,9 @@ static int v3d_platform_drm_probe(struct platform_device *pdev) WARN_ON(v3d->cores > 1); /* multicore not yet implemented */ if (v3d->ver >= 71) - v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters); + v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS; else if (v3d->ver >= 42) - v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters); + v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS; else v3d->max_counters = 0; diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index 556cbb400ba0..099b962bdfde 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -351,9 +351,6 @@ struct v3d_timestamp_query { struct drm_syncobj *syncobj; }; -/* Maximum number of performance counters supported by any version of V3D */ -#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters) - /* Number of perfmons required to handle all supported performance counters */ #define 
V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \ DRM_V3D_MAX_PERF_COUNTERS) diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index 73e2bb8bdb7f..b7d0b02e1a95 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -9,6 +9,192 @@ #define V3D_PERFMONID_MIN 1 #define V3D_PERFMONID_MAX U32_MAX +static const struct v3d_perf_counter_desc v3d_v42_performance_counters[] = { + {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid primitives that result in no rendered pixels, for all rendered tiles"}, + {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives for all rendered tiles (primitives may be counted in more than one tile)"}, + {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"}, + {"FEP", "FEP-valid-quads", "[FEP] Valid quads"}, + {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no pixels passing the stencil test"}, + {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with no pixels passing the Z and stencil tests"}, + {"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any pixels passing the Z and stencil tests"}, + {"TLB", "TLB-quads-with-zero-coverage", "[TLB] Quads with all pixels having zero coverage"}, + {"TLB", "TLB-quads-with-non-zero-coverage", "[TLB] Quads with any pixels having non-zero coverage"}, + {"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid pixels written to colour buffer"}, + {"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives discarded by being outside the viewport"}, + {"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need clipping"}, + {"PTB", "PTB-primitives-discarded-reversed", "[PTB] Primitives that are discarded because they are reversed"}, + {"QPU", "QPU-total-idle-clk-cycles", "[QPU] Total idle clock cycles for all QPUs"}, + {"QPU", "QPU-total-active-clk-cycles-vertex-coord-shading", "[QPU] Total active clock cycles for all QPUs doing 
vertex/coordinate/user shading (counts only when QPU is not stalled)"}, + {"QPU", "QPU-total-ac
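The "build time asserts" mentioned in the commit message can be done with C11 _Static_assert against the hand-crafted counts. A standalone sketch of the pattern (the counter table and count value here are tiny made-up stand-ins, not the real V3D 4.2 list):

```c
#include <assert.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct v3d_perf_counter_desc {
	const char *category;
	const char *name;
	const char *description;
};

/* Hypothetical hand-crafted count, standing in for a header define such
 * as V3D_V42_NUM_PERFCOUNTERS. */
#define NUM_PERFCOUNTERS 3

static const struct v3d_perf_counter_desc counters[] = {
	{ "FEP", "FEP-valid-quads", "[FEP] Valid quads" },
	{ "TLB", "TLB-quads-with-zero-coverage",
	  "[TLB] Quads with all pixels having zero coverage" },
	{ "QPU", "QPU-total-idle-clk-cycles",
	  "[QPU] Total idle clock cycles for all QPUs" },
};

/* Fails the build if the hand-crafted count drifts from the table. */
_Static_assert(ARRAY_SIZE(counters) == NUM_PERFCOUNTERS,
	       "perf counter count mismatch");
```

Because the check is evaluated at compile time, a mismatch introduced by editing the array without updating the define cannot reach runtime.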
Re: [PATCH 2/2] drm/amdgpu: Use drm_print_memory_stats helper from fdinfo
Hi, On 20/05/2024 12:13, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Convert fdinfo memory stats to use the common drm_print_memory_stats helper. This achieves alignment with the common keys as documented in drm-usage-stats.rst, adding specifically drm-total- key the driver was missing until now. Additionally I made the code stop skipping total size for objects which currently do not have a backing store, and I added resident, active and purgeable reporting. Legacy keys have been preserved, with the outlook of only potentially removing only the drm-memory- when the time gets right. The example output now looks like this: pos: 0 flags:0212 mnt_id: 24 ino: 1239 drm-driver: amdgpu drm-client-id:4 drm-pdev: :04:00.0 pasid:32771 drm-total-cpu:0 drm-shared-cpu: 0 drm-active-cpu: 0 drm-resident-cpu: 0 drm-purgeable-cpu:0 drm-total-gtt:2392 KiB drm-shared-gtt: 0 drm-active-gtt: 0 drm-resident-gtt: 2392 KiB drm-purgeable-gtt:0 drm-total-vram: 44564 KiB drm-shared-vram: 31952 KiB drm-active-vram: 0 drm-resident-vram:44564 KiB drm-purgeable-vram: 0 drm-memory-vram: 44564 KiB drm-memory-gtt: 2392 KiB drm-memory-cpu: 0 KiB amd-memory-visible-vram: 44564 KiB amd-evicted-vram: 0 KiB amd-evicted-visible-vram: 0 KiB amd-requested-vram: 44564 KiB amd-requested-visible-vram: 11952 KiB amd-requested-gtt:2392 KiB drm-engine-compute: 46464671 ns v2: * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE. Any interest on this work from AMD side? As a summary it adds active and purgeable reporting and converts to using the drm_print_memory_stats helper for outputting all the fields as documented in drm-usage-stats.rst. 
Regards, Tvrtko Signed-off-by: Tvrtko Ursulin Cc: Alex Deucher Cc: Christian König Cc: Daniel Vetter Cc: Rob Clark --- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 48 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 96 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 35 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +- 6 files changed, 122 insertions(+), 81 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c index c7df7fa3459f..00a4ab082459 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c @@ -59,18 +59,21 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct amdgpu_fpriv *fpriv = file->driver_priv; struct amdgpu_vm *vm = &fpriv->vm; - struct amdgpu_mem_stats stats; + struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { }; ktime_t usage[AMDGPU_HW_IP_NUM]; - unsigned int hw_ip; + const char *pl_name[] = { + [TTM_PL_VRAM] = "vram", + [TTM_PL_TT] = "gtt", + [TTM_PL_SYSTEM] = "cpu", + }; + unsigned int hw_ip, i; int ret; - memset(&stats, 0, sizeof(stats)); - ret = amdgpu_bo_reserve(vm->root.bo, false); if (ret) return; - amdgpu_vm_get_memory(vm, &stats); + amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats)); amdgpu_bo_unreserve(vm->root.bo); amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage); @@ -82,24 +85,35 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) */ drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid); - drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL); - drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL); - drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL); + + for (i = 0; i < TTM_PL_PRIV; i++) + drm_print_memory_stats(p, + &stats[i].drm, + DRM_GEM_OBJECT_RESIDENT | + DRM_GEM_OBJECT_PURGEABLE, + pl_name[i]); + + /* Legacy amdgpu keys, alias to drm-resident-memory-: */ + drm_printf(p, 
"drm-memory-vram:\t%llu KiB\n", + stats[TTM_PL_VRAM].total/1024UL); + drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", + stats[TTM_PL_TT].total/1024UL); + drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", + stats[TTM_PL_SYSTEM].total/1024UL); + + /* Amdgpu specific memory accounting keys: */ drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", - stats.visible_vram/1024UL); + stats[TTM_PL_VRAM].visible/1024UL); drm_printf(p, &
Re: [RFC v2 0/2] Discussion around eviction improvements
Hi, On 16/05/2024 13:18, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Reduced re-spin of my previous series after Christian corrected a few misconceptions that I had. So lets see if what remains makes sense or is still misguided. To summarise, the series address the following two issues: * Migration rate limiting does not work, at least not for the common case where userspace configures VRAM+GTT. It thinks it can stop migration attempts by playing with bo->allowed_domains vs bo->preferred domains but, both from the code, and from empirical experiments, I see that not working at all. When both masks are identical fiddling with them achieves nothing. Even when they are not identical allowed has a fallback GTT placement which means that when over the migration budget ttm_bo_validate with bo->allowed_domains can cause migration from GTT to VRAM. * Driver thinks it will be re-validating evicted buffers on the next submission but it does not for the very common case of VRAM+GTT because it only checks if current placement is *none* of the preferred placements. These two patches appear to have a positive result for a memory intensive game like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting the migration budget appears to keep buffer blits at bay and improves the minimum frame rate, ie. makes things smoother. From the game's built-in benchmark, average of three runs each: FPS migrated KiBmin avg max min-1% min-0.1% because 20784781 10.00 37.00 89.6722.0012.33 patched 4227688 13.67 37.00 81.3323.3315.00 Any feedback on this series? As described above, neither migration rate limiting or re-validation of evicted buffers seems to work as expected currently. Regards, Tvrtko Disclaimers that I have is that more runs would be needed to be more confident about the results. And more games. And APU versus discrete. 
Cc: Christian König Cc: Friedrich Vock Tvrtko Ursulin (2): drm/amdgpu: Re-validate evicted buffers drm/amdgpu: Actually respect buffer migration budget drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 - 2 files changed, 103 insertions(+), 30 deletions(-)
[PATCH] drm/i915: 2 GiB of relocations ought to be enough for anybody*
From: Tvrtko Ursulin Kernel test robot reports i915 can hit a warn in kvmalloc_node which has a purpose of disallowing crazy size kernel allocations. This was added in 7661809d493b ("mm: don't allow oversized kvmalloc() calls"): /* Don't even allow crazy sizes */ if (WARN_ON_ONCE(size > INT_MAX)) return NULL; This would be kind of okay since i915 at one point dropped the need for making a shadow copy of the relocation list, but then it got re-added in fd1500fcd442 ("Revert "drm/i915/gem: Drop relocation slowpath".") a year after Linus added the above warning. It is plausible that the issue was not seen until now because triggering the gem_exec_reloc test requires a combination of relatively older generation hardware but with at least 8GiB of RAM installed. Probably even more depending on runtime checks. Let's cap what we allow userspace to pass in using the matching limit. There should be no issue for real userspace since we are talking about a "crazy" number of relocations which has no practical purpose. *) Well IGT tests might get upset but they can be easily adjusted. 
Signed-off-by: Tvrtko Ursulin Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202405151008.6ddd1aaf-oliver.s...@intel.com Cc: Kees Cook Cc: Kent Overstreet Cc: Joonas Lahtinen Cc: Rodrigo Vivi --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index d3a771afb083..4b34bf4fde77 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1533,7 +1533,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev) u64_to_user_ptr(entry->relocs_ptr); unsigned long remain = entry->relocation_count; - if (unlikely(remain > N_RELOC(ULONG_MAX))) + if (unlikely(remain > N_RELOC(INT_MAX))) return -EINVAL; /* @@ -1641,7 +1641,7 @@ static int check_relocations(const struct drm_i915_gem_exec_object2 *entry) if (size == 0) return 0; - if (size > N_RELOC(ULONG_MAX)) + if (size > N_RELOC(INT_MAX)) return -EINVAL; addr = u64_to_user_ptr(entry->relocs_ptr); -- 2.44.0
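The arithmetic behind the cap can be checked in isolation. Assuming struct drm_i915_gem_relocation_entry is 32 bytes (four 64-bit words, per include/uapi/drm/i915_drm.h), a sketch showing that the old ULONG_MAX-based bound permits shadow-copy sizes far above kvmalloc's INT_MAX guard, while the new bound stays within it:

```c
#include <limits.h>
#include <stdint.h>

/* sizeof(struct drm_i915_gem_relocation_entry): 32 bytes. */
#define RELOC_SZ 32ULL

/* Mirrors the driver's N_RELOC() macro: entries that fit in 'x' bytes. */
#define N_RELOC(x) ((x) / RELOC_SZ)

/* Largest shadow copy a given relocation-count cap can imply. */
static uint64_t max_copy_bytes(uint64_t cap)
{
	return N_RELOC(cap) * RELOC_SZ;
}
```

With the 64-bit ULONG_MAX cap the worst-case copy is nearly 2^64 bytes, tripping the WARN; with the INT_MAX cap it is at most 2147483616 bytes (INT_MAX rounded down to a multiple of 32), which kvmalloc accepts.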
[PATCH 2/2] drm/amdgpu: Use drm_print_memory_stats helper from fdinfo
From: Tvrtko Ursulin Convert fdinfo memory stats to use the common drm_print_memory_stats helper. This achieves alignment with the common keys as documented in drm-usage-stats.rst, adding specifically drm-total- key the driver was missing until now. Additionally I made the code stop skipping total size for objects which currently do not have a backing store, and I added resident, active and purgeable reporting. Legacy keys have been preserved, with the outlook of only potentially removing only the drm-memory- when the time gets right. The example output now looks like this: pos: 0 flags: 0212 mnt_id:24 ino: 1239 drm-driver:amdgpu drm-client-id: 4 drm-pdev: :04:00.0 pasid: 32771 drm-total-cpu: 0 drm-shared-cpu:0 drm-active-cpu:0 drm-resident-cpu: 0 drm-purgeable-cpu: 0 drm-total-gtt: 2392 KiB drm-shared-gtt:0 drm-active-gtt:0 drm-resident-gtt: 2392 KiB drm-purgeable-gtt: 0 drm-total-vram:44564 KiB drm-shared-vram: 31952 KiB drm-active-vram: 0 drm-resident-vram: 44564 KiB drm-purgeable-vram:0 drm-memory-vram: 44564 KiB drm-memory-gtt:2392 KiB drm-memory-cpu:0 KiB amd-memory-visible-vram: 44564 KiB amd-evicted-vram: 0 KiB amd-evicted-visible-vram: 0 KiB amd-requested-vram:44564 KiB amd-requested-visible-vram:11952 KiB amd-requested-gtt: 2392 KiB drm-engine-compute:46464671 ns v2: * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE. 
Signed-off-by: Tvrtko Ursulin Cc: Alex Deucher Cc: Christian König Cc: Daniel Vetter Cc: Rob Clark --- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 48 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 96 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 35 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +- 6 files changed, 122 insertions(+), 81 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c index c7df7fa3459f..00a4ab082459 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c @@ -59,18 +59,21 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct amdgpu_fpriv *fpriv = file->driver_priv; struct amdgpu_vm *vm = &fpriv->vm; - struct amdgpu_mem_stats stats; + struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { }; ktime_t usage[AMDGPU_HW_IP_NUM]; - unsigned int hw_ip; + const char *pl_name[] = { + [TTM_PL_VRAM] = "vram", + [TTM_PL_TT] = "gtt", + [TTM_PL_SYSTEM] = "cpu", + }; + unsigned int hw_ip, i; int ret; - memset(&stats, 0, sizeof(stats)); - ret = amdgpu_bo_reserve(vm->root.bo, false); if (ret) return; - amdgpu_vm_get_memory(vm, &stats); + amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats)); amdgpu_bo_unreserve(vm->root.bo); amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage); @@ -82,24 +85,35 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) */ drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid); - drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL); - drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL); - drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL); + + for (i = 0; i < TTM_PL_PRIV; i++) + drm_print_memory_stats(p, + &stats[i].drm, + DRM_GEM_OBJECT_RESIDENT | + DRM_GEM_OBJECT_PURGEABLE, + pl_name[i]); + + /* Legacy amdgpu keys, alias to drm-resident-memory-: */ + drm_printf(p, "drm-memory-vram:\t%llu 
KiB\n", + stats[TTM_PL_VRAM].total/1024UL); + drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", + stats[TTM_PL_TT].total/1024UL); + drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", + stats[TTM_PL_SYSTEM].total/1024UL); + + /* Amdgpu specific memory accounting keys: */ drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", - stats.visible_vram/1024UL); + stats[TTM_PL_VRAM].visible/1024UL); drm_printf(p, "amd-evicted-vram:\t%llu KiB\n", - stats.evicted_vram/1024UL); + stats[TTM_PL_VRAM].evicted/1024UL); drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n", - stats.evicted_visible_vram/1024UL); + stats[TTM_PL_VRAM].evicted_visible/1024UL);
[PATCH 1/2] Documentation/gpu: Document the situation with unqualified drm-memory-
From: Tvrtko Ursulin Currently it is not well defined what drm-memory- is compared to other categories. In practice the only driver which emits these keys is amdgpu and in them it exposes the current resident buffer object memory (including shared). To prevent any confusion, document that drm-memory- is deprecated and an alias for drm-resident-memory-. While at it also clarify that the reserved sub-string 'memory' refers to the memory region component, and also clarify the intended semantics of other memory categories. v2: * Also mark drm-memory- as deprecated. * Add some more text describing memory categories. (Alex) v3: * Semantics of the amdgpu drm-memory is actually as drm-resident. Signed-off-by: Tvrtko Ursulin Cc: Alex Deucher Cc: Christian König Cc: Rob Clark --- Documentation/gpu/drm-usage-stats.rst | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..45d9b76a5748 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -128,7 +128,9 @@ Memory Each possible memory type which can be used to store buffer objects by the GPU in question shall be given a stable and unique name to be returned as the -string here. The name "memory" is reserved to refer to normal system memory. +string here. + +The region name "memory" is reserved to refer to normal system memory. Value shall reflect the amount of storage currently consumed by the buffer objects belong to this client, in the respective memory region. @@ -136,6 +138,9 @@ objects belong to this client, in the respective memory region. Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB' indicating kibi- or mebi-bytes. +This key is deprecated and is an alias for drm-resident-<region>. Only one of +the two should be present in the output. 
+ - drm-shared-<region>: [KiB|MiB] The total size of buffers that are shared with another file (e.g., have more @@ -143,20 +148,34 @@ than a single handle). - drm-total-<region>: [KiB|MiB] -The total size of buffers that including shared and private memory. +The total size of all created buffers including shared and private memory. The +backing store for the buffers does not have to be currently instantiated to be +counted under this category. - drm-resident-<region>: [KiB|MiB] -The total size of buffers that are resident in the specified region. +The total size of buffers that are resident (have their backing store present or +instantiated) in the specified region. + +This is an alias for drm-memory-<region> and only one of the two should be +present in the output. - drm-purgeable-<region>: [KiB|MiB] The total size of buffers that are purgeable. +For example drivers which implement a form of 'madvise' like functionality can +here count buffers which have instantiated backing store, but have been marked +with an equivalent of MADV_DONTNEED. + - drm-active-<region>: [KiB|MiB] The total size of buffers that are active on one or more engines. +One practical example of this can be presence of unsignaled fences in a GEM +buffer reservation object. Therefore the active category is a subset of +resident. + Implementation Details == -- 2.44.0
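The category semantics clarified above imply simple subset relations: active ⊆ resident ⊆ total, and purgeable ⊆ resident. A hedged sketch (hypothetical buffer set, not driver code) of classifying buffers accordingly:

```c
#include <stddef.h>

struct bo {
	unsigned long long size;
	int has_backing;	/* backing store instantiated */
	int dontneed;		/* marked with an MADV_DONTNEED equivalent */
	int active;		/* unsignaled fences in the reservation object */
};

struct region_stats {
	unsigned long long total, resident, purgeable, active;
};

/* Accumulate one region's stats per the documented semantics: 'total'
 * counts every created buffer, backed or not; 'resident' only those with
 * instantiated backing; 'purgeable' and 'active' are resident subsets. */
static void account(struct region_stats *s, const struct bo *bos, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		s->total += bos[i].size;
		if (!bos[i].has_backing)
			continue;
		s->resident += bos[i].size;
		if (bos[i].dontneed)
			s->purgeable += bos[i].size;
		if (bos[i].active)
			s->active += bos[i].size;
	}
}
```

A buffer without a backing store contributes to total only, which is exactly the distinction the patch adds to the drm-total- wording.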
Re: [RFC v2 0/2] Discussion around eviction improvements
On 16/05/2024 20:21, Alex Deucher wrote: On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin wrote: From: Tvrtko Ursulin Reduced re-spin of my previous series after Christian corrected a few misconceptions that I had. So lets see if what remains makes sense or is still misguided. To summarise, the series address the following two issues: * Migration rate limiting does not work, at least not for the common case where userspace configures VRAM+GTT. It thinks it can stop migration attempts by playing with bo->allowed_domains vs bo->preferred domains but, both from the code, and from empirical experiments, I see that not working at all. When both masks are identical fiddling with them achieves nothing. Even when they are not identical allowed has a fallback GTT placement which means that when over the migration budget ttm_bo_validate with bo->allowed_domains can cause migration from GTT to VRAM. * Driver thinks it will be re-validating evicted buffers on the next submission but it does not for the very common case of VRAM+GTT because it only checks if current placement is *none* of the preferred placements. For APUs at least, we should never migrate because GTT and VRAM are both system memory so are effectively equal performance-wise. Maybe I was curious about this but thought there could be a reason why VRAM carve-out is a fix small-ish size. It cannot be made 1:1 with RAM or some other solution? this regressed when Christian reworked ttm to better handle migrating buffers back to VRAM after suspend on dGPUs? I will leave this to Christian to answer but for what this series is concerned I'd say it is orthogonal to that. Here we have two fixes not limited to APU use cases, just so it happens fixing the migration throttling improves things there too. And that even despite the first patch which triggering *more* migration attempts. Because the second patch then correctly curbs them. 
First patch should help with transient overcommit on discrete, allowing things get back into VRAM as soon as there is space. Second patch tries to makes migration throttling work as intended. Volunteers for testing on discrete? :) These two patches appear to have a positive result for a memory intensive game like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting the migration budget appears to keep buffer blits at bay and improves the minimum frame rate, ie. makes things smoother. From the game's built-in benchmark, average of three runs each: FPS migrated KiBmin avg max min-1% min-0.1% because 20784781 10.00 37.00 89.6722.0012.33 patched 4227688 13.67 37.00 81.3323.3315.00 Hmm! s/because/before/ here obviously! Regards, Tvrtko Disclaimers that I have is that more runs would be needed to be more confident about the results. And more games. And APU versus discrete. Cc: Christian König Cc: Friedrich Vock Tvrtko Ursulin (2): drm/amdgpu: Re-validate evicted buffers drm/amdgpu: Actually respect buffer migration budget drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 - 2 files changed, 103 insertions(+), 30 deletions(-) -- 2.44.0
[RFC v2 0/2] Discussion around eviction improvements
From: Tvrtko Ursulin Reduced re-spin of my previous series after Christian corrected a few misconceptions that I had. So lets see if what remains makes sense or is still misguided. To summarise, the series address the following two issues: * Migration rate limiting does not work, at least not for the common case where userspace configures VRAM+GTT. It thinks it can stop migration attempts by playing with bo->allowed_domains vs bo->preferred domains but, both from the code, and from empirical experiments, I see that not working at all. When both masks are identical fiddling with them achieves nothing. Even when they are not identical allowed has a fallback GTT placement which means that when over the migration budget ttm_bo_validate with bo->allowed_domains can cause migration from GTT to VRAM. * Driver thinks it will be re-validating evicted buffers on the next submission but it does not for the very common case of VRAM+GTT because it only checks if current placement is *none* of the preferred placements. These two patches appear to have a positive result for a memory intensive game like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting the migration budget appears to keep buffer blits at bay and improves the minimum frame rate, ie. makes things smoother. >From the game's built-in benchmark, average of three runs each: FPS migrated KiBmin avg max min-1% min-0.1% because 20784781 10.00 37.00 89.6722.0012.33 patched 4227688 13.67 37.00 81.3323.3315.00 Disclaimers that I have is that more runs would be needed to be more confident about the results. And more games. And APU versus discrete. 
Cc: Christian König Cc: Friedrich Vock Tvrtko Ursulin (2): drm/amdgpu: Re-validate evicted buffers drm/amdgpu: Actually respect buffer migration budget drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 - 2 files changed, 103 insertions(+), 30 deletions(-) -- 2.44.0
[RFC 2/2] drm/amdgpu: Actually respect buffer migration budget
From: Tvrtko Ursulin Current code appears to operate under the misconception that playing with buffer allowed and preferred placements can always control the decision on whether backing store migration will be attempted or not. That is however not the case when userspace sets buffer placements of VRAM+GTT, which is what radv does since commit 862b6a9a ("radv: Improve spilling on discrete GPUs."), with the end result of completely ignoring the migration budget. Fix this by validating against a local singleton placement set to the current backing store location. This way, when the migration budget has been depleted, we can prevent ttm_bo_validate from seeing any placement other than the current one. For the case of the implicit GTT allowed domain added in amdgpu_bo_create when userspace only sets VRAM, the behaviour should be the same. On the first pass the re-validation will attempt to migrate away from the fallback GTT domain, and if that did not succeed the buffer will remain in the fallback placement. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++-- 1 file changed, 85 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index ec888fc6ead8..08e7631f3a2e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -32,6 +32,7 @@ #include #include +#include #include #include "amdgpu_cs.h" @@ -775,6 +776,56 @@ void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes, spin_unlock(&adev->mm_stats.lock); } +static bool +amdgpu_cs_bo_move_under_budget(struct amdgpu_cs_parser *p, + struct amdgpu_bo *abo) +{ + struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev); + + /* +* Don't move this buffer if we have depleted our allowance +* to move it. Don't move anything if the threshold is zero. 
+*/ + if (p->bytes_moved >= p->bytes_moved_threshold) + return false; + + if ((!abo->tbo.base.dma_buf || +list_empty(&abo->tbo.base.dma_buf->attachments)) && + (!amdgpu_gmc_vram_full_visible(&adev->gmc) && +(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) && + p->bytes_moved_vis >= p->bytes_moved_vis_threshold) { + /* +* And don't move a CPU_ACCESS_REQUIRED BO to limited +* visible VRAM if we've depleted our allowance to do +* that. +*/ + return false; + } + + return true; +} + +static bool +amdgpu_bo_fill_current_placement(struct amdgpu_bo *abo, +struct ttm_placement *placement, +struct ttm_place *place) +{ + struct ttm_placement *bo_placement = &abo->placement; + int i; + + for (i = 0; i < bo_placement->num_placement; i++) { + if (bo_placement->placement[i].mem_type == + abo->tbo.resource->mem_type) { + *place = bo_placement->placement[i]; + placement->num_placement = 1; + placement->placement = place; + return true; + } + } + + return false; +} + static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); @@ -784,46 +835,53 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo) .no_wait_gpu = false, .resv = bo->tbo.base.resv }; - uint32_t domain; + bool allow_move; int r; if (bo->tbo.pin_count) return 0; - /* Don't move this buffer if we have depleted our allowance -* to move it. Don't move anything if the threshold is zero. -*/ - if (p->bytes_moved < p->bytes_moved_threshold && - (!bo->tbo.base.dma_buf || - list_empty(&bo->tbo.base.dma_buf->attachments))) { - if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && - (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) { - /* And don't move a CPU_ACCESS_REQUIRED BO to limited -* visible VRAM if we've depleted our allowance to do -* that. -*/ - if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) - domain = bo->preferred_domains; - else - domain = bo->allowed_domains; - } else { - domain = bo->preferred_domains; - }
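The budget check factored out above reduces to a pure predicate. A standalone sketch (simplified: booleans stand in for the dma-buf attachment and GMC visibility queries) of the decision amdgpu_cs_bo_move_under_budget() encodes:

```c
#include <stdbool.h>
#include <stdint.h>

/* May this BO still be migrated during the current submission?
 * 'shared' stands in for "has dma-buf attachments" and 'small_vis_vram'
 * for !amdgpu_gmc_vram_full_visible(). */
static bool move_under_budget(uint64_t moved, uint64_t threshold,
			      uint64_t vis_moved, uint64_t vis_threshold,
			      bool shared, bool small_vis_vram,
			      bool cpu_access_required)
{
	/* Budget depleted (or zero threshold): no moves at all. */
	if (moved >= threshold)
		return false;

	/* CPU-access BOs headed into limited visible VRAM have their own,
	 * stricter budget; shared buffers are exempt from it. */
	if (!shared && small_vis_vram && cpu_access_required &&
	    vis_moved >= vis_threshold)
		return false;

	return true;
}
```

When this returns false, the patch validates against the single current placement, which is what actually stops ttm_bo_validate from migrating.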
[RFC 1/2] drm/amdgpu: Re-validate evicted buffers
From: Tvrtko Ursulin Currently the driver appears to be thinking that it will be attempting to re-validate the evicted buffers on the next submission if they are not in their preferred placement. That however appears not to be true for the very common case of buffers with allowed placements of VRAM+GTT, simply because the check can only detect if the current placement is *none* of the preferred ones, happily leaving VRAM+GTT buffers in the GTT placement "forever". Fix it by extending the VRAM+GTT special case to the re-validation logic. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 6bddd43604bc..e53ff914b62e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1248,10 +1248,25 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, * next command submission. */ if (amdgpu_vm_is_bo_always_valid(vm, bo)) { - uint32_t mem_type = bo->tbo.resource->mem_type; + unsigned current_domain = + amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type); + bool move_to_evict = false; - if (!(bo->preferred_domains & - amdgpu_mem_type_to_domain(mem_type))) + if (!(bo->preferred_domains & current_domain)) { + move_to_evict = true; + } else if ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK) == + (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) && + current_domain != AMDGPU_GEM_DOMAIN_VRAM) { + /* +* If userspace has provided a list of possible +* placements equal to VRAM+GTT, we assume VRAM is *the* +* preferred placement and so try to move it back there +* on the next submission. +*/ + move_to_evict = true; + } + + if (move_to_evict) amdgpu_vm_bo_evicted(&bo_va->base); else amdgpu_vm_bo_idle(&bo_va->base); -- 2.44.0
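The re-validation decision the patch makes can be expressed as a pure function over domain masks. A sketch (domain bit values match AMDGPU_GEM_DOMAIN_CPU/GTT/VRAM; the always-valid list bookkeeping is omitted):

```c
#define DOMAIN_CPU	0x1
#define DOMAIN_GTT	0x2
#define DOMAIN_VRAM	0x4

/* Should this always-valid BO be queued for re-validation on the next
 * submission? 1 = move to the 'evicted' list, 0 = leave on 'idle'. */
static int move_to_evict(unsigned int preferred, unsigned int current)
{
	/* Original check: current placement is none of the preferred ones. */
	if (!(preferred & current))
		return 1;

	/* Extension from the patch: for VRAM+GTT, treat VRAM as *the*
	 * preferred placement, so a BO sitting in GTT is retried. */
	if (preferred == (DOMAIN_VRAM | DOMAIN_GTT) && current != DOMAIN_VRAM)
		return 1;

	return 0;
}
```

The second branch is the fix: with the original check alone, a VRAM+GTT buffer evicted to GTT satisfies the mask test and is never retried.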
Re: [PATCH v4 8/8] drm/xe/client: Print runtime to fdinfo
On 15/05/2024 22:42, Lucas De Marchi wrote:

Print the accumulated runtime for the client when printing fdinfo. Each time a query is done it first does 2 things:

1) Loop through all the exec queues for the current client and accumulate the runtime, per engine class. CTX_TIMESTAMP is used for that, being read from the context image.

2) Read a "GPU timestamp" that can be used for considering "how much GPU time has passed" and that has the same unit/refclock as the one recording the runtime. RING_TIMESTAMP is used for that via MMIO.

Since for all current platforms RING_TIMESTAMP follows the same refclock, just read it once, using any first engine available.

This is exported to userspace as 2 numbers in fdinfo:

	drm-cycles-<class>: <RUNTIME>
	drm-total-cycles-<class>: <TIMESTAMP>

Userspace is expected to collect at least 2 samples, which allows it to know the client engine busyness as per:

	           RUNTIME1 - RUNTIME0
	busyness = -------------------
	                T1 - T0

Since drm-cycles-<class> always starts at 0, it's also possible to know if an engine was ever used by a client.

It's expected that userspace will read any 2 samples every few seconds. Given the update frequency of the counters involved and that CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap around (assuming 100% utilization) after ~200s. The wraparound is not perceived by userspace (since it's just accumulated for all the exec_queues in a 64-bit counter), but the measurement will not be accurate if the samples are too far apart.

This could be mitigated by adding a workqueue to accumulate the counters every so often, but it's additional complexity for something that is done already by userspace every few seconds in tools like gputop (from igt), htop, nvtop, etc, with none of them really defaulting to 1 sample per minute or more.
Signed-off-by: Lucas De Marchi
---
 Documentation/gpu/drm-usage-stats.rst       |  21 +++-
 Documentation/gpu/xe/index.rst              |   1 +
 Documentation/gpu/xe/xe-drm-usage-stats.rst |  10 ++
 drivers/gpu/drm/xe/xe_drm_client.c          | 121 +++-
 4 files changed, 150 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/gpu/xe/xe-drm-usage-stats.rst

diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
index 6dc299343b48..a80f95ca1b2f 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -112,6 +112,19 @@ larger value within a reasonable period. Upon observing a value lower than what
 was previously read, userspace is expected to stay with that larger previous
 value until a monotonic update is seen.
 
+- drm-total-cycles-<keystr>: <uint>
+
+Engine identifier string must be the same as the one specified in the
+drm-cycles-<keystr> tag and shall contain the total number of cycles for the
+given engine.
+
+This is a timestamp in GPU unspecified unit that matches the update rate
+of drm-cycles-<keystr>. For drivers that implement this interface, the engine
+utilization can be calculated entirely on the GPU clock domain, without
+considering the CPU sleep time between 2 samples.
+
+A driver may implement either this key or drm-maxfreq-<keystr>, but not both.
+
 - drm-maxfreq-<keystr>: <uint> [Hz|MHz|KHz]
 
 Engine identifier string must be the same as the one specified in the
@@ -121,6 +134,9 @@ percentage utilization of the engine, whereas drm-engine-<keystr> only reflects
 time active without considering what frequency the engine is operating as a
 percentage of its maximum frequency.
 
+A driver may implement either this key or drm-total-cycles-<keystr>, but not
+both.
+

For the spec part:

Acked-by: Tvrtko Ursulin

Some minor comments and questions below.

 Memory
 ^^^^^^
 
@@ -168,5 +184,6 @@ be documented above and where possible, aligned with other drivers.
 
 Driver specific implementations
 -------------------------------
 
-:ref:`i915-usage-stats`
-:ref:`panfrost-usage-stats`
+* :ref:`i915-usage-stats`
+* :ref:`panfrost-usage-stats`
+* :ref:`xe-usage-stats`

diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst
index c224ecaee81e..3f07aa3b5432 100644
--- a/Documentation/gpu/xe/index.rst
+++ b/Documentation/gpu/xe/index.rst
@@ -23,3 +23,4 @@ DG2, etc is provided to prototype the driver.
    xe_firmware
    xe_tile
    xe_debugging
+   xe-drm-usage-stats.rst

diff --git a/Documentation/gpu/xe/xe-drm-usage-stats.rst b/Documentation/gpu/xe/xe-drm-usage-stats.rst
new file mode 100644
index 000000000000..482d503ae68a
--- /dev/null
+++ b/Documentation/gpu/xe/xe-drm-usage-stats.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+.. _xe-usage-stats:
+
+========================================
+Xe DRM client usage stats implementation
+========================================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_drm_client.c
+   :doc: DRM Client usage stats

diff --git a/drivers/gpu/drm/xe/xe_drm_client.
Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget
On 15/05/2024 15:31, Christian König wrote:

Am 15.05.24 um 12:59 schrieb Tvrtko Ursulin:

On 15/05/2024 08:20, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin

Current code appears to live in a misconception that playing with buffer allowed and preferred placements can control the decision on whether backing store migration will be attempted or not. Both from code inspection and from empirical experiments I see that not being true, and that both allowed and preferred placement are typically set to the same bitmask.

That's not correct for the use case handled here, but see below.

Which part is not correct, that bo->preferred_domains and bo->allowed_domains are the same bitmask?

Sorry, totally forgot to explain that. This rate limit here was specially made for OpenGL applications which overcommit VRAM. In those cases preferred_domains will be VRAM only and allowed_domains will be VRAM|GTT.

RADV always uses VRAM|GTT for both (which is correct).

Got it, thanks!

As such, when the code decides to throttle the migration for a client, it is in fact not achieving anything. Buffers can still be either migrated or not migrated based on the external (to this function and facility) logic.

Fix it by not changing the buffer object placements if the migration budget has been spent.

FIXME: Whether it is still required to call validate is the question..
Signed-off-by: Tvrtko Ursulin
Cc: Christian König
Cc: Friedrich Vock
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 		.no_wait_gpu = false,
 		.resv = bo->tbo.base.resv
 	};
+	bool migration_allowed = true;
 	struct ttm_resource *old_res;
 	uint32_t domain;
 	int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 			 * visible VRAM if we've depleted our allowance to do
 			 * that.
 			 */
-			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
 				domain = bo->preferred_domains;
-			else
+			} else {
 				domain = bo->allowed_domains;
+				migration_allowed = false;
+			}
 		} else {
 			domain = bo->preferred_domains;
 		}
 	} else {
 		domain = bo->allowed_domains;
+		migration_allowed = false;
 	}
 
 retry:
-	amdgpu_bo_placement_from_domain(bo, domain);
+	if (migration_allowed)
+		amdgpu_bo_placement_from_domain(bo, domain);

That's completely invalid. Calling amdgpu_bo_placement_from_domain() is a mandatory prerequisite for calling ttm_bo_validate().

E.g. the usual code flow is:

	/* This initializes bo->placement */
	amdgpu_bo_placement_from_domain()

	/* Eventually modify bo->placement to fit special requirements */

	/* Apply the placement to the BO */
	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up, bo->placement should be a variable on the stack instead, but we never bothered to clean that up.

I am not clear if you agree or not that the current method of trying to avoid migration doesn't really do anything?

I totally agree, but the approach you've taken to fix it is just quite broken.
You can't leave bo->placement uninitialized and expect that ttm_bo_validate() won't move the BO.

Yep, that much was clear, sorry that I did not explicitly acknowledge it but just moved on to discussing how to fix it properly.

On-stack placements sound plausible for forcing migration avoidance by putting a single entry, the current object placement, in that list, if that is what you have in mind? Or a specialized flag/version of amdgpu_bo_placement_from_domain() with a bool input like "allow_placement_change"?

A very rough idea with no guarantee that it actually works:

Add a TTM_PL_FLAG_RATE_LIMITED with all the TTM code to actually figure out how many bytes have been moved and how many bytes the current operation can move etc... Friedrich's patches actually looked like quite a step in the right direction for that already, so I would start from there.

Then always feed amdgpu_bo_placement_from_domain() with the allowed_domains in the CS path and VM validation.

Finally extend amdgpu_bo_placement_from_domain() to take a closer look at bo->preferred_domains, similar to how we do for the TTM_PL_FLAG_FALLBACK already, and set the TTM_PL_FLAG_RATE_LIMITED flag as appropriate.

Two things which I kind of don't like with the placement flag idea is
Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget
On 15/05/2024 08:20, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin

Current code appears to live in a misconception that playing with buffer allowed and preferred placements can control the decision on whether backing store migration will be attempted or not. Both from code inspection and from empirical experiments I see that not being true, and that both allowed and preferred placement are typically set to the same bitmask.

That's not correct for the use case handled here, but see below.

Which part is not correct, that bo->preferred_domains and bo->allowed_domains are the same bitmask?

As such, when the code decides to throttle the migration for a client, it is in fact not achieving anything. Buffers can still be either migrated or not migrated based on the external (to this function and facility) logic.

Fix it by not changing the buffer object placements if the migration budget has been spent.

FIXME: Whether it is still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin
Cc: Christian König
Cc: Friedrich Vock
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 		.no_wait_gpu = false,
 		.resv = bo->tbo.base.resv
 	};
+	bool migration_allowed = true;
 	struct ttm_resource *old_res;
 	uint32_t domain;
 	int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 			 * visible VRAM if we've depleted our allowance to do
 			 * that.
 			 */
-			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
 				domain = bo->preferred_domains;
-			else
+			} else {
 				domain = bo->allowed_domains;
+				migration_allowed = false;
+			}
 		} else {
 			domain = bo->preferred_domains;
 		}
 	} else {
 		domain = bo->allowed_domains;
+		migration_allowed = false;
 	}
 
 retry:
-	amdgpu_bo_placement_from_domain(bo, domain);
+	if (migration_allowed)
+		amdgpu_bo_placement_from_domain(bo, domain);

That's completely invalid. Calling amdgpu_bo_placement_from_domain() is a mandatory prerequisite for calling ttm_bo_validate().

E.g. the usual code flow is:

	/* This initializes bo->placement */
	amdgpu_bo_placement_from_domain()

	/* Eventually modify bo->placement to fit special requirements */

	/* Apply the placement to the BO */
	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up, bo->placement should be a variable on the stack instead, but we never bothered to clean that up.

I am not clear if you agree or not that the current method of trying to avoid migration doesn't really do anything?

On-stack placements sound plausible for forcing migration avoidance by putting a single entry, the current object placement, in that list, if that is what you have in mind? Or a specialized flag/version of amdgpu_bo_placement_from_domain() with a bool input like "allow_placement_change"?

Regards,
Tvrtko

Regards,
Christian.

+	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 
 	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting
On 15/05/2024 08:14, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin

The logic assumed any migration attempt worked and therefore would over-account the amount of data migrated during buffer re-validation. As a consequence a client can be unfairly penalised by incorrectly considering its migration budget spent.

Fix it by looking at the before and after buffer object backing store and only account if there was a change.

FIXME: I think this needs a better solution to account for migrations between VRAM visible and non-visible portions.

Signed-off-by: Tvrtko Ursulin
Cc: Christian König
Cc: Friedrich Vock
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..22708954ae68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 		.no_wait_gpu = false,
 		.resv = bo->tbo.base.resv
 	};
+	struct ttm_resource *old_res;
 	uint32_t domain;
 	int r;
 
 	if (bo->tbo.pin_count)
 		return 0;
 
+	old_res = bo->tbo.resource;
+
 	/* Don't move this buffer if we have depleted our allowance
 	 * to move it. Don't move anything if the threshold is zero.
 	 */
@@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 	amdgpu_bo_placement_from_domain(bo, domain);
 	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 
-	p->bytes_moved += ctx.bytes_moved;
-	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
-		p->bytes_moved_vis += ctx.bytes_moved;
-
 	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
 		domain = bo->allowed_domains;
 		goto retry;
 	}
 
+	if (!r) {
+		struct ttm_resource *new_res = bo->tbo.resource;
+		bool moved = true;
+
+		if (old_res == new_res)
+			moved = false;
+		else if (old_res && new_res &&
+			 old_res->mem_type == new_res->mem_type)
+			moved = false;

The old resource might already be destroyed after you return from validation, so this here won't work.

Apart from that, even when a migration attempt fails the moved bytes should be accounted.

When the validation attempt doesn't cause any moves, the byte count here will be zero. So as far as I can see that is as fair as you can get.

Right, I think I suffered a bit of tunnel vision here and completely ignored the ctx.bytes_moved part. Scratch this one too then.

Regards,
Tvrtko

Regards,
Christian.

PS: Looks like our mail servers are once more not very reliable. If you get mails from me multiple times please just ignore them.

+
+		if (moved) {
+			p->bytes_moved += ctx.bytes_moved;
+			if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
+			    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
+				p->bytes_moved_vis += ctx.bytes_moved;
+		}
+	}
+
 	return r;
 }