Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-26 Thread Tvrtko Ursulin




On 24/07/2024 12:16, Christian König wrote:

Am 24.07.24 um 10:16 schrieb Tvrtko Ursulin:

[SNIP]

Absolutely.


Absolutely good and absolutely me, or absolutely you? :)


You, I don't even have time to finish all the stuff I already started :/


Okay, I think I can squeeze it in.


These are the TODO points and their opens:

- Adjust amdgpu_ctx_set_entity_priority() to call 
drm_sched_entity_modify_sched() regardless of the hw_type - to fix 
priority changes on a single sched other than gfx or compute.


Either that or to stop using the scheduler priority to implement 
userspace priorities and always use different HW queues for that.




- Document that the sched_list array lifetime must align with the entity 
and adjust the callers.


Open:

Do you still oppose keeping sched_list for num_scheds == 1?


Not if you can fix up all callers.

If so, do you propose drm_sched_entity_modify_sched() keeps disagreeing 
with drm_sched_entity_init() on this detail, and keeps the "one shot 
single sched_list" quirk in? Why is that nicer than simply keeping the 
list and removing that quirk? Once the lifetime rules are clear it is 
IMO okay to always keep the list.


Yeah if every caller of drm_sched_entity_init() can be fixed I'm fine 
with that as well.


Okay so I will tackle the above few first.
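
For illustration, a minimal sketch of what dropping the quirk could look 
like, assuming the current drm_sched_entity_modify_sched() signature and 
that the caller keeps sched_list alive for the entity's lifetime (locking 
omitted, not the actual patch):

void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
				   struct drm_gpu_scheduler **sched_list,
				   unsigned int num_sched_list)
{
	WARN_ON(!num_sched_list || !sched_list);

	/* Always keep the list, even for num_sched_list == 1, so that
	 * drm_sched_entity_select_rq() can re-evaluate the run queue
	 * later. The caller owns the array and must keep it around.
	 */
	entity->sched_list = sched_list;
	entity->num_sched_list = num_sched_list;
}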



- Remove drm_sched_entity_set_priority().

Open:

Should we at this point also modify amdgpu_device_init_schedulers() to 
stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?


One step at a time.


And leave this for later.

Regards,

Tvrtko


[PULL] drm-intel-next-fixes

2024-07-25 Thread Tvrtko Ursulin


Hi Dave, Sima,

Two fixes for the merge window: turning off preemption on Gen8, since it
apparently just doesn't work reliably enough, and a fix for a potential NULL
pointer dereference when stolen memory probing failed.

Regards,

Tvrtko

drm-intel-next-fixes-2024-07-25:
- Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote)
- Allow NULL memory region (Jonathan Cavitt)
The following changes since commit 509580fad7323b6a5da27e8365cd488f3b57210e:

  drm/i915/dp: Don't switch the LTTPR mode on an active link (2024-07-16 
08:14:29 +)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-next-fixes-2024-07-25

for you to fetch changes up to 26720dd2b5a1d088bff8f7e6355fca021c83718f:

  drm/i915: Allow NULL memory region (2024-07-23 09:34:13 +)


- Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote)
- Allow NULL memory region (Jonathan Cavitt)


Jonathan Cavitt (1):
  drm/i915: Allow NULL memory region

Nitin Gote (1):
  drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 +-
 drivers/gpu/drm/i915/intel_memory_region.c   | 6 --
 2 files changed, 5 insertions(+), 7 deletions(-)


Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-24 Thread Tvrtko Ursulin




On 22/07/2024 16:13, Christian König wrote:

Am 22.07.24 um 16:43 schrieb Tvrtko Ursulin:


On 22/07/2024 15:06, Christian König wrote:

Am 22.07.24 um 15:52 schrieb Tvrtko Ursulin:


On 19/07/2024 16:18, Christian König wrote:

Am 19.07.24 um 15:02 schrieb Christian König:

Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity
creation") a change was made which prevented priority changes for 
entities

with only one assigned scheduler.

The commit reduced drm_sched_entity_set_priority() to simply update the
entity's priority, but the run queue selection logic in
drm_sched_entity_select_rq() was never able to actually change the
originally assigned run queue.

In practice that only affected amdgpu, being the only driver which can do
dynamic priority changes. And an attempt to rectify that appears to have
been made there in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx
priority override").

A few unresolved problems however were that this only fixed
drm_sched_entity_set_priority() *if* 
drm_sched_entity_modify_sched() was

called first. That was not documented anywhere.

Secondly, this only works if drm_sched_entity_modify_sched() is 
actually
called, which in amdgpu's case today is true only for gfx and 
compute.
Priority changes for other engines with only one scheduler 
assigned, such

as jpeg and video decode will still not work.

Note that this was also noticed in 981b04d96856 ("drm/sched: 
improve docs

around drm_sched_entity").

A completely different source of non-obvious confusion was that, whereas
drm_sched_entity_init() was not keeping the passed in list of 
schedulers
(courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of 
sched
list")), drm_sched_entity_modify_sched() was disagreeing with 
that and

would simply assign the single item list.

That inconsistency appears to have been semi-silently fixed in ac4eb83ab255
("drm/sched: select new rq even if there is only one v3").

What was also not documented is why it was important to not keep the
list of schedulers when there is only one. I suspect it could have
something to do with the fact the passed in array is on stack for 
many
callers with just one scheduler. With more than one scheduler 
amdgpu is
the only caller, and there the container is not on the stack. 
Keeping a
stack backed list in the entity would obviously be undefined 
behaviour

*if* the list was kept.

Amdgpu however only stopped passing in a stack backed container for the 
more than one scheduler case in 977f7e1068be ("drm/amdgpu: allocate 
entities on

demand"). Until then I suspect dereferencing freed stack from
drm_sched_entity_select_rq() was still present.

In order to untangle all that and fix priority changes this patch is
bringing back the entity owned container for storing the passed in
scheduler list.


Please don't. That makes the mess just more horrible.

The background of not keeping the array is to intentionally 
prevent the priority override from working.


The bug is rather that adding drm_sched_entity_modify_sched() 
messed this up.


To give more background: Amdgpu has two different ways of handling 
priority:

1. The priority in the DRM scheduler.
2. Different HW rings with different priorities.

Your analysis is correct that drm_sched_entity_init() initially 
dropped the scheduler list to avoid using a stack allocated list, 
and that functionality is still used in amdgpu_ctx_init_entity() 
for example.


Setting the scheduler priority was basically just a workaround 
because we didn't have the hw priorities at that time. Since that is 
no longer the case I suggest just completely dropping the 
drm_sched_entity_set_priority() function instead.


Removing drm_sched_entity_set_priority() is one thing, but we also 
need to clear up the sched_list container ownership issue. It is 
neither documented, nor robustly handled in the code. The 
"num_scheds == 1" special casing throughout IMHO has to go too.


I disagree. Keeping the scheduler list in the entity is only useful 
for load balancing.


As long as only one scheduler is provided and we don't load balance, 
the entity doesn't need the scheduler list in the first place.


Once set_priority is removed then indeed it doesn't. But even when 
it is removed it needs documenting who owns the passed in container. 
Today drivers are okay to pass a stack array when it is one element, 
but if they did it with more than one they would be in for a nasty 
surprise.
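
To illustrate the pattern, a sketch of a typical single-scheduler caller 
(not taken from any particular driver):

#include <drm/gpu_scheduler.h>

int example_entity_setup(struct drm_gpu_scheduler *sched,
			 struct drm_sched_entity *entity)
{
	/* One-element array on the stack - only safe today because the
	 * scheduler core does not keep the pointer for num_scheds == 1.
	 */
	struct drm_gpu_scheduler *sched_list[] = { sched };

	return drm_sched_entity_init(entity, DRM_SCHED_PRIORITY_NORMAL,
				     sched_list, ARRAY_SIZE(sched_list),
				     NULL);
}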


Yes, completely agree. But instead of copying the array I would rather 
go in the direction of cleaning up all callers and making the scheduler 
list mandatory to stay around as long as the scheduler lives.


The whole thing of one calling convention there and another one at a 
different place really sucks.


Ok, let's scroll a bit down to formulate a plan.

Another thing if you want to get rid of 

Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister

2024-07-24 Thread Tvrtko Ursulin



On 23/07/2024 16:30, Lucas De Marchi wrote:

On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:


On 22/07/2024 22:06, Lucas De Marchi wrote:

Instead of calling perf_pmu_unregister() when unbinding, defer that to
the destruction of i915 object. Since perf itself holds a reference in
the event, this only happens when all events are gone, which guarantees
i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after
~2 tries:

1) bind device to i915
2) wait events to show up on sysfs
3) start perf  stat -I 1000 -e i915/rcs0-busy/
4) unbind driver
5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the
percpu pmu_disable_count. This happens because perf_pmu_unregister()
destroys it with free_percpu(pmu->pmu_disable_count).

With a lazy unbind, the pmu is only unregistered after (5) as opposed to
after (4). The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully). This seems better
than completely crashing the system.


So effectively this allows unbind to succeed without fully unbinding the 
driver from the device? That sounds like a significant drawback and if 
so, I wonder if a more complicated solution wouldn't be better after 
all. Or is there precedent for allowing userspace to keep their paws 
on unbound devices in this way?


keeping the resources alive but "unplugged" while the hardware
disappeared is a common thing to do... it's the whole point of the
drmm-managed resource for example. If you bind the driver and then
unbind it while userspace is holding a ref, next time you try to bind it
will come up with a different card number. A similar thing that could be
done is to adjust the name of the event - currently we add the mangled
pci slot.


Yes.. but what my point was this from your commit message:

"""
The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully).
"""

So the subsequent bind does not "come up with a different card number". 
The statement is that it will come up with an error if we look at the PMU 
subset of functionality. I was wondering if there was precedent for that 
kind of situation.


Mangling the PMU driver name probably also wouldn't be great.


That said, I agree a better approach would be to allow
perf_pmu_unregister() to do its job even when there are open events. On
top of that (or as a way to help achieve that), make perf core replace
the callbacks with stubs when pmu is unregistered - that would even kill
the need for i915's checks on pmu->closed (and fix the lack thereof in
other drivers).

It can be a can of worms though and may be pushed back on by perf core
maintainers, so it'd be good to have their feedback.


Yeah definitely would be essential.
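
For reference, the kind of guard the stubs would make redundant looks 
roughly like this (a sketch from memory of the i915 pattern, not an exact 
quote of i915_pmu.c):

static int i915_pmu_event_init(struct perf_event *event)
{
	struct i915_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);

	/* Bail out once the PMU has been "disconnected" on unbind; with
	 * core provided stubs this check could go away entirely.
	 */
	if (pmu->closed)
		return -ENODEV;

	/* ... the real implementation continues with event validation. */
	return 0;
}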

Regards,

Tvrtko


Signed-off-by: Lucas De Marchi 
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
b/drivers/gpu/drm/i915/i915_pmu.c

index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, 
void *res)

 struct i915_pmu *pmu = res;
 struct drm_i915_private *i915 = pmu_to_i915(pmu);
+    perf_pmu_unregister(&pmu->base);
 free_event_attributes(pmu);
 kfree(pmu->base.attr_groups);
 if (IS_DGFX(i915))
 kfree(pmu->name);
+
+    /*
+ * Make sure all currently running (but shortcut on pmu->closed) 
are
+ * gone before proceeding with free'ing the pmu object embedded 
in i915.

+ */
+    synchronize_rcu();
 }
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node 
*node)

 {
-    struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), 
cpuhp.node);

-
-    GEM_BUG_ON(!pmu->base.event_init);
-
 /* Select the first online CPU as a designated reader. */
 if (cpumask_empty(&i915_pmu_cpumask))
 cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int 
cpu, struct hlist_node *node)
 struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), 
cpuhp.node);

 unsigned int target = i915_pmu_target_cpu;
-    GEM_BUG_ON(!pmu->base.event_init);
-
 /*
  * Unregistering an instance generates a CPU offline event which 
we must
  * ignore to avoid incorrectly modifying the shared 
i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct 
drm_i915_private *i915)

 {
 struct i915_pmu *pmu = &i915->pmu;
-    if (!pmu->base.event_init)
-    return;
-
 /*
- * "Disconnect" the PMU callbacks - since all are atomic

Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister

2024-07-23 Thread Tvrtko Ursulin



On 22/07/2024 22:06, Lucas De Marchi wrote:

Instead of calling perf_pmu_unregister() when unbinding, defer that to
the destruction of i915 object. Since perf itself holds a reference in
the event, this only happens when all events are gone, which guarantees
i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after
~2 tries:

1) bind device to i915
2) wait events to show up on sysfs
3) start perf  stat -I 1000 -e i915/rcs0-busy/
4) unbind driver
5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the
percpu pmu_disable_count. This happens because perf_pmu_unregister()
destroys it with free_percpu(pmu->pmu_disable_count).

With a lazy unbind, the pmu is only unregistered after (5) as opposed to
after (4). The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully). This seems better
than completely crashing the system.


So effectively this allows unbind to succeed without fully unbinding the 
driver from the device? That sounds like a significant drawback and if 
so, I wonder if a more complicated solution wouldn't be better after 
all. Or is there precedent for allowing userspace to keep their paws on 
unbound devices in this way?


Regards,

Tvrtko



Signed-off-by: Lucas De Marchi 
---
  drivers/gpu/drm/i915/i915_pmu.c | 24 +---
  1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
struct i915_pmu *pmu = res;
struct drm_i915_private *i915 = pmu_to_i915(pmu);
  
+	perf_pmu_unregister(&pmu->base);

free_event_attributes(pmu);
kfree(pmu->base.attr_groups);
if (IS_DGFX(i915))
kfree(pmu->name);
+
+   /*
+* Make sure all currently running (but shortcut on pmu->closed) are
+* gone before proceeding with free'ing the pmu object embedded in i915.
+*/
+   synchronize_rcu();
  }
  
  static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)

  {
-   struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-
-   GEM_BUG_ON(!pmu->base.event_init);
-
/* Select the first online CPU as a designated reader. */
	if (cpumask_empty(&i915_pmu_cpumask))
	cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct 
hlist_node *node)
struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
unsigned int target = i915_pmu_target_cpu;
  
-	GEM_BUG_ON(!pmu->base.event_init);

-
/*
 * Unregistering an instance generates a CPU offline event which we must
 * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
  {
	struct i915_pmu *pmu = &i915->pmu;
  
-	if (!pmu->base.event_init)

-   return;
-
/*
-* "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
-* ensures all currently executing ones will have exited before we
-* proceed with unregistration.
+* "Disconnect" the PMU callbacks - unregistering the pmu will be done
+* later when all currently open events are gone
 */
pmu->closed = true;
-   synchronize_rcu();
  
	hrtimer_cancel(&pmu->timer);

-
i915_pmu_unregister_cpuhp_state(pmu);
-   perf_pmu_unregister(&pmu->base);
  
  	pmu->base.event_init = NULL;

  }


Re: [PATCH 5/7] drm/i915/pmu: Let resource survive unbind

2024-07-23 Thread Tvrtko Ursulin



On 22/07/2024 22:06, Lucas De Marchi wrote:

There's no need to free the resources during unbind. Since perf events
may still access them due to open events, it's safer to free them when
dropping the last i915 reference.

Signed-off-by: Lucas De Marchi 
---
  drivers/gpu/drm/i915/i915_pmu.c | 21 -
  1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b5d14dd318e4..8708f905f4f4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -5,6 +5,7 @@
   */
  
  #include 

+#include 
  
  #include "gt/intel_engine.h"

  #include "gt/intel_engine_pm.h"
@@ -1152,6 +1153,17 @@ static void free_event_attributes(struct i915_pmu *pmu)
pmu->pmu_attr = NULL;
  }
  
+static void free_pmu(struct drm_device *dev, void *res)

+{
+   struct i915_pmu *pmu = res;
+   struct drm_i915_private *i915 = pmu_to_i915(pmu);
+
+   free_event_attributes(pmu);
+   kfree(pmu->base.attr_groups);
+   if (IS_DGFX(i915))
+   kfree(pmu->name);
+}
+
  static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
  {
struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
@@ -1302,6 +1314,9 @@ void i915_pmu_register(struct drm_i915_private *i915)
if (ret)
goto err_unreg;
  
+	if (drmm_add_action_or_reset(&i915->drm, free_pmu, pmu))

+   goto err_unreg;


Is i915_pmu_unregister_cpuhp_state missing on this error path?

Regards,

Tvrtko


+
return;
  
  err_unreg:

@@ -1336,11 +1351,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
	hrtimer_cancel(&pmu->timer);
  
  	i915_pmu_unregister_cpuhp_state(pmu);

-
	perf_pmu_unregister(&pmu->base);
+
pmu->base.event_init = NULL;
-   kfree(pmu->base.attr_groups);
-   if (IS_DGFX(i915))
-   kfree(pmu->name);
-   free_event_attributes(pmu);
  }


Re: [PATCH 4/7] drm/i915/pmu: Drop is_igp()

2024-07-23 Thread Tvrtko Ursulin



On 22/07/2024 22:06, Lucas De Marchi wrote:

There's no reason to hardcode checking for integrated graphics on a
specific pci slot. That information is already available per platform and
can be checked with IS_DGFX().


Hmm, the probable reason was this commit, which added is_igp:

commit 05488673a4d41383f9dd537f298e525e6b00fb93
Author: Tvrtko Ursulin 
AuthorDate: Wed Oct 16 10:38:02 2019 +0100
Commit: Tvrtko Ursulin 
CommitDate: Thu Oct 17 10:50:47 2019 +0100

drm/i915/pmu: Support multiple GPUs

Added IS_DGFX:

commit dc90fe3fd219c7693617ba09a9467e4aadc2e039
Author: José Roberto de Souza 
AuthorDate: Thu Oct 24 12:51:19 2019 -0700
Commit: Lucas De Marchi 
CommitDate: Fri Oct 25 13:53:51 2019 -0700

drm/i915: Add is_dgfx to device info

So it innocently arrived just a bit before.

Regards,

Tvrtko


Signed-off-by: Lucas De Marchi 
---
  drivers/gpu/drm/i915/i915_pmu.c | 17 +++--
  1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 3a8bd11b87e7..b5d14dd318e4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1235,17 +1235,6 @@ static void i915_pmu_unregister_cpuhp_state(struct 
i915_pmu *pmu)
	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
  }
  
-static bool is_igp(struct drm_i915_private *i915)

-{
-   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
-
-   /* IGP is 0000:00:02.0 */
-   return pci_domain_nr(pdev->bus) == 0 &&
-  pdev->bus->number == 0 &&
-  PCI_SLOT(pdev->devfn) == 2 &&
-  PCI_FUNC(pdev->devfn) == 0;
-}
-
  void i915_pmu_register(struct drm_i915_private *i915)
  {
	struct i915_pmu *pmu = &i915->pmu;
@@ -1269,7 +1258,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
pmu->cpuhp.cpu = -1;
init_rc6(pmu);
  
-	if (!is_igp(i915)) {

+   if (IS_DGFX(i915)) {
pmu->name = kasprintf(GFP_KERNEL,
  "i915_%s",
  dev_name(i915->drm.dev));
@@ -1323,7 +1312,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
pmu->base.event_init = NULL;
free_event_attributes(pmu);
  err_name:
-   if (!is_igp(i915))
+   if (IS_DGFX(i915))
kfree(pmu->name);
  err:
	drm_notice(&i915->drm, "Failed to register PMU!\n");
@@ -1351,7 +1340,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
	perf_pmu_unregister(&pmu->base);
pmu->base.event_init = NULL;
kfree(pmu->base.attr_groups);
-   if (!is_igp(i915))
+   if (IS_DGFX(i915))
kfree(pmu->name);
free_event_attributes(pmu);
  }


Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-22 Thread Tvrtko Ursulin



On 22/07/2024 15:06, Christian König wrote:

Am 22.07.24 um 15:52 schrieb Tvrtko Ursulin:


On 19/07/2024 16:18, Christian König wrote:

Am 19.07.24 um 15:02 schrieb Christian König:

Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity
creation") a change was made which prevented priority changes for 
entities

with only one assigned scheduler.

The commit reduced drm_sched_entity_set_priority() to simply update the
entity's priority, but the run queue selection logic in
drm_sched_entity_select_rq() was never able to actually change the
originally assigned run queue.

In practice that only affected amdgpu, being the only driver which can do
dynamic priority changes. And an attempt to rectify that appears to have
been made there in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx
priority override").

A few unresolved problems however were that this only fixed
drm_sched_entity_set_priority() *if* 
drm_sched_entity_modify_sched() was

called first. That was not documented anywhere.

Secondly, this only works if drm_sched_entity_modify_sched() is 
actually

called, which in amdgpu's case today is true only for gfx and compute.
Priority changes for other engines with only one scheduler 
assigned, such

as jpeg and video decode will still not work.

Note that this was also noticed in 981b04d96856 ("drm/sched: 
improve docs

around drm_sched_entity").

A completely different source of non-obvious confusion was that, whereas
drm_sched_entity_init() was not keeping the passed in list of 
schedulers

(courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched
list")), drm_sched_entity_modify_sched() was disagreeing with that and
would simply assign the single item list.

That inconsistency appears to have been semi-silently fixed in ac4eb83ab255
("drm/sched: select new rq even if there is only one v3").

What was also not documented is why it was important to not keep the
list of schedulers when there is only one. I suspect it could have
something to do with the fact the passed in array is on stack for many
callers with just one scheduler. With more than one scheduler 
amdgpu is
the only caller, and there the container is not on the stack. 
Keeping a

stack backed list in the entity would obviously be undefined behaviour
*if* the list was kept.

Amdgpu however only stopped passing in a stack backed container for the 
more than one scheduler case in 977f7e1068be ("drm/amdgpu: allocate 
entities on

demand"). Until then I suspect dereferencing freed stack from
drm_sched_entity_select_rq() was still present.

In order to untangle all that and fix priority changes this patch is
bringing back the entity owned container for storing the passed in
scheduler list.


Please don't. That makes the mess just more horrible.

The background of not keeping the array is to intentionally prevent 
the priority override from working.


The bug is rather that adding drm_sched_entity_modify_sched() messed 
this up.


To give more background: Amdgpu has two different ways of handling 
priority:

1. The priority in the DRM scheduler.
2. Different HW rings with different priorities.

Your analysis is correct that drm_sched_entity_init() initially 
dropped the scheduler list to avoid using a stack allocated list, and 
that functionality is still used in amdgpu_ctx_init_entity() for 
example.


Setting the scheduler priority was basically just a workaround 
because we didn't have the hw priorities at that time. Since that is 
no longer the case I suggest just completely dropping the 
drm_sched_entity_set_priority() function instead.


Removing drm_sched_entity_set_priority() is one thing, but we also 
need to clear up the sched_list container ownership issue. It is 
neither documented, nor robustly handled in the code. The "num_scheds 
== 1" special casing throughout IMHO has to go too.


I disagree. Keeping the scheduler list in the entity is only useful for 
load balancing.


As long as only one scheduler is provided and we don't load balance, the 
entity doesn't need the scheduler list in the first place.


Once set_priority is removed then indeed it doesn't. But even when it 
is removed it needs documenting who owns the passed in container. Today 
drivers are okay to pass a stack array when it is one element, but if 
they did it with more than one they would be in for a nasty surprise.


Another thing if you want to get rid of frontend priority handling is 
to stop configuring scheduler instances with DRM_SCHED_PRIORITY_COUNT 
priority levels, to avoid wasting memory on pointless run queues.


I would rather like to completely drop the RR with the runlists 
altogether and keep only the FIFO approach around. This way priority can 
be implemented by boosting the score of submissions by a certain degree.


You mean larger refactoring of the scheduler removing the 1:N between 
drm_sch

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-22 Thread Tvrtko Ursulin



On 19/07/2024 16:18, Christian König wrote:

Am 19.07.24 um 15:02 schrieb Christian König:

Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity
creation") a change was made which prevented priority changes for 
entities

with only one assigned scheduler.

The commit reduced drm_sched_entity_set_priority() to simply update the
entity's priority, but the run queue selection logic in
drm_sched_entity_select_rq() was never able to actually change the
originally assigned run queue.

In practice that only affected amdgpu, being the only driver which 
can do

dynamic priority changes. And an attempt to rectify that appears to have been made
there in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority
override").

A few unresolved problems however were that this only fixed
drm_sched_entity_set_priority() *if* drm_sched_entity_modify_sched() was
called first. That was not documented anywhere.

Secondly, this only works if drm_sched_entity_modify_sched() is actually
called, which in amdgpu's case today is true only for gfx and compute.
Priority changes for other engines with only one scheduler assigned, 
such

as jpeg and video decode will still not work.

Note that this was also noticed in 981b04d96856 ("drm/sched: improve 
docs

around drm_sched_entity").

A completely different source of non-obvious confusion was that, whereas
drm_sched_entity_init() was not keeping the passed in list of schedulers
(courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched
list")), drm_sched_entity_modify_sched() was disagreeing with that and
would simply assign the single item list.

That inconsistency appears to have been semi-silently fixed in ac4eb83ab255
("drm/sched: select new rq even if there is only one v3").

What was also not documented is why it was important to not keep the
list of schedulers when there is only one. I suspect it could have
something to do with the fact the passed in array is on stack for many
callers with just one scheduler. With more than one scheduler amdgpu is
the only caller, and there the container is not on the stack. Keeping a
stack backed list in the entity would obviously be undefined behaviour
*if* the list was kept.

Amdgpu however only stopped passing in a stack backed container for the more
than one scheduler case in 977f7e1068be ("drm/amdgpu: allocate 
entities on

demand"). Until then I suspect dereferencing freed stack from
drm_sched_entity_select_rq() was still present.

In order to untangle all that and fix priority changes this patch is
bringing back the entity owned container for storing the passed in
scheduler list.


Please don't. That makes the mess just more horrible.

The background of not keeping the array is to intentionally prevent 
the priority override from working.


The bug is rather that adding drm_sched_entity_modify_sched() messed 
this up.


To give more background: Amdgpu has two different ways of handling 
priority:

1. The priority in the DRM scheduler.
2. Different HW rings with different priorities.

Your analysis is correct that drm_sched_entity_init() initially dropped 
the scheduler list to avoid using a stack allocated list, and that 
functionality is still used in amdgpu_ctx_init_entity() for example.


Setting the scheduler priority was basically just a workaround because 
we didn't have the hw priorities at that time. Since that is no longer 
the case I suggest just completely dropping the 
drm_sched_entity_set_priority() function instead.


Removing drm_sched_entity_set_priority() is one thing, but we also need 
to clear up the sched_list container ownership issue. It is neither 
documented, nor robustly handled in the code. The "num_scheds == 1" 
special casing throughout IMHO has to go too.


Another thing if you want to get rid of frontend priority handling is to 
stop configuring scheduler instances with DRM_SCHED_PRIORITY_COUNT 
priority levels, to avoid wasting memory on pointless run queues.


And a final thing is to check whether the locking in 
drm_sched_entity_modify_sched() is okay. Because according to kerneldoc:


 * Note that this must be called under the same common lock for @entity as
 * drm_sched_job_arm() and drm_sched_entity_push_job(), or the driver 
needs to
 * guarantee through some other means that this is never called while 
new jobs

 * can be pushed to @entity.

I don't see that this is the case. Priority override is under 
amdgpu_ctx_mgr->lock, while job arm and push appear not to be. I also 
cannot spot anything else preventing amdgpu_sched_ioctl() from running 
in parallel with everything else.
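
In other words the contract, as I read it, is roughly the below (a sketch 
only; "example_ctx" and its mutex stand in for whatever common lock a 
driver picks):

struct example_ctx {
	struct mutex lock;
};

static void example_submit_job(struct example_ctx *ctx,
			       struct drm_sched_job *job)
{
	mutex_lock(&ctx->lock);
	drm_sched_job_arm(job);
	drm_sched_entity_push_job(job);
	mutex_unlock(&ctx->lock);
}

static void example_set_priority(struct example_ctx *ctx,
				 struct drm_sched_entity *entity,
				 struct drm_gpu_scheduler **scheds,
				 unsigned int num_scheds)
{
	/* Same lock as the arm/push path, per the kerneldoc. */
	mutex_lock(&ctx->lock);
	drm_sched_entity_modify_sched(entity, scheds, num_scheds);
	mutex_unlock(&ctx->lock);
}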


In general scheduler priorities were meant to be used for things like 
kernel queues which would always have higher priority than user space 
submissions, and using them for userspace turned out to be not such a 
good idea.


Out of curiosity what were the problems? I cannot think of anythi

[PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity
creation") a change was made which prevented priority changes for entities
with only one assigned scheduler.

The commit reduced drm_sched_entity_set_priority() to simply update the
entity's priority, but the run queue selection logic in
drm_sched_entity_select_rq() was never able to actually change the
originally assigned run queue.

In practice that only affected amdgpu, being the only driver which can do
dynamic priority changes. And an attempt to rectify that appears to have been made
there in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority
override").

A few unresolved problems however were that this only fixed
drm_sched_entity_set_priority() *if* drm_sched_entity_modify_sched() was
called first. That was not documented anywhere.

Secondly, this only works if drm_sched_entity_modify_sched() is actually
called, which in amdgpu's case today is true only for gfx and compute.
Priority changes for other engines with only one scheduler assigned, such
as jpeg and video decode will still not work.

Note that this was also noticed in 981b04d96856 ("drm/sched: improve docs
around drm_sched_entity").

A completely different source of non-obvious confusion was that, whereas
drm_sched_entity_init() was not keeping the passed in list of schedulers
(courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched
list")), drm_sched_entity_modify_sched() was disagreeing with that and
would simply assign the single item list.

That inconsistency appears to have been semi-silently fixed in ac4eb83ab255
("drm/sched: select new rq even if there is only one v3").

What was also not documented is why it was important to not keep the
list of schedulers when there is only one. I suspect it could have
something to do with the fact the passed in array is on stack for many
callers with just one scheduler. With more than one scheduler amdgpu is
the only caller, and there the container is not on the stack. Keeping a
stack backed list in the entity would obviously be undefined behaviour
*if* the list was kept.

Amdgpu however only stopped passing in a stack backed container for the more
than one scheduler case in 977f7e1068be ("drm/amdgpu: allocate entities on
demand"). Until then I suspect dereferencing freed stack from
drm_sched_entity_select_rq() was still present.

In order to untangle all that and fix priority changes this patch is
bringing back the entity owned container for storing the passed in
scheduler list. The container is now owned by the entity while the pointers
remain owned by the drivers. The list of schedulers is always kept, including
for the one scheduler case.

The patch therefore also removes the single scheduler special case,
which means that priority changes should now work (be able to change the
selected run-queue) for all drivers and engines. In other words
drm_sched_entity_set_priority() should now just work for all cases.

To enable maintaining its own container some API calls needed to grow a
capability for returning success/failure, which is a change which
percolates mostly through amdgpu source.
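
A rough sketch of the core idea (an illustration only, assuming the current
entity fields; the hypothetical helper below is not part of the patch):

/* Copy the caller's scheduler array into an entity owned container so
 * that stack backed arrays and the single scheduler case need no special
 * treatment. The scheduler pointers themselves remain owned by the driver.
 */
static int drm_sched_entity_copy_sched_list(struct drm_sched_entity *entity,
					    struct drm_gpu_scheduler **sched_list,
					    unsigned int num_sched_list)
{
	if (!num_sched_list || !sched_list)
		return -EINVAL;

	entity->sched_list = kmemdup(sched_list,
				     num_sched_list * sizeof(*sched_list),
				     GFP_KERNEL);
	if (!entity->sched_list)
		return -ENOMEM;

	entity->num_sched_list = num_sched_list;

	return 0;
}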

Signed-off-by: Tvrtko Ursulin 
Fixes: b3ac17667f11 ("drm/scheduler: rework entity creation")
References: 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched list")
References: 977f7e1068be ("drm/amdgpu: allocate entities on demand")
References: 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority 
override")
References: ac4eb83ab255 ("drm/sched: select new rq even if there is only one 
v3")
References: 981b04d96856 ("drm/sched: improve docs around drm_sched_entity")
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Daniel Vetter 
Cc: amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.6+
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 31 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 13 +--
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c   |  3 +-
 drivers/gpu/drm/scheduler/sched_entity.c  | 96 ---
 include/drm/gpu_scheduler.h   | 16 ++--
 6 files changed, 100 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 5cb33ac99f70..387247f8307e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -802,15 +802,15 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx 
*ctx,
return fence;
 }
 
-static void amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
-  struct amdgpu_ctx_entity *aentity,
-  int hw_ip,
-  int32_t priority)
+static int amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
+   

[PULL] drm-intel-next-fixes

2024-07-18 Thread Tvrtko Ursulin


Hi Dave, Sima,

One display fix for the merge window relating to DisplayPort LTTPR. It
fixes at least the Dell UD22 dock when used on Intel N100 systems.

Regards,

Tvrtko

drm-intel-next-fixes-2024-07-18:
- Reset intel_dp->link_trained before retraining the link [dp] (Imre Deak)
- Don't switch the LTTPR mode on an active link [dp] (Imre Deak)
The following changes since commit c58c39163a7e2c4c8885c57e4e74931c7b482e53:

  drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB (2024-07-12 
13:13:15 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-next-fixes-2024-07-18

for you to fetch changes up to 509580fad7323b6a5da27e8365cd488f3b57210e:

  drm/i915/dp: Don't switch the LTTPR mode on an active link (2024-07-16 
08:14:29 +)


- Reset intel_dp->link_trained before retraining the link [dp] (Imre Deak)
- Don't switch the LTTPR mode on an active link [dp] (Imre Deak)


Imre Deak (2):
  drm/i915/dp: Reset intel_dp->link_trained before retraining the link
  drm/i915/dp: Don't switch the LTTPR mode on an active link

 drivers/gpu/drm/i915/display/intel_dp.c|  2 +
 .../gpu/drm/i915/display/intel_dp_link_training.c  | 55 +++---
 2 files changed, 50 insertions(+), 7 deletions(-)


Re: [PATCH] drm/v3d: Expose memory stats through fdinfo

2024-07-11 Thread Tvrtko Ursulin



On 11/07/2024 15:25, Maíra Canal wrote:

Use the common DRM function `drm_show_memory_stats()` to expose standard
fdinfo memory stats.

V3D exposes global GPU memory stats through debugfs. Those stats will be
preserved while the DRM subsystem doesn't have a standard solution to
expose global GPU stats.

Signed-off-by: Maíra Canal 
---

* Example fdinfo output:

$ cat /proc/10100/fdinfo/19
pos:0
flags:  0242
mnt_id: 25
ino:521
drm-driver: v3d
drm-client-id:  81
drm-engine-bin: 4916187 ns
v3d-jobs-bin:   98 jobs
drm-engine-render:  154563573 ns
v3d-jobs-render:98 jobs
drm-engine-tfu: 10574 ns
v3d-jobs-tfu:   1 jobs
drm-engine-csd: 0 ns
v3d-jobs-csd:   0 jobs
drm-engine-cache_clean: 0 ns
v3d-jobs-cache_clean:   0 jobs
drm-engine-cpu: 0 ns
v3d-jobs-cpu:   0 jobs
drm-total-memory:   15168 KiB
drm-shared-memory:  9336 KiB
drm-active-memory:  0

* Example gputop output:

DRM minor 128
   PID  MEM  RSS      bin        render      tfu       csd     cache_clean     cpu      NAME
 10257  19M  19M |  3.6% ▎  || 43.2% ██▋ ||  0.0%  ||  0.0%  ||  0.0%       ||  0.0%  | glmark2
  9963   3M   3M |  0.3% ▏  ||  2.6% ▎   ||  0.0%  ||  0.0%  ||  0.0%       ||  0.0%  | glxgears
  9965  10M  10M |  0.0%    ||  0.0%     ||  0.0%  ||  0.0%  ||  0.0%       ||  0.0%  | Xwayland
 10100  14M  14M |  0.0%    ||  0.0%     ||  0.0%  ||  0.0%  ||  0.0%       ||  0.0%  | chromium-browse

Best Regards,
- Maíra

  drivers/gpu/drm/v3d/v3d_bo.c  | 12 
  drivers/gpu/drm/v3d/v3d_drv.c |  2 ++
  2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a165cbcdd27b..ecb80fd75b1a 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -26,6 +26,17 @@
  #include "v3d_drv.h"
  #include "uapi/drm/v3d_drm.h"
  
+static enum drm_gem_object_status v3d_gem_status(struct drm_gem_object *obj)

+{
+   struct v3d_bo *bo = to_v3d_bo(obj);
+   enum drm_gem_object_status res = 0;
+
+   if (bo->base.pages)
+   res |= DRM_GEM_OBJECT_RESIDENT;


To check my understanding of v3d - pages are actually always there for 
the lifetime of the object? If so this could be just "return 
DRM_GEM_OBJECT_RESIDENT", although granted, the way you have it is more 
future proof.
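
I.e. the simplification I had in mind would be something like this sketch,
valid only if the pages really do exist for the BO's whole lifetime:

static enum drm_gem_object_status v3d_gem_status(struct drm_gem_object *obj)
{
	/* Every v3d shmem BO is assumed resident for its whole lifetime. */
	return DRM_GEM_OBJECT_RESIDENT;
}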


Either way:

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


+
+   return res;
+}
+
  /* Called DRM core on the last userspace/kernel unreference of the
   * BO.
   */
@@ -63,6 +74,7 @@ static const struct drm_gem_object_funcs v3d_gem_funcs = {
.vmap = drm_gem_shmem_object_vmap,
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
+   .status = v3d_gem_status,
	.vm_ops = &drm_gem_shmem_vm_ops,
  };
  
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c

index a47f00b443d3..e883f405f26a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -184,6 +184,8 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
   v3d_queue_to_string(queue), jobs_completed);
}
+
+   drm_show_memory_stats(p, file);
  }
  
  static const struct file_operations v3d_drm_fops = {


[PATCH 11/11] drm/v3d: Add some local variables in queries/extensions

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Add some local variables to make the code a bit less verbose, with the
main benefit being pulling some lines to under 80 columns wide.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 88 ++--
 1 file changed, 49 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b282d12571b5..d607aa9c4ec2 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   struct v3d_timestamp_query_info *query_info = &job->timestamp_query;
unsigned int i;
int err;
 
@@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(timestamp.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   query_info->queries = kvmalloc_array(timestamp.count,
+sizeof(struct v3d_timestamp_query),
+GFP_KERNEL);
+   if (!query_info->queries)
return -ENOMEM;
 
offsets = u64_to_user_ptr(timestamp.offsets);
@@ -490,20 +491,21 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
goto error;
}
 
-   job->timestamp_query.queries[i].offset = offset;
+   query_info->queries[i].offset = offset;
 
if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->timestamp_query.queries[i].syncobj) {
+   query_info->queries[i].syncobj = drm_syncobj_find(file_priv,
+ sync);
+   if (!query_info->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = timestamp.count;
+   query_info->count = timestamp.count;
 
return 0;
 
@@ -519,6 +521,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   struct v3d_timestamp_query_info *query_info = &job->timestamp_query;
unsigned int i;
int err;
 
@@ -537,10 +540,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(reset.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   query_info->queries = kvmalloc_array(reset.count,
+sizeof(struct v3d_timestamp_query),
+GFP_KERNEL);
+   if (!query_info->queries)
return -ENOMEM;
 
syncs = u64_to_user_ptr(reset.syncs);
@@ -548,20 +551,21 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
for (i = 0; i < reset.count; i++) {
u32 sync;
 
-   job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
+   query_info->queries[i].offset = reset.offset + 8 * i;
 
if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->timestamp_query.queries[i].syncobj) {
+   query_info->queries[i].syncobj = drm_syncobj_find(file_priv,
+ sync);
+   if (!query_info->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = reset.count;
+   query_info->count = reset.count;
 
return 0;
 
@@ -578,6 +582,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
+   struct v3d_timestamp_query_info *query_info = &job->timestamp_query;
unsigned int i;
int err;

[PATCH 05/11] drm/v3d: Validate passed in drm syncobj handles in the performance extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking that the handle was looked up successfully, or otherwise
failing the extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 9a3e32075ebe..4cdfabbf4964 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -710,6 +710,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
@@ -790,6 +794,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = copy.count;
job->performance_query.nperfmons = copy.nperfmons;
-- 
2.44.0



[PATCH 09/11] drm/v3d: Move perfmon init completely into own unit

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Now that the build time dependencies on various array sizes have been
removed, we can move the perfmon init completely into its own compilation
unit and remove the hardcoded defines.

This improves on the temporary fix quickly delivered in commit
9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning").

Signed-off-by: Tvrtko Ursulin 
References: 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning")
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c |  9 +---
 drivers/gpu/drm/v3d/v3d_drv.h |  6 +--
 drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++
 .../gpu/drm/v3d/v3d_performance_counters.h| 16 ---
 4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index a47f00b443d3..491c638a4d74 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
args->value = 1;
return 0;
case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
-   args->value = v3d->max_counters;
+   args->value = v3d->perfmon_info.max_counters;
return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
@@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
-   if (v3d->ver >= 71)
-   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
-   else if (v3d->ver >= 42)
-   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
-   else
-   v3d->max_counters = 0;
+   v3d_perfmon_init(v3d);
 
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index b1dfec49ba7d..8524761bc62d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,10 +104,7 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
-   /* Different revisions of V3D have different total number of performance
-* counters
-*/
-   unsigned int max_counters;
+   struct v3d_perfmon_info perfmon_info;
 
void __iomem *hub_regs;
void __iomem *core_regs[3];
@@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
 /* v3d_perfmon.c */
+void v3d_perfmon_init(struct v3d_dev *v3d);
 void v3d_perfmon_get(struct v3d_perfmon *perfmon);
 void v3d_perfmon_put(struct v3d_perfmon *perfmon);
 void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index b7d0b02e1a95..cd7f1eedf17f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc 
v3d_v71_performance_counters[] = {
{"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any 
other reason (vary/W/Z)"},
 };
 
+void v3d_perfmon_init(struct v3d_dev *v3d)
+{
+   const struct v3d_perf_counter_desc *counters = NULL;
+   unsigned int max = 0;
+
+   if (v3d->ver >= 71) {
+   counters = v3d_v71_performance_counters;
+   max = ARRAY_SIZE(v3d_v71_performance_counters);
+   } else if (v3d->ver >= 42) {
+   counters = v3d_v42_performance_counters;
+   max = ARRAY_SIZE(v3d_v42_performance_counters);
+   }
+
+   v3d->perfmon_info.max_counters = max;
+   v3d->perfmon_info.counters = counters;
+}
+
 void v3d_perfmon_get(struct v3d_perfmon *perfmon)
 {
if (perfmon)
@@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= v3d->max_counters)
+   if (req->counters[i] >= v3d->perfmon_info.max_counters)
return -EINVAL;
}
 
@@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, 
void *data,
return -EINVAL;
}
 
-   /* Make sure that the counter ID is valid */
-   if (req->counter >= v3d->max_counters)
-   return -EINVAL;
-
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) !=
-V3D_V42_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) !=
-V3D_V71_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(V3D_MAX_COUNTE

[PATCH 08/11] drm/v3d: Do not use intermediate storage when copying performance query results

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Removing the intermediate buffer removes the last use of the
V3D_MAX_COUNTERS define, which will enable further driver cleanup.

While at it pull the 32 vs 64 bit copying decision outside the loop in
order to reduce the number of conditional instructions.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Iago Toral Quiroga 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 59 +
 1 file changed, 37 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 7b2195ba4248..d193072703f3 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job)
v3d_put_bo_vaddr(bo);
 }
 
+static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value)
+{
+   dst[idx] = value;
+}
+
+static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value)
+{
+   dst[idx] = value;
+}
+
 static void
-write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value)
+write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value)
 {
-   if (do_64bit) {
-   u64 *dst64 = (u64 *)dst;
-
-   dst64[idx] = value;
-   } else {
-   u32 *dst32 = (u32 *)dst;
-
-   dst32[idx] = (u32)value;
-   }
+   if (do_64bit)
+   write_to_buffer_64(dst, idx, value);
+   else
+   write_to_buffer_32(dst, idx, value);
 }
 
 static void
@@ -505,18 +510,24 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job)
 }
 
 static void
-v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 
query)
+v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data,
+  unsigned int query)
 {
-   struct v3d_performance_query_info *performance_query = 
&job->performance_query;
-   struct v3d_copy_query_results_info *copy = &job->copy;
+   struct v3d_performance_query_info *performance_query =
+   >performance_query;
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
+   struct v3d_performance_query *perf_query =
+   &performance_query->queries[query];
struct v3d_dev *v3d = job->base.v3d;
-   struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_MAX_COUNTERS];
+   unsigned int i, j, offset;
+
+   for (i = 0, offset = 0;
+i < performance_query->nperfmons;
+i++, offset += DRM_V3D_MAX_PERF_COUNTERS) {
+   struct v3d_perfmon *perfmon;
 
-   for (int i = 0; i < performance_query->nperfmons; i++) {
perfmon = v3d_perfmon_find(v3d_priv,
-  
performance_query->queries[query].kperfmon_ids[i]);
+  perf_query->kperfmon_ids[i]);
if (!perfmon) {
DRM_DEBUG("Failed to find perfmon.");
continue;
@@ -524,14 +535,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job 
*job, void *data, u32 quer
 
v3d_perfmon_stop(v3d, perfmon, true);
 
-   memcpy(&counter_values[i * DRM_V3D_MAX_PERF_COUNTERS], 
perfmon->values,
-  perfmon->ncounters * sizeof(u64));
+   if (job->copy.do_64bit) {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_64(data, offset + j,
+  perfmon->values[j]);
+   } else {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_32(data, offset + j,
+  perfmon->values[j]);
+   }
 
v3d_perfmon_put(perfmon);
}
-
-   for (int i = 0; i < performance_query->ncounters; i++)
-   write_to_buffer(data, i, copy->do_64bit, counter_values[i]);
 }
 
 static void
-- 
2.44.0



[PATCH 07/11] drm/v3d: Size the kperfmon_ids array at runtime

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Instead of statically reserving pessimistic space for the kperfmon_ids
array, make the userspace extension code allocate exactly the required
amount of space.

Apart from saving some memory at runtime, this also removes the need for
the V3D_MAX_PERFMONS macro whose removal will benefit further driver
cleanup.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h|  6 +-
 drivers/gpu/drm/v3d/v3d_sched.c  |  4 +++-
 drivers/gpu/drm/v3d/v3d_submit.c | 17 +++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index dd3ead4cb8bd..b1dfec49ba7d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,13 +351,9 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
-/* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
- DRM_V3D_MAX_PERF_COUNTERS)
-
 struct v3d_performance_query {
/* Performance monitor IDs for this query */
-   u32 kperfmon_ids[V3D_MAX_PERFMONS];
+   u32 *kperfmon_ids;
 
/* Syncobj that indicates the query availability */
struct drm_syncobj *syncobj;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 5fbbee47c6b7..7b2195ba4248 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -94,8 +94,10 @@ v3d_performance_query_info_free(struct 
v3d_performance_query_info *query_info,
if (query_info->queries) {
unsigned int i;
 
-   for (i = 0; i < count; i++)
+   for (i = 0; i < count; i++) {
drm_syncobj_put(query_info->queries[i].syncobj);
+   kvfree(query_info->queries[i].kperfmon_ids);
+   }
 
kvfree(query_info->queries);
}
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index ce56e31a027d..d1060e60aafa 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -671,10 +671,20 @@ v3d_copy_query_info(struct v3d_performance_query_info 
*query_info,
goto error;
}
 
+   query->kperfmon_ids =
+   kvmalloc_array(nperfmons,
+  sizeof(struct v3d_performance_query *),
+  GFP_KERNEL);
+   if (!query->kperfmon_ids) {
+   err = -ENOMEM;
+   goto error;
+   }
+
ids_pointer = u64_to_user_ptr(ids);
 
for (j = 0; j < nperfmons; j++) {
if (get_user(id, ids_pointer++)) {
+   kvfree(query->kperfmon_ids);
err = -EFAULT;
goto error;
}
@@ -684,6 +694,7 @@ v3d_copy_query_info(struct v3d_performance_query_info 
*query_info,
 
query->syncobj = drm_syncobj_find(file_priv, sync);
if (!query->syncobj) {
+   kvfree(query->kperfmon_ids);
err = -ENOENT;
goto error;
}
@@ -717,9 +728,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
	if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
 
-   if (reset.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(reset.count,
@@ -767,9 +775,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
-   if (copy.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(copy.count,
-- 
2.44.0



[PATCH 06/11] drm/v3d: Move part of copying of reset/copy performance extension to a helper

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

The loop which looks up the syncobj and copies the kperfmon ids is
identical, so let's move it to a helper.

The only change is replacing copy_from_user with get_user when copying a
scalar.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 152 ++-
 1 file changed, 68 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 4cdfabbf4964..ce56e31a027d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -644,15 +644,64 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
return err;
 }
 
+static int
+v3d_copy_query_info(struct v3d_performance_query_info *query_info,
+   unsigned int count,
+   unsigned int nperfmons,
+   u32 __user *syncs,
+   u64 __user *kperfmon_ids,
+   struct drm_file *file_priv)
+{
+   unsigned int i, j;
+   int err;
+
+   for (i = 0; i < count; i++) {
+   struct v3d_performance_query *query = _info->queries[i];
+   u32 __user *ids_pointer;
+   u32 sync, id;
+   u64 ids;
+
+   if (get_user(sync, syncs++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   if (get_user(ids, kperfmon_ids++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   ids_pointer = u64_to_user_ptr(ids);
+
+   for (j = 0; j < nperfmons; j++) {
+   if (get_user(id, ids_pointer++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   query->kperfmon_ids[j] = id;
+   }
+
+   query->syncobj = drm_syncobj_find(file_priv, sync);
+   if (!query->syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
+   }
+
+   return 0;
+
+error:
+   v3d_performance_query_info_free(query_info, i);
+   return err;
+}
+
 static int
 v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 struct drm_v3d_extension __user *ext,
 struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
-   unsigned int i, j;
int err;
 
if (!job) {
@@ -679,50 +728,19 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (!job->performance_query.queries)
return -ENOMEM;
 
-   syncs = u64_to_user_ptr(reset.syncs);
-   kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
+   err = v3d_copy_query_info(>performance_query,
+ reset.count,
+ reset.nperfmons,
+ u64_to_user_ptr(reset.syncs),
+ u64_to_user_ptr(reset.kperfmon_ids),
+ file_priv);
+   if (err)
+   return err;
 
-   for (i = 0; i < reset.count; i++) {
-   u32 sync;
-   u64 ids;
-   u32 __user *ids_pointer;
-   u32 id;
-
-   if (copy_from_user(, syncs++, sizeof(sync))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   if (copy_from_user(, kperfmon_ids++, sizeof(ids))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   ids_pointer = u64_to_user_ptr(ids);
-
-   for (j = 0; j < reset.nperfmons; j++) {
-   if (copy_from_user(, ids_pointer++, sizeof(id))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   job->performance_query.queries[i].kperfmon_ids[j] = id;
-   }
-
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->performance_query.queries[i].syncobj) {
-   err = -ENOENT;
-   goto error;
-   }
-   }
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
 
return 0;
-
-error:
-   v3d_performance_query_info_free(>performance_query, i);
-   return err;
 }
 
 static int
@@ -730,10 +748,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
  struct drm_v3d_extension __user *ext,
  struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;

[PATCH 04/11] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking the handle was looked up successfully or otherwise fail the
extension by jumping into the existing unwind.
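
A sketch of the added check, with hypothetical surrounding code (only
drm_syncobj_find()/drm_syncobj_put() are the real DRM helpers): the lookup
returns NULL for an unknown handle, so a NULL result must fail the
extension instead of being stored and dereferenced later.

#include <linux/errno.h>
#include <linux/types.h>
#include <drm/drm_file.h>
#include <drm/drm_syncobj.h>

static int example_lookup_syncobj(struct drm_file *file_priv, u32 handle,
                                  struct drm_syncobj **out)
{
        struct drm_syncobj *syncobj = drm_syncobj_find(file_priv, handle);

        if (!syncobj)
                return -ENOENT; /* unknown/invalid handle from userspace */

        *out = syncobj;         /* caller releases with drm_syncobj_put() */
        return 0;
}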

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 50be4e8a7512..9a3e32075ebe 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -498,6 +498,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = timestamp.count;
 
@@ -552,6 +556,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = reset.count;
 
@@ -616,6 +624,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = copy.count;
 
-- 
2.44.0



[PATCH 10/11] drm/v3d: Prefer get_user for scalar types

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

It makes it just a tiny bit more obvious what is going on.
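
A small illustration of the difference, using a hypothetical helper
(example_read_u32 is made up; get_user() and copy_from_user() are the real
uaccess interfaces): for a single scalar both forms behave the same, but
get_user() states the intent more directly.

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>

static int example_read_u32(u32 __user *uptr, u32 *out)
{
        u32 value;

        /* Before: if (copy_from_user(&value, uptr, sizeof(value))) ... */
        if (get_user(value, uptr))
                return -EFAULT;

        *out = value;
        return 0;
}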

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index d1060e60aafa..b282d12571b5 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -485,14 +485,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
 
job->timestamp_query.queries[i].offset = offset;
 
-   if (copy_from_user(, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -550,7 +550,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
 
-   if (copy_from_user(, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -611,14 +611,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
for (i = 0; i < copy.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
 
job->timestamp_query.queries[i].offset = offset;
 
-   if (copy_from_user(, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
-- 
2.44.0



[PATCH 03/11] drm/v3d: Fix potential memory leak in the performance extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.
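
The shape of the fix as a standalone sketch (the example_* names are
hypothetical; drm_syncobj_find()/drm_syncobj_put(), get_user() and kvfree()
are the real kernel interfaces): on any failure inside the loop the cleanup
helper is called with the number of entries populated so far, so none of
the already looked-up syncobjs are leaked.

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/uaccess.h>
#include <drm/drm_file.h>
#include <drm/drm_syncobj.h>

struct example_entry {
        struct drm_syncobj *syncobj;
};

static void example_cleanup(struct example_entry *entries, unsigned int count)
{
        unsigned int i;

        for (i = 0; i < count; i++)
                drm_syncobj_put(entries[i].syncobj);

        kvfree(entries);
}

static int example_fill(struct example_entry *entries, unsigned int count,
                        u32 __user *handles, struct drm_file *file_priv)
{
        unsigned int i;
        int err;

        for (i = 0; i < count; i++) {
                u32 handle;

                if (get_user(handle, handles++)) {
                        err = -EFAULT;
                        goto error;
                }

                entries[i].syncobj = drm_syncobj_find(file_priv, handle);
                if (!entries[i].syncobj) {
                        err = -ENOENT;
                        goto error;
                }
        }

        return 0;

error:
        example_cleanup(entries, i);    /* releases only the entries set so far */
        return err;
}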

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 ++
 drivers/gpu/drm/v3d/v3d_submit.c | 52 
 3 files changed, 50 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index e208ffdfba32..dd3ead4cb8bd 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 /* v3d_sched.c */
 void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
   unsigned int count);
+void v3d_performance_query_info_free(struct v3d_performance_query_info 
*query_info,
+unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 59dc0287dab9..5fbbee47c6b7 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ v3d_timestamp_query_info_free(struct 
v3d_timestamp_query_info *query_info,
}
 }
 
+void
+v3d_performance_query_info_free(struct v3d_performance_query_info *query_info,
+   unsigned int count)
+{
+   if (query_info->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(query_info->queries[i].syncobj);
+
+   kvfree(query_info->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_performance_query_info *performance_query = 
>performance_query;
 
v3d_timestamp_query_info_free(>timestamp_query,
  job->timestamp_query.count);
 
-   if (performance_query->queries) {
-   for (int i = 0; i < performance_query->count; i++)
-   drm_syncobj_put(performance_query->queries[i].syncobj);
-   kvfree(performance_query->queries);
-   }
+   v3d_performance_query_info_free(>performance_query,
+   job->performance_query.count);
 
v3d_job_cleanup(>base);
 }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 121bf1314b80..50be4e8a7512 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -640,6 +640,8 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 __user *syncs;
u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
+   unsigned int i, j;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -668,39 +670,43 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
syncs = u64_to_user_ptr(reset.syncs);
kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
 
-   for (int i = 0; i < reset.count; i++) {
+   for (i = 0; i < reset.count; i++) {
u32 sync;
u64 ids;
u32 __user *ids_pointer;
u32 id;
 
if (copy_from_user(, syncs++, sizeof(sync))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-
if (copy_from_user(, kperfmon_ids++, sizeof(ids))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
ids_pointer = u64_to_user_ptr(ids);
 
-   for (int j = 0; j < reset.nperfmons; j++) {
+   for (j = 0; j < reset.nperfmons; j++) {
if (copy_from_user(, ids_pointer++, sizeof(id))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->performance_query.queries[i].kperfmon_ids[j] = id;

[PATCH 02/11] drm/v3d: Fix potential memory leak in the timestamp extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 +++-
 drivers/gpu/drm/v3d/v3d_submit.c | 43 ++--
 3 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 099b962bdfde..e208ffdfba32 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
+  unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 03df37a3acf5..59dc0287dab9 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
v3d_job_cleanup(job);
 }
 
+void
+v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
+ unsigned int count)
+{
+   if (query_info->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(query_info->queries[i].syncobj);
+
+   kvfree(query_info->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_timestamp_query_info *timestamp_query = 
>timestamp_query;
struct v3d_performance_query_info *performance_query = 
>performance_query;
 
-   if (timestamp_query->queries) {
-   for (int i = 0; i < timestamp_query->count; i++)
-   drm_syncobj_put(timestamp_query->queries[i].syncobj);
-   kvfree(timestamp_query->queries);
-   }
+   v3d_timestamp_query_info_free(>timestamp_query,
+ job->timestamp_query.count);
 
if (performance_query->queries) {
for (int i = 0; i < performance_query->count; i++)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 263fefc1d04f..121bf1314b80 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,8 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   unsigned int i;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -480,19 +482,19 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
offsets = u64_to_user_ptr(timestamp.offsets);
syncs = u64_to_user_ptr(timestamp.syncs);
 
-   for (int i = 0; i < timestamp.count; i++) {
+   for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
if (copy_from_user(, offsets++, sizeof(offset))) {
-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->timestamp_query.queries[i].offset = offset;
 
if (copy_from_user(, syncs++, sizeof(sync))) {
-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
@@ -500,6 +502,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
job->timestamp_query.count = timestamp.count;
 
return 0;
+
+error:
+   v3d_timestamp_query_info_free(>timestamp_query, i);
+   return err;
 }
 
 static int
@@ -509,6 +515,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   unsigned int i;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -533,14 +541,14 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file

[PATCH 01/11] drm/v3d: Prevent out of bounds access in performance query extensions

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Check that the number of perfmons userspace is passing in the copy and
reset extensions is not greater than the internal kernel storage into
which the ids will be copied.
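
The generic shape of the check as a sketch (EXAMPLE_MAX_IDS and the
example_* names are hypothetical): any userspace-provided element count
larger than the fixed-size destination must be rejected before the copy
loop runs.

#include <linux/errno.h>
#include <linux/types.h>

#define EXAMPLE_MAX_IDS 8

struct example_dest {
        u32 ids[EXAMPLE_MAX_IDS];       /* fixed-size kernel storage */
};

static int example_validate_count(unsigned int nids)
{
        if (nids > EXAMPLE_MAX_IDS)
                return -EINVAL; /* would overflow ids[] during the copy */

        return 0;
}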

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
Reviewed-by: Iago Toral Quiroga 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 88f63d526b22..263fefc1d04f 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(, ext, sizeof(reset)))
return -EFAULT;
 
+   if (reset.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(reset.count,
@@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
+   if (copy.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(copy.count,
-- 
2.44.0



[PATCH v4 00/11] v3d: Perfmon cleanup

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When we had to quickly deal with a tree build issue via merging
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we
promised to follow up with a nicer solution.

Since in the process of eliminating the hardcoded defines we discovered a few
issues in the handling of corner cases and userspace input validation, the fix
has turned into a larger series, but hopefully the end result is a justifiable
cleanup.

v2:
 * Re-order the patches so fixes come first while last three are optional
   cleanups.

v3:
 * Fixed a bunch of rebase errors I made when re-ordering patches from v1 to v2.
 * Dropped the double underscore from __v3d_timestamp_query_info_free.
 * Added v3d prefix to v3d_copy_query_info.
 * Renamed qinfo to query_info.
 * Fixed some spelling errors and bad patch references.
 * Added mention to get_user to one commit message.
 * Dropped one patch from the series which became redundant due to other
   re-ordering.
 * Re-ordered last two patches with the view of dropping the last.

v4:
 * Fixed more rebase errors and details in commit messages.

 Cc: Maíra Canal 

Tvrtko Ursulin (11):
  drm/v3d: Prevent out of bounds access in performance query extensions
  drm/v3d: Fix potential memory leak in the timestamp extension
  drm/v3d: Fix potential memory leak in the performance extension
  drm/v3d: Validate passed in drm syncobj handles in the timestamp
extension
  drm/v3d: Validate passed in drm syncobj handles in the performance
extension
  drm/v3d: Move part of copying of reset/copy performance extension to a
helper
  drm/v3d: Size the kperfmon_ids array at runtime
  drm/v3d: Do not use intermediate storage when copying performance
query results
  drm/v3d: Move perfmon init completely into own unit
  drm/v3d: Prefer get_user for scalar types
  drm/v3d: Add some local variables in queries/extensions

 drivers/gpu/drm/v3d/v3d_drv.c |   9 +-
 drivers/gpu/drm/v3d/v3d_drv.h |  16 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c |  44 +--
 .../gpu/drm/v3d/v3d_performance_counters.h|  16 +-
 drivers/gpu/drm/v3d/v3d_sched.c   | 105 +--
 drivers/gpu/drm/v3d/v3d_submit.c  | 294 +++---
 6 files changed, 290 insertions(+), 194 deletions(-)

-- 
2.44.0



Re: [PATCH 03/11] drm/v3d: Fix potential memory leak in the performance extension

2024-07-11 Thread Tvrtko Ursulin



On 11/07/2024 14:00, Maíra Canal wrote:

On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the 
reset performance query job")

Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_drv.h    |  2 ++
  drivers/gpu/drm/v3d/v3d_sched.c  | 22 ++
  drivers/gpu/drm/v3d/v3d_submit.c | 50 
  3 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h 
b/drivers/gpu/drm/v3d/v3d_drv.h

index e208ffdfba32..dd3ead4cb8bd 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
  /* v3d_sched.c */
  void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info 
*query_info,

 unsigned int count);
+void v3d_performance_query_info_free(struct 
v3d_performance_query_info *query_info,

+ unsigned int count);
  void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
  int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c 
b/drivers/gpu/drm/v3d/v3d_sched.c

index 59dc0287dab9..5fbbee47c6b7 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ v3d_timestamp_query_info_free(struct 
v3d_timestamp_query_info *query_info,

  }
  }
+void
+v3d_performance_query_info_free(struct v3d_performance_query_info 
*query_info,

+    unsigned int count)
+{
+    if (query_info->queries) {
+    unsigned int i;
+
+    for (i = 0; i < count; i++)
+    drm_syncobj_put(query_info->queries[i].syncobj);
+
+    kvfree(query_info->queries);
+    }
+}
+
  static void
  v3d_cpu_job_free(struct drm_sched_job *sched_job)
  {
  struct v3d_cpu_job *job = to_cpu_job(sched_job);
-    struct v3d_performance_query_info *performance_query = 
>performance_query;

  v3d_timestamp_query_info_free(>timestamp_query,
    job->timestamp_query.count);
-    if (performance_query->queries) {
-    for (int i = 0; i < performance_query->count; i++)
-    drm_syncobj_put(performance_query->queries[i].syncobj);
-    kvfree(performance_query->queries);
-    }
+    v3d_performance_query_info_free(>performance_query,
+    job->performance_query.count);
  v3d_job_cleanup(>base);
  }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c 
b/drivers/gpu/drm/v3d/v3d_submit.c

index 121bf1314b80..d626c8539b04 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -640,6 +640,8 @@ v3d_get_cpu_reset_performance_params(struct 
drm_file *file_priv,

  u32 __user *syncs;
  u64 __user *kperfmon_ids;
  struct drm_v3d_reset_performance_query reset;
+    unsigned int i, j;
+    int err;
  if (!job) {
  DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -668,39 +670,43 @@ v3d_get_cpu_reset_performance_params(struct 
drm_file *file_priv,

  syncs = u64_to_user_ptr(reset.syncs);
  kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
-    for (int i = 0; i < reset.count; i++) {
+    for (i = 0; i < reset.count; i++) {
  u32 sync;
  u64 ids;
  u32 __user *ids_pointer;
  u32 id;
  if (copy_from_user(, syncs++, sizeof(sync))) {
-    kvfree(job->performance_query.queries);
-    return -EFAULT;
+    err = -EFAULT;
+    goto error;
  }
-    job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);

-
  if (copy_from_user(, kperfmon_ids++, sizeof(ids))) {
-    kvfree(job->performance_query.queries);
-    return -EFAULT;
+    err = -EFAULT;
+    goto error;
  }
  ids_pointer = u64_to_user_ptr(ids);
-    for (int j = 0; j < reset.nperfmons; j++) {
+    for (j = 0; j < reset.nperfmons; j++) {
  if (copy_from_user(, ids_pointer++, sizeof(id))) {
-    kvfree(job->performance_query.queries);
-    return -EFAULT;
+    err = -EFAULT;
+    goto error;
  }
  job->performance_query.queries[i].kperfmon_ids[j] = id;
  }
+
+    job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);

  }
  job->performance_query.count = reset.count;
  job->performance_query.nperfmons = reset.nperfmons;
  return 0;
+
+error:
+    v3d_performance_query_info_free(&job->performance_query, i);

Re: [PATCH 11/12] drm/v3d: Do not use intermediate storage when copying performance query results

2024-07-11 Thread Tvrtko Ursulin



On 11/07/2024 13:31, Iago Toral wrote:

On Tue, 09-07-2024 at 17:34 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Removing the intermediate buffer removes the last use of the
V3D_MAX_COUNTERS define, which will enable further driver cleanup.

While at it pull the 32 vs 64 bit copying decision outside the loop
in
order to reduce the number of conditional instructions.

Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/v3d/v3d_sched.c | 60 ---
--
  1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
b/drivers/gpu/drm/v3d/v3d_sched.c
index fc8730264386..77f795e38fad 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job
*job)
    v3d_put_bo_vaddr(bo);
  }
  
+static void write_to_buffer_32(u32 *dst, unsigned int idx, u32

value)
+{
+   dst[idx] = value;
+}
+
+static void write_to_buffer_64(u64 *dst, unsigned int idx, u64
value)
+{
+   dst[idx] = value;
+}
+
  static void
-write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value)
+write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64
value)
  {
-   if (do_64bit) {
-   u64 *dst64 = (u64 *)dst;
-
-   dst64[idx] = value;
-   } else {
-   u32 *dst32 = (u32 *)dst;
-
-   dst32[idx] = (u32)value;
-   }
+   if (do_64bit)
+   write_to_buffer_64(dst, idx, value);
+   else
+   write_to_buffer_32(dst, idx, value);
  }
  
  static void

@@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct
v3d_cpu_job *job)
  }
  
  static void

-v3d_write_performance_query_result(struct v3d_cpu_job *job, void
*data, u32 query)
+v3d_write_performance_query_result(struct v3d_cpu_job *job, void
*data,
+      unsigned int query)
  {
-   struct v3d_performance_query_info *performance_query = 

performance_query;

-   struct v3d_copy_query_results_info *copy = >copy;
+   struct v3d_performance_query_info *performance_query =
+   

performance_query;

    struct v3d_file_priv *v3d_priv = job->base.file-

driver_priv;

    struct v3d_dev *v3d = job->base.v3d;
-   struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_MAX_COUNTERS];
+   unsigned int i, j, offset;
  
-	for (int i = 0; i < performance_query->nperfmons; i++) {

-   perfmon = v3d_perfmon_find(v3d_priv,
-      performance_query-

queries[query].kperfmon_ids[i]);

+   for (i = 0, offset = 0;
+    i < performance_query->nperfmons;
+    i++, offset += DRM_V3D_MAX_PERF_COUNTERS) {
+   struct v3d_performance_query *q =
+   _query->queries[query];


Looks like we could move this before the loop.


Indeed! I will change it and re-send, either for v4 of the series, or as a 
single update if no other changes are required.



Otherwise this patch is:
Reviewed-by: Iago Toral Quiroga 


Thanks!

Regards,

Tvrtko


+   struct v3d_perfmon *perfmon;
+
+   perfmon = v3d_perfmon_find(v3d_priv, q-

kperfmon_ids[i]);

    if (!perfmon) {
    DRM_DEBUG("Failed to find perfmon.");
    continue;
@@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct
v3d_cpu_job *job, void *data, u32 quer
  
  		v3d_perfmon_stop(v3d, perfmon, true);
  
-		memcpy(_values[i *

DRM_V3D_MAX_PERF_COUNTERS], perfmon->values,
-      perfmon->ncounters * sizeof(u64));
+   if (job->copy.do_64bit) {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_64(data, offset + j,
+      perfmon-

values[j]);

+   } else {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_32(data, offset + j,
+      perfmon-

values[j]);

+   }
  
  		v3d_perfmon_put(perfmon);

    }
-
-   for (int i = 0; i < performance_query->ncounters; i++)
-   write_to_buffer(data, i, copy->do_64bit,
counter_values[i]);
  }
  
  static void




Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

2024-07-11 Thread Tvrtko Ursulin



On 11/07/2024 06:12, Nitin Gote wrote:

We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8").

The Gen8 platform only has timeslicing and doesn't support a preemption
mechanism, as engines do not have a preemption timer and don't send an irq
if the preemption timeout expires. So, add a fix to not consider preemption
during dequeuing for gen8 platforms.

Also move can_preempt() above the need_preempt() function to resolve the
implicit declaration of function ‘can_preempt’ error, and make the
can_preempt() function param const to resolve the error: passing argument 1
of ‘can_preempt’ discards ‘const’ qualifier from the pointer target type.

v2: Simplify can_preempt() function (Tvrtko Ursulin)


Yeah, sorry for that yesterday when I thought the gen8 emit bb was dead code; 
somehow I thought there was a gen9 emit_bb flavour. Looks like I 
confused it with something else.




Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti 
Signed-off-by: Nitin Gote 
Cc: Chris Wilson 
CC:  # v5.2+
---
  .../drm/i915/gt/intel_execlists_submission.c| 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 21829439e686..59885d7721e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -294,11 +294,19 @@ static int virtual_prio(const struct 
intel_engine_execlists *el)
return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
  }
  
+static bool can_preempt(const struct intel_engine_cs *engine)

+{
+   return GRAPHICS_VER(engine->i915) > 8;
+}
+
  static bool need_preempt(const struct intel_engine_cs *engine,
 const struct i915_request *rq)
  {
int last_prio;
  
+	if (!can_preempt(engine))

+   return false;
+
if (!intel_engine_has_semaphores(engine))


Patch looks clean now. Hmmm, one new observation is whether the "has 
semaphores" check is now redundant? It looks like preemption depends on 
semaphore support in logical_ring_default_vfuncs().


Regards,

Tvrtko


return false;
  
@@ -3313,15 +3321,6 @@ static void remove_from_engine(struct i915_request *rq)

i915_request_notify_execute_cb_imm(rq);
  }
  
-static bool can_preempt(struct intel_engine_cs *engine)

-{
-   if (GRAPHICS_VER(engine->i915) > 8)
-   return true;
-
-   /* GPGPU on bdw requires extra w/a; not implemented */
-   return engine->class != RENDER_CLASS;
-}
-
  static void kick_execlists(const struct i915_request *rq, int prio)
  {
struct intel_engine_cs *engine = rq->engine;


[PATCH 10/11] drm/v3d: Prefer get_user for scalar types

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

It makes it just a tiny bit more obvious what is going on.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index d1060e60aafa..b282d12571b5 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -485,14 +485,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
 
job->timestamp_query.queries[i].offset = offset;
 
-   if (copy_from_user(, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -550,7 +550,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
 
-   if (copy_from_user(, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -611,14 +611,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
for (i = 0; i < copy.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
 
job->timestamp_query.queries[i].offset = offset;
 
-   if (copy_from_user(, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
-- 
2.44.0



[PATCH 11/11] drm/v3d: Add some local variables in queries/extensions

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Add some local variables to make the code a bit less verbose, with the
main benefit being pulling some lines to under 80 columns wide.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 88 ++--
 1 file changed, 49 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b282d12571b5..d607aa9c4ec2 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   struct v3d_timestamp_query_info *query_info = >timestamp_query;
unsigned int i;
int err;
 
@@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(timestamp.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   query_info->queries = kvmalloc_array(timestamp.count,
+sizeof(struct v3d_timestamp_query),
+GFP_KERNEL);
+   if (!query_info->queries)
return -ENOMEM;
 
offsets = u64_to_user_ptr(timestamp.offsets);
@@ -490,20 +491,21 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
goto error;
}
 
-   job->timestamp_query.queries[i].offset = offset;
+   query_info->queries[i].offset = offset;
 
if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->timestamp_query.queries[i].syncobj) {
+   query_info->queries[i].syncobj = drm_syncobj_find(file_priv,
+ sync);
+   if (!query_info->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = timestamp.count;
+   query_info->count = timestamp.count;
 
return 0;
 
@@ -519,6 +521,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   struct v3d_timestamp_query_info *query_info = >timestamp_query;
unsigned int i;
int err;
 
@@ -537,10 +540,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(reset.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   query_info->queries = kvmalloc_array(reset.count,
+sizeof(struct v3d_timestamp_query),
+GFP_KERNEL);
+   if (!query_info->queries)
return -ENOMEM;
 
syncs = u64_to_user_ptr(reset.syncs);
@@ -548,20 +551,21 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
for (i = 0; i < reset.count; i++) {
u32 sync;
 
-   job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
+   query_info->queries[i].offset = reset.offset + 8 * i;
 
if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->timestamp_query.queries[i].syncobj) {
+   query_info->queries[i].syncobj = drm_syncobj_find(file_priv,
+ sync);
+   if (!query_info->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = reset.count;
+   query_info->count = reset.count;
 
return 0;
 
@@ -578,6 +582,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
+   struct v3d_timestamp_query_info *query_info = >timestamp_query;
unsigned int i;
int err;

[PATCH 06/11] drm/v3d: Move part of copying of reset/copy performance extension to a helper

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

The loop which looks up the syncobj and copies the kperfmon ids is
identical, so let's move it to a helper.

The only change is replacing copy_from_user with get_user when copying a
scalar.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 152 ++-
 1 file changed, 68 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 3838ebade45d..ce56e31a027d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -644,15 +644,64 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
return err;
 }
 
+static int
+v3d_copy_query_info(struct v3d_performance_query_info *query_info,
+   unsigned int count,
+   unsigned int nperfmons,
+   u32 __user *syncs,
+   u64 __user *kperfmon_ids,
+   struct drm_file *file_priv)
+{
+   unsigned int i, j;
+   int err;
+
+   for (i = 0; i < count; i++) {
+   struct v3d_performance_query *query = _info->queries[i];
+   u32 __user *ids_pointer;
+   u32 sync, id;
+   u64 ids;
+
+   if (get_user(sync, syncs++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   if (get_user(ids, kperfmon_ids++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   ids_pointer = u64_to_user_ptr(ids);
+
+   for (j = 0; j < nperfmons; j++) {
+   if (get_user(id, ids_pointer++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   query->kperfmon_ids[j] = id;
+   }
+
+   query->syncobj = drm_syncobj_find(file_priv, sync);
+   if (!query->syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
+   }
+
+   return 0;
+
+error:
+   v3d_performance_query_info_free(query_info, i);
+   return err;
+}
+
 static int
 v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 struct drm_v3d_extension __user *ext,
 struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
-   unsigned int i, j;
int err;
 
if (!job) {
@@ -679,50 +728,19 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (!job->performance_query.queries)
return -ENOMEM;
 
-   syncs = u64_to_user_ptr(reset.syncs);
-   kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
+   err = v3d_copy_query_info(>performance_query,
+ reset.count,
+ reset.nperfmons,
+ u64_to_user_ptr(reset.syncs),
+ u64_to_user_ptr(reset.kperfmon_ids),
+ file_priv);
+   if (err)
+   return err;
 
-   for (i = 0; i < reset.count; i++) {
-   u32 sync;
-   u64 ids;
-   u32 __user *ids_pointer;
-   u32 id;
-
-   if (copy_from_user(, syncs++, sizeof(sync))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   if (copy_from_user(, kperfmon_ids++, sizeof(ids))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   ids_pointer = u64_to_user_ptr(ids);
-
-   for (j = 0; j < reset.nperfmons; j++) {
-   if (copy_from_user(, ids_pointer++, sizeof(id))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   job->performance_query.queries[i].kperfmon_ids[j] = id;
-   }
-
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->performance_query.queries[i].syncobj) {
-   err = -ENOENT;
-   goto error;
-   }
-   }
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
 
return 0;
-
-error:
-   v3d_performance_query_info_free(>performance_query, i);
-   return err;
 }
 
 static int
@@ -730,10 +748,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
  struct drm_v3d_extension __user *ext,
  struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;

[PATCH 09/11] drm/v3d: Move perfmon init completely into own unit

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Now that the build time dependencies on various array sizes have been
removed, we can move the perfmon init completely into its own compilation
unit and remove the hardcoded defines.

This improves on the temporary fix quickly delivered in
9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning").

Signed-off-by: Tvrtko Ursulin 
References: 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning")
---
 drivers/gpu/drm/v3d/v3d_drv.c |  9 +---
 drivers/gpu/drm/v3d/v3d_drv.h |  6 +--
 drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++
 .../gpu/drm/v3d/v3d_performance_counters.h| 16 ---
 4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index a47f00b443d3..491c638a4d74 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
args->value = 1;
return 0;
case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
-   args->value = v3d->max_counters;
+   args->value = v3d->perfmon_info.max_counters;
return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
@@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
-   if (v3d->ver >= 71)
-   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
-   else if (v3d->ver >= 42)
-   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
-   else
-   v3d->max_counters = 0;
+   v3d_perfmon_init(v3d);
 
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index b1dfec49ba7d..8524761bc62d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,10 +104,7 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
-   /* Different revisions of V3D have different total number of performance
-* counters
-*/
-   unsigned int max_counters;
+   struct v3d_perfmon_info perfmon_info;
 
void __iomem *hub_regs;
void __iomem *core_regs[3];
@@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
 /* v3d_perfmon.c */
+void v3d_perfmon_init(struct v3d_dev *v3d);
 void v3d_perfmon_get(struct v3d_perfmon *perfmon);
 void v3d_perfmon_put(struct v3d_perfmon *perfmon);
 void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index b7d0b02e1a95..cd7f1eedf17f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc 
v3d_v71_performance_counters[] = {
{"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any 
other reason (vary/W/Z)"},
 };
 
+void v3d_perfmon_init(struct v3d_dev *v3d)
+{
+   const struct v3d_perf_counter_desc *counters = NULL;
+   unsigned int max = 0;
+
+   if (v3d->ver >= 71) {
+   counters = v3d_v71_performance_counters;
+   max = ARRAY_SIZE(v3d_v71_performance_counters);
+   } else if (v3d->ver >= 42) {
+   counters = v3d_v42_performance_counters;
+   max = ARRAY_SIZE(v3d_v42_performance_counters);
+   }
+
+   v3d->perfmon_info.max_counters = max;
+   v3d->perfmon_info.counters = counters;
+}
+
 void v3d_perfmon_get(struct v3d_perfmon *perfmon)
 {
if (perfmon)
@@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= v3d->max_counters)
+   if (req->counters[i] >= v3d->perfmon_info.max_counters)
return -EINVAL;
}
 
@@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, 
void *data,
return -EINVAL;
}
 
-   /* Make sure that the counter ID is valid */
-   if (req->counter >= v3d->max_counters)
-   return -EINVAL;
-
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) !=
-V3D_V42_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) !=
-V3D_V71_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(V3D_MAX_COUNTERS < V3D_V42_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(V3D_MAX_COUNTERS < V3D_V71_NUM_PERFCOUNTERS);

[PATCH 08/11] drm/v3d: Do not use intermediate storage when copying performance query results

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Removing the intermediate buffer removes the last use of the
V3D_MAX_COUNTERS define, which will enable further driver cleanup.

While at it, pull the 32 vs 64-bit copying decision outside the loop in
order to reduce the number of conditional instructions.
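
As a standalone sketch of the pattern (the sketch_* names are hypothetical,
not the driver functions): two typed helpers plus a single width decision
taken once, outside the copy loop.

#include <linux/types.h>

static void sketch_write_32(u32 *dst, unsigned int idx, u32 value)
{
        dst[idx] = value;
}

static void sketch_write_64(u64 *dst, unsigned int idx, u64 value)
{
        dst[idx] = value;
}

static void sketch_copy_values(void *dst, bool do_64bit,
                               const u64 *values, unsigned int count)
{
        unsigned int i;

        if (do_64bit) {
                for (i = 0; i < count; i++)
                        sketch_write_64(dst, i, values[i]);
        } else {
                for (i = 0; i < count; i++)
                        sketch_write_32(dst, i, values[i]);     /* truncates to 32 bits */
        }
}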

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 60 -
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 7b2195ba4248..2564467735fc 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job)
v3d_put_bo_vaddr(bo);
 }
 
+static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value)
+{
+   dst[idx] = value;
+}
+
+static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value)
+{
+   dst[idx] = value;
+}
+
 static void
-write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value)
+write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value)
 {
-   if (do_64bit) {
-   u64 *dst64 = (u64 *)dst;
-
-   dst64[idx] = value;
-   } else {
-   u32 *dst32 = (u32 *)dst;
-
-   dst32[idx] = (u32)value;
-   }
+   if (do_64bit)
+   write_to_buffer_64(dst, idx, value);
+   else
+   write_to_buffer_32(dst, idx, value);
 }
 
 static void
@@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job)
 }
 
 static void
-v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 
query)
+v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data,
+  unsigned int query)
 {
-   struct v3d_performance_query_info *performance_query = 
>performance_query;
-   struct v3d_copy_query_results_info *copy = >copy;
+   struct v3d_performance_query_info *performance_query =
+   >performance_query;
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
-   struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_MAX_COUNTERS];
+   unsigned int i, j, offset;
 
-   for (int i = 0; i < performance_query->nperfmons; i++) {
-   perfmon = v3d_perfmon_find(v3d_priv,
-  
performance_query->queries[query].kperfmon_ids[i]);
+   for (i = 0, offset = 0;
+i < performance_query->nperfmons;
+i++, offset += DRM_V3D_MAX_PERF_COUNTERS) {
+   struct v3d_performance_query *q =
+   _query->queries[query];
+   struct v3d_perfmon *perfmon;
+
+   perfmon = v3d_perfmon_find(v3d_priv, q->kperfmon_ids[i]);
if (!perfmon) {
DRM_DEBUG("Failed to find perfmon.");
continue;
@@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job 
*job, void *data, u32 quer
 
v3d_perfmon_stop(v3d, perfmon, true);
 
-   memcpy(_values[i * DRM_V3D_MAX_PERF_COUNTERS], 
perfmon->values,
-  perfmon->ncounters * sizeof(u64));
+   if (job->copy.do_64bit) {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_64(data, offset + j,
+  perfmon->values[j]);
+   } else {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_32(data, offset + j,
+  perfmon->values[j]);
+   }
 
v3d_perfmon_put(perfmon);
}
-
-   for (int i = 0; i < performance_query->ncounters; i++)
-   write_to_buffer(data, i, copy->do_64bit, counter_values[i]);
 }
 
 static void
-- 
2.44.0



[PATCH 07/11] drm/v3d: Size the kperfmon_ids array at runtime

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Instead of statically reserving pessimistic space for the kperfmon_ids
array, make the userspace extension code allocate exactly the required
amount of space.

Apart from saving some memory at runtime, this also removes the need for
the V3D_MAX_PERFMONS macro whose removal will benefit further driver
cleanup.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_drv.h|  6 +-
 drivers/gpu/drm/v3d/v3d_sched.c  |  4 +++-
 drivers/gpu/drm/v3d/v3d_submit.c | 17 +++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index dd3ead4cb8bd..b1dfec49ba7d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,13 +351,9 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
-/* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
- DRM_V3D_MAX_PERF_COUNTERS)
-
 struct v3d_performance_query {
/* Performance monitor IDs for this query */
-   u32 kperfmon_ids[V3D_MAX_PERFMONS];
+   u32 *kperfmon_ids;
 
/* Syncobj that indicates the query availability */
struct drm_syncobj *syncobj;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 5fbbee47c6b7..7b2195ba4248 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -94,8 +94,10 @@ v3d_performance_query_info_free(struct 
v3d_performance_query_info *query_info,
if (query_info->queries) {
unsigned int i;
 
-   for (i = 0; i < count; i++)
+   for (i = 0; i < count; i++) {
drm_syncobj_put(query_info->queries[i].syncobj);
+   kvfree(query_info->queries[i].kperfmon_ids);
+   }
 
kvfree(query_info->queries);
}
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index ce56e31a027d..d1060e60aafa 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -671,10 +671,20 @@ v3d_copy_query_info(struct v3d_performance_query_info 
*query_info,
goto error;
}
 
+   query->kperfmon_ids =
+   kvmalloc_array(nperfmons,
+  sizeof(struct v3d_performance_query *),
+  GFP_KERNEL);
+   if (!query->kperfmon_ids) {
+   err = -ENOMEM;
+   goto error;
+   }
+
ids_pointer = u64_to_user_ptr(ids);
 
for (j = 0; j < nperfmons; j++) {
if (get_user(id, ids_pointer++)) {
+   kvfree(query->kperfmon_ids);
err = -EFAULT;
goto error;
}
@@ -684,6 +694,7 @@ v3d_copy_query_info(struct v3d_performance_query_info 
*query_info,
 
query->syncobj = drm_syncobj_find(file_priv, sync);
if (!query->syncobj) {
+   kvfree(query->kperfmon_ids);
err = -ENOENT;
goto error;
}
@@ -717,9 +728,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(, ext, sizeof(reset)))
return -EFAULT;
 
-   if (reset.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(reset.count,
@@ -767,9 +775,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
-   if (copy.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(copy.count,
-- 
2.44.0



[PATCH v3 00/11] v3d: Perfmon cleanup

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When we had to quickly deal with a tree build issue via merging
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we
promised to follow up with a nicer solution.

Since in the process of eliminating the hardcoded defines we discovered a few
issues in the handling of corner cases and userspace input validation, the fix
has turned into a larger series, but hopefully the end result is a justifiable
cleanup.

v2:
 * Re-order the patches so fixes come first while last three are optional
   cleanups.

v3:
 * Fixed a bunch of rebase errors I made when re-ordering patches from v1 to v2.
 * Dropped the double underscore from __v3d_timestamp_query_info_free.
 * Added v3d prefix to v3d_copy_query_info.
 * Renamed qinfo to query_info.
 * Fixed some spelling errors and bad patch references.
 * Added mention to get_user to one commit message.
 * Dropped one patch from the series which became redundant due to other
   re-ordering.
 * Re-ordered last two patches with the view of dropping the last.

 Cc: Maíra Canal 

Tvrtko Ursulin (11):
  drm/v3d: Prevent out of bounds access in performance query extensions
  drm/v3d: Fix potential memory leak in the timestamp extension
  drm/v3d: Fix potential memory leak in the performance extension
  drm/v3d: Validate passed in drm syncobj handles in the timestamp
extension
  drm/v3d: Validate passed in drm syncobj handles in the performance
extension
  drm/v3d: Move part of copying of reset/copy performance extension to a
helper
  drm/v3d: Size the kperfmon_ids array at runtime
  drm/v3d: Do not use intermediate storage when copying performance
query results
  drm/v3d: Move perfmon init completely into own unit
  drm/v3d: Prefer get_user for scalar types
  drm/v3d: Add some local variables in queries/extensions

 drivers/gpu/drm/v3d/v3d_drv.c |   9 +-
 drivers/gpu/drm/v3d/v3d_drv.h |  16 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c |  44 +--
 .../gpu/drm/v3d/v3d_performance_counters.h|  16 +-
 drivers/gpu/drm/v3d/v3d_sched.c   | 106 ---
 drivers/gpu/drm/v3d/v3d_submit.c  | 294 +++---
 6 files changed, 290 insertions(+), 195 deletions(-)

-- 
2.44.0



[PATCH 05/11] drm/v3d: Validate passed in drm syncobj handles in the performance extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking the handle was looked up successfully or otherwise fail the
extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job"
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index e3a00c8394a5..3838ebade45d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -710,6 +710,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
@@ -790,6 +794,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = copy.count;
job->performance_query.nperfmons = copy.nperfmons;
-- 
2.44.0



[PATCH 03/11] drm/v3d: Fix potential memory leak in the performance extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 ++
 drivers/gpu/drm/v3d/v3d_submit.c | 50 
 3 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index e208ffdfba32..dd3ead4cb8bd 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 /* v3d_sched.c */
 void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
   unsigned int count);
+void v3d_performance_query_info_free(struct v3d_performance_query_info 
*query_info,
+unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 59dc0287dab9..5fbbee47c6b7 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ v3d_timestamp_query_info_free(struct 
v3d_timestamp_query_info *query_info,
}
 }
 
+void
+v3d_performance_query_info_free(struct v3d_performance_query_info *query_info,
+   unsigned int count)
+{
+   if (query_info->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(query_info->queries[i].syncobj);
+
+   kvfree(query_info->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_performance_query_info *performance_query = 
>performance_query;
 
v3d_timestamp_query_info_free(>timestamp_query,
  job->timestamp_query.count);
 
-   if (performance_query->queries) {
-   for (int i = 0; i < performance_query->count; i++)
-   drm_syncobj_put(performance_query->queries[i].syncobj);
-   kvfree(performance_query->queries);
-   }
+   v3d_performance_query_info_free(>performance_query,
+   job->performance_query.count);
 
v3d_job_cleanup(>base);
 }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 121bf1314b80..d626c8539b04 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -640,6 +640,8 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 __user *syncs;
u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
+   unsigned int i, j;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -668,39 +670,43 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
syncs = u64_to_user_ptr(reset.syncs);
kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
 
-   for (int i = 0; i < reset.count; i++) {
+   for (i = 0; i < reset.count; i++) {
u32 sync;
u64 ids;
u32 __user *ids_pointer;
u32 id;
 
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-
if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
ids_pointer = u64_to_user_ptr(ids);
 
-   for (int j = 0; j < reset.nperfmons; j++) {
+   for (j = 0; j < reset.nperfmons; j++) {
if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job-&g

[PATCH 02/11] drm/v3d: Fix potential memory leak in the timestamp extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 +++-
 drivers/gpu/drm/v3d/v3d_submit.c | 43 ++--
 3 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 099b962bdfde..e208ffdfba32 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
+  unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 03df37a3acf5..59dc0287dab9 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
v3d_job_cleanup(job);
 }
 
+void
+v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
+ unsigned int count)
+{
+   if (query_info->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(query_info->queries[i].syncobj);
+
+   kvfree(query_info->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_timestamp_query_info *timestamp_query = 
&job->timestamp_query;
struct v3d_performance_query_info *performance_query = 
&job->performance_query;
 
-   if (timestamp_query->queries) {
-   for (int i = 0; i < timestamp_query->count; i++)
-   drm_syncobj_put(timestamp_query->queries[i].syncobj);
-   kvfree(timestamp_query->queries);
-   }
+   v3d_timestamp_query_info_free(&job->timestamp_query,
+ job->timestamp_query.count);
 
if (performance_query->queries) {
for (int i = 0; i < performance_query->count; i++)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 263fefc1d04f..121bf1314b80 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,8 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   unsigned int i;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -480,19 +482,19 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
offsets = u64_to_user_ptr(timestamp.offsets);
syncs = u64_to_user_ptr(timestamp.syncs);
 
-   for (int i = 0; i < timestamp.count; i++) {
+   for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
if (copy_from_user(&offset, offsets++, sizeof(offset))) {
-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->timestamp_query.queries[i].offset = offset;
 
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
@@ -500,6 +502,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
job->timestamp_query.count = timestamp.count;
 
return 0;
+
+error:
+   v3d_timestamp_query_info_free(&job->timestamp_query, i);
+   return err;
 }
 
 static int
@@ -509,6 +515,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   unsigned int i;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -533,14 +541,14 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*f

[PATCH 04/11] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking whether the handle was looked up successfully, or otherwise
fail the extension by jumping into the existing unwind.
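
For reference, a minimal sketch of the added check (drm_syncobj_find()
returns NULL when the handle does not resolve, so the result has to be
validated before later code dereferences it):

	job->timestamp_query.queries[i].syncobj =
		drm_syncobj_find(file_priv, sync);
	if (!job->timestamp_query.queries[i].syncobj) {
		err = -ENOENT;	/* unknown handle */
		goto error;	/* drops the references taken so far */
	}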

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index d626c8539b04..e3a00c8394a5 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -498,6 +498,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = timestamp.count;
 
@@ -552,6 +556,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = reset.count;
 
@@ -616,6 +624,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = copy.count;
 
-- 
2.44.0



[PATCH 01/11] drm/v3d: Prevent out of bounds access in performance query extensions

2024-07-11 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Check that the number of perfmons userspace is passing in the copy and
reset extensions is not greater than the size of the internal kernel
storage into which the ids will be copied.
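
A minimal sketch of why the check is needed (struct layout as elsewhere in
this series, copy loop as in the reset extension below):

	struct v3d_performance_query {
		u32 kperfmon_ids[V3D_MAX_PERFMONS];	/* fixed capacity */
		struct drm_syncobj *syncobj;
	};

	/* Without the check an oversized reset.nperfmons overruns the array: */
	for (j = 0; j < reset.nperfmons; j++)
		job->performance_query.queries[i].kperfmon_ids[j] = id;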

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
Reviewed-by: Iago Toral Quiroga 
Reviewed-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 88f63d526b22..263fefc1d04f 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
 
+   if (reset.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(reset.count,
@@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
+   if (copy.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(copy.count,
-- 
2.44.0



Re: [PATCH 11/12] drm/v3d: Add some local variables in queries/extensions

2024-07-11 Thread Tvrtko Ursulin



On 10/07/2024 18:43, Maíra Canal wrote:

On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Add some local variables to make the code a bit less verbose, with the
main benefit being pulling some lines to under 80 columns wide.

Signed-off-by: Tvrtko Ursulin 


I'd prefer `query_info`, but anyway:


Yeah it does look nicer - done throughout the series.

I also bumped this patch to be last in the series since I don't 
"believe" in it that much any more. We probably should drop it.


Regards,

Tvrtko


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_submit.c | 79 +---
  1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c 
b/drivers/gpu/drm/v3d/v3d_submit.c

index 34ecd844f16a..b0c2a8e9cb06 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,

  {
  u32 __user *offsets, *syncs;
  struct drm_v3d_timestamp_query timestamp;
+    struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
  unsigned int i;
  int err;
@@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct 
drm_file *file_priv,

  job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
-    job->timestamp_query.queries = kvmalloc_array(timestamp.count,
-  sizeof(struct v3d_timestamp_query),
-  GFP_KERNEL);
-    if (!job->timestamp_query.queries)
+    qinfo->queries = kvmalloc_array(timestamp.count,
+    sizeof(struct v3d_timestamp_query),
+    GFP_KERNEL);
+    if (!qinfo->queries)
  return -ENOMEM;
  offsets = u64_to_user_ptr(timestamp.offsets);
@@ -490,20 +491,20 @@ v3d_get_cpu_timestamp_query_params(struct 
drm_file *file_priv,

  goto error;
  }
-    job->timestamp_query.queries[i].offset = offset;
+    qinfo->queries[i].offset = offset;
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
  err = -EFAULT;
  goto error;
  }
-    job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);

-    if (!job->timestamp_query.queries[i].syncobj) {
+    qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+    if (!qinfo->queries[i].syncobj) {
  err = -ENOENT;
  goto error;
  }
  }
-    job->timestamp_query.count = timestamp.count;
+    qinfo->count = timestamp.count;
  return 0;
@@ -519,6 +520,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,

  {
  u32 __user *syncs;
  struct drm_v3d_reset_timestamp_query reset;
+    struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
  unsigned int i;
  int err;
@@ -537,10 +539,10 @@ v3d_get_cpu_reset_timestamp_params(struct 
drm_file *file_priv,

  job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY;
-    job->timestamp_query.queries = kvmalloc_array(reset.count,
-  sizeof(struct v3d_timestamp_query),
-  GFP_KERNEL);
-    if (!job->timestamp_query.queries)
+    qinfo->queries = kvmalloc_array(reset.count,
+    sizeof(struct v3d_timestamp_query),
+    GFP_KERNEL);
+    if (!qinfo->queries)
  return -ENOMEM;
  syncs = u64_to_user_ptr(reset.syncs);
@@ -548,20 +550,20 @@ v3d_get_cpu_reset_timestamp_params(struct 
drm_file *file_priv,

  for (i = 0; i < reset.count; i++) {
  u32 sync;
-    job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
+    qinfo->queries[i].offset = reset.offset + 8 * i;
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
  err = -EFAULT;
  goto error;
  }
-    job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);

-    if (!job->timestamp_query.queries[i].syncobj) {
+    qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+    if (!qinfo->queries[i].syncobj) {
  err = -ENOENT;
  goto error;
  }
  }
-    job->timestamp_query.count = reset.count;
+    qinfo->count = reset.count;
  return 0;
@@ -578,6 +580,7 @@ v3d_get_cpu_copy_query_results_params(struct 
drm_file *file_priv,

  {
  u32 __user *offsets, *syncs;
  struct drm_v3d_copy_timestamp_query copy;
+    struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
  unsigned int i;
  int err;
@@ -599,10 +602,10 @@ v3d_get_cpu_copy_query_results_params(struct 
drm_file *file_priv,

  job->job_type = V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY;
-    job->timestamp_query.queries = kvmalloc_array(copy.count,
-  sizeof(struct v3d_timestamp_query),
-  GFP_KERNEL);
-    if (!

Re: [PATCH 09/12] drm/v3d: Move perfmon init completely into own unit

2024-07-11 Thread Tvrtko Ursulin



On 10/07/2024 18:38, Maíra Canal wrote:

On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Now that the build time dependencies on various array sizes have been
removed, we can move the perfmon init completely into its own compilation
unit and remove the hardcoded defines.

This improves on the temporary fix quickly delivered in
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit").


I believe you mean:

9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning")

Currently, it references the current patch.


Well that was a hilarious mistake, well spotted!

Regards,

Tvrtko


Apart from this fix, this is

Reviewed-by: Maíra Canal 

Best Regards,
- Maíra



Signed-off-by: Tvrtko Ursulin 
References: 792d16b5375d ("drm/v3d: Move perfmon init completely into 
own unit")

---
  drivers/gpu/drm/v3d/v3d_drv.c |  9 +---
  drivers/gpu/drm/v3d/v3d_drv.h |  6 +--
  drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++
  .../gpu/drm/v3d/v3d_performance_counters.h    | 16 ---
  4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c 
b/drivers/gpu/drm/v3d/v3d_drv.c

index a47f00b443d3..491c638a4d74 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device 
*dev, void *data,

  args->value = 1;
  return 0;
  case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
-    args->value = v3d->max_counters;
+    args->value = v3d->perfmon_info.max_counters;
  return 0;
  default:
  DRM_DEBUG("Unknown parameter %d\n", args->param);
@@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct 
platform_device *pdev)

  v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
  WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
-    if (v3d->ver >= 71)
-    v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
-    else if (v3d->ver >= 42)
-    v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
-    else
-    v3d->max_counters = 0;
+    v3d_perfmon_init(v3d);
  v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
  if (IS_ERR(v3d->reset)) {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h 
b/drivers/gpu/drm/v3d/v3d_drv.h

index 00fe5d993175..6d2d34cd135c 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,10 +104,7 @@ struct v3d_dev {
  int ver;
  bool single_irq_line;
-    /* Different revisions of V3D have different total number of 
performance

- * counters
- */
-    unsigned int max_counters;
+    struct v3d_perfmon_info perfmon_info;
  void __iomem *hub_regs;
  void __iomem *core_regs[3];
@@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
  /* v3d_perfmon.c */
+void v3d_perfmon_init(struct v3d_dev *v3d);
  void v3d_perfmon_get(struct v3d_perfmon *perfmon);
  void v3d_perfmon_put(struct v3d_perfmon *perfmon);
  void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon 
*perfmon);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c

index b7d0b02e1a95..cd7f1eedf17f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc 
v3d_v71_performance_counters[] = {
  {"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for 
any other reason (vary/W/Z)"},

  };
+void v3d_perfmon_init(struct v3d_dev *v3d)
+{
+    const struct v3d_perf_counter_desc *counters = NULL;
+    unsigned int max = 0;
+
+    if (v3d->ver >= 71) {
+    counters = v3d_v71_performance_counters;
+    max = ARRAY_SIZE(v3d_v71_performance_counters);
+    } else if (v3d->ver >= 42) {
+    counters = v3d_v42_performance_counters;
+    max = ARRAY_SIZE(v3d_v42_performance_counters);
+    }
+
+    v3d->perfmon_info.max_counters = max;
+    v3d->perfmon_info.counters = counters;
+}
+
  void v3d_perfmon_get(struct v3d_perfmon *perfmon)
  {
  if (perfmon)
@@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device 
*dev, void *data,

  /* Make sure all counters are valid. */
  for (i = 0; i < req->ncounters; i++) {
-    if (req->counters[i] >= v3d->max_counters)
+    if (req->counters[i] >= v3d->perfmon_info.max_counters)
  return -EINVAL;
  }
@@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct 
drm_device *dev, void *data,

  return -EINVAL;
  }
-    /* Make sure that the counter ID is valid */
-    if (req->counter >= v3d->max_counters)
-    return -EINVAL;
-
-    BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) !=
- V3D_V42_NUM_PERFCOUNTERS);
-    BUILD_BUG_ON(ARRAY_SI

Re: [PATCH 04/12] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension

2024-07-11 Thread Tvrtko Ursulin



On 10/07/2024 18:06, Maíra Canal wrote:

On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking handle was looked up successfuly or otherwise fail the


I believe you mean "Fix it by checking if the handle..."

Also s/successfuly/successfully


Oops, thank you!




extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the 
timestamp query job")

Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_submit.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c 
b/drivers/gpu/drm/v3d/v3d_submit.c

index ca1b1ad0a75c..3313423080e7 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -497,6 +497,10 @@ v3d_get_cpu_timestamp_query_params(struct 
drm_file *file_priv,

  }
  job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);

+    if (!job->timestamp_query.queries[i].syncobj) {
+    err = -ENOENT;


I'm not sure if err should be -ENOENT or -EINVAL, but based on other 
drivers, I believe it should be -EINVAL.


After a quick grep I am inclined to think ENOENT is correct. DRM core 
uses that, and drivers seem generally confused (split between ENOENT and 
EINVAL). With one even going for ENODEV!
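
For instance, a rough sketch of the DRM core convention (from memory, so
worth double checking against drm_syncobj.c):

	syncobj = drm_syncobj_find(file_private, handle);
	if (!syncobj)
		return -ENOENT;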


Regards,

Tvrtko

+    goto error;
+    }
  }
  job->timestamp_query.count = timestamp.count;
@@ -550,6 +554,10 @@ v3d_get_cpu_reset_timestamp_params(struct 
drm_file *file_priv,

  }
  job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);

+    if (!job->timestamp_query.queries[i].syncobj) {
+    err = -ENOENT;
+    goto error;
+    }
  }
  job->timestamp_query.count = reset.count;
@@ -613,6 +621,10 @@ v3d_get_cpu_copy_query_results_params(struct 
drm_file *file_priv,

  }
  job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);

+    if (!job->timestamp_query.queries[i].syncobj) {
+    err = -ENOENT;
+    goto error;
+    }
  }
  job->timestamp_query.count = copy.count;


Re: [PATCH 01/12] drm/v3d: Prevent out of bounds access in performance query extensions

2024-07-10 Thread Tvrtko Ursulin



On 10/07/2024 14:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Check that the number of perfmons userspace is passing in the copy and
reset extensions is not greater than the internal kernel storage where
the ids will be copied into.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance 
query job"
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+


On this one I forgot to carry over from v1:

Reviewed-by: Iago Toral Quiroga 

Regards,

Tvrtko


---
  drivers/gpu/drm/v3d/v3d_submit.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 88f63d526b22..263fefc1d04f 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
  
+	if (reset.nperfmons > V3D_MAX_PERFMONS)

+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(reset.count,

@@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
  
+	if (copy.nperfmons > V3D_MAX_PERFMONS)

+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(copy.count,


Re: [PATCH 00/12] v3d: Perfmon cleanup

2024-07-10 Thread Tvrtko Ursulin



Hi Iago,

On 10/07/2024 07:06, Iago Toral wrote:

El mar, 09-07-2024 a las 17:34 +0100, Tvrtko Ursulin escribió:

From: Tvrtko Ursulin 

When we had to quickly deal with a tree build issue via merging
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"),
we
promised to follow up with a nicer solution.

As in the process of eliminating the hardcoded defines we have
discovered a few
issues in handling of corner cases and userspace input validation,
the fix has
turned into a larger series, but hopefully the end result is a
justifiable
cleanup.



Thanks for going the extra mile with this :)

Patches 1 and 5-8 are:
Reviewed-by: Iago Toral Quiroga 


Thank you!

Unfortunately I had to re-order the patches in the series so fixes come 
first, and as that caused a lot of churn in each patch I did not apply 
your r-b's when re-sending.


Hmmm actually I should have for the first patch, that one is unchanged. 
I will fix that one.


Regards,

Tvrtko


Tvrtko Ursulin (12):
   drm/v3d: Prevent out of bounds access in performance query
extensions
   drm/v3d: Prefer get_user for scalar types
   drm/v3d: Add some local variables in queries/extensions
   drm/v3d: Align data types of internal and uapi counts
   drm/v3d: Fix potential memory leak in the timestamp extension
   drm/v3d: Fix potential memory leak in the performance extension
   drm/v3d: Validate passed in drm syncobj handles in the timestamp
     extension
   drm/v3d: Validate passed in drm syncobj handles in the performance
     extension
   drm/v3d: Move part of copying of reset/copy performance extension
to a
     helper
   drm/v3d: Size the kperfmon_ids array at runtime
   drm/v3d: Do not use intermediate storage when copying performance
     query results
   drm/v3d: Move perfmon init completely into own unit

  drivers/gpu/drm/v3d/v3d_drv.c |   9 +-
  drivers/gpu/drm/v3d/v3d_drv.h |  16 +-
  drivers/gpu/drm/v3d/v3d_perfmon.c |  44 +--
  .../gpu/drm/v3d/v3d_performance_counters.h    |  16 +-
  drivers/gpu/drm/v3d/v3d_sched.c   | 106 ---
  drivers/gpu/drm/v3d/v3d_submit.c  | 285 ++--
--
  6 files changed, 281 insertions(+), 195 deletions(-)





[PATCH 09/12] drm/v3d: Move perfmon init completely into own unit

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Now that the build time dependencies on various array sizes have been
removed, we can move the perfmon init completely into its own compilation
unit and remove the hardcoded defines.

This improves on the temporary fix quickly delivered in
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit").
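
The presumed shape of the new descriptor, inferred from how
v3d_perfmon_init() fills it in below (the header hunk is cut short in this
archive, so the exact layout is an assumption):

	struct v3d_perfmon_info {
		unsigned int max_counters;
		const struct v3d_perf_counter_desc *counters;
	};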

Signed-off-by: Tvrtko Ursulin 
References: 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit")
---
 drivers/gpu/drm/v3d/v3d_drv.c |  9 +---
 drivers/gpu/drm/v3d/v3d_drv.h |  6 +--
 drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++
 .../gpu/drm/v3d/v3d_performance_counters.h| 16 ---
 4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index a47f00b443d3..491c638a4d74 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
args->value = 1;
return 0;
case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
-   args->value = v3d->max_counters;
+   args->value = v3d->perfmon_info.max_counters;
return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
@@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
-   if (v3d->ver >= 71)
-   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
-   else if (v3d->ver >= 42)
-   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
-   else
-   v3d->max_counters = 0;
+   v3d_perfmon_init(v3d);
 
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 00fe5d993175..6d2d34cd135c 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,10 +104,7 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
-   /* Different revisions of V3D have different total number of performance
-* counters
-*/
-   unsigned int max_counters;
+   struct v3d_perfmon_info perfmon_info;
 
void __iomem *hub_regs;
void __iomem *core_regs[3];
@@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
 /* v3d_perfmon.c */
+void v3d_perfmon_init(struct v3d_dev *v3d);
 void v3d_perfmon_get(struct v3d_perfmon *perfmon);
 void v3d_perfmon_put(struct v3d_perfmon *perfmon);
 void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index b7d0b02e1a95..cd7f1eedf17f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc 
v3d_v71_performance_counters[] = {
{"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any 
other reason (vary/W/Z)"},
 };
 
+void v3d_perfmon_init(struct v3d_dev *v3d)
+{
+   const struct v3d_perf_counter_desc *counters = NULL;
+   unsigned int max = 0;
+
+   if (v3d->ver >= 71) {
+   counters = v3d_v71_performance_counters;
+   max = ARRAY_SIZE(v3d_v71_performance_counters);
+   } else if (v3d->ver >= 42) {
+   counters = v3d_v42_performance_counters;
+   max = ARRAY_SIZE(v3d_v42_performance_counters);
+   }
+
+   v3d->perfmon_info.max_counters = max;
+   v3d->perfmon_info.counters = counters;
+}
+
 void v3d_perfmon_get(struct v3d_perfmon *perfmon)
 {
if (perfmon)
@@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= v3d->max_counters)
+   if (req->counters[i] >= v3d->perfmon_info.max_counters)
return -EINVAL;
}
 
@@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, 
void *data,
return -EINVAL;
}
 
-   /* Make sure that the counter ID is valid */
-   if (req->counter >= v3d->max_counters)
-   return -EINVAL;
-
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) !=
-V3D_V42_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) !=
-V3D_V71_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(V3D_MAX_COUNTERS < V3D_V42_NUM_PERFCOUNTERS);
-   

[PATCH 12/12] drm/v3d: Prefer get_user for scalar types

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

It makes it just a tiny bit more obvious what is going on.
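
For reference, the two forms side by side (illustrative only, sync/syncs as
in the diff below):

	if (copy_from_user(&sync, syncs++, sizeof(sync)))	/* before */
		return -EFAULT;

	if (get_user(sync, syncs++))				/* after */
		return -EFAULT;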

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b0c2a8e9cb06..9273b0aadb79 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -486,14 +486,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(&offset, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
 
qinfo->queries[i].offset = offset;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -552,7 +552,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
qinfo->queries[i].offset = reset.offset + 8 * i;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -614,14 +614,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
for (i = 0; i < copy.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(&offset, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
 
qinfo->queries[i].offset = offset;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
-- 
2.44.0



[PATCH 05/12] drm/v3d: Validate passed in drm syncobj handles in the performance extension

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking whether the handle was looked up successfully, or otherwise
fail the extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 3313423080e7..b51600e236c8 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -706,6 +706,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
@@ -787,6 +791,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = copy.count;
job->performance_query.nperfmons = copy.nperfmons;
-- 
2.44.0



[PATCH 10/12] drm/v3d: Align data types of internal and uapi counts

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

In the timestamp and performance extensions the userspace type for counts is
u32, so let's use unsigned in the kernel too.
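
A minimal sketch of the change (the count comes from a __u32 field in the
uapi struct, so the kernel loop counter is made unsigned to match):

	unsigned int i;

	for (i = 0; i < timestamp.count; i++) {
		/* per-query copy, as in the diff below */
	}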

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 8dae3ab5f936..34ecd844f16a 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   unsigned int i;
int err;
 
if (!job) {
@@ -481,7 +482,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
offsets = u64_to_user_ptr(timestamp.offsets);
syncs = u64_to_user_ptr(timestamp.syncs);
 
-   for (int i = 0; i < timestamp.count; i++) {
+   for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
if (copy_from_user(&offset, offsets++, sizeof(offset))) {
@@ -518,6 +519,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   unsigned int i;
int err;
 
if (!job) {
@@ -543,7 +545,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
syncs = u64_to_user_ptr(reset.syncs);
 
-   for (int i = 0; i < reset.count; i++) {
+   for (i = 0; i < reset.count; i++) {
u32 sync;
 
job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
@@ -576,7 +578,8 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
-   int i, err;
+   unsigned int i;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
-- 
2.44.0



[PATCH 08/12] drm/v3d: Do not use intermediate storage when copying performance query results

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Removing the intermediate buffer removes the last use of the
V3D_MAX_COUNTERS define, which will enable further driver cleanup.

While at it pull the 32 vs 64 bit copying decision outside the loop in
order to reduce the number of conditional instructions.
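
In other words, instead of testing the width flag once per counter, the
copy loop now picks the destination width up front, roughly:

	if (job->copy.do_64bit) {
		for (j = 0; j < perfmon->ncounters; j++)
			write_to_buffer_64(data, offset + j, perfmon->values[j]);
	} else {
		for (j = 0; j < perfmon->ncounters; j++)
			write_to_buffer_32(data, offset + j, perfmon->values[j]);
	}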

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 60 -
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index fc8730264386..77f795e38fad 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job)
v3d_put_bo_vaddr(bo);
 }
 
+static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value)
+{
+   dst[idx] = value;
+}
+
+static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value)
+{
+   dst[idx] = value;
+}
+
 static void
-write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value)
+write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value)
 {
-   if (do_64bit) {
-   u64 *dst64 = (u64 *)dst;
-
-   dst64[idx] = value;
-   } else {
-   u32 *dst32 = (u32 *)dst;
-
-   dst32[idx] = (u32)value;
-   }
+   if (do_64bit)
+   write_to_buffer_64(dst, idx, value);
+   else
+   write_to_buffer_32(dst, idx, value);
 }
 
 static void
@@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job)
 }
 
 static void
-v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 
query)
+v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data,
+  unsigned int query)
 {
-   struct v3d_performance_query_info *performance_query = 
&job->performance_query;
-   struct v3d_copy_query_results_info *copy = &job->copy;
+   struct v3d_performance_query_info *performance_query =
+   &job->performance_query;
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
-   struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_MAX_COUNTERS];
+   unsigned int i, j, offset;
 
-   for (int i = 0; i < performance_query->nperfmons; i++) {
-   perfmon = v3d_perfmon_find(v3d_priv,
-  
performance_query->queries[query].kperfmon_ids[i]);
+   for (i = 0, offset = 0;
+i < performance_query->nperfmons;
+i++, offset += DRM_V3D_MAX_PERF_COUNTERS) {
+   struct v3d_performance_query *q =
+   &performance_query->queries[query];
+   struct v3d_perfmon *perfmon;
+
+   perfmon = v3d_perfmon_find(v3d_priv, q->kperfmon_ids[i]);
if (!perfmon) {
DRM_DEBUG("Failed to find perfmon.");
continue;
@@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job 
*job, void *data, u32 quer
 
v3d_perfmon_stop(v3d, perfmon, true);
 
-   memcpy(&counter_values[i * DRM_V3D_MAX_PERF_COUNTERS], 
perfmon->values,
-  perfmon->ncounters * sizeof(u64));
+   if (job->copy.do_64bit) {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_64(data, offset + j,
+  perfmon->values[j]);
+   } else {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_32(data, offset + j,
+  perfmon->values[j]);
+   }
 
v3d_perfmon_put(perfmon);
}
-
-   for (int i = 0; i < performance_query->ncounters; i++)
-   write_to_buffer(data, i, copy->do_64bit, counter_values[i]);
 }
 
 static void
-- 
2.44.0



[PATCH 06/12] drm/v3d: Move part of copying of reset/copy performance extension to a helper

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

The loop which looks up the syncobj and copies the kperfmon ids is
identical in the reset and copy extensions, so let's move it to a helper.
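
After the change both extensions reduce to a single call. The copy side is
cut short in this archive, so the following is a reconstruction of its call
site based on the reset side shown below:

	err = copy_query_info(&job->performance_query, copy.count, copy.nperfmons,
			      u64_to_user_ptr(copy.syncs),
			      u64_to_user_ptr(copy.kperfmon_ids),
			      file_priv);
	if (err)
		return err;

	job->performance_query.count = copy.count;
	job->performance_query.nperfmons = copy.nperfmons;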

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 148 +--
 1 file changed, 64 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b51600e236c8..35682433f75b 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -641,13 +641,63 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
return err;
 }
 
+static int
+copy_query_info(struct v3d_performance_query_info *qinfo,
+   unsigned int count,
+   unsigned int nperfmons,
+   u32 __user *syncs,
+   u64 __user *kperfmon_ids,
+   struct drm_file *fpriv)
+{
+   unsigned int i, j;
+   int err;
+
+   for (i = 0; i < count; i++) {
+   struct v3d_performance_query *query = >queries[i];
+   u32 __user *ids_pointer;
+   u32 sync, id;
+   u64 ids;
+
+   if (get_user(sync, syncs++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   if (get_user(ids, kperfmon_ids++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   ids_pointer = u64_to_user_ptr(ids);
+
+   for (j = 0; j < nperfmons; j++) {
+   if (get_user(id, ids_pointer++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   query->kperfmon_ids[j] = id;
+   }
+
+   query->syncobj = drm_syncobj_find(fpriv, sync);
+   if (!query->syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
+   }
+
+   return 0;
+
+error:
+   __v3d_performance_query_info_free(qinfo, i);
+   return err;
+}
+
 static int
 v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 struct drm_v3d_extension __user *ext,
 struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
int err;
 
@@ -675,50 +725,17 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (!job->performance_query.queries)
return -ENOMEM;
 
-   syncs = u64_to_user_ptr(reset.syncs);
-   kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
+   err = copy_query_info(qinfo, reset.count, reset.nperfmons,
+ u64_to_user_ptr(reset.syncs),
+ u64_to_user_ptr(reset.kperfmon_ids),
+ file_priv);
+   if (err)
+   return err;
 
-   for (int i = 0; i < reset.count; i++) {
-   u32 sync;
-   u64 ids;
-   u32 __user *ids_pointer;
-   u32 id;
-
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   ids_pointer = u64_to_user_ptr(ids);
-
-   for (int j = 0; j < reset.nperfmons; j++) {
-   if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   job->performance_query.queries[i].kperfmon_ids[j] = id;
-   }
-
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->performance_query.queries[i].syncobj) {
-   err = -ENOENT;
-   goto error;
-   }
-   }
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
 
return 0;
-
-error:
-   __v3d_performance_query_info_free(qinfo, i);
-   return err;
 }
 
 static int
@@ -726,8 +743,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
  struct drm_v3d_extension __user *ext,
  struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_copy_performance_query copy;
int err;
 
@@ -758,44 +773,13 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (!job->performance_query.queries)
return -ENOMEM;
 
-   syncs = u64_

[PATCH 07/12] drm/v3d: Size the kperfmon_ids array at runtime

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Instead of statically reserving pessimistic space for the kperfmon_ids
array, make the userspace extension code allocate exactly the required
amount of space.

Apart from saving some memory at runtime, this also removes the need for
the V3D_MAX_PERFMONS macro, which in turn enables further driver
cleanup.
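
The important pairing is that each query now owns its ids array, so the
common cleanup helper frees it alongside dropping the syncobj reference
(sketch, mirroring the v3d_sched.c hunk below):

	for (i = 0; i < count; i++) {
		drm_syncobj_put(qinfo->queries[i].syncobj);
		kvfree(qinfo->queries[i].kperfmon_ids);
	}
	kvfree(qinfo->queries);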

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_drv.h|  6 +-
 drivers/gpu/drm/v3d/v3d_sched.c  |  4 +++-
 drivers/gpu/drm/v3d/v3d_submit.c | 17 +++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 38c80168da51..00fe5d993175 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,13 +351,9 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
-/* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
- DRM_V3D_MAX_PERF_COUNTERS)
-
 struct v3d_performance_query {
/* Performance monitor IDs for this query */
-   u32 kperfmon_ids[V3D_MAX_PERFMONS];
+   u32 *kperfmon_ids;
 
/* Syncobj that indicates the query availability */
struct drm_syncobj *syncobj;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 173801aa54ee..fc8730264386 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -94,8 +94,10 @@ __v3d_performance_query_info_free(struct 
v3d_performance_query_info *qinfo,
if (qinfo->queries) {
unsigned int i;
 
-   for (i = 0; i < count; i++)
+   for (i = 0; i < count; i++) {
drm_syncobj_put(qinfo->queries[i].syncobj);
+   kvfree(qinfo->queries[i].kperfmon_ids);
+   }
 
kvfree(qinfo->queries);
}
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 35682433f75b..8dae3ab5f936 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -668,10 +668,20 @@ copy_query_info(struct v3d_performance_query_info *qinfo,
goto error;
}
 
+   query->kperfmon_ids =
+   kvmalloc_array(nperfmons,
+  sizeof(struct v3d_performance_query *),
+  GFP_KERNEL);
+   if (!query->kperfmon_ids) {
+   err = -ENOMEM;
+   goto error;
+   }
+
ids_pointer = u64_to_user_ptr(ids);
 
for (j = 0; j < nperfmons; j++) {
if (get_user(id, ids_pointer++)) {
+   kvfree(query->kperfmon_ids);
err = -EFAULT;
goto error;
}
@@ -681,6 +691,7 @@ copy_query_info(struct v3d_performance_query_info *qinfo,
 
query->syncobj = drm_syncobj_find(fpriv, sync);
if (!query->syncobj) {
+   kvfree(query->kperfmon_ids);
err = -ENOENT;
goto error;
}
@@ -714,9 +725,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
 
-   if (reset.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(reset.count,
@@ -762,9 +770,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
-   if (copy.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(copy.count,
-- 
2.44.0



[PATCH 11/12] drm/v3d: Add some local variables in queries/extensions

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Add some local variables to make the code a bit less verbose, with the
main benefit being pulling some lines to under 80 columns wide.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 79 +---
 1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 34ecd844f16a..b0c2a8e9cb06 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
unsigned int i;
int err;
 
@@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(timestamp.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   qinfo->queries = kvmalloc_array(timestamp.count,
+   sizeof(struct v3d_timestamp_query),
+   GFP_KERNEL);
+   if (!qinfo->queries)
return -ENOMEM;
 
offsets = u64_to_user_ptr(timestamp.offsets);
@@ -490,20 +491,20 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
goto error;
}
 
-   job->timestamp_query.queries[i].offset = offset;
+   qinfo->queries[i].offset = offset;
 
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
err = -EFAULT;
goto error;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->timestamp_query.queries[i].syncobj) {
+   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = timestamp.count;
+   qinfo->count = timestamp.count;
 
return 0;
 
@@ -519,6 +520,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
unsigned int i;
int err;
 
@@ -537,10 +539,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(reset.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   qinfo->queries = kvmalloc_array(reset.count,
+   sizeof(struct v3d_timestamp_query),
+   GFP_KERNEL);
+   if (!qinfo->queries)
return -ENOMEM;
 
syncs = u64_to_user_ptr(reset.syncs);
@@ -548,20 +550,20 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
for (i = 0; i < reset.count; i++) {
u32 sync;
 
-   job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
+   qinfo->queries[i].offset = reset.offset + 8 * i;
 
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
err = -EFAULT;
goto error;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->timestamp_query.queries[i].syncobj) {
+   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = reset.count;
+   qinfo->count = reset.count;
 
return 0;
 
@@ -578,6 +580,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
unsigned int i;
int err;
 
@@ -599,10 +602,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(copy

[PATCH 04/12] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking whether the handle was looked up successfully, or otherwise
fail the extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index ca1b1ad0a75c..3313423080e7 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -497,6 +497,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = timestamp.count;
 
@@ -550,6 +554,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = reset.count;
 
@@ -613,6 +621,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = copy.count;
 
-- 
2.44.0



[PATCH v2 00/12] v3d: Perfmon cleanup

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When we had to quickly deal with a tree build issue via merging
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we
promised to follow up with a nicer solution.

Since, in the process of eliminating the hardcoded defines, we discovered a few
issues in the handling of corner cases and userspace input validation, the fix
has turned into a larger series, but hopefully the end result is a justifiable
cleanup.

v2:
 * Re-order the patches so fixes come first while the last three are optional
   cleanups.

Tvrtko Ursulin (12):
  drm/v3d: Prevent out of bounds access in performance query extensions
  drm/v3d: Fix potential memory leak in the timestamp extension
  drm/v3d: Fix potential memory leak in the performance extension
  drm/v3d: Validate passed in drm syncobj handles in the timestamp
extension
  drm/v3d: Validate passed in drm syncobj handles in the performance
extension
  drm/v3d: Move part of copying of reset/copy performance extension to a
helper
  drm/v3d: Size the kperfmon_ids array at runtime
  drm/v3d: Do not use intermediate storage when copying performance
query results
  drm/v3d: Move perfmon init completely into own unit
  drm/v3d: Align data types of internal and uapi counts
  drm/v3d: Add some local variables in queries/extensions
  drm/v3d: Prefer get_user for scalar types

 drivers/gpu/drm/v3d/v3d_drv.c |   9 +-
 drivers/gpu/drm/v3d/v3d_drv.h |  16 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c |  44 +--
 .../gpu/drm/v3d/v3d_performance_counters.h|  16 +-
 drivers/gpu/drm/v3d/v3d_sched.c   | 106 ---
 drivers/gpu/drm/v3d/v3d_submit.c  | 285 ++
 6 files changed, 281 insertions(+), 195 deletions(-)

-- 
2.44.0



[PATCH 02/12] drm/v3d: Fix potential memory leak in the timestamp extension

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 +--
 drivers/gpu/drm/v3d/v3d_submit.c | 36 ++--
 3 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 099b962bdfde..95651c3c926f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 03df37a3acf5..e45d3ddc6f82 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
v3d_job_cleanup(job);
 }
 
+void
+__v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+   unsigned int count)
+{
+   if (qinfo->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(qinfo->queries[i].syncobj);
+
+   kvfree(qinfo->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_timestamp_query_info *timestamp_query = 
&job->timestamp_query;
struct v3d_performance_query_info *performance_query = 
&job->performance_query;
 
-   if (timestamp_query->queries) {
-   for (int i = 0; i < timestamp_query->count; i++)
-   drm_syncobj_put(timestamp_query->queries[i].syncobj);
-   kvfree(timestamp_query->queries);
-   }
+   __v3d_timestamp_query_info_free(&job->timestamp_query,
+   job->timestamp_query.count);
 
if (performance_query->queries) {
for (int i = 0; i < performance_query->count; i++)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 263fefc1d04f..2818afdd4807 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -484,15 +485,15 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
u32 offset, sync;
 
if (copy_from_user(&offset, offsets++, sizeof(offset))) {
-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->timestamp_query.queries[i].offset = offset;
 
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
@@ -500,6 +501,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
job->timestamp_query.count = timestamp.count;
 
return 0;
+
+error:
+   __v3d_timestamp_query_info_free(qinfo, i);
+   return err;
 }
 
 static int
@@ -509,6 +514,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -539,8 +545,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
 
if (copy_from_user(, syncs++, sizeof(sync))) {
-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+

[PATCH 01/12] drm/v3d: Prevent out of bounds access in performance query extensions

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Check that the number of perfmons userspace is passing in the copy and
reset extensions is not greater than the size of the internal kernel
storage into which the ids will be copied.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 88f63d526b22..263fefc1d04f 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
 
+   if (reset.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(reset.count,
@@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
+   if (copy.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(copy.count,
-- 
2.44.0



[PATCH 03/12] drm/v3d: Fix potential memory leak in the performance extension

2024-07-10 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 +-
 drivers/gpu/drm/v3d/v3d_submit.c | 40 +---
 3 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 95651c3c926f..38c80168da51 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 /* v3d_sched.c */
 void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
 unsigned int count);
+void __v3d_performance_query_info_free(struct v3d_performance_query_info 
*qinfo,
+  unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e45d3ddc6f82..173801aa54ee 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ __v3d_timestamp_query_info_free(struct 
v3d_timestamp_query_info *qinfo,
}
 }
 
+void
+__v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo,
+ unsigned int count)
+{
+   if (qinfo->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(qinfo->queries[i].syncobj);
+
+   kvfree(qinfo->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_performance_query_info *performance_query = 
&job->performance_query;
 
__v3d_timestamp_query_info_free(&job->timestamp_query,
job->timestamp_query.count);
 
-   if (performance_query->queries) {
-   for (int i = 0; i < performance_query->count; i++)
-   drm_syncobj_put(performance_query->queries[i].syncobj);
-   kvfree(performance_query->queries);
-   }
+   __v3d_performance_query_info_free(&job->performance_query,
+ job->performance_query.count);
 
v3d_job_cleanup(&job->base);
 }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 2818afdd4807..ca1b1ad0a75c 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 __user *syncs;
u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -672,32 +673,36 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 id;
 
if (copy_from_user(&sync, syncs++, sizeof(sync))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-
if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
ids_pointer = u64_to_user_ptr(ids);
 
for (int j = 0; j < reset.nperfmons; j++) {
if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
job->performance_query.queries[i].kperfmon_ids[j] = id;
}
+
+   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
}
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
 
return 0;
+
+error:
+   __v3d_performance_query_info_free(qinfo, i);
+   return err;
 }

Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

2024-07-10 Thread Tvrtko Ursulin



On 09/07/2024 15:02, Tvrtko Ursulin wrote:


On 09/07/2024 13:53, Nitin Gote wrote:

We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries 
for gen8").


Gen8 platform has only timeslice and doesn't support a preemption 
mechanism

as engines do not have a preemption timer and doesn't send an irq if the
preemption timeout expires. So, add a fix to not consider preemption
during dequeuing for gen8 platforms.

Also move can_preemt() above need_preempt() function to resolve implicit
declaration of function ‘can_preempt' error and make can_preempt()
function param as const to resolve error: passing argument 1 of
‘can_preempt’ discards ‘const’ qualifier from the pointer target type.

Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption 
boundaries for gen8")

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti 
Signed-off-by: Nitin Gote 
Cc: Chris Wilson 
CC:  # v5.2+
---
  .../drm/i915/gt/intel_execlists_submission.c  | 24 ---
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c

index 21829439e686..30631cc690f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -294,11 +294,26 @@ static int virtual_prio(const struct 
intel_engine_execlists *el)

  return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
  }
+static bool can_preempt(const struct intel_engine_cs *engine)
+{
+    if (GRAPHICS_VER(engine->i915) > 8)
+    return true;
+
+    if (IS_CHERRYVIEW(engine->i915) || IS_BROADWELL(engine->i915))
+    return false;
+
+    /* GPGPU on bdw requires extra w/a; not implemented */
+    return engine->class != RENDER_CLASS;


Aren't BDW and CHV the only Gen8 platforms, in which case this function 
can be simplified as:


...
{
 return GRAPHICS_VER(engine->i915) > 8;
}

?


+}
+
  static bool need_preempt(const struct intel_engine_cs *engine,
   const struct i915_request *rq)
  {
  int last_prio;
+    if ((GRAPHICS_VER(engine->i915) <= 8) && can_preempt(engine))


The GRAPHICS_VER check here looks redundant with the one inside 
can_preempt().


One more thing - I think gen8_emit_bb_start() becomes dead code after 
this and can be removed.


Regards,

Tvrtko


+    return false;
+
  if (!intel_engine_has_semaphores(engine))
  return false;
@@ -3313,15 +3328,6 @@ static void remove_from_engine(struct 
i915_request *rq)

  i915_request_notify_execute_cb_imm(rq);
  }
-static bool can_preempt(struct intel_engine_cs *engine)
-{
-    if (GRAPHICS_VER(engine->i915) > 8)
-    return true;
-
-    /* GPGPU on bdw requires extra w/a; not implemented */
-    return engine->class != RENDER_CLASS;
-}
-
  static void kick_execlists(const struct i915_request *rq, int prio)
  {
  struct intel_engine_cs *engine = rq->engine;


[PATCH 12/12] drm/v3d: Move perfmon init completely into own unit

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Now that the build time dependencies on various array sizes have been
removed, we can move the perfmon init completely into its own compilation
unit and remove the hardcoded defines.

This improves on the temporary fix quickly delivered in
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit").

Signed-off-by: Tvrtko Ursulin 
References: 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit")
---
 drivers/gpu/drm/v3d/v3d_drv.c |  9 +---
 drivers/gpu/drm/v3d/v3d_drv.h |  6 +--
 drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++
 .../gpu/drm/v3d/v3d_performance_counters.h| 16 ---
 4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index a47f00b443d3..491c638a4d74 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
args->value = 1;
return 0;
case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
-   args->value = v3d->max_counters;
+   args->value = v3d->perfmon_info.max_counters;
return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
@@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
-   if (v3d->ver >= 71)
-   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
-   else if (v3d->ver >= 42)
-   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
-   else
-   v3d->max_counters = 0;
+   v3d_perfmon_init(v3d);
 
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 00fe5d993175..6d2d34cd135c 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,10 +104,7 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
-   /* Different revisions of V3D have different total number of performance
-* counters
-*/
-   unsigned int max_counters;
+   struct v3d_perfmon_info perfmon_info;
 
void __iomem *hub_regs;
void __iomem *core_regs[3];
@@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
 /* v3d_perfmon.c */
+void v3d_perfmon_init(struct v3d_dev *v3d);
 void v3d_perfmon_get(struct v3d_perfmon *perfmon);
 void v3d_perfmon_put(struct v3d_perfmon *perfmon);
 void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index b7d0b02e1a95..cd7f1eedf17f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc 
v3d_v71_performance_counters[] = {
{"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any 
other reason (vary/W/Z)"},
 };
 
+void v3d_perfmon_init(struct v3d_dev *v3d)
+{
+   const struct v3d_perf_counter_desc *counters = NULL;
+   unsigned int max = 0;
+
+   if (v3d->ver >= 71) {
+   counters = v3d_v71_performance_counters;
+   max = ARRAY_SIZE(v3d_v71_performance_counters);
+   } else if (v3d->ver >= 42) {
+   counters = v3d_v42_performance_counters;
+   max = ARRAY_SIZE(v3d_v42_performance_counters);
+   }
+
+   v3d->perfmon_info.max_counters = max;
+   v3d->perfmon_info.counters = counters;
+}
+
 void v3d_perfmon_get(struct v3d_perfmon *perfmon)
 {
if (perfmon)
@@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= v3d->max_counters)
+   if (req->counters[i] >= v3d->perfmon_info.max_counters)
return -EINVAL;
}
 
@@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, 
void *data,
return -EINVAL;
}
 
-   /* Make sure that the counter ID is valid */
-   if (req->counter >= v3d->max_counters)
-   return -EINVAL;
-
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performance_counters) !=
-V3D_V42_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v71_performance_counters) !=
-V3D_V71_NUM_PERFCOUNTERS);
-   BUILD_BUG_ON(V3D_MAX_COUNTERS < V3D_V42_NUM_PERFCOUNTERS);
-   

[PATCH 11/12] drm/v3d: Do not use intermediate storage when copying performance query results

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Removing the intermediate buffer removes the last use of the
V3D_MAX_COUNTERS define, which will enable further driver cleanup.

While at it pull the 32 vs 64 bit copying decision outside the loop in
order to reduce the number of conditional instructions.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 60 -
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index fc8730264386..77f795e38fad 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job)
v3d_put_bo_vaddr(bo);
 }
 
+static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value)
+{
+   dst[idx] = value;
+}
+
+static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value)
+{
+   dst[idx] = value;
+}
+
 static void
-write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value)
+write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value)
 {
-   if (do_64bit) {
-   u64 *dst64 = (u64 *)dst;
-
-   dst64[idx] = value;
-   } else {
-   u32 *dst32 = (u32 *)dst;
-
-   dst32[idx] = (u32)value;
-   }
+   if (do_64bit)
+   write_to_buffer_64(dst, idx, value);
+   else
+   write_to_buffer_32(dst, idx, value);
 }
 
 static void
@@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job)
 }
 
 static void
-v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 
query)
+v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data,
+  unsigned int query)
 {
-   struct v3d_performance_query_info *performance_query = 
&job->performance_query;
-   struct v3d_copy_query_results_info *copy = &job->copy;
+   struct v3d_performance_query_info *performance_query =
+   &job->performance_query;
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
-   struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_MAX_COUNTERS];
+   unsigned int i, j, offset;
 
-   for (int i = 0; i < performance_query->nperfmons; i++) {
-   perfmon = v3d_perfmon_find(v3d_priv,
-  
performance_query->queries[query].kperfmon_ids[i]);
+   for (i = 0, offset = 0;
+i < performance_query->nperfmons;
+i++, offset += DRM_V3D_MAX_PERF_COUNTERS) {
+   struct v3d_performance_query *q =
+   &performance_query->queries[query];
+   struct v3d_perfmon *perfmon;
+
+   perfmon = v3d_perfmon_find(v3d_priv, q->kperfmon_ids[i]);
if (!perfmon) {
DRM_DEBUG("Failed to find perfmon.");
continue;
@@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job 
*job, void *data, u32 quer
 
v3d_perfmon_stop(v3d, perfmon, true);
 
-   memcpy(&counter_values[i * DRM_V3D_MAX_PERF_COUNTERS], 
perfmon->values,
-  perfmon->ncounters * sizeof(u64));
+   if (job->copy.do_64bit) {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_64(data, offset + j,
+  perfmon->values[j]);
+   } else {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_32(data, offset + j,
+  perfmon->values[j]);
+   }
 
v3d_perfmon_put(perfmon);
}
-
-   for (int i = 0; i < performance_query->ncounters; i++)
-   write_to_buffer(data, i, copy->do_64bit, counter_values[i]);
 }
 
 static void
-- 
2.44.0



[PATCH 10/12] drm/v3d: Size the kperfmon_ids array at runtime

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Instead of statically reserving pessimistic space for the kperfmon_ids
array, make the userspace extension code allocate exactly the required
amount of space.

Apart from saving some memory at runtime, this also removes the need for
the V3D_MAX_PERFMONS macro whose removal will benefit further driver
cleanup.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_drv.h|  6 +-
 drivers/gpu/drm/v3d/v3d_sched.c  |  4 +++-
 drivers/gpu/drm/v3d/v3d_submit.c | 17 +++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 38c80168da51..00fe5d993175 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,13 +351,9 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
-/* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
- DRM_V3D_MAX_PERF_COUNTERS)
-
 struct v3d_performance_query {
/* Performance monitor IDs for this query */
-   u32 kperfmon_ids[V3D_MAX_PERFMONS];
+   u32 *kperfmon_ids;
 
/* Syncobj that indicates the query availability */
struct drm_syncobj *syncobj;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 173801aa54ee..fc8730264386 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -94,8 +94,10 @@ __v3d_performance_query_info_free(struct 
v3d_performance_query_info *qinfo,
if (qinfo->queries) {
unsigned int i;
 
-   for (i = 0; i < count; i++)
+   for (i = 0; i < count; i++) {
drm_syncobj_put(qinfo->queries[i].syncobj);
+   kvfree(qinfo->queries[i].kperfmon_ids);
+   }
 
kvfree(qinfo->queries);
}
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index a2e55ba8222b..e1a7622a43f9 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -674,10 +674,20 @@ copy_query_info(struct v3d_performance_query_info *qinfo,
goto error;
}
 
+   query->kperfmon_ids =
+   kvmalloc_array(nperfmons,
+  sizeof(struct v3d_performance_query *),
+  GFP_KERNEL);
+   if (!query->kperfmon_ids) {
+   err = -ENOMEM;
+   goto error;
+   }
+
ids_pointer = u64_to_user_ptr(ids);
 
for (j = 0; j < nperfmons; j++) {
if (get_user(id, ids_pointer++)) {
+   kvfree(query->kperfmon_ids);
err = -EFAULT;
goto error;
}
@@ -687,6 +697,7 @@ copy_query_info(struct v3d_performance_query_info *qinfo,
 
query->syncobj = drm_syncobj_find(fpriv, sync);
if (!query->syncobj) {
+   kvfree(query->kperfmon_ids);
err = -ENOENT;
goto error;
}
@@ -721,9 +732,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(, ext, sizeof(reset)))
return -EFAULT;
 
-   if (reset.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
qinfo->queries = kvmalloc_array(reset.count,
@@ -770,9 +778,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
-   if (copy.nperfmons > V3D_MAX_PERFMONS)
-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
qinfo->queries = kvmalloc_array(copy.count,
-- 
2.44.0



[PATCH 09/12] drm/v3d: Move part of copying of reset/copy performance extension to a helper

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

The loop which looks up the syncobj and copies the kperfmon ids is
identical, so let's move it to a helper.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 148 +--
 1 file changed, 64 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 2c4bb39c9ac6..a2e55ba8222b 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -647,16 +647,65 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
return err;
 }
 
+static int
+copy_query_info(struct v3d_performance_query_info *qinfo,
+   unsigned int count,
+   unsigned int nperfmons,
+   u32 __user *syncs,
+   u64 __user *kperfmon_ids,
+   struct drm_file *fpriv)
+{
+   unsigned int i, j;
+   int err;
+
+   for (i = 0; i < count; i++) {
+   struct v3d_performance_query *query = &qinfo->queries[i];
+   u32 __user *ids_pointer;
+   u32 sync, id;
+   u64 ids;
+
+   if (get_user(sync, syncs++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   if (get_user(ids, kperfmon_ids++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   ids_pointer = u64_to_user_ptr(ids);
+
+   for (j = 0; j < nperfmons; j++) {
+   if (get_user(id, ids_pointer++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   query->kperfmon_ids[j] = id;
+   }
+
+   query->syncobj = drm_syncobj_find(fpriv, sync);
+   if (!query->syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
+   }
+
+   return 0;
+
+error:
+   __v3d_performance_query_info_free(qinfo, i);
+   return err;
+}
+
 static int
 v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 struct drm_v3d_extension __user *ext,
 struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
struct v3d_performance_query_info *qinfo = &job->performance_query;
-   unsigned int i, j;
int err;
 
if (!job) {
@@ -683,50 +732,17 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (!qinfo->queries)
return -ENOMEM;
 
-   syncs = u64_to_user_ptr(reset.syncs);
-   kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
+   err = copy_query_info(qinfo, reset.count, reset.nperfmons,
+ u64_to_user_ptr(reset.syncs),
+ u64_to_user_ptr(reset.kperfmon_ids),
+ file_priv);
+   if (err)
+   return err;
 
-   for (i = 0; i < reset.count; i++) {
-   u32 sync;
-   u64 ids;
-   u32 __user *ids_pointer;
-   u32 id;
-
-   if (get_user(sync, syncs++)) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   if (get_user(ids, kperfmon_ids++)) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   ids_pointer = u64_to_user_ptr(ids);
-
-   for (j = 0; j < reset.nperfmons; j++) {
-   if (get_user(id, ids_pointer++)) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   qinfo->queries[i].kperfmon_ids[j] = id;
-   }
-
-   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
-   if (!qinfo->queries[i].syncobj) {
-   err = -ENOENT;
-   goto error;
-   }
-   }
qinfo->count = reset.count;
qinfo->nperfmons = reset.nperfmons;
 
return 0;
-
-error:
-   __v3d_performance_query_info_free(qinfo, i);
-   return err;
 }
 
 static int
@@ -734,11 +750,8 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
  struct drm_v3d_extension __user *ext,
  struct v3d_cpu_job *job)
 {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_copy_performance_query copy;
struct v3d_performance_query_info *qinfo = &job->performance_query;
-   unsigned int i, j;
int err;
 
if (!job) {
@@ -768,42 +781,13 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if 

[PATCH 07/12] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking the handle was looked up successfully or otherwise fail the
extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 81afcfccc6bb..a408db3d3e32 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -499,6 +499,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
}
 
qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
qinfo->count = timestamp.count;
 
@@ -554,6 +558,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
}
 
qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
qinfo->count = reset.count;
 
@@ -619,6 +627,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
}
 
qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
qinfo->count = copy.count;
 
-- 
2.44.0



[PATCH 05/12] drm/v3d: Fix potential memory leak in the timestamp extension

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp 
query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 ++--
 drivers/gpu/drm/v3d/v3d_submit.c | 35 +++-
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 099b962bdfde..95651c3c926f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 03df37a3acf5..e45d3ddc6f82 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
v3d_job_cleanup(job);
 }
 
+void
+__v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+   unsigned int count)
+{
+   if (qinfo->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(qinfo->queries[i].syncobj);
+
+   kvfree(qinfo->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_timestamp_query_info *timestamp_query = 
&job->timestamp_query;
struct v3d_performance_query_info *performance_query = 
&job->performance_query;
 
-   if (timestamp_query->queries) {
-   for (int i = 0; i < timestamp_query->count; i++)
-   drm_syncobj_put(timestamp_query->queries[i].syncobj);
-   kvfree(timestamp_query->queries);
-   }
+   __v3d_timestamp_query_info_free(&job->timestamp_query,
+   job->timestamp_query.count);
 
if (performance_query->queries) {
for (int i = 0; i < performance_query->count; i++)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index c960bc6ca32d..0f1c900c7d35 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -454,6 +454,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
struct drm_v3d_timestamp_query timestamp;
struct v3d_timestamp_query_info *qinfo = >timestamp_query;
unsigned int i;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -486,15 +487,15 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
u32 offset, sync;
 
if (get_user(offset, offsets++)) {
-   kvfree(qinfo->queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
qinfo->queries[i].offset = offset;
 
if (get_user(sync, syncs++)) {
-   kvfree(qinfo->queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
@@ -502,6 +503,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
qinfo->count = timestamp.count;
 
return 0;
+
+error:
+   __v3d_timestamp_query_info_free(qinfo, i);
+   return err;
 }
 
 static int
@@ -513,6 +518,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
struct drm_v3d_reset_timestamp_query reset;
struct v3d_timestamp_query_info *qinfo = >timestamp_query;
unsigned int i;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -543,8 +549,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
qinfo->queries[i].offset = reset.offset + 8 * i;
 
if (get_user(sync, syncs++)) {
-   kvfree(qinfo->queries);
-   return -EFAULT;
+   err = -EFAULT;
+   got

[PATCH 08/12] drm/v3d: Validate passed in drm syncobj handles in the performance extension

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking the handle was looked up successfully or otherwise fail the
extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index a408db3d3e32..2c4bb39c9ac6 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -714,6 +714,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
}
 
qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
qinfo->count = reset.count;
qinfo->nperfmons = reset.nperfmons;
@@ -795,6 +799,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
}
 
qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
qinfo->count = copy.count;
qinfo->nperfmons = copy.nperfmons;
-- 
2.44.0



[PATCH 06/12] drm/v3d: Fix potential memory leak in the performance extension

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
 drivers/gpu/drm/v3d/v3d_sched.c  | 22 -
 drivers/gpu/drm/v3d/v3d_submit.c | 42 
 3 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 95651c3c926f..38c80168da51 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 /* v3d_sched.c */
 void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
 unsigned int count);
+void __v3d_performance_query_info_free(struct v3d_performance_query_info 
*qinfo,
+  unsigned int count);
 void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e45d3ddc6f82..173801aa54ee 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ __v3d_timestamp_query_info_free(struct 
v3d_timestamp_query_info *qinfo,
}
 }
 
+void
+__v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo,
+ unsigned int count)
+{
+   if (qinfo->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(qinfo->queries[i].syncobj);
+
+   kvfree(qinfo->queries);
+   }
+}
+
 static void
 v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_performance_query_info *performance_query = 
&job->performance_query;
 
__v3d_timestamp_query_info_free(&job->timestamp_query,
job->timestamp_query.count);
 
-   if (performance_query->queries) {
-   for (int i = 0; i < performance_query->count; i++)
-   drm_syncobj_put(performance_query->queries[i].syncobj);
-   kvfree(performance_query->queries);
-   }
+   __v3d_performance_query_info_free(&job->performance_query,
+ job->performance_query.count);
 
v3d_job_cleanup(&job->base);
 }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 0f1c900c7d35..81afcfccc6bb 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -645,6 +645,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
struct drm_v3d_reset_performance_query reset;
struct v3d_performance_query_info *qinfo = &job->performance_query;
unsigned int i, j;
+   int err;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -680,32 +681,36 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 id;
 
if (get_user(sync, syncs++)) {
-   kvfree(qinfo->queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
-   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
-
if (get_user(ids, kperfmon_ids++)) {
-   kvfree(qinfo->queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
ids_pointer = u64_to_user_ptr(ids);
 
for (j = 0; j < reset.nperfmons; j++) {
if (get_user(id, ids_pointer++)) {
-   kvfree(qinfo->queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
 
qinfo->queries[i].kperfmon_ids[j] = id;
}
+
+   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
}
qinfo->count = reset.count;
qinfo->nperfmons = reset.nperfmons;
 
return 0;
+
+error:
+   __v3d_performance_query_info_free(qinfo, i);
+   return err;
 }
 
 static int
@@ -718,6 +723,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
struct drm_v3d_cop

[PATCH 04/12] drm/v3d: Align data types of internal and uapi counts

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

In the timestamp and performance extensions the userspace type for counts
is u32, so let's use unsigned in the kernel too.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index f99cd61a3e65..c960bc6ca32d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -453,6 +453,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
+   unsigned int i;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -481,7 +482,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
offsets = u64_to_user_ptr(timestamp.offsets);
syncs = u64_to_user_ptr(timestamp.syncs);
 
-   for (int i = 0; i < timestamp.count; i++) {
+   for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
if (get_user(offset, offsets++)) {
@@ -511,6 +512,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
+   unsigned int i;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -535,7 +537,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
syncs = u64_to_user_ptr(reset.syncs);
 
-   for (int i = 0; i < reset.count; i++) {
+   for (i = 0; i < reset.count; i++) {
u32 sync;
 
qinfo->queries[i].offset = reset.offset + 8 * i;
@@ -561,7 +563,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
-   int i;
+   unsigned int i;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -627,6 +629,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
struct v3d_performance_query_info *qinfo = &job->performance_query;
+   unsigned int i, j;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -655,7 +658,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
syncs = u64_to_user_ptr(reset.syncs);
kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
 
-   for (int i = 0; i < reset.count; i++) {
+   for (i = 0; i < reset.count; i++) {
u32 sync;
u64 ids;
u32 __user *ids_pointer;
@@ -675,7 +678,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
 
ids_pointer = u64_to_user_ptr(ids);
 
-   for (int j = 0; j < reset.nperfmons; j++) {
+   for (j = 0; j < reset.nperfmons; j++) {
if (get_user(id, ids_pointer++)) {
kvfree(qinfo->queries);
return -EFAULT;
@@ -699,6 +702,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
u64 __user *kperfmon_ids;
struct drm_v3d_copy_performance_query copy;
struct v3d_performance_query_info *qinfo = &job->performance_query;
+   unsigned int i, j;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -730,7 +734,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
syncs = u64_to_user_ptr(copy.syncs);
kperfmon_ids = u64_to_user_ptr(copy.kperfmon_ids);
 
-   for (int i = 0; i < copy.count; i++) {
+   for (i = 0; i < copy.count; i++) {
u32 sync;
u64 ids;
u32 __user *ids_pointer;
@@ -750,7 +754,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
 
ids_pointer = u64_to_user_ptr(ids);
 
-   for (int j = 0; j < copy.nperfmons; j++) {
+   for (j = 0; j < copy.nperfmons; j++) {
if (get_user(id, ids_pointer++)) {
kvfree(qinfo->queries);
return -EFAULT;
-- 
2.44.0



[PATCH 03/12] drm/v3d: Add some local variables in queries/extensions

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Add some local variables to make the code a bit less verbose, with the
main benefit being pulling some lines to under 80 columns wide.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 103 ---
 1 file changed, 54 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 5c71e9adfc65..f99cd61a3e65 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -471,10 +472,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(timestamp.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   qinfo->queries = kvmalloc_array(timestamp.count,
+   sizeof(struct v3d_timestamp_query),
+   GFP_KERNEL);
+   if (!qinfo->queries)
return -ENOMEM;
 
offsets = u64_to_user_ptr(timestamp.offsets);
@@ -484,20 +485,20 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
u32 offset, sync;
 
if (get_user(offset, offsets++)) {
-   kvfree(job->timestamp_query.queries);
+   kvfree(qinfo->queries);
return -EFAULT;
}
 
-   job->timestamp_query.queries[i].offset = offset;
+   qinfo->queries[i].offset = offset;
 
if (get_user(sync, syncs++)) {
-   kvfree(job->timestamp_query.queries);
+   kvfree(qinfo->queries);
return -EFAULT;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
}
-   job->timestamp_query.count = timestamp.count;
+   qinfo->count = timestamp.count;
 
return 0;
 }
@@ -509,6 +510,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
 
if (!job) {
DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -525,10 +527,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY;
 
-   job->timestamp_query.queries = kvmalloc_array(reset.count,
- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   qinfo->queries = kvmalloc_array(reset.count,
+   sizeof(struct v3d_timestamp_query),
+   GFP_KERNEL);
+   if (!qinfo->queries)
return -ENOMEM;
 
syncs = u64_to_user_ptr(reset.syncs);
@@ -536,16 +538,16 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
for (int i = 0; i < reset.count; i++) {
u32 sync;
 
-   job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
+   qinfo->queries[i].offset = reset.offset + 8 * i;
 
if (get_user(sync, syncs++)) {
-   kvfree(job->timestamp_query.queries);
+   kvfree(qinfo->queries);
return -EFAULT;
}
 
-   job->timestamp_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
+   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
}
-   job->timestamp_query.count = reset.count;
+   qinfo->count = reset.count;
 
return 0;
 }
@@ -558,6 +560,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
 {
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
int i;
 
if (!job) {
@@ -578,10 +581,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
 
job->job_type = V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY;
 
-   job-&g

[PATCH 02/12] drm/v3d: Prefer get_user for scalar types

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

It makes it just a tiny bit more obvious what is going on.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_submit.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 263fefc1d04f..5c71e9adfc65 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -483,14 +483,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
for (int i = 0; i < timestamp.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(&offset, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
kvfree(job->timestamp_query.queries);
return -EFAULT;
}
 
job->timestamp_query.queries[i].offset = offset;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
kvfree(job->timestamp_query.queries);
return -EFAULT;
}
@@ -538,7 +538,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
 
job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
kvfree(job->timestamp_query.queries);
return -EFAULT;
}
@@ -590,14 +590,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
for (i = 0; i < copy.count; i++) {
u32 offset, sync;
 
-   if (copy_from_user(&offset, offsets++, sizeof(offset))) {
+   if (get_user(offset, offsets++)) {
kvfree(job->timestamp_query.queries);
return -EFAULT;
}
 
job->timestamp_query.queries[i].offset = offset;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
kvfree(job->timestamp_query.queries);
return -EFAULT;
}
@@ -657,14 +657,14 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 __user *ids_pointer;
u32 id;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
kvfree(job->performance_query.queries);
return -EFAULT;
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
 
-   if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
+   if (get_user(ids, kperfmon_ids++)) {
kvfree(job->performance_query.queries);
return -EFAULT;
}
@@ -672,7 +672,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
ids_pointer = u64_to_user_ptr(ids);
 
for (int j = 0; j < reset.nperfmons; j++) {
-   if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
+   if (get_user(id, ids_pointer++)) {
kvfree(job->performance_query.queries);
return -EFAULT;
}
@@ -731,14 +731,14 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
u32 __user *ids_pointer;
u32 id;
 
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+   if (get_user(sync, syncs++)) {
kvfree(job->performance_query.queries);
return -EFAULT;
}
 
job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
 
-   if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
+   if (get_user(ids, kperfmon_ids++)) {
kvfree(job->performance_query.queries);
return -EFAULT;
}
@@ -746,7 +746,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
ids_pointer = u64_to_user_ptr(ids);
 
for (int j = 0; j < copy.nperfmons; j++) {
-   if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
+   if (get_user(id, ids_pointer++)) {
kvfree(job->performance_query.queries);
return -EFAULT;
}
-- 
2.44.0



[PATCH 00/12] v3d: Perfmon cleanup

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When we had to quickly deal with a tree build issue via merging
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we
promised to follow up with a nicer solution.

In the process of eliminating the hardcoded defines we discovered a few issues
in corner case handling and userspace input validation, so the fix has turned
into a larger series, but hopefully the end result is a justifiable cleanup.

Tvrtko Ursulin (12):
  drm/v3d: Prevent out of bounds access in performance query extensions
  drm/v3d: Prefer get_user for scalar types
  drm/v3d: Add some local variables in queries/extensions
  drm/v3d: Align data types of internal and uapi counts
  drm/v3d: Fix potential memory leak in the timestamp extension
  drm/v3d: Fix potential memory leak in the performance extension
  drm/v3d: Validate passed in drm syncobj handles in the timestamp
extension
  drm/v3d: Validate passed in drm syncobj handles in the performance
extension
  drm/v3d: Move part of copying of reset/copy performance extension to a
helper
  drm/v3d: Size the kperfmon_ids array at runtime
  drm/v3d: Do not use intermediate storage when copying performance
query results
  drm/v3d: Move perfmon init completely into own unit

 drivers/gpu/drm/v3d/v3d_drv.c |   9 +-
 drivers/gpu/drm/v3d/v3d_drv.h |  16 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c |  44 +--
 .../gpu/drm/v3d/v3d_performance_counters.h|  16 +-
 drivers/gpu/drm/v3d/v3d_sched.c   | 106 ---
 drivers/gpu/drm/v3d/v3d_submit.c  | 285 ++
 6 files changed, 281 insertions(+), 195 deletions(-)

-- 
2.44.0



[PATCH 01/12] drm/v3d: Prevent out of bounds access in performance query extensions

2024-07-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Check that the number of perfmons userspace is passing in the copy and
reset extensions is not greater than the size of the internal kernel
storage into which the ids will be copied.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset 
performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
 drivers/gpu/drm/v3d/v3d_submit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 88f63d526b22..263fefc1d04f 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
 
+   if (reset.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(reset.count,
@@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
 
+   if (copy.nperfmons > V3D_MAX_PERFMONS)
+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
 
job->performance_query.queries = kvmalloc_array(copy.count,
-- 
2.44.0



Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

2024-07-09 Thread Tvrtko Ursulin



On 09/07/2024 13:53, Nitin Gote wrote:

We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8").

Gen8 platform has only timeslice and doesn't support a preemption mechanism
as engines do not have a preemption timer and doesn't send an irq if the
preemption timeout expires. So, add a fix to not consider preemption
during dequeuing for gen8 platforms.

Also move can_preemt() above need_preempt() function to resolve implicit
declaration of function ‘can_preempt' error and make can_preempt()
function param as const to resolve error: passing argument 1 of
‘can_preempt’ discards ‘const’ qualifier from the pointer target type.

Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti 
Signed-off-by: Nitin Gote 
Cc: Chris Wilson 
CC:  # v5.2+
---
  .../drm/i915/gt/intel_execlists_submission.c  | 24 ---
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 21829439e686..30631cc690f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -294,11 +294,26 @@ static int virtual_prio(const struct 
intel_engine_execlists *el)
return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
  }
  
+static bool can_preempt(const struct intel_engine_cs *engine)

+{
+   if (GRAPHICS_VER(engine->i915) > 8)
+   return true;
+
+   if (IS_CHERRYVIEW(engine->i915) || IS_BROADWELL(engine->i915))
+   return false;
+
+   /* GPGPU on bdw requires extra w/a; not implemented */
+   return engine->class != RENDER_CLASS;


Aren't BDW and CHV the only Gen8 platforms, in which case this function 
can be simplified as:


...
{
return GRAPHICS_VER(engine->i915) > 8;
}

?


+}
+
  static bool need_preempt(const struct intel_engine_cs *engine,
 const struct i915_request *rq)
  {
int last_prio;
  
+	if ((GRAPHICS_VER(engine->i915) <= 8) && can_preempt(engine))


The GRAPHICS_VER check here looks redundant with the one inside 
can_preempt().


Regards,

Tvrtko


+   return false;
+
if (!intel_engine_has_semaphores(engine))
return false;
  
@@ -3313,15 +3328,6 @@ static void remove_from_engine(struct i915_request *rq)

i915_request_notify_execute_cb_imm(rq);
  }
  
-static bool can_preempt(struct intel_engine_cs *engine)

-{
-   if (GRAPHICS_VER(engine->i915) > 8)
-   return true;
-
-   /* GPGPU on bdw requires extra w/a; not implemented */
-   return engine->class != RENDER_CLASS;
-}
-
  static void kick_execlists(const struct i915_request *rq, int prio)
  {
struct intel_engine_cs *engine = rq->engine;


[PULL] drm-intel-gt-next

2024-07-04 Thread Tvrtko Ursulin


Hi Dave, Sima,

The final pull for 6.11 is quite small and only contains a handful of
fixes in areas such as stolen memory probing on ATS-M, GuC priority
handling, out of memory reporting noise downgrade and fence register
handling race condition reported by CI.

Regards,

Tvrtko

drm-intel-gt-next-2024-07-04:
Driver Changes:

Fixes/improvements/new stuff:

- Downgrade stolen lmem setup warning [gem] (Jonathan Cavitt)
- Evaluate GuC priority within locks [gt/uc] (Andi Shyti)
- Fix potential UAF by revoke of fence registers [gt] (Janusz Krzysztofik)
- Return NULL instead of '0' [gem] (Andi Shyti)
- Use the correct format specifier for resource_size_t [gem] (Andi Shyti)
- Suppress oom warning in favour of ENOMEM to userspace [gem] (Nirmoy Das)

Miscellaneous:

- Evaluate forcewake usage within locks [gt] (Andi Shyti)
- Fix typo in comment [gt/uc] (Andi Shyti)
The following changes since commit 79655e867ad6dfde2734c67c7704c0dd5bf1e777:

  drm/i915/mtl: Update workaround 14018575942 (2024-06-11 16:06:20 +0200)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-gt-next-2024-07-04

for you to fetch changes up to 3b85152cb167bd24fe84ceb91b719b5904ca354f:

  drm/i915/gem: Suppress oom warning in favour of ENOMEM to userspace 
(2024-06-28 00:11:01 +0200)


Driver Changes:

Fixes/improvements/new stuff:

- Downgrade stolen lmem setup warning [gem] (Jonathan Cavitt)
- Evaluate GuC priority within locks [gt/uc] (Andi Shyti)
- Fix potential UAF by revoke of fence registers [gt] (Janusz Krzysztofik)
- Return NULL instead of '0' [gem] (Andi Shyti)
- Use the correct format specifier for resource_size_t [gem] (Andi Shyti)
- Suppress oom warning in favour of ENOMEM to userspace [gem] (Nirmoy Das)

Miscellaneous:

- Evaluate forcewake usage within locks [gt] (Andi Shyti)
- Fix typo in comment [gt/uc] (Andi Shyti)


Andi Shyti (5):
  drm/i915/gt: debugfs: Evaluate forcewake usage within locks
  drm/i915/gt/uc: Fix typo in comment
  drm/i915/gt/uc: Evaluate GuC priority within locks
  drm/i915/gem: Return NULL instead of '0'
  drm/i915/gem: Use the correct format specifier for resource_size_t

Janusz Krzysztofik (1):
  drm/i915/gt: Fix potential UAF by revoke of fence registers

Jonathan Cavitt (1):
  drm/i915/gem: Downgrade stolen lmem setup warning

Nirmoy Das (1):
  drm/i915/gem: Suppress oom warning in favour of ENOMEM to userspace

 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  8 +--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c |  4 
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 27 ++-
 drivers/gpu/drm/i915/i915_scatterlist.c   |  8 +++
 6 files changed, 32 insertions(+), 18 deletions(-)


Re: [PATCH 1/4] drm/scheduler: implement hardware time accounting

2024-07-02 Thread Tvrtko Ursulin



Hi,

A few questions below.

On 01/07/2024 18:14, Lucas Stach wrote:

From: Christian König 

Multiple drivers came up with the requirement to measure how
much runtime each entity accumulated on the HW.

A previous attempt of accounting this had to be reverted because
HW submissions can have a lifetime exceeding that of the entity
originally issuing them.

Amdgpu on the other hand solves this task by keeping track of
all the submissions and calculating how much time they have used
on demand.

Move this approach over into the scheduler to provide an easy to
use interface for all drivers.

Signed-off-by: Christian König 
Signed-off-by: Lucas Stach 
---
v2:
- rebase to v6.10-rc1
- fix for non-power-of-two number of HW submission
- add comment explaining the logic behind the fence tracking array
- rename some function and fix documentation
---
  drivers/gpu/drm/scheduler/sched_entity.c | 82 +++-
  drivers/gpu/drm/scheduler/sched_fence.c  | 19 ++
  include/drm/gpu_scheduler.h  | 31 +
  3 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..d678d0b9b29e 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -62,7 +62,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
  unsigned int num_sched_list,
  atomic_t *guilty)
  {
-   if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
+   unsigned int i, num_submissions = 0;
+
+   if (!entity || !sched_list)
return -EINVAL;
  
  	memset(entity, 0, sizeof(struct drm_sched_entity));

@@ -98,6 +100,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 (s32) 
DRM_SCHED_PRIORITY_KERNEL);
}
entity->rq = sched_list[0]->sched_rq[entity->priority];
+
+   for (i = 0; i < num_sched_list; ++i) {
+   num_submissions = max(num_submissions,
+ sched_list[i]->credit_limit);
+   }


Does this work (in concept and naming) for all drivers if introduction 
of credits broke the 1:1 between jobs and hw "ring" capacity?


How big is the array for different drivers?
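
(For illustration only, with made up numbers: a scheduler with
credit_limit = 1024 where a typical job costs 64 credits would size the
array at 1024 entries even though at most 16 jobs can ever be in
flight.)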


}
  
init_completion(&entity->entity_idle);

@@ -110,11 +117,52 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
  
atomic_set(&entity->fence_seq, 0);

entity->fence_context = dma_fence_context_alloc(2);
+   spin_lock_init(&entity->accounting_lock);
+
+   if (!num_submissions)
+   return 0;
+
+   entity->max_hw_submissions = num_submissions;
+   entity->hw_submissions = kcalloc(num_submissions, sizeof(void *),
+GFP_KERNEL);
+   if (!entity->hw_submissions)
+   return -ENOMEM;
  
  	return 0;

  }
  EXPORT_SYMBOL(drm_sched_entity_init);
  
+/**

+ * drm_sched_entity_time_spent - Accumulated HW runtime used by this entity
+ * @entity: scheduler entity to check
+ *
+ * Get the current accumulated HW runtime used by all submissions made through
+ * this entity.
+ */
+ktime_t drm_sched_entity_time_spent(struct drm_sched_entity *entity)
+{
+   ktime_t result;
+   unsigned int i;
+
+   if (!entity->max_hw_submissions)
+   return ns_to_ktime(0);
+
+   spin_lock(&entity->accounting_lock);
+   result = entity->hw_time_used;
+   for (i = 0; i < entity->max_hw_submissions; ++i) {
+   struct drm_sched_fence *fence = entity->hw_submissions[i];
+
+   if (!fence)
+   continue;
+
+   result = ktime_add(result, drm_sched_fence_get_runtime(fence));


Does this end up counting from when jobs have been submitted to the hw 
backend and may not be actually executing?


Say if a driver configures a backend N deep and is filled with N jobs, 
while in actuality they are executed sequentially one at a time, the 
time as reported here would over-account by some series such as 
(job[0].finish - job[0].submit) + ... + (job[N].finish - job[N].submit)?


Or in other words, if one submits N jobs to a ring serving a 1-wide hw 
backend, will we see "N*100%" utilisation instead of "100%" if sampling 
while first job is still executing, the rest queued behind it?
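
To put a number on that concern, a back-of-the-envelope sketch (purely 
illustrative arithmetic, not measured data and not part of the patch):

	/* N jobs, each needing T ns of GPU time, all pushed at time 0 to a
	 * hardware backend which executes them one at a time.  If a fence's
	 * "runtime" is taken as finish minus submit-to-hw, job i reports
	 * i * T instead of T. */
	static u64 naive_reported_runtime(u64 n, u64 t)
	{
		u64 i, reported = 0;

		for (i = 1; i <= n; i++)
			reported += i * t;

		/* reported == n * (n + 1) / 2 * t, although the engine only
		 * ran for n * t in total. */
		return reported;
	}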


Regards,

Tvrtko


+   }
+   spin_unlock(&entity->accounting_lock);
+
+   return result;
+}
+EXPORT_SYMBOL(drm_sched_entity_time_spent);
+
  /**
   * drm_sched_entity_modify_sched - Modify sched of an entity
   * @entity: scheduler entity to init
@@ -326,6 +374,8 @@ EXPORT_SYMBOL(drm_sched_entity_flush);
   */
  void drm_sched_entity_fini(struct drm_sched_entity *entity)
  {
+   unsigned int i;
+
/*
 * If consumption of existing IBs wasn't completed. Forcefully remove
 * them here. Also makes sure that the 

Re: [RFC PATCH 2/6] drm/cgroup: Add memory accounting DRM cgroup

2024-07-01 Thread Tvrtko Ursulin



On 01/07/2024 10:25, Maarten Lankhorst wrote:

Den 2024-06-28 kl. 16:04, skrev Maxime Ripard:

Hi,

On Thu, Jun 27, 2024 at 09:22:56PM GMT, Maarten Lankhorst wrote:

Den 2024-06-27 kl. 19:16, skrev Maxime Ripard:

Hi,

Thanks for working on this!

On Thu, Jun 27, 2024 at 05:47:21PM GMT, Maarten Lankhorst wrote:

The initial version was based roughly on the rdma and misc cgroup
controllers, with a lot of the accounting code borrowed from rdma.

The current version is a complete rewrite with page counter; it uses
the same min/low/max semantics as the memory cgroup as a result.

There's a small mismatch as TTM uses u64, and page_counter long pages.
In practice it's not a problem. 32-bits systems don't really come with

=4GB cards and as long as we're consistently wrong with units, it's

fine. The device page size may not be in the same units as kernel page
size, and each region might also have a different page size (VRAM vs GART
for example).

The interface is simple:
- populate drmcgroup_device->regions[..] name and size for each active
region, set num_regions accordingly.
- Call drm(m)cg_register_device()
- Use drmcg_try_charge to check if you can allocate a chunk of memory,
use drmcg_uncharge when freeing it. This may return an error code,
or -EAGAIN when the cgroup limit is reached. In that case a reference
to the limiting pool is returned.
- The limiting cs can be used as compare function for
drmcs_evict_valuable.
- After having evicted enough, drop reference to limiting cs with
drmcs_pool_put.

This API allows you to limit device resources with cgroups.
You can see the supported cards in /sys/fs/cgroup/drm.capacity
You need to echo +drm to cgroup.subtree_control, and then you can
partition memory.
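
To make the flow described above concrete, a minimal usage sketch
(hypothetical driver code; the pool type and the exact function signatures
are assumed from the prose above, not copied from the patch):

	/* Allocation path charging against the DRM cgroup.  All of the
	 * foo_* names, the pool state type and the charge/uncharge
	 * signatures are assumptions made for illustration only. */
	static int foo_alloc_charged(struct foo_device *fdev, u32 region, u64 size)
	{
		struct drmcgroup_pool_state *limit;
		int ret;

		ret = drmcg_try_charge(&limit, &fdev->cg_dev, region, size);
		if (ret == -EAGAIN) {
			/* Limit reached: 'limit' references the limiting
			 * pool, which can drive eviction (with
			 * drmcs_evict_valuable() as the compare function)
			 * before the reference is dropped and the charge is
			 * retried. */
			foo_evict_from_pool(fdev, limit, size);
			drmcs_pool_put(limit);
			ret = drmcg_try_charge(&limit, &fdev->cg_dev, region, size);
		}

		return ret;
	}

The matching free path would call drmcg_uncharge() with the same region and
size.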

Signed-off-by: Maarten Lankhorst
Co-developed-by: Friedrich Vock

I'm sorry, I should have written minutes on the discussion we had with TJ
and Tvrtko the other day.

We're all very interested in making this happen, but doing a "DRM"
cgroup doesn't look like the right path to us.

Indeed, we have a significant number of drivers that won't have a
dedicated memory but will depend on DMA allocations one way or the
other, and those pools are shared between multiple frameworks (DRM,
V4L2, DMA-Buf Heaps, at least).

This was also pointed out by Sima some time ago here:
https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

So we'll want that cgroup subsystem to be cross-framework. We settled on
a "device" cgroup during the discussion, but I'm sure we'll have plenty
of bikeshedding.

The other thing we agreed on, based on the feedback TJ got on the last
iterations of his series was to go for memcg for drivers not using DMA
allocations.

It's the part where I expect some discussion there too :)

So we went back to a previous version of TJ's work, and I've started to
work on:

- Integration of the cgroup in the GEM DMA and GEM VRAM helpers (this
  works on tidss right now)

- Integration of all heaps into that cgroup but the system one
  (working on this at the moment)


Should be similar to what I have then. I think you could use my work to
continue it.

I made nothing DRM specific except the name, if you renamed it the device
resource management cgroup and changed the init function signature to take a
name instead of a drm pointer, nothing would change. This is exactly what
I'm hoping to accomplish, including reserving memory.


I've started to work on rebasing my current work onto your series today,
and I'm not entirely sure how what I described would best fit. Let's
assume we have two KMS device, one using shmem, one using DMA
allocations, two heaps, one using the page allocator, the other using
CMA, and one v4l2 device using dma allocations.

So we would have one KMS device and one heap using the page allocator,
and one KMS device, one heap, and one v4l2 driver using the DMA
allocator.

Would these make different cgroup devices, or different cgroup regions?


Each driver would register a device, whatever feels most logical for that 
device I suppose.

My guess is that a prefix would also be nice here, so register a device with 
name of drm/$name or v4l2/$name, heap/$name. I didn't give it much thought and 
we're still experimenting, so just try something. :)

There's no limit to amount of devices, I only fixed amount of pools to match 
TTM, but even that could be increased arbitrarily. I just don't think there is 
a point in doing so.


Do we need a plan for top level controls which do not include region 
names? If the latter will be driver specific then I am thinking of ease 
of configuring it all from the outside. Especially considering that one 
cgroup can have multiple devices in it.


Second question is about double accounting for shmem backed objects. I 
think they will be seen, for drivers which allocate backing store at 
buffer objects creation time, under the cgroup of process doing the 
creation, in the existing memory controller. Right?


Is 

Re: [PATCH 5/6] drm/amdgpu: always enable move threshold for BOs

2024-06-28 Thread Tvrtko Ursulin



Hey Christian,

Any thoughts on the below reply? Did I get it wrong or did I find a 
legitimate issue?


Regards,

Tvrtko

On 14/06/2024 17:06, Tvrtko Ursulin wrote:


On 14/06/2024 10:53, Christian König wrote:


  if (domain & abo->preferred_domains & 
AMDGPU_GEM_DOMAIN_VRAM &&

-    !(adev->flags & AMD_IS_APU))
-    places[c].flags |= TTM_PL_FLAG_FALLBACK;
+    !(adev->flags & AMD_IS_APU)) {
+    /*
+ * When GTT is just an alternative to VRAM make sure 
that we
+ * only use it as fallback and still try to fill up 
VRAM first.

+    */
+    if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
+    places[c].flags |= TTM_PL_FLAG_FALLBACK;
+
+    /*
+ * Enable GTT when the threshold of moved bytes is
+ * reached. This prevents any non essential buffer move
+ * when the links are already saturated.
+ */
+    places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
+    }


For the APU case I *think* this works, but for discrete I am not sure 
yet.


Agree, APUs are basically already fine as they are. VRAM is just used 
so that it isn't wasted there.


Well yeah it works, but because re-validation is broken so it cannot hit 
the broken migration budget. ;)


As a side note and disclaimer, the TTM "resource compatible" logic 
has a half-life of about one week in my brain until I need to almost 
re-figure it all out. I don't know if it is just me, but I find it 
really non-intuitive and almost like a double, triple, or even 
quadruple negation way of thinking about things.


Yeah I was also going back and forth between the different approaches 
multiple times and just ended up in this implementation because it 
seemed to do what I wanted to have.


It's certainly not very intuitive what's going on here.



It is not helping that with this proposal you set threshold on just 
one of the possible object placements which further increases the 
asymmetry. For me intuitive thing would be that thresholds apply to 
the act of changing the current placement directly. Not indirectly 
via playing with one of the placement flags dynamically.


Interesting idea, how would the handling then be? Currently we have 
only the stages - 'don't evict' and 'evict'. Should we make it 
something more like 'don't move', 'move', 'evict' ?


Intuitively I would think "don't move" aligns with the "out of migration 
budget" concept.


Since in this patch you add move_threshold to ttm_operation_context 
could it simply be used as the overall criteria if it is set?


In a way like:

  1. If the current placement is from the list of userspace supplied 
valid ones, and

  2. Migration limit has been set, and
  3. It is spent.

-> Then just don't migrate, return "all is good" from ttm_bo_validate.
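
Spelled out as code, roughly (purely illustrative; the helper used for step
1 and the overall shape are assumed, not taken from the posted series):

	/* Early-out sketch for ttm_bo_validate(): keep the BO where it is
	 * when it already sits in one of the requested placements and the
	 * migration budget has been spent.  bo_in_requested_placement() is
	 * a stand-in for whatever compatibility check fits here. */
	static bool ttm_bo_keep_current_placement(struct ttm_buffer_object *bo,
						  struct ttm_placement *placement,
						  struct ttm_operation_ctx *ctx)
	{
		return bo_in_requested_placement(bo, placement) &&	/* 1. */
		       ctx->move_threshold &&				/* 2. */
		       ctx->bytes_moved >= ctx->move_threshold;		/* 3. */
	}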

Though I am not sure at the moment how that would interact with the 
amdgpu_evict_flags and placements userspace did not specify.


Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and 
TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it 
will be considered compatible while under the migration budget?


(Side note, the fact both flags are set I also find very difficult to 
mentally model.)


Say a buffer was evicted to GTT already. What then brings it back to 
VRAM?


The first subsequent ttm_bo_validate pass (!evicting) says GTT is 
fine (applicable) while ctx->bytes_moved < ctx->move_threshold, no? 
Isn't that the opposite of what would be required and causes nothing 
to be migrated back in? What am I missing?


The flag says that GTT is fine when ctx->bytes_moved >= 
ctx->move_threshold. The logic is exactly inverted to what you described.


This way a BO will be moved back into VRAM as long as bytes moved 
doesn't exceed the threshold.


I'm afraid I need to sketch it out... If buffer is currently in GTT and 
placements are VRAM+GTT.


ttm_bo_validate(evicting=false)

1st iteration:
res=GTT != place=VRAM
    continue

2nd iteration:
res=GTT == place=GTT+FALLBACK+THRESHOLD

ttm_place_applicable(GTT)
   moved < threshold
     return true

Buffer stays in GTT while under migration budget -> wrong, no? Or am I 
still confused?


Regards,

Tvrtko

Setting both flags has the effect of saying: It's ok that the BO stays 
in GTT when you are either above the move threshold or would have to evict 
something.


Regards,
Christian.



Regards,

Tvrtko




Re: [PATCH] dma-buf/sw_sync: Add a reference when adding fence to timeline list

2024-06-19 Thread Tvrtko Ursulin



On 14/06/2024 19:00, Thadeu Lima de Souza Cascardo wrote:

On Fri, Jun 14, 2024 at 11:52:03AM +0100, Tvrtko Ursulin wrote:


On 24/03/2024 10:15, Thadeu Lima de Souza Cascardo wrote:

commit e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence
signal") fixed a recursive locking when a signal callback released a fence.
It did it by taking an extra reference while traversing it on the list and
holding the timeline lock.

However, this is racy and may end up adding to a kref that is 0, triggering
a well deserved warning, as later that reference would be put again.

CPU 0                                     CPU 1
sync_file_release                         sync_timeline_signal
  dma_fence_put
    timeline_fence_release
                                          spin_lock_irq(&obj->lock)
                                          dma_fence_get(&pt->base)
      spin_lock_irqsave(fence->lock, flags)

As shown above, it is possible for the last reference to be dropped, but
sync_timeline_signal takes the lock before timeline_fence_release, which
will lead to a 0->1 kref transition, which is not allowed.

This is because there is still a pointer to the fence object in the list,
which should be accounted as a reference.

In previous discussions about this [3], it was called out that keeping such
a reference was not a good idea because the fence also holds a reference to
the timeline, hence leading to a loop. However, accounting for that
reference doesn't change that the loop already exists. And userspace holds
references in the form of file descriptors, so it is still possible to
avoid potential memory leaks.

This fix also avoids other issues. The nested locking is still possible to
trigger when closing the timeline, as sw_sync_debugfs_release also calls
dma_fence_signal_locked while holding the lock. By holding a reference and
releasing it only after doing the signal, that nested locking is avoided.

There are a few quirks about the reference counting here, though.

In the simple case when sync_pt_create adds a new fence to the list, it
returns with 2 references instead of 1. That is dealt with as
sw_sync_ioctl_create_fence always puts a reference after calling
sync_file_create. That is necessary for multiple reasons.

One is that it takes care of the error case when sync_file_create fails.

The extra reference is put, while the fence is still held on the list, so
its last reference will be put when it is removed from the list either in
sync_timeline_signal or sw_sync_debugfs_release.


So any fences where sync_file_create failed linger around until
sw_sync_debugfs_release? Okay-ish I guess since it is a pathological case.



The challenge here is to determine which one of the multiple cases we are
dealing with. Since we don't hold the lock while sync_file_create is
called, we are left with this situation. An alternative would be to fold
sync_pt_create into sw_sync_ioctl_create_fence, so at least we can
determine which case is which. That would also fix the case where we hand
userspace a file descriptor with a fence that is not even on the list.


Since sync_pt_create is local and has only this single caller it could 
be worth exploring this option to see if it could simplify things and 
get rid of this lingering objects corner case.



It also avoids the race when a signal may come in between sync_pt_create
and sync_file_create as the lock is dropped. If that happens, the fence
will be removed from the list, but a reference will still be kept as
sync_file_create takes a reference.

Then, there is the case when a fence with the given seqno already exists.
sync_pt_create returns with an extra reference to it, that we later put.
Similar reasoning can be applied here. That one extra reference is
necessary to avoid a race with signaling (and release), and we later put
that extra reference.

Finally, there is the case when the fence is already signaled and not added
to the list. In such case, sync_pt_create must return with a single
reference as this fence has not been added to the timeline list. It will
either be freed in case sync_file_create fails or the file will keep its
reference, which is later put when the file is released.

This is based on Chris Wilson attempt [2] to fix recursive locking during
timeline signal. Hence, their signoff.

Link: 
https://lore.kernel.org/all/20200714154102.450826-1-...@basnieuwenhuizen.nl/ [1]
Link: 
https://lore.kernel.org/all/20200715100432.13928-2-ch...@chris-wilson.co.uk/ [2]
Link: 
https://lore.kernel.org/all/20230817213729.110087-1-robdcl...@gmail.com/T/ [3]
Fixes: e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence 
signal")
Signed-off-by: Chris Wilson 
Signed-off-by: Thadeu Lima de Souza Cascardo 
Cc: Chris Wilson 
Cc: Bas Nieuwenhuizen 
Cc: Rob Clark 
---
   drivers/dma-buf/sw_sync.c | 42 ---
   1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/sw_sync.c b

Re: [PATCH 5/6] drm/amdgpu: always enable move threshold for BOs

2024-06-14 Thread Tvrtko Ursulin



On 14/06/2024 10:53, Christian König wrote:


  if (domain & abo->preferred_domains & 
AMDGPU_GEM_DOMAIN_VRAM &&

-    !(adev->flags & AMD_IS_APU))
-    places[c].flags |= TTM_PL_FLAG_FALLBACK;
+    !(adev->flags & AMD_IS_APU)) {
+    /*
+ * When GTT is just an alternative to VRAM make sure 
that we
+ * only use it as fallback and still try to fill up VRAM 
first.

+    */
+    if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
+    places[c].flags |= TTM_PL_FLAG_FALLBACK;
+
+    /*
+ * Enable GTT when the threshold of moved bytes is
+ * reached. This prevents any non essential buffer move
+ * when the links are already saturated.
+ */
+    places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
+    }


For the APU case I *think* this works, but for discrete I am not sure 
yet.


Agree, APUs are basically already fine as they are. VRAM is just used so 
that it isn't wasted there.


Well yeah it works, but because re-validation is broken so it cannot hit 
the broken migration budget. ;)


As a side note and disclaimer, the TTM "resource compatible" logic has 
a half-life of about one week in my brain until I need to almost 
re-figure it all out. I don't know if it is just me, but I find it really 
non-intuitive and almost like a double, triple, or even quadruple 
negation way of thinking about things.


Yeah I was also going back and forth between the different approaches 
multiple times and just ended up in this implementation because it 
seemed to do what I wanted to have.


It's certainly not very intuitive what's going on here.



It is not helping that with this proposal you set threshold on just 
one of the possible object placements which further increases the 
asymmetry. For me intuitive thing would be that thresholds apply to 
the act of changing the current placement directly. Not indirectly via 
playing with one of the placement flags dynamically.


Interesting idea, how would the handling then be? Currently we have only 
the stages - 'don't evict' and 'evict'. Should we make it something more 
like 'don't move', 'move', 'evict' ?


Intuitively I would think "don't move" aligns with the "out of migration 
budget" concept.


Since in this patch you add move_threshold to ttm_operation_context 
could it simply be used as the overall criteria if it is set?


In a way like:

 1. If the current placement is from the list of userspace supplied 
valid ones, and

 2. Migration limit has been set, and
 3. It is spent.

-> Then just don't migrate, return "all is good" from ttm_bo_validate.

Though I am not sure at the moment how that would interact with the 
amdgpu_evict_flags and placements userspace did not specify.


Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and 
TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it will 
be considered compatible while under the migration budget?


(Side note, the fact both flags are set I also find very difficult to 
mentally model.)


Say a buffer was evicted to GTT already. What then brings it back to 
VRAM?


The first subsequent ttm_bo_validate pass (!evicting) says GTT is fine 
(applicable) while ctx->bytes_moved < ctx->move_threshold, no? Isn't 
that the opposite of what would be required and causes nothing to be 
migrated back in? What am I missing?


The flag says that GTT is fine when ctx->bytes_moved >= 
ctx->move_threshold. The logic is exactly inverted to what you described.


This way a BO will be moved back into VRAM as long as bytes moved 
doesn't exceed the threshold.


I'm afraid I need to sketch it out... If buffer is currently in GTT and 
placements are VRAM+GTT.


ttm_bo_validate(evicting=false)

1st iteration:
res=GTT != place=VRAM
   continue

2nd iteration:
res=GTT == place=GTT+FALLBACK+THRESHOLD

ttm_place_applicable(GTT)
  moved < threshold
return true

Buffer stays in GTT while under migration budget -> wrong, no? Or am I 
still confused?


Regards,

Tvrtko

Setting both flags has the effect of saying: It's ok that the BO stays 
in GTT when you are either above the move threshold or would have to evict 
something.


Regards,
Christian.



Regards,

Tvrtko




Re: [PATCH] dma-buf/sw_sync: Add a reference when adding fence to timeline list

2024-06-14 Thread Tvrtko Ursulin



On 24/03/2024 10:15, Thadeu Lima de Souza Cascardo wrote:

commit e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence
signal") fixed a recursive locking when a signal callback released a fence.
It did it by taking an extra reference while traversing it on the list and
holding the timeline lock.

However, this is racy and may end up adding to a kref that is 0, triggering
a well deserved warning, as later that reference would be put again.

CPU 0                                     CPU 1
sync_file_release                         sync_timeline_signal
  dma_fence_put
    timeline_fence_release
                                          spin_lock_irq(&obj->lock)
                                          dma_fence_get(&pt->base)
      spin_lock_irqsave(fence->lock, flags)

As shown above, it is possible for the last reference to be dropped, but
sync_timeline_signal takes the lock before timeline_fence_release, which
will lead to a 0->1 kref transition, which is not allowed.

This is because there is still a pointer to the fence object in the list,
which should be accounted as a reference.

In previous discussions about this [3], it was called out that keeping such
a reference was not a good idea because the fence also holds a reference to
the timeline, hence leading to a loop. However, accounting for that
reference doesn't change that the loop already exists. And userspace holds
references in the form of file descriptors, so it is still possible to
avoid potential memory leaks.

This fix also avoids other issues. The nested locking is still possible to
trigger when closing the timeline, as sw_sync_debugfs_release also calls
dma_fence_signal_locked while holding the lock. By holding a reference and
releasing it only after doing the signal, that nested locking is avoided.

There are a few quirks about the reference counting here, though.

In the simple case when sync_pt_create adds a new fence to the list, it
returns with 2 references instead of 1. That is dealt with as
sw_sync_ioctl_create_fence always puts a reference after calling
sync_file_create. That is necessary for multiple reasons.

One is that it takes care of the error case when sync_file_create fails.

The extra reference is put, while the fence is still held on the list, so
its last reference will be put when it is removed from the list either in
sync_timeline_signal or sw_sync_debugfs_release.


So any fences where sync_file_create failed linger around until 
sw_sync_debugfs_release? Okay-ish I guess since it is a pathological case.



It also avoids the race when a signal may come in between sync_pt_create
and sync_file_create as the lock is dropped. If that happens, the fence
will be removed from the list, but a reference will still be kept as
sync_file_create takes a reference.

Then, there is the case when a fence with the given seqno already exists.
sync_pt_create returns with an extra reference to it, that we later put.
Similar reasoning can be applied here. That one extra reference is
necessary to avoid a race with signaling (and release), and we later put
that extra reference.

Finally, there is the case when the fence is already signaled and not added
to the list. In such case, sync_pt_create must return with a single
reference as this fence has not been added to the timeline list. It will
either be freed in case sync_file_create fails or the file will keep its
reference, which is later put when the file is released.

This is based on Chris Wilson attempt [2] to fix recursive locking during
timeline signal. Hence, their signoff.

Link: 
https://lore.kernel.org/all/20200714154102.450826-1-...@basnieuwenhuizen.nl/ [1]
Link: 
https://lore.kernel.org/all/20200715100432.13928-2-ch...@chris-wilson.co.uk/ [2]
Link: 
https://lore.kernel.org/all/20230817213729.110087-1-robdcl...@gmail.com/T/ [3]
Fixes: e531fdb5cd5e ("dma-buf/sw_sync: Avoid recursive lock during fence 
signal")
Signed-off-by: Chris Wilson 
Signed-off-by: Thadeu Lima de Souza Cascardo 
Cc: Chris Wilson 
Cc: Bas Nieuwenhuizen 
Cc: Rob Clark 
---
  drivers/dma-buf/sw_sync.c | 42 ---
  1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index c353029789cf..83b624ac4faa 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -151,16 +151,7 @@ static const char *timeline_fence_get_timeline_name(struct 
dma_fence *fence)
  
  static void timeline_fence_release(struct dma_fence *fence)

  {
-   struct sync_pt *pt = dma_fence_to_sync_pt(fence);
struct sync_timeline *parent = dma_fence_parent(fence);
-   unsigned long flags;
-
-   spin_lock_irqsave(fence->lock, flags);
-   if (!list_empty(&pt->link)) {
-   list_del(&pt->link);
-   rb_erase(&pt->node, &parent->pt_tree);
-   }
-   spin_unlock_irqrestore(fence->lock, flags);
  
  	sync_timeline_put(parent);

dma_fence_free(fence);
@@ -229,7 +220,6 @@ static const struct 

[PULL] drm-intel-gt-next

2024-06-12 Thread Tvrtko Ursulin


Hi Dave, Sima,

Here is the main pull request for drm-intel-gt-next targeting 6.11.

First is the new userspace API for allowing upload of custom context
state used for replaying GPU hang error state captures. This will be
used by Mesa (see
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594) for
debugging GPU hangs captured in the wild on real hardware. So far that
was only possible under simulation and that via some hacks. Also,
simulation in general has certain limitations to what hangs it can
reproduce. As the UAPI it is intended for Mesa developers only, it is
hidden behind a kconfig and runtime enablement switches.

Then there are fixes for hangs on Meteorlake due to an incorrect reduced CCS
configuration and a missing video engine workaround. Then fixes for a
couple race conditions in multi GT and breadcrumb handling, and a more
robust functional level reset by extending the timeout used.

A couple tiny cleanups here and there and finally one back-merge which
was required to land some display code base refactoring.

Regards,

Tvrtko

drm-intel-gt-next-2024-06-12:
UAPI Changes:

- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)

Driver Changes:

Fixes/improvements/new stuff:

- Automate CCS Mode setting during engine resets [gt] (Andi Shyti)
- Revert "drm/i915: Remove extra multi-gt pm-references" (Janusz Krzysztofik)
- Fix HAS_REGION() usage in intel_gt_probe_lmem() (Ville Syrjälä)
- Disarm breadcrumbs if engines are already idle [gt] (Chris Wilson)
- Shadow default engine context image in the context (Tvrtko Ursulin)
- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)
- avoid FIELD_PREP warning [guc] (Arnd Bergmann)
- Fix CCS id's calculation for CCS mode setting [gt] (Andi Shyti)
- Increase FLR timeout from 3s to 9s (Andi Shyti)
- Update workaround 14018575942 [mtl] (Angus Chen)

Future platform enablement:

- Enable w/a 16021333562 for DG2, MTL and ARL [guc] (John Harrison)

Miscellaneous:

- Pass the region ID rather than a bitmask to HAS_REGION() (Ville Syrjälä)
- Remove counter productive REGION_* wrappers (Ville Syrjälä)
- Fix typo [gem/i915_gem_ttm_move] (Deming Wang)
- Delete the live_hearbeat_fast selftest [gt] (Krzysztof Niemiec)
The following changes since commit 431c590c3ab0469dfedad3a832fe73556396ee52:

  drm/tests: Add a unit test for range bias allocation (2024-05-16 12:50:14 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-gt-next-2024-06-12

for you to fetch changes up to 79655e867ad6dfde2734c67c7704c0dd5bf1e777:

  drm/i915/mtl: Update workaround 14018575942 (2024-06-11 16:06:20 +0200)


UAPI Changes:

- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)

Driver Changes:

Fixes/improvements/new stuff:

- Automate CCS Mode setting during engine resets [gt] (Andi Shyti)
- Revert "drm/i915: Remove extra multi-gt pm-references" (Janusz Krzysztofik)
- Fix HAS_REGION() usage in intel_gt_probe_lmem() (Ville Syrjälä)
- Disarm breadcrumbs if engines are already idle [gt] (Chris Wilson)
- Shadow default engine context image in the context (Tvrtko Ursulin)
- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)
- avoid FIELD_PREP warning [guc] (Arnd Bergmann)
- Fix CCS id's calculation for CCS mode setting [gt] (Andi Shyti)
- Increase FLR timeout from 3s to 9s (Andi Shyti)
- Update workaround 14018575942 [mtl] (Angus Chen)

Future platform enablement:

- Enable w/a 16021333562 for DG2, MTL and ARL [guc] (John Harrison)

Miscellaneous:

- Pass the region ID rather than a bitmask to HAS_REGION() (Ville Syrjälä)
- Remove counter productive REGION_* wrappers (Ville Syrjälä)
- Fix typo [gem/i915_gem_ttm_move] (Deming Wang)
- Delete the live_hearbeat_fast selftest [gt] (Krzysztof Niemiec)


Andi Shyti (3):
  drm/i915/gt: Automate CCS Mode setting during engine resets
  drm/i915/gt: Fix CCS id's calculation for CCS mode setting
  drm/i915: Increase FLR timeout from 3s to 9s

Angus Chen (1):
  drm/i915/mtl: Update workaround 14018575942

Arnd Bergmann (1):
  drm/i915/guc: avoid FIELD_PREP warning

Chris Wilson (1):
  drm/i915/gt: Disarm breadcrumbs if engines are already idle

Deming Wang (1):
  drm/i915/gem/i915_gem_ttm_move: Fix typo

Janusz Krzysztofik (1):
  Revert "drm/i915: Remove extra multi-gt pm-references"

John Harrison (1):
  drm/i915/guc: Enable w/a 16021333562 for DG2, MTL and ARL

Niemiec, Krzysztof (1):
  drm/i915/gt: Delete the live_hearbeat_fast selftest

Tvrtko Ursulin (3):
  Merge drm/drm-next into drm-intel-gt-next
  drm/i915: Shadow default engine context image in the context
  drm/i915: Support replaying GPU hangs with captured context image

Ville Syrjälä (3):
  drm/i915: Fix HAS_REGION() usage in i

Re: [PATCH 5/6] drm/amdgpu: always enable move threshold for BOs

2024-06-11 Thread Tvrtko Ursulin



Hi Christian,

On 04/06/2024 17:05, Christian König wrote:

This should prevent buffer moves when the threshold is reached during
CS.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 36 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 +
  2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..9a217932a4fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,7 +784,6 @@ static int amdgpu_cs_bo_validate(void *param, struct 
amdgpu_bo *bo)
.no_wait_gpu = false,
.resv = bo->tbo.base.resv
};
-   uint32_t domain;
int r;
  
  	if (bo->tbo.pin_count)

@@ -796,37 +795,28 @@ static int amdgpu_cs_bo_validate(void *param, struct 
amdgpu_bo *bo)
if (p->bytes_moved < p->bytes_moved_threshold &&
(!bo->tbo.base.dma_buf ||
list_empty(&bo->tbo.base.dma_buf->attachments))) {
+
+   /* And don't move a CPU_ACCESS_REQUIRED BO to limited
+* visible VRAM if we've depleted our allowance to do
+* that.
+*/
if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-   (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
-   /* And don't move a CPU_ACCESS_REQUIRED BO to limited
-* visible VRAM if we've depleted our allowance to do
-* that.
-*/
-   if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
-   domain = bo->preferred_domains;
-   else
-   domain = bo->allowed_domains;
-   } else {
-   domain = bo->preferred_domains;
-   }
-   } else {
-   domain = bo->allowed_domains;
+   (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
+   p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+   ctx.move_threshold = p->bytes_moved_vis_threshold -
+   p->bytes_moved_vis;
+   else
+   ctx.move_threshold = p->bytes_moved_vis_threshold -
+   p->bytes_moved;
}
  
-retry:

-   amdgpu_bo_placement_from_domain(bo, domain);
+   amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
  
  	p->bytes_moved += ctx.bytes_moved;

if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
amdgpu_res_cpu_visible(adev, bo->tbo.resource))
p->bytes_moved_vis += ctx.bytes_moved;
-
-   if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
-   domain = bo->allowed_domains;
-   goto retry;
-   }
-
return r;
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index 8c92065c2d52..cae1a5420c58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -168,13 +168,23 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
places[c].flags = 0;
-   /*
-* When GTT is just an alternative to VRAM make sure that we
-* only use it as fallback and still try to fill up VRAM first.
-*/
+
if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM &&
-   !(adev->flags & AMD_IS_APU))
-   places[c].flags |= TTM_PL_FLAG_FALLBACK;
+   !(adev->flags & AMD_IS_APU)) {
+   /*
+* When GTT is just an alternative to VRAM make sure 
that we
+* only use it as fallback and still try to fill up 
VRAM first.
+   */
+   if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
+   places[c].flags |= TTM_PL_FLAG_FALLBACK;
+
+   /*
+* Enable GTT when the threshold of moved bytes is
+* reached. This prevents any non essential buffer move
+* when the links are already saturated.
+*/
+   places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
+   }


For the APU case I *think* this works, but for discrete I am not sure yet.

As a side note and disclaimer, the TTM "resource compatible" logic has a 
half-life of about one week in my brain until I need to almost re-figure 
it all out. I don't know if it just me, but I find it really 

Re: [PATCH] drm/i915/gt: debugfs: Evaluate forcewake usage within locks

2024-06-11 Thread Tvrtko Ursulin



On 10/06/2024 10:24, Nirmoy Das wrote:

Hi Andi,

On 6/7/2024 4:51 PM, Andi Shyti wrote:

The forcewake count and domains listing is multi process critical
and the uncore provides a spinlock for such cases.

Lock the forcewake evaluation section in the fw_domains_show()
debugfs interface.

Signed-off-by: Andi Shyti 


Needs a Fixes tag, below seems to be correct one.


Fixes: 9dd4b065446a ("drm/i915/gt: Move pm debug files into a gt aware 
debugfs")


Cc:  # v5.6+

Reviewed-by: Nirmoy Das 


What is the back story here and why would it need backporting? IGT cares 
about the atomic view of user_forcewake_count and individual domains or 
what?


Regards,

Tvrtko




Regards,

Nirmoy



---
  drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c

index 4fcba42cfe34..0437fd8217e0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
@@ -71,6 +71,8 @@ static int fw_domains_show(struct seq_file *m, void 
*data)

  struct intel_uncore_forcewake_domain *fw_domain;
  unsigned int tmp;
+    spin_lock_irq(&uncore->lock);
+
  seq_printf(m, "user.bypass_count = %u\n",
 uncore->user_forcewake_count);
@@ -79,6 +81,8 @@ static int fw_domains_show(struct seq_file *m, void 
*data)

 intel_uncore_forcewake_domain_to_str(fw_domain->id),
 READ_ONCE(fw_domain->wake_count));
+    spin_unlock_irq(&uncore->lock);
+
  return 0;
  }
  DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(fw_domains);


Re: [PATCH] drm/i915/gt: Delete the live_hearbeat_fast selftest

2024-06-10 Thread Tvrtko Ursulin



Hi Andi,

On 10/06/2024 13:10, Andi Shyti wrote:

Hi Tvrtko,

On Mon, Jun 10, 2024 at 12:42:31PM +0100, Tvrtko Ursulin wrote:

On 03/06/2024 17:20, Niemiec, Krzysztof wrote:

The test is trying to push the heartbeat frequency to the limit, which
might sometimes fail. Such a failure does not provide valuable
information, because it does not indicate that there is something
necessarily wrong with either the driver or the hardware.

Remove the test to prevent random, unnecessary failures from appearing
in CI.

Suggested-by: Chris Wilson 
Signed-off-by: Niemiec, Krzysztof 


Just a note in passing that a comma in the email display name is, I believe, not
RFC 5322 compliant and there might be tools which barf on it(*). If you can
put it in double quotes, it would be advisable.


yes, we discussed it with Krzysztof, I noticed it right after I
submitted the code.


Regards,

Tvrtko

*) Such as my internal pull request generator which uses CPAN's
Email::Address::XS. :)


If we are in time, we can fix it as Krzysztof Niemiec 


Sorry about this oversight,


It's not a big deal (it isn't the first and only occurence) and no need 
to do anything more than correct the display name going forward.


Regards,

Tvrtko


Re: [PATCH] drm/i915/gt: Delete the live_hearbeat_fast selftest

2024-06-10 Thread Tvrtko Ursulin



On 03/06/2024 17:20, Niemiec, Krzysztof wrote:

The test is trying to push the heartbeat frequency to the limit, which
might sometimes fail. Such a failure does not provide valuable
information, because it does not indicate that there is something
necessarily wrong with either the driver or the hardware.

Remove the test to prevent random, unnecessary failures from appearing
in CI.

Suggested-by: Chris Wilson 
Signed-off-by: Niemiec, Krzysztof 


Just a note in passing that a comma in the email display name is, I believe, 
not RFC 5322 compliant and there might be tools which barf on it(*). If 
you can put it in double quotes, it would be advisable.


Regards,

Tvrtko

*) Such as my internal pull request generator which uses CPAN's 
Email::Address::XS. :)



---
  .../drm/i915/gt/selftest_engine_heartbeat.c   | 110 --
  1 file changed, 110 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index ef014df4c4fc..9e4f0e417b3b 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -193,115 +193,6 @@ static int live_idle_pulse(void *arg)
return err;
  }
  
-static int cmp_u32(const void *_a, const void *_b)

-{
-   const u32 *a = _a, *b = _b;
-
-   return *a - *b;
-}
-
-static int __live_heartbeat_fast(struct intel_engine_cs *engine)
-{
-   const unsigned int error_threshold = max(2u, jiffies_to_usecs(6));
-   struct intel_context *ce;
-   struct i915_request *rq;
-   ktime_t t0, t1;
-   u32 times[5];
-   int err;
-   int i;
-
-   ce = intel_context_create(engine);
-   if (IS_ERR(ce))
-   return PTR_ERR(ce);
-
-   intel_engine_pm_get(engine);
-
-   err = intel_engine_set_heartbeat(engine, 1);
-   if (err)
-   goto err_pm;
-
-   for (i = 0; i < ARRAY_SIZE(times); i++) {
-   do {
-   /* Manufacture a tick */
-   intel_engine_park_heartbeat(engine);
-   GEM_BUG_ON(engine->heartbeat.systole);
-   engine->serial++; /*  pretend we are not idle! */
-   intel_engine_unpark_heartbeat(engine);
-
-   flush_delayed_work(&engine->heartbeat.work);
-   if (!delayed_work_pending(&engine->heartbeat.work)) {
-   pr_err("%s: heartbeat %d did not start\n",
-  engine->name, i);
-   err = -EINVAL;
-   goto err_pm;
-   }
-
-   rcu_read_lock();
-   rq = READ_ONCE(engine->heartbeat.systole);
-   if (rq)
-   rq = i915_request_get_rcu(rq);
-   rcu_read_unlock();
-   } while (!rq);
-
-   t0 = ktime_get();
-   while (rq == READ_ONCE(engine->heartbeat.systole))
-   yield(); /* work is on the local cpu! */
-   t1 = ktime_get();
-
-   i915_request_put(rq);
-   times[i] = ktime_us_delta(t1, t0);
-   }
-
-   sort(times, ARRAY_SIZE(times), sizeof(times[0]), cmp_u32, NULL);
-
-   pr_info("%s: Heartbeat delay: %uus [%u, %u]\n",
-   engine->name,
-   times[ARRAY_SIZE(times) / 2],
-   times[0],
-   times[ARRAY_SIZE(times) - 1]);
-
-   /*
-* Ideally, the upper bound on min work delay would be something like
-* 2 * 2 (worst), +1 for scheduling, +1 for slack. In practice, we
-* are, even with system_wq_highpri, at the mercy of the CPU scheduler
-* and may be stuck behind some slow work for many millisecond. Such
-* as our very own display workers.
-*/
-   if (times[ARRAY_SIZE(times) / 2] > error_threshold) {
-   pr_err("%s: Heartbeat delay was %uus, expected less than 
%dus\n",
-  engine->name,
-  times[ARRAY_SIZE(times) / 2],
-  error_threshold);
-   err = -EINVAL;
-   }
-
-   reset_heartbeat(engine);
-err_pm:
-   intel_engine_pm_put(engine);
-   intel_context_put(ce);
-   return err;
-}
-
-static int live_heartbeat_fast(void *arg)
-{
-   struct intel_gt *gt = arg;
-   struct intel_engine_cs *engine;
-   enum intel_engine_id id;
-   int err = 0;
-
-   /* Check that the heartbeat ticks at the desired rate. */
-   if (!CONFIG_DRM_I915_HEARTBEAT_INTERVAL)
-   return 0;
-
-   for_each_engine(engine, gt, id) {
-   err = __live_heartbeat_fast(engine);
-   if (err)
-   break;
-   }
-
-   return err;
-}
-
  static int __live_heartbeat_off(struct intel_engine_cs *engine)
  {
int err;
@@ -372,7 +263,6 @@ int 

Re: [PATCH] drm/v3d: Fix perfmon build error/warning

2024-06-07 Thread Tvrtko Ursulin



On 05/06/2024 08:19, Iago Toral wrote:

Thanks for looking at ixing this Tvrtko.

El mar, 04-06-2024 a las 17:02 +0100, Tvrtko Ursulin escribió:

From: Tvrtko Ursulin 

Move static const array into the source file to fix the "defined but
not
used" errors.

The fix is perhaps not the prettiest due hand crafting the array
sizes
in v3d_performance_counters.h, but I did add some build time asserts
to
validate the counts look sensible, so hopefully it is good enough for
a
quick fix.


If we need this to go in ASAP I am fine with this patch as-is, so:

Reviewed-by: Iago Toral Quiroga 

With that said, if we are still in time for a bit of iteration may I
suggest that instead of hard-coding the counters we instead add helper
functions in drivers/gpu/drm/v3d/v3d_perfmon.c that call ARRAY_SIZE on
the corresponding array based on v3d->ver? It is fine if we prefer to
merge this as-is and do this change later as a follow-up patch.


I agree it isn't pretty and I was (and am) planning to see if things can 
be improved. The reason I gave up on a prettier solution in the original 
attempt is the fact that one array is statically sized (at build time) 
based on the max number of counters:


/* Number of perfmons required to handle all supported performance 
counters */

#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
              DRM_V3D_MAX_PERF_COUNTERS)

struct v3d_performance_query {
    /* Performance monitor IDs for this query */
    u32 kperfmon_ids[V3D_MAX_PERFMONS];

So need to see how to untangle that and then perhaps even go a step 
further than the getters but move the whole perfmon init into v3d_perfmon.c.
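
For reference, a rough sketch of the getter approach suggested above
(hypothetical helper, not part of the posted fix; it covers the runtime
max_counters assignment but not the build-time V3D_MAX_COUNTERS sizing):

	/* Would live in v3d_perfmon.c, next to the counter tables, and
	 * mirrors the probe-time logic from v3d_platform_drm_probe(). */
	unsigned int v3d_perfmon_max_counters(struct v3d_dev *v3d)
	{
		if (v3d->ver >= 71)
			return ARRAY_SIZE(v3d_v71_performance_counters);
		else if (v3d->ver >= 42)
			return ARRAY_SIZE(v3d_v42_performance_counters);

		return 0;
	}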


Regards,

Tvrtko



Iago


Signed-off-by: Tvrtko Ursulin 
Fixes: 3cbcbe016c31 ("drm/v3d: Add Performance Counters descriptions
for V3D 4.2 and 7.1")
Reported-by: kernel test robot 
Closes:
https://lore.kernel.org/oe-kbuild-all/202405211137.huefklkg-...@intel.com/
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc: Jani Nikula 
Cc: Ashutosh Dixit 
---
  drivers/gpu/drm/v3d/v3d_drv.c |   4 +-
  drivers/gpu/drm/v3d/v3d_drv.h |   3 -
  drivers/gpu/drm/v3d/v3d_perfmon.c | 204
+-
  .../gpu/drm/v3d/v3d_performance_counters.h    | 189 +---
  4 files changed, 205 insertions(+), 195 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
b/drivers/gpu/drm/v3d/v3d_drv.c
index f7477488b1cc..a47f00b443d3 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -299,9 +299,9 @@ static int v3d_platform_drm_probe(struct
platform_device *pdev)
    WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
  
  	if (v3d->ver >= 71)

-   v3d->max_counters =
ARRAY_SIZE(v3d_v71_performance_counters);
+   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
    else if (v3d->ver >= 42)
-   v3d->max_counters =
ARRAY_SIZE(v3d_v42_performance_counters);
+   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
    else
    v3d->max_counters = 0;
  
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h

b/drivers/gpu/drm/v3d/v3d_drv.h
index 556cbb400ba0..099b962bdfde 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,9 +351,6 @@ struct v3d_timestamp_query {
    struct drm_syncobj *syncobj;
  };
  
-/* Maximum number of performance counters supported by any version

of V3D */
-#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters)
-
  /* Number of perfmons required to handle all supported performance
counters */
  #define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
      DRM_V3D_MAX_PERF_COUNTERS)
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 73e2bb8bdb7f..b7d0b02e1a95 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -9,6 +9,192 @@
  #define V3D_PERFMONID_MIN 1
  #define V3D_PERFMONID_MAX U32_MAX
  
+static const struct v3d_perf_counter_desc

v3d_v42_performance_counters[] = {
+   {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP]
Valid primitives that result in no rendered pixels, for all rendered
tiles"},
+   {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid
primitives for all rendered tiles (primitives may be counted in more
than one tile)"},
+   {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped
quads"},
+   {"FEP", "FEP-valid-quads", "[FEP] Valid quads"},
+   {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads
with no pixels passing the stencil test"},
+   {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB]
Quads with no pixels passing the Z and stencil tests"},
+   {"TLB", "TL

Re: [PATCH v3 6/7] drm/drm_file: add display of driver's internal memory size

2024-06-06 Thread Tvrtko Ursulin



On 06/06/2024 02:49, Adrián Larumbe wrote:


Some drivers must allocate a considerable amount of memory for bookkeeping
structures and GPU's MCU-kernel shared communication regions. These are
often created as a result of the invocation of the driver's ioctl()
interface functions, so it is sensible to consider them as being owned by
the render context associated with an open drm file.

However, at the moment drm_show_memory_stats only traverses the UM-exposed
drm objects for which a handle exists. Private driver objects and memory
regions, though connected to a render context, are unaccounted for in their
fdinfo numbers.

Add a new drm_memory_stats 'internal' memory category.

Because deciding what constitutes an 'internal' object and where to find
these are driver-dependent, calculation of this size must be done through a
driver-provided function pointer, which becomes the third argument of
drm_show_memory_stats. Drivers which have no interest in exposing the size
of internal memory objects can keep passing NULL for unaltered behaviour.

Signed-off-by: Adrián Larumbe 


Please Cc people who were previously involved in defining 
drm-usage-stats.rst. I added Rob, but I am not sure if I forgot someone 
from the top of my head.


Internal as a category sounds potentially useful. One reservation I have 
though is that it does not necessarily fit with the others but is something 
semantically different from them.


In i915 I had the similar desire to account for internal objects and 
have approached it by similarly tracking them outside the DRM idr but 
counting them under the existing respective categories and memory 
regions. Ie. internal objects can also be purgeable or not, etc, and can 
be backed by either system memory or device local memory.


Advantage is it is more accurate in those aspect and does not require 
adding a new category.


Downside of this is that 'internal' is bunched with the explicit 
userspace objects so perhaps less accurate in this other aspect.
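
As an illustration of that alternative, a hypothetical driver-side sketch
(names assumed; this is neither the i915 code nor part of this series):

	/* Fold driver-internal objects into the existing fdinfo categories
	 * instead of reporting a separate drm-internal- key. */
	static void foo_count_internal_bos(struct drm_memory_stats *stats,
					   struct foo_device *fdev)
	{
		struct foo_gem_object *obj;

		list_for_each_entry(obj, &fdev->internal_objects, link) {
			stats->private += obj->base.size;
			if (foo_bo_is_resident(obj))
				stats->resident += obj->base.size;
			if (foo_bo_is_purgeable(obj))
				stats->purgeable += obj->base.size;
		}
	}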


Regards,

Tvrtko


---
  Documentation/gpu/drm-usage-stats.rst   | 4 
  drivers/gpu/drm/drm_file.c  | 9 +++--
  drivers/gpu/drm/msm/msm_drv.c   | 2 +-
  drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
  include/drm/drm_file.h  | 7 ++-
  5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 6dc299343b48..0da5ebecd232 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -157,6 +157,10 @@ The total size of buffers that are purgeable.
  
  The total size of buffers that are active on one or more engines.
  
+- drm-internal-:  [KiB|MiB]

+
+The total size of GEM objects that aren't exposed to user space.
+
  Implementation Details
  ==
  
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c

index 638ffaf5..d1c13eed8d34 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -874,9 +874,10 @@ void drm_print_memory_stats(struct drm_printer *p,
enum drm_gem_object_status supported_status,
const char *region)
  {
-   print_size(p, "total", region, stats->private + stats->shared);
+   print_size(p, "total", region, stats->private + stats->shared + 
stats->internal);
print_size(p, "shared", region, stats->shared);
print_size(p, "active", region, stats->active);
+   print_size(p, "internal", region, stats->internal);
  
  	if (supported_status & DRM_GEM_OBJECT_RESIDENT)

print_size(p, "resident", region, stats->resident);
@@ -890,11 +891,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
   * drm_show_memory_stats - Helper to collect and show standard fdinfo memory 
stats
   * @p: the printer to print output to
   * @file: the DRM file
+ * @func: driver-specific function pointer to count the size of internal 
objects
   *
   * Helper to iterate over GEM objects with a handle allocated in the specified
   * file.
   */
-void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
+void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, 
internal_bos func)
  {
struct drm_gem_object *obj;
struct drm_memory_stats status = {};
@@ -940,6 +942,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
}
spin_unlock(&file->table_lock);
  
+	if (func)

+   func(&status, file);
+
drm_print_memory_stats(p, &status, supported_status, "memory");
  }
  EXPORT_SYMBOL(drm_show_memory_stats);
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 9c33f4e3f822..f97d3cdc4f50 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
  
  	msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
  
-	drm_show_memory_stats(p, 

[PATCH] drm/v3d: Fix perfmon build error/warning

2024-06-04 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Move static const array into the source file to fix the "defined but not
used" errors.

The fix is perhaps not the prettiest due hand crafting the array sizes
in v3d_performance_counters.h, but I did add some build time asserts to
validate the counts look sensible, so hopefully it is good enough for a
quick fix.

Signed-off-by: Tvrtko Ursulin 
Fixes: 3cbcbe016c31 ("drm/v3d: Add Performance Counters descriptions for V3D 
4.2 and 7.1")
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202405211137.huefklkg-...@intel.com/
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc: Jani Nikula 
Cc: Ashutosh Dixit 
---
 drivers/gpu/drm/v3d/v3d_drv.c |   4 +-
 drivers/gpu/drm/v3d/v3d_drv.h |   3 -
 drivers/gpu/drm/v3d/v3d_perfmon.c | 204 +-
 .../gpu/drm/v3d/v3d_performance_counters.h| 189 +---
 4 files changed, 205 insertions(+), 195 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index f7477488b1cc..a47f00b443d3 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -299,9 +299,9 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
if (v3d->ver >= 71)
-   v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters);
+   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
else if (v3d->ver >= 42)
-   v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters);
+   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
else
v3d->max_counters = 0;
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 556cbb400ba0..099b962bdfde 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,9 +351,6 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
-/* Maximum number of performance counters supported by any version of V3D */
-#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters)
-
 /* Number of perfmons required to handle all supported performance counters */
 #define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
  DRM_V3D_MAX_PERF_COUNTERS)
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 73e2bb8bdb7f..b7d0b02e1a95 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -9,6 +9,192 @@
 #define V3D_PERFMONID_MIN  1
 #define V3D_PERFMONID_MAX  U32_MAX
 
+static const struct v3d_perf_counter_desc v3d_v42_performance_counters[] = {
+   {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid 
primitives that result in no rendered pixels, for all rendered tiles"},
+   {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives 
for all rendered tiles (primitives may be counted in more than one tile)"},
+   {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"},
+   {"FEP", "FEP-valid-quads", "[FEP] Valid quads"},
+   {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no 
pixels passing the stencil test"},
+   {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with 
no pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any 
pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-with-zero-coverage", "[TLB] Quads with all pixels 
having zero coverage"},
+   {"TLB", "TLB-quads-with-non-zero-coverage", "[TLB] Quads with any 
pixels having non-zero coverage"},
+   {"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid 
pixels written to colour buffer"},
+   {"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives 
discarded by being outside the viewport"},
+   {"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need 
clipping"},
+   {"PTB", "PTB-primitives-discarded-reversed", "[PTB] Primitives that are 
discarded because they are reversed"},
+   {"QPU", "QPU-total-idle-clk-cycles", "[QPU] Total idle clock cycles for 
all QPUs"},
+   {"QPU", "QPU-total-active-clk-cycles-vertex-coord-shading", "[QPU] 
Total active clock cycles for all QPUs doing vertex/coordinate/user shading 
(counts only when QPU is not stalled)"},
+   {"QPU", "QPU-total-ac

Re: [PATCH 2/2] drm/amdgpu: Use drm_print_memory_stats helper from fdinfo

2024-05-30 Thread Tvrtko Ursulin



Hi,

On 20/05/2024 12:13, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Convert fdinfo memory stats to use the common drm_print_memory_stats
helper.

This achieves alignment with the common keys as documented in
drm-usage-stats.rst, specifically adding the drm-total- key the driver was
missing until now.

Additionally I made the code stop skipping total size for objects which
currently do not have a backing store, and I added resident, active and
purgeable reporting.

Legacy keys have been preserved, with the outlook of potentially removing
only the drm-memory- key when the time is right.

The example output now looks like this:

  pos:  0
  flags:0212
  mnt_id:   24
  ino:  1239
  drm-driver:   amdgpu
  drm-client-id:4
  drm-pdev: :04:00.0
  pasid:32771
  drm-total-cpu:0
  drm-shared-cpu:   0
  drm-active-cpu:   0
  drm-resident-cpu: 0
  drm-purgeable-cpu:0
  drm-total-gtt:2392 KiB
  drm-shared-gtt:   0
  drm-active-gtt:   0
  drm-resident-gtt: 2392 KiB
  drm-purgeable-gtt:0
  drm-total-vram:   44564 KiB
  drm-shared-vram:  31952 KiB
  drm-active-vram:  0
  drm-resident-vram:44564 KiB
  drm-purgeable-vram:   0
  drm-memory-vram:  44564 KiB
  drm-memory-gtt:   2392 KiB
  drm-memory-cpu:   0 KiB
  amd-memory-visible-vram:  44564 KiB
  amd-evicted-vram: 0 KiB
  amd-evicted-visible-vram: 0 KiB
  amd-requested-vram:   44564 KiB
  amd-requested-visible-vram:   11952 KiB
  amd-requested-gtt:2392 KiB
  drm-engine-compute:   46464671 ns

v2:
  * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE.
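
For context, a sketch of how that v2 item can look in the per-BO accounting
(assumed shape, simplified to ignore shared objects; not the actual hunk):

	/* Hypothetical accounting helper: BOs created with the DISCARDABLE
	 * flag are reported under the purgeable key. */
	static void foo_bo_account(struct amdgpu_bo *bo,
				   struct amdgpu_mem_stats *stats)
	{
		u64 size = amdgpu_bo_size(bo);

		stats->drm.private += size;
		if (bo->flags & AMDGPU_GEM_CREATE_DISCARDABLE)
			stats->drm.purgeable += size;
	}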


Any interest in this work from the AMD side? As a summary, it adds active and 
purgeable reporting and converts to using the drm_print_memory_stats 
helper for outputting all the fields as documented in drm-usage-stats.rst.


Regards,

Tvrtko



Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Rob Clark 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 48 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 96 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 35 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +-
  6 files changed, 122 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index c7df7fa3459f..00a4ab082459 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -59,18 +59,21 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
struct amdgpu_fpriv *fpriv = file->driver_priv;
	struct amdgpu_vm *vm = &fpriv->vm;
  
-	struct amdgpu_mem_stats stats;

+   struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { };
ktime_t usage[AMDGPU_HW_IP_NUM];
-   unsigned int hw_ip;
+   const char *pl_name[] = {
+   [TTM_PL_VRAM] = "vram",
+   [TTM_PL_TT] = "gtt",
+   [TTM_PL_SYSTEM] = "cpu",
+   };
+   unsigned int hw_ip, i;
int ret;
  
-	memset(&stats, 0, sizeof(stats));

-
ret = amdgpu_bo_reserve(vm->root.bo, false);
if (ret)
return;
  
-	amdgpu_vm_get_memory(vm, &stats);

+   amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats));
amdgpu_bo_unreserve(vm->root.bo);
  
  	amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage);

@@ -82,24 +85,35 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
 */
  
  	drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);

-   drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
-   drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL);
-   drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL);
+
+   for (i = 0; i < TTM_PL_PRIV; i++)
+   drm_print_memory_stats(p,
+  &stats[i].drm,
+  DRM_GEM_OBJECT_RESIDENT |
+  DRM_GEM_OBJECT_PURGEABLE,
+  pl_name[i]);
+
+   /* Legacy amdgpu keys, alias to drm-resident-memory-: */
+   drm_printf(p, "drm-memory-vram:\t%llu KiB\n",
+  stats[TTM_PL_VRAM].total/1024UL);
+   drm_printf(p, "drm-memory-gtt: \t%llu KiB\n",
+  stats[TTM_PL_TT].total/1024UL);
+   drm_printf(p, "drm-memory-cpu: \t%llu KiB\n",
+  stats[TTM_PL_SYSTEM].total/1024UL);
+
+   /* Amdgpu specific memory accounting keys: */
drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
-  stats.visible_vram/1024UL);
+  stats[TTM_PL_VRAM].visible/1024UL);
drm_printf(p, &

Re: [RFC v2 0/2] Discussion around eviction improvements

2024-05-30 Thread Tvrtko Ursulin



Hi,

On 16/05/2024 13:18, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Reduced re-spin of my previous series after Christian corrected a few
misconceptions that I had. So lets see if what remains makes sense or is still
misguided.

To summarise, the series address the following two issues:

  * Migration rate limiting does not work, at least not for the common case
where userspace configures VRAM+GTT. It thinks it can stop migration 
attempts
by playing with bo->allowed_domains vs bo->preferred domains but, both from
the code, and from empirical experiments, I see that not working at all. 
When
both masks are identical fiddling with them achieves nothing. Even when they
are not identical allowed has a fallback GTT placement which means that when
over the migration budget ttm_bo_validate with bo->allowed_domains can cause
migration from GTT to VRAM.

  * Driver thinks it will be re-validating evicted buffers on the next 
submission
but it does not for the very common case of VRAM+GTT because it only checks
if current placement is *none* of the preferred placements.

These two patches appear to have a positive result for a memory intensive game
like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
the migration budget appears to keep buffer blits at bay and improves the
minimum frame rate, ie. makes things smoother.


From the game's built-in benchmark, average of three runs each:


                                  FPS
            migrated KiB     min      avg      max   min-1%  min-0.1%
   before       20784781   10.00    37.00    89.67    22.00     12.33
   patched       4227688   13.67    37.00    81.33    23.33     15.00


Any feedback on this series?

As described above, neither migration rate limiting nor re-validation of 
evicted buffers seems to work as expected currently.


Regards,

Tvrtko


Disclaimers that I have is that more runs would be needed to be more confident
about the results. And more games. And APU versus discrete.

Cc: Christian König 
Cc: Friedrich Vock 

Tvrtko Ursulin (2):
   drm/amdgpu: Re-validate evicted buffers
   drm/amdgpu: Actually respect buffer migration budget

  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 -
  2 files changed, 103 insertions(+), 30 deletions(-)



[PATCH] drm/i915: 2 GiB of relocations ought to be enough for anybody*

2024-05-21 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Kernel test robot reports i915 can hit a warn in kvmalloc_node which has
the purpose of disallowing crazy size kernel allocations. This was added in
7661809d493b ("mm: don't allow oversized kvmalloc() calls"):

   /* Don't even allow crazy sizes */
   if (WARN_ON_ONCE(size > INT_MAX))
   return NULL;

This would be kind of okay since i915 at one point dropped the need for
making a shadow copy of the relocation list, but then it got re-added in
fd1500fcd442 ("Revert "drm/i915/gem: Drop relocation slowpath".") a year
after Linus added the above warning.

It is plausible that the issue was not seen until now because triggering it
via the gem_exec_reloc test requires a combination of relatively older
generation hardware and at least 8GiB of RAM installed. Probably even
more depending on runtime checks.

Let's cap what we allow userspace to pass in using the matching limit.
There should be no issue for real userspace since we are talking about a
"crazy" number of relocations which have no practical purpose.

*) Well IGT tests might get upset but they can be easily adjusted.
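
(For context, and with the struct size quoted from memory so treat the exact
numbers as approximate: struct drm_i915_gem_relocation_entry is 32 bytes, so
the new N_RELOC(INT_MAX) limit works out to INT_MAX / 32, roughly 67 million
relocation entries per object, in other words the roughly 2 GiB of relocation
data from the subject line.)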

Signed-off-by: Tvrtko Ursulin 
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-lkp/202405151008.6ddd1aaf-oliver.s...@intel.com
Cc: Kees Cook 
Cc: Kent Overstreet 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d3a771afb083..4b34bf4fde77 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1533,7 +1533,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, 
struct eb_vma *ev)
u64_to_user_ptr(entry->relocs_ptr);
unsigned long remain = entry->relocation_count;
 
-   if (unlikely(remain > N_RELOC(ULONG_MAX)))
+   if (unlikely(remain > N_RELOC(INT_MAX)))
return -EINVAL;
 
/*
@@ -1641,7 +1641,7 @@ static int check_relocations(const struct 
drm_i915_gem_exec_object2 *entry)
if (size == 0)
return 0;
 
-   if (size > N_RELOC(ULONG_MAX))
+   if (size > N_RELOC(INT_MAX))
return -EINVAL;
 
addr = u64_to_user_ptr(entry->relocs_ptr);
-- 
2.44.0



[PATCH 2/2] drm/amdgpu: Use drm_print_memory_stats helper from fdinfo

2024-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Convert fdinfo memory stats to use the common drm_print_memory_stats
helper.

This achieves alignment with the common keys as documented in
drm-usage-stats.rst, specifically adding the drm-total- key which the driver
was missing until now.

Additionally I made the code stop skipping total size for objects which
currently do not have a backing store, and I added resident, active and
purgeable reporting.

Legacy keys have been preserved, with the outlook of potentially removing
only the drm-memory- key when the time is right.

The example output now looks like this:

 pos:   0
 flags: 0212
 mnt_id:24
 ino:   1239
 drm-driver:amdgpu
 drm-client-id: 4
 drm-pdev:  :04:00.0
 pasid: 32771
 drm-total-cpu: 0
 drm-shared-cpu:0
 drm-active-cpu:0
 drm-resident-cpu:  0
 drm-purgeable-cpu: 0
 drm-total-gtt: 2392 KiB
 drm-shared-gtt:0
 drm-active-gtt:0
 drm-resident-gtt:  2392 KiB
 drm-purgeable-gtt: 0
 drm-total-vram:44564 KiB
 drm-shared-vram:   31952 KiB
 drm-active-vram:   0
 drm-resident-vram: 44564 KiB
 drm-purgeable-vram:0
 drm-memory-vram:   44564 KiB
 drm-memory-gtt:2392 KiB
 drm-memory-cpu:0 KiB
 amd-memory-visible-vram:   44564 KiB
 amd-evicted-vram:  0 KiB
 amd-evicted-visible-vram:  0 KiB
 amd-requested-vram:44564 KiB
 amd-requested-visible-vram:11952 KiB
 amd-requested-gtt: 2392 KiB
 drm-engine-compute:46464671 ns

v2:
 * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE.

Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Rob Clark 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 48 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 96 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 35 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +-
 6 files changed, 122 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index c7df7fa3459f..00a4ab082459 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -59,18 +59,21 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
struct amdgpu_fpriv *fpriv = file->driver_priv;
	struct amdgpu_vm *vm = &fpriv->vm;
 
-   struct amdgpu_mem_stats stats;
+   struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { };
ktime_t usage[AMDGPU_HW_IP_NUM];
-   unsigned int hw_ip;
+   const char *pl_name[] = {
+   [TTM_PL_VRAM] = "vram",
+   [TTM_PL_TT] = "gtt",
+   [TTM_PL_SYSTEM] = "cpu",
+   };
+   unsigned int hw_ip, i;
int ret;
 
-   memset(&stats, 0, sizeof(stats));
-
ret = amdgpu_bo_reserve(vm->root.bo, false);
if (ret)
return;
 
-   amdgpu_vm_get_memory(vm, &stats);
+   amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats));
amdgpu_bo_unreserve(vm->root.bo);
 
	amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage);
@@ -82,24 +85,35 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
 */
 
drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);
-   drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
-   drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL);
-   drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL);
+
+   for (i = 0; i < TTM_PL_PRIV; i++)
+   drm_print_memory_stats(p,
+  &stats[i].drm,
+  DRM_GEM_OBJECT_RESIDENT |
+  DRM_GEM_OBJECT_PURGEABLE,
+  pl_name[i]);
+
+   /* Legacy amdgpu keys, alias to drm-resident-memory-: */
+   drm_printf(p, "drm-memory-vram:\t%llu KiB\n",
+  stats[TTM_PL_VRAM].total/1024UL);
+   drm_printf(p, "drm-memory-gtt: \t%llu KiB\n",
+  stats[TTM_PL_TT].total/1024UL);
+   drm_printf(p, "drm-memory-cpu: \t%llu KiB\n",
+  stats[TTM_PL_SYSTEM].total/1024UL);
+
+   /* Amdgpu specific memory accounting keys: */
drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
-  stats.visible_vram/1024UL);
+  stats[TTM_PL_VRAM].visible/1024UL);
drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
-  stats.evicted_vram/1024UL);
+  stats[TTM_PL_VRAM].evicted/1024UL);
drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
-  stats.evicted_visible_vram/1024UL);
+  stats[TTM_PL_VRAM].evicted_visible/1024UL);
  

[PATCH 1/2] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Currently it is not well defined what is drm-memory- compared to other
categories.

In practice the only driver which emits these keys is amdgpu and in them
exposes the current resident buffer object memory (including shared).

To prevent any confusion, document that drm-memory- is deprecated and an
alias for drm-resident-memory-.

While at it also clarify that the reserved sub-string 'memory' refers to
the memory region component, and also clarify the intended semantics of
other memory categories.

v2:
 * Also mark drm-memory- as deprecated.
 * Add some more text describing memory categories. (Alex)

v3:
 * Semantics of the amdgpu drm-memory is actually as drm-resident.

Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 
Cc: Rob Clark 
---
 Documentation/gpu/drm-usage-stats.rst | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 6dc299343b48..45d9b76a5748 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -128,7 +128,9 @@ Memory
 
 Each possible memory type which can be used to store buffer objects by the
 GPU in question shall be given a stable and unique name to be returned as the
-string here.  The name "memory" is reserved to refer to normal system memory.
+string here.
+
+The region name "memory" is reserved to refer to normal system memory.
 
 Value shall reflect the amount of storage currently consumed by the buffer
 objects belong to this client, in the respective memory region.
@@ -136,6 +138,9 @@ objects belong to this client, in the respective memory 
region.
 Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
 indicating kibi- or mebi-bytes.
 
+This key is deprecated and is an alias for drm-resident-. Only one of
+the two should be present in the output.
+
 - drm-shared-:  [KiB|MiB]
 
 The total size of buffers that are shared with another file (e.g., have more
@@ -143,20 +148,34 @@ than a single handle).
 
 - drm-total-:  [KiB|MiB]
 
-The total size of buffers that including shared and private memory.
+The total size of all created buffers including shared and private memory. The
+backing store for the buffers does not have to be currently instantiated to be
+counted under this category.
 
 - drm-resident-:  [KiB|MiB]
 
-The total size of buffers that are resident in the specified region.
+The total size of buffers that are resident (have their backing store present or
+instantiated) in the specified region.
+
+This is an alias for drm-memory- and only one of the two should be
+present in the output.
 
 - drm-purgeable-:  [KiB|MiB]
 
 The total size of buffers that are purgeable.
 
+For example drivers which implement a form of 'madvise' like functionality can
+here count buffers which have instantiated backing store, but have been marked
+with an equivalent of MADV_DONTNEED.
+
 - drm-active-:  [KiB|MiB]
 
 The total size of buffers that are active on one or more engines.
 
+One practical example of this can be the presence of unsignaled fences in a GEM
+buffer reservation object. Therefore the active category is a subset of
+resident.
+
 Implementation Details
 ==
 
-- 
2.44.0
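
Purely as an illustration of how a consumer could fold the keys documented
above into per-region totals (this is not part of the patch; the parsing and
unit handling details below are assumptions), something along these lines
would do:

/*
 * Illustrative sketch only: fold the documented per-region fdinfo keys
 * into totals for one region. Key names follow the documentation above,
 * everything else is an assumption for illustration.
 */
#include <stdio.h>
#include <string.h>

struct region_stats {
	unsigned long long total, shared, resident, purgeable, active;
};

static void parse_fdinfo(FILE *f, const char *region, struct region_stats *rs)
{
	char line[256], key[64], unit[8];
	unsigned long long val;

	memset(rs, 0, sizeof(*rs));

	while (fgets(line, sizeof(line), f)) {
		unit[0] = '\0';
		if (sscanf(line, "%63[^:]: %llu %7s", key, &val, unit) < 2)
			continue;
		if (!strcmp(unit, "MiB"))
			val *= 1024 * 1024;
		else if (!strcmp(unit, "KiB"))
			val *= 1024;

		if (!strstr(key, region))
			continue;

		if (!strncmp(key, "drm-total-", 10))
			rs->total = val;
		else if (!strncmp(key, "drm-shared-", 11))
			rs->shared = val;
		else if (!strncmp(key, "drm-resident-", 13) ||
			 !strncmp(key, "drm-memory-", 11)) /* legacy alias */
			rs->resident = val;
		else if (!strncmp(key, "drm-purgeable-", 14))
			rs->purgeable = val;
		else if (!strncmp(key, "drm-active-", 11))
			rs->active = val;
	}
}

int main(int argc, char **argv)
{
	struct region_stats rs;
	FILE *f;

	if (argc < 3)
		return 1;

	f = fopen(argv[1], "r");	/* e.g. /proc/<pid>/fdinfo/<fd> */
	if (!f)
		return 1;

	parse_fdinfo(f, argv[2], &rs);	/* e.g. "vram" */
	fclose(f);

	printf("%s: total %llu resident %llu purgeable %llu active %llu shared %llu bytes\n",
	       argv[2], rs.total, rs.resident, rs.purgeable, rs.active,
	       rs.shared);
	return 0;
}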



Re: [RFC v2 0/2] Discussion around eviction improvements

2024-05-17 Thread Tvrtko Ursulin



On 16/05/2024 20:21, Alex Deucher wrote:

On Thu, May 16, 2024 at 8:18 AM Tvrtko Ursulin  wrote:


From: Tvrtko Ursulin 

Reduced re-spin of my previous series after Christian corrected a few
misconceptions that I had. So lets see if what remains makes sense or is still
misguided.

To summarise, the series address the following two issues:

  * Migration rate limiting does not work, at least not for the common case
where userspace configures VRAM+GTT. It thinks it can stop migration 
attempts
by playing with bo->allowed_domains vs bo->preferred domains but, both from
the code, and from empirical experiments, I see that not working at all. 
When
both masks are identical fiddling with them achieves nothing. Even when they
are not identical allowed has a fallback GTT placement which means that when
over the migration budget ttm_bo_validate with bo->allowed_domains can cause
migration from GTT to VRAM.

  * Driver thinks it will be re-validating evicted buffers on the next 
submission
but it does not for the very common case of VRAM+GTT because it only checks
if current placement is *none* of the preferred placements.


For APUs at least, we should never migrate because GTT and VRAM are
both system memory so are effectively equal performance-wise.  Maybe


I was curious about this but thought there could be a reason why the VRAM 
carve-out is a fixed small-ish size. Can it not be made 1:1 with RAM, or is 
there some other solution?



this regressed when Christian reworked ttm to better handle migrating
buffers back to VRAM after suspend on dGPUs?


I will leave this to Christian to answer but for what this series is 
concerned I'd say it is orthogonal to that.


Here we have two fixes not limited to APU use cases, it just so happens that 
fixing the migration throttling improves things there too. And that even 
despite the first patch triggering *more* migration attempts, because the 
second patch then correctly curbs them.


The first patch should help with transient overcommit on discrete, allowing 
things to get back into VRAM as soon as there is space.


The second patch tries to make migration throttling work as intended.

Volunteers for testing on discrete? :)



These two patches appear to have a positive result for a memory intensive game
like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
the migration budget appears to keep buffer blits at bay and improves the
minimum frame rate, ie. makes things smoother.

 From the game's built-in benchmark, average of three runs each:

                                   FPS
             migrated KiB     min      avg      max   min-1%  min-0.1%
    because      20784781   10.00    37.00    89.67    22.00     12.33
    patched       4227688   13.67    37.00    81.33    23.33     15.00


Hmm! s/because/before/ here obviously!

Regards,

Tvrtko


Disclaimers that I have is that more runs would be needed to be more confident
about the results. And more games. And APU versus discrete.

Cc: Christian König 
Cc: Friedrich Vock 

Tvrtko Ursulin (2):
   drm/amdgpu: Re-validate evicted buffers
   drm/amdgpu: Actually respect buffer migration budget

  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 -
  2 files changed, 103 insertions(+), 30 deletions(-)

--
2.44.0



[RFC v2 0/2] Discussion around eviction improvements

2024-05-16 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Reduced re-spin of my previous series after Christian corrected a few
misconceptions that I had. So lets see if what remains makes sense or is still
misguided.

To summarise, the series address the following two issues:

 * Migration rate limiting does not work, at least not for the common case
   where userspace configures VRAM+GTT. It thinks it can stop migration attempts
   by playing with bo->allowed_domains vs bo->preferred domains but, both from
   the code, and from empirical experiments, I see that not working at all. When
   both masks are identical fiddling with them achieves nothing. Even when they
   are not identical allowed has a fallback GTT placement which means that when
   over the migration budget ttm_bo_validate with bo->allowed_domains can cause
   migration from GTT to VRAM.

 * Driver thinks it will be re-validating evicted buffers on the next submission
   but it does not for the very common case of VRAM+GTT because it only checks
   if current placement is *none* of the preferred placements.

These two patches appear to have a positive result for a memory intensive game
like Assassin's Creed Valhalla. On an APU like Steam Deck the game has a working
set around 5 GiB, while the VRAM is configured to 1 GiB. Correctly respecting
the migration budget appears to keep buffer blits at bay and improves the
minimum frame rate, ie. makes things smoother.

From the game's built-in benchmark, average of three runs each:

                                  FPS
            migrated KiB     min      avg      max   min-1%  min-0.1%
   before       20784781   10.00    37.00    89.67    22.00     12.33
   patched       4227688   13.67    37.00    81.33    23.33     15.00

Disclaimers that I have is that more runs would be needed to be more confident
about the results. And more games. And APU versus discrete.

Cc: Christian König 
Cc: Friedrich Vock 

Tvrtko Ursulin (2):
  drm/amdgpu: Re-validate evicted buffers
  drm/amdgpu: Actually respect buffer migration budget

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  21 -
 2 files changed, 103 insertions(+), 30 deletions(-)

-- 
2.44.0



[RFC 2/2] drm/amdgpu: Actually respect buffer migration budget

2024-05-16 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with buffer
allowed and preferred placements can always control the decision on
whether backing store migration will be attempted or not.

That is however not the case when userspace sets buffer placements of
VRAM+GTT, which is what radv does since commit 862b6a9a ("radv: Improve
spilling on discrete GPUs."), with the end result of completely ignoring
the migration budget.

Fix this by validating against a local singleton placement set to the
current backing store location. This way, when migration budget has been
depleted, we can prevent ttm_bo_validate from seeing any other than the
current placement.

For the case of implicit GTT allowed domain added in amdgpu_bo_create
when userspace only sets VRAM the behaviour should be the same. On the
first pass the re-validation will attempt to migrate away from the
fallback GTT domain, and if that did not succeed the buffer will remain in
the fallback placement.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 112 +++--
 1 file changed, 85 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..08e7631f3a2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -32,6 +32,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include "amdgpu_cs.h"
@@ -775,6 +776,56 @@ void amdgpu_cs_report_moved_bytes(struct amdgpu_device 
*adev, u64 num_bytes,
	spin_unlock(&adev->mm_stats.lock);
 }
 
+static bool
+amdgpu_cs_bo_move_under_budget(struct amdgpu_cs_parser *p,
+  struct amdgpu_bo *abo)
+{
+   struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
+
+   /*
+* Don't move this buffer if we have depleted our allowance
+* to move it. Don't move anything if the threshold is zero.
+*/
+   if (p->bytes_moved >= p->bytes_moved_threshold)
+   return false;
+
+   if ((!abo->tbo.base.dma_buf ||
+list_empty(&abo->tbo.base.dma_buf->attachments)) &&
+   (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
+(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) &&
+   p->bytes_moved_vis >= p->bytes_moved_vis_threshold) {
+   /*
+* And don't move a CPU_ACCESS_REQUIRED BO to limited
+* visible VRAM if we've depleted our allowance to do
+* that.
+*/
+   return false;
+   }
+
+   return true;
+}
+
+static bool
+amdgpu_bo_fill_current_placement(struct amdgpu_bo *abo,
+struct ttm_placement *placement,
+struct ttm_place *place)
+{
+   struct ttm_placement *bo_placement = &abo->placement;
+   int i;
+
+   for (i = 0; i < bo_placement->num_placement; i++) {
+   if (bo_placement->placement[i].mem_type ==
+   abo->tbo.resource->mem_type) {
+   *place = bo_placement->placement[i];
+   placement->num_placement = 1;
+   placement->placement = place;
+   return true;
+   }
+   }
+
+   return false;
+}
+
 static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
@@ -784,46 +835,53 @@ static int amdgpu_cs_bo_validate(void *param, struct 
amdgpu_bo *bo)
.no_wait_gpu = false,
.resv = bo->tbo.base.resv
};
-   uint32_t domain;
+   bool allow_move;
int r;
 
if (bo->tbo.pin_count)
return 0;
 
-   /* Don't move this buffer if we have depleted our allowance
-* to move it. Don't move anything if the threshold is zero.
-*/
-   if (p->bytes_moved < p->bytes_moved_threshold &&
-   (!bo->tbo.base.dma_buf ||
-   list_empty(&bo->tbo.base.dma_buf->attachments))) {
-   if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-   (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
-   /* And don't move a CPU_ACCESS_REQUIRED BO to limited
-* visible VRAM if we've depleted our allowance to do
-* that.
-*/
-   if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
-   domain = bo->preferred_domains;
-   else
-   domain = bo->allowed_domains;
-   } else {
-   domain = bo->preferred_domains;
-   }
-  

[RFC 1/2] drm/amdgpu: Re-validate evicted buffers

2024-05-16 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Currently the driver appears to be thinking that it will be attempting to
re-validate the evicted buffers on the next submission if they are not in
their preferred placement.

That however appears not to be true for the very common case of buffers
with allowed placements of VRAM+GTT. Simply because the check can only
detect if the current placement is *none* of the preferred ones, happily
leaving VRAM+GTT buffers in the GTT placement "forever".

Fix it by extending the VRAM+GTT special case to the re-validation logic.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 6bddd43604bc..e53ff914b62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1248,10 +1248,25 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, 
struct amdgpu_bo_va *bo_va,
 * next command submission.
 */
if (amdgpu_vm_is_bo_always_valid(vm, bo)) {
-   uint32_t mem_type = bo->tbo.resource->mem_type;
+   unsigned current_domain =
+   amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
+   bool move_to_evict = false;
 
-   if (!(bo->preferred_domains &
- amdgpu_mem_type_to_domain(mem_type)))
+   if (!(bo->preferred_domains & current_domain)) {
+   move_to_evict = true;
+   } else if ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK) ==
+  (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
+  current_domain != AMDGPU_GEM_DOMAIN_VRAM) {
+   /*
+* If userspace has provided a list of possible
+* placements equal to VRAM+GTT, we assume VRAM is *the*
+* preferred placement and so try to move it back there
+* on the next submission.
+*/
+   move_to_evict = true;
+   }
+
+   if (move_to_evict)
amdgpu_vm_bo_evicted(&bo_va->base);
else
amdgpu_vm_bo_idle(&bo_va->base);
-- 
2.44.0



Re: [PATCH v4 8/8] drm/xe/client: Print runtime to fdinfo

2024-05-16 Thread Tvrtko Ursulin



On 15/05/2024 22:42, Lucas De Marchi wrote:

Print the accumulated runtime for client when printing fdinfo.
Each time a query is done it first does 2 things:

1) loop through all the exec queues for the current client and
accumulate the runtime, per engine class. CTX_TIMESTAMP is used for
that, being read from the context image.

2) Read a "GPU timestamp" that can be used for considering "how much GPU
time has passed" and that has the same unit/refclock as the one
recording the runtime. RING_TIMESTAMP is used for that via MMIO.

Since for all current platforms RING_TIMESTAMP follows the same
refclock, just read it once, using any first engine available.

This is exported to userspace as 2 numbers in fdinfo:

drm-cycles-: 
drm-total-cycles-: 

Userspace is expected to collect at least 2 samples, which makes it
possible to calculate the client engine busyness as per:

                RUNTIME1 - RUNTIME0
    busyness = ---------------------
                      T1 - T0

Since drm-cycles- always starts at 0, it's also possible to know
if an engine was ever used by a client.
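
Not part of the patch, but for illustration the above boils down to something
like this minimal userspace sketch, assuming the four values have already been
parsed out of two fdinfo reads taken a few seconds apart (all names and sample
values below are made up):

/*
 * Minimal sketch of the busyness calculation described above; illustrative
 * only, the inputs are assumed to come from the drm-cycles- and
 * drm-total-cycles- keys of two consecutive fdinfo reads.
 */
#include <stdint.h>
#include <stdio.h>

static double busyness_pct(uint64_t cycles0, uint64_t total0,
			   uint64_t cycles1, uint64_t total1)
{
	uint64_t runtime = cycles1 - cycles0;
	uint64_t elapsed = total1 - total0;

	if (!elapsed)
		return 0.0;

	return 100.0 * (double)runtime / (double)elapsed;
}

int main(void)
{
	/* Hypothetical sample pairs, for illustration only. */
	printf("rcs0: %.1f%% busy\n",
	       busyness_pct(1000, 10000, 6000, 20000));
	return 0;
}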

It's expected that userspace will read any 2 samples every few seconds.
Given the update frequency of the counters involved and that
CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap
around (assuming 100% utilization) after ~200s. The wraparound is not
perceived by userspace since it's just accumulated for all the
exec_queues in a 64-bit counter, but the measurement will not be
accurate if the samples are too far apart.
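
(As a rough sanity check of the ~200s figure, and purely assuming a 19.2 MHz
timestamp refclock, which is an assumption here since the actual rate is
platform dependent: 2^32 cycles / 19.2 MHz comes out at roughly 223 seconds.)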

This could be mitigated by adding a workqueue to accumulate the counters
every so often, but it's additional complexity for something that is
done already by userspace every few seconds in tools like gputop (from
igt), htop, nvtop, etc, with none of them really defaulting to 1 sample
per minute or more.

Signed-off-by: Lucas De Marchi 
---
  Documentation/gpu/drm-usage-stats.rst   |  21 +++-
  Documentation/gpu/xe/index.rst  |   1 +
  Documentation/gpu/xe/xe-drm-usage-stats.rst |  10 ++
  drivers/gpu/drm/xe/xe_drm_client.c  | 121 +++-
  4 files changed, 150 insertions(+), 3 deletions(-)
  create mode 100644 Documentation/gpu/xe/xe-drm-usage-stats.rst

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 6dc299343b48..a80f95ca1b2f 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -112,6 +112,19 @@ larger value within a reasonable period. Upon observing a 
value lower than what
  was previously read, userspace is expected to stay with that larger previous
  value until a monotonic update is seen.
  
+- drm-total-cycles-: 

+
+Engine identifier string must be the same as the one specified in the
+drm-cycles- tag and shall contain the total number cycles for the given
+engine.
+
+This is a timestamp in GPU unspecified unit that matches the update rate
+of drm-cycles-. For drivers that implement this interface, the engine
+utilization can be calculated entirely on the GPU clock domain, without
+considering the CPU sleep time between 2 samples.
+
+A driver may implement either this key or drm-maxfreq-, but not both.
+
  - drm-maxfreq-:  [Hz|MHz|KHz]
  
  Engine identifier string must be the same as the one specified in the

@@ -121,6 +134,9 @@ percentage utilization of the engine, whereas 
drm-engine- only reflects
  time active without considering what frequency the engine is operating as a
  percentage of its maximum frequency.
  
+A driver may implement either this key or drm-total-cycles-, but not

+both.
+


For the spec part:

Acked-by: Tvrtko Ursulin 

Some minor comments and questions below.


  Memory
  ^^
  
@@ -168,5 +184,6 @@ be documented above and where possible, aligned with other drivers.

  Driver specific implementations
  ---
  
-:ref:`i915-usage-stats`

-:ref:`panfrost-usage-stats`
+* :ref:`i915-usage-stats`
+* :ref:`panfrost-usage-stats`
+* :ref:`xe-usage-stats`
diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst
index c224ecaee81e..3f07aa3b5432 100644
--- a/Documentation/gpu/xe/index.rst
+++ b/Documentation/gpu/xe/index.rst
@@ -23,3 +23,4 @@ DG2, etc is provided to prototype the driver.
 xe_firmware
 xe_tile
 xe_debugging
+   xe-drm-usage-stats.rst
diff --git a/Documentation/gpu/xe/xe-drm-usage-stats.rst 
b/Documentation/gpu/xe/xe-drm-usage-stats.rst
new file mode 100644
index ..482d503ae68a
--- /dev/null
+++ b/Documentation/gpu/xe/xe-drm-usage-stats.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+.. _xe-usage-stats:
+
+
+Xe DRM client usage stats implementation
+
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_drm_client.c
+   :doc: DRM Client usage stats
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c 
b/drivers/gpu/drm/xe/xe_drm_client.

Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Tvrtko Ursulin



On 15/05/2024 15:31, Christian König wrote:

Am 15.05.24 um 12:59 schrieb Tvrtko Ursulin:


On 15/05/2024 08:20, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with 
buffer

allowed and preferred placements can control the decision on whether
backing store migration will be attempted or not.

Both from code inspection and from empirical experiments I see that not
being true, and that both allowed and preferred placement are typically
set to the same bitmask.


That's not correct for the use case handled here, but see below.


Which part is not correct, that bo->preferred_domains and 
bo->allowed_domains are the same bitmask?


Sorry totally forgot to explain that.

This rate limit here was specially made for OpenGL applications which 
overcommit VRAM. In that case preferred_domains will be VRAM only and 
allowed_domains will be VRAM|GTT.


RADV always uses VRAM|GTT for both (which is correct).


Got it, thanks!

As such, when the code decides to throttle the migration for a 
client, it
is in fact not achieving anything. Buffers can still be either 
migrated or
not migrated based on the external (to this function and facility) 
logic.


Fix it by not changing the buffer object placements if the migration
budget has been spent.

FIXME:
Is it still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

  .no_wait_gpu = false,
  .resv = bo->tbo.base.resv
  };
+    bool migration_allowed = true;
  struct ttm_resource *old_res;
  uint32_t domain;
  int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

   * visible VRAM if we've depleted our allowance to do
   * that.
   */
-    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
  domain = bo->preferred_domains;
-    else
+    } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
+    }
  } else {
  domain = bo->preferred_domains;
  }
  } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
  }
  retry:
-    amdgpu_bo_placement_from_domain(bo, domain);
+    if (migration_allowed)
+    amdgpu_bo_placement_from_domain(bo, domain);


That's completely invalid. Calling amdgpu_bo_placement_from_domain() 
is a mandatory prerequisite for calling ttm_bo_validate();


E.g. the usual code flow is:

/* This initializes bo->placement */
amdgpu_bo_placement_from_domain()

/* Eventually modify bo->placement to fit special requirements */


/* Apply the placement to the BO */
ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up bo->placement should be a variable on the stack instead, 
but we never bothered to clean that up.


I am not clear if you agree or not that the current method of trying 
to avoid migration doesn't really do anything?


I totally agree, but the approach you taken to fix it is just quite 
broken. You can't leave bo->placement uninitialized and expect that 
ttm_bo_validate() won't move the BO.


Yep, that much was clear, sorry that I did not explicitly acknowledge 
but just moved on to discussing how to fix it properly.


On stack placements sounds plausible to force migration avoidance by 
putting a single current object placement in that list, if that is 
what you have in mind? Or a specialized flag/version of 
amdgpu_bo_placement_from_domain with a bool input like 
"allow_placement_change"?


A very rough idea with no guarantee that it actually works:

Add a TTM_PL_FLAG_RATE_LIMITED with all the TTM code to actually figure 
out how many bytes have been moved and how many bytes the current 
operation can move etc...


Friedrich's patches actually looked like quite a step into the right 
direction for that already, so I would start from there.


Then always feed amdgpu_bo_placement_from_domain() with the 
allowed_domains in the CS path and VM validation.


Finally extend amdgpu_bo_placement_from_domain() to take a closer look 
at bo->preferred_domains, similar to how we do for the 
TTM_PL_FLAG_FALLBACK already and set the TTM_PL_FLAG_RATE_LIMITED flag 
as appropriate.


Two things which I kind of don't like with the placement flag idea is

Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Tvrtko Ursulin



On 15/05/2024 08:20, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with buffer
allowed and preferred placements can control the decision on whether
backing store migration will be attempted or not.

Both from code inspection and from empirical experiments I see that not
being true, and that both allowed and preferred placement are typically
set to the same bitmask.


That's not correct for the use case handled here, but see below.


Which part is not correct, that bo->preferred_domains and 
bo->allowed_domains are the same bitmask?




As such, when the code decides to throttle the migration for a client, it
is in fact not achieving anything. Buffers can still be either 
migrated or

not migrated based on the external (to this function and facility) logic.

Fix it by not changing the buffer object placements if the migration
budget has been spent.

FIXME:
Is it still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

  .no_wait_gpu = false,
  .resv = bo->tbo.base.resv
  };
+    bool migration_allowed = true;
  struct ttm_resource *old_res;
  uint32_t domain;
  int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

   * visible VRAM if we've depleted our allowance to do
   * that.
   */
-    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
  domain = bo->preferred_domains;
-    else
+    } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
+    }
  } else {
  domain = bo->preferred_domains;
  }
  } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
  }
  retry:
-    amdgpu_bo_placement_from_domain(bo, domain);
+    if (migration_allowed)
+    amdgpu_bo_placement_from_domain(bo, domain);


That's completely invalid. Calling amdgpu_bo_placement_from_domain() is 
a mandatory prerequisite for calling ttm_bo_validate();


E.g. the usual code flow is:

/* This initializes bo->placement */
amdgpu_bo_placement_from_domain()

/* Eventually modify bo->placement to fit special requirements */


/* Apply the placement to the BO */
ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up bo->placement should be a variable on the stack instead, 
but we never bothered to clean that up.


I am not clear if you agree or not that the current method of trying to 
avoid migration doesn't really do anything?


On stack placements sounds plausible to force migration avoidance by 
putting a single current object placement in that list, if that is what 
you have in mind? Or a specialized flag/version of 
amdgpu_bo_placement_from_domain with a bool input like 
"allow_placement_change"?


Regards,

Tvrtko



Regards,
Christian.


+
  r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
  if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {




Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting

2024-05-15 Thread Tvrtko Ursulin




On 15/05/2024 08:14, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

The logic assumed any migration attempt worked and therefore would over-
account the amount of data migrated during buffer re-validation. As a
consequence client can be unfairly penalised by incorrectly considering
its migration budget spent.

Fix it by looking at the before and after buffer object backing store and
only account if there was a change.

FIXME:
I think this needs a better solution to account for migrations between
VRAM visible and non-visible portions.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +-
  1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index ec888fc6ead8..22708954ae68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

  .no_wait_gpu = false,
  .resv = bo->tbo.base.resv
  };
+    struct ttm_resource *old_res;
  uint32_t domain;
  int r;
  if (bo->tbo.pin_count)
  return 0;
+    old_res = bo->tbo.resource;
+
  /* Don't move this buffer if we have depleted our allowance
   * to move it. Don't move anything if the threshold is zero.
   */
@@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

  amdgpu_bo_placement_from_domain(bo, domain);
  r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-    p->bytes_moved += ctx.bytes_moved;
-    if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
-    p->bytes_moved_vis += ctx.bytes_moved;
-
  if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
  domain = bo->allowed_domains;
  goto retry;
  }
+    if (!r) {
+    struct ttm_resource *new_res = bo->tbo.resource;
+    bool moved = true;
+
+    if (old_res == new_res)
+    moved = false;
+    else if (old_res && new_res &&
+ old_res->mem_type == new_res->mem_type)
+    moved = false;


The old resource might already be destroyed after you return from 
validation. So this here won't work.


Apart from that even when a migration attempt fails the moved bytes 
should be accounted.


When the validation attempt doesn't cause any moves then the byte count 
here would be zero.


So as far as I can see that is as fair as you can get.


Right, I think I suffered a bit of tunnel vision here and completely 
ignored the _ctx_.moved_bytes part. Scratch this one too then.


Regards,

Tvrtko



Regards,
Christian.

PS: Looks like our mail servers are once more not very reliable.

If you get mails from me multiple times please just ignore it.


+
+    if (moved) {
+    p->bytes_moved += ctx.bytes_moved;
+    if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
+    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
+    p->bytes_moved_vis += ctx.bytes_moved;
+    }
+    }
+
  return r;
  }



