from:"Maíra Canal"

[PATCH 1/2] drm/v3d: Add V3D tech revision to the device information

2024-07-14 Thread Maíra Canal

The V3D tech revision can be useful information when configuring
jobs. Therefore, expose it in `struct v3d_dev` alongside the V3D tech
version.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 5 -
 drivers/gpu/drm/v3d/v3d_drv.h | 8 +---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index d38628e4fc2f..d7ff1f5fa481 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -267,7 +267,7 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
struct v3d_dev *v3d;
int ret;
u32 mmu_debug;
-   u32 ident1;
+   u32 ident1, ident3;
u64 mask;
 
v3d = devm_drm_dev_alloc(dev, &v3d_drm_driver, struct v3d_dev, drm);
@@ -300,6 +300,9 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
+   ident3 = V3D_READ(V3D_HUB_IDENT3);
+   v3d->rev = V3D_GET_FIELD(ident3, V3D_HUB_IDENT3_IPREV);
+
v3d_perfmon_init(v3d);
 
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 8524761bc62d..cf4b23369dc4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -98,10 +98,12 @@ struct v3d_perfmon {
 struct v3d_dev {
struct drm_device drm;
 
-   /* Short representation (e.g. 33, 41) of the V3D tech version
-* and revision.
-*/
+   /* Short representation (e.g. 33, 41) of the V3D tech version */
int ver;
+
+   /* Short representation (e.g. 5, 6) of the V3D tech revision */
+   int rev;
+
bool single_irq_line;
 
struct v3d_perfmon_info perfmon_info;
-- 
2.45.2
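
For readers unfamiliar with how a field such as the IP revision is pulled out of an identification register, the following is a minimal standalone sketch of mask-and-shift extraction. The `EXAMPLE_*` names, the bit positions, and the sample register value are assumptions chosen for illustration; the real field layout lives in the driver's register header and is not shown in this patch.

```c
#include <stdint.h>

/* Hypothetical field layout: bits 15..8 hold the revision. */
#define EXAMPLE_IPREV_SHIFT 8
#define EXAMPLE_IPREV_MASK  (0xffu << EXAMPLE_IPREV_SHIFT)

/* Isolate the field's bits, then shift them down to bit 0. */
static inline uint32_t example_get_field(uint32_t word, uint32_t mask,
					 unsigned int shift)
{
	return (word & mask) >> shift;
}
```

With these assumed definitions, `example_get_field(0x00000607, EXAMPLE_IPREV_MASK, EXAMPLE_IPREV_SHIFT)` evaluates to 6.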

[PATCH 2/2] drm/v3d: Fix Indirect Dispatch configuration for V3D 7.1.6 and later

2024-07-14 Thread Maíra Canal

`args->cfg[4]` is configured in Indirect Dispatch using the number of
batches. Currently, for all V3D tech versions, `args->cfg[4]` equals the
number of batches minus 1. But for V3D 7.1.6 and later, we must not
subtract 1 from the number of batches.

Implement the fix by checking the V3D tech version and revision.

Fixes several `dEQP-VK.synchronization*` CTS tests related to Indirect Dispatch.

Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job")
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index d193072703f3..cafa3a298c11 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -353,7 +353,8 @@ v3d_rewrite_csd_job_wg_counts_from_indirect(struct v3d_cpu_job *job)
struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
struct v3d_bo *indirect = to_v3d_bo(indirect_csd->indirect);
struct drm_v3d_submit_csd *args = &indirect_csd->job->args;
-   u32 *wg_counts;
+   struct v3d_dev *v3d = job->base.v3d;
+   u32 num_batches, *wg_counts;
 
v3d_get_bo_vaddr(bo);
v3d_get_bo_vaddr(indirect);
@@ -366,8 +367,17 @@ v3d_rewrite_csd_job_wg_counts_from_indirect(struct v3d_cpu_job *job)
args->cfg[0] = wg_counts[0] << V3D_CSD_CFG012_WG_COUNT_SHIFT;
args->cfg[1] = wg_counts[1] << V3D_CSD_CFG012_WG_COUNT_SHIFT;
args->cfg[2] = wg_counts[2] << V3D_CSD_CFG012_WG_COUNT_SHIFT;
-   args->cfg[4] = DIV_ROUND_UP(indirect_csd->wg_size, 16) *
-  (wg_counts[0] * wg_counts[1] * wg_counts[2]) - 1;
+
+   num_batches = DIV_ROUND_UP(indirect_csd->wg_size, 16) *
+ (wg_counts[0] * wg_counts[1] * wg_counts[2]);
+
+   /* V3D 7.1.6 and later don't subtract 1 from the number of batches */
+   if (v3d->ver < 71 || (v3d->ver == 71 && v3d->rev < 6))
+   args->cfg[4] = num_batches - 1;
+   else
+   args->cfg[4] = num_batches;
+
+   WARN_ON(args->cfg[4] == ~0);
 
for (int i = 0; i < 3; i++) {
/* 0xffffffff indicates that the uniform rewrite is not needed */
-- 
2.45.2
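
The version and revision check in this patch can be tried outside the kernel. Below is a minimal sketch under stated assumptions: `example_csd_cfg4()` and `EXAMPLE_DIV_ROUND_UP()` are hypothetical names, and the function only mirrors the arithmetic of the hunk above rather than the driver's actual code path.

```c
#include <stdint.h>

#define EXAMPLE_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical standalone version of the check in the patch: returns the
 * value that would be written to cfg[4] for a given version and revision.
 */
static uint32_t example_csd_cfg4(int ver, int rev, uint32_t wg_size,
				 const uint32_t wg_counts[3])
{
	uint32_t num_batches = EXAMPLE_DIV_ROUND_UP(wg_size, 16) *
			       (wg_counts[0] * wg_counts[1] * wg_counts[2]);

	/* Before V3D 7.1.6 the register takes the batch count minus one. */
	if (ver < 71 || (ver == 71 && rev < 6))
		return num_batches - 1;

	return num_batches;
}
```

For a workgroup size of 64 and counts of 2 x 3 x 4, the batch count is 4 * 24 = 96, so V3D 4.2 and V3D 7.1.5 get 95 while V3D 7.1.6 gets 96. If any workgroup count is zero, the older branch wraps around to 0xffffffff, which is the situation the `WARN_ON()` in the patch reports.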

Re: [PATCH v4 00/11] v3d: Perfmon cleanup

2024-07-13 Thread Maíra Canal


On 7/11/24 10:53, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When we had to quickly deal with a tree build issue via merging
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit"), we
promised to follow up with a nicer solution.

As in the process of eliminating the hardcoded defines we have discovered a few
issues in handling of corner cases and userspace input validation, the fix has
turned into a larger series, but hopefully the end result is a justifiable
cleanup.

v2:
  * Re-order the patches so fixes come first while last three are optional
cleanups.

v3:
  * Fixed a bunch of rebase errors I made when re-ordering patches from v1 to 
v2.
  * Dropped the double underscore from __v3d_timestamp_query_info_free.
  * Added v3d prefix to v3d_copy_query_info.
  * Renamed qinfo to query_info.
  * Fixed some spelling errors and bad patch references.
  * Added mention to get_user to one commit message.
  * Dropped one patch from the series which became redundant due to other
re-ordering.
  * Re-ordered last two patches with the view of dropping the last.

v4:
  * Fixed more rebase errors and details in commit messages.

  Cc: Maíra Canal 

Tvrtko Ursulin (11):
   drm/v3d: Prevent out of bounds access in performance query extensions
   drm/v3d: Fix potential memory leak in the timestamp extension
   drm/v3d: Fix potential memory leak in the performance extension
   drm/v3d: Validate passed in drm syncobj handles in the timestamp
 extension
   drm/v3d: Validate passed in drm syncobj handles in the performance
 extension
   drm/v3d: Move part of copying of reset/copy performance extension to a
 helper
   drm/v3d: Size the kperfmon_ids array at runtime
   drm/v3d: Do not use intermediate storage when copying performance
 query results
   drm/v3d: Move perfmon init completely into own unit
   drm/v3d: Prefer get_user for scalar types
   drm/v3d: Add some local variables in queries/extensions


I just applied all patches to drm-misc/drm-misc-next!

@Maxime, @Thomas or @Maarten, is it possible to cherry-pick the
following patches to drm-misc-fixes?

f32b5128d2c4 drm/v3d: Prevent out of bounds access in performance query extensions
753ce4fea621 drm/v3d: Fix potential memory leak in the timestamp extension
484de39fa5f5 drm/v3d: Fix potential memory leak in the performance extension
8d1276d1b8f7 drm/v3d: Validate passed in drm syncobj handles in the timestamp extension
a546b7e4d73c drm/v3d: Validate passed in drm syncobj handles in the performance extension


Tvrtko made sure to make them independent (Thanks Tvrtko!), so I believe
it is going to be pretty straightforward to cherry-pick them.

Thanks Tvrtko for the patches and all the maintainers for the great
work!

Best Regards,
- Maíra



  drivers/gpu/drm/v3d/v3d_drv.c |   9 +-
  drivers/gpu/drm/v3d/v3d_drv.h |  16 +-
  drivers/gpu/drm/v3d/v3d_perfmon.c |  44 +--
  .../gpu/drm/v3d/v3d_performance_counters.h|  16 +-
  drivers/gpu/drm/v3d/v3d_sched.c   | 105 +--
  drivers/gpu/drm/v3d/v3d_submit.c  | 294 +++---
  6 files changed, 290 insertions(+), 194 deletions(-)

Re: [PATCH] drm/v3d: Expose memory stats through fdinfo

2024-07-13 Thread Maíra Canal


On 7/11/24 11:25, Maíra Canal wrote:

Use the common DRM function `drm_show_memory_stats()` to expose standard
fdinfo memory stats.

V3D exposes global GPU memory stats through debugfs. Those stats will be
preserved while the DRM subsystem doesn't have a standard solution to
expose global GPU stats.

Signed-off-by: Maíra Canal 


Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra


---

* Example fdinfo output:

$ cat /proc/10100/fdinfo/19
pos:    0
flags:  0242
mnt_id: 25
ino:    521
drm-driver:     v3d
drm-client-id:  81
drm-engine-bin:         4916187 ns
v3d-jobs-bin:           98 jobs
drm-engine-render:      154563573 ns
v3d-jobs-render:        98 jobs
drm-engine-tfu:         10574 ns
v3d-jobs-tfu:           1 jobs
drm-engine-csd:         0 ns
v3d-jobs-csd:           0 jobs
drm-engine-cache_clean: 0 ns
v3d-jobs-cache_clean:   0 jobs
drm-engine-cpu:         0 ns
v3d-jobs-cpu:           0 jobs
drm-total-memory:       15168 KiB
drm-shared-memory:      9336 KiB
drm-active-memory:      0

* Example gputop output:

DRM minor 128
  PID  MEM  RSS     bin       render      tfu      csd    cache_clean   cpu     NAME
10257  19M  19M |  3.6% ▎ || 43.2% ██▋   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | glmark2
 9963   3M   3M |  0.3% ▏ ||  2.6% ▎ ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | glxgears
 9965  10M  10M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | Xwayland
10100  14M  14M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | chromium-browse

Best Regards,
- Maíra

  drivers/gpu/drm/v3d/v3d_bo.c  | 12 
  drivers/gpu/drm/v3d/v3d_drv.c |  2 ++
  2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a165cbcdd27b..ecb80fd75b1a 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -26,6 +26,17 @@
  #include "v3d_drv.h"
  #include "uapi/drm/v3d_drm.h"
  
+static enum drm_gem_object_status v3d_gem_status(struct drm_gem_object *obj)

+{
+   struct v3d_bo *bo = to_v3d_bo(obj);
+   enum drm_gem_object_status res = 0;
+
+   if (bo->base.pages)
+   res |= DRM_GEM_OBJECT_RESIDENT;
+
+   return res;
+}
+
  /* Called DRM core on the last userspace/kernel unreference of the
   * BO.
   */
@@ -63,6 +74,7 @@ static const struct drm_gem_object_funcs v3d_gem_funcs = {
.vmap = drm_gem_shmem_object_vmap,
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
+   .status = v3d_gem_status,
.vm_ops = &drm_gem_shmem_vm_ops,
  };
  
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c

index a47f00b443d3..e883f405f26a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -184,6 +184,8 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
   v3d_queue_to_string(queue), jobs_completed);
}
+
+   drm_show_memory_stats(p, file);
  }
  
  static const struct file_operations v3d_drm_fops = {

Re: [PATCH v8 05/17] drm/vkms: Add dummy pixel_read/pixel_write callbacks to avoid NULL pointers

2024-07-13 Thread Maíra Canal


On 5/16/24 10:04, Louis Chauvet wrote:

Introduce two callbacks which do nothing. They are used in place of
NULL to avoid a kernel OOPS if a NULL callback were called.


I don't believe we should avoid a reasonable kernel OOPS. As you
noticed, if the user got this kernel OOPS it means that there is a
mismatch between what formats are announced by atomic_check and what is
really supported by atomic_update. This is a driver error.

If this happens, I want the kernel OOPS because it means I need to fix
the driver. Sometimes a kernel OOPS can save you some valuable debugging
time.

I'd probably drop this patch.

Best Regards,
- Maíra



If those callbacks are used, it means that there is a mismatch between
what formats are announced by atomic_check and what is really supported by
atomic_update.

Acked-by: Pekka Paalanen 
Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_formats.c | 45 -
  1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index 6b3e17374b19..c28c32b00e39 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -135,6 +135,21 @@ static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
  }
  
+/**

+ * magenta_to_argb_u16() - pixel_read callback which always reads magenta
+ *
+ * This callback is used when an invalid format is requested for plane reading.
+ * It is used to avoid calling a NULL pointer as a function. In theory, this function should
+ * never be called, except if you found a bug in the driver/DRM core.
+ */
+static void magenta_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+{
+   out_pixel->a = (u16)0xffff;
+   out_pixel->r = (u16)0xffff;
+   out_pixel->g = 0;
+   out_pixel->b = (u16)0xffff;
+}
+
  /**
   * vkms_compose_row - compose a single row of a plane
   * @stage_buffer: output line with the composed pixels
@@ -237,6 +252,16 @@ static void argb_u16_to_RGB565(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
*pixel = cpu_to_le16(r << 11 | g << 5 | b);
  }
  
+/**

+ * argb_u16_to_nothing() - pixel_write callback with no effect
+ *
+ * This callback is used when an invalid format is requested for writeback.
+ * It is used to avoid calling a NULL pointer as a function. In theory, this should never
+ * happen, except if there is a bug in the driver.
+ */
+static void argb_u16_to_nothing(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
+{}
+
  /**
   * vkms_writeback_row() - Generic loop for all supported writeback format. It 
is executed just
   * after the blending to write a line in the writeback buffer.
@@ -260,8 +285,10 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
  
  /**

   * get_pixel_conversion_function() - Retrieve the correct read_pixel function 
for a specific
- * format. The returned pointer is NULL for unsupported pixel formats. The 
caller must ensure that
- * the pointer is valid before using it in a vkms_plane_state.
+ * format.
+ *
+ * If the format is not supported by VKMS a warning is emitted and a dummy "always 
read magenta"
+ * function is returned.
   *
   * @format: DRM_FORMAT_* value for which to obtain a conversion function (see 
[drm_fourcc.h])
   */
@@ -284,18 +311,21 @@ pixel_read_t get_pixel_read_function(u32 format)
 * format must:
 * - Be listed in vkms_formats in vkms_plane.c
 * - Have a pixel_read callback defined here
+*
+* To avoid kernel crash, a dummy "always read magenta" 
function is used. It means
+* that during the composition, this plane will always be 
magenta.
 */
WARN(true,
 "Pixel format %p4cc is not supported by VKMS planes. This is a kernel bug, atomic check must forbid this configuration.\n",
 &format);
-   return (pixel_read_t)NULL;
+   return &magenta_to_argb_u16;
}
  }
  
  /**

   * get_pixel_write_function() - Retrieve the correct write_pixel function for 
a specific format.
- * The returned pointer is NULL for unsupported pixel formats. The caller must 
ensure that the
- * pointer is valid before using it in a vkms_writeback_job.
+ * If the format is not supported by VKMS a warning is emitted and a dummy "don't 
do anything"
+ * function is returned.
   *
   * @format: DRM_FORMAT_* value for which to obtain a conversion function (see 
[drm_fourcc.h])
   */
@@ -318,10 +348,13 @@ pixel_write_t get_pixel_write_function(u32 format)
 * format must:
 * - Be listed in vkms_wb_formats in vkms_writeback.c
 * - Have a pixel_write callback defined here
+*
+* To avoid kernel crash, a dummy "don't do anything" function 
is used. It means
+*

Re: [PATCH v8 04/17] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions

2024-07-13 Thread Maíra Canal


On 5/16/24 10:04, Louis Chauvet wrote:

Introduce two typedefs: pixel_read_t and pixel_write_t. They allow the
compiler to check that the passed functions take the correct arguments.
Such typedefs will help ensure consistency across the code base if
these prototypes are updated.

Rename the input/output variables consistently between read_line and
write_line.

A warning has been added in get_pixel_*_function to alert when an unsupported
pixel format is requested. As those formats are checked before the
atomic_update callbacks, it should never happen.

Add documentation for those typedefs.

Reviewed-by: Pekka Paalanen 
Signed-off-by: Louis Chauvet 
---
  drivers/gpu/drm/vkms/vkms_drv.h |  23 ++-
  drivers/gpu/drm/vkms/vkms_formats.c | 124 +---
  drivers/gpu/drm/vkms/vkms_formats.h |   4 +-
  drivers/gpu/drm/vkms/vkms_plane.c   |   2 +-
  4 files changed, 95 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 212f4ab6a71f..b1542b83b090 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -53,12 +53,31 @@ struct line_buffer {
struct pixel_argb_u16 *pixels;
  };
  
+/**

+ * typedef pixel_write_t - These functions are used to read a pixel from a
+ * `struct pixel_argb_u16`, convert it to a specific format and write it to the
+ * @out_pixel buffer.
+ *
+ * @out_pixel: destination address to write the pixel
+ * @in_pixel: pixel to write
+ */
+typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel);
+
  struct vkms_writeback_job {
struct iosys_map data[DRM_FORMAT_MAX_PLANES];
struct vkms_frame_info wb_frame_info;
-   void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+   pixel_write_t pixel_write;
  };
  
+/**

+ * typedef pixel_read_t - These functions are used to read a pixel in the 
source frame,
+ * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
+ *
+ * @in_pixel: pointer to the pixel to read
+ * @out_pixel: pointer to write the converted pixel
+ */
+typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
+
  /**
   * struct vkms_plane_state - Driver specific plane state
   * @base: base plane state
@@ -69,7 +88,7 @@ struct vkms_writeback_job {
  struct vkms_plane_state {
struct drm_shadow_plane_state base;
struct vkms_frame_info *frame_info;
-   void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
+   pixel_read_t pixel_read;
  };
  
  struct vkms_plane {

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index f157c43da4d6..6b3e17374b19 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -75,7 +75,7 @@ static int get_x_position(const struct vkms_frame_info 
*frame_info, int limit, i
   * They are used in the vkms_compose_row() function to handle multiple 
formats.
   */
  
-static void ARGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void ARGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
/*
 * The 257 is the "conversion ratio". This number is obtained by the
@@ -83,48 +83,48 @@ static void ARGB_to_argb_u16(u8 *src_pixels, struct 
pixel_argb_u16 *out_pixe
 * the best color value in a pixel format with more possibilities.
 * A similar idea applies to others RGB color conversions.
 */
-   out_pixel->a = (u16)src_pixels[3] * 257;
-   out_pixel->r = (u16)src_pixels[2] * 257;
-   out_pixel->g = (u16)src_pixels[1] * 257;
-   out_pixel->b = (u16)src_pixels[0] * 257;
+   out_pixel->a = (u16)in_pixel[3] * 257;
+   out_pixel->r = (u16)in_pixel[2] * 257;
+   out_pixel->g = (u16)in_pixel[1] * 257;
+   out_pixel->b = (u16)in_pixel[0] * 257;
  }
  
-static void XRGB_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void XRGB_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
out_pixel->a = (u16)0xffff;
-   out_pixel->r = (u16)src_pixels[2] * 257;
-   out_pixel->g = (u16)src_pixels[1] * 257;
-   out_pixel->b = (u16)src_pixels[0] * 257;
+   out_pixel->r = (u16)in_pixel[2] * 257;
+   out_pixel->g = (u16)in_pixel[1] * 257;
+   out_pixel->b = (u16)in_pixel[0] * 257;
  }
  
-static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)

+static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 
*out_pixel)
  {
-   u16 *pixels = (u16 *)src_pixels;
+   u16 *pixel = (u16 *)in_pixel;
  
-	out_pixel->a = le16_to_cpu(pixels[3]);

-   out_pixel->r = le16_to_cpu(pixels[2]);
-   out_pixel->g = le16_to_cpu(pixels[1]);
-   out_pixel->b = le16_to_cpu(pixels[0]);
+   out_pixel->a = le16_to_cpu(pixel[3]);
+   out_pixel->r = le16_to_cpu(pixel[2]);
+   out_pixel->g = le16_to_cpu(pixel[1]);
+   out_pixel->b =
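
The two ideas in this patch, a function-pointer typedef that lets the compiler check callback signatures and the "257" widening ratio from the quoted comment, can be sketched together in a few lines. All `example_*` names are assumptions introduced for illustration and are not part of VKMS; only the shape of the callback and the multiplication by 257 come from the quoted diff. The ratio works because 65535 / 255 = 257, so 0x00 maps to 0x0000 and 0xff maps to 0xffff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's 16-bit-per-channel pixel. */
struct example_argb_u16 {
	uint16_t a, r, g, b;
};

/* Same shape as the pixel_read_t typedef introduced by the patch. */
typedef void (*example_pixel_read_t)(uint8_t *in_pixel,
				     struct example_argb_u16 *out_pixel);

/* Widen each 8-bit channel to 16 bits by multiplying by 257. */
static void example_argb8888_read(uint8_t *in_pixel,
				  struct example_argb_u16 *out_pixel)
{
	out_pixel->a = (uint16_t)in_pixel[3] * 257;
	out_pixel->r = (uint16_t)in_pixel[2] * 257;
	out_pixel->g = (uint16_t)in_pixel[1] * 257;
	out_pixel->b = (uint16_t)in_pixel[0] * 257;
}
```

Assigning `example_argb8888_read` to a variable of type `example_pixel_read_t` compiles cleanly, while a function with a different parameter list would trigger an incompatible-pointer diagnostic, which is the consistency benefit the commit message describes.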

Re: [PATCH v8 03/17] drm/vkms: write/update the documentation for pixel conversion and pixel write functions

2024-07-13 Thread Maíra Canal


On 5/16/24 10:04, Louis Chauvet wrote:

Add some documentation on pixel conversion functions.
Update outdated comments for pixel_write functions.

Signed-off-by: Louis Chauvet 
Acked-by: Pekka Paalanen 
---
  drivers/gpu/drm/vkms/vkms_composer.c |  7 
  drivers/gpu/drm/vkms/vkms_drv.h  | 15 -
  drivers/gpu/drm/vkms/vkms_formats.c  | 62 ++--
  3 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
b/drivers/gpu/drm/vkms/vkms_composer.c
index c6d9b4a65809..da0651a94c9b 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -189,6 +189,13 @@ static void blend(struct vkms_writeback_job *wb,
  
  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
  
+	/*

+* The planes are composed line-by-line to avoid heavy memory usage. It 
is a necessary
+* complexity to avoid poor blending performance.
+*
+* The function vkms_compose_row is used to read a line, 
pixel-by-pixel, into the staging


Nit: I know it's not kerneldoc, but I'd be glad if you use
vkms_compose_row()


+* buffer.
+*/
for (size_t y = 0; y < crtc_y_limit; y++) {
fill_background(&background_color, output_buffer);
  
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h

index b4b357447292..212f4ab6a71f 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -25,6 +25,17 @@
  
  #define VKMS_LUT_SIZE 256
  
+/**

+ * struct vkms_frame_info - structure to store the state of a frame


s/structure/Structure


+ *
+ * @fb: backing drm framebuffer
+ * @src: source rectangle of this frame in the source framebuffer, stored in 
16.16 fixed-point form
+ * @dst: destination rectangle in the crtc buffer, stored in whole pixel units
+ * @map: see drm_shadow_plane_state@data
+ * @rotation: rotation applied to the source.
+ *
+ * @src and @dst should have the same size modulo the rotation.
+ */
  struct vkms_frame_info {
struct drm_framebuffer *fb;
struct drm_rect src, dst;
@@ -49,9 +60,11 @@ struct vkms_writeback_job {
  };
  
  /**

- * vkms_plane_state - Driver specific plane state
+ * struct vkms_plane_state - Driver specific plane state
   * @base: base plane state
   * @frame_info: data required for composing computation
+ * @pixel_read: function to read a pixel in this plane. The creator of a 
vkms_plane_state must


s/vkms_plane_state/struct vkms_plane_state


+ * ensure that this pointer is valid


Note: "If the @argument description has multiple lines, the continuation
of the description should start at the same column as the previous
line:" [1]

[1] https://docs.kernel.org/doc-guide/kernel-doc.html#function-parameters


   */
  struct vkms_plane_state {
struct drm_shadow_plane_state base;
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c 
b/drivers/gpu/drm/vkms/vkms_formats.c
index d597c48452ac..f157c43da4d6 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -9,6 +9,18 @@
  
  #include "vkms_formats.h"
  
+/**

+ * pixel_offset() - Get the offset of the pixel at coordinates x/y in the 
first plane
+ *
+ * @frame_info: Buffer metadata
+ * @x: The x coordinate of the wanted pixel in the buffer
+ * @y: The y coordinate of the wanted pixel in the buffer
+ *
+ * The caller must ensure that the framebuffer associated with this request 
uses a pixel format
+ * where block_h == block_w == 1.
+ * If this requirement is not fulfilled, the resulting offset can point to another pixel or
+ * outside of the buffer.
+ */
  static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, 
int y)
  {
struct drm_framebuffer *fb = frame_info->fb;
@@ -16,18 +28,22 @@ static size_t pixel_offset(const struct vkms_frame_info 
*frame_info, int x, int
return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
  }
  
-/*

- * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
+/**
+ * packed_pixels_addr() - Get the pointer to the block containing the pixel at 
the given
+ * coordinates
   *
   * @frame_info: Buffer metadata
- * @x: The x(width) coordinate of the 2D buffer
- * @y: The y(Heigth) coordinate of the 2D buffer
+ * @x: The x (width) coordinate inside the plane
+ * @y: The y (height) coordinate inside the plane
   *
   * Takes the information stored in the frame_info, a pair of coordinates, and
   * returns the address of the first color channel.
   * This function assumes the channels are packed together, i.e. a color 
channel
   * comes immediately after another in the memory. And therefore, this function
   * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ *
+ * The caller must ensure that the framebuffer associated with this request 
uses a pixel format
+ * where block_h == block_w == 1, otherwise the returned pointer can be 
outside the buffer.
   */
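
The offset arithmetic documented for pixel_offset() can be illustrated in isolation. This is a minimal sketch, assuming a single plane and a format where each block is exactly one pixel; `struct example_fb` and `example_pixel_offset()` are hypothetical stand-ins for the framebuffer fields used in the quoted diff.

```c
#include <stddef.h>

/* Hypothetical, reduced description of a single-plane framebuffer. */
struct example_fb {
	size_t offset;	/* byte offset of the plane inside the buffer */
	size_t pitch;	/* bytes per row, including any padding */
	size_t cpp;	/* bytes per pixel */
};

/* Skip to the plane, then y full rows, then x pixels within that row. */
static size_t example_pixel_offset(const struct example_fb *fb, int x, int y)
{
	return fb->offset + ((size_t)y * fb->pitch) + ((size_t)x * fb->cpp);
}
```

For a plane starting at byte 16 with a 256-byte pitch and 4 bytes per pixel, the pixel at (3, 2) sits at 16 + 2 * 256 + 3 * 4 = 540 bytes. The formula stops being valid once a format packs several pixels into one block, which is why the documentation asks callers to guarantee block_h == block_w == 1.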

[PATCH] drm/v3d: Expose memory stats through fdinfo

2024-07-11 Thread Maíra Canal

Use the common DRM function `drm_show_memory_stats()` to expose standard
fdinfo memory stats.

V3D exposes global GPU memory stats through debugfs. Those stats will be
preserved while the DRM subsystem doesn't have a standard solution to
expose global GPU stats.

Signed-off-by: Maíra Canal 
---

* Example fdinfo output:

$ cat /proc/10100/fdinfo/19
pos:    0
flags:  0242
mnt_id: 25
ino:    521
drm-driver:     v3d
drm-client-id:  81
drm-engine-bin:         4916187 ns
v3d-jobs-bin:           98 jobs
drm-engine-render:      154563573 ns
v3d-jobs-render:        98 jobs
drm-engine-tfu:         10574 ns
v3d-jobs-tfu:           1 jobs
drm-engine-csd:         0 ns
v3d-jobs-csd:           0 jobs
drm-engine-cache_clean: 0 ns
v3d-jobs-cache_clean:   0 jobs
drm-engine-cpu:         0 ns
v3d-jobs-cpu:           0 jobs
drm-total-memory:       15168 KiB
drm-shared-memory:      9336 KiB
drm-active-memory:      0

* Example gputop output:

DRM minor 128
  PID  MEM  RSS     bin       render      tfu      csd    cache_clean   cpu     NAME
10257  19M  19M |  3.6% ▎ || 43.2% ██▋   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | glmark2
 9963   3M   3M |  0.3% ▏ ||  2.6% ▎ ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | glxgears
 9965  10M  10M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | Xwayland
10100  14M  14M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | chromium-browse

Best Regards,
- Maíra

 drivers/gpu/drm/v3d/v3d_bo.c  | 12 
 drivers/gpu/drm/v3d/v3d_drv.c |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a165cbcdd27b..ecb80fd75b1a 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -26,6 +26,17 @@
 #include "v3d_drv.h"
 #include "uapi/drm/v3d_drm.h"
 
+static enum drm_gem_object_status v3d_gem_status(struct drm_gem_object *obj)
+{
+   struct v3d_bo *bo = to_v3d_bo(obj);
+   enum drm_gem_object_status res = 0;
+
+   if (bo->base.pages)
+   res |= DRM_GEM_OBJECT_RESIDENT;
+
+   return res;
+}
+
 /* Called DRM core on the last userspace/kernel unreference of the
  * BO.
  */
@@ -63,6 +74,7 @@ static const struct drm_gem_object_funcs v3d_gem_funcs = {
.vmap = drm_gem_shmem_object_vmap,
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
+   .status = v3d_gem_status,
.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index a47f00b443d3..e883f405f26a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -184,6 +184,8 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
   v3d_queue_to_string(queue), jobs_completed);
}
+
+   drm_show_memory_stats(p, file);
 }
 
 static const struct file_operations v3d_drm_fops = {
-- 
2.45.2
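
On the userspace side, the memory keys added to fdinfo are plain `key: value unit` lines, so a monitoring tool can read them with very little code. The following is a minimal sketch, assuming the line format shown in the example output above; `example_fdinfo_value()` is a hypothetical helper, not part of any existing tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` starts with "<key>:", parse the unsigned
 * number that follows. Returns 1 on success and 0 otherwise.
 */
static int example_fdinfo_value(const char *line, const char *key,
				unsigned long long *value)
{
	size_t len = strlen(key);

	if (strncmp(line, key, len) != 0 || line[len] != ':')
		return 0;

	/* %llu skips the spaces or tabs between the colon and the digits. */
	return sscanf(line + len + 1, "%llu", value) == 1;
}
```

Feeding it the line `drm-total-memory:   15168 KiB` with the key `drm-total-memory` stores 15168 in `*value`; the unit suffix is left for the caller to interpret.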

Re: [PATCH 09/11] drm/v3d: Move perfmon init completely into own unit

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Now that the build time dependencies on various array sizes have been
removed, we can move the perfmon init completely into its own compilation
unit and remove the hardcoded defines.

This improves on the temporary fix quickly delivered in
9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning").


Just to please checkpatch, can you write "...delivered in commit 
9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning")."


You can send the next version with my R-b:

Reviewed-by: Maíra Canal 

Best Regards,
- Maíra



Signed-off-by: Tvrtko Ursulin 
References: 9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning")
---
  drivers/gpu/drm/v3d/v3d_drv.c |  9 +---
  drivers/gpu/drm/v3d/v3d_drv.h |  6 +--
  drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++
  .../gpu/drm/v3d/v3d_performance_counters.h| 16 ---
  4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index a47f00b443d3..491c638a4d74 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
args->value = 1;
return 0;
case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
-   args->value = v3d->max_counters;
+   args->value = v3d->perfmon_info.max_counters;
return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
@@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
  
-	if (v3d->ver >= 71)

-   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
-   else if (v3d->ver >= 42)
-   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
-   else
-   v3d->max_counters = 0;
+   v3d_perfmon_init(v3d);
  
  	v3d->reset = devm_reset_control_get_exclusive(dev, NULL);

if (IS_ERR(v3d->reset)) {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index b1dfec49ba7d..8524761bc62d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,10 +104,7 @@ struct v3d_dev {
int ver;
bool single_irq_line;
  
-	/* Different revisions of V3D have different total number of performance

-* counters
-*/
-   unsigned int max_counters;
+   struct v3d_perfmon_info perfmon_info;
  
  	void __iomem *hub_regs;

void __iomem *core_regs[3];
@@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
  
  /* v3d_perfmon.c */

+void v3d_perfmon_init(struct v3d_dev *v3d);
  void v3d_perfmon_get(struct v3d_perfmon *perfmon);
  void v3d_perfmon_put(struct v3d_perfmon *perfmon);
  void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index b7d0b02e1a95..cd7f1eedf17f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc 
v3d_v71_performance_counters[] = {
{"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any other 
reason (vary/W/Z)"},
  };
  
+void v3d_perfmon_init(struct v3d_dev *v3d)

+{
+   const struct v3d_perf_counter_desc *counters = NULL;
+   unsigned int max = 0;
+
+   if (v3d->ver >= 71) {
+   counters = v3d_v71_performance_counters;
+   max = ARRAY_SIZE(v3d_v71_performance_counters);
+   } else if (v3d->ver >= 42) {
+   counters = v3d_v42_performance_counters;
+   max = ARRAY_SIZE(v3d_v42_performance_counters);
+   }
+
+   v3d->perfmon_info.max_counters = max;
+   v3d->perfmon_info.counters = counters;
+}
+
  void v3d_perfmon_get(struct v3d_perfmon *perfmon)
  {
if (perfmon)
@@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
  
  	/* Make sure all counters are valid. */

for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= v3d->max_counters)
+   if (req->counters[i] >= v3d->perfmon_info.max_counters)
return -EINVAL;
}
  
@@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,

return -EINVAL;
}
  
-	/* Make sure that the counter ID is valid */

-   if (req->counter >= v3d->max_counters)
-   return -EINVAL;
-
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v4

Re: [PATCH 08/11] drm/v3d: Do not use intermediate storage when copying performance query results

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Removing the intermediate buffer removes the last use of the
V3D_MAX_COUNTERS define, which will enable further driver cleanup.

While at it pull the 32 vs 64 bit copying decision outside the loop in
order to reduce the number of conditional instructions.
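
As an illustration of the second paragraph, here is a minimal userspace
sketch of testing the element width once and then running a dedicated loop
per width. It is not the driver code: copy_counters() and its parameters
are hypothetical names, and the fixed-width stdint types stand in for the
kernel's u32/u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the two typed helpers (assumed names). */
static void write_to_buffer_32(uint32_t *dst, unsigned int idx, uint32_t value)
{
	dst[idx] = value;
}

static void write_to_buffer_64(uint64_t *dst, unsigned int idx, uint64_t value)
{
	dst[idx] = value;
}

/*
 * Copy n counter values into dst, starting at slot "offset".
 * The 32 vs 64 bit decision is made once, outside both loops,
 * so the loop bodies carry no per-element branch.
 */
static void copy_counters(void *dst, unsigned int offset,
			  const uint64_t *values, unsigned int n,
			  int do_64bit)
{
	unsigned int j;

	assert(dst != NULL && values != NULL);

	if (do_64bit) {
		for (j = 0; j < n; j++)
			write_to_buffer_64(dst, offset + j, values[j]);
	} else {
		/* Values are truncated to their low 32 bits here. */
		for (j = 0; j < n; j++)
			write_to_buffer_32(dst, offset + j,
					   (uint32_t)values[j]);
	}
}
```

Writing straight into the destination at offset + j is also what makes a
temporary array of counter values unnecessary in this sketch.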

Signed-off-by: Tvrtko Ursulin 


After addressing Iago's comment, you can add my

Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_sched.c | 60 -
  1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 7b2195ba4248..2564467735fc 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -421,18 +421,23 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job)
v3d_put_bo_vaddr(bo);
  }
  
+static void write_to_buffer_32(u32 *dst, unsigned int idx, u32 value)

+{
+   dst[idx] = value;
+}
+
+static void write_to_buffer_64(u64 *dst, unsigned int idx, u64 value)
+{
+   dst[idx] = value;
+}
+
  static void
-write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value)
+write_to_buffer(void *dst, unsigned int idx, bool do_64bit, u64 value)
  {
-   if (do_64bit) {
-   u64 *dst64 = (u64 *)dst;
-
-   dst64[idx] = value;
-   } else {
-   u32 *dst32 = (u32 *)dst;
-
-   dst32[idx] = (u32)value;
-   }
+   if (do_64bit)
+   write_to_buffer_64(dst, idx, value);
+   else
+   write_to_buffer_32(dst, idx, value);
  }
  
  static void

@@ -505,18 +510,23 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job)
  }
  
  static void

-v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 
query)
+v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data,
+  unsigned int query)
  {
-   struct v3d_performance_query_info *performance_query = &job->performance_query;
-   struct v3d_copy_query_results_info *copy = &job->copy;
+   struct v3d_performance_query_info *performance_query =
+   &job->performance_query;
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
-   struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_MAX_COUNTERS];
+   unsigned int i, j, offset;
  
-	for (int i = 0; i < performance_query->nperfmons; i++) {

-   perfmon = v3d_perfmon_find(v3d_priv,
-  
performance_query->queries[query].kperfmon_ids[i]);
+   for (i = 0, offset = 0;
+i < performance_query->nperfmons;
+i++, offset += DRM_V3D_MAX_PERF_COUNTERS) {
+   struct v3d_performance_query *q =
+   &performance_query->queries[query];
+   struct v3d_perfmon *perfmon;
+
+   perfmon = v3d_perfmon_find(v3d_priv, q->kperfmon_ids[i]);
if (!perfmon) {
DRM_DEBUG("Failed to find perfmon.");
continue;
@@ -524,14 +534,18 @@ v3d_write_performance_query_result(struct v3d_cpu_job 
*job, void *data, u32 quer
  
  		v3d_perfmon_stop(v3d, perfmon, true);
  
-		memcpy(&counter_values[i * DRM_V3D_MAX_PERF_COUNTERS], perfmon->values,

-  perfmon->ncounters * sizeof(u64));
+   if (job->copy.do_64bit) {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_64(data, offset + j,
+  perfmon->values[j]);
+   } else {
+   for (j = 0; j < perfmon->ncounters; j++)
+   write_to_buffer_32(data, offset + j,
+  perfmon->values[j]);
+   }
  
  		v3d_perfmon_put(perfmon);

}
-
-   for (int i = 0; i < performance_query->ncounters; i++)
-   write_to_buffer(data, i, copy->do_64bit, counter_values[i]);
  }
  
  static void

Re: [PATCH 07/11] drm/v3d: Size the kperfmon_ids array at runtime

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Instead of statically reserving pessimistic space for the kperfmon_ids
array, make the userspace extension code allocate the exactly required
amount of space.

Apart from saving some memory at runtime, this also removes the need for
the V3D_MAX_PERFMONS macro whose removal will benefit further driver
cleanup.
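
A rough userspace analogue of the idea, assuming nothing about the driver
beyond what the description says: the ID array becomes a pointer that is
sized from the count supplied at runtime, and the matching release path
frees it. The struct and function names below are hypothetical, and
calloc()/free() stand in for the kernel allocation helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical analogue of a query that owns a runtime-sized ID array. */
struct perf_query {
	uint32_t *kperfmon_ids;	/* one slot per perfmon, sized at runtime */
};

/* Allocate exactly nperfmons ID slots. Returns 0 on success, -1 on failure. */
static int perf_query_init(struct perf_query *q, unsigned int nperfmons)
{
	assert(q != NULL);

	/*
	 * calloc() takes count and size separately, so a conforming
	 * implementation fails instead of wrapping on overflow.
	 */
	q->kperfmon_ids = calloc(nperfmons, sizeof(*q->kperfmon_ids));
	if (!q->kperfmon_ids && nperfmons)
		return -1;

	return 0;
}

/* Release the ID array; safe to call on an already released query. */
static void perf_query_fini(struct perf_query *q)
{
	free(q->kperfmon_ids);
	q->kperfmon_ids = NULL;
}
```

With the array sized per request, no compile-time maximum is needed in
this sketch; every error path that runs after the allocation has to
release it, though.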

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_drv.h|  6 +-
  drivers/gpu/drm/v3d/v3d_sched.c  |  4 +++-
  drivers/gpu/drm/v3d/v3d_submit.c | 17 +++--
  3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index dd3ead4cb8bd..b1dfec49ba7d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,13 +351,9 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
  };
  
-/* Number of perfmons required to handle all supported performance counters */

-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
- DRM_V3D_MAX_PERF_COUNTERS)
-
  struct v3d_performance_query {
/* Performance monitor IDs for this query */
-   u32 kperfmon_ids[V3D_MAX_PERFMONS];
+   u32 *kperfmon_ids;
  
  	/* Syncobj that indicates the query availability */

struct drm_syncobj *syncobj;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 5fbbee47c6b7..7b2195ba4248 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -94,8 +94,10 @@ v3d_performance_query_info_free(struct 
v3d_performance_query_info *query_info,
if (query_info->queries) {
unsigned int i;
  
-		for (i = 0; i < count; i++)

+   for (i = 0; i < count; i++) {
drm_syncobj_put(query_info->queries[i].syncobj);
+   kvfree(query_info->queries[i].kperfmon_ids);
+   }
  
  		kvfree(query_info->queries);

}
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index ce56e31a027d..d1060e60aafa 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -671,10 +671,20 @@ v3d_copy_query_info(struct v3d_performance_query_info 
*query_info,
goto error;
}
  
+		query->kperfmon_ids =

+   kvmalloc_array(nperfmons,
+  sizeof(struct v3d_performance_query *),
+  GFP_KERNEL);
+   if (!query->kperfmon_ids) {
+   err = -ENOMEM;
+   goto error;
+   }
+
ids_pointer = u64_to_user_ptr(ids);
  
  		for (j = 0; j < nperfmons; j++) {

if (get_user(id, ids_pointer++)) {
+   kvfree(query->kperfmon_ids);
err = -EFAULT;
goto error;
}
@@ -684,6 +694,7 @@ v3d_copy_query_info(struct v3d_performance_query_info 
*query_info,
  
  		query->syncobj = drm_syncobj_find(file_priv, sync);

if (!query->syncobj) {
+   kvfree(query->kperfmon_ids);
err = -ENOENT;
goto error;
}
@@ -717,9 +728,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
  
-	if (reset.nperfmons > V3D_MAX_PERFMONS)

-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(reset.count,

@@ -767,9 +775,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
  
-	if (copy.nperfmons > V3D_MAX_PERFMONS)

-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(copy.count,

Re: [PATCH 06/11] drm/v3d: Move part of copying of reset/copy performance extension to a helper

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

The loop which looks up the syncobj and copies the kperfmon ids is
identical so lets move it to a helper.

The only change is replacing copy_from_user with get_user when copying a
scalar.

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_submit.c | 152 ++-
  1 file changed, 68 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 3838ebade45d..ce56e31a027d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -644,15 +644,64 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
return err;
  }
  
+static int

+v3d_copy_query_info(struct v3d_performance_query_info *query_info,
+   unsigned int count,
+   unsigned int nperfmons,
+   u32 __user *syncs,
+   u64 __user *kperfmon_ids,
+   struct drm_file *file_priv)
+{
+   unsigned int i, j;
+   int err;
+
+   for (i = 0; i < count; i++) {
+   struct v3d_performance_query *query = &query_info->queries[i];
+   u32 __user *ids_pointer;
+   u32 sync, id;
+   u64 ids;
+
+   if (get_user(sync, syncs++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   if (get_user(ids, kperfmon_ids++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   ids_pointer = u64_to_user_ptr(ids);
+
+   for (j = 0; j < nperfmons; j++) {
+   if (get_user(id, ids_pointer++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   query->kperfmon_ids[j] = id;
+   }
+
+   query->syncobj = drm_syncobj_find(file_priv, sync);
+   if (!query->syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
+   }
+
+   return 0;
+
+error:
+   v3d_performance_query_info_free(query_info, i);
+   return err;
+}
+
  static int
  v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 struct drm_v3d_extension __user *ext,
 struct v3d_cpu_job *job)
  {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
-   unsigned int i, j;
int err;
  
  	if (!job) {

@@ -679,50 +728,19 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (!job->performance_query.queries)
return -ENOMEM;
  
-	syncs = u64_to_user_ptr(reset.syncs);

-   kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
+   err = v3d_copy_query_info(&job->performance_query,
+ reset.count,
+ reset.nperfmons,
+ u64_to_user_ptr(reset.syncs),
+ u64_to_user_ptr(reset.kperfmon_ids),
+ file_priv);
+   if (err)
+   return err;
  
-	for (i = 0; i < reset.count; i++) {

-   u32 sync;
-   u64 ids;
-   u32 __user *ids_pointer;
-   u32 id;
-
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   ids_pointer = u64_to_user_ptr(ids);
-
-   for (j = 0; j < reset.nperfmons; j++) {
-   if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   job->performance_query.queries[i].kperfmon_ids[j] = id;
-   }
-
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->performance_query.queries[i].syncobj) {
-   err = -ENOENT;
-   goto error;
-   }
-   }
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
  
  	return 0;

-
-error:
-   v3d_performance_query_info_free(&job->performance_query, i);
-   return err;
  }
  
  static int

@@ -730,10 +748,7 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
  struct drm_v3d_extension __user *ext,

Re: [PATCH 05/11] drm/v3d: Validate passed in drm syncobj handles in the performance extension

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking handle was looked up successfuly or otherwise fail the
extension by jumping into the existing unwind.


I'm not a native English speaker, but again it feels to me that something
is missing in this sentence.

I suggested "Fix it by checking if the handle..."

Also, s/successfuly/successfully
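
To make the fix concrete, a small userspace sketch of the pattern follows:
each lookup result is checked, and the first unknown handle ends the loop
with -ENOENT while reporting how many entries were resolved so the caller
can unwind them. obj_find() and resolve_handles() are hypothetical
stand-ins, not DRM functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object identified by a userspace-visible handle. */
struct obj {
	uint32_t handle;
};

/* Hypothetical lookup: returns NULL when the handle is unknown. */
static struct obj *obj_find(struct obj *table, size_t n, uint32_t handle)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].handle == handle)
			return &table[i];
	}

	return NULL;
}

/*
 * Resolve every handle in "handles". On the first failed lookup, stop
 * and return -ENOENT. *resolved is set to the number of entries that
 * were successfully looked up before the loop ended.
 */
static int resolve_handles(struct obj *table, size_t n,
			   const uint32_t *handles, size_t count,
			   struct obj **out, size_t *resolved)
{
	size_t i;
	int err = 0;

	assert(out != NULL && resolved != NULL);

	for (i = 0; i < count; i++) {
		out[i] = obj_find(table, n, handles[i]);
		if (!out[i]) {
			err = -ENOENT;
			break;
		}
	}

	*resolved = i;
	return err;
}
```

Without the NULL check, the invalid entry would be stored and only trip
up whichever later code dereferences it.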



Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_submit.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index e3a00c8394a5..3838ebade45d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -710,6 +710,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
}
  
  		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
@@ -790,6 +794,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
}
  
  		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = copy.count;
job->performance_query.nperfmons = copy.nperfmons;

Re: [PATCH 04/11] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking handle was looked up successfully or otherwise fail the
extension by jumping into the existing unwind.


I'm not a native English speaker, but again it feels to me that something
is missing in this sentence.

I suggested "Fix it by checking if the handle..."



Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_submit.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index d626c8539b04..e3a00c8394a5 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -498,6 +498,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = timestamp.count;
  
@@ -552,6 +556,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,

}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = reset.count;
  
@@ -616,6 +624,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv,

}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = copy.count;

Re: [PATCH 02/11] drm/v3d: Fix potential memory leak in the timestamp extension

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.
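
The leak and its fix can be sketched in plain userspace C, under the
assumption that each looked-up object holds a reference that must be
dropped. All names here (obj_put(), query_info_free(), query_info_fill()
and the fail_at parameter that simulates a failed fetch) are hypothetical;
the point is only that the error path releases the entries acquired so
far instead of just freeing the array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a sync object. */
struct obj {
	int refs;
};

static void obj_put(struct obj *o)
{
	if (o)
		o->refs--;
}

struct query {
	struct obj *syncobj;
};

struct query_info {
	struct query *queries;
	unsigned int count;
};

/* Drop the references held by the first "count" entries, then the array. */
static void query_info_free(struct query_info *qi, unsigned int count)
{
	unsigned int i;

	if (!qi->queries)
		return;

	for (i = 0; i < count; i++)
		obj_put(qi->queries[i].syncobj);

	free(qi->queries);
	qi->queries = NULL;	/* sketch-only: guards against double free */
}

/*
 * Fill n entries from "pool", taking one reference per entry.
 * fail_at simulates a failed userspace fetch at that index
 * (pass n or more for no failure).
 */
static int query_info_fill(struct query_info *qi, struct obj *pool,
			   unsigned int n, unsigned int fail_at)
{
	unsigned int i;

	assert(n > 0);

	qi->queries = calloc(n, sizeof(*qi->queries));
	if (!qi->queries)
		return -1;

	for (i = 0; i < n; i++) {
		if (i == fail_at)
			goto error;

		pool[i].refs++;
		qi->queries[i].syncobj = &pool[i];
	}

	qi->count = n;
	return 0;

error:
	/* Only entries [0, i) hold a reference at this point. */
	query_info_free(qi, i);
	return -1;
}
```

Passing the loop index as the count is what lets one helper serve both
the partial-failure path and the normal teardown path.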

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
  drivers/gpu/drm/v3d/v3d_sched.c  | 22 +++-
  drivers/gpu/drm/v3d/v3d_submit.c | 43 ++--
  3 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 099b962bdfde..e208ffdfba32 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
  void v3d_mmu_remove_ptes(struct v3d_bo *bo);
  
  /* v3d_sched.c */

+void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
+  unsigned int count);
  void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
  int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 03df37a3acf5..59dc0287dab9 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
v3d_job_cleanup(job);
  }
  
+void

+v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
+ unsigned int count)
+{
+   if (query_info->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(query_info->queries[i].syncobj);
+
+   kvfree(query_info->queries);
+   }
+}
+
  static void
  v3d_cpu_job_free(struct drm_sched_job *sched_job)
  {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
struct v3d_performance_query_info *performance_query = &job->performance_query;
  
-	if (timestamp_query->queries) {

-   for (int i = 0; i < timestamp_query->count; i++)
-   drm_syncobj_put(timestamp_query->queries[i].syncobj);
-   kvfree(timestamp_query->queries);
-   }
+   v3d_timestamp_query_info_free(&job->timestamp_query,
+ job->timestamp_query.count);
  
  	if (performance_query->queries) {

for (int i = 0; i < performance_query->count; i++)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 263fefc1d04f..121bf1314b80 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,8 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
  {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   unsigned int i;
+   int err;
  
  	if (!job) {

DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -480,19 +482,19 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
offsets = u64_to_user_ptr(timestamp.offsets);
syncs = u64_to_user_ptr(timestamp.syncs);
  
-	for (int i = 0; i < timestamp.count; i++) {

+   for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
  
  		if (copy_from_user(&offset, offsets++, sizeof(offset))) {

-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  		job->timestamp_query.queries[i].offset = offset;
  
  		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

@@ -500,6 +502,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
job->timestamp_query.count = timestamp.count;
  
  	return 0;

+
+error:
+   v3d_timestamp_query_info_free(&job->timestamp_query, i);
+   return err;
  }
  
  static int

@@ -509,6 +515,8 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
  {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   unsigned int i;
+   int err;
  
  	if (!job) {

DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -533,14 +541,14 @@ v3d

Re: [PATCH 03/11] drm/v3d: Fix potential memory leak in the performance extension

2024-07-11 Thread Maíra Canal


On 7/11/24 06:15, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
  drivers/gpu/drm/v3d/v3d_sched.c  | 22 ++
  drivers/gpu/drm/v3d/v3d_submit.c | 50 
  3 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index e208ffdfba32..dd3ead4cb8bd 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
  /* v3d_sched.c */
  void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info 
*query_info,
   unsigned int count);
+void v3d_performance_query_info_free(struct v3d_performance_query_info 
*query_info,
+unsigned int count);
  void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
  int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 59dc0287dab9..5fbbee47c6b7 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ v3d_timestamp_query_info_free(struct 
v3d_timestamp_query_info *query_info,
}
  }
  
+void

+v3d_performance_query_info_free(struct v3d_performance_query_info *query_info,
+   unsigned int count)
+{
+   if (query_info->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(query_info->queries[i].syncobj);
+
+   kvfree(query_info->queries);
+   }
+}
+
  static void
  v3d_cpu_job_free(struct drm_sched_job *sched_job)
  {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_performance_query_info *performance_query = &job->performance_query;
  
  	v3d_timestamp_query_info_free(&job->timestamp_query,

  job->timestamp_query.count);
  
-	if (performance_query->queries) {

-   for (int i = 0; i < performance_query->count; i++)
-   drm_syncobj_put(performance_query->queries[i].syncobj);
-   kvfree(performance_query->queries);
-   }
+   v3d_performance_query_info_free(&job->performance_query,
+   job->performance_query.count);
  
  	v3d_job_cleanup(&job->base);

  }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 121bf1314b80..d626c8539b04 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -640,6 +640,8 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 __user *syncs;
u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
+   unsigned int i, j;
+   int err;
  
  	if (!job) {

DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -668,39 +670,43 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
syncs = u64_to_user_ptr(reset.syncs);
kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
  
-	for (int i = 0; i < reset.count; i++) {

+   for (i = 0; i < reset.count; i++) {
u32 sync;
u64 ids;
u32 __user *ids_pointer;
u32 id;
  
  		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
-		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

-
if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  		ids_pointer = u64_to_user_ptr(ids);
  
-		for (int j = 0; j < reset.nperfmons; j++) {

+   for (j = 0; j < reset.nperfmons; j++) {
if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  			job->performance_query.queries[i]

Re: [PATCH 12/12] drm/v3d: Prefer get_user for scalar types

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

It makes it just a tiny bit more obvious what is going on.

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_submit.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b0c2a8e9cb06..9273b0aadb79 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -486,14 +486,14 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
for (i = 0; i < timestamp.count; i++) {
u32 offset, sync;
  
-		if (copy_from_user(&offset, offsets++, sizeof(offset))) {

+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
  
  		qinfo->queries[i].offset = offset;
  
-		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -552,7 +552,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
  
  		qinfo->queries[i].offset = reset.offset + 8 * i;
  
-		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}
@@ -614,14 +614,14 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
for (i = 0; i < copy.count; i++) {
u32 offset, sync;
  
-		if (copy_from_user(&offset, offsets++, sizeof(offset))) {

+   if (get_user(offset, offsets++)) {
err = -EFAULT;
goto error;
}
  
  		qinfo->queries[i].offset = offset;
  
-		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

+   if (get_user(sync, syncs++)) {
err = -EFAULT;
goto error;
}

Re: [PATCH 11/12] drm/v3d: Add some local variables in queries/extensions

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Add some local variables to make the code a bit less verbose, with the
main benefit being pulling some lines to under 80 columns wide.

Signed-off-by: Tvrtko Ursulin 


I'd prefer `query_info`, but anyway:

Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_submit.c | 79 +---
  1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 34ecd844f16a..b0c2a8e9cb06 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
  {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
unsigned int i;
int err;
  
@@ -473,10 +474,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
  
  	job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
  
-	job->timestamp_query.queries = kvmalloc_array(timestamp.count,

- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   qinfo->queries = kvmalloc_array(timestamp.count,
+   sizeof(struct v3d_timestamp_query),
+   GFP_KERNEL);
+   if (!qinfo->queries)
return -ENOMEM;
  
  	offsets = u64_to_user_ptr(timestamp.offsets);

@@ -490,20 +491,20 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
goto error;
}
  
-		job->timestamp_query.queries[i].offset = offset;

+   qinfo->queries[i].offset = offset;
  
  		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

err = -EFAULT;
goto error;
}
  
-		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

-   if (!job->timestamp_query.queries[i].syncobj) {
+   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = timestamp.count;
+   qinfo->count = timestamp.count;
  
  	return 0;
  
@@ -519,6 +520,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,

  {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
unsigned int i;
int err;
  
@@ -537,10 +539,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,
  
  	job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY;
  
-	job->timestamp_query.queries = kvmalloc_array(reset.count,

- sizeof(struct 
v3d_timestamp_query),
- GFP_KERNEL);
-   if (!job->timestamp_query.queries)
+   qinfo->queries = kvmalloc_array(reset.count,
+   sizeof(struct v3d_timestamp_query),
+   GFP_KERNEL);
+   if (!qinfo->queries)
return -ENOMEM;
  
  	syncs = u64_to_user_ptr(reset.syncs);

@@ -548,20 +550,20 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
for (i = 0; i < reset.count; i++) {
u32 sync;
  
-		job->timestamp_query.queries[i].offset = reset.offset + 8 * i;

+   qinfo->queries[i].offset = reset.offset + 8 * i;
  
  		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

err = -EFAULT;
goto error;
}
  
-		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

-   if (!job->timestamp_query.queries[i].syncobj) {
+   qinfo->queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+   if (!qinfo->queries[i].syncobj) {
err = -ENOENT;
goto error;
}
}
-   job->timestamp_query.count = reset.count;
+   qinfo->count = reset.count;
  
  	return 0;
  
@@ -578,6 +580,7 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv,

  {
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
+   struct v3d_timestamp_query_info *qinfo = &job->timestamp_query;
unsigned int i;
int err;
  
@@ -599,10 +602,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv,
  
  	job->job_type = V3D_CPU

Re: [PATCH 10/12] drm/v3d: Align data types of internal and uapi counts

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

In the timestamp and performance extensions userspace type for counts is
u32 so lets use unsigned in the kernel too.

Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/v3d/v3d_submit.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 8dae3ab5f936..34ecd844f16a 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
  {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   unsigned int i;
int err;
  
  	if (!job) {

@@ -481,7 +482,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
offsets = u64_to_user_ptr(timestamp.offsets);
syncs = u64_to_user_ptr(timestamp.syncs);
  
-	for (int i = 0; i < timestamp.count; i++) {

+   for (i = 0; i < timestamp.count; i++) {


Can't we just replace this line with
for (u32 i = 0; i < timestamp.count; i++) {
or
for (unsigned int i = 0; i < timestamp.count; i++) {
?

Well, anyway, just a minor nit, this is:

Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


u32 offset, sync;
  
  		if (copy_from_user(&offset, offsets++, sizeof(offset))) {

@@ -518,6 +519,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
  {
u32 __user *syncs;
struct drm_v3d_reset_timestamp_query reset;
+   unsigned int i;
int err;
  
  	if (!job) {

@@ -543,7 +545,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
  
  	syncs = u64_to_user_ptr(reset.syncs);
  
-	for (int i = 0; i < reset.count; i++) {

+   for (i = 0; i < reset.count; i++) {
u32 sync;
  
  		job->timestamp_query.queries[i].offset = reset.offset + 8 * i;

@@ -576,7 +578,8 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
  {
u32 __user *offsets, *syncs;
struct drm_v3d_copy_timestamp_query copy;
-   int i, err;
+   unsigned int i;
+   int err;
  
  	if (!job) {

DRM_DEBUG("CPU job extension was attached to a GPU job.\n");

Re: [PATCH 09/12] drm/v3d: Move perfmon init completely into own unit

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Now that the build time dependencies on various array sizes have been
removed, we can move the perfmon init completely into its own compilation
unit and remove the hardcoded defines.
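
A compact userspace sketch of what such an init routine might look like:
the counter tables stay private to one file, and the table pointer and its
length are derived from the table itself with ARRAY_SIZE rather than from
separately maintained defines. The table contents are placeholders and the
struct and function names are hypothetical; only the version thresholds
(71 and 42) are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct counter_desc {
	const char *category;
	const char *name;
};

/* Placeholder tables; the entries are illustrative, not real counters. */
static const struct counter_desc v42_counters[] = {
	{ "CAT", "example-a" },
	{ "CAT", "example-b" },
};

static const struct counter_desc v71_counters[] = {
	{ "CAT", "example-c" },
	{ "CAT", "example-d" },
	{ "CAT", "example-e" },
};

struct perfmon_info {
	unsigned int max_counters;
	const struct counter_desc *counters;
};

/* Pick the counter table for a given version; none below version 42. */
static void perfmon_info_init(struct perfmon_info *info, int ver)
{
	assert(info != NULL);

	info->counters = NULL;
	info->max_counters = 0;

	if (ver >= 71) {
		info->counters = v71_counters;
		info->max_counters = ARRAY_SIZE(v71_counters);
	} else if (ver >= 42) {
		info->counters = v42_counters;
		info->max_counters = ARRAY_SIZE(v42_counters);
	}
}
```

Because the count comes from the array definition, adding or removing an
entry cannot drift out of sync with a hand-written constant.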

This improves on the temporary fix quickly delivered in
792d16b5375d ("drm/v3d: Move perfmon init completely into own unit").


I believe you mean:

9c3951ec27b9 ("drm/v3d: Fix perfmon build error/warning")

Currently, it references the current patch itself.

Apart from this fix, this is

Reviewed-by: Maíra Canal 

Best Regards,
- Maíra



Signed-off-by: Tvrtko Ursulin 
References: 792d16b5375d ("drm/v3d: Move perfmon init completely into own unit")
---
  drivers/gpu/drm/v3d/v3d_drv.c |  9 +---
  drivers/gpu/drm/v3d/v3d_drv.h |  6 +--
  drivers/gpu/drm/v3d/v3d_perfmon.c | 44 +++
  .../gpu/drm/v3d/v3d_performance_counters.h| 16 ---
  4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index a47f00b443d3..491c638a4d74 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -95,7 +95,7 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
args->value = 1;
return 0;
case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
-   args->value = v3d->max_counters;
+   args->value = v3d->perfmon_info.max_counters;
return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
@@ -298,12 +298,7 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
  
-	if (v3d->ver >= 71)

-   v3d->max_counters = V3D_V71_NUM_PERFCOUNTERS;
-   else if (v3d->ver >= 42)
-   v3d->max_counters = V3D_V42_NUM_PERFCOUNTERS;
-   else
-   v3d->max_counters = 0;
+   v3d_perfmon_init(v3d);
  
  	v3d->reset = devm_reset_control_get_exclusive(dev, NULL);

if (IS_ERR(v3d->reset)) {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 00fe5d993175..6d2d34cd135c 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,10 +104,7 @@ struct v3d_dev {
int ver;
bool single_irq_line;
  
-	/* Different revisions of V3D have different total number of performance

-* counters
-*/
-   unsigned int max_counters;
+   struct v3d_perfmon_info perfmon_info;
  
  	void __iomem *hub_regs;

void __iomem *core_regs[3];
@@ -568,6 +565,7 @@ int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
  
  /* v3d_perfmon.c */

+void v3d_perfmon_init(struct v3d_dev *v3d);
  void v3d_perfmon_get(struct v3d_perfmon *perfmon);
  void v3d_perfmon_put(struct v3d_perfmon *perfmon);
  void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index b7d0b02e1a95..cd7f1eedf17f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -195,6 +195,23 @@ static const struct v3d_perf_counter_desc 
v3d_v71_performance_counters[] = {
{"QPU", "QPU-stalls-other", "[QPU] Stalled qcycles waiting for any other 
reason (vary/W/Z)"},
  };
  
+void v3d_perfmon_init(struct v3d_dev *v3d)

+{
+   const struct v3d_perf_counter_desc *counters = NULL;
+   unsigned int max = 0;
+
+   if (v3d->ver >= 71) {
+   counters = v3d_v71_performance_counters;
+   max = ARRAY_SIZE(v3d_v71_performance_counters);
+   } else if (v3d->ver >= 42) {
+   counters = v3d_v42_performance_counters;
+   max = ARRAY_SIZE(v3d_v42_performance_counters);
+   }
+
+   v3d->perfmon_info.max_counters = max;
+   v3d->perfmon_info.counters = counters;
+}
+
  void v3d_perfmon_get(struct v3d_perfmon *perfmon)
  {
if (perfmon)
@@ -321,7 +338,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
  
  	/* Make sure all counters are valid. */

for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= v3d->max_counters)
+   if (req->counters[i] >= v3d->perfmon_info.max_counters)
return -EINVAL;
}
  
@@ -416,26 +433,15 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,

return -EINVAL;
}
  
-	/* Make sure that the counter ID is valid */

-   if (req->counter >= v3d->max_counters)
-   return -EINVAL;
-
-   BUILD_BUG_ON(ARRAY_SIZE(v3d_v42_performa
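
The version-based table selection done by `v3d_perfmon_init()` in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: the two counter tables are short placeholders, and `perfmon_info_init()` and `counter_id_valid()` are hypothetical names, not driver functions.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct counter_desc {
	const char *name;
};

/* Placeholder tables; the real driver tables are much longer. */
static const struct counter_desc v42_counters[] = {
	{ "counter-a" }, { "counter-b" },
};

static const struct counter_desc v71_counters[] = {
	{ "counter-a" }, { "counter-b" }, { "counter-c" },
};

struct perfmon_info {
	unsigned int max_counters;
	const struct counter_desc *counters;
};

/* Pick the table and its size from the hardware version; versions below
 * 42 get no counters at all. */
static void perfmon_info_init(struct perfmon_info *info, int ver)
{
	info->counters = NULL;
	info->max_counters = 0;

	if (ver >= 71) {
		info->counters = v71_counters;
		info->max_counters = (unsigned int)ARRAY_SIZE(v71_counters);
	} else if (ver >= 42) {
		info->counters = v42_counters;
		info->max_counters = (unsigned int)ARRAY_SIZE(v42_counters);
	}
}

/* Same kind of check as in the create ioctl: IDs past the table are invalid. */
static int counter_id_valid(const struct perfmon_info *info, unsigned int id)
{
	return id < info->max_counters;
}
```

Because the size is derived from the table with `ARRAY_SIZE()`, there is no separate hardcoded count to keep in sync with the table contents.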

Re: [PATCH 07/12] drm/v3d: Size the kperfmon_ids array at runtime

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Instead of statically reserving pessimistic space for the kperfmon_ids
array, make the userspace extension code allocate the exactly required
amount of space.

Apart from saving some memory at runtime, this also removes the need for
the V3D_MAX_PERFMONS macro whose removal will benefit further driver
cleanup.

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_drv.h|  6 +-
  drivers/gpu/drm/v3d/v3d_sched.c  |  4 +++-
  drivers/gpu/drm/v3d/v3d_submit.c | 17 +++--
  3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 38c80168da51..00fe5d993175 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,13 +351,9 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
  };
  
-/* Number of perfmons required to handle all supported performance counters */

-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
- DRM_V3D_MAX_PERF_COUNTERS)
-
  struct v3d_performance_query {
/* Performance monitor IDs for this query */
-   u32 kperfmon_ids[V3D_MAX_PERFMONS];
+   u32 *kperfmon_ids;
  
  	/* Syncobj that indicates the query availability */

struct drm_syncobj *syncobj;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 173801aa54ee..fc8730264386 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -94,8 +94,10 @@ __v3d_performance_query_info_free(struct 
v3d_performance_query_info *qinfo,
if (qinfo->queries) {
unsigned int i;
  
-		for (i = 0; i < count; i++)

+   for (i = 0; i < count; i++) {
drm_syncobj_put(qinfo->queries[i].syncobj);
+   kvfree(qinfo->queries[i].kperfmon_ids);
+   }
  
  		kvfree(qinfo->queries);

}
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 35682433f75b..8dae3ab5f936 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -668,10 +668,20 @@ copy_query_info(struct v3d_performance_query_info *qinfo,
goto error;
}
  
+		query->kperfmon_ids =

+   kvmalloc_array(nperfmons,
+  sizeof(struct v3d_performance_query *),
+  GFP_KERNEL);
+   if (!query->kperfmon_ids) {
+   err = -ENOMEM;
+   goto error;
+   }
+
ids_pointer = u64_to_user_ptr(ids);
  
  		for (j = 0; j < nperfmons; j++) {

if (get_user(id, ids_pointer++)) {
+   kvfree(query->kperfmon_ids);
err = -EFAULT;
goto error;
}
@@ -681,6 +691,7 @@ copy_query_info(struct v3d_performance_query_info *qinfo,
  
  		query->syncobj = drm_syncobj_find(fpriv, sync);

if (!query->syncobj) {
+   kvfree(query->kperfmon_ids);
err = -ENOENT;
goto error;
}
@@ -714,9 +725,6 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
  
-	if (reset.nperfmons > V3D_MAX_PERFMONS)

-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(reset.count,

@@ -762,9 +770,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
  
-	if (copy.nperfmons > V3D_MAX_PERFMONS)

-   return -EINVAL;
-
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(copy.count,
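
A minimal userspace sketch of sizing the per-query ID array at runtime, as this patch does for `kperfmon_ids`. It assumes `calloc()`/`free()` in place of `kvmalloc_array()`/`kvfree()`; `struct query`, `query_set_ids()` and `query_free_ids()` are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct query {
	uint32_t *ids;	/* sized at runtime, one entry per perfmon */
};

/* Allocate exactly nperfmons IDs for one query and copy them in.
 * sizeof(*q->ids) ties the element size to the pointer's own type. */
static int query_set_ids(struct query *q, const uint32_t *src, size_t nperfmons)
{
	q->ids = NULL;

	if (nperfmons == 0)
		return 0;	/* sketch-only guard: nothing to allocate */

	q->ids = calloc(nperfmons, sizeof(*q->ids));
	if (!q->ids)
		return -ENOMEM;

	for (size_t j = 0; j < nperfmons; j++)
		q->ids[j] = src[j];

	return 0;
}

/* Counterpart of the extra kvfree() added to the cleanup helper. */
static void query_free_ids(struct query *q)
{
	free(q->ids);
	q->ids = NULL;
}
```

Every query now owns an allocation, so each cleanup path has to release it, which is what the added `kvfree()` calls in the diff are for.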

Re: [PATCH 06/12] drm/v3d: Move part of copying of reset/copy performance extension to a helper

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

The loop which looks up the syncobj and copies the kperfmon ids is
identical so let's move it to a helper.

Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/v3d/v3d_submit.c | 148 +--
  1 file changed, 64 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b51600e236c8..35682433f75b 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -641,13 +641,63 @@ v3d_get_cpu_copy_query_results_params(struct drm_file 
*file_priv,
return err;
  }
  
+static int


Could you prefix the name of this function with `v3d_`?


+copy_query_info(struct v3d_performance_query_info *qinfo,
+   unsigned int count,
+   unsigned int nperfmons,
+   u32 __user *syncs,
+   u64 __user *kperfmon_ids,
+   struct drm_file *fpriv)


Nit: s/fpriv/file_priv


+{
+   unsigned int i, j;
+   int err;
+
+   for (i = 0; i < count; i++) {
+   struct v3d_performance_query *query = &qinfo->queries[i];
+   u32 __user *ids_pointer;
+   u32 sync, id;
+   u64 ids;
+
+   if (get_user(sync, syncs++)) {


Could you mention in the commit message that you are now using
`get_user()`?



+   err = -EFAULT;
+   goto error;
+   }
+
+   if (get_user(ids, kperfmon_ids++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   ids_pointer = u64_to_user_ptr(ids);
+
+   for (j = 0; j < nperfmons; j++) {
+   if (get_user(id, ids_pointer++)) {
+   err = -EFAULT;
+   goto error;
+   }
+
+   query->kperfmon_ids[j] = id;
+   }
+
+   query->syncobj = drm_syncobj_find(fpriv, sync);
+   if (!query->syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
+   }
+
+   return 0;
+
+error:
+   __v3d_performance_query_info_free(qinfo, i);
+   return err;
+}
+
  static int
  v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 struct drm_v3d_extension __user *ext,
 struct v3d_cpu_job *job)
  {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
int err;
  
@@ -675,50 +725,17 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,

if (!job->performance_query.queries)
return -ENOMEM;
  
-	syncs = u64_to_user_ptr(reset.syncs);

-   kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
+   err = copy_query_info(qinfo, reset.count, reset.nperfmons,
+ u64_to_user_ptr(reset.syncs),
+ u64_to_user_ptr(reset.kperfmon_ids),
+ file_priv);


I'm missing where `qinfo` is being declared.


+   if (err)
+   return err;
  
-	for (int i = 0; i < reset.count; i++) {

-   u32 sync;
-   u64 ids;
-   u32 __user *ids_pointer;
-   u32 id;
-
-   if (copy_from_user(&sync, syncs++, sizeof(sync))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   ids_pointer = u64_to_user_ptr(ids);
-
-   for (int j = 0; j < reset.nperfmons; j++) {
-   if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
-   err = -EFAULT;
-   goto error;
-   }
-
-   job->performance_query.queries[i].kperfmon_ids[j] = id;
-   }
-
-   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
-   if (!job->performance_query.queries[i].syncobj) {
-   err = -ENOENT;
-   goto error;
-   }
-   }
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
  
  	return 0;

-
-error:
-   __v3d_performance_query_info_free(qinfo, i);
-   return err;
  }
  
  static int

@@ -726,8 +743,6 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
  struct drm_v3d_extension __user *ext,
  struct v3d_cpu_job *job)
  {
-   u32 __user *syncs;
-   u64 __user *kperfmon_ids;
struct drm_v3d_copy_performance_query copy;

Re: [PATCH 05/12] drm/v3d: Validate passed in drm syncobj handles in the performance extension

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking handle was looked up successfuly or otherwise fail the
extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance 
query job"
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_submit.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 3313423080e7..b51600e236c8 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -706,6 +706,10 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
}
  
  		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;


Same comment as on the previous patch.

Best Regards,
- Maíra


+   goto error;
+   }
}
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
@@ -787,6 +791,10 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
}
  
  		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->performance_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->performance_query.count = copy.count;
job->performance_query.nperfmons = copy.nperfmons;

Re: [PATCH 04/12] drm/v3d: Validate passed in drm syncobj handles in the timestamp extension

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking handle was looked up successfuly or otherwise fail the


I believe you mean "Fix it by checking if the handle..."

Also s/successfuly/successfully


extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin 
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query 
job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_submit.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index ca1b1ad0a75c..3313423080e7 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -497,6 +497,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;


I'm not sure if err should be -ENOENT or -EINVAL, but based on other 
drivers, I believe it should be -EINVAL.


Best Regards,
- Maíra


+   goto error;
+   }
}
job->timestamp_query.count = timestamp.count;
  
@@ -550,6 +554,10 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,

}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = reset.count;
  
@@ -613,6 +621,10 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv,

}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

+   if (!job->timestamp_query.queries[i].syncobj) {
+   err = -ENOENT;
+   goto error;
+   }
}
job->timestamp_query.count = copy.count;
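
The check added by this patch follows a common shape: look up each handle, stop at the first one that does not resolve, and drop the references already taken. A minimal userspace sketch, assuming a hypothetical `object_find()` that stands in for `drm_syncobj_find()` and a plain array as the handle table. The sketch returns -ENOENT as the patch does; the review above suggests -EINVAL may be the better match with other drivers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct object {
	int refcount;
};

/* Hypothetical stand-in for drm_syncobj_find(): returns NULL for an
 * unknown handle, otherwise takes a reference and returns the object. */
static struct object *object_find(struct object *table, size_t size,
				  uint32_t handle)
{
	if (handle >= size)
		return NULL;

	table[handle].refcount++;
	return &table[handle];
}

static void object_put(struct object *obj)
{
	obj->refcount--;
}

/* Look up every handle; on the first failure, drop the references taken
 * so far and report the error. */
static int lookup_all(struct object *table, size_t size,
		      const uint32_t *handles, size_t count,
		      struct object **out)
{
	size_t i;

	for (i = 0; i < count; i++) {
		out[i] = object_find(table, size, handles[i]);
		if (!out[i])
			goto error;
	}

	return 0;

error:
	/* Entries 0..i-1 hold references; entry i is NULL. */
	while (i--)
		object_put(out[i]);
	return -ENOENT;
}
```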

Re: [PATCH 03/12] drm/v3d: Fix potential memory leak in the performance extension

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 
Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance 
query job"


Missing ) at the end of Fixes.


Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
  drivers/gpu/drm/v3d/v3d_sched.c  | 22 +-
  drivers/gpu/drm/v3d/v3d_submit.c | 40 +---
  3 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 95651c3c926f..38c80168da51 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -565,6 +565,8 @@ void v3d_mmu_remove_ptes(struct v3d_bo *bo);
  /* v3d_sched.c */
  void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
 unsigned int count);
+void __v3d_performance_query_info_free(struct v3d_performance_query_info 
*qinfo,
+  unsigned int count);


Same nits as in the previous patch.


  void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
  int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e45d3ddc6f82..173801aa54ee 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -87,20 +87,30 @@ __v3d_timestamp_query_info_free(struct 
v3d_timestamp_query_info *qinfo,
}
  }
  
+void

+__v3d_performance_query_info_free(struct v3d_performance_query_info *qinfo,
+ unsigned int count)
+{
+   if (qinfo->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(qinfo->queries[i].syncobj);
+
+   kvfree(qinfo->queries);
+   }
+}
+
  static void
  v3d_cpu_job_free(struct drm_sched_job *sched_job)
  {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_performance_query_info *performance_query = 
&job->performance_query;
  
  	__v3d_timestamp_query_info_free(&job->timestamp_query,

job->timestamp_query.count);
  
-	if (performance_query->queries) {

-   for (int i = 0; i < performance_query->count; i++)
-   drm_syncobj_put(performance_query->queries[i].syncobj);
-   kvfree(performance_query->queries);
-   }
+   __v3d_performance_query_info_free(&job->performance_query,
+ job->performance_query.count);
  
  	v3d_job_cleanup(&job->base);

  }
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 2818afdd4807..ca1b1ad0a75c 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,7 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 __user *syncs;
u64 __user *kperfmon_ids;
struct drm_v3d_reset_performance_query reset;
+   int err;
  
  	if (!job) {

DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -672,32 +673,36 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
u32 id;
  
  		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
-		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

-
if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  		ids_pointer = u64_to_user_ptr(ids);
  
  		for (int j = 0; j < reset.nperfmons; j++) {

if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
-   kvfree(job->performance_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  			job->performance_query.queries[i].kperfmon_ids[j] = id;

}
+
+   job->performance_query.queries[i].syncobj = 
drm_syncobj_find(file_priv, sync);
}
job->performance_query.count = reset.count;
job->performance_query.nperfmons = reset.nperfmons;
  
  	return 0;

+
+error > +   __v3d_p

Re: [PATCH 02/12] drm/v3d: Fix potential memory leak in the timestamp extension

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin 


This patch looks fine to me apart from two nits and a compilation issue.


Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query 
job")
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_drv.h|  2 ++
  drivers/gpu/drm/v3d/v3d_sched.c  | 22 +--
  drivers/gpu/drm/v3d/v3d_submit.c | 36 ++--
  3 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 099b962bdfde..95651c3c926f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -563,6 +563,8 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
  void v3d_mmu_remove_ptes(struct v3d_bo *bo);
  
  /* v3d_sched.c */

+void __v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+unsigned int count);


My two nits:

I believe we never used this `__` pattern in V3D and I'm not sure how
comfortable I am with introducing it now. I know it is pretty common in
drivers like i915, but I wonder how much meaning we would lose by
removing it.

Also, any chance we could use `query_info` instead of `qinfo`? This
suggestion applies to all patches in the series that use `qinfo`.


  void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
  int v3d_sched_init(struct v3d_dev *v3d);
  void v3d_sched_fini(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 03df37a3acf5..e45d3ddc6f82 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -73,18 +73,28 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
v3d_job_cleanup(job);
  }
  
+void

+__v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *qinfo,
+   unsigned int count)
+{
+   if (qinfo->queries) {
+   unsigned int i;
+
+   for (i = 0; i < count; i++)
+   drm_syncobj_put(qinfo->queries[i].syncobj);
+
+   kvfree(qinfo->queries);
+   }
+}
+
  static void
  v3d_cpu_job_free(struct drm_sched_job *sched_job)
  {
struct v3d_cpu_job *job = to_cpu_job(sched_job);
-   struct v3d_timestamp_query_info *timestamp_query = 
&job->timestamp_query;
struct v3d_performance_query_info *performance_query = 
&job->performance_query;
  
-	if (timestamp_query->queries) {

-   for (int i = 0; i < timestamp_query->count; i++)
-   drm_syncobj_put(timestamp_query->queries[i].syncobj);
-   kvfree(timestamp_query->queries);
-   }
+   __v3d_timestamp_query_info_free(&job->timestamp_query,
+   job->timestamp_query.count);
  
  	if (performance_query->queries) {

for (int i = 0; i < performance_query->count; i++)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 263fefc1d04f..2818afdd4807 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -452,6 +452,7 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
  {
u32 __user *offsets, *syncs;
struct drm_v3d_timestamp_query timestamp;
+   int err;
  
  	if (!job) {

DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
@@ -484,15 +485,15 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
u32 offset, sync;
  
  		if (copy_from_user(&offset, offsets++, sizeof(offset))) {

-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  		job->timestamp_query.queries[i].offset = offset;
  
  		if (copy_from_user(&sync, syncs++, sizeof(sync))) {

-   kvfree(job->timestamp_query.queries);
-   return -EFAULT;
+   err = -EFAULT;
+   goto error;
}
  
  		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);

@@ -500,6 +501,10 @@ v3d_get_cpu_timestamp_query_params(struct drm_file 
*file_priv,
job->timestamp_query.count = timestamp.count;
  
  	return 0;

+
+error:
+   __v3d_timestamp_query_info_free(qinfo, i);


I don't see where `qinfo` is declared in this function.


+   return err;
  }
  
  static int

@@ -509,6 +514,7 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file 
*file_priv,
  {
u32 __user *syncs;
struct drm_v3d_reset_timest
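
The compilation issue raised in this review is about scope: the error label needs both the query info pointer and the loop counter to be declared at function level. A minimal userspace sketch of that unwind shape, assuming `malloc()`/`free()` in place of the syncobj get/put calls; `struct query_info`, `query_info_fill()` and `query_info_free()` are hypothetical names, and the `fail_at` parameter only simulates a failed copy from userspace.

```c
#include <errno.h>
#include <stdlib.h>

struct query {
	int *res;	/* stands in for the per-query syncobj reference */
};

struct query_info {
	struct query *queries;
	unsigned int count;
};

/* Common cleanup helper: release the first `count` entries, then the array. */
static void query_info_free(struct query_info *query_info, unsigned int count)
{
	if (query_info->queries) {
		for (unsigned int i = 0; i < count; i++)
			free(query_info->queries[i].res);

		free(query_info->queries);
		query_info->queries = NULL;
	}
}

static int query_info_fill(struct query_info *query_info, unsigned int count,
			   unsigned int fail_at)
{
	unsigned int i;	/* function scope, so the error path can read it */
	int err;

	if (count == 0)
		return -EINVAL;	/* sketch-only guard */

	query_info->queries = calloc(count, sizeof(*query_info->queries));
	if (!query_info->queries)
		return -ENOMEM;

	for (i = 0; i < count; i++) {
		if (i == fail_at) {	/* simulated userspace copy failure */
			err = -EFAULT;
			goto error;
		}

		query_info->queries[i].res = malloc(sizeof(int));
		if (!query_info->queries[i].res) {
			err = -ENOMEM;
			goto error;
		}
	}

	query_info->count = count;
	return 0;

error:
	/* Only entries 0..i-1 were set up; the helper releases exactly those. */
	query_info_free(query_info, i);
	return err;
}
```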

Re: [PATCH 01/12] drm/v3d: Prevent out of bounds access in performance query extensions

2024-07-10 Thread Maíra Canal


On 7/10/24 10:41, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Check that the number of perfmons userspace is passing in the copy and
reset extensions is not greater than the internal kernel storage where
the ids will be copied into.

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


Fixes: bae7cb5d6800 ("drm/v3d: Create a CPU job extension for the reset performance 
query job"
Cc: Maíra Canal 
Cc: Iago Toral Quiroga 
Cc:  # v6.8+
---
  drivers/gpu/drm/v3d/v3d_submit.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 88f63d526b22..263fefc1d04f 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -637,6 +637,9 @@ v3d_get_cpu_reset_performance_params(struct drm_file 
*file_priv,
if (copy_from_user(&reset, ext, sizeof(reset)))
return -EFAULT;
  
+	if (reset.nperfmons > V3D_MAX_PERFMONS)

+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(reset.count,

@@ -708,6 +711,9 @@ v3d_get_cpu_copy_performance_query_params(struct drm_file 
*file_priv,
if (copy.pad)
return -EINVAL;
  
+	if (copy.nperfmons > V3D_MAX_PERFMONS)

+   return -EINVAL;
+
job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
  
  	job->performance_query.queries = kvmalloc_array(copy.count,
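
The bounds check in this patch can be illustrated with a small userspace sketch: a caller-supplied element count is validated against the fixed storage before anything is copied. `MAX_PERFMONS`, `struct perf_query` and `copy_ids()` are hypothetical stand-ins, and the capacity of 4 is an arbitrary value chosen for the example, not the driver's.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_PERFMONS 4	/* hypothetical capacity, not the driver's value */

struct perf_query {
	uint32_t ids[MAX_PERFMONS];
};

/* Reject counts larger than the fixed array before touching it. */
static int copy_ids(struct perf_query *q, const uint32_t *src,
		    uint32_t nperfmons)
{
	if (nperfmons > MAX_PERFMONS)
		return -EINVAL;

	for (uint32_t j = 0; j < nperfmons; j++)
		q->ids[j] = src[j];

	return 0;
}
```

Without the early return, a count of 5 would write one element past the end of `ids`.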

Re: [PATCH] drm/vkms: Remove event from vkms_output

2024-07-07 Thread Maíra Canal


On 7/3/24 13:04, Lyude Paul wrote:

While working on rvkms, I noticed that there's no code that actually uses
the drm_pending_vblank_event that's embedded in vkms_output. So, just drop
the member from the struct.

Signed-off-by: Lyude Paul 


Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_drv.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 8f5710debb1eb..5e46ea5b96dcc 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -103,7 +103,6 @@ struct vkms_output {
struct drm_writeback_connector wb_connector;
struct hrtimer vblank_hrtimer;
ktime_t period_ns;
-   struct drm_pending_vblank_event *event;
/* ordered wq for composer_work */
struct workqueue_struct *composer_workq;
/* protects concurrent access to composer */

Re: [PATCH] drm/vkms: Remove event from vkms_output

2024-07-04 Thread Maíra Canal


On 7/3/24 13:04, Lyude Paul wrote:

While working on rvkms, I noticed that there's no code that actually uses
the drm_pending_vblank_event that's embedded in vkms_output. So, just drop
the member from the struct.

Signed-off-by: Lyude Paul 


Reviewed-by: Maíra Canal 

Feel free to apply it to drm-misc/drm-misc-next! Otherwise, I'll apply
it over the weekend.

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_drv.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 8f5710debb1eb..5e46ea5b96dcc 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -103,7 +103,6 @@ struct vkms_output {
struct drm_writeback_connector wb_connector;
struct hrtimer vblank_hrtimer;
ktime_t period_ns;
-   struct drm_pending_vblank_event *event;
/* ordered wq for composer_work */
struct workqueue_struct *composer_workq;
/* protects concurrent access to composer */

Re: [PATCH v5 4/4] drm/doc: document some tracepoints as uAPI

2024-07-03 Thread Maíra Canal


Hi Pierre,

On 6/14/24 05:16, Pierre-Eric Pelloux-Prayer wrote:

This commit adds a documentation section in drm-uapi.rst about tracepoints,
and marks the events in gpu_scheduler_trace.h as stable uAPI.

The goal is to explicitly state that tools can rely on the fields,
formats and semantics of these events.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  Documentation/gpu/drm-uapi.rst| 19 
  .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 22 +++
  2 files changed, 41 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 370d820be248..78496793a8f0 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -570,3 +570,22 @@ dma-buf interoperability
  
  Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for

  information on how dma-buf is integrated and exposed within DRM.
+
+
+Trace events
+
+
+See Documentation/trace/tracepoints.rst for the tracepoints documentation.


I would write it:

"See Documentation/trace/tracepoints.rst for information about using
Linux Kernel Tracepoints."


+In the drm subsystem, some events are considered stable uAPI to avoid


Super small nit: s/drm/DRM


+breaking tools (eg: gpuvis, umr) relying on them. Stable means that fields


Super small nit:

1. s/eg:/e.g.:
2. s/gpuvis/GPUVis (maybe a URL to it?)
3. Maybe a URL to umr?



+cannot be removed, nor their formatting updated. Adding new fields is
+possible, under the normal uAPI requirements.
+
+Stable uAPI events
+--
+
+From ``drivers/gpu/drm/scheduler/gpu_scheduler_trace.h``
+


Super small nit: from the rest of the file, I see that a title was never
needed. Do we need it here?


+
+.. kernel-doc::  drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+   :doc: uAPI trace events
\ No newline at end of file
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h 
b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 0abcad26839c..63113803cdd5 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -33,6 +33,28 @@
  #define TRACE_SYSTEM gpu_scheduler
  #define TRACE_INCLUDE_FILE gpu_scheduler_trace
  
+

+/**
+ * DOC: uAPI trace events
+ *
+ * ``drm_sched_job``, ``drm_run_job``, ``drm_sched_process_job``,
+ * and ``drm_sched_job_wait_dep`` are considered stable uAPI.


Super small nit again, but I believe we should format function names
with ``foo()``, if I understood kerneldoc documentation correctly.

Apart from all those nits, I completely agree with Lucas, it is great to
see this improvement.

Acked-by: Maíra Canal 

Best Regards,
- Maíra


+ *
+ * Common trace events attributes:
+ *
+ * * ``id``- this is &drm_sched_job->id. It uniquely identifies a job
+ *   inside a &struct drm_gpu_scheduler.
+ *
+ * * ``dev``   - the dev_name() of the device running the job.
+ *
+ * * ``ring``  - the hardware ring running the job. Together with ``dev`` it
+ *   uniquely identifies where the job is going to be executed.
+ *
+ * * ``fence`` - the &dma_fence.context and the &dma_fence.seqno of
+ *   &drm_sched_fence.finished
+ *
+ */
+
  #ifndef __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
  #define __TRACE_EVENT_GPU_SCHEDULER_PRINT_FN
  /* Similar to trace_print_array_seq but for fences. */

Re: [PATCH 06/11] drm/vc4: hdmi: Handle error case of pm_runtime_resume_and_get

2024-07-02 Thread Maíra Canal


On 6/30/24 12:36, Stefan Wahren wrote:

The commit 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is
powered in detect") introduced the necessary power management handling
to avoid register access while the controller is powered down.
Unfortunately, it just prints a warning if pm_runtime_resume_and_get()
fails and proceeds anyway.

This could happen during suspend to idle. So we must assume it is unsafe
to access the HDMI register. So bail out properly.

Fixes: 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is powered in 
detect")
Signed-off-by: Stefan Wahren 


From the docs, I see that `DRM_ERROR` was deprecated in favor of
`pr_err()` (although I'm seeing some drivers using `dev_err()`). So,
after this change, this is:

Reviewed-by: Maíra Canal 

It would be nice to have a follow-up patch changing other vc4 files,
as they are using `DRM_ERROR` when returning the error from
`pm_runtime_resume_and_get()`.

Best Regards,
- Maíra


---
  drivers/gpu/drm/vc4/vc4_hdmi.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index d57c4a5948c8..b3a42b709718 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -429,6 +429,7 @@ static int vc4_hdmi_connector_detect_ctx(struct 
drm_connector *connector,
  {
struct vc4_hdmi *vc4_hdmi = connector_to_vc4_hdmi(connector);
enum drm_connector_status status = connector_status_disconnected;
+   int ret;

/*
 * NOTE: This function should really take vc4_hdmi->mutex, but
@@ -441,7 +442,11 @@ static int vc4_hdmi_connector_detect_ctx(struct 
drm_connector *connector,
 * the lock for now.
 */

-   WARN_ON(pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev));
+   ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
+   if (ret) {
+   DRM_ERROR("Failed to retain HDMI power domain: %d\n", ret);
+   return status;
+   }

if (vc4_hdmi->hpd_gpio) {
if (gpiod_get_value_cansleep(vc4_hdmi->hpd_gpio))
--
2.34.1
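
For readers following the thread, the bail-out pattern discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_hdmi`, `fake_resume_and_get()` and `fake_detect()` are hypothetical stand-ins, not vc4 code, and the stand-in for `pm_runtime_resume_and_get()` simply returns 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

enum fake_status { FAKE_DISCONNECTED, FAKE_CONNECTED };

/* Hypothetical stand-in for the HDMI controller state. */
struct fake_hdmi {
	bool can_resume;   /* does the runtime-PM resume succeed? */
	bool hpd_asserted; /* value the hotplug "register" would report */
	int reg_reads;     /* number of register accesses performed */
};

/* Stand-in for pm_runtime_resume_and_get(): 0 or a negative errno. */
static int fake_resume_and_get(const struct fake_hdmi *hdmi)
{
	return hdmi->can_resume ? 0 : -EACCES;
}

/* Detect path that returns early instead of touching registers unpowered. */
static enum fake_status fake_detect(struct fake_hdmi *hdmi)
{
	enum fake_status status = FAKE_DISCONNECTED;
	int ret;

	ret = fake_resume_and_get(hdmi);
	if (ret) {
		fprintf(stderr, "Failed to retain HDMI power domain: %d\n", ret);
		return status;
	}

	hdmi->reg_reads++; /* only reached when the controller is powered */
	if (hdmi->hpd_asserted)
		status = FAKE_CONNECTED;

	return status;
}

/* Usage: an unpowered controller is reported as disconnected, untouched. */
void example_detect(void)
{
	struct fake_hdmi off = { .can_resume = false, .hpd_asserted = true };
	struct fake_hdmi on = { .can_resume = true, .hpd_asserted = true };

	assert(fake_detect(&off) == FAKE_DISCONNECTED);
	assert(off.reg_reads == 0);
	assert(fake_detect(&on) == FAKE_CONNECTED);
	assert(on.reg_reads == 1);
}
```

The point of the sketch is the early return: with a `WARN_ON()`-only approach the register read would still happen after a failed resume.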

Re: [PATCH] MAINTAINERS: remove myself as a VKMS maintainer

2024-05-27 Thread Maíra Canal


On 5/25/24 11:26, Melissa Wen wrote:

I haven't been able to follow or review the work on the driver for some
time now and I don't see the situation improving anytime soon. I'd like
to continue being listed as a reviewer.

Signed-off-by: Melissa Wen 


Acked-by: Maíra Canal 

Thanks for all the good work you put into VKMS in the last couple of
years!

Best Regards,
- Maíra


---
  MAINTAINERS | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7d735037a383..79fe536355b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7027,10 +7027,10 @@ F:  drivers/gpu/drm/udl/
  
  DRM DRIVER FOR VIRTUAL KERNEL MODESETTING (VKMS)

  M:Rodrigo Siqueira 
-M: Melissa Wen 
  M:Maíra Canal 
  R:Haneen Mohammed 
  R:Daniel Vetter 
+R: Melissa Wen 
  L:dri-devel@lists.freedesktop.org
  S:Maintained
  T:git https://gitlab.freedesktop.org/drm/misc/kernel.git

Re: [PATCH v2 0/6] drm/v3d: Improve Performance Counters handling

2024-05-21 Thread Maíra Canal


Hi Jani,

On 5/21/24 08:07, Jani Nikula wrote:

On Mon, 20 May 2024, Maíra Canal  wrote:

On 5/12/24 19:23, Maíra Canal wrote:

Maíra Canal (6):
drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
drm/v3d: Different V3D versions can have different number of perfcnt
drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
drm/v3d: Create new IOCTL to expose performance counters information
drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
drm/v3d: Deprecate the use of the Performance Counters enum
   drivers/gpu/drm/v3d/v3d_drv.c |  11 +
   drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
   drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
   .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
   drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
   include/uapi/drm/v3d_drm.h|  48 
   6 files changed, 316 insertions(+), 3 deletions(-)
   create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h



Applied to drm-misc/drm-misc-next!


What compiler do you use? I'm hitting the same as kernel test robot [1]
with arm-linux-gnueabihf-gcc 12.2.0.


I use clang version 17.0.6.



In general, I don't think it's a great idea to put arrays in headers,
and then include it everywhere via v3d_drv.h. You're not just relying on
the compiler to optimize it away in compilation units where it's not
referenced (likely to happen), but also for the linker to deduplicate
rodata (possible, but I'm not sure that it will happen).

I think you need to move the arrays to a .c file, and then either a) add
interfaces to access the arrays, or b) declare the arrays and make them
global. For the latter you also need to figure out how to expose the
size.


I'll write a patch to fix it. Sorry for the disturbance, I didn't notice
it with clang.

Best Regards,
- Maíra



BR,
Jani.


[1] https://lore.kernel.org/r/202405211137.huefklkg-...@intel.com
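
For illustration, option (a) from the review above (keep the array in a single .c file and expose it through accessors) might look roughly like the sketch below. All names here (`perf_counter_desc`, `perf_counter_count()`, `perf_counter_get()`) are hypothetical and chosen for the example; this is not the follow-up patch, and the array holds only two entries to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* --- What would live in the header: type and declarations only --- */
struct perf_counter_desc {
	char category[32];
	char name[64];
	char description[256];
};

size_t perf_counter_count(void);
const struct perf_counter_desc *perf_counter_get(size_t idx);

/* --- What would live in one .c file: the single definition --- */
static const struct perf_counter_desc counters[] = {
	{"CORE", "cycle-count", "[CORE] Cycle counter"},
	{"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"},
};

/* The size is exposed through a function, so no ARRAY_SIZE() in headers. */
size_t perf_counter_count(void)
{
	return sizeof(counters) / sizeof(counters[0]);
}

/* Returns NULL for an out-of-range index instead of reading past the end. */
const struct perf_counter_desc *perf_counter_get(size_t idx)
{
	if (idx >= perf_counter_count())
		return NULL;
	return &counters[idx];
}

/* Usage: callers never see the array, only the accessors. */
void example_lookup(void)
{
	assert(perf_counter_count() == 2);
	assert(perf_counter_get(0) != NULL);
	assert(perf_counter_get(perf_counter_count()) == NULL);
}
```

Option (b) would instead declare the array `extern` in the header and export its length through a separate variable or macro, since `sizeof` cannot see the size of an incomplete `extern` array.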

Re: [PATCH v2 0/6] drm/v3d: Improve Performance Counters handling

2024-05-20 Thread Maíra Canal


On 5/12/24 19:23, Maíra Canal wrote:

Maíra Canal (6):
   drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
   drm/v3d: Different V3D versions can have different number of perfcnt
   drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
   drm/v3d: Create new IOCTL to expose performance counters information
   drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
   drm/v3d: Deprecate the use of the Performance Counters enum
  drivers/gpu/drm/v3d/v3d_drv.c |  11 +
  drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
  drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
  .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
  drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
  include/uapi/drm/v3d_drm.h|  48 
  6 files changed, 316 insertions(+), 3 deletions(-)
  create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h



Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra

Re: [PATCH v7 11/17] drm/vkms: Remove useless drm_rotation_simplify

2024-05-16 Thread Maíra Canal


Hi Louis,

On 5/13/24 04:50, Louis Chauvet wrote:

As all the rotation are now supported by VKMS, this simplification does
not make sense anymore, so remove it.

Signed-off-by: Louis Chauvet 


I'd like to push all commits up to this point to drm-misc-next. Do you
see a problem with it? Reason: I'd like Melissa to take a look at the
YUV patches and patches 1 to 11 fix several composition errors.

Let me know your thoughts about it.

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_plane.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 8875bed76410..5a028ee96c91 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -115,12 +115,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
frame_info->fb = fb;
memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
drm_framebuffer_get(frame_info->fb);
-   frame_info->rotation = drm_rotation_simplify(new_state->rotation, DRM_MODE_ROTATE_0 |
-                                                DRM_MODE_ROTATE_90 |
-                                                DRM_MODE_ROTATE_270 |
-                                                DRM_MODE_REFLECT_X |
-                                                DRM_MODE_REFLECT_Y);
-
+   frame_info->rotation = new_state->rotation;
  
  	vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt);

  }

[PATCH v2 6/6] drm/v3d: Deprecate the use of the Performance Counters enum

2024-05-12 Thread Maíra Canal

The Performance Counters enum used to identify the index of each
performance counter and provide the total number of performance
counters (V3D_PERFCNT_NUM). However, this enum is only valid for V3D 4.2,
not for V3D 7.1.

As we implemented a new flexible structure to retrieve performance
counters information, we can deprecate this enum.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 include/uapi/drm/v3d_drm.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 0860ddb3d0b6..87fc5bb0a61e 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -603,6 +603,16 @@ struct drm_v3d_submit_cpu {
__u64 extensions;
 };
 
+/* The performance counters index represented by this enum are deprecated and
+ * must no longer be used. These counters are only valid for V3D 4.2.
+ *
+ * In order to check for performance counter information,
+ * use DRM_IOCTL_V3D_PERFMON_GET_COUNTER.
+ *
+ * Don't use V3D_PERFCNT_NUM to retrieve the maximum number of performance
+ * counters. You should use DRM_IOCTL_V3D_GET_PARAM with the following
+ * parameter: DRM_V3D_PARAM_MAX_PERF_COUNTERS.
+ */
 enum {
V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS,
V3D_PERFCNT_FEP_VALID_PRIMS,
-- 
2.44.0

[PATCH v2 5/6] drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM

2024-05-12 Thread Maíra Canal

V3D_PERFCNT_NUM represents the maximum number of performance counters
for V3D 4.2, but not for V3D 7.1. This means that, if we use
V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1.

Therefore, use the number of performance counters on V3D 7.1 as the
maximum number of counters. This will allow us to create arrays on the
stack with reasonable size. Note that userspace must use the value
provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.h   | 5 -
 drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 44cfddedebde..556cbb400ba0 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,8 +351,11 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
+/* Maximum number of performance counters supported by any version of V3D */
+#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters)
+
 /* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_PERFCNT_NUM, \
+#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
  DRM_V3D_MAX_PERF_COUNTERS)
 
 struct v3d_performance_query {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 7cd8c335cd9b..03df37a3acf5 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -490,7 +490,7 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, 
void *data, u32 quer
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_PERFCNT_NUM];
+   u64 counter_values[V3D_MAX_COUNTERS];
 
for (int i = 0; i < performance_query->nperfmons; i++) {
perfmon = v3d_perfmon_find(v3d_priv,
-- 
2.44.0
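
To make the sizing argument above concrete, here is a small userspace sketch of the arithmetic. The counter totals (87 and 93) come from this series; the per-perfmon limit of 32 is an assumption made for the example (it stands in for DRM_V3D_MAX_PERF_COUNTERS), and `perfmons_needed()` and `index_fits()` are hypothetical helpers.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define COUNTERS_V42 87u          /* V3D 4.2, per this series */
#define COUNTERS_V71 93u          /* V3D 7.1, per this series */
#define COUNTERS_PER_PERFMON 32u  /* assumed per-perfmon limit */

/* Number of perfmons required to cover ncounters counters. */
unsigned int perfmons_needed(unsigned int ncounters)
{
	return DIV_ROUND_UP(ncounters, COUNTERS_PER_PERFMON);
}

/* Nonzero when a counter index fits in a buffer with "capacity" slots. */
int index_fits(unsigned int index, unsigned int capacity)
{
	return index < capacity;
}

/* Usage: a buffer sized for 4.2 is too small for the last 7.1 counter. */
void example_sizes(void)
{
	assert(perfmons_needed(COUNTERS_V42) == 3);
	assert(perfmons_needed(COUNTERS_V71) == 3);
	assert(!index_fits(COUNTERS_V71 - 1, COUNTERS_V42));
	assert(index_fits(COUNTERS_V71 - 1, COUNTERS_V71));
}
```

Under that assumption the number of perfmons stays the same for both versions; what changes is the size needed for a per-counter array, which is why sizing it by the larger total avoids the out-of-bounds access.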

[PATCH v2 4/6] drm/v3d: Create new IOCTL to expose performance counters information

2024-05-12 Thread Maíra Canal

Userspace usually needs some information about the performance counters
available. Although we could replicate this information in the kernel
and user-space, let's use the kernel as the "single source of truth" to
avoid issues in the future (e.g. list of performance counters is updated
in user-space, but not in the kernel, generating invalid requests).

Therefore, create a new IOCTL to expose the performance counters
information, that is name, category, and description.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.c |  1 +
 drivers/gpu/drm/v3d/v3d_drv.h |  2 ++
 drivers/gpu/drm/v3d/v3d_perfmon.c | 33 +++
 include/uapi/drm/v3d_drm.h| 37 +++
 4 files changed, 73 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index d2c1d5053132..f7477488b1cc 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -211,6 +211,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY, v3d_perfmon_destroy_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH),
+   DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_COUNTER, v3d_perfmon_get_counter_ioctl, DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver v3d_drm_driver = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index bd1e38f7d10a..44cfddedebde 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -582,6 +582,8 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void 
*data,
  struct drm_file *file_priv);
 int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
 
 /* v3d_sysfs.c */
 int v3d_sysfs_init(struct device *dev);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index f268d9466c0f..73e2bb8bdb7f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -217,3 +217,36 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, 
void *data,
 
return ret;
 }
+
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+   struct drm_v3d_perfmon_get_counter *req = data;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
+   const struct v3d_perf_counter_desc *counter;
+
+   for (int i = 0; i < ARRAY_SIZE(req->reserved); i++) {
+   if (req->reserved[i] != 0)
+   return -EINVAL;
+   }
+
+   /* Make sure that the counter ID is valid */
+   if (req->counter >= v3d->max_counters)
+   return -EINVAL;
+
+   if (v3d->ver >= 71) {
+   WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v71_performance_counters));
+   counter = &v3d_v71_performance_counters[req->counter];
+   } else if (v3d->ver >= 42) {
+   WARN_ON(v3d->max_counters != ARRAY_SIZE(v3d_v42_performance_counters));
+   counter = &v3d_v42_performance_counters[req->counter];
+   } else {
+   return -EOPNOTSUPP;
+   }
+
+   strscpy(req->name, counter->name, sizeof(req->name));
+   strscpy(req->category, counter->category, sizeof(req->category));
+   strscpy(req->description, counter->description, sizeof(req->description));
+
+   return 0;
+}
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 215b01bb69c3..0860ddb3d0b6 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -42,6 +42,7 @@ extern "C" {
 #define DRM_V3D_PERFMON_DESTROY   0x09
 #define DRM_V3D_PERFMON_GET_VALUES0x0a
 #define DRM_V3D_SUBMIT_CPU0x0b
+#define DRM_V3D_PERFMON_GET_COUNTER   0x0c
 
 #define DRM_IOCTL_V3D_SUBMIT_CL   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
 #define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
@@ -58,6 +59,8 @@ extern "C" {
 #define DRM_IOCTL_V3D_PERFMON_GET_VALUES  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_PERFMON_GET_VALUES, \
   struct 
drm_v3d_perfmon_get_values)
 #define DRM_IOCTL_V3D_SUBMIT_CPU  DRM_IOW(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu)
+#define DRM_IOCTL_V3D_PERFMON_GET_COUNTER DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_PERFMON_GET_COUNTER, \

[PATCH v2 3/6] drm/v3d: Create a new V3D parameter for the maximum number of perfcnt

2024-05-12 Thread Maíra Canal

The maximum number of performance counters can change from version to
version and it's important for userspace to know this value, as it needs
to use the counters for performance queries. Therefore, expose the
maximum number of performance counters to userspace as a parameter.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 3 +++
 include/uapi/drm/v3d_drm.h| 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 6b9dd26df9fe..d2c1d5053132 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -94,6 +94,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE:
args->value = 1;
return 0;
+   case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
+   args->value = v3d->max_counters;
+   return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
return -EINVAL;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index dce1835eced4..215b01bb69c3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -286,6 +286,7 @@ enum drm_v3d_param {
DRM_V3D_PARAM_SUPPORTS_PERFMON,
DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE,
+   DRM_V3D_PARAM_MAX_PERF_COUNTERS,
 };
 
 struct drm_v3d_get_param {
-- 
2.44.0

[PATCH v2 2/6] drm/v3d: Different V3D versions can have different number of perfcnt

2024-05-12 Thread Maíra Canal

Currently, even though V3D 7.1 has 93 performance counters, it is not
possible to create a perfmon with a counter index of 87 or higher, as
`v3d_perfmon_create_ioctl()` treats any index at or above
V3D_PERFCNT_NUM (87, the V3D 4.2 total) as invalid.

Therefore, create a device variable to expose the maximum
number of counters for a given V3D version and make
`v3d_perfmon_create_ioctl()` check this variable.

This commit fixes CTS failures in the performance queries tests
`dEQP-VK.query_pool.performance_query.*` [1]

Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81 [1]
Fixes: 6fd9487147c4 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device")
Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 7 +++
 drivers/gpu/drm/v3d/v3d_drv.h | 5 +
 drivers/gpu/drm/v3d/v3d_perfmon.c | 3 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 28b7ddce7747..6b9dd26df9fe 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -294,6 +294,13 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
+   if (v3d->ver >= 71)
+   v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters);
+   else if (v3d->ver >= 42)
+   v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters);
+   else
+   v3d->max_counters = 0;
+
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
ret = PTR_ERR(v3d->reset);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 671375a3bb66..bd1e38f7d10a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,6 +104,11 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
+   /* Different revisions of V3D have different total number of performance
+* counters
+*/
+   unsigned int max_counters;
+
void __iomem *hub_regs;
void __iomem *core_regs[3];
void __iomem *bridge_regs;
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index e1be7368b87d..f268d9466c0f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -123,6 +123,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 {
struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
struct drm_v3d_perfmon_create *req = data;
+   struct v3d_dev *v3d = v3d_priv->v3d;
struct v3d_perfmon *perfmon;
unsigned int i;
int ret;
@@ -134,7 +135,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= V3D_PERFCNT_NUM)
+   if (req->counters[i] >= v3d->max_counters)
return -EINVAL;
}
 
-- 
2.44.0
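
As a rough userspace illustration of the check described above, the sketch below picks the counter limit from the version number and validates a list of requested counter indices against it. The thresholds (42 and 71) and totals (87 and 93) are taken from this patch; `max_counters_for()` and `validate_counters()` are hypothetical names, and the index type is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define COUNTERS_V42 87u
#define COUNTERS_V71 93u

/* Per-version limit, mirroring the probe-time selection in the patch. */
unsigned int max_counters_for(int ver)
{
	if (ver >= 71)
		return COUNTERS_V71;
	if (ver >= 42)
		return COUNTERS_V42;
	return 0; /* no performance counters known for older versions */
}

/* Returns 0 if every requested index is valid for ver, -EINVAL otherwise. */
int validate_counters(int ver, const unsigned char *counters, size_t n)
{
	unsigned int max = max_counters_for(ver);

	for (size_t i = 0; i < n; i++) {
		if (counters[i] >= max)
			return -EINVAL;
	}
	return 0;
}

/* Usage: index 90 is rejected on 4.2 but accepted on 7.1. */
void example_validate(void)
{
	const unsigned char req[] = { 0, 86, 90 };

	assert(validate_counters(42, req, 3) == -EINVAL);
	assert(validate_counters(71, req, 3) == 0);
}
```

With a fixed limit of 87, the second call would also fail, which matches the CTS failures mentioned in the commit message.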

[PATCH v2 0/6] drm/v3d: Improve Performance Counters handling

2024-05-12 Thread Maíra Canal

This series addresses two issues with Performance Counters on V3D:

1. Update the number of Performance Counters for V3D 7.1 

V3D 7.1 has 93 performance counters, while V3D 4.2 has only 87. Although the
series [1] enabled support for V3D 7.1, it didn’t replace the maximum number of
performance counters. This led to errors in user space as the Vulkan driver
updated the maximum number of performance counters, but the kernel didn’t. 

Currently, the user space can request values for performance counters that
are greater than 87 and the kernel will return an error instead of the values.
That’s why `dEQP-VK.query_pool.performance_query.*` currently fails on Mesa
CI [2]. This series intends to fix those `dEQP-VK.query_pool.performance_query.*`
failures.

2. Make the kernel able to provide the Performance Counter descriptions

Although all the management of the Performance Monitors is done through IOCTLs,
which means that the code is in the kernel, the performance counter descriptions
are in Mesa. This means two things: (#1) only Mesa has access to the
descriptions and (#2) we can have inconsistencies between the information
provided by Mesa and the kernel, as seen in the first issue addressed by
this series.

To minimize the risk of inconsistencies, this series proposes to use the kernel
as a “single source of truth”. Therefore, if there are any changes to the
performance monitors, all the changes must be done only in the kernel. This
means that all information about the maximum number of performance counters and
all the descriptions will now be retrieved from the kernel. 

This series is coupled with a Mesa series [3] that enabled the use of the new
IOCTL. I appreciate any feedback from both the kernel and Mesa implementations.

[1] https://lore.kernel.org/dri-devel/20231031073859.25298-1-ito...@igalia.com/
[2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81
[3] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/fix-perfcnt

Best Regards,
- Maíra Canal

---

v1 -> v2: https://lore.kernel.org/dri-devel/20240508143306.2435304-2-mca...@igalia.com/T/

* [5/6] s/DRM_V3D_PARAM_V3D_MAX_PERF_COUNTERS/DRM_V3D_PARAM_MAX_PERF_COUNTERS (Iago Toral)
* [6/6] Include a reference to the new DRM_V3D_PARAM_MAX_PERF_COUNTERS param (Iago Toral)
* Add Iago's R-b (Iago Toral)

Maíra Canal (6):
  drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
  drm/v3d: Different V3D versions can have different number of perfcnt
  drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
  drm/v3d: Create new IOCTL to expose performance counters information
  drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
  drm/v3d: Deprecate the use of the Performance Counters enum

 drivers/gpu/drm/v3d/v3d_drv.c |  11 +
 drivers/gpu/drm/v3d/v3d_drv.h |  14 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c |  36 ++-
 .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
 drivers/gpu/drm/v3d/v3d_sched.c   |   2 +-
 include/uapi/drm/v3d_drm.h|  48 
 6 files changed, 316 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

-- 
2.44.0
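
The "single source of truth" flow from point 2 of the cover letter can be sketched as a userspace mock: the caller first asks for the maximum number of counters, then queries each description by index. Everything here is hypothetical and only for illustration: `mock_get_max_counters()` and `mock_get_counter()` stand in for the parameter query and the new IOCTL, the `get_counter_req` layout is assumed rather than copied from the uAPI header, and the table has three entries instead of the real list.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct counter_desc {
	const char *category, *name, *description;
};

/* "Kernel side": the only copy of the descriptions. */
static const struct counter_desc table[] = {
	{"CORE", "cycle-count", "[CORE] Cycle counter"},
	{"TMU", "TMU-active-cycles", "[TMU] Active cycles"},
	{"TMU", "TMU-stalled-cycles", "[TMU] Stalled cycles"},
};
#define TABLE_LEN (sizeof(table) / sizeof(table[0]))

/* Assumed request layout: index in, strings out, reserved must be zero. */
struct get_counter_req {
	unsigned char counter;
	char name[64];
	char category[32];
	char description[256];
	unsigned char reserved[7];
};

unsigned int mock_get_max_counters(void)
{
	return TABLE_LEN;
}

int mock_get_counter(struct get_counter_req *req)
{
	for (size_t i = 0; i < sizeof(req->reserved); i++) {
		if (req->reserved[i] != 0)
			return -EINVAL;
	}
	if (req->counter >= TABLE_LEN)
		return -EINVAL;

	/* snprintf() bounds the copy, much like strscpy() does in-kernel. */
	snprintf(req->name, sizeof(req->name), "%s", table[req->counter].name);
	snprintf(req->category, sizeof(req->category), "%s",
		 table[req->counter].category);
	snprintf(req->description, sizeof(req->description), "%s",
		 table[req->counter].description);
	return 0;
}

/* "User side": enumerate counters without a private copy of the list. */
unsigned int list_counters(void)
{
	unsigned int max = mock_get_max_counters();
	unsigned int listed = 0;

	for (unsigned int i = 0; i < max; i++) {
		struct get_counter_req req = { .counter = (unsigned char)i };

		if (mock_get_counter(&req) != 0)
			break;
		printf("%u: [%s] %s: %s\n", i, req.category, req.name,
		       req.description);
		listed++;
	}
	return listed;
}

void example_enumerate(void)
{
	struct get_counter_req req = { .counter = 1 };

	assert(list_counters() == mock_get_max_counters());
	assert(mock_get_counter(&req) == 0);
	assert(strcmp(req.name, "TMU-active-cycles") == 0);
}
```

Because the caller derives both the count and the strings from the same provider, adding a counter on the "kernel side" needs no matching change on the "user side".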

[PATCH v2 1/6] drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1

2024-05-12 Thread Maíra Canal

Add name, category and description for each one of the 93 performance
counters available on V3D.

Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93.
Therefore, there are two performance counters arrays. The index of the
performance counter for each V3D version is represented by its position
on the array.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_drv.h |   2 +
 .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
 2 files changed, 210 insertions(+)
 create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a2c516fe6d79..671375a3bb66 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+#include "v3d_performance_counters.h"
+
 #include "uapi/drm/v3d_drm.h"
 
 struct clk;
diff --git a/drivers/gpu/drm/v3d/v3d_performance_counters.h b/drivers/gpu/drm/v3d/v3d_performance_counters.h
new file mode 100644
index ..72822205ebdc
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_performance_counters.h
@@ -0,0 +1,208 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright (C) 2024 Raspberry Pi
+ */
+#ifndef V3D_PERFORMANCE_COUNTERS_H
+#define V3D_PERFORMANCE_COUNTERS_H
+
+/* Holds a description of a given performance counter. The index of performance
+ * counter is given by the array on v3d_performance_counter.h
+ */
+struct v3d_perf_counter_desc {
+   /* Category of the counter */
+   char category[32];
+
+   /* Name of the counter */
+   char name[64];
+
+   /* Description of the counter */
+   char description[256];
+};
+
+static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = {
+   {"CORE", "cycle-count", "[CORE] Cycle counter"},
+   {"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"},
+   {"CLE", "CLE-bin-thread-active-cycles", "[CLE] Bin thread active cycles"},
+   {"CLE", "CLE-render-thread-active-cycles", "[CLE] Render thread active cycles"},
+   {"CORE", "compute-active-cycles", "[CORE] Compute active cycles"},
+   {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid primitives that result in no rendered pixels, for all rendered tiles"},
+   {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives for all rendered tiles (primitives may be counted in more than one tile)"},
+   {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"},
+   {"FEP", "FEP-valid-quads", "[FEP] Valid quads"},
+   {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no pixels passing the stencil test"},
+   {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with no pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid pixels written to colour buffer"},
+   {"TLB", "TLB-partial-quads-written-to-color-buffer", "[TLB] Partial quads written to the colour buffer"},
+   {"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need clipping"},
+   {"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives discarded by being outside the viewport"},
+   {"PTB", "PTB-primitives-binned", "[PTB] Total primitives binned"},
+   {"PTB", "PTB-primitives-discarded-reversed", "[PTB] Primitives that are discarded because they are reversed"},
+   {"QPU", "QPU-total-instr-cache-hit", "[QPU] Total instruction cache hits for all slices"},
+   {"QPU", "QPU-total-instr-cache-miss", "[QPU] Total instruction cache misses for all slices"},
+   {"QPU", "QPU-total-uniform-cache-hit", "[QPU] Total uniforms cache hits for all slices"},
+   {"QPU", "QPU-total-uniform-cache-miss", "[QPU] Total uniforms cache misses for all slices"},
+   {"TMU", "TMU-active-cycles", "[TMU] Active cycles"},
+   {"TMU", "TMU-stalled-cycles", "[TMU] Stalled cycles"},
+   {"TMU", "TMU-total-text-quads-access", "[TMU] Total texture cache access

[PATCH 0/6] drm/v3d: Improve Performance Counters handling

2024-05-08 Thread Maíra Canal

This series addresses two issues with Performance Counters on V3D:

1. Update the number of Performance Counters for V3D 7.1

V3D 7.1 has 93 performance counters, while V3D 4.2 has only 87. Although the
series [1] enabled support for V3D 7.1, it didn’t replace the maximum number of
performance counters. This led to errors in user space as the Vulkan driver
updated the maximum number of performance counters, but the kernel didn’t.

Currently, the user space can request values for performance counters that
are greater than 87 and the kernel will return an error instead of the values.
That’s why `dEQP-VK.query_pool.performance_query.*` currently fails on Mesa
CI [2]. This series intends to fix those `dEQP-VK.query_pool.performance_query.*`
failures.

2. Make the kernel able to provide the Performance Counter descriptions

Although all the management of the Performance Monitors is done through IOCTLs,
which means that the code is in the kernel, the performance counter descriptions
are in Mesa. This means two things: (#1) only Mesa has access to the
descriptions and (#2) we can have inconsistencies between the information
provided by Mesa and the kernel, as seen in the first issue addressed by
this series.

To minimize the risk of inconsistencies, this series proposes to use the kernel
as a “single source of truth”. Therefore, if there are any changes to the
performance monitors, all the changes must be done only in the kernel. This
means that all information about the maximum number of performance counters and
all the descriptions will now be retrieved from the kernel.

This series is coupled with a Mesa series [3] that enabled the use of the new
IOCTL. I appreciate any feedback from both the kernel and Mesa implementations.

[1] https://lore.kernel.org/dri-devel/20231031073859.25298-1-ito...@igalia.com/
[2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81
[3] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/fix-perfcnt

Best Regards,
- Maíra Canal

Maíra Canal (6):
drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
drm/v3d: Different V3D versions can have different number of perfcnt
drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
drm/v3d: Create new IOCTL to expose performance counters information
drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
drm/v3d: Deprecate the use of the Performance Counters enum

drivers/gpu/drm/v3d/v3d_drv.c | 11 +
drivers/gpu/drm/v3d/v3d_drv.h | 14 +-
drivers/gpu/drm/v3d/v3d_perfmon.c | 36 ++-
.../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
include/uapi/drm/v3d_drm.h| 44
6 files changed, 312 insertions(+), 3 deletions(-)
create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

--
2.44.0

[PATCH 6/6] drm/v3d: Deprecate the use of the Performance Counters enum

2024-05-08 Thread Maíra Canal

The Performance Counters enum used to identify the index of each
performance counter and provide the total number of performance
counters (V3D_PERFCNT_NUM). However, this enum is only valid for V3D 4.2,
not for V3D 7.1.

As we implemented a new flexible structure to retrieve performance
counters information, we can deprecate this enum.

Signed-off-by: Maíra Canal 
---
 include/uapi/drm/v3d_drm.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 0860ddb3d0b6..706b4dea1c45 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -603,6 +603,12 @@ struct drm_v3d_submit_cpu {
__u64 extensions;
 };
 
+/* The performance counters index represented by this enum are deprecated and
+ * must no longer be used. These counters are only valid for V3D 4.2.
+ *
+ * In order to check for performance counter information,
+ * use DRM_IOCTL_V3D_PERFMON_GET_COUNTER.
+ */
 enum {
V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS,
V3D_PERFCNT_FEP_VALID_PRIMS,
-- 
2.44.0

[PATCH 2/6] drm/v3d: Different V3D versions can have different number of perfcnt

2024-05-08 Thread Maíra Canal

Currently, even though V3D 7.1 has 93 performance counters, it is not
possible to create a perfmon with a counter index of 87 or higher, as
`v3d_perfmon_create_ioctl()` treats any index at or above
V3D_PERFCNT_NUM (87, the V3D 4.2 total) as invalid.

Therefore, create a device variable to expose the maximum
number of counters for a given V3D version and make
`v3d_perfmon_create_ioctl()` check this variable.

This commit fixes CTS failures in the performance queries tests
(dEQP-VK.query_pool.performance_query.*) [1]

Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/ea1f09a5f21839f4f3b93610b58507c4bd9b9b81 [1]
Fixes: 6fd9487147c4 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device")
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 7 +++
 drivers/gpu/drm/v3d/v3d_drv.h | 5 +
 drivers/gpu/drm/v3d/v3d_perfmon.c | 3 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 28b7ddce7747..6b9dd26df9fe 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -294,6 +294,13 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
v3d->cores = V3D_GET_FIELD(ident1, V3D_HUB_IDENT1_NCORES);
WARN_ON(v3d->cores > 1); /* multicore not yet implemented */
 
+   if (v3d->ver >= 71)
+   v3d->max_counters = ARRAY_SIZE(v3d_v71_performance_counters);
+   else if (v3d->ver >= 42)
+   v3d->max_counters = ARRAY_SIZE(v3d_v42_performance_counters);
+   else
+   v3d->max_counters = 0;
+
v3d->reset = devm_reset_control_get_exclusive(dev, NULL);
if (IS_ERR(v3d->reset)) {
ret = PTR_ERR(v3d->reset);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 671375a3bb66..bd1e38f7d10a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -104,6 +104,11 @@ struct v3d_dev {
int ver;
bool single_irq_line;
 
+   /* Different revisions of V3D have different total number of performance
+* counters
+*/
+   unsigned int max_counters;
+
void __iomem *hub_regs;
void __iomem *core_regs[3];
void __iomem *bridge_regs;
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index e1be7368b87d..f268d9466c0f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -123,6 +123,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 {
struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
struct drm_v3d_perfmon_create *req = data;
+   struct v3d_dev *v3d = v3d_priv->v3d;
struct v3d_perfmon *perfmon;
unsigned int i;
int ret;
@@ -134,7 +135,7 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void 
*data,
 
/* Make sure all counters are valid. */
for (i = 0; i < req->ncounters; i++) {
-   if (req->counters[i] >= V3D_PERFCNT_NUM)
+   if (req->counters[i] >= v3d->max_counters)
return -EINVAL;
}
 
-- 
2.44.0

[PATCH 3/6] drm/v3d: Create a new V3D parameter for the maximum number of perfcnt

2024-05-08 Thread Maíra Canal

The maximum number of performance counters can change from version to
version and it's important for userspace to know this value, as it needs
to use the counters for performance queries. Therefore, expose the
maximum number of performance counters to userspace as a parameter.
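
As a rough illustration of what the parameter reports, the per-version selection can be modeled as below. This is a standalone sketch and not the driver code: the totals 87 and 93 are the V3D 4.2 and V3D 7.1 counts stated elsewhere in this series, and `model_max_counters()` and `model_counter_is_valid()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counter totals stated in this series: 87 on V3D 4.2, 93 on V3D 7.1. */
#define MODEL_V42_NUM_COUNTERS 87u
#define MODEL_V71_NUM_COUNTERS 93u

static_assert(MODEL_V42_NUM_COUNTERS <= MODEL_V71_NUM_COUNTERS,
	      "the V3D 7.1 table is assumed to be the largest one");

/* Hypothetical helper: mirrors the version check done at probe time. */
unsigned int model_max_counters(int ver)
{
	if (ver >= 71)
		return MODEL_V71_NUM_COUNTERS;
	if (ver >= 42)
		return MODEL_V42_NUM_COUNTERS;
	return 0; /* older versions expose no counters */
}

/* Hypothetical helper: a counter ID is valid only below the maximum. */
bool model_counter_is_valid(int ver, uint32_t counter)
{
	return counter < model_max_counters(ver);
}
```

With this model, counter ID 87 would be rejected on V3D 4.2 but accepted on V3D 7.1, which is why userspace needs the value for the hardware it is running on.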

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c | 3 +++
 include/uapi/drm/v3d_drm.h| 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 6b9dd26df9fe..d2c1d5053132 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -94,6 +94,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void 
*data,
case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE:
args->value = 1;
return 0;
+   case DRM_V3D_PARAM_MAX_PERF_COUNTERS:
+   args->value = v3d->max_counters;
+   return 0;
default:
DRM_DEBUG("Unknown parameter %d\n", args->param);
return -EINVAL;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index dce1835eced4..215b01bb69c3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -286,6 +286,7 @@ enum drm_v3d_param {
DRM_V3D_PARAM_SUPPORTS_PERFMON,
DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE,
+   DRM_V3D_PARAM_MAX_PERF_COUNTERS,
 };
 
 struct drm_v3d_get_param {
-- 
2.44.0

[PATCH 5/6] drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM

2024-05-08 Thread Maíra Canal

V3D_PERFCNT_NUM represents the maximum number of performance counters
for V3D 4.2, but not for V3D 7.1. This means that, if we use
V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1.

Therefore, use the number of performance counters on V3D 7.1 as the
maximum number of counters. This will allow us to create arrays on the
stack with reasonable size. Note that userspace must use the value
provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS.
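
To make the sizing concrete, the arithmetic behind V3D_MAX_PERFMONS can be sketched as below. This is only an illustration: `MODEL_COUNTERS_PER_PERFMON` assumes that DRM_V3D_MAX_PERF_COUNTERS is 32, and `model_perfmons_needed()` is a hypothetical helper, not something this patch adds.

```c
#include <assert.h>

/* Same rounding-up division as the kernel's DIV_ROUND_UP() macro. */
#define MODEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed value of DRM_V3D_MAX_PERF_COUNTERS (counters per perfmon). */
#define MODEL_COUNTERS_PER_PERFMON 32u

static_assert(MODEL_COUNTERS_PER_PERFMON > 0, "divisor must be non-zero");

/* Hypothetical helper: perfmons needed to cover num_counters counters. */
unsigned int model_perfmons_needed(unsigned int num_counters)
{
	return MODEL_DIV_ROUND_UP(num_counters, MODEL_COUNTERS_PER_PERFMON);
}
```

Under that assumption, both the 87 counters of V3D 4.2 and the 93 counters of V3D 7.1 need three perfmons, and an array of 93 64-bit values takes 744 bytes of stack.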

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h   | 5 -
 drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 44cfddedebde..556cbb400ba0 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -351,8 +351,11 @@ struct v3d_timestamp_query {
struct drm_syncobj *syncobj;
 };
 
+/* Maximum number of performance counters supported by any version of V3D */
+#define V3D_MAX_COUNTERS ARRAY_SIZE(v3d_v71_performance_counters)
+
 /* Number of perfmons required to handle all supported performance counters */
-#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_PERFCNT_NUM, \
+#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_MAX_COUNTERS, \
  DRM_V3D_MAX_PERF_COUNTERS)
 
 struct v3d_performance_query {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 7cd8c335cd9b..03df37a3acf5 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -490,7 +490,7 @@ v3d_write_performance_query_result(struct v3d_cpu_job *job, 
void *data, u32 quer
struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
struct v3d_dev *v3d = job->base.v3d;
struct v3d_perfmon *perfmon;
-   u64 counter_values[V3D_PERFCNT_NUM];
+   u64 counter_values[V3D_MAX_COUNTERS];
 
for (int i = 0; i < performance_query->nperfmons; i++) {
perfmon = v3d_perfmon_find(v3d_priv,
-- 
2.44.0

[PATCH 4/6] drm/v3d: Create new IOCTL to expose performance counters information

2024-05-08 Thread Maíra Canal

Userspace usually needs some information about the performance counters
available. Although we could replicate this information in the kernel
and user-space, let's use the kernel as the "single source of truth" to
avoid issues in the future (e.g. list of performance counters is updated
in user-space, but not in the kernel, generating invalid requests).

Therefore, create a new IOCTL to expose the performance counters
information, that is, name, category, and description.
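
The lookup performed by the new IOCTL can be sketched in plain C as below. This is a userspace model under stated assumptions: the two-entry table stands in for the real per-version arrays, `struct model_get_counter` is a hypothetical request layout (the real UAPI struct may differ), and `snprintf()` stands in for the kernel's `strscpy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Field sizes follow struct v3d_perf_counter_desc from this series. */
struct model_counter_desc {
	char category[32];
	char name[64];
	char description[256];
};

/* Two-entry stand-in for the real per-version tables. */
static const struct model_counter_desc model_counters[] = {
	{"CORE", "cycle-count", "[CORE] Cycle counter"},
	{"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"},
};

#define MODEL_NUM_COUNTERS (sizeof(model_counters) / sizeof(model_counters[0]))

/* Hypothetical request layout; the real UAPI struct may differ. */
struct model_get_counter {
	uint8_t counter;       /* in: counter ID */
	char name[64];         /* out */
	char category[32];     /* out */
	char description[256]; /* out */
	uint8_t reserved[7];   /* in: must be zero */
};

/* Returns 0 on success, -1 for a non-zero reserved field or a bad ID. */
int model_get_counter(struct model_get_counter *req)
{
	const struct model_counter_desc *desc;

	assert(req != NULL);

	for (size_t i = 0; i < sizeof(req->reserved); i++) {
		if (req->reserved[i] != 0)
			return -1;
	}

	/* Make sure that the counter ID is valid */
	if (req->counter >= MODEL_NUM_COUNTERS)
		return -1;

	desc = &model_counters[req->counter];

	/* snprintf() always NUL-terminates the destination buffer. */
	snprintf(req->name, sizeof(req->name), "%s", desc->name);
	snprintf(req->category, sizeof(req->category), "%s", desc->category);
	snprintf(req->description, sizeof(req->description), "%s",
		 desc->description);

	return 0;
}
```

Userspace would call this once per counter ID, from 0 up to the value reported by the maximum-counters parameter, to build its own list.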

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c |  1 +
 drivers/gpu/drm/v3d/v3d_drv.h |  2 ++
 drivers/gpu/drm/v3d/v3d_perfmon.c | 33 +++
 include/uapi/drm/v3d_drm.h| 37 +++
 4 files changed, 73 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index d2c1d5053132..f7477488b1cc 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -211,6 +211,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY, v3d_perfmon_destroy_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, 
DRM_RENDER_ALLOW | DRM_AUTH),
+   DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_COUNTER, 
v3d_perfmon_get_counter_ioctl, DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver v3d_drm_driver = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index bd1e38f7d10a..44cfddedebde 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -582,6 +582,8 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void 
*data,
  struct drm_file *file_priv);
 int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
 
 /* v3d_sysfs.c */
 int v3d_sysfs_init(struct device *dev);
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c 
b/drivers/gpu/drm/v3d/v3d_perfmon.c
index f268d9466c0f..73e2bb8bdb7f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -217,3 +217,36 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, 
void *data,
 
return ret;
 }
+
+int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+   struct drm_v3d_perfmon_get_counter *req = data;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
+   const struct v3d_perf_counter_desc *counter;
+
+   for (int i = 0; i < ARRAY_SIZE(req->reserved); i++) {
+   if (req->reserved[i] != 0)
+   return -EINVAL;
+   }
+
+   /* Make sure that the counter ID is valid */
+   if (req->counter >= v3d->max_counters)
+   return -EINVAL;
+
+   if (v3d->ver >= 71) {
+   WARN_ON(v3d->max_counters != 
ARRAY_SIZE(v3d_v71_performance_counters));
+   counter = &v3d_v71_performance_counters[req->counter];
+   } else if (v3d->ver >= 42) {
+   WARN_ON(v3d->max_counters != 
ARRAY_SIZE(v3d_v42_performance_counters));
+   counter = &v3d_v42_performance_counters[req->counter];
+   } else {
+   return -EOPNOTSUPP;
+   }
+
+   strscpy(req->name, counter->name, sizeof(req->name));
+   strscpy(req->category, counter->category, sizeof(req->category));
+   strscpy(req->description, counter->description, 
sizeof(req->description));
+
+   return 0;
+}
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 215b01bb69c3..0860ddb3d0b6 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -42,6 +42,7 @@ extern "C" {
 #define DRM_V3D_PERFMON_DESTROY   0x09
 #define DRM_V3D_PERFMON_GET_VALUES0x0a
 #define DRM_V3D_SUBMIT_CPU0x0b
+#define DRM_V3D_PERFMON_GET_COUNTER   0x0c
 
 #define DRM_IOCTL_V3D_SUBMIT_CL   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
 #define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
@@ -58,6 +59,8 @@ extern "C" {
 #define DRM_IOCTL_V3D_PERFMON_GET_VALUES  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_PERFMON_GET_VALUES, \
   struct 
drm_v3d_perfmon_get_values)
 #define DRM_IOCTL_V3D_SUBMIT_CPU  DRM_IOW(DRM_COMMAND_BASE + 
DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu)
+#define DRM_IOCTL_V3D_PERFMON_GET_COUNTER DRM_IOWR(DRM_COMMAND_BASE + 
DRM_V3D_PERFMON_GET_COUNTER, \
+  stru

[PATCH 1/6] drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1

2024-05-08 Thread Maíra Canal

Add name, category and description for each one of the 93 performance
counters available on V3D.

Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93.
Therefore, there are two performance counters arrays. The index of the
performance counter for each V3D version is represented by its position
on the array.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h |   2 +
 .../gpu/drm/v3d/v3d_performance_counters.h| 208 ++
 2 files changed, 210 insertions(+)
 create mode 100644 drivers/gpu/drm/v3d/v3d_performance_counters.h

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a2c516fe6d79..671375a3bb66 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+#include "v3d_performance_counters.h"
+
 #include "uapi/drm/v3d_drm.h"
 
 struct clk;
diff --git a/drivers/gpu/drm/v3d/v3d_performance_counters.h 
b/drivers/gpu/drm/v3d/v3d_performance_counters.h
new file mode 100644
index 000000000000..72822205ebdc
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_performance_counters.h
@@ -0,0 +1,208 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright (C) 2024 Raspberry Pi
+ */
+#ifndef V3D_PERFORMANCE_COUNTERS_H
+#define V3D_PERFORMANCE_COUNTERS_H
+
+/* Holds a description of a given performance counter. The index of performance
+ * counter is given by the array on v3d_performance_counter.h
+ */
+struct v3d_perf_counter_desc {
+   /* Category of the counter */
+   char category[32];
+
+   /* Name of the counter */
+   char name[64];
+
+   /* Description of the counter */
+   char description[256];
+};
+
+static const struct v3d_perf_counter_desc v3d_v71_performance_counters[] = {
+   {"CORE", "cycle-count", "[CORE] Cycle counter"},
+   {"CORE", "core-active", "[CORE] Bin/Render/Compute active cycles"},
+   {"CLE", "CLE-bin-thread-active-cycles", "[CLE] Bin thread active 
cycles"},
+   {"CLE", "CLE-render-thread-active-cycles", "[CLE] Render thread active 
cycles"},
+   {"CORE", "compute-active-cycles", "[CORE] Compute active cycles"},
+   {"FEP", "FEP-valid-primitives-no-rendered-pixels", "[FEP] Valid 
primitives that result in no rendered pixels, for all rendered tiles"},
+   {"FEP", "FEP-valid-primitives-rendered-pixels", "[FEP] Valid primitives 
for all rendered tiles (primitives may be counted in more than one tile)"},
+   {"FEP", "FEP-clipped-quads", "[FEP] Early-Z/Near/Far clipped quads"},
+   {"FEP", "FEP-valid-quads", "[FEP] Valid quads"},
+   {"TLB", "TLB-quads-not-passing-stencil-test", "[TLB] Quads with no 
pixels passing the stencil test"},
+   {"TLB", "TLB-quads-not-passing-z-and-stencil-test", "[TLB] Quads with 
no pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-passing-z-and-stencil-test", "[TLB] Quads with any 
pixels passing the Z and stencil tests"},
+   {"TLB", "TLB-quads-written-to-color-buffer", "[TLB] Quads with valid 
pixels written to colour buffer"},
+   {"TLB", "TLB-partial-quads-written-to-color-buffer", "[TLB] Partial 
quads written to the colour buffer"},
+   {"PTB", "PTB-primitives-need-clipping", "[PTB] Primitives that need 
clipping"},
+   {"PTB", "PTB-primitives-discarded-outside-viewport", "[PTB] Primitives 
discarded by being outside the viewport"},
+   {"PTB", "PTB-primitives-binned", "[PTB] Total primitives binned"},
+   {"PTB", "PTB-primitives-discarded-reversed", "[PTB] Primitives that are 
discarded because they are reversed"},
+   {"QPU", "QPU-total-instr-cache-hit", "[QPU] Total instruction cache 
hits for all slices"},
+   {"QPU", "QPU-total-instr-cache-miss", "[QPU] Total instruction cache 
misses for all slices"},
+   {"QPU", "QPU-total-uniform-cache-hit", "[QPU] Total uniforms cache hits 
for all slices"},
+   {"QPU", "QPU-total-uniform-cache-miss", "[QPU] Total uniforms cache 
misses for all slices"},
+   {"TMU", "TMU-active-cycles", "[TMU] Active cycles"},
+   {"TMU", "TMU-stalled-cycles", "[TMU] Stalled cycles"},
+   {"TMU", "TMU-total-text-quads-access", "[TMU] Total texture cache 
accesses"},
+   {"TMU",

Re: [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available

2024-04-29 Thread Maíra Canal


Hi Iago,

On 4/29/24 02:22, Iago Toral wrote:

Hi Maíra,

a question below:

El dom, 28-04-2024 a las 09:40 -0300, Maíra Canal escribió:

Although Big/Super Pages could appear naturally, it would be quite
hard
to have 1MB or 64KB allocated contiguously naturally. Therefore, we
can
force the creation of large pages allocated contiguously by using a
mountpoint with "huge=within_size" enabled.

Therefore, as V3D has a mountpoint with "huge=within_size" (if user
has
THP enabled), use this mountpoint for BO creation if available. This
will allow us to create large pages allocated contiguously and make
use
of Big/Super Pages.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---



(...)


@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device
*dev, struct drm_file *file_priv,
     size_t unaligned_size)
  {
    struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
    struct v3d_bo *bo;
    int ret;
  
-	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);

+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->gemfs)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev,
unaligned_size,
+     v3d-

gemfs);

+   else
+   shmem_obj = drm_gem_shmem_create(dev,
unaligned_size);
+
    if (IS_ERR(shmem_obj))
    return ERR_CAST(shmem_obj);
    bo = to_v3d_bo(_obj->base);



If I read this correctly, when we have THP we always allocate with it,
even for objects that are smaller than 64KB. I was wondering if there is
any benefit/downside to this or if the behavior for small allocations
is the same as we had without the new mount point.


I'm assuming that your concern is related to memory pressure and memory
fragmentation.

As we are using `huge=within_size`, we only allocate a huge page if it
will be fully within `i_size`. When using `huge=within_size`, we can
optimize the performance for smaller files without forcing larger files
to also use huge pages. I don't understand `huge=within_size` in full
details, but it is possible to check that it is able to avoid the system
(even the RPi) to go OOM. Although it is slightly less performant than
`huge=always` (used in v1), I believe it is more ideal for a system such
as the RPi due to the memory requirements.

Best Regards,
- Maíra



Iago

[PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages

2024-04-28 Thread Maíra Canal

Add a modparam for turning off Big/Super Pages, so that a user who doesn't
want Big/Super Pages enabled can disable them by setting the modparam to
false.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 7 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 5 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 28b7ddce7747..1a6e01235df6 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,13 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+/* Only expose the `super_pages` modparam if THP is enabled. */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.");
+#endif
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 31cf5bd11e39..0ade02bb7209 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -11,6 +11,7 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
char huge_opt[] = "huge=within_size";
struct file_system_type *type;
struct vfsmount *gemfs;
+   extern bool super_pages;
 
/*
 * By creating our own shmemfs mountpoint, we can pass in
@@ -20,6 +21,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
goto err;
 
+   /* The user doesn't want to enable Super Pages */
+   if (!super_pages)
+   goto err;
+
type = get_fs_type("tmpfs");
if (!type)
goto err;
-- 
2.44.0

[PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available

2024-04-28 Thread Maíra Canal

Although Big/Super Pages could appear naturally, it would be quite hard
to have 1MB or 64KB allocated contiguously naturally. Therefore, we can
force the creation of large pages allocated contiguously by using a
mountpoint with "huge=within_size" enabled.

Therefore, as V3D has a mountpoint with "huge=within_size" (if user has
THP enabled), use this mountpoint for BO creation if available. This
will allow us to create large pages allocated contiguously and make use
of Big/Super Pages.
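
The alignment choice that goes along with this can be sketched as a pure function. This is a standalone model of the selection made in `v3d_bo_create_finish()` below; `model_bo_alignment()` and the `MODEL_SZ_*` constants are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SZ_4K  ((size_t)4 * 1024)
#define MODEL_SZ_64K ((size_t)64 * 1024)
#define MODEL_SZ_1M  ((size_t)1024 * 1024)

/*
 * Hypothetical helper: pick the GPU virtual address alignment for a BO.
 * Without the THP mountpoint, everything stays 4 KB aligned.
 */
size_t model_bo_alignment(bool has_gemfs, size_t size)
{
	assert(size > 0); /* BO sizes are assumed to be non-zero */

	if (!has_gemfs)
		return MODEL_SZ_4K;
	if (size >= MODEL_SZ_1M)
		return MODEL_SZ_1M;
	if (size >= MODEL_SZ_64K)
		return MODEL_SZ_64K;
	return MODEL_SZ_4K;
}
```

Aligning the GPU address to the largest page size the BO can hold is what later allows the MMU code to set the big/super page bits for it.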

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_bo.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 79e31c5299b1..16ac26c31c6b 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
struct v3d_dev *v3d = to_v3d_dev(obj->dev);
struct v3d_bo *bo = to_v3d_bo(obj);
struct sg_table *sgt;
+   u64 align;
int ret;
 
/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
if (IS_ERR(sgt))
return PTR_ERR(sgt);
 
+   if (!v3d->gemfs)
+   align = SZ_4K;
+   else if (obj->size >= SZ_1M)
+   align = SZ_1M;
+   else if (obj->size >= SZ_64K)
+   align = SZ_64K;
+   else
+   align = SZ_4K;
+
spin_lock(&v3d->mm_lock);
/* Allocate the object's space in the GPU's page tables.
 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
+align >> V3D_MMU_PAGE_SHIFT, 0, 0);
spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, 
struct drm_file *file_priv,
 size_t unaligned_size)
 {
struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_bo *bo;
int ret;
 
-   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->gemfs)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+ v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
if (IS_ERR(shmem_obj))
return ERR_CAST(shmem_obj);
bo = to_v3d_bo(&shmem_obj->base);
-- 
2.44.0

[PATCH v4 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs

2024-04-28 Thread Maíra Canal

The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that the page table entries for all 4KB pages within a
big/super page are correctly configured.

In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.
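
The iteration described above can be modeled in userspace as below. This is a simplified sketch for a single contiguous segment and not the driver code: the `MODEL_*` names and `model_insert_ptes()` are hypothetical, the bit positions follow the PTE defines in this patch, and the alignment check is done on the bare page frame number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_SZ_4K  (4u * 1024)
#define MODEL_SZ_64K (64u * 1024)
#define MODEL_SZ_1M  (1024u * 1024)

#define MODEL_PTE_SUPERPAGE (1u << 31)
#define MODEL_PTE_BIGPAGE   (1u << 30)
#define MODEL_PTE_WRITEABLE (1u << 29)
#define MODEL_PTE_VALID     (1u << 28)

/* True when both the GPU page index and the PFN are aligned to `alignment` bytes. */
static bool model_is_aligned(uint32_t page, uint32_t pfn, uint32_t alignment)
{
	uint32_t pages = alignment >> MODEL_PAGE_SHIFT;

	return (page % pages) == 0 && (pfn % pages) == 0;
}

/*
 * Fill pt[] for one contiguous segment of `len` bytes starting at physical
 * frame `pfn`, mapped at GPU page index `page`.
 * Returns the number of 4 KB entries written.
 */
uint32_t model_insert_ptes(uint32_t *pt, uint32_t page, uint32_t pfn, uint32_t len)
{
	uint32_t start = page;

	assert(len % MODEL_SZ_4K == 0);

	while (len > 0) {
		uint32_t pte = MODEL_PTE_WRITEABLE | MODEL_PTE_VALID | pfn;
		uint32_t page_size;

		if (len >= MODEL_SZ_1M && model_is_aligned(page, pfn, MODEL_SZ_1M)) {
			page_size = MODEL_SZ_1M;
			pte |= MODEL_PTE_SUPERPAGE;
		} else if (len >= MODEL_SZ_64K && model_is_aligned(page, pfn, MODEL_SZ_64K)) {
			page_size = MODEL_SZ_64K;
			pte |= MODEL_PTE_BIGPAGE;
		} else {
			page_size = MODEL_SZ_4K;
		}

		/* Every 4 KB slot inside the big/super page gets its own entry. */
		for (uint32_t i = 0; i < (page_size >> MODEL_PAGE_SHIFT); i++) {
			pt[page++] = pte + i;
			pfn++;
		}

		len -= page_size;
	}

	return page - start;
}
```

A 64KB segment always produces 16 entries; whether they carry the big page bit depends only on the alignment of the GPU page index and of the physical address.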

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_drv.h |  1 +
 drivers/gpu/drm/v3d/v3d_mmu.c | 52 ++-
 2 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index e1f291db68de..3276eef280ef 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -18,6 +18,7 @@ struct platform_device;
 struct reset_control;
 
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
 
diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 14f3af40d6f6..2e0b31e373b2 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -25,9 +25,16 @@
  * superpage bit set.
  */
 #define V3D_PTE_SUPERPAGE BIT(31)
+#define V3D_PTE_BIGPAGE BIT(30)
 #define V3D_PTE_WRITEABLE BIT(29)
 #define V3D_PTE_VALID BIT(28)
 
+static bool v3d_mmu_is_aligned(u32 page, u32 page_address, size_t alignment)
+{
+   return IS_ALIGNED(page, alignment >> V3D_MMU_PAGE_SHIFT) &&
+   IS_ALIGNED(page_address, alignment >> V3D_MMU_PAGE_SHIFT);
+}
+
 static int v3d_mmu_flush_all(struct v3d_dev *v3d)
 {
int ret;
@@ -87,19 +94,38 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
struct drm_gem_shmem_object *shmem_obj = &bo->base;
struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev);
u32 page = bo->node.start;
-   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
-   struct sg_dma_page_iter dma_iter;
-
-   for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
-   dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
-   u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT;
-   u32 pte = page_prot | page_address;
-   u32 i;
-
-   BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
-  BIT(24));
-   for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
-   v3d->pt[page++] = pte + i;
+   struct scatterlist *sgl;
+   unsigned int count;
+
+   for_each_sgtable_dma_sg(shmem_obj->sgt, sgl, count) {
+   dma_addr_t dma_addr = sg_dma_address(sgl);
+   u32 pfn = dma_addr >> V3D_MMU_PAGE_SHIFT;
+   unsigned int len = sg_dma_len(sgl);
+
+   while (len > 0) {
+   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
+   u32 page_address = page_prot | pfn;
+   unsigned int i, page_size;
+
+   BUG_ON(pfn + V3D_PAGE_FACTOR >= BIT(24));
+
+   if (len >= SZ_1M && v3d_mmu_is_aligned(page, 
page_address, SZ_1M)) {
+   page_size = SZ_1M;
+   page_address |= V3D_PTE_SUPERPAGE;
+   } else if (len >= SZ_64K && v3d_mmu_is_aligned(page, 
page_address, SZ_64K)) {
+   page_size = SZ_64K;
+   page_address |= V3D_PTE_BIGPAGE;
+   } else {
+   page_size = SZ_4K;
+   }
+
+   for (i = 0; i < page_size >> V3D_MMU_PAGE_SHIFT; i++) {
+   v3d->pt[page++] = page_address + i;
+   pfn++;
+   }
+
+   len -= page_size;
+   }
}
 
WARN_ON_ONCE(page - bo->node.start !=
-- 
2.44.0

[PATCH v4 5/8] drm/v3d: Reduce the alignment of the node allocation

2024-04-28 Thread Maíra Canal

Currently, we are using an alignment of 128 kB to insert a node, which
ends up wasting memory as we perform plenty of small BO allocations
(<= 4 kB). As allocations must be aligned to 128 kB, any allocation
smaller than that wastes the difference.

This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be
quite limiting for applications that have a high memory requirement, such
as vkoverhead [1].

By reducing the page alignment to 4 kB, we can allocate up to 1000000 BOs
of 4 kB (~4 GB) and 10000 BOs of 400 kB (~4 GB). Moreover, by performing
benchmarks, we were able to attest that reducing the page alignment to
4 kB can provide a general performance improvement in OpenGL
applications (e.g. glmark2).
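
The effect of the alignment on the number of BOs can be estimated with simple arithmetic. The sketch below computes an idealized upper bound only, assuming 1 kB = 1024 bytes and an otherwise empty 4 GB address space; `model_max_bos()` is a hypothetical helper, and the figures quoted above are approximate and may also reflect other allocations.

```c
#include <assert.h>
#include <stdint.h>

/* Round `size` up to the next multiple of `align` (align must be non-zero). */
static uint64_t model_round_up(uint64_t size, uint64_t align)
{
	return ((size + align - 1) / align) * align;
}

/* Upper bound on same-sized BOs that fit when each start is aligned to `align`. */
uint64_t model_max_bos(uint64_t space, uint64_t bo_size, uint64_t align)
{
	assert(bo_size > 0 && align > 0);

	return space / model_round_up(bo_size, align);
}
```

For a 4 GB space, 4 kB BOs are bounded by 32768 with a 128 kB alignment and by 1048576 with a 4 kB alignment, while 400 kB BOs are bounded by 10485 with a 4 kB alignment.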

Therefore, this patch reduces the alignment of the node allocation to 4
kB, which will allow RPi users to explore the whole 4GB virtual
address space provided by the hardware. Also, this patch allows users to
fully run vkoverhead on the RPi 4/5, solving the issue reported in [1].

[1] https://github.com/zmike/vkoverhead/issues/14

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..79e31c5299b1 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 
0, 0);
+SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index cef2f82b7a75..e1f291db68de 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,8 +17,6 @@ struct clk;
 struct platform_device;
 struct reset_control;
 
-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
-- 
2.44.0

[PATCH v4 4/8] drm/gem: Create shmem GEM object in a given mountpoint

2024-04-28 Thread Maíra Canal

Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that has a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a special mountpoint
with flags enabled.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++
 include/drm/drm_gem_shmem_helper.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 13bcdbfd..10b7c4c769a3 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs 
= {
 };
 
 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+  struct vfsmount *gemfs)
 {
struct drm_gem_shmem_object *shmem;
struct drm_gem_object *obj;
@@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, 
bool private)
drm_gem_private_object_init(dev, obj, size);
shmem->map_wc = false; /* dma-buf mappings use always 
writecombine */
} else {
-   ret = drm_gem_object_init(dev, obj, size);
+   ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs);
}
if (ret) {
drm_gem_private_object_fini(obj);
@@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t 
size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size)
 {
-   return __drm_gem_shmem_create(dev, size, false);
+   return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs)
+{
+   return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
size_t size = PAGE_ALIGN(attach->dmabuf->size);
struct drm_gem_shmem_object *shmem;
 
-   shmem = __drm_gem_shmem_create(dev, size, true);
+   shmem = __drm_gem_shmem_create(dev, size, true, NULL);
if (IS_ERR(shmem))
return ERR_CAST(shmem);
 
diff --git a/include/drm/drm_gem_shmem_helper.h 
b/include/drm/drm_gem_shmem_helper.h
index efbc9f27312b..d22e3fb53631 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
container_of(obj, struct drm_gem_shmem_object, base)
 
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);
 
 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
-- 
2.44.0

[PATCH v4 0/8] drm/v3d: Enable Big and Super Pages

2024-04-28 Thread Maíra Canal

 performance.
This indicates an enhancement in the baseline scenario, rather than any 
detriment
caused by v2. Additionally, I've included stats from v1 in the comparisons. Upon
scrutinizing the average FPS of v2 in contrast to v1, it becomes evident that v2
not only maintains the improvements but may even surpass them.

This version provides a much safer way to iterate through memory and doesn't
hold to the same limitations as v1. For example, v1 had a hard-coded hack that
only allowed a huge page to be created if the BO was bigger than 2MB. These
limitations are gone now.

This series also introduces changes in the GEM helpers, in order to enable V3D
to have a separate mount point for shmfs GEM objects. Any feedback from the
community about the changes in the GEM helpers is welcome!

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240311100959.205545-1-mca...@igalia.com/

* [1/6] Add Iago's R-b to PATCH 1/5 (Iago Toral)
* [2/6] Create a new function `drm_gem_object_init_with_mnt()` to define the
shmfs mountpoint. Now, we don't touch a bunch of drivers, as
`drm_gem_object_init()` preserves its signature 
(Tvrtko Ursulin)
* [3/6] Use `huge=within_size` instead of `huge=always`, in order to avoid OOM.
This also allows us to move away from the 2MB hack. (Tvrtko Ursulin)
* [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral)
* [5/6] Create a separate patch to reduce the alignment of the node allocation
(Iago Toral)
* [6/6] Complete refactoring to the way that we iterate through the memory
(Tvrtko Ursulin)
* [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us 
misleading
data (Tvrtko Ursulin)
* [6/6] Use both Big Pages (64K) and Super Pages (1MB)

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240405201753.1176914-1-mca...@igalia.com/T/

* [2/8] Add Tvrtko's R-b to PATCH 2/8 (Tvrtko Ursulin)
* [4/8] Add Tvrtko's R-b to PATCH 4/8 (Tvrtko Ursulin)
* [6/8] Now, PATCH 6/8 regards supporting big/super pages when writing out PTEs
(Tvrtko Ursulin)
* [6/8] s/page_address/pfn (Tvrtko Ursulin)
* [6/8] As `sg_dma_len()` returns `unsigned int`, then `len` must be `unsigned 
int`
too (Tvrtko Ursulin)
* [6/8] `i` and `page_size` are `unsigned int` as well (Tvrtko Ursulin)
* [6/8] Move `i`, `page_prot` and `page_size` to the inner scope (Tvrtko 
Ursulin)
* [6/8] s/pte/page_address/ (Tvrtko Ursulin)
* [7/8] New patch: use gemfs/THP in BO creation if available
* [8/8] New patch: 
* [8/8] Don't expose the modparam `super_pages` unless 
CONFIG_TRANSPARENT_HUGEPAGE
is enabled (Tvrtko Ursulin)
* [8/8] Use `v3d->gemfs` to check if the user disabled Super Pages support
(Tvrtko Ursulin)

v3 -> v4: 
https://lore.kernel.org/dri-devel/20240421215309.660018-1-mca...@igalia.com/T/

* [5/8] Add Iago's R-b to PATCH 5/8 (Iago Toral)
* [6/8] Add Tvrtko's R-b to PATCH 6/8 (Tvrtko Ursulin)
* [7/8] Add Tvrtko's R-b to PATCH 7/8 (Tvrtko Ursulin)
* [8/8] Move `bool super_pages` to the guard (Tvrtko Ursulin)

Best Regards,
- Maíra

Maíra Canal (8):
  drm/v3d: Fix return if scheduler initialization fails
  drm/gem: Create a drm_gem_object_init_with_mnt() function
  drm/v3d: Introduce gemfs
  drm/gem: Create shmem GEM object in a given mountpoint
  drm/v3d: Reduce the alignment of the node allocation
  drm/v3d: Support Big/Super Pages when writing out PTEs
  drm/v3d: Use gemfs/THP in BO creation if available
  drm/v3d: Add modparam for turning off Big/Super Pages

 drivers/gpu/drm/drm_gem.c  | 34 +++--
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +--
 drivers/gpu/drm/v3d/Makefile   |  3 +-
 drivers/gpu/drm/v3d/v3d_bo.c   | 21 ++-
 drivers/gpu/drm/v3d/v3d_drv.c  |  7 
 drivers/gpu/drm/v3d/v3d_drv.h  | 12 +-
 drivers/gpu/drm/v3d/v3d_gem.c  |  6 ++-
 drivers/gpu/drm/v3d/v3d_gemfs.c| 51 +
 drivers/gpu/drm/v3d/v3d_mmu.c  | 52 +++---
 include/drm/drm_gem.h  |  3 ++
 include/drm/drm_gem_shmem_helper.h |  3 ++
 11 files changed, 195 insertions(+), 27 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

-- 
2.44.0

[PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails

2024-04-28 Thread Maíra Canal

If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
allocated and return the error value in `v3d_gem_init()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index da8faf3b9011..b3b76332f2c5 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -291,8 +291,9 @@ v3d_gem_init(struct drm_device *dev)
ret = v3d_sched_init(v3d);
if (ret) {
drm_mm_takedown(&v3d->mm);
-   dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
+   dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt,
  v3d->pt_paddr);
+   return ret;
}
 
return 0;
-- 
2.44.0
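
The control-flow change in this patch can be illustrated outside the kernel. What follows is a minimal userspace sketch, not driver code: `struct fake_dev`, `fake_sched_init()` and `fake_gem_init()` are hypothetical stand-ins for the device state, `v3d_sched_init()` and `v3d_gem_init()`, and `malloc()`/`free()` stand in for the DMA-coherent allocation. It only shows that the error path must both release the page table and return the error instead of falling through to `return 0`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device state touched by v3d_gem_init(). */
struct fake_dev {
	void *pt;        /* stands in for the DMA-coherent page table */
	int sched_fails; /* knob: make the scheduler initialization fail */
};

/* Hypothetical stand-in for v3d_sched_init(). */
static int fake_sched_init(const struct fake_dev *dev)
{
	return dev->sched_fails ? -ENOMEM : 0;
}

/* Mirrors the fixed error path: free what was allocated, then propagate. */
static int fake_gem_init(struct fake_dev *dev)
{
	const size_t pt_size = 4096 * 1024;
	int ret;

	assert(dev != NULL);

	dev->pt = malloc(pt_size);
	if (!dev->pt)
		return -ENOMEM;

	ret = fake_sched_init(dev);
	if (ret) {
		free(dev->pt);
		dev->pt = NULL;
		return ret; /* without this, the caller would see success */
	}

	return 0;
}
```

With `sched_fails` set, `fake_gem_init()` returns the scheduler's error code and leaves no allocation behind, which is the behaviour the patch establishes for the real function.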

[PATCH v4 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function

2024-04-28 Thread Maíra Canal

For some applications, such as applications that use huge pages, we might
want to have a different mountpoint, for which we pass mount flags that
better match our usecase.

Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, then we fall back to `shmem_file_setup()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem.c | 34 ++
 include/drm/drm_gem.h |  3 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d4bbc5d109c8..74ebe68e3d61 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev)
 }
 
 /**
- * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM
+ * object in a given shmfs mountpoint
+ *
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
-int drm_gem_object_init(struct drm_device *dev,
-   struct drm_gem_object *obj, size_t size)
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs)
 {
struct file *filp;
 
drm_gem_private_object_init(dev, obj, size);
 
-   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+   if (gemfs)
+   filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+VM_NORESERVE);
+   else
+   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
if (IS_ERR(filp))
return PTR_ERR(filp);
 
@@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev,
 
return 0;
 }
+EXPORT_SYMBOL(drm_gem_object_init_with_mnt);
+
+/**
+ * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * @dev: drm_device the object should be initialized for
+ * @obj: drm_gem_object to initialize
+ * @size: object size
+ *
+ * Initialize an already allocated GEM object of the specified size with
+ * shmfs backing store.
+ */
+int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
+   size_t size)
+{
+   return drm_gem_object_init_with_mnt(dev, obj, size, NULL);
+}
 EXPORT_SYMBOL(drm_gem_object_init);
 
 /**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bae4865b2101..2ebf6e10cc44 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
-- 
2.44.0
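
The shape of this change (a new `_with_mnt` entry point, with the old function reduced to a wrapper that passes NULL) can be sketched in plain userspace C. Everything below is hypothetical illustration: `struct fake_mount`, `default_mnt`, `object_init_with_mnt()` and `object_init()` are made-up names, and returning the mount name merely stands in for creating the backing file on that mount.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct vfsmount. */
struct fake_mount {
	const char *name;
};

/* Stands in for the default shmem mount (`shm_mnt`). */
static const struct fake_mount default_mnt = { "shm_mnt" };

/*
 * Returns the name of the mount that would back the object: the
 * caller-supplied one if given, the default one if gemfs is NULL.
 */
static const char *object_init_with_mnt(const struct fake_mount *gemfs)
{
	const struct fake_mount *mnt = gemfs ? gemfs : &default_mnt;

	assert(mnt->name != NULL);
	return mnt->name;
}

/* The pre-existing entry point keeps its signature and behaviour. */
static const char *object_init(void)
{
	return object_init_with_mnt(NULL);
}
```

Existing callers keep calling the old function unchanged, while a driver that owns a mount can opt in by passing it explicitly.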

[PATCH v4 3/8] drm/v3d: Introduce gemfs

2024-04-28 Thread Maíra Canal

Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/Makefile|  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
v3d_trace_points.o \
v3d_sched.o \
v3d_sysfs.o \
-   v3d_submit.o
+   v3d_submit.o \
+   v3d_gemfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a2c516fe6d79..cef2f82b7a75 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -131,6 +131,11 @@ struct v3d_dev {
struct drm_mm mm;
spinlock_t mm_lock;
 
+   /*
+* tmpfs instance used for shmem backed objects
+*/
+   struct vfsmount *gemfs;
+
struct work_struct overflow_mem_work;
 
struct v3d_bin_job *bin_job;
@@ -532,6 +537,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);
 
+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index b3b76332f2c5..b1e681630ded 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -288,6 +288,8 @@ v3d_gem_init(struct drm_device *dev)
v3d_init_hw_state(v3d);
v3d_mmu_set_page_table(v3d);
 
+   v3d_gemfs_init(v3d);
+
ret = v3d_sched_init(v3d);
if (ret) {
drm_mm_takedown(&v3d->mm);
@@ -305,6 +307,7 @@ v3d_gem_destroy(struct drm_device *dev)
struct v3d_dev *v3d = to_v3d_dev(dev);
 
v3d_sched_fini(v3d);
+   v3d_gemfs_fini(v3d);
 
/* Waiting for jobs to finish would need to be done before
 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index 000000000000..31cf5bd11e39
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include <linux/fs.h>
+#include <linux/mount.h>
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+   char huge_opt[] = "huge=within_size";
+   struct file_system_type *type;
+   struct vfsmount *gemfs;
+
+   /*
+* By creating our own shmemfs mountpoint, we can pass in
+* mount flags that better match our usecase. However, we
+* only do so on platforms which benefit from it.
+*/
+   if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   goto err;
+
+   type = get_fs_type("tmpfs");
+   if (!type)
+   goto err;
+
+   gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+   if (IS_ERR(gemfs))
+   goto err;
+
+   v3d->gemfs = gemfs;
+   drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+   return;
+
+err:
+   v3d->gemfs = NULL;
+   drm_notice(&v3d->drm,
+  "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+   if (v3d->gemfs)
+   kern_unmount(v3d->gemfs);
+}
-- 
2.44.0

Re: [PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-23 Thread Maíra Canal


On 4/23/24 04:05, Maxime Ripard wrote:

Hi,

On Mon, Apr 22, 2024 at 01:08:44PM -0300, Maíra Canal wrote:

@drm-misc maintainers, is there any chance you could backport commit
35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice") [1] to
drm-misc-next?

I would like to apply this series to drm-misc-next because it fixes
another issue with the GPU stats, but this series depends on commit
35f4f8c9fc97, as it has plenty of refactors on the GPU stats code.

Although I could theoretically apply this series in drm-misc-fixes, I
don't believe it would be ideal, as discussed in #dri-devel earlier
today.

[1] 
https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/35f4f8c9fc972248055096d63b782060e473311b


I just did the backmerge


Thanks Maxime! I just applied the series to drm-misc/drm-misc-next.

Thanks to the drm-misc maintainers for the quick action!

Best Regards,
- Maíra



Maxime

Re: [PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-22 Thread Maíra Canal


Hi,

@drm-misc maintainers, is there any chance you could backport commit
35f4f8c9fc97 ("drm/v3d: Don't increment `enabled_ns` twice") [1] to
drm-misc-next?

I would like to apply this series to drm-misc-next because it fixes
another issue with the GPU stats, but this series depends on commit
35f4f8c9fc97, as it has plenty of refactors on the GPU stats code.

Although I could theoretically apply this series in drm-misc-fixes, I
don't believe it would be ideal, as discussed in #dri-devel earlier
today.

[1] 
https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/35f4f8c9fc972248055096d63b782060e473311b


Best Regards,
- Maíra

On 4/20/24 18:32, Maíra Canal wrote:

The first version of this series had the intention to fix two major
issues with the GPU stats:

1. We were incrementing `enabled_ns` twice by the end of each job.
2. There is a race-condition between the IRQ handler and the users.

The first of the issues was already addressed and the fix was applied to
drm-misc-fixes. Now, what is left, addresses the second issue.

Apart from addressing this issue, this series improves the GPU stats
code as a whole. We reduced code repetition by creating functions to start and
update the GPU stats. This will likely reduce the odds of issue #1 happening again.

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/

- As the first patch was a bugfix, it was pushed to drm-misc-fixes.
- [1/4] Add Chema Casanova's R-b
- [2/4] s/jobs_sent/jobs_completed and add the reasoning in the commit message
(Chema Casanova)
- [2/4] Add Chema Casanova's and Tvrtko Ursulin's R-b
- [3/4] Call `local_clock()` only once, by adding a new parameter to the
`v3d_stats_update` function (Chema Casanova)
- [4/4] Move new line to the correct patch [2/4] (Tvrtko Ursulin)
- [4/4] Use `seqcount_t` as locking primitive instead of a `rw_lock`
(Tvrtko Ursulin)

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240417011021.600889-1-mca...@igalia.com/T/

- [4/5] New patch: separates the code refactor from the race-condition fix
(Tvrtko Ursulin)
- [5/5] s/interruption/interrupt (Tvrtko Ursulin)
- [5/5] s/matches/match (Tvrtko Ursulin)
- [5/5] Add Tvrtko Ursulin's R-b

Best Regards,
- Maíra

Maíra Canal (5):
   drm/v3d: Create two functions to update all GPU stats variables
   drm/v3d: Create a struct to store the GPU stats
   drm/v3d: Create function to update a set of GPU stats
   drm/v3d: Decouple stats calculation from printing
   drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

  drivers/gpu/drm/v3d/v3d_drv.c   | 33 
  drivers/gpu/drm/v3d/v3d_drv.h   | 30 ---
  drivers/gpu/drm/v3d/v3d_gem.c   |  9 ++--
  drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++---
  drivers/gpu/drm/v3d/v3d_sched.c | 94 +
  drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++---
  6 files changed, 109 insertions(+), 118 deletions(-)
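
For readers unfamiliar with sequence counters, the idea behind using a `seqcount_t` for the stats can be sketched in userspace with C11 atomics. This is an illustrative sketch only and not the kernel's `seqcount_t` implementation: `struct fake_stats`, `fake_stats_update()` and `fake_stats_read()` are hypothetical names, and a single writer is assumed. The writer makes the counter odd while it updates the fields; a reader retries whenever it saw an odd counter or the counter changed while it was reading.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a per-queue stats record. */
struct fake_stats {
	atomic_uint seq;                 /* even: stable, odd: update in progress */
	_Atomic uint64_t enabled_ns;
	_Atomic uint64_t jobs_completed;
};

/* Writer side (single writer assumed, e.g. a job-completion path). */
static void fake_stats_update(struct fake_stats *s, uint64_t delta_ns)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t ns = atomic_load_explicit(&s->enabled_ns, memory_order_relaxed);
	uint64_t jobs = atomic_load_explicit(&s->jobs_completed,
					     memory_order_relaxed);

	assert((seq & 1) == 0); /* no nested or concurrent writers */

	/* Mark the record as "being written" before touching the data. */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->enabled_ns, ns + delta_ns,
			      memory_order_relaxed);
	atomic_store_explicit(&s->jobs_completed, jobs + 1,
			      memory_order_relaxed);

	/* Publish: the counter becomes even again. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until a consistent snapshot is observed. */
static void fake_stats_read(struct fake_stats *s, uint64_t *ns, uint64_t *jobs)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		*ns = atomic_load_explicit(&s->enabled_ns,
					   memory_order_relaxed);
		*jobs = atomic_load_explicit(&s->jobs_completed,
					     memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

The appeal of this scheme for a stats path is that the writer never blocks on readers; readers pay with an occasional retry instead.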

[PATCH v3 8/8] drm/v3d: Add modparam for turning off Big/Super Pages

2024-04-21 Thread Maíra Canal

Add a modparam for turning off Big/Super Pages, so that a user who
doesn't want Big/Super Pages enabled can disable them by setting the
modparam to false.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 8 
 drivers/gpu/drm/v3d/v3d_gemfs.c | 5 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..bc8c8905112a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,14 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+bool super_pages = true;
+
+/* Only expose the `super_pages` modparam if THP is enabled. */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.");
+#endif
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 31cf5bd11e39..5fa08263cff2 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -11,6 +11,11 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
char huge_opt[] = "huge=within_size";
struct file_system_type *type;
struct vfsmount *gemfs;
+   extern bool super_pages;
+
+   /* The user doesn't want to enable Super Pages */
+   if (!super_pages)
+   goto err;
 
/*
 * By creating our own shmemfs mountpoint, we can pass in
-- 
2.44.0

[PATCH v3 7/8] drm/v3d: Use gemfs/THP in BO creation if available

2024-04-21 Thread Maíra Canal

Although Big/Super Pages could appear naturally, it would be quite hard
to have 1MB or 64KB allocated contiguously naturally. Therefore, we can
force the creation of large pages allocated contiguously by using a
mountpoint with "huge=within_size" enabled.

As V3D has a mountpoint with "huge=within_size" (if the user has THP enabled),
use this mountpoint for BO creation if available. This will allow us to create
large pages allocated contiguously and make use of Big/Super Pages.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 79e31c5299b1..16ac26c31c6b 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
struct v3d_dev *v3d = to_v3d_dev(obj->dev);
struct v3d_bo *bo = to_v3d_bo(obj);
struct sg_table *sgt;
+   u64 align;
int ret;
 
/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
if (IS_ERR(sgt))
return PTR_ERR(sgt);
 
+   if (!v3d->gemfs)
+   align = SZ_4K;
+   else if (obj->size >= SZ_1M)
+   align = SZ_1M;
+   else if (obj->size >= SZ_64K)
+   align = SZ_64K;
+   else
+   align = SZ_4K;
+
spin_lock(&v3d->mm_lock);
/* Allocate the object's space in the GPU's page tables.
 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
+align >> V3D_MMU_PAGE_SHIFT, 0, 0);
spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
 size_t unaligned_size)
 {
struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_bo *bo;
int ret;
 
-   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->gemfs)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+ v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
if (IS_ERR(shmem_obj))
return ERR_CAST(shmem_obj);
bo = to_v3d_bo(&shmem_obj->base);
-- 
2.44.0
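
The alignment choice made in `v3d_bo_create_finish()` above can be isolated into a small pure function for experimentation. The sketch below is a userspace restatement of that `if`/`else if` chain; `pick_node_alignment()` and the `have_gemfs` flag are hypothetical names (the flag stands in for `v3d->gemfs != NULL`), and the `SZ_*` macros are redefined locally because the kernel headers are not available in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SZ_4K  (4u * 1024)
#define SZ_64K (64u * 1024)
#define SZ_1M  (1024u * 1024)

/*
 * Pick the GPU virtual address alignment for a BO of obj_size bytes.
 * Without the THP-enabled mount there is no point in aligning beyond
 * 4 kB, since big/super pages will not be used.
 */
static size_t pick_node_alignment(size_t obj_size, bool have_gemfs)
{
	if (!have_gemfs)
		return SZ_4K;
	if (obj_size >= SZ_1M)
		return SZ_1M;
	if (obj_size >= SZ_64K)
		return SZ_64K;
	return SZ_4K;
}
```

For example, a 2 MB BO gets 1 MB alignment only when the gemfs mount exists, while a 32 kB BO is always aligned to 4 kB.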

[PATCH v3 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs

2024-04-21 Thread Maíra Canal

The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that the page table entries for all 4KB pages within a
big/super page are correctly configured.

In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.h |  1 +
 drivers/gpu/drm/v3d/v3d_mmu.c | 52 ++-
 2 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 17236ee23490..79d8a1a059aa 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -18,6 +18,7 @@ struct platform_device;
 struct reset_control;
 
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
 
diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 14f3af40d6f6..2e0b31e373b2 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -25,9 +25,16 @@
  * superpage bit set.
  */
 #define V3D_PTE_SUPERPAGE BIT(31)
+#define V3D_PTE_BIGPAGE BIT(30)
 #define V3D_PTE_WRITEABLE BIT(29)
 #define V3D_PTE_VALID BIT(28)
 
+static bool v3d_mmu_is_aligned(u32 page, u32 page_address, size_t alignment)
+{
+   return IS_ALIGNED(page, alignment >> V3D_MMU_PAGE_SHIFT) &&
+   IS_ALIGNED(page_address, alignment >> V3D_MMU_PAGE_SHIFT);
+}
+
 static int v3d_mmu_flush_all(struct v3d_dev *v3d)
 {
int ret;
@@ -87,19 +94,38 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
struct drm_gem_shmem_object *shmem_obj = &bo->base;
struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev);
u32 page = bo->node.start;
-   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
-   struct sg_dma_page_iter dma_iter;
-
-   for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
-   dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
-   u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT;
-   u32 pte = page_prot | page_address;
-   u32 i;
-
-   BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
-  BIT(24));
-   for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
-   v3d->pt[page++] = pte + i;
+   struct scatterlist *sgl;
+   unsigned int count;
+
+   for_each_sgtable_dma_sg(shmem_obj->sgt, sgl, count) {
+   dma_addr_t dma_addr = sg_dma_address(sgl);
+   u32 pfn = dma_addr >> V3D_MMU_PAGE_SHIFT;
+   unsigned int len = sg_dma_len(sgl);
+
+   while (len > 0) {
+   u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
+   u32 page_address = page_prot | pfn;
+   unsigned int i, page_size;
+
+   BUG_ON(pfn + V3D_PAGE_FACTOR >= BIT(24));
+
+   if (len >= SZ_1M && v3d_mmu_is_aligned(page, page_address, SZ_1M)) {
+   page_size = SZ_1M;
+   page_address |= V3D_PTE_SUPERPAGE;
+   } else if (len >= SZ_64K && v3d_mmu_is_aligned(page, page_address, SZ_64K)) {
+   page_size = SZ_64K;
+   page_address |= V3D_PTE_BIGPAGE;
+   } else {
+   page_size = SZ_4K;
+   }
+
+   for (i = 0; i < page_size >> V3D_MMU_PAGE_SHIFT; i++) {
+   v3d->pt[page++] = page_address + i;
+   pfn++;
+   }
+
+   len -= page_size;
+   }
}
 
WARN_ON_ONCE(page - bo->node.start !=
-- 
2.44.0
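
To see how the new loop carves a contiguous DMA segment into super, big and regular pages, the logic can be replayed in userspace on a plain array. This is a simplified sketch under stated assumptions, not the driver code: `map_segment()`, `is_aligned()` and `struct pte_counts` are hypothetical names, the segment length is assumed to be a multiple of 4 kB, the alignment check is done on the bare page frame number, and the bit positions are copied from the `V3D_PTE_*` defines in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define SZ_4K  (4u * 1024)
#define SZ_64K (64u * 1024)
#define SZ_1M  (1024u * 1024)

#define PTE_SUPERPAGE (1u << 31)
#define PTE_BIGPAGE   (1u << 30)
#define PTE_WRITEABLE (1u << 29)
#define PTE_VALID     (1u << 28)

/* How many pages of each kind were emitted for a segment. */
struct pte_counts {
	unsigned int super, big, small;
};

/* Both the GPU page index and the physical frame must be aligned. */
static int is_aligned(uint32_t page, uint32_t pfn, uint32_t alignment)
{
	uint32_t pages = alignment >> PAGE_SHIFT;

	return (page % pages) == 0 && (pfn % pages) == 0;
}

/*
 * Write PTEs for one contiguous segment of len bytes starting at
 * physical frame pfn, beginning at GPU page index page.
 */
static struct pte_counts map_segment(uint32_t *pt, uint32_t page,
				     uint32_t pfn, uint32_t len)
{
	struct pte_counts counts = { 0, 0, 0 };

	while (len > 0) {
		uint32_t pte = PTE_WRITEABLE | PTE_VALID | pfn;
		uint32_t page_size, i;

		if (len >= SZ_1M && is_aligned(page, pfn, SZ_1M)) {
			page_size = SZ_1M;
			pte |= PTE_SUPERPAGE;
			counts.super++;
		} else if (len >= SZ_64K && is_aligned(page, pfn, SZ_64K)) {
			page_size = SZ_64K;
			pte |= PTE_BIGPAGE;
			counts.big++;
		} else {
			page_size = SZ_4K;
			counts.small++;
		}

		/* Every 4 kB slot inside the larger page gets its own entry. */
		for (i = 0; i < page_size >> PAGE_SHIFT; i++)
			pt[page++] = pte + i;

		pfn += page_size >> PAGE_SHIFT;
		len -= page_size;
	}

	return counts;
}
```

A well-aligned segment of 1 MB + 64 kB + 4 kB is covered by one super page, one big page and one regular page, whereas a 64 kB segment whose physical frame is off by one falls back to sixteen regular pages.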

[PATCH v3 4/8] drm/gem: Create shmem GEM object in a given mountpoint

2024-04-21 Thread Maíra Canal

Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that has a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a special mountpoint
with flags enabled.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++
 include/drm/drm_gem_shmem_helper.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 13bcdbfd..10b7c4c769a3 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = {
 };
 
 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+  struct vfsmount *gemfs)
 {
struct drm_gem_shmem_object *shmem;
struct drm_gem_object *obj;
@@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
drm_gem_private_object_init(dev, obj, size);
shmem->map_wc = false; /* dma-buf mappings use always writecombine */
} else {
-   ret = drm_gem_object_init(dev, obj, size);
+   ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs);
}
if (ret) {
drm_gem_private_object_fini(obj);
@@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size)
 {
-   return __drm_gem_shmem_create(dev, size, false);
+   return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+  size_t size,
+  struct vfsmount *gemfs)
+{
+   return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
size_t size = PAGE_ALIGN(attach->dmabuf->size);
struct drm_gem_shmem_object *shmem;
 
-   shmem = __drm_gem_shmem_create(dev, size, true);
+   shmem = __drm_gem_shmem_create(dev, size, true, NULL);
if (IS_ERR(shmem))
return ERR_CAST(shmem);
 
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index efbc9f27312b..d22e3fb53631 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
container_of(obj, struct drm_gem_shmem_object, base)
 
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+  size_t size,
+  struct vfsmount *gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);
 
 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
-- 
2.44.0

[PATCH v3 5/8] drm/v3d: Reduce the alignment of the node allocation

2024-04-21 Thread Maíra Canal

Currently, we are using an alignment of 128 kB to insert a node, which
ends up wasting memory as we perform plenty of small BO allocations
(<= 4 kB). We require that allocations are aligned to 128 kB, so for any
allocation smaller than that, we are wasting the difference.

This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be
quite limiting for applications that have a high memory requirement, such
as vkoverhead [1].

By reducing the page alignment to 4 kB, we can allocate up to 1000000 BOs
of 4 kB (~4 GB) and 10000 BOs of 400 kB (~4 GB). Moreover, by performing
benchmarks, we were able to attest that reducing the page alignment to
4 kB can provide a general performance improvement in OpenGL
applications (e.g. glmark2).

Therefore, this patch reduces the alignment of the node allocation to 4
kB, which will allow RPi users to explore the whole 4 GB virtual
address space provided by the hardware. Also, this patch allows users to
fully run vkoverhead in the RPi 4/5, solving the issue reported in [1].

[1] https://github.com/zmike/vkoverhead/issues/14

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..79e31c5299b1 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
+SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d2ce8222771a..17236ee23490 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,8 +17,6 @@ struct clk;
 struct platform_device;
 struct reset_control;
 
-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
-- 
2.44.0
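
The effect of the alignment on how many BOs fit can be checked with a few lines of arithmetic. The sketch below computes an idealized upper bound only: `max_bos()` is a hypothetical helper, the address space is taken as exactly 2^32 bytes, "kB" is read as 1024 bytes, and fragmentation and any other allocations are ignored, so measured figures can be lower.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on the number of BOs of bo_size bytes that fit in an
 * address space of `space` bytes when every BO must start on a
 * multiple of `align` bytes: each BO effectively occupies its size
 * rounded up to the alignment.
 */
static uint64_t max_bos(uint64_t space, uint64_t bo_size, uint64_t align)
{
	uint64_t stride = (bo_size + align - 1) / align * align;

	return space / stride;
}
```

With a 128 kB alignment, 4 kB BOs are limited to 32768 slots (128 MB of usable memory in total), in line with the roughly 32000 BOs quoted above; with a 4 kB alignment the same address space holds 1048576 such BOs, or 10485 BOs of 400 kB.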

[PATCH v3 0/8] drm/v3d: Enable Big and Super Pages

2024-04-21 Thread Maíra Canal

This also allows us to move away from the 2MB hack. (Tvrtko Ursulin)
* [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral)
* [5/6] Create a separate patch to reduce the alignment of the node allocation.
(Iago Toral)
* [6/6] Complete refactoring to the way that we iterate through the memory.
(Tvrtko Ursulin)
* [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us
misleading data. (Tvrtko Ursulin)
* [6/6] Use both Big Pages (64K) and Super Pages (1MB).

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240405201753.1176914-1-mca...@igalia.com/T/

* [2/8] Add Tvrtko's R-b to PATCH 2/8 (Tvrtko Ursulin)
* [4/8] Add Tvrtko's R-b to PATCH 4/8 (Tvrtko Ursulin)
* [6/8] Now, PATCH 6/8 only adds support to big/super pages when writing out
PTEs. BO creation with THP and addition of modparam are moved to
other patches. (Tvrtko Ursulin)
* [6/8] s/page_address/pfn (Tvrtko Ursulin)
* [6/8] As `sg_dma_len()` returns `unsigned int`, then `len` must be
`unsigned int` too. (Tvrtko Ursulin)
* [6/8] `i` and `page_size` are `unsigned int` as well. (Tvrtko Ursulin)
* [6/8] Move `i`, `page_prot` and `page_size` to the inner scope.
(Tvrtko Ursulin)
* [6/8] s/pte/page_address/ (Tvrtko Ursulin)
* [7/8] New patch: Use gemfs/THP in BO creation if available (Tvrtko Ursulin)
* [8/8] New patch: Add modparam for turning off Big/Super Pages (Tvrtko Ursulin)
* [8/8] Don't expose the modparam `super_pages` unless
CONFIG_TRANSPARENT_HUGEPAGE is enabled. (Tvrtko Ursulin)
* [8/8] Use `v3d->gemfs` to check if the user disabled Super Pages support.
(Tvrtko Ursulin)

Best Regards,
- Maíra

Maíra Canal (8):
  drm/v3d: Fix return if scheduler initialization fails
  drm/gem: Create a drm_gem_object_init_with_mnt() function
  drm/v3d: Introduce gemfs
  drm/gem: Create shmem GEM object in a given mountpoint
  drm/v3d: Reduce the alignment of the node allocation
  drm/v3d: Support Big/Super Pages when writing out PTEs
  drm/v3d: Use gemfs/THP in BO creation if available
  drm/v3d: Add modparam for turning off Big/Super Pages

 drivers/gpu/drm/drm_gem.c  | 34 +++--
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +--
 drivers/gpu/drm/v3d/Makefile   |  3 +-
 drivers/gpu/drm/v3d/v3d_bo.c   | 21 ++-
 drivers/gpu/drm/v3d/v3d_drv.c  |  8 
 drivers/gpu/drm/v3d/v3d_drv.h  | 12 +-
 drivers/gpu/drm/v3d/v3d_gem.c  |  6 ++-
 drivers/gpu/drm/v3d/v3d_gemfs.c| 51 +
 drivers/gpu/drm/v3d/v3d_mmu.c  | 52 +++---
 include/drm/drm_gem.h  |  3 ++
 include/drm/drm_gem_shmem_helper.h |  3 ++
 11 files changed, 196 insertions(+), 27 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

-- 
2.44.0

[PATCH v3 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function

2024-04-21 Thread Maíra Canal

For some applications, such as applications that use huge pages, we might
want to have a different mountpoint, for which we pass mount flags that
better match our usecase.

Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, then we fall back to `shmem_file_setup()`.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_gem.c | 34 ++
 include/drm/drm_gem.h |  3 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d4bbc5d109c8..74ebe68e3d61 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev)
 }
 
 /**
- * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM
+ * object in a given shmfs mountpoint
+ *
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
-int drm_gem_object_init(struct drm_device *dev,
-   struct drm_gem_object *obj, size_t size)
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs)
 {
struct file *filp;
 
drm_gem_private_object_init(dev, obj, size);
 
-   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+   if (gemfs)
+   filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+VM_NORESERVE);
+   else
+   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
if (IS_ERR(filp))
return PTR_ERR(filp);
 
@@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev,
 
return 0;
 }
+EXPORT_SYMBOL(drm_gem_object_init_with_mnt);
+
+/**
+ * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * @dev: drm_device the object should be initialized for
+ * @obj: drm_gem_object to initialize
+ * @size: object size
+ *
+ * Initialize an already allocated GEM object of the specified size with
+ * shmfs backing store.
+ */
+int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
+   size_t size)
+{
+   return drm_gem_object_init_with_mnt(dev, obj, size, NULL);
+}
 EXPORT_SYMBOL(drm_gem_object_init);
 
 /**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bae4865b2101..2ebf6e10cc44 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
-- 
2.44.0

[PATCH v3 3/8] drm/v3d: Introduce gemfs

2024-04-21 Thread Maíra Canal

Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and give us the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/Makefile|  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
v3d_trace_points.o \
v3d_sched.o \
v3d_sysfs.o \
-   v3d_submit.o
+   v3d_submit.o \
+   v3d_gemfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..d2ce8222771a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -119,6 +119,11 @@ struct v3d_dev {
struct drm_mm mm;
spinlock_t mm_lock;
 
+   /*
+* tmpfs instance used for shmem backed objects
+*/
+   struct vfsmount *gemfs;
+
struct work_struct overflow_mem_work;
 
struct v3d_bin_job *bin_job;
@@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);
 
+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 66f4b78a6b2e..faefbe497e8d 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
v3d_init_hw_state(v3d);
v3d_mmu_set_page_table(v3d);
 
+   v3d_gemfs_init(v3d);
+
ret = v3d_sched_init(v3d);
if (ret) {
drm_mm_takedown(&v3d->mm);
@@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
struct v3d_dev *v3d = to_v3d_dev(dev);
 
v3d_sched_fini(v3d);
+   v3d_gemfs_fini(v3d);
 
/* Waiting for jobs to finish would need to be done before
 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index ..31cf5bd11e39
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include 
+#include 
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+   char huge_opt[] = "huge=within_size";
+   struct file_system_type *type;
+   struct vfsmount *gemfs;
+
+   /*
+* By creating our own shmemfs mountpoint, we can pass in
+* mount flags that better match our usecase. However, we
+* only do so on platforms which benefit from it.
+*/
+   if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   goto err;
+
+   type = get_fs_type("tmpfs");
+   if (!type)
+   goto err;
+
+   gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+   if (IS_ERR(gemfs))
+   goto err;
+
+   v3d->gemfs = gemfs;
+   drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+   return;
+
+err:
+   v3d->gemfs = NULL;
+   drm_notice(&v3d->drm,
+  "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+   if (v3d->gemfs)
+   kern_unmount(v3d->gemfs);
+}
-- 
2.44.0

[PATCH v3 1/8] drm/v3d: Fix return if scheduler initialization fails

2024-04-21 Thread Maíra Canal

If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
allocated and return the error value in `v3d_gem_init()`.
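
The error-path shape described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: `fake_dev`,
`fake_sched_init()` and `fake_gem_init()` are hypothetical stand-ins, and
`malloc()`/`free()` replace the DMA allocation used by the driver.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver state; not the real v3d types. */
struct fake_dev {
	void *pt;        /* stands in for the page-table allocation */
	int sched_fails; /* test knob: force the scheduler init to fail */
};

/* Hypothetical scheduler init that can be made to fail. */
static int fake_sched_init(struct fake_dev *dev)
{
	return dev->sched_fails ? -ENOMEM : 0;
}

/* Allocate, then init the scheduler; on failure, undo and propagate. */
static int fake_gem_init(struct fake_dev *dev)
{
	int ret;

	dev->pt = malloc(4096);
	if (!dev->pt)
		return -ENOMEM;

	ret = fake_sched_init(dev);
	if (ret) {
		free(dev->pt);
		dev->pt = NULL;
		return ret; /* without this, the caller would see success */
	}

	return 0;
}
```

The point of the sketch is the `return ret;` inside the failure branch:
releasing the resources alone is not enough if the function then falls
through to `return 0`.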

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..66f4b78a6b2e 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev)
ret = v3d_sched_init(v3d);
if (ret) {
drm_mm_takedown(&v3d->mm);
-   dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
+   dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt,
  v3d->pt_paddr);
+   return ret;
}
 
return 0;
-- 
2.44.0

[PATCH v3 4/5] drm/v3d: Decouple stats calculation from printing

2024-04-20 Thread Maíra Canal

Create a function to decouple the stats calculation from the printing.
This will be useful in the next step when we add a seqcount to protect
the stats.
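
As a rough userspace illustration of the split (not the driver code itself),
the calculation can live in one helper and the formatting in another. The
names `calc_stats`, `calc_get_stats()` and `calc_format_stats()` are
hypothetical, and `snprintf()` stands in for the kernel printing helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the three counters kept per queue. */
struct calc_stats {
	uint64_t start_ns;       /* non-zero while a job is running */
	uint64_t enabled_ns;     /* accumulated runtime of finished jobs */
	uint64_t jobs_completed;
};

/* Calculation only: no formatting, so every reader shares one code path. */
static void calc_get_stats(const struct calc_stats *stats, uint64_t timestamp,
			   uint64_t *active_runtime, uint64_t *jobs_completed)
{
	*active_runtime = stats->enabled_ns;
	if (stats->start_ns)
		*active_runtime += timestamp - stats->start_ns;
	*jobs_completed = stats->jobs_completed;
}

/* Printing only: consumes the values computed by the helper above. */
static int calc_format_stats(char *buf, size_t len, const char *name,
			     const struct calc_stats *stats, uint64_t timestamp)
{
	uint64_t runtime, jobs;

	calc_get_stats(stats, timestamp, &runtime, &jobs);
	return snprintf(buf, len, "%s\t%llu\t%llu\n", name,
			(unsigned long long)jobs,
			(unsigned long long)runtime);
}
```

With the calculation isolated, a later change only needs to add protection
around the body of the helper, and both output paths pick it up.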

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 18 ++
 drivers/gpu/drm/v3d/v3d_drv.h   |  4 
 drivers/gpu/drm/v3d/v3d_sysfs.c | 11 +++
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 52e3ba9df46f..2ec359ed2def 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -142,6 +142,15 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
kfree(v3d_priv);
 }
 
+void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
+  u64 *active_runtime, u64 *jobs_completed)
+{
+   *active_runtime = stats->enabled_ns;
+   if (stats->start_ns)
+   *active_runtime += timestamp - stats->start_ns;
+   *jobs_completed = stats->jobs_completed;
+}
+
 static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
struct v3d_file_priv *file_priv = file->driver_priv;
@@ -150,20 +159,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
struct v3d_stats *stats = &file_priv->stats[queue];
+   u64 active_runtime, jobs_completed;
+
+   v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed);
 
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
-  v3d_queue_to_string(queue),
-  stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
-  : stats->enabled_ns);
+  v3d_queue_to_string(queue), active_runtime);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), stats->jobs_completed);
+  v3d_queue_to_string(queue), jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 5a198924d568..ff06dc1cc078 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -510,6 +510,10 @@ struct drm_gem_object *v3d_prime_import_sg_table(struct drm_device *dev,
 /* v3d_debugfs.c */
 void v3d_debugfs_init(struct drm_minor *minor);
 
+/* v3d_drv.c */
+void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
+  u64 *active_runtime, u64 *jobs_completed);
+
 /* v3d_fence.c */
 extern const struct dma_fence_ops v3d_fence_ops;
 struct dma_fence *v3d_fence_create(struct v3d_dev *v3d, enum v3d_queue queue);
diff --git a/drivers/gpu/drm/v3d/v3d_sysfs.c b/drivers/gpu/drm/v3d/v3d_sysfs.c
index 6a8e7acc8b82..d610e355964f 100644
--- a/drivers/gpu/drm/v3d/v3d_sysfs.c
+++ b/drivers/gpu/drm/v3d/v3d_sysfs.c
@@ -15,18 +15,15 @@ gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
struct v3d_dev *v3d = to_v3d_dev(drm);
enum v3d_queue queue;
u64 timestamp = local_clock();
-   u64 active_runtime;
ssize_t len = 0;
 
len += sysfs_emit(buf, "queue\ttimestamp\tjobs\truntime\n");
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
struct v3d_stats *stats = &v3d->queue[queue].stats;
+   u64 active_runtime, jobs_completed;
 
-   if (stats->start_ns)
-   active_runtime = timestamp - stats->start_ns;
-   else
-   active_runtime = 0;
+   v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed);
 
/* Each line will display the queue name, timestamp, the number
 * of jobs sent to that queue and the runtime, as can be seem here:
@@ -40,9 +37,7 @@ gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
 */
len += sysfs_emit_at(buf, len, "%s\t%llu\t%llu\t%llu\n",
 v3d_queue_to_string(queue),
-timestamp,
-stats->jobs_completed,
-stats->enabled_ns + active_runtime);
+timestamp, jobs_completed, active_runtime);
}
 
return len;
-- 
2.44.0

[PATCH v3 5/5] drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

2024-04-20 Thread Maíra Canal

In V3D, the conclusion of a job is indicated by an IRQ. When a job
finishes, we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.

For example, in `gpu_stats_show()`, consider a scenario where
`v3d->queue[queue].start_ns != 0`, then an interrupt happens and updates
the value of `v3d->queue[queue].start_ns` to 0. When we come back to
`gpu_stats_show()` to calculate `active_runtime`, we now get
`active_runtime = timestamp`.

In this simple example, the user would see a spike in the queue usage
that didn't match reality.

In order to address this issue properly, use a seqcount to protect read
and write sections of the code.
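
For readers unfamiliar with the pattern, the following is a minimal userspace
sketch of the seqcount idea using C11 atomics. It is not the kernel
`seqcount_t` implementation: the names `seq_stats`, `seq_stats_job_start()`,
`seq_stats_job_done()` and `seq_stats_read()` are hypothetical, a single
writer at a time is assumed, and default sequentially consistent atomics are
used for simplicity where the kernel primitive relies on lighter barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the stats plus their sequence counter. */
struct seq_stats {
	atomic_uint seq;                 /* odd while a write is in progress */
	_Atomic uint64_t start_ns;
	_Atomic uint64_t enabled_ns;
	_Atomic uint64_t jobs_completed;
};

/* Writer side, job start. Assumes writers never run concurrently. */
static void seq_stats_job_start(struct seq_stats *s, uint64_t now)
{
	atomic_fetch_add(&s->seq, 1);    /* sequence becomes odd */
	atomic_store(&s->start_ns, now);
	atomic_fetch_add(&s->seq, 1);    /* sequence becomes even again */
}

/* Writer side, job completion: all three fields change as one unit. */
static void seq_stats_job_done(struct seq_stats *s, uint64_t now)
{
	atomic_fetch_add(&s->seq, 1);
	atomic_store(&s->enabled_ns,
		     atomic_load(&s->enabled_ns) + now - atomic_load(&s->start_ns));
	atomic_fetch_add(&s->jobs_completed, 1);
	atomic_store(&s->start_ns, 0);
	atomic_fetch_add(&s->seq, 1);
}

/* Reader side: retry until a snapshot was taken with no write in between. */
static void seq_stats_read(struct seq_stats *s, uint64_t timestamp,
			   uint64_t *active_runtime, uint64_t *jobs_completed)
{
	unsigned int seq;
	uint64_t start;

	do {
		seq = atomic_load(&s->seq);
		*active_runtime = atomic_load(&s->enabled_ns);
		start = atomic_load(&s->start_ns);
		if (start)
			*active_runtime += timestamp - start;
		*jobs_completed = atomic_load(&s->jobs_completed);
	} while ((seq & 1) || atomic_load(&s->seq) != seq);
}
```

The reader never blocks the writer, which is why this shape suits a counter
that is updated from an interrupt handler and only occasionally read.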

Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 14 ++
 drivers/gpu/drm/v3d/v3d_drv.h   |  7 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  1 +
 drivers/gpu/drm/v3d/v3d_sched.c |  7 +++
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 2ec359ed2def..28b7ddce7747 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -121,6 +121,7 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
  1, NULL);
 
memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
+   seqcount_init(&v3d_priv->stats[i].lock);
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -145,10 +146,15 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
 void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
   u64 *active_runtime, u64 *jobs_completed)
 {
-   *active_runtime = stats->enabled_ns;
-   if (stats->start_ns)
-   *active_runtime += timestamp - stats->start_ns;
-   *jobs_completed = stats->jobs_completed;
+   unsigned int seq;
+
+   do {
+   seq = read_seqcount_begin(&stats->lock);
+   *active_runtime = stats->enabled_ns;
+   if (stats->start_ns)
+   *active_runtime += timestamp - stats->start_ns;
+   *jobs_completed = stats->jobs_completed;
+   } while (read_seqcount_retry(&stats->lock, seq));
 }
 
 static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index ff06dc1cc078..a2c516fe6d79 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -40,6 +40,13 @@ struct v3d_stats {
u64 start_ns;
u64 enabled_ns;
u64 jobs_completed;
+
+   /*
+* This seqcount is used to protect the access to the GPU stats
+* variables. It must be used as, while we are reading the stats,
+* IRQs can happen and the stats can be updated.
+*/
+   seqcount_t lock;
 };
 
 struct v3d_queue_state {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 0086081a9261..da8faf3b9011 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -251,6 +251,7 @@ v3d_gem_init(struct drm_device *dev)
 
queue->fence_context = dma_fence_context_alloc(1);
memset(&queue->stats, 0, sizeof(queue->stats));
+   seqcount_init(&queue->stats.lock);
}
 
spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index b9614944931c..7cd8c335cd9b 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -114,16 +114,23 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
struct v3d_stats *local_stats = >stats[queue];
u64 now = local_clock();
 
+   write_seqcount_begin(&local_stats->lock);
local_stats->start_ns = now;
+   write_seqcount_end(&local_stats->lock);
+
+   write_seqcount_begin(&global_stats->lock);
global_stats->start_ns = now;
+   write_seqcount_end(&global_stats->lock);
 }
 
 static void
 v3d_stats_update(struct v3d_stats *stats, u64 now)
 {
+   write_seqcount_begin(&stats->lock);
stats->enabled_ns += now - stats->start_ns;
stats->jobs_completed++;
stats->start_ns = 0;
+   write_seqcount_end(&stats->lock);
 }
 
 void
-- 
2.44.0

[PATCH v3 3/5] drm/v3d: Create function to update a set of GPU stats

2024-04-20 Thread Maíra Canal

Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that can update this set
of GPU stats.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index b6b5542c3fcf..b9614944931c 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -118,6 +118,14 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
global_stats->start_ns = now;
 }
 
+static void
+v3d_stats_update(struct v3d_stats *stats, u64 now)
+{
+   stats->enabled_ns += now - stats->start_ns;
+   stats->jobs_completed++;
+   stats->start_ns = 0;
+}
+
 void
 v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
 {
@@ -127,13 +135,8 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
struct v3d_stats *local_stats = >stats[queue];
u64 now = local_clock();
 
-   local_stats->enabled_ns += now - local_stats->start_ns;
-   local_stats->jobs_completed++;
-   local_stats->start_ns = 0;
-
-   global_stats->enabled_ns += now - global_stats->start_ns;
-   global_stats->jobs_completed++;
-   global_stats->start_ns = 0;
+   v3d_stats_update(local_stats, now);
+   v3d_stats_update(global_stats, now);
 }
 
 static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
-- 
2.44.0

[PATCH v3 1/5] drm/v3d: Create two functions to update all GPU stats variables

2024-04-20 Thread Maíra Canal

Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very prone to errors,
as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment
`enabled_ns` twice").

Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call for `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when starting the job.
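
A simplified userspace sketch of the start/update pair follows. The types
`pair_stats` and `pair_job` and the three functions are hypothetical
stand-ins, and the current time is passed in as `now` rather than read from a
clock, so that the example stays deterministic.

```c
#include <stdint.h>

/* Hypothetical counters tracked for one queue in one context. */
struct pair_stats {
	uint64_t start_ns;
	uint64_t enabled_ns;
	uint64_t jobs_completed;
};

/* Hypothetical job: points at its per-file and per-device counters. */
struct pair_job {
	struct pair_stats *local_stats;
	struct pair_stats *global_stats;
};

/* Called once when the job starts running on the hardware. */
static void pair_job_start_stats(struct pair_job *job, uint64_t now)
{
	job->local_stats->start_ns = now;
	job->global_stats->start_ns = now;
}

/* Account one finished job in a single set of counters. */
static void pair_stats_update(struct pair_stats *stats, uint64_t now)
{
	stats->enabled_ns += now - stats->start_ns;
	stats->jobs_completed++;
	stats->start_ns = 0;
}

/* Called once when the job completes; updates both sets the same way. */
static void pair_job_update_stats(struct pair_job *job, uint64_t now)
{
	pair_stats_update(job->local_stats, now);
	pair_stats_update(job->global_stats, now);
}
```

Because each counter is touched in exactly one place, a mistake such as
adding the runtime twice can only be made (and fixed) once.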

Co-developed-by: Tvrtko Ursulin 
Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.h   |  1 +
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++--
 drivers/gpu/drm/v3d/v3d_sched.c | 80 +++--
 3 files changed, 40 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..ee3545226d7f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index ce6b2fb341d1..d469bda52c1a 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FLDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->bin_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->bin_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_BIN];
-
-   file->jobs_sent[V3D_BIN]++;
-   v3d->queue[V3D_BIN].jobs_sent++;
-
-   file->start_ns[V3D_BIN] = 0;
-   v3d->queue[V3D_BIN].start_ns = 0;
-
-   file->enabled_ns[V3D_BIN] += runtime;
-   v3d->queue[V3D_BIN].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN);
trace_v3d_bcl_irq(&v3d->drm, fence->seqno);
dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FRDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->render_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->render_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
-
-   file->jobs_sent[V3D_RENDER]++;
-   v3d->queue[V3D_RENDER].jobs_sent++;
-
-   file->start_ns[V3D_RENDER] = 0;
-   v3d->queue[V3D_RENDER].start_ns = 0;
-
-   file->enabled_ns[V3D_RENDER] += runtime;
-   v3d->queue[V3D_RENDER].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER);
trace_v3d_rcl_irq(&v3d->drm, fence->seqno);
dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_CSDDONE(v3d->ver)) {
struct v3d_fence *fence =
to_v3d_fence(v3d->csd_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->csd_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_CSD];
-
-   file->jobs_sent[V3D_CSD]++;
-   v3d->queue[V3D_CSD].jobs_sent++;
-
-   file->start_ns[V3D_CSD] = 0;
-   v3d->queue[V3D_CSD].start_ns = 0;
-
-   file->enabled_ns[V3D_CSD] += runtime;
-   v3d->queue[V3D_CSD].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD);
trace_v3d_csd_irq(&v3d->drm, fence->seqno);
dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg)
if (intsts & V3D_HUB_INT_TFUC) {
struct v3d_fence *fence =
to_v3d_fence(v3d->tfu_job->base.irq_fence);
-   struct v3d_file_priv *file = v3d->tfu_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_TFU];
-
-   file->jobs_sent[V3D_TFU]++;
-   v3d->queue[V3D_TFU].jobs_sent++;
-
-   file->start_ns[V3D_TFU] = 0;
-   v3d->queue[V3D_TFU].start_ns = 0;
-
-   file->enabled_ns[V3D_TFU] += runtime;
-   v3d->queue[V3D_TFU].enabled_ns += runtime;
 
+   v3d_job_u

[PATCH v3 0/5] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-20 Thread Maíra Canal

The first version of this series had the intention to fix two major
issues with the GPU stats:

1. We were incrementing `enabled_ns` twice by the end of each job.
2. There is a race-condition between the IRQ handler and the users

The first of the issues was already addressed and the fix was applied to
drm-misc-fixes. Now, what is left, addresses the second issue.

Apart from addressing this issue, this series improves the GPU stats
code as a whole. We reduced code repetition by creating functions to start
and update the GPU stats. This will likely reduce the odds of issue #1
happening again.

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/

- As the first patch was a bugfix, it was pushed to drm-misc-fixes.
- [1/4] Add Chema Casanova's R-b
- [2/4] s/jobs_sent/jobs_completed and add the reasoning in the commit message
(Chema Casanova)
- [2/4] Add Chema Casanova's and Tvrtko Ursulin's R-b
- [3/4] Call `local_clock()` only once, by adding a new parameter to the
`v3d_stats_update` function (Chema Casanova)
- [4/4] Move new line to the correct patch [2/4] (Tvrtko Ursulin)
- [4/4] Use `seqcount_t` as locking primitive instead of a `rw_lock` (Tvrtko Ursulin)

v2 -> v3: 
https://lore.kernel.org/dri-devel/20240417011021.600889-1-mca...@igalia.com/T/

- [4/5] New patch: separates the code refactor from the race-condition fix (Tvrtko Ursulin)
- [5/5] s/interruption/interrupt (Tvrtko Ursulin)
- [5/5] s/matches/match (Tvrtko Ursulin)
- [5/5] Add Tvrtko Ursulin's R-b

Best Regards,
- Maíra

Maíra Canal (5):
  drm/v3d: Create two functions to update all GPU stats variables
  drm/v3d: Create a struct to store the GPU stats
  drm/v3d: Create function to update a set of GPU stats
  drm/v3d: Decouple stats calculation from printing
  drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

 drivers/gpu/drm/v3d/v3d_drv.c   | 33 
 drivers/gpu/drm/v3d/v3d_drv.h   | 30 ---
 drivers/gpu/drm/v3d/v3d_gem.c   |  9 ++--
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++---
 drivers/gpu/drm/v3d/v3d_sched.c | 94 +
 drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++---
 6 files changed, 109 insertions(+), 118 deletions(-)

-- 
2.44.0

[PATCH v3 2/5] drm/v3d: Create a struct to store the GPU stats

2024-04-20 Thread Maíra Canal

This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.

Note that, when we created the struct `v3d_stats`, we renamed
`jobs_sent` to `jobs_completed`. This better expresses the semantics of
the variable, as we only account for jobs that have been completed.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 15 +++
 drivers/gpu/drm/v3d/v3d_drv.h   | 18 ++
 drivers/gpu/drm/v3d/v3d_gem.c   |  8 
 drivers/gpu/drm/v3d/v3d_sched.c | 20 
 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++
 5 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..52e3ba9df46f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -115,14 +115,12 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
v3d_priv->v3d = v3d;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d_priv->enabled_ns[i] = 0;
-   v3d_priv->start_ns[i] = 0;
-   v3d_priv->jobs_sent[i] = 0;
-
sched = &v3d->queue[i].sched;
drm_sched_entity_init(&v3d_priv->sched_entity[i],
  DRM_SCHED_PRIORITY_NORMAL, &sched,
  1, NULL);
+
+   memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -151,20 +149,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
enum v3d_queue queue;
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
+   struct v3d_stats *stats = &file_priv->stats[queue];
+
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
   v3d_queue_to_string(queue),
-  file_priv->start_ns[queue] ? file_priv->enabled_ns[queue]
- + timestamp - file_priv->start_ns[queue]
- : file_priv->enabled_ns[queue]);
+  stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
+  : stats->enabled_ns);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), file_priv->jobs_sent[queue]);
+  v3d_queue_to_string(queue), stats->jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index ee3545226d7f..5a198924d568 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue queue)
return "UNKNOWN";
 }
 
+struct v3d_stats {
+   u64 start_ns;
+   u64 enabled_ns;
+   u64 jobs_completed;
+};
+
 struct v3d_queue_state {
struct drm_gpu_scheduler sched;
 
u64 fence_context;
u64 emit_seqno;
 
-   u64 start_ns;
-   u64 enabled_ns;
-   u64 jobs_sent;
+   /* Stores the GPU stats for this queue in the global context. */
+   struct v3d_stats stats;
 };
 
 /* Performance monitor object. The perform lifetime is controlled by userspace
@@ -188,11 +193,8 @@ struct v3d_file_priv {
 
struct drm_sched_entity sched_entity[V3D_MAX_QUEUES];
 
-   u64 start_ns[V3D_MAX_QUEUES];
-
-   u64 enabled_ns[V3D_MAX_QUEUES];
-
-   u64 jobs_sent[V3D_MAX_QUEUES];
+   /* Stores the GPU stats for a specific queue for this fd. */
+   struct v3d_stats stats[V3D_MAX_QUEUES];
 };
 
 struct v3d_bo {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..0086081a9261 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -247,10 +247,10 @@ v3d_gem_init(struct drm_device *dev)
int ret, i;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   v3d->queue[i].start_ns = 0;
-   v3d->queue[i].enabled_ns = 0;
-   v3d->queue[i].jobs_sent = 0;
+   struct v3d_queue_state *queue = &v3d->queue[i];
+
+   queue->fence_context = dma_fence_context_alloc(1);
+   memset(&queue->stats, 0, sizeof(queue->stats));
}

[PATCH v2 4/4] drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

2024-04-16 Thread Maíra Canal

In V3D, the conclusion of a job is indicated by an IRQ. When a job
finishes, then we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.

For example, on `gpu_stats_show()`, we could think about a scenario where
`v3d->queue[queue].start_ns != 0`, then an interruption happens, we update
the value of `v3d->queue[queue].start_ns` to 0, we come back to
`gpu_stats_show()` to calculate `active_runtime` and now,
`active_runtime = timestamp`.

In this simple example, the user would see a spike in the queue usage
that didn't match reality.

In order to address this issue properly, use a seqcount to protect read
and write sections of the code.

Fixes: 09a93cc4f7d1 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 10 ++
 drivers/gpu/drm/v3d/v3d_drv.h   | 21 +
 drivers/gpu/drm/v3d/v3d_gem.c   |  7 +--
 drivers/gpu/drm/v3d/v3d_sched.c |  7 +++
 drivers/gpu/drm/v3d/v3d_sysfs.c | 11 +++
 5 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 52e3ba9df46f..cf15fa142968 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -121,6 +121,7 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
  1, NULL);
 
memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
+   seqcount_init(&v3d_priv->stats[i].lock);
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -150,20 +151,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
struct v3d_stats *stats = &file_priv->stats[queue];
+   u64 active_runtime, jobs_completed;
+
+   v3d_get_stats(stats, timestamp, &active_runtime, &jobs_completed);
 
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
-  v3d_queue_to_string(queue),
-  stats->start_ns ? stats->enabled_ns + timestamp - stats->start_ns
-  : stats->enabled_ns);
+  v3d_queue_to_string(queue), active_runtime);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), stats->jobs_completed);
+  v3d_queue_to_string(queue), jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 5a198924d568..5211df7c7317 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -40,8 +40,29 @@ struct v3d_stats {
u64 start_ns;
u64 enabled_ns;
u64 jobs_completed;
+
+   /*
+* This seqcount is used to protect the access to the GPU stats
+* variables. It must be used as, while we are reading the stats,
+* IRQs can happen and the stats can be updated.
+*/
+   seqcount_t lock;
 };
 
+static inline void v3d_get_stats(const struct v3d_stats *stats, u64 timestamp,
+u64 *active_runtime, u64 *jobs_completed)
+{
+   unsigned int seq;
+
+   do {
+   seq = read_seqcount_begin(&stats->lock);
+   *active_runtime = stats->enabled_ns;
+   if (stats->start_ns)
+   *active_runtime += timestamp - stats->start_ns;
+   *jobs_completed = stats->jobs_completed;
+   } while (read_seqcount_retry(&stats->lock, seq));
+}
+
 struct v3d_queue_state {
struct drm_gpu_scheduler sched;
 
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index d14589d3ae6c..da8faf3b9011 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -247,8 +247,11 @@ v3d_gem_init(struct drm_device *dev)
int ret, i;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats));
+   struct v3d_queue_state *queue = &v3d->queue[i];
+
+   queue->fence_context = dma_fence_context_alloc(1);
+   memset(&queue->stats, 0, sizeof(queue->stats));
+   seqcount_init(&queue->stats.lock);
}
 
spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/g

[PATCH v2 3/4] drm/v3d: Create function to update a set of GPU stats

2024-04-16 Thread Maíra Canal

Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that can update this set
of GPU stats.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_sched.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index b6b5542c3fcf..b9614944931c 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -118,6 +118,14 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
global_stats->start_ns = now;
 }
 
+static void
+v3d_stats_update(struct v3d_stats *stats, u64 now)
+{
+   stats->enabled_ns += now - stats->start_ns;
+   stats->jobs_completed++;
+   stats->start_ns = 0;
+}
+
 void
 v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
 {
@@ -127,13 +135,8 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
struct v3d_stats *local_stats = >stats[queue];
u64 now = local_clock();
 
-   local_stats->enabled_ns += now - local_stats->start_ns;
-   local_stats->jobs_completed++;
-   local_stats->start_ns = 0;
-
-   global_stats->enabled_ns += now - global_stats->start_ns;
-   global_stats->jobs_completed++;
-   global_stats->start_ns = 0;
+   v3d_stats_update(local_stats, now);
+   v3d_stats_update(global_stats, now);
 }
 
 static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
-- 
2.44.0

[PATCH v2 2/4] drm/v3d: Create a struct to store the GPU stats

2024-04-16 Thread Maíra Canal

This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.

Note that, when we created the struct `v3d_stats`, we renamed
`jobs_sent` to `jobs_completed`. This better expresses the semantics of
the variable, as we only account for jobs that have been completed.

Signed-off-by: Maíra Canal 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 15 +++
 drivers/gpu/drm/v3d/v3d_drv.h   | 18 ++
 drivers/gpu/drm/v3d/v3d_gem.c   |  4 +---
 drivers/gpu/drm/v3d/v3d_sched.c | 20 
 drivers/gpu/drm/v3d/v3d_sysfs.c | 10 ++
 5 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..52e3ba9df46f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -115,14 +115,12 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
v3d_priv->v3d = v3d;
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
-   v3d_priv->enabled_ns[i] = 0;
-   v3d_priv->start_ns[i] = 0;
-   v3d_priv->jobs_sent[i] = 0;
-
sched = &v3d->queue[i].sched;
drm_sched_entity_init(&v3d_priv->sched_entity[i],
  DRM_SCHED_PRIORITY_NORMAL, &sched,
  1, NULL);
+
+   memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
}
 
v3d_perfmon_open_file(v3d_priv);
@@ -151,20 +149,21 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
enum v3d_queue queue;
 
for (queue = 0; queue < V3D_MAX_QUEUES; queue++) {
+   struct v3d_stats *stats = &file_priv->stats[queue];
+
/* Note that, in case of a GPU reset, the time spent during an
 * attempt of executing the job is not computed in the runtime.
 */
drm_printf(p, "drm-engine-%s: \t%llu ns\n",
   v3d_queue_to_string(queue),
-  file_priv->start_ns[queue] ? 
file_priv->enabled_ns[queue]
- + timestamp - 
file_priv->start_ns[queue]
- : 
file_priv->enabled_ns[queue]);
+  stats->start_ns ? stats->enabled_ns + timestamp - 
stats->start_ns
+  : stats->enabled_ns);
 
/* Note that we only count jobs that completed. Therefore, jobs
 * that were resubmitted due to a GPU reset are not computed.
 */
drm_printf(p, "v3d-jobs-%s: \t%llu jobs\n",
-  v3d_queue_to_string(queue), 
file_priv->jobs_sent[queue]);
+  v3d_queue_to_string(queue), stats->jobs_completed);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index ee3545226d7f..5a198924d568 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -36,15 +36,20 @@ static inline char *v3d_queue_to_string(enum v3d_queue 
queue)
return "UNKNOWN";
 }
 
+struct v3d_stats {
+   u64 start_ns;
+   u64 enabled_ns;
+   u64 jobs_completed;
+};
+
 struct v3d_queue_state {
struct drm_gpu_scheduler sched;
 
u64 fence_context;
u64 emit_seqno;
 
-   u64 start_ns;
-   u64 enabled_ns;
-   u64 jobs_sent;
+   /* Stores the GPU stats for this queue in the global context. */
+   struct v3d_stats stats;
 };
 
 /* Performance monitor object. The perform lifetime is controlled by userspace
@@ -188,11 +193,8 @@ struct v3d_file_priv {
 
struct drm_sched_entity sched_entity[V3D_MAX_QUEUES];
 
-   u64 start_ns[V3D_MAX_QUEUES];
-
-   u64 enabled_ns[V3D_MAX_QUEUES];
-
-   u64 jobs_sent[V3D_MAX_QUEUES];
+   /* Stores the GPU stats for a specific queue for this fd. */
+   struct v3d_stats stats[V3D_MAX_QUEUES];
 };
 
 struct v3d_bo {
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..d14589d3ae6c 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -248,9 +248,7 @@ v3d_gem_init(struct drm_device *dev)
 
for (i = 0; i < V3D_MAX_QUEUES; i++) {
v3d->queue[i].fence_context = dma_fence_context_alloc(1);
-   v3d->queue[i].start_ns = 0;
-   v3d->queue[i].enabled_ns = 0;
-   v3d->queue[i].jobs_sent = 0;
+   memset(&v3d->queue[i].stats, 0, sizeof(v3d->queue[i].stats));
}
 
spin_lock_init(&v3d->mm_lock);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 8ca61bcd4b1

[PATCH v2 0/4] drm/v3d: Fix GPU stats inconsistencies and race-condition

2024-04-16 Thread Maíra Canal

The first version of this series had the intention to fix two major
issues with the GPU stats:

1. We were incrementing `enabled_ns` twice by the end of each job.
2. There is a race-condition between the IRQ handler and the users

The first issue was already addressed and the fix was applied to
drm-misc-fixes. What is left now addresses the second issue.

Apart from addressing this issue, this series improves the GPU stats
code as a whole. We reduced code repetition by creating functions
to start and update the GPU stats. This will likely reduce the odds of
issue #1 happening again.

Best Regards,
- Maíra

v1 -> v2: 
https://lore.kernel.org/dri-devel/20240403203517.731876-1-mca...@igalia.com/T/

* As the first patch was a bugfix, it was pushed to drm-misc-fixes.
* [1/4]: Add Chema Casanova's R-b
* [2/4]: s/jobs_sent/jobs_completed and add the reasoning in the commit
message (Chema Casanova)
* [2/4]: Add Chema Casanova's and Tvrtko Ursulin's R-b
* [3/4]: Call `local_clock()` only once, by adding a new parameter to the
`v3d_stats_update` function (Chema Casanova)
* [4/4]: Move new line to the correct patch (2/4) (Tvrtko Ursulin)
* [4/4]: Use `seqcount_t` as locking primitive instead of a `rw_lock`
(Tvrtko Ursulin)
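
To illustrate why a sequence counter suits this reader/writer pattern,
here is a userspace approximation using C11 atomics. It is only a
sketch of the general technique, not the kernel's `seqcount_t`
implementation (in the kernel, the writer side would use
`write_seqcount_begin()`/`write_seqcount_end()` and the reader side
`read_seqcount_begin()`/`read_seqcount_retry()`). The struct and
function names are hypothetical, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

struct stats_seq {
	atomic_uint seq;                 /* even: stable, odd: write in progress */
	_Atomic uint64_t enabled_ns;
	_Atomic uint64_t jobs_completed;
};

/* Writer side (single writer assumed, e.g. the interrupt handler). */
static void stats_seq_add_job(struct stats_seq *s, uint64_t runtime_ns)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t ns = atomic_load_explicit(&s->enabled_ns, memory_order_relaxed);
	uint64_t jobs = atomic_load_explicit(&s->jobs_completed, memory_order_relaxed);

	/* Make the counter odd so readers know an update is in flight. */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->enabled_ns, ns + runtime_ns, memory_order_relaxed);
	atomic_store_explicit(&s->jobs_completed, jobs + 1, memory_order_relaxed);

	/* Make the counter even again; publishes the new values. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side (e.g. fdinfo/sysfs): retry until a consistent pair is seen. */
static void stats_seq_read(struct stats_seq *s, uint64_t *enabled_ns,
			   uint64_t *jobs_completed)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		*enabled_ns = atomic_load_explicit(&s->enabled_ns, memory_order_relaxed);
		*jobs_completed = atomic_load_explicit(&s->jobs_completed, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

The appeal of this design is that the writer never blocks on readers,
which matters when the writer runs in interrupt context; readers simply
retry if they raced with an update.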

Maíra Canal (4):
  drm/v3d: Create two functions to update all GPU stats variables
  drm/v3d: Create a struct to store the GPU stats
  drm/v3d: Create function to update a set of GPU stats
  drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler

 drivers/gpu/drm/v3d/v3d_drv.c   | 19 +++
 drivers/gpu/drm/v3d/v3d_drv.h   | 40 +++---
 drivers/gpu/drm/v3d/v3d_gem.c   |  9 ++--
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++---
 drivers/gpu/drm/v3d/v3d_sched.c | 94 +
 drivers/gpu/drm/v3d/v3d_sysfs.c | 13 ++---
 6 files changed, 105 insertions(+), 118 deletions(-)

-- 
2.44.0

[PATCH v2 1/4] drm/v3d: Create two functions to update all GPU stats variables

2024-04-16 Thread Maíra Canal

Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very prone to errors,
as we can see on commit 35f4f8c9fc97 ("drm/v3d: Don't increment
`enabled_ns` twice").

Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when starting the job.
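
The idea can be sketched in plain C as follows. This is a simplified
userspace model under stated assumptions, not the driver code: the
names `gpu_stats`, `job_start_stats()` and `job_update_stats()` are
hypothetical, and the timestamp is passed in as a parameter where the
driver reads `local_clock()`.

```c
#include <stdint.h>

struct gpu_stats {
	uint64_t start_ns;
	uint64_t enabled_ns;
	uint64_t jobs_completed;
};

/* Mark a job as running on both the per-file and the global stats. */
static void job_start_stats(struct gpu_stats *local, struct gpu_stats *global,
			    uint64_t now)
{
	local->start_ns = now;
	global->start_ns = now;
}

/* Account one finished job on a single set of stats. */
static void stats_update(struct gpu_stats *stats, uint64_t now)
{
	stats->enabled_ns += now - stats->start_ns;
	stats->jobs_completed++;
	stats->start_ns = 0;
}

/* Account a finished job exactly once on each set of stats. */
static void job_update_stats(struct gpu_stats *local, struct gpu_stats *global,
			     uint64_t now)
{
	stats_update(local, now);
	stats_update(global, now);
}
```

Because every completion path goes through one function, there is a
single place where `enabled_ns` is incremented, which is what makes a
double increment harder to reintroduce.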

Co-developed-by: Tvrtko Ursulin 
Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 
Reviewed-by: Jose Maria Casanova Crespo 
---
 drivers/gpu/drm/v3d/v3d_drv.h   |  1 +
 drivers/gpu/drm/v3d/v3d_irq.c   | 48 ++--
 drivers/gpu/drm/v3d/v3d_sched.c | 80 +++--
 3 files changed, 40 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..ee3545226d7f 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -543,6 +543,7 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
 
 /* v3d_sched.c */
+void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
 int v3d_sched_init(struct v3d_dev *v3d);
 void v3d_sched_fini(struct v3d_dev *v3d);
 
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index ce6b2fb341d1..d469bda52c1a 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -102,18 +102,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FLDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->bin_job->base.irq_fence);
-   struct v3d_file_priv *file = 
v3d->bin_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_BIN];
-
-   file->jobs_sent[V3D_BIN]++;
-   v3d->queue[V3D_BIN].jobs_sent++;
-
-   file->start_ns[V3D_BIN] = 0;
-   v3d->queue[V3D_BIN].start_ns = 0;
-
-   file->enabled_ns[V3D_BIN] += runtime;
-   v3d->queue[V3D_BIN].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN);
trace_v3d_bcl_irq(&v3d->drm, fence->seqno);
dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -122,18 +112,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_FRDONE) {
struct v3d_fence *fence =
to_v3d_fence(v3d->render_job->base.irq_fence);
-   struct v3d_file_priv *file = 
v3d->render_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
-
-   file->jobs_sent[V3D_RENDER]++;
-   v3d->queue[V3D_RENDER].jobs_sent++;
-
-   file->start_ns[V3D_RENDER] = 0;
-   v3d->queue[V3D_RENDER].start_ns = 0;
-
-   file->enabled_ns[V3D_RENDER] += runtime;
-   v3d->queue[V3D_RENDER].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER);
trace_v3d_rcl_irq(&v3d->drm, fence->seqno);
dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -142,18 +122,8 @@ v3d_irq(int irq, void *arg)
if (intsts & V3D_INT_CSDDONE(v3d->ver)) {
struct v3d_fence *fence =
to_v3d_fence(v3d->csd_job->base.irq_fence);
-   struct v3d_file_priv *file = 
v3d->csd_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_CSD];
-
-   file->jobs_sent[V3D_CSD]++;
-   v3d->queue[V3D_CSD].jobs_sent++;
-
-   file->start_ns[V3D_CSD] = 0;
-   v3d->queue[V3D_CSD].start_ns = 0;
-
-   file->enabled_ns[V3D_CSD] += runtime;
-   v3d->queue[V3D_CSD].enabled_ns += runtime;
 
+   v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD);
trace_v3d_csd_irq(&v3d->drm, fence->seqno);
dma_fence_signal(&fence->base);
status = IRQ_HANDLED;
@@ -189,18 +159,8 @@ v3d_hub_irq(int irq, void *arg)
if (intsts & V3D_HUB_INT_TFUC) {
struct v3d_fence *fence =
to_v3d_fence(v3d->tfu_job->base.irq_fence);
-   struct v3d_file_priv *file = 
v3d->tfu_job->base.file->driver_priv;
-   u64 runtime = local_clock() - file->start_ns[V3D_TFU];
-
-   file->jobs_sent[V3D_TFU]++;
-   v3d->queue[V3D_TFU].jobs_sent++;
-
-   file->start_ns[V3D_TFU] = 0;
-   v3d->queue[V3D_TFU].start_ns = 0;
-
-   file->enabled_ns[V3D_TFU] += runtime;
-   v3d->queue[V3D_TFU].enabled_ns += runtime;
 
+   v3d_job_u

Re: [PATCH v2 20/43] drm/vkms: Use fbdev-shmem

2024-04-16 Thread Maíra Canal


On 4/10/24 10:02, Thomas Zimmermann wrote:

Implement fbdev emulation with fbdev-shmem. Avoids the overhead of
fbdev-generic's additional shadow buffering. No functional changes.

Signed-off-by: Thomas Zimmermann 


Acked-by: Maíra Canal 

Best Regards,
- Maíra


Cc: Rodrigo Siqueira 
Cc: Melissa Wen 
Cc: "Maíra Canal" 
Cc: Haneen Mohammed 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/vkms/vkms_drv.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c
index dd0af086e7fa9..8dc9dc13896e9 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.c
+++ b/drivers/gpu/drm/vkms/vkms_drv.c
@@ -17,7 +17,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 
@@ -223,7 +223,7 @@ static int vkms_create(struct vkms_config *config)
if (ret)
goto out_devres;
  
-	drm_fbdev_generic_setup(_device->drm, 0);

+   drm_fbdev_shmem_setup(_device->drm, 0);
  
  	return 0;

Re: [PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-16 Thread Maíra Canal


On 4/16/24 02:30, Stefan Wahren wrote:

Hi Maíra,

Am 16.04.24 um 03:02 schrieb Maíra Canal:

On 4/15/24 13:54, Andre Przywara wrote:

On Mon, 15 Apr 2024 13:00:39 -0300
Maíra Canal  wrote:

Hi,


RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.


So I think Krzysztof's initial comment still stands: What does that
patch
actually change? If I build those DTBs as of now, none of them has a
status property in the v3d node. Which means it's enabled:
https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter2-devicetree-basics.rst#status

So adding an explicit 'status = "okay";' doesn't make a difference.

What do I miss here?


As mentioned by Stefan in the last version, in Raspberry Pi OS, there is
a systemd script which tries to check for the V3D driver
(/usr/lib/systemd/scripts/gldriver_test.sh). Within the first check,
"raspi-config nonint is_kms" is called, which always seems to fail. What
"raspi-config" does is check if /proc/device-tree/soc/v3d@7ec0/status
is equal to "okay". As /proc/device-tree/soc/v3d@7ec0/status doesn't
exist, it returns false.

Yes, but I also mentioned that the V3D driver starts without this patch.
The commit message of this patch suggests this is a DT issue, which it is not.

I haven't had the time to update my SD card to Bookworm yet. Does the issue
still exist with this version?


I'm using a 32-bit kernel and the recommended OS for 32-bit is Bullseye.
But I checked the Bookworm code and indeed, Bookworm doesn't check
the device tree [1].

I'm thinking about sending a patch to the Bullseye branch to fix this
issue.

[1] 
https://github.com/RPi-Distro/raspi-config/blob/966ed3fecc159ff3e69a774d74bfd716c04dafff/raspi-config#L128


Best Regards,
- Maíra



I'll see if I can improve the userspace tool by just checking if the
folder /proc/device-tree/soc/v3d@7ec0/ exists.
Thanks for the explanation!

Best Regards,
- Maíra



Cheers,
Andre


Signed-off-by: Maíra Canal 
---

v1 -> v2:
https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/

* As mentioned by Krzysztof, enabling should be done in last place of
override/extend. Therefore, I'm disabling V3D in the common dtsi
and enabling in the last place of extend, i.e. the RPi DTS files.

  arch/arm/boot/dts/broadcom/bcm2835-common.dtsi  | 1 +
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts    | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts    | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 
  15 files changed, 57 insertions(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..69e34831de51 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec0 {
  compatible = "brcm,bcm2835-v3d";
  reg = <0x7ec0 0x1000>;
  interrupts = <1 10>;
+    status = "disabled";
  };
    vc4: gpu {
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
index 069b48272aa5..495ab1dfd2ce 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
@@ -128,3 +128,7 @@  {
  pinctrl-0 = <_gpio14>;
  status = "okay";
  };
+
+ {
+    status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
index 2726c00431e8..4634d88ce3af 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
@@ -121,3 +121,7 @@  {
  pinctrl-0 = <_gpio14>;
  status = "okay";
  };
+
+ {
+    status = "okay";
+};
diff --git a/arch/arm/boot/dts/broad

Re: [PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-15 Thread Maíra Canal


On 4/15/24 13:54, Andre Przywara wrote:

On Mon, 15 Apr 2024 13:00:39 -0300
Maíra Canal  wrote:

Hi,


RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.


So I think Krzysztof's initial comment still stands: What does that patch
actually change? If I build those DTBs as of now, none of them has a
status property in the v3d node. Which means it's enabled:
https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter2-devicetree-basics.rst#status
So adding an explicit 'status = "okay";' doesn't make a difference.

What do I miss here?


As mentioned by Stefan in the last version, in Raspberry Pi OS, there is
a systemd script which tries to check for the V3D driver
(/usr/lib/systemd/scripts/gldriver_test.sh). Within the first check,
"raspi-config nonint is_kms" is called, which always seems to fail. What
"raspi-config" does is check if /proc/device-tree/soc/v3d@7ec0/status
is equal to "okay". As /proc/device-tree/soc/v3d@7ec0/status doesn't
exist, it returns false.


I'll see if I can improve the userspace tool by just checking if the
folder /proc/device-tree/soc/v3d@7ec0/ exists.
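
For illustration only, a check that follows the devicetree rule quoted
earlier in the thread (a node without a status property is enabled)
could look like the sketch below. It is written in C rather than as a
raspi-config change, `dt_node_enabled()` is a hypothetical helper name,
and the node path is passed in by the caller.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns true if the devicetree node exposed at
 * `node_dir` (a directory under /proc/device-tree) should be treated as
 * enabled. A node without a "status" property counts as enabled.
 */
static bool dt_node_enabled(const char *node_dir)
{
	char path[512];
	char status[32];
	struct stat st;
	size_t len;
	FILE *f;

	if (stat(node_dir, &st) != 0 || !S_ISDIR(st.st_mode))
		return false; /* the node is not present at all */

	snprintf(path, sizeof(path), "%s/status", node_dir);
	f = fopen(path, "rb");
	if (!f)
		return true; /* no status property: enabled by default */

	len = fread(status, 1, sizeof(status) - 1, f);
	fclose(f);
	status[len] = '\0';

	/* Linux also accepts the legacy spelling "ok". */
	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}
```

With a check like this, a device tree that simply omits the status
property would no longer be reported as lacking the V3D node.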

Thanks for the explanation!

Best Regards,
- Maíra



Cheers,
Andre


Signed-off-by: Maíra Canal 
---

v1 -> v2: 
https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/

* As mentioned by Krzysztof, enabling should be done in last place of
override/extend. Therefore, I'm disabling V3D in the common dtsi
and enabling in the last place of extend, i.e. the RPi DTS files.

  arch/arm/boot/dts/broadcom/bcm2835-common.dtsi  | 1 +
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts| 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts| 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts   | 4 
  arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts  | 4 
  arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 
  15 files changed, 57 insertions(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..69e34831de51 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec0 {
compatible = "brcm,bcm2835-v3d";
reg = <0x7ec0 0x1000>;
interrupts = <1 10>;
+   status = "disabled";
};
  
  		vc4: gpu {

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
index 069b48272aa5..495ab1dfd2ce 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
@@ -128,3 +128,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
  };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
index 2726c00431e8..4634d88ce3af 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
@@ -121,3 +121,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
  };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
index c57b999a4520..45fa0f6851fc 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
@@ -130,3 +130,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
  };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
index ae6d3a9586ab..c1dac5d704aa 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
@@ -121,3 +121,7

Re: [PATCH 1/5] drm/v3d: Don't increment `enabled_ns` twice

2024-04-15 Thread Maíra Canal


On 4/3/24 17:24, Maíra Canal wrote:

The commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
introduced the calculation of global GPU stats. For that, it used
the already existing infrastructure provided by commit 09a93cc4f7d1 ("drm/v3d:
Implement show_fdinfo() callback for GPU usage stats"). While adding
global GPU stats calculation ability, the author forgot to delete the
existing one.

Currently, the value of `enabled_ns` is incremented twice by the end of
the job, when it should be added just once. Therefore, delete the
leftovers from commit 509433d8146c ("drm/v3d: Expose the total GPU usage
stats on sysfs").

Fixes: 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs")
Reported-by: Tvrtko Ursulin 
Signed-off-by: Maíra Canal 


As this patch is an isolated bugfix and it was reviewed by two
developers, I'm applying it to drm-misc/drm-misc-fixes.

I'll address the feedback for the rest of the series later and send a
v2.

Best Regards,
- Maíra


---
  drivers/gpu/drm/v3d/v3d_irq.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index 2e04f6cb661e..ce6b2fb341d1 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -105,7 +105,6 @@ v3d_irq(int irq, void *arg)
struct v3d_file_priv *file = 
v3d->bin_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_BIN];
  
-		file->enabled_ns[V3D_BIN] += local_clock() - file->start_ns[V3D_BIN];

file->jobs_sent[V3D_BIN]++;
v3d->queue[V3D_BIN].jobs_sent++;
  
@@ -126,7 +125,6 @@ v3d_irq(int irq, void *arg)

struct v3d_file_priv *file = 
v3d->render_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_RENDER];
  
-		file->enabled_ns[V3D_RENDER] += local_clock() - file->start_ns[V3D_RENDER];

file->jobs_sent[V3D_RENDER]++;
v3d->queue[V3D_RENDER].jobs_sent++;
  
@@ -147,7 +145,6 @@ v3d_irq(int irq, void *arg)

struct v3d_file_priv *file = 
v3d->csd_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_CSD];
  
-		file->enabled_ns[V3D_CSD] += local_clock() - file->start_ns[V3D_CSD];

file->jobs_sent[V3D_CSD]++;
v3d->queue[V3D_CSD].jobs_sent++;
  
@@ -195,7 +192,6 @@ v3d_hub_irq(int irq, void *arg)

struct v3d_file_priv *file = 
v3d->tfu_job->base.file->driver_priv;
u64 runtime = local_clock() - file->start_ns[V3D_TFU];
  
-		file->enabled_ns[V3D_TFU] += local_clock() - file->start_ns[V3D_TFU];

file->jobs_sent[V3D_TFU]++;
v3d->queue[V3D_TFU].jobs_sent++;

Re: [PATCH] dma-buf: Do not build debugfs related code when !CONFIG_DEBUG_FS

2024-04-15 Thread Maíra Canal


Hi Tvrtko,

On 4/1/24 10:21, Tvrtko Ursulin wrote:


On 01/04/2024 13:45, Christian König wrote:

Am 01.04.24 um 14:39 schrieb Tvrtko Ursulin:


On 29/03/2024 00:00, T.J. Mercier wrote:
On Thu, Mar 28, 2024 at 7:53 AM Tvrtko Ursulin wrote:


From: Tvrtko Ursulin 

There is no point in compiling in the list and mutex operations which are
only used from the dma-buf debugfs code, if debugfs is not compiled in.

Put the code in question behind some kconfig guards and so save some text
and maybe even a pointer per object at runtime when not enabled.
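
As a rough sketch of the general pattern (not the actual dma-buf
change), debug-only bookkeeping can be compiled out behind a
preprocessor guard. Here `SKETCH_DEBUG_FS` is a hypothetical stand-in
for the kernel config option, and all struct and function names are
made up for illustration.

```c
#include <stddef.h>

struct buf {
	size_t size;
#ifdef SKETCH_DEBUG_FS
	struct buf *next; /* list linkage, only needed for the debug listing */
#endif
};

#ifdef SKETCH_DEBUG_FS
static struct buf *debug_list;

static void buf_debug_add(struct buf *b)
{
	b->next = debug_list;
	debug_list = b;
}

static size_t buf_debug_count(void)
{
	size_t n = 0;

	for (struct buf *b = debug_list; b; b = b->next)
		n++;
	return n;
}
#else
/* Stubs: callers stay unchanged, but no code or data is generated. */
static inline void buf_debug_add(struct buf *b) { (void)b; }
static inline size_t buf_debug_count(void) { return 0; }
#endif

static void buf_init(struct buf *b, size_t size)
{
	b->size = size;
	buf_debug_add(b);
}
```

Building with `-DSKETCH_DEBUG_FS` enables the tracking; without it, the
struct loses the extra pointer and the helpers collapse to no-ops.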

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: T.J. Mercier 


Thanks!

How would patches to dma-buf be typically landed? Via what tree I mean?
drm-misc-next?


That should go through drm-misc-next.

And feel free to add Reviewed-by: Christian König as well.


Thanks!

Maarten, if I got it right, you are handling the next drm-misc-next pull.
Could you merge this one please?


Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra



Regards,

Tvrtko

Re: [PATCH] drm/fb_dma: s/drm_panic_gem_get_scanout_buffer/drm_fb_dma_get_scanout_buffer

2024-04-15 Thread Maíra Canal


On 4/15/24 12:19, Jocelyn Falempe wrote:

Hi,

You're right, I messed up the rename, and I mostly test on x86, where I
don't build the imx driver.


Reviewed-by: Jocelyn Falempe 

Best regards,



Applied to drm-misc/drm-misc-next!

Best Regards,
- Maíra

[PATCH v2] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-15 Thread Maíra Canal

RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.

Signed-off-by: Maíra Canal 
---

v1 -> v2: 
https://lore.kernel.org/dri-devel/41694292-af1f-4760-a7b6-101ed5dd6...@gmx.net/T/

* As mentioned by Krzysztof, enabling should be done in last place of
override/extend. Therefore, I'm disabling V3D in the common dtsi
and enabling in the last place of extend, i.e. the RPi DTS files.

 arch/arm/boot/dts/broadcom/bcm2835-common.dtsi  | 1 +
 arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts| 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts| 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts   | 4 
 arch/arm/boot/dts/broadcom/bcm2835-rpi-zero.dts | 4 
 arch/arm/boot/dts/broadcom/bcm2836-rpi-2-b.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-3-a-plus.dts | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-3-b.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-cm3-io3.dts  | 4 
 arch/arm/boot/dts/broadcom/bcm2837-rpi-zero-2-w.dts | 4 
 15 files changed, 57 insertions(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..69e34831de51 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec0 {
compatible = "brcm,bcm2835-v3d";
reg = <0x7ec0 0x1000>;
interrupts = <1 10>;
+   status = "disabled";
};
 
vc4: gpu {
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
index 069b48272aa5..495ab1dfd2ce 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a-plus.dts
@@ -128,3 +128,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
 };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
index 2726c00431e8..4634d88ce3af 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-a.dts
@@ -121,3 +121,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
 };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
index c57b999a4520..45fa0f6851fc 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-plus.dts
@@ -130,3 +130,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
 };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
index ae6d3a9586ab..c1dac5d704aa 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b-rev2.dts
@@ -121,3 +121,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
 };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts
index 72764be75a79..72ca31f2a7d6 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-b.dts
@@ -115,3 +115,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
 };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts
index 3f9d198ac3ab..881a07d2f28f 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-cm1-io1.dts
@@ -95,3 +95,7 @@  {
pinctrl-0 = <_gpio14>;
status = "okay";
 };
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts 
b/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts
index 1f0b163e400c..1c7324067442 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts
+++ b/arch/arm/boot/dts/broadcom/bcm2835-rpi-zero-w.dts
@@ -134,6

[PATCH] drm/fb_dma: s/drm_panic_gem_get_scanout_buffer/drm_fb_dma_get_scanout_buffer

2024-04-15 Thread Maíra Canal

In version 11, Thomas suggested changing the name of the function, and
this request was applied in version 12, which is the version that
landed. Although the name of the function changed in the C file, it
didn't change in the header file, leading to a compilation error such
as:

drivers/gpu/drm/imx/ipuv3/ipuv3-plane.c:780:24: error: use of undeclared
identifier 'drm_fb_dma_get_scanout_buffer'; did you mean 
'drm_panic_gem_get_scanout_buffer'?
  780 | .get_scanout_buffer = drm_fb_dma_get_scanout_buffer,
  |   ^
  |   drm_panic_gem_get_scanout_buffer
./include/drm/drm_fb_dma_helper.h:23:5: note: 'drm_panic_gem_get_scanout_buffer'
declared here
   23 | int drm_panic_gem_get_scanout_buffer(struct drm_plane *plane,
  | ^
1 error generated.

Best Regards,
- Maíra

Fixes: 879b3b6511fe ("drm/fb_dma: Add generic get_scanout_buffer() for drm_panic")
Signed-off-by: Maíra Canal 
---
 include/drm/drm_fb_dma_helper.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/drm/drm_fb_dma_helper.h b/include/drm/drm_fb_dma_helper.h
index 61f24c2aba2f..c950732c6d36 100644
--- a/include/drm/drm_fb_dma_helper.h
+++ b/include/drm/drm_fb_dma_helper.h
@@ -6,6 +6,7 @@
 
 struct drm_device;
 struct drm_framebuffer;
+struct drm_plane;
 struct drm_plane_state;
 struct drm_scanout_buffer;
 
@@ -20,8 +21,8 @@ void drm_fb_dma_sync_non_coherent(struct drm_device *drm,
  struct drm_plane_state *old_state,
  struct drm_plane_state *state);
 
-int drm_panic_gem_get_scanout_buffer(struct drm_plane *plane,
-struct drm_scanout_buffer *sb);
+int drm_fb_dma_get_scanout_buffer(struct drm_plane *plane,
+ struct drm_scanout_buffer *sb);
 
 #endif
 
-- 
2.44.0

Re: [PATCH] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-14 Thread Maíra Canal


Hi Phil,

On 4/14/24 15:43, Phil Elwell wrote:

Hello all,

On Fri, 12 Apr 2024 at 18:17, Stefan Wahren  wrote:


Hi Maíra,

[add Phil & Dave]

Am 12.04.24 um 15:25 schrieb Maíra Canal:

RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.

Thanks for trying to improve the combination Raspberry Pi OS + Mainline
Kernel. I think I'm able to reproduce the issue with Raspberry Pi 3 B +
on Buster.


Buster? We launched Buster with 4.19 and ended on 5.10. We've moved
onto Bookworm now. A lot has changed in that time...


 From the kernel side everything looks good:

[   11.054833] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[   11.055118] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
[   11.055340] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops [vc4])
[   11.055521] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops
vc4_crtc_ops [vc4])
[   11.055695] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops
vc4_crtc_ops [vc4])
[   11.055874] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops
vc4_crtc_ops [vc4])
[   11.056020] vc4-drm soc:gpu: bound 3fc0.v3d (ops vc4_v3d_ops [vc4])
[   11.063277] Bluetooth: hci0: BCM4345C0
'brcm/BCM4345C0.raspberrypi,3-model-b-plus.hcd' Patch
[   11.070466] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
[   11.174803] Console: switching to colour frame buffer device 240x75
[   11.205125] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device

But in Raspberry Pi OS there is a systemd script which tries to
check for the V3D driver: /usr/lib/systemd/scripts/gldriver_test.sh.
Within the first check, "raspi-config nonint is_kms" is called, which
always seems to fail. If I run strace on this command, it seems to check
for /proc/device-tree/soc/v3d@7ec0/status, which doesn't exist in
the Mainline device tree.

Maybe there is a chance to improve the userspace tool?


...such as the raspi-config tool, which now always succeeds for is_kms.



I'm using Raspberry Pi OS Bullseye with the raspi-config tool at version
20231012~bulleye. I can still reproduce this issue when using an upstream
kernel.

I ran `sudo apt upgrade`, but a new version of the raspi-config tool
didn't appear.

Best Regards,
- Maíra


Phil



Signed-off-by: Maíra Canal 
---

I decided to add the status property to the `bcm2835-common.dtsi`, but
there are two other options:

1. To add the status property to the `bcm2835-rpi-common.dtsi` file
2. To add the status property to each individual RPi model, e.g.
`bcm2837-rpi-3-b.dts`.

Let me know which option is more suitable, and if `bcm2835-common.dtsi`
is not the best option, I can send a v2.

Best Regards,
- Maíra

   arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 +
   1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..851a6bce1939 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec0 {
   compatible = "brcm,bcm2835-v3d";
   reg = <0x7ec0 0x1000>;
   interrupts = <1 10>;
+ status = "okay";
   };

   vc4: gpu {

[PATCH] ARM: dts: bcm2835: Enable 3D rendering through V3D

2024-04-12 Thread Maíra Canal

RPi 0-3 is packed with a GPU that provides 3D rendering capabilities to
the RPi. Currently, the downstream kernel uses an overlay to enable the
GPU and use GPU hardware acceleration. When deploying a mainline kernel
to the RPi 0-3, we end up without any GPU hardware acceleration
(essentially, we can't use the OpenGL driver).

Therefore, enable the V3D core for the RPi 0-3 in the mainline kernel.

Signed-off-by: Maíra Canal 
---

I decided to add the status property to the `bcm2835-common.dtsi`, but
there are two other options:

1. To add the status property to the `bcm2835-rpi-common.dtsi` file
2. To add the status property to each individual RPi model, e.g.
`bcm2837-rpi-3-b.dts`.

Let me know which option is more suitable, and if `bcm2835-common.dtsi`
is not the best option, I can send a v2.

Best Regards,
- Maíra

 arch/arm/boot/dts/broadcom/bcm2835-common.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi 
b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
index 9261b67dbee1..851a6bce1939 100644
--- a/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm2835-common.dtsi
@@ -139,6 +139,7 @@ v3d: v3d@7ec0 {
compatible = "brcm,bcm2835-v3d";
reg = <0x7ec0 0x1000>;
interrupts = <1 10>;
+   status = "okay";
};
 
vc4: gpu {
-- 
2.44.0

Re: [PATCH v6 3/8] drm/ci: uprev IGT and update testlist

2024-04-09 Thread Maíra Canal


On 4/9/24 12:15, Dmitry Baryshkov wrote:

On Tue, Apr 09, 2024 at 07:22:38PM +0530, Vignesh Raman wrote:

Hi Maíra,

On 09/04/24 15:10, Maíra Canal wrote:

On 4/9/24 05:13, Vignesh Raman wrote:

Uprev IGT and add amd, v3d, vc4 and vgem specific tests to
testlist and skip driver-specific tests in *-skips.txt.
Also add testlist to the MAINTAINERS file and update xfails.

A better approach would be to stop vendoring the testlist
into the kernel and instead use testlist from the IGT build
to ensure we do not miss renamed or newly added tests.
This implementation is planned for the future.


How problematic would it be to just do this in this series, instead
of adding a huge testlist that we need to keep in sync with IGT?


Is it okay if these changes are submitted in another patch series, to avoid
delaying the current one? There are patches like vkms which are
blocked due to the mesa uprev patch. We would also need to rerun all jobs
and update xfails with the new testlist. In the next series we could uprev IGT
to the latest version, use the testlist from the build and remove the one
in drm-ci. We can also test with the latest kernel. I will work on this.
Please let me know your thoughts.


As we have to rebase/retest anyway, I think it makes more sense to land
from-IGT-test-list first, fixing it for the devices that are currently
present, and to land the rest afterwards. As for the IGT uprev, we have
been waiting for that for quite a while (I think I've even sent a
patch a while ago) in order to fix test failures on drm/msm.


Agreed on that.

Best Regards
- Maíra





Regards,
Vignesh



Best Regards,
- Maíra



Acked-by: Helen Koike 
Signed-off-by: Vignesh Raman 
---

v3:
    - New patch in series to uprev IGT and update testlist.

v4:
    - Add testlists to the MAINTAINERS file and remove amdgpu xfails
changes.

v5:
    - Keep single testlist and update xfails. Skip driver specific tests.

v6:
    - Update xfails.

---
   MAINTAINERS   |   8 +
   drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
   drivers/gpu/drm/ci/testlist.txt   | 321 ++
   .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  25 +-
   .../drm/ci/xfails/amdgpu-stoney-flakes.txt    |  10 +-
   .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  23 +-
   drivers/gpu/drm/ci/xfails/i915-amly-fails.txt |   1 +
   drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   9 +-
   drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-cml-fails.txt  |   1 +
   drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   7 +
   drivers/gpu/drm/ci/xfails/i915-glk-fails.txt  |   2 +-
   drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt  |   2 +
   drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   9 +-
   drivers/gpu/drm/ci/xfails/i915-whl-fails.txt  |   1 +
   drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   9 +-
   .../drm/ci/xfails/mediatek-mt8173-fails.txt   |   3 -
   .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   6 +
   .../drm/ci/xfails/mediatek-mt8183-fails.txt   |   1 +
   .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   5 +
   .../gpu/drm/ci/xfails/meson-g12b-fails.txt    |   1 +
   .../gpu/drm/ci/xfails/meson-g12b-skips.txt    |   5 +
   .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   5 +
   .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   8 +-
   .../msm-sc7180-trogdor-kingoftown-skips.txt   |   6 +
   ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   6 +
   .../gpu/drm/ci/xfails/msm-sdm845-skips.txt    |   6 +
   .../drm/ci/xfails/rockchip-rk3288-fails.txt   |   1 +
   .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   8 +-
   .../drm/ci/xfails/rockchip-rk3399-fails.txt   |   1 +
   .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   6 +
   .../drm/ci/xfails/virtio_gpu-none-fails.txt   |  15 +
   .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   9 +-
   35 files changed, 532 insertions(+), 17 deletions(-)
   create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt
   create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bc7e122a094..f7d0040a6c21 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1665,6 +1665,7 @@ L:    dri-devel@lists.freedesktop.org
   S:    Supported
   T:    git git://anongit.freedesktop.org/drm/drm-misc
   F:    Documentation/gpu/panfrost.rst
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers/gpu/drm/panfrost/
   F:    include/uapi/drm/panfrost_drm.h
@@ -6753,6 +6754,7 @@ S:    Maintained
   B:    https://gitlab.freedesktop.org/drm/msm/-/issues
   T:    git https://gitlab.freedesktop.org/drm/msm.git
   F:    Documentation/devicetree/bindings/display/msm/
+F:    drivers/gpu/drm/ci/testlist.txt
   F:    drivers

Re: [PATCH v6 3/8] drm/ci: uprev IGT and update testlist

2024-04-09 Thread Maíra Canal


On 4/9/24 05:13, Vignesh Raman wrote:

Uprev IGT and add amd, v3d, vc4 and vgem specific tests to
testlist and skip driver-specific tests in *-skips.txt.
Also add testlist to the MAINTAINERS file and update xfails.

A better approach would be to stop vendoring the testlist
into the kernel and instead use testlist from the IGT build
to ensure we do not miss renamed or newly added tests.
This implementation is planned for the future.


How problematic would it be to just do this in this series, instead
of adding a huge testlist that we need to keep in sync with IGT?

Best Regards,
- Maíra



Acked-by: Helen Koike 
Signed-off-by: Vignesh Raman 
---

v3:
   - New patch in series to uprev IGT and update testlist.

v4:
   - Add testlists to the MAINTAINERS file and remove amdgpu xfails changes.

v5:
   - Keep single testlist and update xfails. Skip driver specific tests.

v6:
   - Update xfails.

---
  MAINTAINERS   |   8 +
  drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
  drivers/gpu/drm/ci/testlist.txt   | 321 ++
  .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |  25 +-
  .../drm/ci/xfails/amdgpu-stoney-flakes.txt|  10 +-
  .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |  23 +-
  drivers/gpu/drm/ci/xfails/i915-amly-fails.txt |   1 +
  drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   9 +-
  drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-cml-fails.txt  |   1 +
  drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   7 +
  drivers/gpu/drm/ci/xfails/i915-glk-fails.txt  |   2 +-
  drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt  |   2 +
  drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   9 +-
  drivers/gpu/drm/ci/xfails/i915-whl-fails.txt  |   1 +
  drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   9 +-
  .../drm/ci/xfails/mediatek-mt8173-fails.txt   |   3 -
  .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   6 +
  .../drm/ci/xfails/mediatek-mt8183-fails.txt   |   1 +
  .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   5 +
  .../gpu/drm/ci/xfails/meson-g12b-fails.txt|   1 +
  .../gpu/drm/ci/xfails/meson-g12b-skips.txt|   5 +
  .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   5 +
  .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   8 +-
  .../msm-sc7180-trogdor-kingoftown-skips.txt   |   6 +
  ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   6 +
  .../gpu/drm/ci/xfails/msm-sdm845-skips.txt|   6 +
  .../drm/ci/xfails/rockchip-rk3288-fails.txt   |   1 +
  .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   8 +-
  .../drm/ci/xfails/rockchip-rk3399-fails.txt   |   1 +
  .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   6 +
  .../drm/ci/xfails/virtio_gpu-none-fails.txt   |  15 +
  .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   9 +-
  35 files changed, 532 insertions(+), 17 deletions(-)
  create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8173-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/mediatek-mt8183-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/meson-g12b-skips.txt
  create mode 100644 drivers/gpu/drm/ci/xfails/msm-apq8016-skips.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bc7e122a094..f7d0040a6c21 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1665,6 +1665,7 @@ L:dri-devel@lists.freedesktop.org
  S:Supported
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/gpu/panfrost.rst
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/panfrost/
  F:include/uapi/drm/panfrost_drm.h
  
@@ -6753,6 +6754,7 @@ S:	Maintained

  B:https://gitlab.freedesktop.org/drm/msm/-/issues
  T:git https://gitlab.freedesktop.org/drm/msm.git
  F:Documentation/devicetree/bindings/display/msm/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/msm*
  F:drivers/gpu/drm/msm/
  F:include/uapi/drm/msm_drm.h
@@ -7047,6 +7049,7 @@ T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml
  F:Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml
  F:Documentation/gpu/meson.rst
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/meson*
  F:drivers/gpu/drm/meson/
  
@@ -7160,6 +7163,7 @@ L:	dri-devel@lists.freedesktop.org

  L:linux-media...@lists.infradead.org (moderated for non-subscribers)
  S:Supported
  F:Documentation/devicetree/bindings/display/mediatek/
+F: drivers/gpu/drm/ci/testlist.txt
  F:drivers/gpu/drm/ci/xfails/mediatek*
  F:drivers/gpu/drm/mediatek/
  F:drivers/phy/mediatek/phy-mtk-dp.c
@@ -7211,6 +7215,7 @@ L:dri-devel@lists.freedesktop.org
  S:Maintained
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:Documentation/devicetree/bindings/display/rockchip/
+F: drivers/gpu/drm/ci/testlist.txt
  F:

Re: [PATCH 5/5] drm/vkms: Use drm_crtc_vblank_crtc()

2024-04-08 Thread Maíra Canal


On 4/8/24 16:06, Ville Syrjala wrote:

From: Ville Syrjälä 

Replace the open coded drm_crtc_vblank_crtc() with the real
thing.

Cc: Rodrigo Siqueira 
Cc: Melissa Wen 
Cc: "Maíra Canal" 
Cc: Haneen Mohammed 
Cc: Daniel Vetter 
Signed-off-by: Ville Syrjälä 


Reviewed-by: Maíra Canal 

Best Regards,
- Maíra


---
  drivers/gpu/drm/vkms/vkms_crtc.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 61e500b8c9da..40b4d084e3ce 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -61,9 +61,7 @@ static enum hrtimer_restart vkms_vblank_simulate(struct 
hrtimer *timer)
  
  static int vkms_enable_vblank(struct drm_crtc *crtc)

  {
-   struct drm_device *dev = crtc->dev;
-   unsigned int pipe = drm_crtc_index(crtc);
-   struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
+   struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc);
struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
  
  	drm_calc_timestamping_constants(crtc, &crtc->mode);

@@ -88,10 +86,9 @@ static bool vkms_get_vblank_timestamp(struct drm_crtc *crtc,
  bool in_vblank_irq)
  {
struct drm_device *dev = crtc->dev;
-   unsigned int pipe = crtc->index;
struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev);
struct vkms_output *output = &vkmsdev->output;
-   struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
+   struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc);
  
  	if (!READ_ONCE(vblank->enabled)) {

*vblank_time = ktime_get();

Re: [PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob

2024-04-08 Thread Maíra Canal


On 4/4/24 18:30, Adrián Larumbe wrote:

On 04.04.2024 11:31, Maíra Canal wrote:

On 4/4/24 11:00, Adrián Larumbe wrote:

This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose
the total GPU usage stats on sysfs"). The point is making broader GPU
occupancy numbers available through the sysfs interface, so that for every
job slot, its number of processed jobs and total processing time are
displayed.


Shouldn't we make this sysfs interface a generic DRM interface?
Something that would be standard for all drivers and that we could
integrate into gputop in the future.


I think the best way to generalise this sysfs knob would be to create a DRM
class attribute somewhere in drivers/gpu/drm/drm_sysfs.c and then add a new
function to 'struct drm_driver' that would return a structure with the relevant
information (execution units and their names, number of processed jobs, etc).


This is exactly what I was thinking about.



What that information would exactly be is up to debate, I guess, since different
drivers might be interested in showing different bits of information.


I believe we can start with the requirements of V3D and Panfrost and
then expand from there.




Laying that down is important because the sysfs file would become part of the
device class API.


My PoV: it is important, but not completely tragic if we don't get it
perfect. Just like fdinfo, which is evolving.



I might come up with a new RFC patch series that does precisely that, at least
for v3d and Panfrost, and maybe other people could pitch in with the sort of
things they'd like to see for other drivers?


Yeah, this would be a great idea. Please, CC me on this series.

Best Regards,
- Maíra



Cheers,
Adrian


Best Regards,
- Maíra



Cc: Boris Brezillon 
Cc: Christopher Healy 
Signed-off-by: Adrián Larumbe 
---
   drivers/gpu/drm/panfrost/panfrost_device.h |  5 +++
   drivers/gpu/drm/panfrost/panfrost_drv.c| 49 --
   drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++-
   drivers/gpu/drm/panfrost/panfrost_job.h|  3 ++
   4 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index cffcb0ac7c11..1d343351c634 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -169,6 +169,11 @@ struct panfrost_engine_usage {
unsigned long long cycles[NUM_JOB_SLOTS];
   };
+struct panfrost_slot_usage {
+   u64 enabled_ns;
+   u64 jobs_sent;
+};
+
   struct panfrost_file_priv {
struct panfrost_device *pfdev;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ef9f6c0716d5..6afcde66270f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -8,6 +8,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -524,6 +525,10 @@ static const struct drm_ioctl_desc 
panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW),
   };
+static const char * const engine_names[] = {
+   "fragment", "vertex-tiler", "compute-only"
+};
+
   static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
 struct panfrost_file_priv *panfrost_priv,
 struct drm_printer *p)
@@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct 
panfrost_device *pfdev,
 *   job spent on the GPU.
 */
-   static const char * const engine_names[] = {
-   "fragment", "vertex-tiler", "compute-only"
-   };
-
BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS);
for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
@@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev,
   static DEVICE_ATTR_RW(profiling);
+static ssize_t
+gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+   struct panfrost_device *pfdev = dev_get_drvdata(dev);
+   struct panfrost_slot_usage stats;
+   u64 timestamp = local_clock();
+   ssize_t len = 0;
+   unsigned int i;
+
+   BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS);
+
+   len += sysfs_emit(buf, "queuetimestampjobs
runtime\n");
+   len += sysfs_emit_at(buf, len, 
"-\n");
+
+   for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
+
+   stats = get_slot_stats(pfdev, i);
+
+   /*
+* Each line will display the slot name, timestamp, the number
+* of jobs handled by that engine and runtime, as shown below:
+*
+* queuetimestampjobsruntime
+* -

[PATCH v2 6/6] drm/v3d: Enable big and super pages

2024-04-05 Thread Maíra Canal

The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that the page table entries for all 4KB pages within a
big/super page are correctly configured.

In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.
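
The page-size choice and the per-page PTE count described above can be
sketched in plain userspace C. This is only an illustration: the
`sketch_*` names and `SKETCH_SZ_*` constants are hypothetical and are not
part of the driver, and the size thresholds simply mirror the alignment
selection added to v3d_bo_create_finish() in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants standing in for the kernel's SZ_4K/SZ_64K/SZ_1M. */
#define SKETCH_SZ_4K  (4u * 1024u)
#define SKETCH_SZ_64K (64u * 1024u)
#define SKETCH_SZ_1M  (1024u * 1024u)

/*
 * Pick the largest page size that fits the object. `super_pages` plays
 * the role of the module parameter: when it is zero, always use 4 KB.
 */
uint32_t sketch_pick_align(size_t size, int super_pages)
{
	if (!super_pages)
		return SKETCH_SZ_4K;
	if (size >= SKETCH_SZ_1M)
		return SKETCH_SZ_1M;
	if (size >= SKETCH_SZ_64K)
		return SKETCH_SZ_64K;
	return SKETCH_SZ_4K;
}

/* Number of 4 KB page table entries covered by one page of `page_size`. */
unsigned int sketch_ptes_per_page(uint32_t page_size)
{
	assert(page_size != 0 && page_size % SKETCH_SZ_4K == 0);
	return page_size / SKETCH_SZ_4K;
}
```

With these helpers, a 64 KB big page covers 16 entries and a 1 MB super
page covers 256, which are the iteration counts mentioned above.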

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c| 21 +--
 drivers/gpu/drm/v3d/v3d_drv.c   |  8 ++
 drivers/gpu/drm/v3d/v3d_drv.h   |  2 ++
 drivers/gpu/drm/v3d/v3d_gemfs.c |  6 +
 drivers/gpu/drm/v3d/v3d_mmu.c   | 46 ++---
 5 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 79e31c5299b1..cfe82232886a 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
struct v3d_dev *v3d = to_v3d_dev(obj->dev);
struct v3d_bo *bo = to_v3d_bo(obj);
struct sg_table *sgt;
+   u64 align;
int ret;

/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
if (IS_ERR(sgt))
return PTR_ERR(sgt);

+   if (!v3d->super_pages)
+   align = SZ_4K;
+   else if (obj->size >= SZ_1M)
+   align = SZ_1M;
+   else if (obj->size >= SZ_64K)
+   align = SZ_64K;
+   else
+   align = SZ_4K;
+
spin_lock(&v3d->mm_lock);
/* Allocate the object's space in the GPU's page tables.
 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
+align >> V3D_MMU_PAGE_SHIFT, 0, 0);
spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, 
struct drm_file *file_priv,
 size_t unaligned_size)
 {
struct drm_gem_shmem_object *shmem_obj;
+   struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_bo *bo;
int ret;

-   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+   /* Let the user opt out of allocating the BOs with THP */
+   if (v3d->super_pages)
+   shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+ v3d->gemfs);
+   else
+   shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
if (IS_ERR(shmem_obj))
return ERR_CAST(shmem_obj);
bo = to_v3d_bo(&shmem_obj->base);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..3dbd29560be4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,12 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0

+static bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support. Note: \
+  To enable Super Pages, you need support to \
+  enable THP.");
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
@@ -308,6 +314,8 @@ static int v3d_platform_drm_probe(struct platform_device 
*pdev)
return -ENOMEM;
}

+   v3d->super_pages = super_pages;
+
ret = v3d_gem_init(drm);
if (ret)
goto dma_free;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 17236ee23490..0a7aacf51164 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -18,6 +18,7 @@ struct platform_device;
 struct reset_control;

 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)

 #define V3D_MAX_QUEUES (V3D_CPU + 1)

@@ -121,6 +122,7 @@ struct v3d_dev {
 * tmpfs instance used for shmem backed objects
 */
struct vfsmount *gemfs;
+   bool super_pages;

struct work_struct overflow_mem_work;

diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 31cf5bd11e39..7ee55b32c36e 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -12,6 +12,10 @@ void v3d_gemfs_

[PATCH v2 5/6] drm/v3d: Reduce the alignment of the node allocation

2024-04-05 Thread Maíra Canal

Currently, we are using an alignment of 128 kB to insert a node, which
ends up wasting memory as we perform plenty of small BO allocations
(<= 4 kB). Since we require that allocations are aligned to 128 kB, for any
allocation smaller than that, we are wasting the difference.

This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be
quite limiting for applications that have a high memory requirement, such
as vkoverhead [1].

By reducing the page alignment to 4 kB, we can allocate up to 1000000 BOs
of 4 kB (~4 GB) and 10000 BOs of 400 kB (~4 GB). Moreover, by performing
benchmarks, we were able to attest that reducing the page alignment to
4 kB can provide a general performance improvement in OpenGL
applications (e.g. glmark2).
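
The effect of the alignment on small BOs can be checked with a little
arithmetic. The sketch below is a simplified model, not driver code: the
`sketch_max_bos()` helper is hypothetical, it assumes the full 4 GiB is
available and that each BO occupies its size rounded up to the alignment,
and it ignores fragmentation and any address space the driver reserves.
It is therefore only an upper bound and is not meant to reproduce the
measured figures for larger BOs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on how many BOs of `bo_size` bytes fit in `va_size` bytes of
 * address space when every BO must start at a multiple of `align`.
 */
uint64_t sketch_max_bos(uint64_t va_size, uint64_t bo_size, uint64_t align)
{
	uint64_t slot;

	assert(align != 0 && bo_size != 0);

	/* Each BO consumes its size rounded up to the alignment. */
	slot = (bo_size + align - 1) / align * align;

	return va_size / slot;
}
```

Under this model, 4 KiB BOs in a 4 GiB address space are limited to 32768
with a 128 KiB alignment (only 128 MiB of actual data) and to 1048576,
roughly a million, with a 4 KiB alignment.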

Therefore, this patch reduces the alignment of the node allocation to 4
kB, which will allow RPi users to explore the whole 4GB virtual
address space provided by the hardware. Also, this patch allows users to
fully run vkoverhead in the RPi 4/5, solving the issue reported in [1].

[1] https://github.com/zmike/vkoverhead/issues/14

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..79e31c5299b1 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 */
ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 obj->size >> V3D_MMU_PAGE_SHIFT,
-GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 
0, 0);
+SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
spin_unlock(&v3d->mm_lock);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d2ce8222771a..17236ee23490 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,8 +17,6 @@ struct clk;
 struct platform_device;
 struct reset_control;
 
-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
-- 
2.44.0

[PATCH v2 4/6] drm/gem: Create shmem GEM object in a given mountpoint

2024-04-05 Thread Maíra Canal

Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that takes a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a special mountpoint
with flags enabled.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++
 include/drm/drm_gem_shmem_helper.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 13bcdbfd..10b7c4c769a3 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs 
= {
 };
 
 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+  struct vfsmount *gemfs)
 {
struct drm_gem_shmem_object *shmem;
struct drm_gem_object *obj;
@@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, 
bool private)
drm_gem_private_object_init(dev, obj, size);
shmem->map_wc = false; /* dma-buf mappings use always 
writecombine */
} else {
-   ret = drm_gem_object_init(dev, obj, size);
+   ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs);
}
if (ret) {
drm_gem_private_object_fini(obj);
@@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t 
size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size)
 {
-   return __drm_gem_shmem_create(dev, size, false);
+   return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs)
+{
+   return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
size_t size = PAGE_ALIGN(attach->dmabuf->size);
struct drm_gem_shmem_object *shmem;
 
-   shmem = __drm_gem_shmem_create(dev, size, true);
+   shmem = __drm_gem_shmem_create(dev, size, true, NULL);
if (IS_ERR(shmem))
return ERR_CAST(shmem);
 
diff --git a/include/drm/drm_gem_shmem_helper.h 
b/include/drm/drm_gem_shmem_helper.h
index efbc9f27312b..d22e3fb53631 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
container_of(obj, struct drm_gem_shmem_object, base)
 
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, 
size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device 
*dev,
+  size_t size,
+  struct vfsmount 
*gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);
 
 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
-- 
2.44.0

[PATCH v2 3/6] drm/v3d: Introduce gemfs

2024-04-05 Thread Maíra Canal

Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.

Signed-off-by: Maíra Canal 
Reviewed-by: Iago Toral Quiroga 
---
 drivers/gpu/drm/v3d/Makefile|  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
v3d_trace_points.o \
v3d_sched.o \
v3d_sysfs.o \
-   v3d_submit.o
+   v3d_submit.o \
+   v3d_gemfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..d2ce8222771a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -119,6 +119,11 @@ struct v3d_dev {
struct drm_mm mm;
spinlock_t mm_lock;
 
+   /*
+* tmpfs instance used for shmem backed objects
+*/
+   struct vfsmount *gemfs;
+
struct work_struct overflow_mem_work;
 
struct v3d_bin_job *bin_job;
@@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);
 
+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 66f4b78a6b2e..faefbe497e8d 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
v3d_init_hw_state(v3d);
v3d_mmu_set_page_table(v3d);
 
+   v3d_gemfs_init(v3d);
+
ret = v3d_sched_init(v3d);
if (ret) {
drm_mm_takedown(&v3d->mm);
@@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
struct v3d_dev *v3d = to_v3d_dev(dev);
 
v3d_sched_fini(v3d);
+   v3d_gemfs_fini(v3d);
 
/* Waiting for jobs to finish would need to be done before
 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index ..31cf5bd11e39
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include 
+#include 
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+   char huge_opt[] = "huge=within_size";
+   struct file_system_type *type;
+   struct vfsmount *gemfs;
+
+   /*
+* By creating our own shmemfs mountpoint, we can pass in
+* mount flags that better match our usecase. However, we
+* only do so on platforms which benefit from it.
+*/
+   if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   goto err;
+
+   type = get_fs_type("tmpfs");
+   if (!type)
+   goto err;
+
+   gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+   if (IS_ERR(gemfs))
+   goto err;
+
+   v3d->gemfs = gemfs;
+   drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+   return;
+
+err:
+   v3d->gemfs = NULL;
+   drm_notice(&v3d->drm,
+  "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+   if (v3d->gemfs)
+   kern_unmount(v3d->gemfs);
+}
-- 
2.44.0
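
The init/fini pair in this patch treats the private tmpfs mount as optional: setup never fails the driver, it only leaves `v3d->gemfs` as NULL, and teardown acts only if a mount exists. A minimal userspace sketch of that shape follows. All `demo_*` names are hypothetical stand-ins rather than kernel APIs, and the boolean argument is an assumption standing in for the `CONFIG_TRANSPARENT_HUGEPAGE`, `get_fs_type()` and `vfs_kern_mount()` checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vfsmount and struct v3d_dev. */
struct demo_vfsmount { const char *opts; };
struct demo_dev { struct demo_vfsmount *gemfs; };

static struct demo_vfsmount demo_thp_mount = { "huge=within_size" };

/* Pretend mount helper: returns NULL when huge pages are unavailable. */
static struct demo_vfsmount *demo_mount_tmpfs(bool thp_available)
{
	return thp_available ? &demo_thp_mount : NULL;
}

/* Same shape as v3d_gemfs_init(): no error code, NULL means "use default". */
static void demo_gemfs_init(struct demo_dev *dev, bool thp_available)
{
	assert(dev != NULL);
	dev->gemfs = demo_mount_tmpfs(thp_available);
}

/* Same shape as v3d_gemfs_fini(): tear down only what was set up. */
static void demo_gemfs_fini(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (dev->gemfs)
		dev->gemfs = NULL; /* stands in for kern_unmount() */
}
```

Because the failure path only records NULL, callers need no error handling around the init call, which matches how `v3d_gem_init()` invokes it in the hunk above.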

[PATCH v2 2/6] drm/gem: Create a drm_gem_object_init_with_mnt() function

2024-04-05 Thread Maíra Canal

For some applications, such as applications that use huge pages, we might
want to have a different mountpoint, for which we pass mount flags that
better match our use case.

Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, then we fall back to `shmem_file_setup()`.

Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/drm_gem.c | 34 ++
 include/drm/drm_gem.h |  3 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d4bbc5d109c8..74ebe68e3d61 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev)
 }

 /**
- * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM
+ * object in a given shmfs mountpoint
+ *
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
-int drm_gem_object_init(struct drm_device *dev,
-   struct drm_gem_object *obj, size_t size)
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs)
 {
struct file *filp;

drm_gem_private_object_init(dev, obj, size);

-   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+   if (gemfs)
+   filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+VM_NORESERVE);
+   else
+   filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
if (IS_ERR(filp))
return PTR_ERR(filp);

@@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev,

return 0;
 }
+EXPORT_SYMBOL(drm_gem_object_init_with_mnt);
+
+/**
+ * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * @dev: drm_device the object should be initialized for
+ * @obj: drm_gem_object to initialize
+ * @size: object size
+ *
+ * Initialize an already allocated GEM object of the specified size with
+ * shmfs backing store.
+ */
+int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
+   size_t size)
+{
+   return drm_gem_object_init_with_mnt(dev, obj, size, NULL);
+}
 EXPORT_SYMBOL(drm_gem_object_init);

 /**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bae4865b2101..2ebf6e10cc44 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+struct drm_gem_object *obj, size_t size,
+struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
--
2.44.0
