Re: [Freedreno] [RFC PATCH 1/3] drm/msm/dpu: add support for MSM8953

2023-09-29 Thread Dmitry Baryshkov
On Fri, 29 Sept 2023 at 23:53, Luca Weiss  wrote:
>
> On Saturday, 23 September 2023 23:49:10 CEST Dmitry Baryshkov wrote:
> > Experimental support for MSM8953, which has MDP5 v1.16. It looks like a
> > trimmed-down version of MSM8996: fewer SSPP, LM and PP blocks. No DSC,
> > etc.
> >
>
> Hi Dmitry,
>
> As written on IRC, on sdm632-fairphone-fp3 with these DPU patches the screen is
> initializing and displaying stuff :) But there are some errors, which are
> presumably the reason that the screen is only updating a few times per second.
>
> [   22.774205] [drm:dpu_kms_hw_init:1164] dpu hardware revision:0x1010
> [   23.099806] [drm:_dpu_encoder_phys_cmd_wait_for_ctl_start:657] [dpu error]enc31 intf1 ctl start interrupt wait failed
> [   23.099821] [drm:dpu_kms_wait_for_commit_done:495] [dpu error]wait for commit done returned -22
>
> These messages appear about 13 times per second but as I mentioned, the screen
> *is* updating (slowly) there.

Thank you for the testing, I'll see if I can determine what is causing
the ctl start issue.

>
> Also, it looks like you forgot to add "qcom,msm8953-mdp5" to the
> msm_mdp5_dpu_migration list; without this, DPU is never even considered for
> 8953.

Yep.

>
> Regards
> Luca


-- 
With best wishes
Dmitry


Re: [Freedreno] [RFC PATCH 1/3] drm/msm/dpu: add support for MSM8953

2023-09-29 Thread Luca Weiss
On Saturday, 23 September 2023 23:49:10 CEST Dmitry Baryshkov wrote:
> Experimental support for MSM8953, which has MDP5 v1.16. It looks like a
> trimmed-down version of MSM8996: fewer SSPP, LM and PP blocks. No DSC,
> etc.
> 

Hi Dmitry,

As written on IRC, on sdm632-fairphone-fp3 with these DPU patches the screen is
initializing and displaying stuff :) But there are some errors, which are
presumably the reason that the screen is only updating a few times per second.

[   22.774205] [drm:dpu_kms_hw_init:1164] dpu hardware revision:0x1010
[   23.099806] [drm:_dpu_encoder_phys_cmd_wait_for_ctl_start:657] [dpu error]enc31 intf1 ctl start interrupt wait failed
[   23.099821] [drm:dpu_kms_wait_for_commit_done:495] [dpu error]wait for commit done returned -22

These messages appear about 13 times per second but as I mentioned, the screen
*is* updating (slowly) there.

Also, it looks like you forgot to add "qcom,msm8953-mdp5" to the
msm_mdp5_dpu_migration list; without this, DPU is never even considered for
8953.

Regards
Luca

> Signed-off-by: Dmitry Baryshkov 
> ---
>  .../msm/disp/dpu1/catalog/dpu_1_16_msm8953.h  | 221 ++
>  .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c|  12 +
>  .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h|   1 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c   |   1 +
>  4 files changed, 235 insertions(+)
>  create mode 100644 drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_1_16_msm8953.h
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_1_16_msm8953.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_1_16_msm8953.h
> new file mode 100644
> index ..6944bfa4568a
> --- /dev/null
> +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_1_16_msm8953.h
> @@ -0,0 +1,221 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2023, Linaro Limited
> + */
> +
> +#ifndef _DPU_1_16_MSM8953_H
> +#define _DPU_1_16_MSM8953_H
> +
> +static const struct dpu_caps msm8953_dpu_caps = {
> + .max_mixer_width = DEFAULT_DPU_LINE_WIDTH,
> + .max_mixer_blendstages = 0x4,
> + .max_linewidth = DEFAULT_DPU_LINE_WIDTH,
> + .pixel_ram_size = 40 * 1024,
> + .max_hdeci_exp = MAX_HORZ_DECIMATION,
> + .max_vdeci_exp = MAX_VERT_DECIMATION,
> +};
> +
> +static const struct dpu_mdp_cfg msm8953_mdp[] = {
> + {
> + .name = "top_0",
> + .base = 0x0, .len = 0x454,
> + .features = BIT(DPU_MDP_VSYNC_SEL),
> + .clk_ctrls = {
> + [DPU_CLK_CTRL_VIG0] = { .reg_off = 0x2ac, .bit_off = 0 },
> + [DPU_CLK_CTRL_RGB0] = { .reg_off = 0x2ac, .bit_off = 4 },
> + [DPU_CLK_CTRL_RGB1] = { .reg_off = 0x2b4, .bit_off = 4 },
> + [DPU_CLK_CTRL_DMA0] = { .reg_off = 0x2ac, .bit_off = 8 },
> + [DPU_CLK_CTRL_CURSOR0] = { .reg_off = 0x3a8, .bit_off = 16 },
> + },
> + },
> +};
> +
> +static const struct dpu_ctl_cfg msm8953_ctl[] = {
> + {
> + .name = "ctl_0", .id = CTL_0,
> + .base = 0x1000, .len = 0x64,
> + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 9),
> + }, {
> + .name = "ctl_1", .id = CTL_1,
> + .base = 0x1200, .len = 0x64,
> + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 10),
> + }, {
> + .name = "ctl_2", .id = CTL_2,
> + .base = 0x1400, .len = 0x64,
> + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 11),
> + },
> +};
> +
> +static const struct dpu_sspp_cfg msm8953_sspp[] = {
> + {
> + .name = "sspp_0", .id = SSPP_VIG0,
> + .base = 0x4000, .len = 0x150,
> + .features = VIG_MSM8953_MASK,
> + .sblk = _vig_sblk_qseed2,
> + .xin_id = 0,
> + .type = SSPP_TYPE_VIG,
> + .clk_ctrl = DPU_CLK_CTRL_VIG0,
> + }, {
> + .name = "sspp_4", .id = SSPP_RGB0,
> + .base = 0x14000, .len = 0x150,
> + .features = RGB_MSM8953_MASK,
> + .sblk = _rgb_sblk,
> + .xin_id = 1,
> + .type = SSPP_TYPE_RGB,
> + .clk_ctrl = DPU_CLK_CTRL_RGB0,
> + }, {
> + .name = "sspp_5", .id = SSPP_RGB1,
> + .base = 0x16000, .len = 0x150,
> + .features = RGB_MSM8953_MASK,
> + .sblk = _rgb_sblk,
> + .xin_id = 5,
> + .type = SSPP_TYPE_RGB,
> + .clk_ctrl = DPU_CLK_CTRL_RGB1,
> + }, {
> + .name = "sspp_8", .id = SSPP_DMA0,
> + .base = 0x24000, .len = 0x150,
> + .features = DMA_MSM8953_MASK | BIT(DPU_SSPP_CURSOR),
> + .sblk = _dma_sblk,
> + .xin_id = 2,
> + .type = SSPP_TYPE_DMA,
> + .clk_ctrl = DPU_CLK_CTRL_DMA0,
> + },
> +};
> +
> +static const struct dpu_lm_cfg msm8953_lm[] = {
> + {
> + .name = "lm_0", .id = LM_0,
> + .base = 0x44000, .len = 0x320,
> +

Re: [Freedreno] [PATCH 0/9] drm: Annotate structs with __counted_by

2023-09-29 Thread Kees Cook
On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote:
> This is a batch of patches touching drm to prepare for the coming
> implementation by GCC and Clang of the __counted_by attribute. Flexible
> array members annotated with __counted_by can have their accesses
> bounds-checked at run time via CONFIG_UBSAN_BOUNDS (for array
> indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).
> 
> As found with Coccinelle[1], add __counted_by to structs that would
> benefit from the annotation.
> 
> [...]
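
For readers not familiar with the attribute, a minimal sketch of what the
annotation looks like on a flexible array member (an illustrative struct, not
one of the patched drivers):

    struct example_table {                       /* illustrative only */
        u32 count;                               /* number of valid elements in values[] */
        u32 values[] __counted_by(count);        /* flexible array bounded by count */
    };

With the annotation in place, CONFIG_UBSAN_BOUNDS can trap an out-of-range
values[i] access at run time, and fortified memcpy()-family calls into values
are bounded by count * sizeof(*values) instead of being treated as unbounded.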

Since this got Acks, I figure I should carry it in my tree. Let me know
if this should go via drm instead.

Applied to for-next/hardening, thanks!

[1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by
  https://git.kernel.org/kees/c/a6046ac659d6
[2/9] drm/amdgpu/discovery: Annotate struct ip_hw_instance with __counted_by
  https://git.kernel.org/kees/c/4df33089b46f
[3/9] drm/i915/selftests: Annotate struct perf_series with __counted_by
  https://git.kernel.org/kees/c/ffd3f823bdf6
[4/9] drm/msm/dpu: Annotate struct dpu_hw_intr with __counted_by
  https://git.kernel.org/kees/c/2de35a989b76
[5/9] drm/nouveau/pm: Annotate struct nvkm_perfdom with __counted_by
  https://git.kernel.org/kees/c/188aeb08bfaa
[6/9] drm/vc4: Annotate struct vc4_perfmon with __counted_by
  https://git.kernel.org/kees/c/59a54dc896c3
[7/9] drm/virtio: Annotate struct virtio_gpu_object_array with __counted_by
  https://git.kernel.org/kees/c/5cd476de33af
[8/9] drm/vmwgfx: Annotate struct vmw_surface_dirty with __counted_by
  https://git.kernel.org/kees/c/b426f2e5356a
[9/9] drm/v3d: Annotate struct v3d_perfmon with __counted_by
  https://git.kernel.org/kees/c/dc662fa1b0e4

Take care,

-- 
Kees Cook



[Freedreno] [PATCH v8 5/5] drm/panfrost: Implement generic DRM object RSS reporting function

2023-09-29 Thread Adrián Larumbe
A BO's RSS is updated every time new pages are allocated on demand and mapped
for the object in the GPU page fault IRQ handler, but only for heap buffers.
This is unnecessary for non-heap buffers because they are mapped onto the
GPU's VA space and backed by physical memory in their entirety at BO creation
time.

This calculation is unnecessary for imported PRIME objects, since heap
buffers cannot be exported by our driver, and the actual BO RSS size is the
one reported in its attached dmabuf structure.

Signed-off-by: Adrián Larumbe 
Reviewed-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  5 +
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 +
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index de238b71b321..0cf64456e29a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -209,6 +209,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj
return res;
 }
 
+static size_t panfrost_gem_rss(struct drm_gem_object *obj)
+{
+   struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+
+   if (bo->is_heap) {
+   return bo->heap_rss_size;
+   } else if (bo->base.pages) {
+   WARN_ON(bo->heap_rss_size);
+   return bo->base.base.size;
+   }
+
+   return 0;
+}
+
 static const struct drm_gem_object_funcs panfrost_gem_funcs = {
.free = panfrost_gem_free_object,
.open = panfrost_gem_open,
@@ -221,6 +235,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
.status = panfrost_gem_status,
+   .rss = panfrost_gem_rss,
.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index ad2877eeeccd..13c0a8149c3a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -36,6 +36,11 @@ struct panfrost_gem_object {
 */
atomic_t gpu_usecount;
 
+   /*
+* Object chunk size currently mapped onto physical memory
+*/
+   size_t heap_rss_size;
+
bool noexec :1;
bool is_heap:1;
 };
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index d54d4e7b2195..846dd697c410 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
   IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
 
bomapping->active = true;
+   bo->heap_rss_size += SZ_2M;
 
dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
 
-- 
2.42.0



[Freedreno] [PATCH v8 1/5] drm/panfrost: Add cycle count GPU register definitions

2023-09-29 Thread Adrián Larumbe
These GPU registers will be used when programming the cycle counter, which
we need for providing accurate fdinfo drm-cycles values to user space.

Signed-off-by: Adrián Larumbe 
Reviewed-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/gpu/drm/panfrost/panfrost_regs.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 919f44ac853d..55ec807550b3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -46,6 +46,8 @@
 #define   GPU_CMD_SOFT_RESET   0x01
 #define   GPU_CMD_PERFCNT_CLEAR0x03
 #define   GPU_CMD_PERFCNT_SAMPLE   0x04
+#define   GPU_CMD_CYCLE_COUNT_START0x05
+#define   GPU_CMD_CYCLE_COUNT_STOP 0x06
 #define   GPU_CMD_CLEAN_CACHES 0x07
 #define   GPU_CMD_CLEAN_INV_CACHES 0x08
 #define GPU_STATUS 0x34
@@ -73,6 +75,9 @@
 #define GPU_PRFCNT_TILER_EN0x74
 #define GPU_PRFCNT_MMU_L2_EN   0x7c
 
+#define GPU_CYCLE_COUNT_LO 0x90
+#define GPU_CYCLE_COUNT_HI 0x94
+
#define GPU_THREAD_MAX_THREADS 0x0A0   /* (RO) Maximum number of threads per core */
#define GPU_THREAD_MAX_WORKGROUP_SIZE  0x0A4   /* (RO) Maximum workgroup size */
#define GPU_THREAD_MAX_BARRIER_SIZE    0x0A8   /* (RO) Maximum threads waiting at a barrier */
-- 
2.42.0



[Freedreno] [PATCH v8 3/5] drm/panfrost: Add fdinfo support for memory stats

2023-09-29 Thread Adrián Larumbe
A new DRM GEM object function is added so that drm_show_memory_stats can
provide more accurate memory usage numbers.

Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked
after locking the driver's shrinker mutex, but drm_show_memory_stats takes
over the drm file's object handle database spinlock, so there's potential
for a race condition here.

Signed-off-by: Adrián Larumbe 
Reviewed-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 97e5bc4a82c8..b834777b409b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -568,6 +568,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
struct panfrost_device *pfdev = dev->dev_private;
 
panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
+
+   drm_show_memory_stats(p, file);
 }
 
 static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 3c812fbd126f..de238b71b321 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -195,6 +195,20 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
return drm_gem_shmem_pin(&bo->base);
 }
 
+static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)
+{
+   struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+   enum drm_gem_object_status res = 0;
+
+   if (bo->base.pages)
+   res |= DRM_GEM_OBJECT_RESIDENT;
+
+   if (bo->base.madv == PANFROST_MADV_DONTNEED)
+   res |= DRM_GEM_OBJECT_PURGEABLE;
+
+   return res;
+}
+
 static const struct drm_gem_object_funcs panfrost_gem_funcs = {
.free = panfrost_gem_free_object,
.open = panfrost_gem_open,
@@ -206,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
.vmap = drm_gem_shmem_object_vmap,
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
+   .status = panfrost_gem_status,
.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
-- 
2.42.0



[Freedreno] [PATCH v8 2/5] drm/panfrost: Add fdinfo support GPU load metrics

2023-09-29 Thread Adrián Larumbe
The drm-stats fdinfo tags made available to user space are drm-engine,
drm-cycles, drm-max-freq and drm-curfreq, one per job slot.

This deviates from standard practice in other DRM drivers, where a single
set of key:value pairs is provided for the whole render engine. However,
Panfrost has separate queues for fragment and vertex/tiler jobs, so a
decision was made to calculate bus cycles and workload times separately.

Maximum operating frequency is calculated at devfreq initialisation time.
Current frequency is made available to user space because nvtop uses it
when performing engine usage calculations.

It is important to bear in mind that both the GPU cycle and kernel time numbers
provided are at best rough estimations, and are always reported in excess of
the actual figure, for two reasons:
 - Excess time because of the delay between the end of a job processing,
   the subsequent job IRQ and the actual time of the sample.
 - Time spent in the engine queue waiting for the GPU to pick up the next
   job.

To avoid race conditions during enablement/disabling, a reference counting
mechanism was introduced, and a job flag that tells us whether a given job
increased the refcount. This is necessary, because user space can toggle
cycle counting through a debugfs file, and a given job might have been in
flight by the time cycle counting was disabled.
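
The enable/disable side of that refcounting can be pictured roughly as below;
this is a simplified sketch, and names such as struct gpu, gpu_write() and
cycle_counter_refcnt are placeholders rather than the actual Panfrost symbols:

    /* First user switches the counter on, last user switches it off. */
    static void cycle_counter_get(struct gpu *gpu)
    {
        if (atomic_inc_return(&gpu->cycle_counter_refcnt) == 1)
            gpu_write(gpu, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
    }

    static void cycle_counter_put(struct gpu *gpu)
    {
        if (atomic_dec_and_test(&gpu->cycle_counter_refcnt))
            gpu_write(gpu, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
    }

A job that called the get helper at submission time records that in a per-job
flag, and only jobs with the flag set call the put helper on completion, so
flipping the debugfs knob while jobs are in flight cannot unbalance the count.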

The main goal of the debugfs cycle counter knob is letting tools like nvtop
or IGT's gputop switch it at any time, to avoid power waste in case no
engine usage measuring is necessary.

Also add a documentation file explaining the possible values for fdinfo's
engine keystrings and Panfrost-specific drm-curfreq- pairs.
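
As a rough sketch of how a monitoring tool can turn two fdinfo samples into a
utilisation figure (only the arithmetic implied above; not nvtop's or gputop's
actual code):

    #include <stdint.h>

    /* Busy fraction over a sampling window of dt_ns nanoseconds, computed
     * from two samples of a drm-engine-* value (in nanoseconds)... */
    static double busy_from_engine_time(uint64_t ns_a, uint64_t ns_b, uint64_t dt_ns)
    {
        return (double)(ns_b - ns_a) / (double)dt_ns;
    }

    /* ...or from two samples of drm-cycles-* plus drm-curfreq-* (in Hz). */
    static double busy_from_cycles(uint64_t cyc_a, uint64_t cyc_b,
                                   uint64_t curfreq_hz, uint64_t dt_ns)
    {
        double window_s = (double)dt_ns / 1e9;

        return (double)(cyc_b - cyc_a) / ((double)curfreq_hz * window_s);
    }

Because of the over-reporting described above, both estimates can slightly
exceed 1.0, which a consumer would simply clamp.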

Signed-off-by: Adrián Larumbe 
Reviewed-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: AngeloGioacchino Del Regno 

---
 Documentation/gpu/drm-usage-stats.rst   |  1 +
 Documentation/gpu/panfrost.rst  | 38 ++
 drivers/gpu/drm/panfrost/Makefile   |  2 +
 drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 
 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 +
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 +++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
 drivers/gpu/drm/panfrost/panfrost_device.c  |  2 +
 drivers/gpu/drm/panfrost/panfrost_device.h  | 13 +
 drivers/gpu/drm/panfrost/panfrost_drv.c | 58 -
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++
 drivers/gpu/drm/panfrost/panfrost_gpu.h |  4 ++
 drivers/gpu/drm/panfrost/panfrost_job.c | 24 +
 drivers/gpu/drm/panfrost/panfrost_job.h |  5 ++
 14 files changed, 233 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/gpu/panfrost.rst
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h

diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
index fe35a291ff3e..8d963cd7c1b7 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -169,3 +169,4 @@ Driver specific implementations
 ---
 
 :ref:`i915-usage-stats`
+:ref:`panfrost-usage-stats`
diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst
new file mode 100644
index ..ecc48ba5ac11
--- /dev/null
+++ b/Documentation/gpu/panfrost.rst
@@ -0,0 +1,38 @@
+===
+ drm/Panfrost Mali Driver
+===
+
+.. _panfrost-usage-stats:
+
+Panfrost DRM client usage stats implementation
+==
+
+The drm/Panfrost driver implements the DRM client usage stats specification as
+documented in :ref:`drm-client-usage-stats`.
+
+Example of the output showing the implemented key value pairs and entirety of
+the currently possible format options:
+
+::
+  pos:0
+  flags:  0242
+  mnt_id: 27
+  ino:531
+  drm-driver: panfrost
+  drm-client-id:  14
+  drm-engine-fragment:1846584880 ns
+  drm-cycles-fragment:1424359409
+  drm-maxfreq-fragment:   79987 Hz
+  drm-curfreq-fragment:   79987 Hz
+  drm-engine-vertex-tiler:71932239 ns
+  drm-cycles-vertex-tiler:52617357
+  drm-maxfreq-vertex-tiler:   79987 Hz
+  drm-curfreq-vertex-tiler:   79987 Hz
+  drm-total-memory:   290 MiB
+  drm-shared-memory:  0 MiB
+  drm-active-memory:  226 MiB
+  drm-resident-memory:36496 KiB
+  drm-purgeable-memory:   128 KiB
+
+Possible `drm-engine-` key names are: `fragment`, and  `vertex-tiler`.
+`drm-curfreq-` values convey the current operating frequency for that engine.
diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile
index 7da2b3f02ed9..2c01c1e7523e 100644
--- a/drivers/gpu/drm/panfrost/Makefile
+++ b/drivers/gpu/drm/panfrost/Makefile
@@ -12,4 +12,6 @@ panfrost-y := \

[Freedreno] [PATCH v8 4/5] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo

2023-09-29 Thread Adrián Larumbe
Some BOs might be mapped onto physical memory chunkwise and on demand,
like Panfrost's tiler heap. In this case, even though the
drm_gem_shmem_object page array might already be allocated, only a very
small fraction of the BO is currently backed by system memory, but
drm_show_memory_stats will then proceed to add its entire virtual size to
the file's total resident size regardless.

This led to very unrealistic RSS sizes being reckoned for Panfrost, where
said tiler heap buffer is initially allocated with a virtual size of 128
MiB, but only a small part of it will eventually be backed by system memory
after successive GPU page faults.

Provide a new generic DRM object function that allows drivers to return more
accurate RSS and purgeable sizes for their BOs.

Signed-off-by: Adrián Larumbe 
Reviewed-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: AngeloGioacchino Del Regno 

---
 drivers/gpu/drm/drm_file.c | 8 +---
 include/drm/drm_gem.h  | 9 +
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 883d83bc0e3d..9a1bd8d0d785 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -930,6 +930,8 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
spin_lock(&file->table_lock);
idr_for_each_entry (&file->object_idr, obj, id) {
enum drm_gem_object_status s = 0;
+   size_t add_size = (obj->funcs && obj->funcs->rss) ?
+   obj->funcs->rss(obj) : obj->size;
 
if (obj->funcs && obj->funcs->status) {
s = obj->funcs->status(obj);
@@ -944,7 +946,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
}
 
if (s & DRM_GEM_OBJECT_RESIDENT) {
-   status.resident += obj->size;
+   status.resident += add_size;
} else {
/* If already purged or not yet backed by pages, don't
 * count it as purgeable:
@@ -953,14 +955,14 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
}
 
if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
-   status.active += obj->size;
+   status.active += add_size;
 
/* If still active, don't count as purgeable: */
s &= ~DRM_GEM_OBJECT_PURGEABLE;
}
 
if (s & DRM_GEM_OBJECT_PURGEABLE)
-   status.purgeable += obj->size;
+   status.purgeable += add_size;
}
spin_unlock(&file->table_lock);
 
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bc9f6aa2f3fe..16364487fde9 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -208,6 +208,15 @@ struct drm_gem_object_funcs {
 */
enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
 
+   /**
+* @rss:
+*
+* Return resident size of the object in physical memory.
+*
+* Called by drm_show_memory_stats().
+*/
+   size_t (*rss)(struct drm_gem_object *obj);
+
/**
 * @vm_ops:
 *
-- 
2.42.0



[Freedreno] [PATCH v8 0/5] Add fdinfo support to Panfrost

2023-09-29 Thread Adrián Larumbe
This patch series adds fdinfo support to the Panfrost DRM driver. It will
display a series of key:value pairs under /proc/pid/fdinfo/fd for render
processes that open the Panfrost DRM file.

The pairs contain basic drm gpu engine and memory region information that
can either be read directly (e.g. with cat) by a privileged user or accessed
with IGT's gputop utility.

Changelog:

v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/

v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/
 - Changed the way gpu cycles and engine time are calculated, using GPU
   registers and taking into account potential resets.
 - Split render engine values into fragment and vertex/tiler ones.
 - Added more fine-grained calculation of RSS size for BO's.
 - Implemented selection of drm-memory region size units.
 - Removed locking of shrinker's mutex in GEM obj status function.

v3: 
https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/
 - Changed fdinfo engine names to something more descriptive.
 - Mentioned GPU cycle counts aren't an exact measure.
 - Handled the case when job->priv might be NULL.
 - Handled 32 bit overflow of cycle register.
 - Kept fdinfo drm memory stats size unit display within 10k times the
   previous multiplier for more accurate BO size numbers.
 - Removed special handling of Prime imported BO RSS.
 - Use rss_size only for heap objects.
 - Use bo->base.madv instead of specific purgeable flag.
 - Fixed kernel test robot warnings.

v4: 
https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/
 - Move cycle counter get and put to panfrost_job_hw_submit and
   panfrost_job_handle_{err,done} for more accuracy.
 - Make sure cycle counter refs are released in reset path
 - Drop the module param for toggling cycle counting and leave it to the
   debugfs file.
 - Don't disable the cycle counter when toggling the debugfs file;
   let the refcounting logic handle it instead.
 - Remove fdinfo data nested structure definition and 'names' field
 - When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume
   granularity of 2 MiB for every successful mapping.
 - drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5: 
https://lore.kernel.org/lkml/20230914223928.2374933-1-adrian.laru...@collabora.com/
 - Removed explicit initialisation of atomic variable for profiling mode,
   as it's allocated with kzalloc.
 - Pass engine utilisation structure to jobs rather than the file context, to
   avoid future misusage of the latter.
 - Remove double reading of the cycle counter register and ktime in the job
   dequeue function, as the scheduler will make sure these values are re-read
   in case of requeuing.
 - Moved putting of the cycle counting refcnt into the panfrost job dequeue
   function to avoid repetition.

v6: 
https://lore.kernel.org/lkml/c73ad42b-a8db-23c2-86c7-1a2939dba...@linux.intel.com/T/
 - Fix wrong swapped-round engine time and cycle values in fdinfo
   drm print statements.

v7: 
https://lore.kernel.org/lkml/20230927213133.1651169-6-adrian.laru...@collabora.com/T/
 - Make sure an object's actual RSS size is added to the overall fdinfo's
   purgeable and active size tally when it's both resident and purgeable or
   active.
 - Create a drm/panfrost.rst documentation file with meaning of fdinfo strings.
 - BUILD_BUG_ON checking the engine name array size for fdinfo.
 - Added copyright notices for Amazon in Panfrost's new debugfs files.
 - Discarded fdinfo memory stats unit size selection patch.

v8:
 - Style improvements and addressing nitpicks. 

Adrián Larumbe (5):
  drm/panfrost: Add cycle count GPU register definitions
  drm/panfrost: Add fdinfo support GPU load metrics
  drm/panfrost: Add fdinfo support for memory stats
  drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
  drm/panfrost: Implement generic DRM object RSS reporting function

 Documentation/gpu/drm-usage-stats.rst   |  1 +
 Documentation/gpu/panfrost.rst  | 38 +
 drivers/gpu/drm/drm_file.c  |  8 +--
 drivers/gpu/drm/panfrost/Makefile   |  2 +
 drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 
 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 +
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 +++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
 drivers/gpu/drm/panfrost/panfrost_device.c  |  2 +
 drivers/gpu/drm/panfrost/panfrost_device.h  | 13 +
 drivers/gpu/drm/panfrost/panfrost_drv.c | 60 -
 drivers/gpu/drm/panfrost/panfrost_gem.c | 30 +++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  5 ++
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++
 drivers/gpu/drm/panfrost/panfrost_gpu.h |  4 ++
 drivers/gpu/drm/panfrost/panfrost_job.c | 24 +
 drivers/gpu/drm/panfrost/panfrost_job.h |  5 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 +
 drivers/gpu/drm/panfrost/panfrost_regs.h|  5 

Re: [Freedreno] [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU

2023-09-29 Thread Robin Murphy

On 29/09/2023 4:45 pm, Will Deacon wrote:

On Mon, Sep 25, 2023 at 06:54:42PM +0100, Robin Murphy wrote:

On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we
disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the
coherent SMMUs (like we have on sm8350 platform).


Hmm, but is it right that it should fail in the first place? The fact is
that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly
can't see why the io-pgtable code is going out of its way to explicitly
reject a request to give them the same attribute it's already giving them
anyway :/

Even if the original intent was for the quirk to have an over-specific
implication of representing inner-NC as well, that hardly seems useful if
what we've ended up with in practice is a nonsensical-looking check in one
place and then a weird hacky bodge in another purely to work around it.

Does anyone know a good reason why this is the way it is?


I think it was mainly because the quirk doesn't make sense for a coherent
page-table walker and we could in theory use that bit for something else
in that case.


Yuck, even if we did want some horrible notion of quirks being 
conditional on parts of the config rather than just the format, then the 
users would need to be testing for the same condition as the pagetable 
code itself (i.e. cfg->coherent_walk), rather than hoping some other 
property of something else indirectly reflects the right information - 
e.g. there'd be no hope of backporting this particular bodge before 5.19 
where the old iommu_capable(IOMMU_CAP_CACHE_COHERENCY) always returned 
true, and in future we could conceivably support coherent SMMUs being 
configured for non-coherent walks on a per-domain basis.


Furthermore, if we did overload a flag to have multiple meanings, then 
we'd have no way of knowing which one the caller was actually expecting, 
thus the illusion of being able to validate calls in the meantime isn't 
necessarily as helpful as it seems, particularly in a case where the 
"wrong" interpretation would be to have no effect anyway. Mostly though 
I'd hope that if we ever got anywhere near the point of running out of 
quirk bits we'd have already realised that it's time for a better 
interface :(


Based on that, I think that when I do get round to needing to touch this 
code, I'll propose just streamlining the whole quirk.


Cheers,
Robin.
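
For readers following along, the two pieces of code under discussion have
roughly the following shape (paraphrased from memory; the helper names are
made up and this is not the exact upstream code):

    /* io-pgtable-arm allocation path: the check being questioned, which
     * refuses the quirk when the walker is already coherent. */
    static bool rejects_outer_wbwa(const struct io_pgtable_cfg *cfg)
    {
        return cfg->coherent_walk &&
               (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA);
    }

    /* drm/msm/a6xx workaround: only request the quirk when the SMMU is not
     * coherent, using the IOMMU capability as an indirect signal. */
    static unsigned long a6xx_pgtable_quirks(struct platform_device *pdev)
    {
        unsigned long quirks = 0;

        if (!device_iommu_capable(&pdev->dev, IOMMU_CAP_CACHE_COHERENCY))
            quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

        return quirks;
    }

Robin's objection is that the driver ends up guessing at cfg->coherent_walk
through a different interface (whose old iommu_capable() form always returned
true before v5.19), rather than io-pgtable simply accepting what would be a
no-op request.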


Re: [Freedreno] [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU

2023-09-29 Thread Will Deacon
On Mon, Sep 25, 2023 at 06:54:42PM +0100, Robin Murphy wrote:
> On 2023-04-10 19:52, Dmitry Baryshkov wrote:
> > If the Adreno SMMU is dma-coherent, allocation will fail unless we
> > disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the
> > coherent SMMUs (like we have on sm8350 platform).
> 
> Hmm, but is it right that it should fail in the first place? The fact is
> that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly
> can't see why the io-pgtable code is going out of its way to explicitly
> reject a request to give them the same attribute it's already giving them
> anyway :/
> 
> Even if the original intent was for the quirk to have an over-specific
> implication of representing inner-NC as well, that hardly seems useful if
> what we've ended up with in practice is a nonsensical-looking check in one
> place and then a weird hacky bodge in another purely to work around it.
> 
> Does anyone know a good reason why this is the way it is?

I think it was mainly because the quirk doesn't make sense for a coherent
page-table walker and we could in theory use that bit for something else
in that case.

Will