Re: [PATCH v6 02/15] dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx
On 01/04/2023 13:54, Konrad Dybcio wrote: > The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks > we'd normally assign to the GMU as if they were a part of the GMU, even > though they are not". It's a (good) software representation of the GMU_CX > and GMU_GX register spaces within the GPUSS that helps us programmatically > treat these de-facto GMU-less parts in a way that's very similar to their > GMU-equipped cousins, massively saving up on code duplication. > > The "wrapper" register space was specifically designed to mimic the layout > of a real GMU, though it rather obviously does not have the M3 core et al. > > GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be > specified under the GPU node, just like their older cousins. Account > for that. > > Signed-off-by: Konrad Dybcio Reviewed-by: Krzysztof Kozlowski Best regards, Krzysztof
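For readers following the binding discussion above, a minimal sketch of what such a GPU node could look like, with clocks and clock-names under the GPU node itself rather than a GMU node. This is an illustrative assumption only: the compatible string, clock handles, clock names, and register values below are made up for the example and are not copied from the actual binding.

```dts
/* Hypothetical GMU wrapper-equipped A6xx GPU node; all names and
 * values are illustrative, not taken from the real dt-binding. */
gpu@3d00000 {
	compatible = "qcom,adreno";
	reg = <0x03d00000 0x40000>;
	reg-names = "kgsl_3d0_reg_memory";

	/* The point of the patch: clocks/clock-names live here,
	 * just like on older (pre-GMU) Adreno generations. */
	clocks = <&gpucc 0>, <&gpucc 1>, <&gcc 2>;
	clock-names = "core", "iface", "mem_iface";
};
```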
Re: [Intel-gfx] [PATCH v3] drm/i915/mtl: Disable stolen memory backed FB for A0
On Tue, 2023-04-04 at 23:26 +0200, Das, Nirmoy wrote: > > On 4/4/2023 8:27 PM, Ville Syrjälä wrote: > > On Tue, Apr 04, 2023 at 08:13:42PM +0200, Nirmoy Das wrote: > > > Stolen memory is not usable for MTL A0 stepping beyond > > > certain access size and we have no control over userspace > > > access size of /dev/fb which can be backed by stolen memory. > > > So disable stolen memory backed fb by setting i915- > > > >dsm.usable_size > > > to zero. > > > > > > v2: remove hsdes reference and fix commit message(Andi) > > > v3: use revid as we want to target SOC stepping(Radhakrishna) > > > > > > Cc: Matthew Auld > > > Cc: Andi Shyti > > > Cc: Daniele Ceraolo Spurio > > > Cc: Lucas De Marchi > > > Cc: Radhakrishna Sripada > > > Signed-off-by: Nirmoy Das > > > Reviewed-by: Andi Shyti > > > --- > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 8 > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c > > > b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c > > > index 8ac376c24aa2..ee492d823f1b 100644 > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c > > > @@ -535,6 +535,14 @@ static int i915_gem_init_stolen(struct > > > intel_memory_region *mem) > > > /* Basic memrange allocator for stolen space. */ > > > drm_mm_init(&i915->mm.stolen, 0, i915->dsm.usable_size); > > > > > > + /* > > > + * Access to stolen lmem beyond certain size for MTL A0 > > > stepping > > > + * would crash the machine. Disable stolen lmem for > > > userspace access > > > + * by setting usable_size to zero. > > > + */ > > > + if (IS_METEORLAKE(i915) && INTEL_REVID(i915) == 0x0) > > > + i915->dsm.usable_size = 0; > > That certainly won't prevent FBC from using stolen. > > Are we sure that FBC accesses are fine? > > I think so. I remember Jouni tested this patch internally to unblock > a > FBC test. > > Jouni, could you please share your thoughts. 
I can't seem to find the > internal JIRA reference right now. I tested this patch and it fixed the problem it targeted. I didn't notice any issue back then. > > > Regards, > > Nirmoy > > > > > > + > > > return 0; > > > } > > > > > > -- > > > 2.39.0
[PATCH] drm/atomic-helper: Do not assume vblank is always present
From: Zack Rusin Many drivers (in particular all of the virtualized drivers) do not implement vblank handling. Assuming vblank is always present leads to crashes. Fix the crashes by making sure the device supports vblank before using it. Fixes crashes on boot, as in: Oops: [#1] PREEMPT SMP PTI CPU: 0 PID: 377 Comm: systemd-modules Not tainted 6.3.0-rc4-vmwgfx #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:drm_crtc_next_vblank_start+0x2c/0x80 [drm] Code: 1e fa 0f 1f 44 00 00 8b 87 90 00 00 00 48 8b 17 55 48 8d 0c c0 48 89 e5 41 54 53 48 8d 1c 48 48 c1 e3 04 48 03 9a 40 01 00 00 <8b> 53 74 85 d2 74 3f 8b 4> RSP: 0018:b825004073c8 EFLAGS: 00010246 RAX: RBX: RCX: RDX: 9b18cf8d RSI: b825004073e8 RDI: 9b18d0f4 RBP: b825004073d8 R08: 9b18cf8d R09: R10: 9b18c57ef6e8 R11: 9b18c2d59e00 R12: R13: 9b18cfa99280 R14: R15: 9b18cf8d FS: 7f2b82538900() GS:9b19f7c0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0074 CR3: 0001060a6003 CR4: 003706f0 Call Trace: drm_atomic_helper_wait_for_fences+0x87/0x1e0 [drm_kms_helper] drm_atomic_helper_commit+0xa1/0x160 [drm_kms_helper] drm_atomic_commit+0x9a/0xd0 [drm] ? __pfx___drm_printfn_info+0x10/0x10 [drm] drm_client_modeset_commit_atomic+0x1e9/0x230 [drm] drm_client_modeset_commit_locked+0x5f/0x160 [drm] ? mutex_lock+0x17/0x50 drm_client_modeset_commit+0x2b/0x50 [drm] __drm_fb_helper_restore_fbdev_mode_unlocked+0xca/0x100 [drm_kms_helper] drm_fb_helper_set_par+0x40/0x80 [drm_kms_helper] fbcon_init+0x2c5/0x690 visual_init+0xee/0x190 do_bind_con_driver.isra.0+0x1ce/0x4b0 do_take_over_console+0x136/0x220 ? vprintk_default+0x21/0x30 do_fbcon_takeover+0x78/0x130 do_fb_registered+0x139/0x270 fbcon_fb_registered+0x3b/0x90 ? fb_add_videomode+0x81/0xf0 register_framebuffer+0x22f/0x330 __drm_fb_helper_initial_config_and_unlock+0x349/0x520 [drm_kms_helper] ? 
kmalloc_large+0x25/0xc0 drm_fb_helper_initial_config+0x3f/0x50 [drm_kms_helper] drm_fbdev_generic_client_hotplug+0x73/0xc0 [drm_kms_helper] drm_fbdev_generic_setup+0x99/0x170 [drm_kms_helper] Signed-off-by: Zack Rusin Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank") Cc: Rob Clark Cc: Daniel Vetter --- drivers/gpu/drm/drm_vblank.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index 299fa2a19a90..6438aeb1c65f 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -997,12 +997,16 @@ int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime) { unsigned int pipe = drm_crtc_index(crtc); struct drm_vblank_crtc *vblank = &crtc->dev->vblank[pipe]; - struct drm_display_mode *mode = &vblank->hwmode; + struct drm_display_mode *mode; u64 vblank_start; + if (!drm_dev_has_vblank(crtc->dev)) + return -EINVAL; + if (!vblank->framedur_ns || !vblank->linedur_ns) return -EINVAL; + mode = &vblank->hwmode; if (!drm_crtc_get_last_vbltimestamp(crtc, vblanktime, false)) return -EINVAL; -- 2.39.2
Re: (subset) [PATCH v2 0/4] arm64: qcom: sm8450: bindings check cleanup
On Fri, 24 Mar 2023 10:28:45 +0100, Neil Armstrong wrote: > A few fixes to pass the DT bindings check successfully > for sm8450 qrd & hdk DTs. > > The following are still needed to pass all the checks: > - > https://lore.kernel.org/r/20230308082424.140224-3-manivannan.sadhasi...@linaro.org > - > https://lore.kernel.org/r/20230130-topic-sm8450-upstream-pmic-glink-v5-5-552f3b721...@linaro.org > - > https://lore.kernel.org/all/20230308075648.134119-1-manivannan.sadhasi...@linaro.org/ > - > https://lore.kernel.org/r/20230306112129.3687744-1-dmitry.barysh...@linaro.org > - > https://lore.kernel.org/all/20221209-dt-binding-ufs-v3-0-499dff23a...@fairphone.com/ > - > https://lore.kernel.org/all/20221118071849.25506-2-srinivas.kandaga...@linaro.org/ > > [...] Applied, thanks! [2/4] arm64: dts: qcom: sm8450: remove invalid properties in cluster-sleep nodes commit: 35fa9a7fc577a4d1ed541ff37d9dda83b0635e4b Best regards, -- Bjorn Andersson
Re: [PATCH 0/5] drm: shmobile: Fixes and enhancements
Hi Geert, On Fri, Mar 31, 2023 at 04:48:06PM +0200, Geert Uytterhoeven wrote: > Hi all, > > Currently, there are two drivers for the LCD controller on Renesas > SuperH-based and ARM-based SH-Mobile and R-Mobile SoCs: > 1. sh_mobile_lcdcfb, using the fbdev framework, > 2. shmob_drm, using the DRM framework. > However, only the former driver can be used, as all platform support > integrates the former. None of these drivers support DT-based systems. > > This patch series is a first step to enable the SH-Mobile DRM driver for > Renesas ARM-based SH-Mobile and R-Mobile SoCs. The next step planned is > to add DT support. > > This has been tested on the R-Mobile A1-based Atmark Techno > Armadillo-800-EVA development board, using a temporary > platform-enablement patch[1]. > > Thanks for your comments! Thanks for reviving this driver. I'm looking forward to DT and KMS atomic support :-) Will you get these patches merged through drm-misc ? > [1] > https://lore.kernel.org/r/c03d4edbd650836bf6a96504df82338ec6d800ff.1680272980.git.geert+rene...@glider.be > > Geert Uytterhoeven (5): > drm: shmobile: Use %p4cc to print fourcc codes > drm: shmobile: Add support for DRM_FORMAT_XRGB > drm: shmobile: Switch to drm_crtc_init_with_planes() > drm: shmobile: Add missing call to drm_fbdev_generic_setup() > drm: shmobile: Make DRM_SHMOBILE visible on Renesas SoC platforms > > drivers/gpu/drm/shmobile/Kconfig | 2 +- > drivers/gpu/drm/shmobile/shmob_drm_crtc.c | 35 +++--- > drivers/gpu/drm/shmobile/shmob_drm_drv.c | 3 ++ > drivers/gpu/drm/shmobile/shmob_drm_kms.c | 9 -- > drivers/gpu/drm/shmobile/shmob_drm_plane.c | 5 > 5 files changed, 47 insertions(+), 7 deletions(-) -- Regards, Laurent Pinchart
Re: [PATCH 5/5] drm: shmobile: Make DRM_SHMOBILE visible on Renesas SoC platforms
Hi Geert, Thank you for the patch. On Fri, Mar 31, 2023 at 04:48:11PM +0200, Geert Uytterhoeven wrote: > The LCD Controller supported by the drm-shmob driver is not only present > on SuperH SH-Mobile SoCs, but also on Renesas ARM SH/R-Mobile SoCs. > Make its option visible, so the user can enable support for it. > > Signed-off-by: Geert Uytterhoeven > --- > drivers/gpu/drm/shmobile/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/shmobile/Kconfig > b/drivers/gpu/drm/shmobile/Kconfig > index 4ec5dc74a6b0b880..719d4e7a5cd75aad 100644 > --- a/drivers/gpu/drm/shmobile/Kconfig > +++ b/drivers/gpu/drm/shmobile/Kconfig > @@ -2,7 +2,7 @@ > config DRM_SHMOBILE > tristate "DRM Support for SH Mobile" > depends on DRM && ARM There shouldn't be anything ARM-dependent, could you drop "&& ARM" while at it ? Reviewed-by: Laurent Pinchart > - depends on ARCH_SHMOBILE || COMPILE_TEST > + depends on ARCH_RENESAS || ARCH_SHMOBILE || COMPILE_TEST > select BACKLIGHT_CLASS_DEVICE > select DRM_KMS_HELPER > select DRM_GEM_DMA_HELPER -- Regards, Laurent Pinchart
Re: [PATCH 4/5] drm: shmobile: Add missing call to drm_fbdev_generic_setup()
Hi Geert, Thank you for the patch. On Fri, Mar 31, 2023 at 04:48:10PM +0200, Geert Uytterhoeven wrote: > Set up generic fbdev emulation, to enable support for the Linux console. > > Use 16 as the preferred depth, as that is a good compromise between > colorfulness and resource utilization, and the default of the fbdev > driver. > > Suggested-by: Laurent Pinchart > Signed-off-by: Geert Uytterhoeven Reviewed-by: Laurent Pinchart > --- > drivers/gpu/drm/shmobile/shmob_drm_drv.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/shmobile/shmob_drm_drv.c > b/drivers/gpu/drm/shmobile/shmob_drm_drv.c > index faacfee24763b1d4..30493ce874192e3e 100644 > --- a/drivers/gpu/drm/shmobile/shmob_drm_drv.c > +++ b/drivers/gpu/drm/shmobile/shmob_drm_drv.c > @@ -16,6 +16,7 @@ > #include > > #include > +#include > #include > #include > #include > @@ -271,6 +272,8 @@ static int shmob_drm_probe(struct platform_device *pdev) > if (ret < 0) > goto err_irq_uninstall; > > + drm_fbdev_generic_setup(ddev, 16); > + > return 0; > > err_irq_uninstall: -- Regards, Laurent Pinchart
Re: [PATCH 3/5] drm: shmobile: Switch to drm_crtc_init_with_planes()
Hi Geert, Thank you for the patch. On Fri, Mar 31, 2023 at 04:48:09PM +0200, Geert Uytterhoeven wrote: > The SH-Mobile DRM driver uses the legacy drm_crtc_init(), which > advertizes only the formats in safe_modeset_formats[] (XR24 and AR24) as > being supported. > > Switch to drm_crtc_init_with_planes(), and advertize all supported > (A)RGB modes, so we can use RGB565 as the default mode for the console. > > Signed-off-by: Geert Uytterhoeven > --- > drivers/gpu/drm/shmobile/shmob_drm_crtc.c | 30 +-- > 1 file changed, 28 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > index 08dc1428aa16caf0..11dd2bc803e7cb62 100644 > --- a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > +++ b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -478,16 +479,41 @@ static const struct drm_crtc_funcs crtc_funcs = { > .disable_vblank = shmob_drm_disable_vblank, > }; > > +static const uint32_t modeset_formats[] = { > + DRM_FORMAT_RGB565, > + DRM_FORMAT_RGB888, > + DRM_FORMAT_ARGB, > + DRM_FORMAT_XRGB, > +}; > + > +static const struct drm_plane_funcs primary_plane_funcs = { > + DRM_PLANE_NON_ATOMIC_FUNCS, > +}; > + > int shmob_drm_crtc_create(struct shmob_drm_device *sdev) > { > struct drm_crtc *crtc = &sdev->crtc.crtc; > + struct drm_plane *primary; > int ret; > > sdev->crtc.dpms = DRM_MODE_DPMS_OFF; > > - ret = drm_crtc_init(sdev->ddev, crtc, &crtc_funcs); > - if (ret < 0) > + primary = __drm_universal_plane_alloc(sdev->ddev, sizeof(*primary), 0, > + 0, &primary_plane_funcs, > + modeset_formats, > + ARRAY_SIZE(modeset_formats), > + NULL, DRM_PLANE_TYPE_PRIMARY, > + NULL); > + if (IS_ERR(primary)) > + return PTR_ERR(primary); This seems like a bit of a hack to me. Why don't you use the planes created by shmob_drm_plane_create() instead of allocating a new one ? 
> + > + ret = drm_crtc_init_with_planes(sdev->ddev, crtc, primary, NULL, > + &crtc_funcs, NULL); > + if (ret < 0) { > + drm_plane_cleanup(primary); > + kfree(primary); > return ret; > + } > > drm_crtc_helper_add(crtc, &crtc_helper_funcs); > -- Regards, Laurent Pinchart
Re: [PATCH] drm: bridge: ldb: add support for using channel 1 only
On 4/4/23 09:37, Luca Ceresoli wrote: [...] @@ -177,28 +183,25 @@ static void fsl_ldb_atomic_enable(struct drm_bridge *bridge, clk_prepare_enable(fsl_ldb->clk); /* Program LDB_CTRL */ - reg = LDB_CTRL_CH0_ENABLE; - - if (fsl_ldb->lvds_dual_link) - reg |= LDB_CTRL_CH1_ENABLE | LDB_CTRL_SPLIT_MODE; - - if (lvds_format_24bpp) { - reg |= LDB_CTRL_CH0_DATA_WIDTH; - if (fsl_ldb->lvds_dual_link) - reg |= LDB_CTRL_CH1_DATA_WIDTH; - } - - if (lvds_format_jeida) { - reg |= LDB_CTRL_CH0_BIT_MAPPING; - if (fsl_ldb->lvds_dual_link) - reg |= LDB_CTRL_CH1_BIT_MAPPING; - } - - if (mode->flags & DRM_MODE_FLAG_PVSYNC) { - reg |= LDB_CTRL_DI0_VSYNC_POLARITY; - if (fsl_ldb->lvds_dual_link) - reg |= LDB_CTRL_DI1_VSYNC_POLARITY; - } + reg = Cosmetic nit, do we need the newline here , can't we just move the first '(fsl_ldb->ch0_enabled ? LDB_CTRL_CH0_ENABLE : 0) |' on the same line as 'reg =' ? It might need a bit of indent with spaces, but that should be OK. + (fsl_ldb->ch0_enabled ? LDB_CTRL_CH0_ENABLE : 0) | + (fsl_ldb->ch1_enabled ? LDB_CTRL_CH1_ENABLE : 0) | + (fsl_ldb_is_dual(fsl_ldb) ? LDB_CTRL_SPLIT_MODE : 0); + + if (lvds_format_24bpp) + reg |= + (fsl_ldb->ch0_enabled ? LDB_CTRL_CH0_DATA_WIDTH : 0) | + (fsl_ldb->ch1_enabled ? LDB_CTRL_CH1_DATA_WIDTH : 0); + + if (lvds_format_jeida) + reg |= + (fsl_ldb->ch0_enabled ? LDB_CTRL_CH0_BIT_MAPPING : 0) | + (fsl_ldb->ch1_enabled ? LDB_CTRL_CH1_BIT_MAPPING : 0); + + if (mode->flags & DRM_MODE_FLAG_PVSYNC) + reg |= + (fsl_ldb->ch0_enabled ? LDB_CTRL_DI0_VSYNC_POLARITY : 0) | + (fsl_ldb->ch1_enabled ? LDB_CTRL_DI1_VSYNC_POLARITY : 0); regmap_write(fsl_ldb->regmap, fsl_ldb->devdata->ldb_ctrl, reg); [...] @@ -311,10 +314,23 @@ static int fsl_ldb_probe(struct platform_device *pdev) if (IS_ERR(fsl_ldb->regmap)) return PTR_ERR(fsl_ldb->regmap); - /* Locate the panel DT node. 
*/ - panel_node = of_graph_get_remote_node(dev->of_node, 1, 0); - if (!panel_node) - return -ENXIO; + /* Locate the remote ports and the panel node */ + remote1 = of_graph_get_remote_node(dev->of_node, 1, 0); + remote2 = of_graph_get_remote_node(dev->of_node, 2, 0); + fsl_ldb->ch0_enabled = (remote1 != NULL); + fsl_ldb->ch1_enabled = (remote2 != NULL); + panel_node = of_node_get(remote1 ? remote1 : remote2); You can even do this without the middle 'remote1' I think: panel_node = of_node_get(remote1 ? : remote2); + of_node_put(remote1); + of_node_put(remote2); + + if (!fsl_ldb->ch0_enabled && !fsl_ldb->ch1_enabled) { + of_node_put(panel_node); + return dev_err_probe(dev, -ENXIO, "No panel node found"); + } + + dev_dbg(dev, "Using %s\n", + fsl_ldb_is_dual(fsl_ldb) ? "dual mode" : I think this is called "dual-link mode" , maybe update the string . + fsl_ldb->ch0_enabled ? "channel 0" : "channel 1"); panel = of_drm_find_panel(panel_node); of_node_put(panel_node); @@ -325,20 +341,26 @@ static int fsl_ldb_probe(struct platform_device *pdev) if (IS_ERR(fsl_ldb->panel_bridge)) return PTR_ERR(fsl_ldb->panel_bridge); - /* Determine whether this is dual-link configuration */ - port1 = of_graph_get_port_by_id(dev->of_node, 1); - port2 = of_graph_get_port_by_id(dev->of_node, 2); - dual_link = drm_of_lvds_get_dual_link_pixel_order(port1, port2); - of_node_put(port1); - of_node_put(port2); - if (dual_link == DRM_LVDS_DUAL_LINK_EVEN_ODD_PIXELS) { - dev_err(dev, "LVDS channel pixel swap not supported.\n"); - return -EINVAL; - } + if (fsl_ldb_is_dual(fsl_ldb)) { + struct device_node *port1, *port2; + + port1 = of_graph_get_port_by_id(dev->of_node, 1); + port2 = of_graph_get_port_by_id(dev->of_node, 2); + dual_link = drm_of_lvds_get_dual_link_pixel_order(port1, port2); + of_node_put(port1); + of_node_put(port2); - if (dual_link == DRM_LVDS_DUAL_LINK_ODD_EVEN_PIXELS) - fsl_ldb->lvds_dual_link = true; + if (dual_link < 0) + return dev_err_probe(dev, dual_link, +"Error getting dual link 
configuration"); Does this need a trailing '\n' in the formatting string or not ? I think yes. The rest looks good, with the few details fixed: Reviewed-by: Marek Vasut
Re: [PATCH 2/5] drm: shmobile: Add support for DRM_FORMAT_XRGB8888
Hi Geert, Thank you for the patch. On Fri, Mar 31, 2023 at 04:48:08PM +0200, Geert Uytterhoeven wrote: > DRM_FORMAT_XRGB8888 aka XR24 is the modus francus of DRM, and should be > supported by all drivers. > > The handling for DRM_FORMAT_XRGB8888 is similar to DRM_FORMAT_ARGB8888, > just ignore the alpha channel. > > Signed-off-by: Geert Uytterhoeven Reviewed-by: Laurent Pinchart > --- > drivers/gpu/drm/shmobile/shmob_drm_crtc.c | 1 + > drivers/gpu/drm/shmobile/shmob_drm_kms.c | 5 + > drivers/gpu/drm/shmobile/shmob_drm_plane.c | 5 + > 3 files changed, 11 insertions(+) > > diff --git a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > index 713a7612244c647a..08dc1428aa16caf0 100644 > --- a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > +++ b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > @@ -232,6 +232,7 @@ static void shmob_drm_crtc_start(struct shmob_drm_crtc > *scrtc) > value = LDDDSR_LS | LDDDSR_WS | LDDDSR_BS; > break; > case DRM_FORMAT_ARGB8888: > + case DRM_FORMAT_XRGB8888: > default: > value = LDDDSR_LS; > break; > diff --git a/drivers/gpu/drm/shmobile/shmob_drm_kms.c > b/drivers/gpu/drm/shmobile/shmob_drm_kms.c > index 3c5fe3bc183c7c13..99381cc0abf3ae1f 100644 > --- a/drivers/gpu/drm/shmobile/shmob_drm_kms.c > +++ b/drivers/gpu/drm/shmobile/shmob_drm_kms.c > @@ -39,6 +39,11 @@ static const struct shmob_drm_format_info > shmob_drm_format_infos[] = { > .bpp = 32, > .yuv = false, > .lddfr = LDDFR_PKF_ARGB32, > + }, { > + .fourcc = DRM_FORMAT_XRGB8888, > + .bpp = 32, > + .yuv = false, > + .lddfr = LDDFR_PKF_ARGB32, > }, { > .fourcc = DRM_FORMAT_NV12, > .bpp = 12, > diff --git a/drivers/gpu/drm/shmobile/shmob_drm_plane.c > b/drivers/gpu/drm/shmobile/shmob_drm_plane.c > index 604ae23825daaafd..850986cee848226a 100644 > --- a/drivers/gpu/drm/shmobile/shmob_drm_plane.c > +++ b/drivers/gpu/drm/shmobile/shmob_drm_plane.c > @@ -80,6 +80,7 @@ static void __shmob_drm_plane_setup(struct shmob_drm_plane > *splane, > format |= LDBBSIFR_SWPL | LDBBSIFR_SWPW |
LDBBSIFR_SWPB; > break; > case DRM_FORMAT_ARGB8888: > + case DRM_FORMAT_XRGB8888: > default: > format |= LDBBSIFR_SWPL; > break; > @@ -95,6 +96,9 @@ static void __shmob_drm_plane_setup(struct shmob_drm_plane > *splane, > case DRM_FORMAT_ARGB8888: > format |= LDBBSIFR_AL_PK | LDBBSIFR_RY | LDDFR_PKF_ARGB32; > break; > + case DRM_FORMAT_XRGB8888: > + format |= LDBBSIFR_AL_1 | LDBBSIFR_RY | LDDFR_PKF_ARGB32; > + break; > case DRM_FORMAT_NV12: > case DRM_FORMAT_NV21: > format |= LDBBSIFR_AL_1 | LDBBSIFR_CHRR_420; > @@ -231,6 +235,7 @@ static const uint32_t formats[] = { > DRM_FORMAT_RGB565, > DRM_FORMAT_RGB888, > DRM_FORMAT_ARGB8888, > + DRM_FORMAT_XRGB8888, > DRM_FORMAT_NV12, > DRM_FORMAT_NV21, > DRM_FORMAT_NV16, -- Regards, Laurent Pinchart
Re: [PATCH 1/5] drm: shmobile: Use %p4cc to print fourcc codes
Hi Geert, Thank you for the patch. On Fri, Mar 31, 2023 at 04:48:07PM +0200, Geert Uytterhoeven wrote: > Replace the printing of hexadecimal fourcc format codes by > pretty-printed format names, using the "%p4cc" format specifier. I really like %p4cc :-) I makes things much neater. > Signed-off-by: Geert Uytterhoeven > --- > drivers/gpu/drm/shmobile/shmob_drm_crtc.c | 4 ++-- > drivers/gpu/drm/shmobile/shmob_drm_kms.c | 4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > index d354ab3077cecf94..713a7612244c647a 100644 > --- a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > +++ b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c > @@ -355,8 +355,8 @@ static int shmob_drm_crtc_mode_set(struct drm_crtc *crtc, > > format = shmob_drm_format_info(crtc->primary->fb->format->format); > if (format == NULL) { > - dev_dbg(sdev->dev, "mode_set: unsupported format %08x\n", > - crtc->primary->fb->format->format); > + dev_dbg(sdev->dev, "mode_set: unsupported format %p4cc\n", > + &crtc->primary->fb->format->format); > return -EINVAL; > } > > diff --git a/drivers/gpu/drm/shmobile/shmob_drm_kms.c > b/drivers/gpu/drm/shmobile/shmob_drm_kms.c > index 60a2c8d8a0d947d2..3c5fe3bc183c7c13 100644 > --- a/drivers/gpu/drm/shmobile/shmob_drm_kms.c > +++ b/drivers/gpu/drm/shmobile/shmob_drm_kms.c > @@ -96,8 +96,8 @@ shmob_drm_fb_create(struct drm_device *dev, struct drm_file > *file_priv, > > format = shmob_drm_format_info(mode_cmd->pixel_format); > if (format == NULL) { > - dev_dbg(dev->dev, "unsupported pixel format %08x\n", > - mode_cmd->pixel_format); > + dev_dbg(dev->dev, "unsupported pixel format %p4cc\n", > + &mode_cmd->pixel_format); > return ERR_PTR(-EINVAL); > } > -- Regards, Laurent Pinchart
Re: [RESEND PATCH v4 03/21] staging: media: tegra-video: fix .vidioc_enum_fmt_vid_cap to return all formats
Hi Luca, On Tue, Apr 04, 2023 at 04:12:51PM +0200, Luca Ceresoli wrote: > On Wed, 29 Mar 2023 13:16:22 +0200 Hans Verkuil wrote: > > > Hi Luca, > > > > I finally found the time to test this series. It looks OK, except for this > > patch. > > Thank you very much for taking the time! > > > The list of supported formats really has to be the intersection of what the > > tegra > > supports and what the sensor supports. > > > > Otherwise you would advertise pixelformats that cannot be used, and the > > application > > would have no way of knowing that. > > As far as I understand, I think we should rather make this driver fully > behave as an MC-centric device. It is already using MC quite > successfully after all. > > Do you think this is correct? Given the use cases for this driver, I agree. > If you do, then I think the plan would be: > > - Add the V4L2_CAP_IO_MC flag > - Set the mbus_code in get_format appropriately > - Leave the changes in this patch unmodified otherwise -- Regards, Laurent Pinchart
Re: [PATCH v1 2/3] msm/disp/dpu: allow atomic_check in PSR usecase
Hi, On Fri, Mar 31, 2023 at 6:59 AM Vinod Polimera wrote: > > Certain flags like dirty_fb will be updated into the plane state > during crtc atomic_check. Allow those updates during PSR commit. > > Reported-by: Bjorn Andersson > Link: https://lore.kernel.org/all/20230326162723.3lo6pnsfdwzsvbhj@ripper/ > Signed-off-by: Vinod Polimera > --- > drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) I can confirm that this patch plus patch #1 fixes the typing problems seen on VT2 on a Chromebook using PSR. Tested-by: Douglas Anderson
Re: [PATCH v1 1/3] drm/msm/dpu: set dirty_fb flag while in self refresh mode
Hi, On Fri, Mar 31, 2023 at 6:59 AM Vinod Polimera wrote: > > While in virtual terminal mode with PSR enabled, there will be > no atomic commits triggered without dirty_fb being set. This > will create a notion of no screen update. Allow atomic commit > when dirty_fb ioctl is issued, so that it can trigger a PSR exit > and shows update on the screen. > > Reported-by: Bjorn Andersson > Link: https://lore.kernel.org/all/20230326162723.3lo6pnsfdwzsvbhj@ripper/ > Signed-off-by: Vinod Polimera > --- > drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c | 3 +++ > 1 file changed, 3 insertions(+) I can confirm that this patch plus patch #2 fixes the typing problems seen on VT2 on a Chromebook using PSR. Tested-by: Douglas Anderson -Doug
Re: [PATCH] drm/msm/mdp5: set varaiable msm8x76_config storage-class-specifier to static
On Tue, 04 Apr 2023 14:53:29 -0400, Tom Rix wrote: > smatch reports > drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c:658:26: warning: symbol > 'msm8x76_config' was not declared. Should it be static? > > This variable is only used in one file so should be static. > > > [...] Applied, thanks! [1/1] drm/msm/mdp5: set varaiable msm8x76_config storage-class-specifier to static https://gitlab.freedesktop.org/lumag/msm/-/commit/d2f762745162 Best regards, -- Dmitry Baryshkov
Re: [PATCH v2 0/8] drm/msm: Convert fbdev to DRM client
On Mon, 03 Apr 2023 14:45:30 +0200, Thomas Zimmermann wrote: > Convert msm' fbdev code to struct drm_client. Replaces the current > ad-hoc integration. The conversion includes a number of cleanups. As > with most other drivers' fbdev emulation, fbdev in msm is now just > another DRM client that runs after the DRM device has been registered. > > Once all drivers' fbdev emulation has been converted to struct drm_client, > we can attempt to add additional in-kernel clients. A DRM-based dmesg > log or a bootsplash are commonly mentioned. DRM can then switch easily > among the existing clients if/when required. > > [...] Applied, thanks! [1/8] drm/msm: Include https://gitlab.freedesktop.org/lumag/msm/-/commit/62c58ffe011d [2/8] drm/msm: Clear aperture ownership outside of fbdev code https://gitlab.freedesktop.org/lumag/msm/-/commit/f4de16da5b40 [3/8] drm/msm: Remove fb from struct msm_fbdev https://gitlab.freedesktop.org/lumag/msm/-/commit/a5ddc0f1a7bc [4/8] drm/msm: Remove struct msm_fbdev https://gitlab.freedesktop.org/lumag/msm/-/commit/09cbdbafbe9f [5/8] drm/msm: Remove fbdev from struct msm_drm_private https://gitlab.freedesktop.org/lumag/msm/-/commit/37e8bad3ae5d [6/8] drm/msm: Move module parameter 'fbdev' to fbdev code https://gitlab.freedesktop.org/lumag/msm/-/commit/2fa4748b5ad8 [7/8] drm/msm: Initialize fbdev DRM client https://gitlab.freedesktop.org/lumag/msm/-/commit/7e563538d210 [8/8] drm/msm: Implement fbdev emulation as in-kernel client https://gitlab.freedesktop.org/lumag/msm/-/commit/5ba5b96d3327 Best regards, -- Dmitry Baryshkov
Re: [PATCH v4 0/4] Reserve DSPPs based on user request
On Mon, 13 Feb 2023 03:11:40 -0800, Kalyan Thota wrote: > This series will enable color features on sc7280 target which has > primary panel as eDP > > The series removes DSPP allocation based on encoder type and allows > the DSPP reservation based on user request via CTM. > > The series will release/reserve the dpu resources whenever there is > a CTM enable/disable change so that DSPPs are allocated appropriately. > > [...] Applied, thanks! [2/4] drm/msm/dpu: add DSPPs into reservation upon a CTM request https://gitlab.freedesktop.org/lumag/msm/-/commit/1a9c3512fbd4 [3/4] drm/msm/dpu: avoid unnecessary check in DPU reservations https://gitlab.freedesktop.org/lumag/msm/-/commit/8b1ed0088e21 [4/4] drm/msm/dpu: manage DPU resources if CTM is requested https://gitlab.freedesktop.org/lumag/msm/-/commit/34c74e76a6a5 Best regards, -- Dmitry Baryshkov
Re: [PATCH][next] drm/msm/mdss: Fix spelling mistake "Unuspported" -> "Unsupported"
On Wed, 29 Mar 2023 10:30:26 +0100, Colin Ian King wrote: > There is a spelling mistake in a dev_error message. Fix it. > > Applied, thanks! [1/1] drm/msm/mdss: Fix spelling mistake "Unuspported" -> "Unsupported" https://gitlab.freedesktop.org/lumag/msm/-/commit/44310e2964f2 Best regards, -- Dmitry Baryshkov
Re: [PATCH v4 31/42] drm/msm/dpu: enable DPU_CTL_SPLIT_DISPLAY for sc8280xp
On 4/4/2023 6:06 AM, Dmitry Baryshkov wrote: Theoretically since sm8150 we should be using a single CTL for the source split case, but since we do not support it for now, fallback to DPU_CTL_SPLIT_DISPLAY. Did you mean split panel case? Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h index d30b797e90e0..3dc850a681bb 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_8_0_sc8280xp.h @@ -42,17 +42,18 @@ static const struct dpu_mdp_cfg sc8280xp_mdp[] = { }, }; +/* FIXME: get rid of DPU_CTL_SPLIT_DISPLAY in favour of proper ACTIVE_CTL support */ static const struct dpu_ctl_cfg sc8280xp_ctl[] = { { .name = "ctl_0", .id = CTL_0, .base = 0x15000, .len = 0x204, - .features = CTL_SC7280_MASK, + .features = BIT(DPU_CTL_SPLIT_DISPLAY) | CTL_SC7280_MASK, .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 9), }, { .name = "ctl_1", .id = CTL_1, .base = 0x16000, .len = 0x204, - .features = CTL_SC7280_MASK, + .features = BIT(DPU_CTL_SPLIT_DISPLAY) | CTL_SC7280_MASK, .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 10), }, {
Re: [PATCH v4 30/42] drm/msm/dpu: catalog: add comments regarding DPU_CTL_SPLIT_DISPLAY
On 4/4/2023 6:06 AM, Dmitry Baryshkov wrote: For sm8150+ the DPU_CTL_SPLIT_DISPLAY should be replaced with DPU_CTL_ACTIVE_CFG support (which supports having a single CTL for both interfaces in a split). Add comments where this conversion is required. Signed-off-by: Dmitry Baryshkov Reviewed-by: Abhinav Kumar
Re: [PATCH v4 01/42] drm/msm/dpu: use CTL_SC7280_MASK for sm8450's ctl_0
On 05/04/2023 04:00, Abhinav Kumar wrote: On 4/4/2023 5:43 PM, Dmitry Baryshkov wrote: On 05/04/2023 03:39, Abhinav Kumar wrote: On 4/4/2023 5:33 PM, Dmitry Baryshkov wrote: On 05/04/2023 01:12, Abhinav Kumar wrote: On 4/4/2023 6:05 AM, Dmitry Baryshkov wrote: On sm8450 platform the CTL_0 doesn't differ from the rest of CTL blocks, so switch it to CTL_SC7280_MASK too. Some background: original commit 100d7ef6995d ("drm/msm/dpu: add support for SM8450") had all (relevant at that time) bits spelled individually. Then commit 0e91bcbb0016 ("drm/msm/dpu: Add SM8350 to hw catalog"), despite being a mismerge, correctly changed all other CTL entries to use CTL_SC7280_MASK, except CTL_0. I think having it spelled individually is better. If we start using one chipset's mask for another, we are again going down the same path of this becoming one confused file. So, even though I agree that 0e91bcbb0016 ("drm/msm/dpu: Add SM8350 to hw catalog") corrected the mask to re-use sc7280, with the individual catalog file, it's better to have it separate and spelled individually. This change is not heading in the direction of the rest of the series. I didn't create duplicates of all the defines. This is done well in the style of patch 37. I'm not going to add all per-SoC feature masks. Yes, I was actually going to comment even on patch 37. We are again trying to generalize a CTL's caps based on DPU version, the same mistake which led us down this path. So today you have CTL_DPU_0_MASK, CTL_DPU_5_MASK, CTL_DPU_7_MASK and CTL_DPU_9_MASK and this builds on an assumption that you can get 5 by ORing ACTIVE_CFG with 0. +#define CTL_DPU_5_MASK (CTL_DPU_0_MASK | \ + BIT(DPU_CTL_ACTIVE_CFG)) + This is again moving towards that problematic pattern. Why don't we stick to spelling CTL features individually, then work towards generalizing as we discussed. Because adding a feature would become a nightmare of touching all the platforms?
On the contrary, this would help us to enable the feature where it was verified, and not assume that because it works on one chipset it will work on another. We have been discussing this while I was working on wide planes. I agreed there, because it was a topic which could have a lot of impact. In the same way, the virtual planes even have a modparam knob to limit the impact. However, I still have the understanding that this is not how things should work. This is not how we are doing development/cleanups in other areas. The usual practice is to perform the change, verify it as much as possible, and collect the fallouts. Doing it the other way around would mostly stop the development. There is another point of view here which we have already seen. If we go with generalizing, then when we find one case which is different, we end up decoupling the generalization, and that's more painful and led us to the rework in the first place. No, squashing everything together led us to the rework. I'd prefer to see the whole picture before reworking this part. I'm not going to do everything at once. So far the masks have been working within the boundaries of generations. The only one which didn't is the IRQ mask. It gets inlined. If you want it the other way: currently we have three defines which do not fall into the DPUn standard. I'd prefer to get more data before making a decision there. We discussed not merging on major+LM. Glad, I agreed there. But I don't think that we should remove existing defines without good reason. We know that they work in the majority of cases. Of course it will work today, because we have covered only supported chipsets, but it's just the start of a design which we know led us to this rework. Again, getting samples led us to the understanding and then the rework. Doing rework without understanding the issues is like premature optimization. 
Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c index 6840b22a4159..83f8f83e2b29 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c @@ -975,7 +975,7 @@ static const struct dpu_ctl_cfg sm8450_ctl[] = { { .name = "ctl_0", .id = CTL_0, .base = 0x15000, .len = 0x204, - .features = BIT(DPU_CTL_ACTIVE_CFG) | BIT(DPU_CTL_SPLIT_DISPLAY) | BIT(DPU_CTL_FETCH_ACTIVE), + .features = BIT(DPU_CTL_SPLIT_DISPLAY) | CTL_SC7280_MASK, .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 9), }, { -- With best wishes Dmitry
Re: [PATCH v4 28/42] drm/msm/dpu: expand sm8550 catalog
On 4/4/2023 6:06 AM, Dmitry Baryshkov wrote: Duplicate sm8450 catalog entries to sm8550 to remove dependencies between DPU instances. Signed-off-by: Dmitry Baryshkov --- .../msm/disp/dpu1/catalog/dpu_9_0_sm8550.h| 32 ++- 1 file changed, 31 insertions(+), 1 deletion(-) Reviewed-by: Abhinav Kumar
Re: [PATCH v4 27/42] drm/msm/dpu: expand sm6115 catalog
On 4/4/2023 6:06 AM, Dmitry Baryshkov wrote: Duplicate qcm2290 catalog entries to sm6115 to remove dependencies between DPU instances. Signed-off-by: Dmitry Baryshkov --- Reviewed-by: Abhinav Kumar
Re: [PATCH v4 03/42] drm/msm/dpu: Allow variable INTF_BLK size
On 4/4/2023 5:37 PM, Dmitry Baryshkov wrote: On 05/04/2023 01:30, Abhinav Kumar wrote: On 4/4/2023 6:05 AM, Dmitry Baryshkov wrote: From: Konrad Dybcio These blocks are of variable length on different SoCs. Set the correct values where I was able to retrieve it from downstream DTs and leave the old defaults (0x280) otherwise. Signed-off-by: Konrad Dybcio [DB: fixed some lengths, split the INTF changes away] Signed-off-by: Dmitry Baryshkov Everything is fine except sm8250.

DPU | SoC      | INTF_DSI size
5.0 | sm8150   | 0x2bc
5.1 | sc8180x  | 0x2bc
6.0 | sm8250   | 0x2c0
6.2 | sc7180   | 0x2c0
6.3 | sm6115   | 0x2c0
6.5 | qcm2290  | 0x2c0
7.0 | sm8350   | 0x2c4
7.2 | sc7280   | 0x2c4
8.0 | sc8280xp | 0x300
8.1 | sm8450   | 0x300
9.0 | sm8550   | 0x300

Today sm8250 is using the same table as sm8150 but it needs 0x2c0 and not 0x2bc. We should de-duplicate it and add a new one for sm8250? This is done in patch 22. It makes no sense to play with the data until we are clear, which platform uses which instance. Ack, that one looks fine and since this one is just preserving what was already present, this change LGTM, hence Reviewed-by: Abhinav Kumar
Re: [PATCH][next] drm/msm/mdss: Fix spelling mistake "Unuspported" -> "Unsupported"
On 29/03/2023 12:30, Colin Ian King wrote: There is a spelling mistake in a dev_error message. Fix it. Signed-off-by: Colin Ian King --- drivers/gpu/drm/msm/msm_mdss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH 0/3] Revert lima fdinfo patchset
Applied to drm-misc-next, sorry for the trouble. Regards, Qiang On Wed, Apr 5, 2023 at 3:28 AM Daniel Vetter wrote: > > On Tue, Apr 04, 2023 at 04:17:33PM +0100, Emil Velikov wrote: > > On Tue, 4 Apr 2023 at 08:13, wrote: > > > > > > From: Qiang Yu > > > > > > Upstream has reverted the dependent commit > > > df622729ddbf ("drm/scheduler: track GPU active time per entity"), > > > but this patchset has been pushed to drm-misc-next which still > > > has that commit. To avoid build failures on other branches after merging > > > drm-misc-next, just revert the lima patchset on drm-misc-next too. > > > > > > > The bug/revert is unfortunate, although we better keep the build clean > > or Linus will go bananas ^_^ > > > > Fwiw for the series: > > Acked-by: Emil Velikov > > Can you (either of you really) please push asap and make sure this doesn't > miss the last drm-misc-next pull (-rc6 is this w/e)? Otherwise we'll have > a bit of a mess. > -Daniel > > > > > HTH > > -Emil > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
[PATCH v4 5/6] drm/msm/dsi: update hdisplay calculation for dsi_timing_setup
hdisplay for compressed images should be calculated as bytes_per_slice * slice_count. Thus, use MSM DSC helper to calculate hdisplay for dsi_timing_setup instead of directly using mode->hdisplay. Changes in v3: - Split from previous patch - Initialized hdisplay as uncompressed pclk per line at the beginning of dsi_timing_setup as to not break dual DSI calculations Changes in v4: - Moved pclk_per_intf calculations to DSC hdisplay adjustments Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 6a6218a9655f..412339cc9301 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -958,7 +958,7 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi) * pulse width same */ h_total -= hdisplay; - hdisplay /= 3; + hdisplay = msm_dsc_get_pclk_per_intf(msm_host->dsc) / 3; h_total += hdisplay; ha_end = ha_start + hdisplay; } -- 2.40.0
[PATCH v4 0/6] Introduce MSM-specific DSC helpers
There is some overlap in the calculations of MSM-specific DSC variables between DP and DSI. In addition, the calculations for initial_scale_value and det_thresh_flatness are defined within the DSC 1.2 specification but aren't yet included in drm_dsc_helper.c. This series moves these calculations to a shared msm_dsc_helper.c file and defines drm_dsc_helper methods for initial_scale_value and det_thresh_flatness. Note: For now, the MSM specific helper methods are only called for the DSI path, but will be called for DP once DSC 1.2 support for DP has been added. Depends on: "drm/i915: move DSC RC tables to drm_dsc_helper.c" [1] [1] https://patchwork.freedesktop.org/series/114472/ --- Changes in v4: - Changed msm_dsc_get_uncompressed_pclk_per_intf to msm_dsc_get_pclk_per_intf - Moved pclk_per_intf calculation for dsi_timing_setup to `if (msm_host->dsc)` block - Link to v3: https://lore.kernel.org/r/20230329-rfc-msm-dsc-helper-v3-0-6bec0d277...@quicinc.com Changes in v3: - Cleaned up unused parameters - Reworded some calculations for clarity - Changed get_bytes_per_soft_slice() to a public method - Added comment documentation to MSM DSC helpers - Changed msm_dsc_get_eol_byte_num() to *_get_bytes_per_intf() - Split dsi_timing_setup() hdisplay calculation to a separate patch - Dropped 78c8b81d57d8 ("drm/display/dsc: Add flatness and initial scale value calculations") patch as it was absorbed in Dmitry's DSC series [1] - Link to v2: https://lore.kernel.org/r/20230329-rfc-msm-dsc-helper-v2-0-3c13ced53...@quicinc.com Changes in v2: - Changed det_thresh_flatness to flatness_det_thresh - Moved msm_dsc_helper files to msm/ directory - Fixed type mismatch issues in MSM DSC helpers - Dropped MSM_DSC_SLICE_PER_PKT macro - Removed get_comp_ratio() helper - Style changes to improve readability - Use drm_dsc_get_bpp_int() instead of DSC_BPP macro - Picked up Fixes tags for patches 3/5 and 4/5 - Picked up Reviewed-by for patch 4/5 - Split eol_byte_num and pkt_per_line calculation 
into a separate patch - Moved pclk_per_line calculation into `if (dsc)` block in dsi_timing_setup() - Link to v1: https://lore.kernel.org/r/20230329-rfc-msm-dsc-helper-v1-0-f3e479f59...@quicinc.com --- Jessica Zhang (6): drm/msm: Add MSM-specific DSC helper methods drm/msm/dpu: Use DRM DSC helper for det_thresh_flatness drm/msm/dpu: Fix slice_last_group_size calculation drm/msm/dsi: Use MSM and DRM DSC helper methods drm/msm/dsi: update hdisplay calculation for dsi_timing_setup drm/msm/dsi: Fix calculations pkt_per_line drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 9 ++-- drivers/gpu/drm/msm/dsi/dsi_host.c | 21 ++--- drivers/gpu/drm/msm/msm_dsc_helper.c | 47 drivers/gpu/drm/msm/msm_dsc_helper.h | 70 ++ 5 files changed, 140 insertions(+), 8 deletions(-) --- base-commit: 56777fc93a145afcf71b92ba4281250f59ba6d9b change-id: 20230329-rfc-msm-dsc-helper-981a95edfbd0 Best regards, -- Jessica Zhang
[PATCH v4 2/6] drm/msm/dpu: Use DRM DSC helper for det_thresh_flatness
Use the DRM DSC helper for det_thresh_flatness to match downstream implementation and the DSC spec. Changes in V2: - Added a Fixes tag Fixes: c110cfd1753e ("drm/msm/disp/dpu1: Add support for DSC") Signed-off-by: Jessica Zhang Reviewed-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c index 619926da1441..b952f7d2b7f5 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c @@ -3,6 +3,8 @@ * Copyright (c) 2020-2022, Linaro Limited */ +#include + #include "dpu_kms.h" #include "dpu_hw_catalog.h" #include "dpu_hwio.h" @@ -102,7 +104,7 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc, data |= dsc->final_offset; DPU_REG_WRITE(c, DSC_DSC_OFFSET, data); - det_thresh_flatness = 7 + 2 * (dsc->bits_per_component - 8); + det_thresh_flatness = drm_dsc_calculate_flatness_det_thresh(dsc); data = det_thresh_flatness << 10; data |= dsc->flatness_max_qp << 5; data |= dsc->flatness_min_qp; -- 2.40.0
[PATCH v4 4/6] drm/msm/dsi: Use MSM and DRM DSC helper methods
Use MSM and DRM DSC helper methods to configure DSC for DSI. Changes in V2: - *_calculate_initial_scale_value --> *_set_initial_scale_value - Split pkt_per_line and eol_byte_num changes to a separate patch - Moved pclk_per_line calculation to hdisplay adjustment in `if (dsc)` block of dsi_update_dsc_timing() Changes in v3: - Split pclk_per_intf calculation into a separate patch - Added slice_width check to dsi_timing_setup - Used MSM DSC helper to calculate total_bytes_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/dsi/dsi_host.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 74d38f90398a..6a6218a9655f 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -28,6 +28,7 @@ #include "dsi.xml.h" #include "sfpb.xml.h" #include "dsi_cfg.h" +#include "msm_dsc_helper.h" #include "msm_kms.h" #include "msm_gem.h" #include "phy/dsi_phy.h" @@ -848,7 +849,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod /* first calculate dsc parameters and then program * compress mode registers */ - slice_per_intf = DIV_ROUND_UP(hdisplay, dsc->slice_width); + slice_per_intf = msm_dsc_get_slice_per_intf(dsc, hdisplay); /* * If slice_count is greater than slice_per_intf @@ -858,7 +859,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod if (dsc->slice_count > slice_per_intf) dsc->slice_count = 1; - total_bytes_per_intf = dsc->slice_chunk_size * slice_per_intf; + total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay); eol_byte_num = total_bytes_per_intf % 3; pkt_per_line = slice_per_intf / dsc->slice_count; @@ -936,6 +937,12 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi) return; } + if (!dsc->slice_width || (mode->hdisplay < dsc->slice_width)) { + pr_err("DSI: invalid slice width %d (pic_width: %d)\n", + dsc->slice_width, 
mode->hdisplay); + return; + } + dsc->pic_width = mode->hdisplay; dsc->pic_height = mode->vdisplay; DBG("Mode %dx%d\n", dsc->pic_width, dsc->pic_height); @@ -1759,7 +1766,7 @@ static int dsi_populate_dsc_params(struct msm_dsi_host *msm_host, struct drm_dsc return ret; } - dsc->initial_scale_value = 32; + drm_dsc_set_initial_scale_value(dsc); dsc->line_buf_depth = dsc->bits_per_component + 1; return drm_dsc_compute_rc_parameters(dsc); -- 2.40.0
[PATCH v4 6/6] drm/msm/dsi: Fix calculations pkt_per_line
Currently, pkt_per_line is calculated by dividing slice_per_intf by slice_count. This is incorrect, as slice_per_intf should be divided by slice_per_pkt, which is not always equivalent to slice_count as it is possible for there to be multiple soft slices per interface even though a panel only specifies one slice per packet. Fixes: 08802f515c3c ("drm/msm/dsi: Add support for DSC configuration") Signed-off-by: Jessica Zhang Reviewed-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/dsi/dsi_host.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 412339cc9301..633b60acfe18 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -862,7 +862,11 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay); eol_byte_num = total_bytes_per_intf % 3; - pkt_per_line = slice_per_intf / dsc->slice_count; + + /* Default to 1 slice_per_pkt, so pkt_per_line will be equal to +* slice per intf. +*/ + pkt_per_line = slice_per_intf; if (is_cmd_mode) /* packet data type */ reg = DSI_COMMAND_COMPRESSION_MODE_CTRL_STREAM0_DATATYPE(MIPI_DSI_DCS_LONG_WRITE); -- 2.40.0
[PATCH v4 3/6] drm/msm/dpu: Fix slice_last_group_size calculation
Correct the math for slice_last_group_size so that it matches the calculations downstream. Changes in v3: - Reworded slice_last_group_size calculation to `(dsc->slice_width + 2) % 3` Fixes: c110cfd1753e ("drm/msm/disp/dpu1: Add support for DSC") Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c index b952f7d2b7f5..ff1c8f92fb20 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c @@ -56,9 +56,10 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc, if (is_cmd_mode) initial_lines += 1; - slice_last_group_size = 3 - (dsc->slice_width % 3); + slice_last_group_size = (dsc->slice_width + 2) % 3; + data = (initial_lines << 20); - data |= ((slice_last_group_size - 1) << 18); + data |= (slice_last_group_size << 18); /* bpp is 6.4 format, 4 LSBs bits are for fractional part */ data |= (dsc->bits_per_pixel << 8); data |= (dsc->block_pred_enable << 7); -- 2.40.0
[PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods
Introduce MSM-specific DSC helper methods, as some calculations are common between DP and DSC. Changes in v2: - Moved files up to msm/ directory - Dropped get_comp_ratio() helper - Used drm_int2fixp() to convert to integers to fp - Style changes to improve readability - Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line() - Changed msm_dsc_get_slice_per_intf() to a static inline method - Dropped last division step of msm_dsc_get_pclk_per_line() and changed method name accordingly - Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method - Fixed some math issues caused by passing in incorrect types to drm_fixed methods in get_bytes_per_soft_slice() Changes in v3: - Dropped src_bpp parameter from all methods -- src_bpp can be calculated as dsc->bits_per_component * 3 - Dropped intf_width parameter from get_bytes_per_soft_slice() - Moved dsc->bits_per_component to numerator calculation in get_bytes_per_soft_slice() - Renamed msm_dsc_get_uncompressed_pclk_per_line to *_get_uncompressed_pclk_per_intf() - Removed dsc->slice_width check from msm_dsc_get_uncompressed_pclk_per_intf() - Made get_bytes_per_soft_slice() a public method (this will be called later to help calculate DP pclk params) - Added documentation in comments - Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf. 
Changes in v4: - Changed msm_dsc_get_uncompressed_pclk_per_intf to msm_dsc_get_pclk_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/msm_dsc_helper.c | 47 drivers/gpu/drm/msm/msm_dsc_helper.h | 70 3 files changed, 118 insertions(+) diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 7274c41228ed..b814fc80e2d5 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -94,6 +94,7 @@ msm-y += \ msm_atomic_tracepoints.o \ msm_debugfs.o \ msm_drv.o \ + msm_dsc_helper.o \ msm_fb.o \ msm_fence.o \ msm_gem.o \ diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c b/drivers/gpu/drm/msm/msm_dsc_helper.c new file mode 100644 index ..0539221eb09d --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#include +#include +#include + +#include "msm_drv.h" +#include "msm_dsc_helper.h" + +s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc) +{ + int bpp = msm_dsc_get_bpp_int(dsc); + s64 numerator_fp, denominator_fp; + s64 comp_ratio_fp = drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp); + + numerator_fp = drm_int2fixp(dsc->slice_width * 3 * dsc->bits_per_component); + denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8)); + + return drm_fixp_div(numerator_fp, denominator_fp); +} + +u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + u32 bytes_per_soft_slice, bytes_per_intf; + s64 bytes_per_soft_slice_fp; + int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width); + + bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc); + bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp); + + bytes_per_intf = bytes_per_soft_slice * slice_per_intf; + + return bytes_per_intf; +} + +int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc) +{ + s64 data_width; + + data_width = 
drm_fixp_mul(drm_int2fixp(dsc->slice_count), + get_bytes_per_soft_slice(dsc)); + + return drm_fixp2int_ceil(data_width); +} diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h b/drivers/gpu/drm/msm/msm_dsc_helper.h new file mode 100644 index ..31116a31090f --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#ifndef MSM_DSC_HELPER_H_ +#define MSM_DSC_HELPER_H_ + +#include +#include + +/* + * Helper methods for MSM specific DSC calculations that are common between timing engine, + * DSI, and DP. + */ + +/** + * msm_dsc_get_bpp_int - get bits per pixel integer value + * @dsc: Pointer to drm dsc config struct + */ +static inline int msm_dsc_get_bpp_int(struct drm_dsc_config *dsc) +{ + WARN_ON_ONCE(dsc->bits_per_pixel & 0xf); + return dsc->bits_per_pixel >> 4; +} + +/** + * msm_dsc_get_slice_per_intf - get number of slices per interface + * @dsc: Pointer to drm dsc config struct + * @intf_width: interface width + */ +static inline int msm_dsc_get_slice_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + return DIV_ROUND_UP(intf_width, dsc->slice_width); +} + +/** + * msm_dsc_get_dce_bytes_per_line - get bytes per line to help calculate dat
Re: [PATCH v4 02/42] drm/msm/dpu: Allow variable SSPP_BLK size
On 05/04/2023 01:19, Abhinav Kumar wrote: On 4/4/2023 6:05 AM, Dmitry Baryshkov wrote: From: Konrad Dybcio These blocks are of variable length on different SoCs. Set the correct values where I was able to retrieve it from downstream DTs and leave the old defaults (0x1c8) otherwise. Signed-off-by: Konrad Dybcio [DB: fixed some of lengths, split the INTF changes away] Signed-off-by: Dmitry Baryshkov Everything is fine except sm8450. As I already wrote in the previous version's review, it should be 0x32c for sm8450. Ack, please excuse me for missing you comment. I'll fix this in v5. -- With best wishes Dmitry
Re: [PATCH v1 5/7] Revert "drm: Assert held reservation lock for dma-buf mmapping"
On 4/3/23 18:17, Christian König wrote: > Am 02.04.23 um 18:48 schrieb Dmitry Osipenko: >> Don't assert held dma-buf reservation lock on memory mapping of exported >> buffer. >> >> We're going to change dma-buf mmap() locking policy such that exporters >> will have to handle the lock. The previous locking policy caused deadlock >> problem for DRM drivers in a case of self-imported dma-bufs, it's solved >> by moving the lock down to exporters. > > I only checked the TTM code path and think that at least that one should > work fine. > >> Fixes: 39ce25291871 ("drm: Assert held reservation lock for dma-buf >> mmapping") > > This here is not really a "fix" for the previous patch. We just found > that we didn't like the behavior and so reverted the original patch. > > A "Reverts..." comment in the commit message would be more appropriate I > think. Ack, will drop the fixes tag in v2. Thank you and Emil for the review. -- Best regards, Dmitry
Re: [PATCH v3 5/6] drm/msm/dsi: update hdisplay calculation for dsi_timing_setup
On 4/4/2023 4:56 PM, Jessica Zhang wrote:
> hdisplay for compressed images should be calculated as bytes_per_slice * slice_count. Thus, use MSM DSC helper to calculate hdisplay for dsi_timing_setup instead of directly using mode->hdisplay.
>
> Changes in v3:
> - Split from previous patch
> - Initialized hdisplay as uncompressed pclk per line at the beginning of dsi_timing_setup as to not break dual DSI calculations
>
> Signed-off-by: Jessica Zhang
> ---
> drivers/gpu/drm/msm/dsi/dsi_host.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 6a6218a9655f..9c33060e4c29 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -912,6 +912,9 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi) DBG(""); + if (msm_host->dsc) + hdisplay = msm_dsc_get_uncompressed_pclk_per_intf(msm_host->dsc); +

bonded-dsi with DSC is really not a tested configuration. If you move this line to before the if (is_bonded_dsi), then you will be dividing /2 on a compressed value. That won't make sense.

I would still move this back to the if (msm_host->dsc); that way, bonded_dsi first divides hdisplay/2 on an uncompressed mode->hdisplay, then DSC does its math on top of that.

> /*
>  * For bonded DSI mode, the current DRM mode has
>  * the complete width of the panel. Since, the complete
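The units issue behind this review comment can be sketched with illustrative numbers. This is only a model of the math (the panel and DSC parameters below are hypothetical), using the simplification that bytes per soft slice works out to slice_width * bpp / 8:

```python
# Hypothetical DSC config: 2 soft slices of 540 px, 8 bpp target rate.
slice_width, slice_count, bpp = 540, 2, 8

# What gets programmed as "hdisplay" once DSC is enabled: a byte count.
bytes_per_soft_slice = slice_width * bpp // 8      # 540 bytes per slice
compressed_width = slice_count * bytes_per_soft_slice
assert compressed_width == 1080                    # bytes, not pixels

# The reviewer's point: for bonded DSI, halve the *uncompressed*
# mode->hdisplay first, then let DSC compute on the halved pixel width;
# halving compressed_width instead would divide a byte count by 2.
```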
Re: [PATCH v3 1/6] drm/msm: Add MSM-specific DSC helper methods
On 4/4/2023 4:56 PM, Jessica Zhang wrote:
Introduce MSM-specific DSC helper methods, as some calculations are common between DP and DSI.

Changes in v2:
- Moved files up to msm/ directory
- Dropped get_comp_ratio() helper
- Used drm_int2fixp() to convert integers to fp
- Style changes to improve readability
- Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
- Changed msm_dsc_get_slice_per_intf() to a static inline method
- Dropped last division step of msm_dsc_get_pclk_per_line() and changed method name accordingly
- Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
- Fixed some math issues caused by passing in incorrect types to drm_fixed methods in get_bytes_per_soft_slice()

Changes in v3:
- Dropped src_bpp parameter from all methods -- src_bpp can be calculated as dsc->bits_per_component * 3
- Dropped intf_width parameter from get_bytes_per_soft_slice()
- Moved dsc->bits_per_component to numerator calculation in get_bytes_per_soft_slice()
- Renamed msm_dsc_get_uncompressed_pclk_per_line to *_get_uncompressed_pclk_per_intf()
- Removed dsc->slice_width check from msm_dsc_get_uncompressed_pclk_per_intf()
- Made get_bytes_per_soft_slice() a public method (this will be called later to help calculate DP pclk params)
- Added documentation in comments
- Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.
Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/msm_dsc_helper.c | 47 drivers/gpu/drm/msm/msm_dsc_helper.h | 70 3 files changed, 118 insertions(+) diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 7274c41228ed..b814fc80e2d5 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -94,6 +94,7 @@ msm-y += \ msm_atomic_tracepoints.o \ msm_debugfs.o \ msm_drv.o \ + msm_dsc_helper.o \ msm_fb.o \ msm_fence.o \ msm_gem.o \ diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c b/drivers/gpu/drm/msm/msm_dsc_helper.c new file mode 100644 index ..c8c530211f50 --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#include +#include +#include + +#include "msm_drv.h" +#include "msm_dsc_helper.h" + +s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc) +{ + int bpp = msm_dsc_get_bpp_int(dsc); + s64 numerator_fp, denominator_fp; + s64 comp_ratio_fp = drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp); + + numerator_fp = drm_int2fixp(dsc->slice_width * 3 * dsc->bits_per_component); + denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8)); + + return drm_fixp_div(numerator_fp, denominator_fp); +} + +u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + u32 bytes_per_soft_slice, bytes_per_intf; + s64 bytes_per_soft_slice_fp; + int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width); + + bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc); + bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp); + + bytes_per_intf = bytes_per_soft_slice * slice_per_intf; + + return bytes_per_intf; +} + +int msm_dsc_get_uncompressed_pclk_per_intf(struct drm_dsc_config *dsc) +{ + s64 data_width; + + data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count), + get_bytes_per_soft_slice(dsc)); + + return 
drm_fixp2int_ceil(data_width); +} This is not really uncompressed pclk_per_intf. This is still calculated using the compression ratio so it is still compressed. I would suggest changing this to msm_dsc_get_pclk_per_intf(). Lets keep it simple. diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h b/drivers/gpu/drm/msm/msm_dsc_helper.h new file mode 100644 index ..5ee972eb247c --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#ifndef MSM_DSC_HELPER_H_ +#define MSM_DSC_HELPER_H_ + +#include +#include + +/* + * Helper methods for MSM specific DSC calculations that are common between timing engine, + * DSI, and DP. + */ + +/** + * msm_dsc_get_bpp_int - get bits per pixel integer value + * @dsc: Pointer to drm dsc config struct + */ +static inline int msm_dsc_get_bpp_int(struct drm_dsc_config *dsc) +{ + WARN_ON_ONCE(dsc->bits_per_pixel & 0xf); + return dsc->bits_per_pixel >> 4; +} + +/** + * msm_dsc_get_slice_per_intf - get number of slices per interface + * @dsc: Pointer to drm dsc config struct + * @intf_width: interface width + */ +static inline int msm_dsc_get_slice_per_intf(
Re: [RFC PATCH 0/4] uapi, drm: Add and implement RLIMIT_GPUPRIO
Hi!

On 2023-04-04 04:50, Christian König wrote:
> Adding a bunch of people who have been involved in this before.
>
> Am 03.04.23 um 22:15 schrieb Joshua Ashton:
>> On 4/3/23 20:54, Christian König wrote:
>>> Am 03.04.23 um 21:40 schrieb Joshua Ashton:
[SNIP]
 Anyway, please let me know what you think! Definitely open to any feedback and advice you may have. :D
>>>
>>> Well the basic problem is that higher priority queues can be used to
>>> starve low priority queues.
>>>
>>> This starvation in turn is very very bad for memory management since
>>> the dma_fence's the GPU scheduler deals with have very strong
>>> restrictions.
>>>
>>> Even exposing this under CAP_SYS_NICE is questionable, so we will
>>> most likely have to NAK this.
>>
>> This is already exposed with CAP_SYS_NICE and is relied on by SteamVR
>> for async reprojection and Gamescope's composite path on Steam Deck.
>
> Yeah, I know I was the one who designed that :)
>
>>
>> Having a high priority async compute queue is really really important
>> and advantageous for these tasks.
>>
>> The majority of usecases for something like this is going to be a
>> compositor which does some really tiny amount of work per-frame but is
>> incredibly latency dependent (as it depends on latching onto buffers
>> just before vblank to do its work)

There seems to be a dependency here. Is it possible to express this dependency so that this work is done on vblank, then whoever needs this, can latch onto vblank and get scheduled and completed before the vblank?

The problem generally is "We need to do some work B in order to satisfy some condition in work A. Let's raise the 'priority' of work B so that if A needs it, when it needs it, it is ready." Or something to that effect.

The system would be much more responsive and run optimally, if such dependencies are expressed directly, as opposed to trying to game the scheduler and add more and more priorities, one on top of the other, every so often.
It's okay to have priorities when tasks are independent and unrelated. But when they do depend on each other directly, or indirectly (as in when memory allocation or freeing is concerned), thus creating priority inversion, then the best scheduler is the fair, oldest-ready-first scheduling, which is the default GPU scheduler in DRM at the moment (for the last few months). >> Starving and surpassing work on other queues is kind of the entire >> point. Gamescope and SteamVR do it on ACE as well so GFX work can run >> alongside it. Are there no dependencies between them? I mean if they're independent, we already have run queues with different priorities. But if they're dependent, perhaps we can express this explicitly so that we don't starve other tasks/queues... Regards, Luben > > Yes, unfortunately exactly that. > > The problem is that our memory management is designed around the idea > that submissions to the hardware are guaranteed to finish at some point > in the future. > > When we now have a functionality which allows to extend the amount of > time some work needs to finish on the hardware infinitely, then we have > a major problem at hand. > > What we could do is to make the GPU scheduler more clever and make sure > that while higher priority submissions get precedence and can even > preempt low priority submissions we still guarantee some forward > progress for everybody. > > Luben has been looking into a similar problem AMD internally as well, > maybe he has some idea here but I doubt that the solution will be simple. > > Regards, > Christian. > >> >> - Joshie 🐸✨ >> >
[PATCH v3 5/6] drm/msm/dsi: update hdisplay calculation for dsi_timing_setup
hdisplay for compressed images should be calculated as bytes_per_slice * slice_count. Thus, use MSM DSC helper to calculate hdisplay for dsi_timing_setup instead of directly using mode->hdisplay. Changes in v3: - Split from previous patch - Initialized hdisplay as uncompressed pclk per line at the beginning of dsi_timing_setup as to not break dual DSI calculations Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/dsi/dsi_host.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 6a6218a9655f..9c33060e4c29 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -912,6 +912,9 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi) DBG(""); + if (msm_host->dsc) + hdisplay = msm_dsc_get_uncompressed_pclk_per_intf(msm_host->dsc); + /* * For bonded DSI mode, the current DRM mode has * the complete width of the panel. Since, the complete -- 2.40.0
[PATCH v3 6/6] drm/msm/dsi: Fix calculations pkt_per_line
Currently, pkt_per_line is calculated by dividing slice_per_intf by slice_count. This is incorrect, as slice_per_intf should be divided by slice_per_pkt, which is not always equivalent to slice_count as it is possible for there to be multiple soft slices per interface even though a panel only specifies one slice per packet. Fixes: 08802f515c3c ("drm/msm/dsi: Add support for DSC configuration") Signed-off-by: Jessica Zhang Reviewed-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/dsi/dsi_host.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 9c33060e4c29..d888978926da 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -862,7 +862,11 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay); eol_byte_num = total_bytes_per_intf % 3; - pkt_per_line = slice_per_intf / dsc->slice_count; + + /* Default to 1 slice_per_pkt, so pkt_per_line will be equal to +* slice per intf. +*/ + pkt_per_line = slice_per_intf; if (is_cmd_mode) /* packet data type */ reg = DSI_COMMAND_COMPRESSION_MODE_CTRL_STREAM0_DATATYPE(MIPI_DSI_DCS_LONG_WRITE); -- 2.40.0
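The fix can be checked numerically. The interface width and slice layout below are hypothetical, and DIV_ROUND_UP is modeled with negative floor division:

```python
# Hypothetical setup: 1080 px interface, 540 px slices; the panel
# advertises 2 slices total but only 1 slice per packet.
intf_width, slice_width = 1080, 540
slice_count, slice_per_pkt = 2, 1

slice_per_intf = -(-intf_width // slice_width)     # DIV_ROUND_UP -> 2

old_pkt_per_line = slice_per_intf // slice_count   # wrong: divides by
                                                   # slice_count
new_pkt_per_line = slice_per_intf // slice_per_pkt # fixed: slice_per_pkt
assert (old_pkt_per_line, new_pkt_per_line) == (1, 2)
```

With slice_per_pkt defaulted to 1 as in the patch, pkt_per_line is simply slice_per_intf.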
[PATCH v3 3/6] drm/msm/dpu: Fix slice_last_group_size calculation
Correct the math for slice_last_group_size so that it matches the calculations downstream. Changes in v3: - Reworded slice_last_group_size calculation to `(dsc->slice_width + 2) % 3` Fixes: c110cfd1753e ("drm/msm/disp/dpu1: Add support for DSC") Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c index b952f7d2b7f5..ff1c8f92fb20 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c @@ -56,9 +56,10 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc, if (is_cmd_mode) initial_lines += 1; - slice_last_group_size = 3 - (dsc->slice_width % 3); + slice_last_group_size = (dsc->slice_width + 2) % 3; + data = (initial_lines << 20); - data |= ((slice_last_group_size - 1) << 18); + data |= (slice_last_group_size << 18); /* bpp is 6.4 format, 4 LSBs bits are for fractional part */ data |= (dsc->bits_per_pixel << 8); data |= (dsc->block_pred_enable << 7); -- 2.40.0
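The before/after values written to the register field can be compared directly (Python sketch; the slice widths are arbitrary examples). The old code computed 3 - (slice_width % 3) and then wrote that value minus one; the fix writes (slice_width + 2) % 3 directly:

```python
def old_field(slice_width):
    # original: slice_last_group_size = 3 - (slice_width % 3),
    # then (slice_last_group_size - 1) was written to bits 19:18
    return (3 - (slice_width % 3)) - 1

def new_field(slice_width):
    # fixed: (slice_width + 2) % 3 is written directly
    return (slice_width + 2) % 3

results = [(w % 3, old_field(w), new_field(w)) for w in (540, 541, 542)]
# remainder 0 agrees (both write 2); remainders 1 and 2 were swapped
assert results == [(0, 2, 2), (1, 1, 0), (2, 0, 1)]
```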
[PATCH v3 1/6] drm/msm: Add MSM-specific DSC helper methods
Introduce MSM-specific DSC helper methods, as some calculations are common between DP and DSI.

Changes in v2:
- Moved files up to msm/ directory
- Dropped get_comp_ratio() helper
- Used drm_int2fixp() to convert integers to fp
- Style changes to improve readability
- Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
- Changed msm_dsc_get_slice_per_intf() to a static inline method
- Dropped last division step of msm_dsc_get_pclk_per_line() and changed method name accordingly
- Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
- Fixed some math issues caused by passing in incorrect types to drm_fixed methods in get_bytes_per_soft_slice()

Changes in v3:
- Dropped src_bpp parameter from all methods -- src_bpp can be calculated as dsc->bits_per_component * 3
- Dropped intf_width parameter from get_bytes_per_soft_slice()
- Moved dsc->bits_per_component to numerator calculation in get_bytes_per_soft_slice()
- Renamed msm_dsc_get_uncompressed_pclk_per_line to *_get_uncompressed_pclk_per_intf()
- Removed dsc->slice_width check from msm_dsc_get_uncompressed_pclk_per_intf()
- Made get_bytes_per_soft_slice() a public method (this will be called later to help calculate DP pclk params)
- Added documentation in comments
- Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.
Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/msm_dsc_helper.c | 47 drivers/gpu/drm/msm/msm_dsc_helper.h | 70 3 files changed, 118 insertions(+) diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 7274c41228ed..b814fc80e2d5 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -94,6 +94,7 @@ msm-y += \ msm_atomic_tracepoints.o \ msm_debugfs.o \ msm_drv.o \ + msm_dsc_helper.o \ msm_fb.o \ msm_fence.o \ msm_gem.o \ diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c b/drivers/gpu/drm/msm/msm_dsc_helper.c new file mode 100644 index ..c8c530211f50 --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#include +#include +#include + +#include "msm_drv.h" +#include "msm_dsc_helper.h" + +s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc) +{ + int bpp = msm_dsc_get_bpp_int(dsc); + s64 numerator_fp, denominator_fp; + s64 comp_ratio_fp = drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp); + + numerator_fp = drm_int2fixp(dsc->slice_width * 3 * dsc->bits_per_component); + denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8)); + + return drm_fixp_div(numerator_fp, denominator_fp); +} + +u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + u32 bytes_per_soft_slice, bytes_per_intf; + s64 bytes_per_soft_slice_fp; + int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width); + + bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc); + bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp); + + bytes_per_intf = bytes_per_soft_slice * slice_per_intf; + + return bytes_per_intf; +} + +int msm_dsc_get_uncompressed_pclk_per_intf(struct drm_dsc_config *dsc) +{ + s64 data_width; + + data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count), + get_bytes_per_soft_slice(dsc)); + + return 
drm_fixp2int_ceil(data_width); +} diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h b/drivers/gpu/drm/msm/msm_dsc_helper.h new file mode 100644 index ..5ee972eb247c --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#ifndef MSM_DSC_HELPER_H_ +#define MSM_DSC_HELPER_H_ + +#include +#include + +/* + * Helper methods for MSM specific DSC calculations that are common between timing engine, + * DSI, and DP. + */ + +/** + * msm_dsc_get_bpp_int - get bits per pixel integer value + * @dsc: Pointer to drm dsc config struct + */ +static inline int msm_dsc_get_bpp_int(struct drm_dsc_config *dsc) +{ + WARN_ON_ONCE(dsc->bits_per_pixel & 0xf); + return dsc->bits_per_pixel >> 4; +} + +/** + * msm_dsc_get_slice_per_intf - get number of slices per interface + * @dsc: Pointer to drm dsc config struct + * @intf_width: interface width + */ +static inline int msm_dsc_get_slice_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + return DIV_ROUND_UP(intf_width, dsc->slice_width); +} + +/** + * msm_dsc_get_dce_bytes_per_line - get bytes per line to help calculate data width + * when configuring the timing engine + * @dsc: Pointer to drm dsc co
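As a sanity check on the fixed-point math in get_bytes_per_soft_slice(): with comp_ratio = (bpc * 3) / bpp, the expression (slice_width * 3 * bpc) / (comp_ratio * 8) algebraically reduces to slice_width * bpp / 8, independent of bits_per_component. A sketch with exact rationals (illustrative DSC parameters, not kernel code):

```python
from fractions import Fraction

def bytes_per_soft_slice(slice_width, bpc, bpp):
    # mirrors the helper's arithmetic using exact rationals
    comp_ratio = Fraction(bpc * 3, bpp)          # compression ratio
    numerator = Fraction(slice_width * 3 * bpc)
    return numerator / (comp_ratio * 8)

# the bpc terms cancel: result is slice_width * bpp / 8 for any bpc
for bpc in (8, 10, 12):
    assert bytes_per_soft_slice(540, bpc, 8) == 540
```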
[PATCH v3 4/6] drm/msm/dsi: Use MSM and DRM DSC helper methods
Use MSM and DRM DSC helper methods to configure DSC for DSI. Changes in V2: - *_calculate_initial_scale_value --> *_set_initial_scale_value - Split pkt_per_line and eol_byte_num changes to a separate patch - Moved pclk_per_line calculation to hdisplay adjustment in `if (dsc)` block of dsi_update_dsc_timing() Changes in v3: - Split pclk_per_intf calculation into a separate patch - Added slice_width check to dsi_timing_setup - Used MSM DSC helper to calculate total_bytes_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/dsi/dsi_host.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 74d38f90398a..6a6218a9655f 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -28,6 +28,7 @@ #include "dsi.xml.h" #include "sfpb.xml.h" #include "dsi_cfg.h" +#include "msm_dsc_helper.h" #include "msm_kms.h" #include "msm_gem.h" #include "phy/dsi_phy.h" @@ -848,7 +849,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod /* first calculate dsc parameters and then program * compress mode registers */ - slice_per_intf = DIV_ROUND_UP(hdisplay, dsc->slice_width); + slice_per_intf = msm_dsc_get_slice_per_intf(dsc, hdisplay); /* * If slice_count is greater than slice_per_intf @@ -858,7 +859,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod if (dsc->slice_count > slice_per_intf) dsc->slice_count = 1; - total_bytes_per_intf = dsc->slice_chunk_size * slice_per_intf; + total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay); eol_byte_num = total_bytes_per_intf % 3; pkt_per_line = slice_per_intf / dsc->slice_count; @@ -936,6 +937,12 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi) return; } + if (!dsc->slice_width || (mode->hdisplay < dsc->slice_width)) { + pr_err("DSI: invalid slice width %d (pic_width: %d)\n", + dsc->slice_width, 
mode->hdisplay); + return; + } + dsc->pic_width = mode->hdisplay; dsc->pic_height = mode->vdisplay; DBG("Mode %dx%d\n", dsc->pic_width, dsc->pic_height); @@ -1759,7 +1766,7 @@ static int dsi_populate_dsc_params(struct msm_dsi_host *msm_host, struct drm_dsc return ret; } - dsc->initial_scale_value = 32; + drm_dsc_set_initial_scale_value(dsc); dsc->line_buf_depth = dsc->bits_per_component + 1; return drm_dsc_compute_rc_parameters(dsc); -- 2.40.0
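For context on the initial_scale_value hunk: the DSC spec computes initial_scale_value = 8 * rc_model_size / (rc_model_size - initial_offset), and with the common defaults rc_model_size = 8192 and initial_offset = 6144 this evaluates to exactly 32, which is why the old hard-coded value worked. Sketch (assuming drm_dsc_set_initial_scale_value() implements this spec formula):

```python
def initial_scale_value(rc_model_size, initial_offset):
    # DSC spec formula; exact integer for the default parameters
    return 8 * rc_model_size // (rc_model_size - initial_offset)

# common DSC defaults reproduce the previously hard-coded 32
assert initial_scale_value(8192, 6144) == 32
```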
[PATCH v3 0/6] Introduce MSM-specific DSC helpers
There is some overlap in the calculations for MSM-specific DSC variables between DP and DSI. In addition, the calculations for initial_scale_value and det_thresh_flatness are defined within the DSC 1.2 specification but aren't yet included in drm_dsc_helper.c. This series moves these calculations to a shared msm_dsc_helper.c file and defines drm_dsc_helper methods for initial_scale_value and det_thresh_flatness.

Note: For now, the MSM specific helper methods are only called for the DSI path, but will be called for DP once DSC 1.2 support for DP has been added.

Depends on: "drm/i915: move DSC RC tables to drm_dsc_helper.c" [1]

[1] https://patchwork.freedesktop.org/series/114472/

---
Changes in v3:
- Cleaned up unused parameters
- Reworded some calculations for clarity
- Changed get_bytes_per_soft_slice() to a public method
- Added comment documentation to MSM DSC helpers
- Changed msm_dsc_get_eol_byte_num() to *_get_bytes_per_intf()
- Split dsi_timing_setup() hdisplay calculation to a separate patch
- Dropped 78c8b81d57d8 ("drm/display/dsc: Add flatness and initial scale value calculations") patch as it was absorbed in Dmitry's DSC series [1]
- Link to v2: https://lore.kernel.org/r/20230329-rfc-msm-dsc-helper-v2-0-3c13ced53...@quicinc.com

Changes in v2:
- Changed det_thresh_flatness to flatness_det_thresh
- Moved msm_dsc_helper files to msm/ directory
- Fixed type mismatch issues in MSM DSC helpers
- Dropped MSM_DSC_SLICE_PER_PKT macro
- Removed get_comp_ratio() helper
- Style changes to improve readability
- Use drm_dsc_get_bpp_int() instead of DSC_BPP macro
- Picked up Fixes tags for patches 3/5 and 4/5
- Picked up Reviewed-by for patch 4/5
- Split eol_byte_num and pkt_per_line calculation into a separate patch
- Moved pclk_per_line calculation into `if (dsc)` block in dsi_timing_setup()
- Link to v1: https://lore.kernel.org/r/20230329-rfc-msm-dsc-helper-v1-0-f3e479f59...@quicinc.com

---
Jessica Zhang (6):
  drm/msm: Add MSM-specific DSC helper methods
  drm/msm/dpu: Use DRM DSC helper for det_thresh_flatness
  drm/msm/dpu: Fix slice_last_group_size calculation
  drm/msm/dsi: Use MSM and DRM DSC helper methods
  drm/msm/dsi: update hdisplay calculation for dsi_timing_setup
  drm/msm/dsi: Fix calculations pkt_per_line

 drivers/gpu/drm/msm/Makefile | 1 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 9 ++--
 drivers/gpu/drm/msm/dsi/dsi_host.c | 22 --
 drivers/gpu/drm/msm/msm_dsc_helper.c | 47
 drivers/gpu/drm/msm/msm_dsc_helper.h | 70
 5 files changed, 142 insertions(+), 7 deletions(-)

---
base-commit: 56777fc93a145afcf71b92ba4281250f59ba6d9b
change-id: 20230329-rfc-msm-dsc-helper-981a95edfbd0

Best regards,
--
Jessica Zhang
[PATCH v3 2/6] drm/msm/dpu: Use DRM DSC helper for det_thresh_flatness
Use the DRM DSC helper for det_thresh_flatness to match downstream implementation and the DSC spec. Changes in V2: - Added a Fixes tag Fixes: c110cfd1753e ("drm/msm/disp/dpu1: Add support for DSC") Signed-off-by: Jessica Zhang Reviewed-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c index 619926da1441..b952f7d2b7f5 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c @@ -3,6 +3,8 @@ * Copyright (c) 2020-2022, Linaro Limited */ +#include + #include "dpu_kms.h" #include "dpu_hw_catalog.h" #include "dpu_hwio.h" @@ -102,7 +104,7 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc, data |= dsc->final_offset; DPU_REG_WRITE(c, DSC_DSC_OFFSET, data); - det_thresh_flatness = 7 + 2 * (dsc->bits_per_component - 8); + det_thresh_flatness = drm_dsc_calculate_flatness_det_thresh(dsc); data = det_thresh_flatness << 10; data |= dsc->flatness_max_qp << 5; data |= dsc->flatness_min_qp; -- 2.40.0
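Assuming drm_dsc_calculate_flatness_det_thresh() implements the DSC 1.2 formula flatnessDetThresh = 2 << (bits_per_component - 8), this change alters the programmed value at every bpc, including the common 8 bpc case, which is why the patch carries a Fixes tag. A quick comparison sketch:

```python
def old_thresh(bpc):
    return 7 + 2 * (bpc - 8)     # previous open-coded expression

def spec_thresh(bpc):
    return 2 << (bpc - 8)        # assumed DRM helper / DSC 1.2 formula

pairs = [(old_thresh(b), spec_thresh(b)) for b in (8, 10, 12)]
assert pairs == [(7, 2), (11, 8), (15, 32)]
```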
Re: [PATCH v4 26/42] drm/msm/dpu: expand sc7180 catalog
On 4/4/2023 6:06 AM, Dmitry Baryshkov wrote: Duplicate sm8250 catalog entries to sc7180 to remove dependencies between DPU instances. Signed-off-by: Dmitry Baryshkov --- Reviewed-by: Abhinav Kumar
Re: [PATCH v4 25/42] drm/msm/dpu: expand sc8180x catalog
On 4/4/2023 6:06 AM, Dmitry Baryshkov wrote: Duplicate sm8150 catalog entries to sc8180x to remove dependencies between DPU instances. Signed-off-by: Dmitry Baryshkov Reviewed-by: Abhinav Kumar
Re: [PATCH v4 03/42] drm/msm/dpu: Allow variable INTF_BLK size
On 4/4/2023 6:05 AM, Dmitry Baryshkov wrote:

From: Konrad Dybcio

These blocks are of variable length on different SoCs. Set the correct values where I was able to retrieve it from downstream DTs and leave the old defaults (0x280) otherwise.

Signed-off-by: Konrad Dybcio
[DB: fixed some lengths, split the INTF changes away]
Signed-off-by: Dmitry Baryshkov

Everything is fine except sm8250.

DPU | SoC      | INTF_DSI size
5.0 | sm8150   | 0x2bc
5.1 | sc8180x  | 0x2bc
6.0 | sm8250   | 0x2c0
6.2 | sc7180   | 0x2c0
6.3 | sm6115   | 0x2c0
6.5 | qcm2290  | 0x2c0
7.0 | sm8350   | 0x2c4
7.2 | sc7280   | 0x2c4
8.0 | sc8280xp | 0x300
8.1 | sm8450   | 0x300
9.0 | sm8550   | 0x300

Today sm8250 is using the same table as sm8150 but it needs 0x2c0 and not 0x2bc. We should de-duplicate it and add a new one for sm8250?

---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 96 +--
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
index 9e20ab6acbd2..f4ffd7fc4424 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
@@ -1853,10 +1853,10 @@ static struct dpu_dsc_cfg sm8150_dsc[] = {
 /*
  * INTF sub blocks config
  */
-#define INTF_BLK(_name, _id, _base, _type, _ctrl_id, _progfetch, _features, _reg, _underrun_bit, _vsync_bit) \
+#define INTF_BLK(_name, _id, _base, _len, _type, _ctrl_id, _progfetch, _features, _reg, _underrun_bit, _vsync_bit) \
 {\
 .name = _name, .id = _id, \
- .base = _base, .len = 0x280, \
+ .base = _base, .len = _len, \
 .features = _features, \
 .type = _type, \
 .controller_id = _ctrl_id, \
@@ -1866,85 +1866,85 @@ static struct dpu_dsc_cfg sm8150_dsc[] = {
 }

 static const struct dpu_intf_cfg msm8998_intf[] = {
- INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
- INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
- INTF_BLK("intf_2", INTF_2, 0x6B000, INTF_DSI, 1, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 28, 29),
- INTF_BLK("intf_3", INTF_3, 0x6B800, INTF_HDMI, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 30, 31),
+ INTF_BLK("intf_0", INTF_0, 0x6A000, 0x280, INTF_DP, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
+ INTF_BLK("intf_1", INTF_1, 0x6A800, 0x280, INTF_DSI, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
+ INTF_BLK("intf_2", INTF_2, 0x6B000, 0x280, INTF_DSI, 1, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 28, 29),
+ INTF_BLK("intf_3", INTF_3, 0x6B800, 0x280, INTF_HDMI, 0, 25, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 30, 31),
 };

 static const struct dpu_intf_cfg sdm845_intf[] = {
- INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
- INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
- INTF_BLK("intf_2", INTF_2, 0x6B000, INTF_DSI, 1, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 28, 29),
- INTF_BLK("intf_3", INTF_3, 0x6B800, INTF_DP, 1, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 30, 31),
+ INTF_BLK("intf_0", INTF_0, 0x6A000, 0x280, INTF_DP, 0, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
+ INTF_BLK("intf_1", INTF_1, 0x6A800, 0x280, INTF_DSI, 0, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
+ INTF_BLK("intf_2", INTF_2, 0x6B000, 0x280, INTF_DSI, 1, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 28, 29),
+ INTF_BLK("intf_3", INTF_3, 0x6B800, 0x280, INTF_DP, 1, 24, INTF_SDM845_MASK, MDP_SSPP_TOP0_INTR, 30, 31),
 };

 static const struct dpu_intf_cfg sc7180_intf[] = {
- INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, MSM_DP_CONTROLLER_0, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
- INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
+ INTF_BLK("intf_0", INTF_0, 0x6A000, 0x280, INTF_DP, MSM_DP_CONTROLLER_0, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
+ INTF_BLK("intf_1", INTF_1, 0x6A800, 0x2c0, INTF_DSI, 0, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
 };

 static const struct dpu_intf_cfg sm8150_intf[] = {
- INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 24, 25),
- INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 26, 27),
- INTF_BLK("intf_2", INTF_2, 0x6B000, INTF_DSI, 1, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 28, 29),
- INTF_BLK("intf_3", INTF_3, 0x6B800, INTF_DP, 1, 24, INTF_SC7180_MASK, MDP_SSPP_TOP0_INTR, 30, 31),
+ INTF_BLK("intf_0", INTF_0, 0x6A000, 0x280, INTF_DP, 0, 24, INTF_SC
Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
On Monday, April 3, 2023 9:48:40 AM PDT Ville Syrjälä wrote: > On Mon, Apr 03, 2023 at 09:35:32AM -0700, Matt Roper wrote: > > On Mon, Apr 03, 2023 at 07:02:08PM +0300, Ville Syrjälä wrote: > > > On Fri, Mar 31, 2023 at 11:38:30PM -0700, fei.y...@intel.com wrote: > > > > From: Fei Yang > > > > > > > > To comply with the design that buffer objects shall have immutable > > > > cache setting through out its life cycle, {set, get}_caching ioctl's > > > > are no longer supported from MTL onward. With that change caching > > > > policy can only be set at object creation time. The current code > > > > applies a default (platform dependent) cache setting for all objects. > > > > However this is not optimal for performance tuning. The patch extends > > > > the existing gem_create uAPI to let user set PAT index for the object > > > > at creation time. > > > > > > This is missing the whole justification for the new uapi. > > > Why is MOCS not sufficient? > > > > PAT and MOCS are somewhat related, but they're not the same thing. The > > general direction of the hardware architecture recently has been to > > slowly dumb down MOCS and move more of the important memory/cache > > control over to the PAT instead. On current platforms there is some > > overlap (and MOCS has an "ignore PAT" setting that makes the MOCS "win" > > for the specific fields that both can control), but MOCS doesn't have a > > way to express things like snoop/coherency mode (on MTL), or class of > > service (on PVC). And if you check some of the future platforms, the > > hardware design starts packing even more stuff into the PAT (not just > > cache behavior) which will never be handled by MOCS. > > Sigh. So the hardware designers screwed up MOCS yet again and > instead of getting that fixed we are adding a new uapi to work > around it? 
> > The IMO sane approach (which IIRC was the situation for a few > platform generations at least) is that you just shove the PAT > index into MOCS (or tell it to go look it up from the PTE). > Why the heck did they not just stick with that? There are actually some use cases in newer APIs where MOCS doesn't work well. For example, VK_KHR_buffer_device_address in Vulkan 1.2: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_buffer_device_address.html It essentially adds "pointers to buffer memory in shaders", where apps can just get a 64-bit pointer, and use it with an address. On our EUs, that turns into A64 data port messages which refer directly to memory. Notably, there's no descriptor (i.e. SURFACE_STATE) where you could stuff a MOCS value. So, you get one single MOCS entry for all such buffers...which is specified in STATE_BASE_ADDRESS. Hope you wanted all of them to have the same cache & coherency settings! With PAT/PTE, we can at least specify settings for each buffer, rather than one global setting. Compression has also been moving towards virtual address-based solutions and handling in the caches and memory controller, rather than in e.g. the sampler reading SURFACE_STATE. (It started evolving that way with Tigerlake, really, but continues.) > > Also keep in mind that MOCS generally applies at the GPU instruction > > level; although a lot of instructions have a field to provide a MOCS > > index, or can use a MOCS already associated with a surface state, there > > are still some that don't. PAT is the source of memory access > > characteristics for anything that can't provide a MOCS directly. > > So what are the things that don't have MOCS and where we need > some custom cache behaviour, and we already know all that at > buffer creation time? For Meteorlake...we have MOCS for cache settings. We only need to use PAT for coherency settings; I believe we can get away with deciding that up-front at buffer creation time. 
If we were doing full cacheability, I'd be very nervous about deciding performance tuning at creation time. --Ken
Re: [PATCH v4 02/42] drm/msm/dpu: Allow variable SSPP_BLK size
On 4/4/2023 6:05 AM, Dmitry Baryshkov wrote: From: Konrad Dybcio These blocks are of variable length on different SoCs. Set the correct values where I was able to retrieve it from downstream DTs and leave the old defaults (0x1c8) otherwise. Signed-off-by: Konrad Dybcio [DB: fixed some of lengths, split the INTF changes away] Signed-off-by: Dmitry Baryshkov Everything is fine except sm8450. As I already wrote in the previous version's review, it should be 0x32c for sm8450. --- .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 146 +- 1 file changed, 73 insertions(+), 73 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c index 83f8f83e2b29..9e20ab6acbd2 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c @@ -1172,11 +1172,11 @@ static const struct dpu_sspp_sub_blks sdm845_dma_sblk_1 = _DMA_SBLK("9", 2); static const struct dpu_sspp_sub_blks sdm845_dma_sblk_2 = _DMA_SBLK("10", 3); static const struct dpu_sspp_sub_blks sdm845_dma_sblk_3 = _DMA_SBLK("11", 4); -#define SSPP_BLK(_name, _id, _base, _features, \ +#define SSPP_BLK(_name, _id, _base, _len, _features, \ _sblk, _xinid, _type, _clkctrl) \ { \ .name = _name, .id = _id, \ - .base = _base, .len = 0x1c8, \ + .base = _base, .len = _len, \ .features = _features, \ .sblk = &_sblk, \ .xin_id = _xinid, \ @@ -1185,40 +1185,40 @@ static const struct dpu_sspp_sub_blks sdm845_dma_sblk_3 = _DMA_SBLK("11", 4); } static const struct dpu_sspp_cfg msm8998_sspp[] = { - SSPP_BLK("sspp_0", SSPP_VIG0, 0x4000, VIG_MSM8998_MASK, + SSPP_BLK("sspp_0", SSPP_VIG0, 0x4000, 0x1ac, VIG_MSM8998_MASK, msm8998_vig_sblk_0, 0, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG0), - SSPP_BLK("sspp_1", SSPP_VIG1, 0x6000, VIG_MSM8998_MASK, + SSPP_BLK("sspp_1", SSPP_VIG1, 0x6000, 0x1ac, VIG_MSM8998_MASK, msm8998_vig_sblk_1, 4, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG1), - SSPP_BLK("sspp_2", SSPP_VIG2, 0x8000, VIG_MSM8998_MASK, + SSPP_BLK("sspp_2", 
SSPP_VIG2, 0x8000, 0x1ac, VIG_MSM8998_MASK, msm8998_vig_sblk_2, 8, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG2), - SSPP_BLK("sspp_3", SSPP_VIG3, 0xa000, VIG_MSM8998_MASK, + SSPP_BLK("sspp_3", SSPP_VIG3, 0xa000, 0x1ac, VIG_MSM8998_MASK, msm8998_vig_sblk_3, 12, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG3), - SSPP_BLK("sspp_8", SSPP_DMA0, 0x24000, DMA_MSM8998_MASK, + SSPP_BLK("sspp_8", SSPP_DMA0, 0x24000, 0x1ac, DMA_MSM8998_MASK, sdm845_dma_sblk_0, 1, SSPP_TYPE_DMA, DPU_CLK_CTRL_DMA0), - SSPP_BLK("sspp_9", SSPP_DMA1, 0x26000, DMA_MSM8998_MASK, + SSPP_BLK("sspp_9", SSPP_DMA1, 0x26000, 0x1ac, DMA_MSM8998_MASK, sdm845_dma_sblk_1, 5, SSPP_TYPE_DMA, DPU_CLK_CTRL_DMA1), - SSPP_BLK("sspp_10", SSPP_DMA2, 0x28000, DMA_CURSOR_MSM8998_MASK, + SSPP_BLK("sspp_10", SSPP_DMA2, 0x28000, 0x1ac, DMA_CURSOR_MSM8998_MASK, sdm845_dma_sblk_2, 9, SSPP_TYPE_DMA, DPU_CLK_CTRL_DMA2), - SSPP_BLK("sspp_11", SSPP_DMA3, 0x2a000, DMA_CURSOR_MSM8998_MASK, + SSPP_BLK("sspp_11", SSPP_DMA3, 0x2a000, 0x1ac, DMA_CURSOR_MSM8998_MASK, sdm845_dma_sblk_3, 13, SSPP_TYPE_DMA, DPU_CLK_CTRL_DMA3), }; static const struct dpu_sspp_cfg sdm845_sspp[] = { - SSPP_BLK("sspp_0", SSPP_VIG0, 0x4000, VIG_SDM845_MASK_SDMA, + SSPP_BLK("sspp_0", SSPP_VIG0, 0x4000, 0x1c8, VIG_SDM845_MASK_SDMA, sdm845_vig_sblk_0, 0, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG0), - SSPP_BLK("sspp_1", SSPP_VIG1, 0x6000, VIG_SDM845_MASK_SDMA, + SSPP_BLK("sspp_1", SSPP_VIG1, 0x6000, 0x1c8, VIG_SDM845_MASK_SDMA, sdm845_vig_sblk_1, 4, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG1), - SSPP_BLK("sspp_2", SSPP_VIG2, 0x8000, VIG_SDM845_MASK_SDMA, + SSPP_BLK("sspp_2", SSPP_VIG2, 0x8000, 0x1c8, VIG_SDM845_MASK_SDMA, sdm845_vig_sblk_2, 8, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG2), - SSPP_BLK("sspp_3", SSPP_VIG3, 0xa000, VIG_SDM845_MASK_SDMA, + SSPP_BLK("sspp_3", SSPP_VIG3, 0xa000, 0x1c8, VIG_SDM845_MASK_SDMA, sdm845_vig_sblk_3, 12, SSPP_TYPE_VIG, DPU_CLK_CTRL_VIG3), - SSPP_BLK("sspp_8", SSPP_DMA0, 0x24000, DMA_SDM845_MASK_SDMA, + SSPP_BLK("sspp_8", SSPP_DMA0, 0x24000, 0x1c8, DMA_SDM845_MASK_SDMA, 
sdm845_dma_sblk_0, 1, SSPP_TYPE_DMA, DPU_CLK_CTRL_DMA0), - SSPP_BLK("sspp_9", SSPP_DMA1, 0x26000, DMA_SDM845_MASK_SDMA, + SSPP_BLK("sspp_9", SSPP_DMA1, 0x26000, 0x1c8, DMA_SDM845_MASK_SDMA, sdm845_dma_sblk_1, 5, SSPP_TYPE_DMA, DPU_CLK_CTRL_DMA1), - SSPP_BLK("sspp_10", SSPP_DMA2, 0x28000, DMA_CURSOR_SDM845_MASK_SDMA, + SSPP_BLK("sspp_10", SSPP_DMA2, 0x28000, 0x1c8, DMA_CURSOR_SDM845_MASK_SDMA, sdm845_dma_sblk_2, 9, SSPP_TYPE_DMA, DPU_CLK_CTRL_D
Re: [PATCH v4 01/42] drm/msm/dpu: use CTL_SC7280_MASK for sm8450's ctl_0
On 4/4/2023 6:05 AM, Dmitry Baryshkov wrote: On sm8450 platform the CTL_0 doesn't differ from the rest of CTL blocks, so switch it to CTL_SC7280_MASK too. Some background: original commit 100d7ef6995d ("drm/msm/dpu: add support for SM8450") had all (relevant at that time) bit spelled individually. Then commit 0e91bcbb0016 ("drm/msm/dpu: Add SM8350 to hw catalog"), despite being a mismerge, correctly changed all other CTL entries to use CTL_SC7280_MASK, except CTL_0. I think having it spelled individually is better. If we start using one chipset's mask for another, we are again going down the same path of this becoming one confused file. So, even though I agree that 0e91bcbb0016 ("drm/msm/dpu: Add SM8350 to hw catalog") corrected the mask to re-use sc7280, with the individual catalog file, its better to have it separate and spelled individually. This change is not heading in the direction of the rest of the series. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c index 6840b22a4159..83f8f83e2b29 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c @@ -975,7 +975,7 @@ static const struct dpu_ctl_cfg sm8450_ctl[] = { { .name = "ctl_0", .id = CTL_0, .base = 0x15000, .len = 0x204, - .features = BIT(DPU_CTL_ACTIVE_CFG) | BIT(DPU_CTL_SPLIT_DISPLAY) | BIT(DPU_CTL_FETCH_ACTIVE), + .features = BIT(DPU_CTL_SPLIT_DISPLAY) | CTL_SC7280_MASK, .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 9), }, {
Re: [PATCH v2 2/8] drm/msm: Clear aperture ownership outside of fbdev code
On 03/04/2023 15:45, Thomas Zimmermann wrote: Move aperture management out of the fbdev code. It is unrelated and needs to run even if fbdev support has been disabled. Call the helper at the top of msm_drm_init() to take over hardware from other drivers. v2: * bind all subdevices before acquiring device (Dmitri) Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/msm/msm_drv.c | 6 ++ drivers/gpu/drm/msm/msm_fbdev.c | 6 -- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index aca48c868c14..2a1c6ced82c9 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -12,6 +12,7 @@ #include #include +#include #include #include #include @@ -451,6 +452,11 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv) if (ret) goto err_drm_dev_put; + /* the fw fb could be anywhere in memory */ + ret = drm_aperture_remove_framebuffers(false, drv); + if (ret) + goto err_drm_dev_put; This should be goto err_msm_uninit to unbind devices. I'll fix this while applying. + dma_set_max_seg_size(dev, UINT_MAX); msm_gem_shrinker_init(ddev); diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c index d26aa52217ce..fc7d0406a9f9 100644 --- a/drivers/gpu/drm/msm/msm_fbdev.c +++ b/drivers/gpu/drm/msm/msm_fbdev.c @@ -4,7 +4,6 @@ * Author: Rob Clark */ -#include #include #include #include @@ -154,11 +153,6 @@ struct drm_fb_helper *msm_fbdev_init(struct drm_device *dev) goto fail; } - /* the fw fb could be anywhere in memory */ - ret = drm_aperture_remove_framebuffers(false, dev->driver); - if (ret) - goto fini; - ret = drm_fb_helper_initial_config(helper); if (ret) goto fini; -- With best wishes Dmitry
Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank
On 04/04/2023 22:16, Daniel Vetter wrote: On Tue, Apr 04, 2023 at 08:22:05PM +0300, Dmitry Baryshkov wrote: On 08/03/2023 17:53, Rob Clark wrote: From: Rob Clark For an atomic commit updating a single CRTC (ie. a pageflip) calculate the next vblank time, and inform the fence(s) of that deadline. v2: Comment typo fix (danvet) v3: If there are multiple CRTCs, consider the time of the soonest vblank Signed-off-by: Rob Clark Reviewed-by: Daniel Vetter Signed-off-by: Rob Clark --- drivers/gpu/drm/drm_atomic_helper.c | 37 + 1 file changed, 37 insertions(+) As I started playing with hotplug on RB5 (sm8250, DSI-HDMI bridge), I found that this patch introduces the following backtrace on HDMI hotplug. Is there anything that I can do to debug/fix the issue? The warning seems harmless, but it would probably be good to still fix it. With addresses decoded: A bit of a shot in the dark, but does the below help? This indeed seems to fix the issue. I'm not sure about the possible side effects, but, if you were to send the patch: Tested-by: Dmitry Baryshkov diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index f21b5a74176c..6640d80d84f3 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1528,6 +1528,9 @@ static void set_fence_deadline(struct drm_device *dev, for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) { ktime_t v; + if (drm_atomic_crtc_needs_modeset(new_crtc_state)) + continue; + if (drm_crtc_next_vblank_start(crtc, &v)) continue; diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index 78a8c51a4abf..7ae38e8e27e8 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -1001,6 +1001,9 @@ int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime) struct drm_display_mode *mode = &vblank->hwmode; u64 vblank_start; + if (!drm_dev_has_vblank(crtc->dev)) + return -EINVAL; + if (!vblank->framedur_ns || !vblank->linedur_ns) return
-EINVAL; [ 31.151348] [ cut here ] [ 31.157043] msm_dpu ae01000.display-controller: drm_WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev)) [ 31.157177] WARNING: CPU: 0 PID: 13 at drivers/gpu/drm/drm_vblank.c:728 drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728) [ 31.180629] Modules linked in: [ 31.184106] CPU: 0 PID: 13 Comm: kworker/0:1 Not tainted 6.3.0-rc2-8-gd39e48ca80c0 #542 [ 31.193358] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) [ 31.200796] Workqueue: events lt9611uxc_hpd_work [ 31.205990] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 31.213722] pc : drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728) [ 31.222032] lr : drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728) [ 31.230341] sp : 880bb8d0 [ 31.234061] x29: 880bb900 x28: 0038 x27: 61a7956b8d60 [ 31.242051] x26: x25: x24: 880bb9c4 [ 31.250038] x23: 0001 x22: bf0033b94ef0 x21: 61a7957901d0 [ 31.258029] x20: 61a79571 x19: 61a78128b000 x18: fffec278 [ 31.266014] x17: 00400465 x16: 0020 x15: 0060 [ 31.274001] x14: 0001 x13: bf00354550e0 x12: 0825 [ 31.281989] x11: 02b7 x10: bf00354b1208 x9 : bf00354550e0 [ 31.289976] x8 : efff x7 : bf00354ad0e0 x6 : 02b7 [ 31.297963] x5 : 61a8feebbe48 x4 : 4000f2b7 x3 : a2a8c9f64000 [ 31.305947] x2 : x1 : x0 : 61a780283100 [ 31.313934] Call trace: [ 31.316719] drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728) [ 31.324646] drm_crtc_vblank_helper_get_vblank_timestamp (drivers/gpu/drm/drm_vblank.c:843) [ 31.331528] drm_crtc_get_last_vbltimestamp (drivers/gpu/drm/drm_vblank.c:884) [ 31.337170] drm_crtc_next_vblank_start (drivers/gpu/drm/drm_vblank.c:1006) [ 31.342430] drm_atomic_helper_wait_for_fences (drivers/gpu/drm/drm_atomic_helper.c:1531 drivers/gpu/drm/drm_atomic_helper.c:1578) [ 31.348561] drm_atomic_helper_commit (drivers/gpu/drm/drm_atomic_helper.c:2007) [ 31.353724] drm_atomic_commit 
(drivers/gpu/drm/drm_atomic.c:1444) [ 31.358127] drm_client_modeset_commit_atomic (drivers/gpu/drm/drm_client_modeset.c:1045) [ 31.364146] drm_client_modeset_commit_locked (drivers/gpu/drm/drm_client_modeset.c:1148) [ 31.370071] drm_client_modeset_commit (drivers/gpu/drm/drm_client_modeset.c:1174) [ 31.375233] drm_fb_helper_set_par (drivers/gpu/drm/drm_fb_helper.c:254 drivers/gpu/drm/drm_fb_helper.c:229 drivers/gpu/
Re: [PATCH] drm/msm/mdp5: set variable msm8x76_config storage-class-specifier to static
On 04/04/2023 21:53, Tom Rix wrote: smatch reports drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c:658:26: warning: symbol 'msm8x76_config' was not declared. Should it be static? This variable is only used in one file so should be static. Signed-off-by: Tom Rix --- drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH v2 8/8] drm/msm: Implement fbdev emulation as in-kernel client
On 03/04/2023 15:45, Thomas Zimmermann wrote: Move code from ad-hoc fbdev callbacks into DRM client functions and remove the old callbacks. The functions instruct the client to poll for changed output or restore the display. The DRM core calls both, the old callbacks and the new client helpers, from the same places. The new functions perform the same operation as before, so there's no change in functionality. Replace all code that initializes or releases fbdev emulation throughout the driver. Instead initialize the fbdev client by a single call to msm_fbdev_setup() after msm has registered its DRM device. As in most drivers, msm's fbdev emulation now acts like a regular DRM client. The fbdev client setup consists of the initial preparation and the hot-plugging of the display. The latter creates the fbdev device and sets up the fbdev framebuffer. The setup performs display hot-plugging once. If no display can be detected, DRM probe helpers re-run the detection on each hotplug event. A call to drm_dev_unregister() releases the client automatically. No further action is required within msm. If the fbdev framebuffer has been fully set up, struct fb_ops.fb_destroy implements the release. For partially initialized emulation, the fbdev client reverts the initial setup. v2: * handle fbdev module parameter correctly (kernel test robot) Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/msm/msm_debugfs.c | 1 + drivers/gpu/drm/msm/msm_drv.c | 15 +--- drivers/gpu/drm/msm/msm_drv.h | 10 ++- drivers/gpu/drm/msm/msm_fbdev.c | 115 ++ 4 files changed, 81 insertions(+), 60 deletions(-) Reviewed-by: Dmitry Baryshkov Tested-by: Dmitry Baryshkov # RB5 -- With best wishes Dmitry
Re: [PATCH v2 6/8] drm/msm: Move module parameter 'fbdev' to fbdev code
On 03/04/2023 15:45, Thomas Zimmermann wrote: Define the module's parameter 'fbdev' in fbdev code. No other code uses it. No functional changes, but simplifies the later conversion to struct drm_client. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/msm/msm_drv.c | 10 ++ drivers/gpu/drm/msm/msm_fbdev.c | 7 +++ 2 files changed, 9 insertions(+), 8 deletions(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH v2 4/8] drm/msm: Remove struct msm_fbdev
On 03/04/2023 15:45, Thomas Zimmermann wrote: Remove struct msm_fbdev, which is an empty wrapper around struct drm_fb_helper. Use the latter directly. No functional changes. v2: * kfree fbdev helper instance on init errors (Dmitri) Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/msm/msm_fbdev.c | 20 1 file changed, 4 insertions(+), 16 deletions(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH v2 2/8] drm/msm: Clear aperture ownership outside of fbdev code
On 03/04/2023 15:45, Thomas Zimmermann wrote: Move aperture management out of the fbdev code. It is unrelated and needs to run even if fbdev support has been disabled. Call the helper at the top of msm_drm_init() to take over hardware from other drivers. v2: * bind all subdevices before acquiring device (Dmitri) Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/msm/msm_drv.c | 6 ++ drivers/gpu/drm/msm/msm_fbdev.c | 6 -- 2 files changed, 6 insertions(+), 6 deletions(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH v2 1/8] drm/msm: Include
On 03/04/2023 15:45, Thomas Zimmermann wrote: Include to get the declaration of devm_ioremap() on sparc64. No functional changes. Signed-off-by: Thomas Zimmermann Reported-by: kernel test robot Link: https://lore.kernel.org/oe-kbuild-all/202303301856.zsmpwzjj-...@intel.com/ --- drivers/gpu/drm/msm/msm_io_utils.c | 1 + 1 file changed, 1 insertion(+) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [Intel-gfx] [PATCH v3] drm/i915/mtl: Disable stolen memory backed FB for A0
On 4/4/2023 8:27 PM, Ville Syrjälä wrote: On Tue, Apr 04, 2023 at 08:13:42PM +0200, Nirmoy Das wrote: Stolen memory is not usable for MTL A0 stepping beyond certain access size and we have no control over userspace access size of /dev/fb which can be backed by stolen memory. So disable stolen memory backed fb by setting i915->dsm.usable_size to zero. v2: remove hsdes reference and fix commit message(Andi) v3: use revid as we want to target SOC stepping(Radhakrishna) Cc: Matthew Auld Cc: Andi Shyti Cc: Daniele Ceraolo Spurio Cc: Lucas De Marchi Cc: Radhakrishna Sripada Signed-off-by: Nirmoy Das Reviewed-by: Andi Shyti --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 8ac376c24aa2..ee492d823f1b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -535,6 +535,14 @@ static int i915_gem_init_stolen(struct intel_memory_region *mem) /* Basic memrange allocator for stolen space. */ drm_mm_init(&i915->mm.stolen, 0, i915->dsm.usable_size); + /* +* Access to stolen lmem beyond certain size for MTL A0 stepping +* would crash the machine. Disable stolen lmem for userspace access +* by setting usable_size to zero. +*/ + if (IS_METEORLAKE(i915) && INTEL_REVID(i915) == 0x0) + i915->dsm.usable_size = 0; That certainly won't prevent FBC from using stolen. Are we sure that FBC accesses are fine? I think so. I remember Jouni tested this patch internally to unblock a FBC test. Jouni, could you please share your thoughts. I can't seem to find the internal JIRA reference right now. Regards, Nirmoy + return 0; } -- 2.39.0
Re: [PATCH 3/8] drm/aperture: Remove primary argument
On Tue, Apr 4, 2023 at 10:18 PM Daniel Vetter wrote: > > Only really pci devices have a business setting this - it's for > figuring out whether the legacy vga stuff should be nuked too. And > with the preceeding two patches those are all using the pci version of I think it's spelled "preceding" [...] > drivers/gpu/drm/meson/meson_drv.c | 2 +- for the meson driver: Acked-by: Martin Blumenstingl Thank you and best regards, Martin
Re: [PATCH] drm/vblank: Simplify drm_dev_has_vblank()
On Tue, Apr 04, 2023 at 10:46:00PM +0200, Daniel Vetter wrote: > On Mon, Apr 03, 2023 at 09:07:35AM -0700, Rob Clark wrote: > > From: Rob Clark > > > > What does vblank have to do with num_crtcs? Well, this was technically > > correct, but you'd have to go look at where num_crtcs is initialized to > > understand why. Lets just replace it with the simpler and more obvious > > check. > > If you want to fix this, then I think the right fix is to rename num_crtcs > to be something like num_vblank_crtcs. It's a historical accident back > when vblanks without kms was a thing. > > Plan B is someone gets really busy and fixes up the entire vblank mess and > moves it into drm_crtc struct. Now that the dri1 drivers are gone we could > indeed do that. An easy first step could be to simply wrap all the naked &dev->vblank[drm_crtc_index()] things into a function call with some cocci/etc. That way most of the vblank code doesn't need to care where that thing actually lives. -- Ville Syrjälä Intel
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, Apr 04, 2023 at 10:31:58PM +0200, Daniel Vetter wrote: > On Tue, Apr 04, 2023 at 08:19:37PM +, Matthew Brost wrote: > > On Tue, Apr 04, 2023 at 10:11:59PM +0200, Daniel Vetter wrote: > > > On Tue, 4 Apr 2023 at 22:04, Matthew Brost > > > wrote: > > > > > > > > On Tue, Apr 04, 2023 at 09:00:59PM +0200, Daniel Vetter wrote: > > > > > On Tue, 4 Apr 2023 at 15:10, Christian König > > > > > wrote: > > > > > > > > > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > > > > > Hi, Christian, > > > > > > > > > > > > > > On 4/4/23 11:09, Christian König wrote: > > > > > > >> Am 04.04.23 um 02:22 schrieb Matthew Brost: > > > > > > >>> From: Thomas Hellström > > > > > > >>> > > > > > > >>> For long-running workloads, drivers either need to open-code > > > > > > >>> completion > > > > > > >>> waits, invent their own synchronization primitives or > > > > > > >>> internally use > > > > > > >>> dma-fences that do not obey the cross-driver dma-fence > > > > > > >>> protocol, but > > > > > > >>> without any lockdep annotation all these approaches are error > > > > > > >>> prone. > > > > > > >>> > > > > > > >>> So since for example the drm scheduler uses dma-fences it is > > > > > > >>> desirable for > > > > > > >>> a driver to be able to use it for throttling and error handling > > > > > > >>> also > > > > > > >>> with > > > > > > >>> internal dma-fences tha do not obey the cros-driver dma-fence > > > > > > >>> protocol. > > > > > > >>> > > > > > > >>> Introduce long-running completion fences in form of dma-fences, > > > > > > >>> and add > > > > > > >>> lockdep annotation for them. In particular: > > > > > > >>> > > > > > > >>> * Do not allow waiting under any memory management locks. > > > > > > >>> * Do not allow to attach them to a dma-resv object. 
> > > > > > >>> * Introduce a new interface for adding callbacks making the > > > > > > >>> helper > > > > > > >>> adding > > > > > > >>>a callback sign off on that it is aware that the dma-fence > > > > > > >>> may not > > > > > > >>>complete anytime soon. Typically this will be the scheduler > > > > > > >>> chaining > > > > > > >>>a new long-running fence on another one. > > > > > > >> > > > > > > >> Well that's pretty much what I tried before: > > > > > > >> https://lwn.net/Articles/893704/ > > > > > > >> > > > > > > >> And the reasons why it was rejected haven't changed. > > > > > > >> > > > > > > >> Regards, > > > > > > >> Christian. > > > > > > >> > > > > > > > Yes, TBH this was mostly to get discussion going how we'd best > > > > > > > tackle > > > > > > > this problem while being able to reuse the scheduler for > > > > > > > long-running > > > > > > > workloads. > > > > > > > > > > > > > > I couldn't see any clear decision on your series, though, but one > > > > > > > main > > > > > > > difference I see is that this is intended for driver-internal use > > > > > > > only. (I'm counting using the drm_scheduler as a helper for > > > > > > > driver-private use). This is by no means a way to try tackle the > > > > > > > indefinite fence problem. > > > > > > > > > > > > Well this was just my latest try to tackle this, but essentially the > > > > > > problems are the same as with your approach: When we express such > > > > > > operations as dma_fence there is always the change that we leak that > > > > > > somewhere. > > > > > > > > > > > > My approach of adding a flag noting that this operation is > > > > > > dangerous and > > > > > > can't be synced with something memory management depends on tried to > > > > > > contain this as much as possible, but Daniel still pretty clearly > > > > > > rejected it (for good reasons I think). 
> > > > > > > > > > Yeah I still don't like dma_fence that somehow have totally different > > > > > semantics in that critical piece of "will it complete or will it > > > > > deadlock?" :-) > > > > > > > > Not going to touch LR dma-fences in this reply, I think we can continue > > > > the LR fence discussion of the fork of this thread I just responded to. > > > > Have a response the preempt fence discussion below. > > > > > > > > > > > > > > > > > > > > > > > > We could ofc invent a completely different data-type that > > > > > > > abstracts > > > > > > > the synchronization the scheduler needs in the long-running case, > > > > > > > or > > > > > > > each driver could hack something up, like sleeping in the > > > > > > > prepare_job() or run_job() callback for throttling, but those > > > > > > > waits > > > > > > > should still be annotated in one way or annotated one way or > > > > > > > another > > > > > > > (and probably in a similar way across drivers) to make sure we > > > > > > > don't > > > > > > > do anything bad. > > > > > > > > > > > > > > So any suggestions as to what would be the better solution here > > > > > > > would > > > > > > > be appreciated. > > > > > > > > > > > > Mhm, do we really the the GPU scheduler for that? > > > > > > > > > > > > I mean in the 1 to 1 case you basi
Re: [PATCH] drm/vblank: Simplify drm_dev_has_vblank()
On Mon, Apr 03, 2023 at 09:07:35AM -0700, Rob Clark wrote: > From: Rob Clark > > What does vblank have to do with num_crtcs? Well, this was technically > correct, but you'd have to go look at where num_crtcs is initialized to > understand why. Lets just replace it with the simpler and more obvious > check. If you want to fix this, then I think the right fix is to rename num_crtcs to be something like num_vblank_crtcs. It's a historical accident back when vblanks without kms was a thing. Plan B is someone gets really busy and fixes up the entire vblank mess and moves it into drm_crtc struct. Now that the dri1 drivers are gone we could indeed do that. Or maybe instead a patch to improve the kerneldoc for drm_dev->num_crtc? -Daniel > Signed-off-by: Rob Clark > --- > drivers/gpu/drm/drm_vblank.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c > index 877e2067534f..ad34c235d853 100644 > --- a/drivers/gpu/drm/drm_vblank.c > +++ b/drivers/gpu/drm/drm_vblank.c > @@ -575,7 +575,7 @@ EXPORT_SYMBOL(drm_vblank_init); > */ > bool drm_dev_has_vblank(const struct drm_device *dev) > { > - return dev->num_crtcs != 0; > + return !!dev->vblank; > } > EXPORT_SYMBOL(drm_dev_has_vblank); > > -- > 2.39.2 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH v2] drm/vblank: Fix for drivers that do not drm_vblank_init()
On Mon, Apr 03, 2023 at 10:50:34AM -0700, Rob Clark wrote: > On Mon, Apr 3, 2023 at 9:25 AM Nathan Chancellor wrote: > > > > On Mon, Apr 03, 2023 at 09:03:14AM -0700, Rob Clark wrote: > > > From: Rob Clark > > > > > > This should fix a crash that was reported on ast (and possibly other > > > drivers which do not initialize vblank). > > > > > >fbcon: Taking over console > > >Unable to handle kernel NULL pointer dereference at virtual address > > > 0074 > > >Mem abort info: > > > ESR = 0x9604 > > > EC = 0x25: DABT (current EL), IL = 32 bits > > > SET = 0, FnV = 0 > > > EA = 0, S1PTW = 0 > > > FSC = 0x04: level 0 translation fault > > >Data abort info: > > > ISV = 0, ISS = 0x0004 > > > CM = 0, WnR = 0 > > >user pgtable: 4k pages, 48-bit VAs, pgdp=080009d16000 > > >[0074] pgd=, p4d= > > >Internal error: Oops: 9604 [#1] SMP > > >Modules linked in: ip6table_nat tun nft_fib_inet nft_fib_ipv4 > > > nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 > > > nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 > > > nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc > > > vfat fat xfs snd_usb_audio snd_hwdep snd_usbmidi_lib snd_seq snd_pcm > > > snd_rawmidi snd_timer snd_seq_device snd soundcore joydev mc ipmi_ssif > > > ipmi_devintf ipmi_msghandler arm_spe_pmu arm_cmn arm_dsu_pmu > > > arm_dmc620_pmu cppc_cpufreq loop zram crct10dif_ce polyval_ce nvme > > > polyval_generic ghash_ce sbsa_gwdt igb nvme_core ast nvme_common > > > i2c_algo_bit xgene_hwmon gpio_dwapb scsi_dh_rdac scsi_dh_emc scsi_dh_alua > > > ip6_tables ip_tables dm_multipath fuse > > >CPU: 12 PID: 469 Comm: kworker/12:1 Not tainted > > > 6.3.0-rc2-8-gd39e48ca80c0 #1 > > >Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, > > > BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022 > > >Workqueue: events fbcon_register_existing_fbs > > >pstate: 2049 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > >pc : 
drm_crtc_next_vblank_start+0x2c/0x98 > > >lr : drm_atomic_helper_wait_for_fences+0x90/0x240 > > >sp : 8d583960 > > >x29: 8d583960 x28: 07ff8fc187b0 x27: > > >x26: 07ff99c08c00 x25: 0038 x24: 07ff99c0c000 > > >x23: 0001 x22: 0038 x21: > > >x20: 07ff9640a280 x19: x18: > > >x17: x16: b24d2eece1c0 x15: 003038303178 > > >x14: 303239310048 x13: x12: > > >x11: x10: x9 : b24d2eeeaca0 > > >x8 : 8d583628 x7 : 080077783000 x6 : > > >x5 : 8d584000 x4 : 07ff99c0c000 x3 : 0130 > > >x2 : x1 : 8d5839c0 x0 : 07ff99c0cc08 > > >Call trace: > > > drm_crtc_next_vblank_start+0x2c/0x98 > > > drm_atomic_helper_wait_for_fences+0x90/0x240 > > > drm_atomic_helper_commit+0xb0/0x188 > > > drm_atomic_commit+0xb0/0xf0 > > > drm_client_modeset_commit_atomic+0x218/0x280 > > > drm_client_modeset_commit_locked+0x64/0x1a0 > > > drm_client_modeset_commit+0x38/0x68 > > > __drm_fb_helper_restore_fbdev_mode_unlocked+0xb0/0xf8 > > > drm_fb_helper_set_par+0x44/0x88 > > > fbcon_init+0x1e0/0x4a8 > > > visual_init+0xbc/0x118 > > > do_bind_con_driver.isra.0+0x194/0x3a0 > > > do_take_over_console+0x50/0x70 > > > do_fbcon_takeover+0x74/0xf8 > > > do_fb_registered+0x13c/0x158 > > > fbcon_register_existing_fbs+0x78/0xc0 > > > process_one_work+0x1ec/0x478 > > > worker_thread+0x74/0x418 > > > kthread+0xec/0x100 > > > ret_from_fork+0x10/0x20 > > >Code: f944 b9409013 f940a082 9ba30a73 (b9407662) > > >---[ end trace ]--- > > > > > > v2: Use drm_dev_has_vblank() > > > > > > Reported-by: Nathan Chancellor > > > Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank") > > > Signed-off-by: Rob Clark > > > Reviewed-by: Thomas Zimmermann > > > > Still appears to work for me: > > > > Tested-by: Nathan Chancellor > > Thanks for confirming Pushed to drm-misc-next. 
-Daniel > > BR, > -R > > > > > > --- > > > drivers/gpu/drm/drm_vblank.c | 10 -- > > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c > > > index 299fa2a19a90..877e2067534f 100644 > > > --- a/drivers/gpu/drm/drm_vblank.c > > > +++ b/drivers/gpu/drm/drm_vblank.c > > > @@ -996,10 +996,16 @@ EXPORT_SYMBOL(drm_crtc_vblank_count_and_time); > > > int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t > > > *vblanktime) > > > { > > > unsigned int pipe = drm_crtc_index(crtc); > > > - struc
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, Apr 04, 2023 at 08:19:37PM +, Matthew Brost wrote: > On Tue, Apr 04, 2023 at 10:11:59PM +0200, Daniel Vetter wrote: > > On Tue, 4 Apr 2023 at 22:04, Matthew Brost wrote: > > > > > > On Tue, Apr 04, 2023 at 09:00:59PM +0200, Daniel Vetter wrote: > > > > On Tue, 4 Apr 2023 at 15:10, Christian König > > > > wrote: > > > > > > > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > > > > Hi, Christian, > > > > > > > > > > > > On 4/4/23 11:09, Christian König wrote: > > > > > >> Am 04.04.23 um 02:22 schrieb Matthew Brost: > > > > > >>> From: Thomas Hellström > > > > > >>> > > > > > >>> For long-running workloads, drivers either need to open-code > > > > > >>> completion > > > > > >>> waits, invent their own synchronization primitives or internally > > > > > >>> use > > > > > >>> dma-fences that do not obey the cross-driver dma-fence protocol, > > > > > >>> but > > > > > >>> without any lockdep annotation all these approaches are error > > > > > >>> prone. > > > > > >>> > > > > > >>> So since for example the drm scheduler uses dma-fences it is > > > > > >>> desirable for > > > > > >>> a driver to be able to use it for throttling and error handling > > > > > >>> also > > > > > >>> with > > > > > >>> internal dma-fences tha do not obey the cros-driver dma-fence > > > > > >>> protocol. > > > > > >>> > > > > > >>> Introduce long-running completion fences in form of dma-fences, > > > > > >>> and add > > > > > >>> lockdep annotation for them. In particular: > > > > > >>> > > > > > >>> * Do not allow waiting under any memory management locks. > > > > > >>> * Do not allow to attach them to a dma-resv object. > > > > > >>> * Introduce a new interface for adding callbacks making the helper > > > > > >>> adding > > > > > >>>a callback sign off on that it is aware that the dma-fence may > > > > > >>> not > > > > > >>>complete anytime soon. Typically this will be the scheduler > > > > > >>> chaining > > > > > >>>a new long-running fence on another one. 
> > > > > >> > > > > > >> Well that's pretty much what I tried before: > > > > > >> https://lwn.net/Articles/893704/ > > > > > >> > > > > > >> And the reasons why it was rejected haven't changed. > > > > > >> > > > > > >> Regards, > > > > > >> Christian. > > > > > >> > > > > > > Yes, TBH this was mostly to get discussion going how we'd best > > > > > > tackle > > > > > > this problem while being able to reuse the scheduler for > > > > > > long-running > > > > > > workloads. > > > > > > > > > > > > I couldn't see any clear decision on your series, though, but one > > > > > > main > > > > > > difference I see is that this is intended for driver-internal use > > > > > > only. (I'm counting using the drm_scheduler as a helper for > > > > > > driver-private use). This is by no means a way to try tackle the > > > > > > indefinite fence problem. > > > > > > > > > > Well this was just my latest try to tackle this, but essentially the > > > > > problems are the same as with your approach: When we express such > > > > > operations as dma_fence there is always the change that we leak that > > > > > somewhere. > > > > > > > > > > My approach of adding a flag noting that this operation is dangerous > > > > > and > > > > > can't be synced with something memory management depends on tried to > > > > > contain this as much as possible, but Daniel still pretty clearly > > > > > rejected it (for good reasons I think). > > > > > > > > Yeah I still don't like dma_fence that somehow have totally different > > > > semantics in that critical piece of "will it complete or will it > > > > deadlock?" :-) > > > > > > Not going to touch LR dma-fences in this reply, I think we can continue > > > the LR fence discussion of the fork of this thread I just responded to. > > > Have a response the preempt fence discussion below. 
> > > > > > > > > > > > > > > > > > > > We could ofc invent a completely different data-type that abstracts > > > > > > the synchronization the scheduler needs in the long-running case, or > > > > > > each driver could hack something up, like sleeping in the > > > > > > prepare_job() or run_job() callback for throttling, but those waits > > > > > > should still be annotated in one way or annotated one way or another > > > > > > (and probably in a similar way across drivers) to make sure we don't > > > > > > do anything bad. > > > > > > > > > > > > So any suggestions as to what would be the better solution here > > > > > > would > > > > > > be appreciated. > > > > > > > > > > Mhm, do we really the the GPU scheduler for that? > > > > > > > > > > I mean in the 1 to 1 case you basically just need a component which > > > > > collects the dependencies as dma_fence and if all of them are > > > > > fulfilled > > > > > schedules a work item. > > > > > > > > > > As long as the work item itself doesn't produce a dma_fence it can > > > > > then > > > > > still just wait for other none dma_fence dependencies. > > > > > > > > Yeah that's the importan
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, Apr 04, 2023 at 10:11:59PM +0200, Daniel Vetter wrote: > On Tue, 4 Apr 2023 at 22:04, Matthew Brost wrote: > > > > On Tue, Apr 04, 2023 at 09:00:59PM +0200, Daniel Vetter wrote: > > > On Tue, 4 Apr 2023 at 15:10, Christian König > > > wrote: > > > > > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > > > Hi, Christian, > > > > > > > > > > On 4/4/23 11:09, Christian König wrote: > > > > >> Am 04.04.23 um 02:22 schrieb Matthew Brost: > > > > >>> From: Thomas Hellström > > > > >>> > > > > >>> For long-running workloads, drivers either need to open-code > > > > >>> completion > > > > >>> waits, invent their own synchronization primitives or internally use > > > > >>> dma-fences that do not obey the cross-driver dma-fence protocol, but > > > > >>> without any lockdep annotation all these approaches are error prone. > > > > >>> > > > > >>> So since for example the drm scheduler uses dma-fences it is > > > > >>> desirable for > > > > >>> a driver to be able to use it for throttling and error handling also > > > > >>> with > > > > >>> internal dma-fences tha do not obey the cros-driver dma-fence > > > > >>> protocol. > > > > >>> > > > > >>> Introduce long-running completion fences in form of dma-fences, and > > > > >>> add > > > > >>> lockdep annotation for them. In particular: > > > > >>> > > > > >>> * Do not allow waiting under any memory management locks. > > > > >>> * Do not allow to attach them to a dma-resv object. > > > > >>> * Introduce a new interface for adding callbacks making the helper > > > > >>> adding > > > > >>>a callback sign off on that it is aware that the dma-fence may > > > > >>> not > > > > >>>complete anytime soon. Typically this will be the scheduler > > > > >>> chaining > > > > >>>a new long-running fence on another one. > > > > >> > > > > >> Well that's pretty much what I tried before: > > > > >> https://lwn.net/Articles/893704/ > > > > >> > > > > >> And the reasons why it was rejected haven't changed. 
> > > > >> > > > > >> Regards, > > > > >> Christian. > > > > >> > > > > > Yes, TBH this was mostly to get discussion going how we'd best tackle > > > > > this problem while being able to reuse the scheduler for long-running > > > > > workloads. > > > > > > > > > > I couldn't see any clear decision on your series, though, but one main > > > > > difference I see is that this is intended for driver-internal use > > > > > only. (I'm counting using the drm_scheduler as a helper for > > > > > driver-private use). This is by no means a way to try tackle the > > > > > indefinite fence problem. > > > > > > > > Well this was just my latest try to tackle this, but essentially the > > > > problems are the same as with your approach: When we express such > > > > operations as dma_fence there is always the change that we leak that > > > > somewhere. > > > > > > > > My approach of adding a flag noting that this operation is dangerous and > > > > can't be synced with something memory management depends on tried to > > > > contain this as much as possible, but Daniel still pretty clearly > > > > rejected it (for good reasons I think). > > > > > > Yeah I still don't like dma_fence that somehow have totally different > > > semantics in that critical piece of "will it complete or will it > > > deadlock?" :-) > > > > Not going to touch LR dma-fences in this reply, I think we can continue > > the LR fence discussion of the fork of this thread I just responded to. > > Have a response the preempt fence discussion below. 
> > > > > > > > > > > > > > > > We could ofc invent a completely different data-type that abstracts > > > > > the synchronization the scheduler needs in the long-running case, or > > > > > each driver could hack something up, like sleeping in the > > > > > prepare_job() or run_job() callback for throttling, but those waits > > > > > should still be annotated in one way or annotated one way or another > > > > > (and probably in a similar way across drivers) to make sure we don't > > > > > do anything bad. > > > > > > > > > > So any suggestions as to what would be the better solution here would > > > > > be appreciated. > > > > > > > > Mhm, do we really the the GPU scheduler for that? > > > > > > > > I mean in the 1 to 1 case you basically just need a component which > > > > collects the dependencies as dma_fence and if all of them are fulfilled > > > > schedules a work item. > > > > > > > > As long as the work item itself doesn't produce a dma_fence it can then > > > > still just wait for other none dma_fence dependencies. > > > > > > Yeah that's the important thing, for long-running jobs dependencies as > > > dma_fence should be totally fine. You're just not allowed to have any > > > outgoing dma_fences at all (except the magic preemption fence). > > > > > > > Then the work function could submit the work and wait for the result. > > > > > > > > The work item would then pretty much represent what you want, you can > > > > wait for it to finish and pass it alo
[PATCH 7/8] video/aperture: Only remove sysfb on the default vga pci device
Instead of calling aperture_remove_conflicting_devices() to remove the conflicting devices, just call aperture_detach_devices() to detach the device that matches the same PCI BAR / aperture range. The former is just a wrapper around the latter plus a sysfb_disable() call, and that call is now done in this function, but only for the primary device.

This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs"), where we remove the sysfb when loading a driver for an unrelated pci device, resulting in the user losing their efifb console or similar. Note that in practice this is only a problem with the nvidia blob, because that's the only gpu driver people might install which does not come with an fbdev driver of its own. For everyone else the real gpu driver will restore a working console.

Also note that in the referenced bug there's confusion that this same bug also happens on amdgpu. But that was just another amdgpu-specific regression, which just happened to happen at roughly the same time and with the same user-observable symptoms. That bug is fixed now, see https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15

Note that we should not have any such issues on non-pci multi-gpu systems, because I could only find two such cases:

- SoC with some external panel over spi or similar. These panel drivers do not use drm_aperture_remove_conflicting_framebuffers(), so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all this.

For the above reasons the cc: stable is just notional, this patch will need a backport and that's up to nvidia if they care enough.

v2: - Explain a bit better why other multi-gpu setups that aren't pci shouldn't have any issues with making all this fully pci specific.
v3 - polish commit message (Javier) Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs") Tested-by: Aaron Plattner Reviewed-by: Javier Martinez Canillas References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28 Signed-off-by: Daniel Vetter Cc: Aaron Plattner Cc: Javier Martinez Canillas Cc: Thomas Zimmermann Cc: Helge Deller Cc: Sam Ravnborg Cc: Alex Deucher Cc: # v5.19+ (if someone else does the backport) --- drivers/video/aperture.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index 8f1437339e49..2394c2d310f8 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na primary = pdev == vga_default_device(); + if (primary) + sysfb_disable(); + for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) { if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) continue; base = pci_resource_start(pdev, bar); size = pci_resource_len(pdev, bar); - ret = aperture_remove_conflicting_devices(base, size, name); - if (ret) - return ret; + aperture_detach_devices(base, size); } if (primary) { -- 2.40.0
[PATCH 8/8] fbdev: Simplify fb_is_primary_device for x86
vga_default_device really is supposed to cover all corners, at least for x86. Additionally checking for rom shadowing should be redundant, because the bios/fw only does that for the boot vga device. If this turns out to be wrong, then most likely that's a special case which should be added to the vgaarb code, not replicated all over.

Patch motivated by changes to the aperture helpers, which also had this open-coded in a bunch of places, and which are all removed in a clean-up series. This function here is just for selecting the default fbdev device for fbcon, but I figured for consistency it might be good to throw this patch in on top.

Note that the shadow rom check predates vgaarb, which was only wired up in 88674088d10c ("x86: Use vga_default_device() when determining whether an fb is primary"). That patch doesn't explain why we still fall back to the shadow rom check.

Signed-off-by: Daniel Vetter Cc: Daniel Vetter Cc: Helge Deller Cc: Daniel Vetter Cc: Javier Martinez Canillas Cc: Thomas Zimmermann Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Cc: "H. Peter Anvin" --- arch/x86/video/fbdev.c | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c index 9fd24846d094..5ec4eafbb981 100644 --- a/arch/x86/video/fbdev.c +++ b/arch/x86/video/fbdev.c @@ -14,26 +14,15 @@ int fb_is_primary_device(struct fb_info *info) { struct device *device = info->device; - struct pci_dev *default_device = vga_default_device(); struct pci_dev *pci_dev; - struct resource *res; if (!device || !dev_is_pci(device)) return 0; pci_dev = to_pci_dev(device); - if (default_device) { - if (pci_dev == default_device) - return 1; - return 0; - } - - res = pci_dev->resource + PCI_ROM_RESOURCE; - - if (res->flags & IORESOURCE_ROM_SHADOW) + if (pci_dev == vga_default_device()) return 1; - return 0; } EXPORT_SYMBOL(fb_is_primary_device); -- 2.40.0
[PATCH 6/8] video/aperture: Drop primary argument
With the preceding patches it's become defunct. Also I'm about to add a different boolean argument, so it's better to keep the confusion down to the absolute minimum.

v2: Since the hyperv_fb patch got dropped (it's only a pci device for gen1 vm, not for gen2) there is one leftover user in an actual driver left to touch.

Signed-off-by: Daniel Vetter Cc: Thomas Zimmermann Cc: Javier Martinez Canillas Cc: Helge Deller Cc: linux-fb...@vger.kernel.org Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Wei Liu Cc: Dexuan Cui Cc: linux-hyp...@vger.kernel.org --- drivers/gpu/drm/drm_aperture.c | 2 +- drivers/video/aperture.c| 7 +++ drivers/video/fbdev/hyperv_fb.c | 2 +- include/linux/aperture.h| 9 - 4 files changed, 9 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_aperture.c b/drivers/gpu/drm/drm_aperture.c index 697cffbfd603..5729f3bb4398 100644 --- a/drivers/gpu/drm/drm_aperture.c +++ b/drivers/gpu/drm/drm_aperture.c @@ -168,7 +168,7 @@ EXPORT_SYMBOL(devm_aperture_acquire_from_firmware); int drm_aperture_remove_conflicting_framebuffers(resource_size_t base, resource_size_t size, const struct drm_driver *req_driver) { - return aperture_remove_conflicting_devices(base, size, false, req_driver->name); + return aperture_remove_conflicting_devices(base, size, req_driver->name); } EXPORT_SYMBOL(drm_aperture_remove_conflicting_framebuffers); diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index ec9387d94049..8f1437339e49 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -43,7 +43,7 @@ * base = mem->start; * size = resource_size(mem); * - * ret = aperture_remove_conflicting_devices(base, size, false, "example"); + * ret = aperture_remove_conflicting_devices(base, size, "example"); * if (ret) * return ret; * @@ -274,7 +274,6 @@ static void aperture_detach_devices(resource_size_t base, resource_size_t size) * aperture_remove_conflicting_devices - remove devices in the given range * @base:
the aperture's base address in physical memory * @size: aperture size in bytes - * @primary: also kick vga16fb if present; only relevant for VGA devices * @name: a descriptive name of the requesting driver * * This function removes devices that own apertures within @base and @size. @@ -283,7 +282,7 @@ static void aperture_detach_devices(resource_size_t base, resource_size_t size) * 0 on success, or a negative errno code otherwise */ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t size, - bool primary, const char *name) + const char *name) { /* * If a driver asked to unregister a platform device registered by @@ -328,7 +327,7 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na base = pci_resource_start(pdev, bar); size = pci_resource_len(pdev, bar); - ret = aperture_remove_conflicting_devices(base, size, primary, name); + ret = aperture_remove_conflicting_devices(base, size, name); if (ret) return ret; } diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c index ec3f6cf05f8c..54f433e09ab8 100644 --- a/drivers/video/fbdev/hyperv_fb.c +++ b/drivers/video/fbdev/hyperv_fb.c @@ -1073,7 +1073,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info) info->screen_size = dio_fb_size; getmem_done: - aperture_remove_conflicting_devices(base, size, false, KBUILD_MODNAME); + aperture_remove_conflicting_devices(base, size, KBUILD_MODNAME); if (gen2vm) { /* framebuffer is reallocated, clear screen_info to avoid misuse from kexec */ diff --git a/include/linux/aperture.h b/include/linux/aperture.h index 442f15a57cad..7248727753be 100644 --- a/include/linux/aperture.h +++ b/include/linux/aperture.h @@ -14,7 +14,7 @@ int devm_aperture_acquire_for_platform_device(struct platform_device *pdev, resource_size_t size); int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t size, - bool primary, const char *name); + const char *name); int 
aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name); #else @@ -26,7 +26,7 @@ static inline int devm_aperture_acquire_for_platform_device(struct platform_devi } static inline int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t size, - bool primary, const char *name) +
[PATCH 5/8] video/aperture: Move vga handling to pci function
A few reasons for this:

- It's really the only one where this matters. I tried looking around, and I didn't find any non-pci vga-compatible controllers for x86 (since that's the only platform where we had this until a few patches ago), where a driver participating in the aperture claim dance would interfere.

- I also don't expect that any future bus anytime soon will look like anything but pci towards the OS; that's been the case for like 25+ years by now for practically everything (even on non-x86).

- Also it's a bit funny if we have one part of the vga removal in the pci function, and the other in the generic one.

v2: Rebase.

Signed-off-by: Daniel Vetter Cc: Thomas Zimmermann Cc: Javier Martinez Canillas Cc: Helge Deller Cc: linux-fb...@vger.kernel.org --- drivers/video/aperture.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index 552cffdb827b..ec9387d94049 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -298,14 +298,6 @@ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si aperture_detach_devices(base, size); - /* -* If this is the primary adapter, there could be a VGA device -* that consumes the VGA framebuffer I/O range. Remove this device -* as well. -*/ - if (primary) - aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE); - return 0; } EXPORT_SYMBOL(aperture_remove_conflicting_devices); @@ -342,6 +334,13 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na } if (primary) { + /* +* If this is the primary adapter, there could be a VGA device +* that consumes the VGA framebuffer I/O range. Remove this +* device as well. +*/ + aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE); + /* * WARNING: Apparently we must kick fbdev drivers before vgacon, * otherwise the vga fbdev driver falls over. -- 2.40.0
[PATCH 4/8] video/aperture: Only kick vgacon when the pdev is decoding vga
Otherwise it's a bit silly, and we might throw out the driver for the screen the user is actually looking at. I haven't found a bug report for this case yet, but we did get bug reports for the analogous case where we're throwing out the efifb driver.

v2: Flip the check around to make it clear it's a special case for kicking out the vgacon driver only (Thomas)

References: https://bugzilla.kernel.org/show_bug.cgi?id=216303 Signed-off-by: Daniel Vetter Cc: Thomas Zimmermann Cc: Javier Martinez Canillas Cc: Helge Deller Cc: linux-fb...@vger.kernel.org --- drivers/video/aperture.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index 8835d3bc39bf..552cffdb827b 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -341,13 +341,15 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na return ret; } - /* -* WARNING: Apparently we must kick fbdev drivers before vgacon, -* otherwise the vga fbdev driver falls over. -*/ - ret = vga_remove_vgacon(pdev); - if (ret) - return ret; + if (primary) { + /* +* WARNING: Apparently we must kick fbdev drivers before vgacon, +* otherwise the vga fbdev driver falls over. +*/ + ret = vga_remove_vgacon(pdev); + if (ret) + return ret; + } return 0; -- 2.40.0
[PATCH 2/8] video/aperture: use generic code to figure out the vga default device
Since vgaarb has been promoted to be a core piece of the pci subsystem we don't have to open-code random guesses anymore, we actually know this in a platform-agnostic way, and there's no need for an x86-specific hack. See also 1d38fe6ee6a8 ("PCI/VGA: Move vgaarb to drivers/pci").

This should not result in any functional change, and the non-x86 multi-gpu pci systems are probably rare enough to not matter (I don't know of any tbh). But it's a nice cleanup, so let's do it.

There have been a few questions on previous iterations on dri-devel and irc:

- fb_is_primary_device() seems to be yet another implementation of this theme, and at least on x86 it checks for both vga_default_device OR rom shadowing. There shouldn't ever be a case where rom shadowing gives any additional hints about the boot vga device, but if there is then the default vga selection in vgaarb should probably be fixed, and not special-case checks replicated all over.

- Thomas also brought up that on most !x86 systems fb_is_primary_device() returns 0, except on sparc/parisc. But these 2 special cases are about platform-specific devices and not pci, so shouldn't have any interactions.

- Furthermore fb_is_primary_device() is a bit of a red herring since it's only used to select the right fbdev driver for fbcon, and not for the fw handover dance which the aperture helpers handle. At least for x86 we might want to look into unifying them, but that's a separate thing.

v2: Extend commit message trying to summarize various discussions.
Signed-off-by: Daniel Vetter Cc: Thomas Zimmermann Cc: Javier Martinez Canillas Cc: Helge Deller Cc: linux-fb...@vger.kernel.org Cc: Bjorn Helgaas Cc: linux-...@vger.kernel.org --- drivers/video/aperture.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index b009468ffdff..8835d3bc39bf 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -324,13 +324,11 @@ EXPORT_SYMBOL(aperture_remove_conflicting_devices); */ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name) { - bool primary = false; + bool primary; resource_size_t base, size; int bar, ret; -#ifdef CONFIG_X86 - primary = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW; -#endif + primary = pdev == vga_default_device(); for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) { if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) -- 2.40.0
[PATCH 3/8] drm/aperture: Remove primary argument
Really only pci devices have any business setting this - it's for figuring out whether the legacy vga stuff should be nuked too. And with the preceding two patches those are all using the pci version of this. Which means for all other callers primary == false and we can remove it now.

v2: - Reorder to avoid compile fail (Thomas) - Include gma500, which retained its call to the non-pci version.

Signed-off-by: Daniel Vetter Cc: Thomas Zimmermann Cc: Javier Martinez Canillas Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Deepak Rawat Cc: Neil Armstrong Cc: Kevin Hilman Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: Thierry Reding Cc: Jonathan Hunter Cc: Emma Anholt Cc: Helge Deller Cc: David Airlie Cc: Daniel Vetter Cc: linux-hyp...@vger.kernel.org Cc: linux-amlo...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-te...@vger.kernel.org Cc: linux-fb...@vger.kernel.org --- drivers/gpu/drm/arm/hdlcd_drv.c | 2 +- drivers/gpu/drm/armada/armada_drv.c | 2 +- drivers/gpu/drm/drm_aperture.c | 11 +++ drivers/gpu/drm/gma500/psb_drv.c| 2 +- drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 1 - drivers/gpu/drm/meson/meson_drv.c | 2 +- drivers/gpu/drm/msm/msm_fbdev.c | 2 +- drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 2 +- drivers/gpu/drm/stm/drv.c | 2 +- drivers/gpu/drm/sun4i/sun4i_drv.c | 2 +- drivers/gpu/drm/tegra/drm.c | 2 +- drivers/gpu/drm/vc4/vc4_drv.c | 2 +- include/drm/drm_aperture.h | 7 +++ 13 files changed, 16 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c index 9020bf820bc8..12f5a2c7f03d 100644 --- a/drivers/gpu/drm/arm/hdlcd_drv.c +++ b/drivers/gpu/drm/arm/hdlcd_drv.c @@ -285,7 +285,7 @@ static int hdlcd_drm_bind(struct device *dev) */ if (hdlcd_read(hdlcd, HDLCD_REG_COMMAND)) { hdlcd_write(hdlcd, HDLCD_REG_COMMAND, 0); - drm_aperture_remove_framebuffers(false, &hdlcd_driver); + drm_aperture_remove_framebuffers(&hdlcd_driver); } drm_mode_config_reset(drm); diff --git 
a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c index 0643887800b4..c99ec7078301 100644 --- a/drivers/gpu/drm/armada/armada_drv.c +++ b/drivers/gpu/drm/armada/armada_drv.c @@ -95,7 +95,7 @@ static int armada_drm_bind(struct device *dev) } /* Remove early framebuffers */ - ret = drm_aperture_remove_framebuffers(false, &armada_drm_driver); + ret = drm_aperture_remove_framebuffers(&armada_drm_driver); if (ret) { dev_err(dev, "[" DRM_NAME ":%s] can't kick out simple-fb: %d\n", __func__, ret); diff --git a/drivers/gpu/drm/drm_aperture.c b/drivers/gpu/drm/drm_aperture.c index 3b8fdeeafd53..697cffbfd603 100644 --- a/drivers/gpu/drm/drm_aperture.c +++ b/drivers/gpu/drm/drm_aperture.c @@ -32,17 +32,13 @@ * * static int remove_conflicting_framebuffers(struct pci_dev *pdev) * { - * bool primary = false; * resource_size_t base, size; * int ret; * * base = pci_resource_start(pdev, 0); * size = pci_resource_len(pdev, 0); - * #ifdef CONFIG_X86 - * primary = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW; - * #endif * - * return drm_aperture_remove_conflicting_framebuffers(base, size, primary, + * return drm_aperture_remove_conflicting_framebuffers(base, size, * &example_driver); * } * @@ -161,7 +157,6 @@ EXPORT_SYMBOL(devm_aperture_acquire_from_firmware); * drm_aperture_remove_conflicting_framebuffers - remove existing framebuffers in the given range * @base: the aperture's base address in physical memory * @size: aperture size in bytes - * @primary: also kick vga16fb if present * @req_driver: requesting DRM driver * * This function removes graphics device drivers which use the memory range described by @@ -171,9 +166,9 @@ EXPORT_SYMBOL(devm_aperture_acquire_from_firmware); * 0 on success, or a negative errno code otherwise */ int drm_aperture_remove_conflicting_framebuffers(resource_size_t base, resource_size_t size, -bool primary, const struct drm_driver *req_driver) +const struct drm_driver *req_driver) { - return 
aperture_remove_conflicting_devices(base, size, primary, req_driver->name); + return aperture_remove_conflicting_devices(base, size, false, req_driver->name); } EXPORT_SYMBOL(drm_aperture_remove_conflicting_framebuffers); diff --git a/drivers/gpu/drm/gma500/psb_drv.c b/drivers/gpu/drm/gma500/psb_drv.c index f1e0eed8fea4..4bb06a89e48d 10064
[PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
This one nukes all framebuffers, which is a bit much. In reality gma500 is an igpu and never shipped with anything discrete, so there should not be any difference.

v2: Unfortunately the framebuffer sits outside of the pci bars for gma500, and so only using the pci helpers won't be enough. Otoh if we only use the non-pci helper, then we don't get the vga handling, and subsequent refactoring to untangle these special cases won't work. It's not pretty, but the simplest fix (since gma500 really is the only quirky pci driver like this we have) is to just have both calls.

Signed-off-by: Daniel Vetter Cc: Patrik Jakobsson Cc: Thomas Zimmermann Cc: Javier Martinez Canillas --- drivers/gpu/drm/gma500/psb_drv.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/gma500/psb_drv.c b/drivers/gpu/drm/gma500/psb_drv.c index 2ce96b1b9c74..f1e0eed8fea4 100644 --- a/drivers/gpu/drm/gma500/psb_drv.c +++ b/drivers/gpu/drm/gma500/psb_drv.c @@ -422,12 +422,17 @@ static int psb_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent) /* * We cannot yet easily find the framebuffer's location in memory. So -* remove all framebuffers here. +* remove all framebuffers here. Note that we still want the pci special +* handling to kick out vgacon. * * TODO: Refactor psb_driver_load() to map vdc_reg earlier. Then we * might be able to read the framebuffer range from the device. */ - ret = drm_aperture_remove_framebuffers(true, &driver); + ret = drm_aperture_remove_framebuffers(false, &driver); + if (ret) + return ret; + + ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &driver); if (ret) return ret; -- 2.40.0
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, 4 Apr 2023 at 22:04, Matthew Brost wrote: > > On Tue, Apr 04, 2023 at 09:00:59PM +0200, Daniel Vetter wrote: > > On Tue, 4 Apr 2023 at 15:10, Christian König > > wrote: > > > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > > Hi, Christian, > > > > > > > > On 4/4/23 11:09, Christian König wrote: > > > >> Am 04.04.23 um 02:22 schrieb Matthew Brost: > > > >>> From: Thomas Hellström > > > >>> > > > >>> For long-running workloads, drivers either need to open-code > > > >>> completion > > > >>> waits, invent their own synchronization primitives or internally use > > > >>> dma-fences that do not obey the cross-driver dma-fence protocol, but > > > >>> without any lockdep annotation all these approaches are error prone. > > > >>> > > > >>> So since for example the drm scheduler uses dma-fences it is > > > >>> desirable for > > > >>> a driver to be able to use it for throttling and error handling also > > > >>> with > > > >>> internal dma-fences tha do not obey the cros-driver dma-fence > > > >>> protocol. > > > >>> > > > >>> Introduce long-running completion fences in form of dma-fences, and > > > >>> add > > > >>> lockdep annotation for them. In particular: > > > >>> > > > >>> * Do not allow waiting under any memory management locks. > > > >>> * Do not allow to attach them to a dma-resv object. > > > >>> * Introduce a new interface for adding callbacks making the helper > > > >>> adding > > > >>>a callback sign off on that it is aware that the dma-fence may not > > > >>>complete anytime soon. Typically this will be the scheduler > > > >>> chaining > > > >>>a new long-running fence on another one. > > > >> > > > >> Well that's pretty much what I tried before: > > > >> https://lwn.net/Articles/893704/ > > > >> > > > >> And the reasons why it was rejected haven't changed. > > > >> > > > >> Regards, > > > >> Christian. 
> > > >> > > > > Yes, TBH this was mostly to get discussion going how we'd best tackle > > > > this problem while being able to reuse the scheduler for long-running > > > > workloads. > > > > > > > > I couldn't see any clear decision on your series, though, but one main > > > > difference I see is that this is intended for driver-internal use > > > > only. (I'm counting using the drm_scheduler as a helper for > > > > driver-private use). This is by no means a way to try to tackle the > > > > indefinite fence problem. > > > > > > Well this was just my latest try to tackle this, but essentially the > > > problems are the same as with your approach: When we express such > > > operations as dma_fence there is always the chance that we leak that > > > somewhere. > > > > > > My approach of adding a flag noting that this operation is dangerous and > > > can't be synced with something memory management depends on tried to > > > contain this as much as possible, but Daniel still pretty clearly > > > rejected it (for good reasons I think). > > > > Yeah I still don't like dma_fence that somehow have totally different > > semantics in that critical piece of "will it complete or will it > > deadlock?" :-) > > Not going to touch LR dma-fences in this reply, I think we can continue > the LR fence discussion in the fork of this thread I just responded to. > Have a response to the preempt fence discussion below. > > > > > > > > > > > > We could ofc invent a completely different data-type that abstracts > > > > the synchronization the scheduler needs in the long-running case, or > > > > each driver could hack something up, like sleeping in the > > > > prepare_job() or run_job() callback for throttling, but those waits > > > > should still be annotated one way or another > > > > (and probably in a similar way across drivers) to make sure we don't > > > > do anything bad. 
> > > > > > > > So any suggestions as to what would be the better solution here would > > > > be appreciated. > > > > > > Mhm, do we really need the GPU scheduler for that? > > > > > > I mean in the 1 to 1 case you basically just need a component which > > > collects the dependencies as dma_fence and if all of them are fulfilled > > > schedules a work item. > > > > > > As long as the work item itself doesn't produce a dma_fence it can then > > > still just wait for other non-dma_fence dependencies. > > > > Yeah that's the important thing, for long-running jobs dependencies as > > dma_fence should be totally fine. You're just not allowed to have any > > outgoing dma_fences at all (except the magic preemption fence). > > > > > Then the work function could submit the work and wait for the result. > > > > > > The work item would then pretty much represent what you want, you can > > > wait for it to finish and pass it along as long running dependency. > > > > > > Maybe give it a funky name and wrap it up in a structure, but that's > > > basically it. > > > > Like do we need this? If the kernel ever waits for a long-running > > compute job to finish I'd call that a bug. Any functional > > dependencies
[PATCH RESEND v3 1/3] drm/ttm/pool: Fix ttm_pool_alloc error path
When hitting an error, the error path forgot to unmap dma mappings and could call set_pages_wb() on already uncached pages. Fix this by introducing a common ttm_pool_free_range() function that does the right thing. v2: - Simplify that common function (Christian König) v3: - Rename that common function to ttm_pool_free_range() (Christian König) Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3") Cc: Christian König Cc: Dave Airlie Cc: Christian Koenig Cc: Huang Rui Cc: dri-devel@lists.freedesktop.org Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_pool.c | 81 +- 1 file changed, 51 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index aa116a7bbae3..dfce896c4bae 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -367,6 +367,43 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order, return 0; } +/** + * ttm_pool_free_range() - Free a range of TTM pages + * @pool: The pool used for allocating. + * @tt: The struct ttm_tt holding the page pointers. + * @caching: The page caching mode used by the range. + * @start_page: index for first page to free. + * @end_page: index for last page to free + 1. + * + * During allocation the ttm_tt page-vector may be populated with ranges of + * pages with different attributes if allocation hit an error without being + * able to completely fulfill the allocation. This function can be used + * to free these individual ranges. 
+ */ +static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt, + enum ttm_caching caching, + pgoff_t start_page, pgoff_t end_page) +{ + struct page **pages = tt->pages; + unsigned int order; + pgoff_t i, nr; + + for (i = start_page; i < end_page; i += nr, pages += nr) { + struct ttm_pool_type *pt = NULL; + + order = ttm_pool_page_order(pool, *pages); + nr = (1UL << order); + if (tt->dma_address) + ttm_pool_unmap(pool, tt->dma_address[i], nr); + + pt = ttm_pool_select_type(pool, caching, order); + if (pt) + ttm_pool_type_give(pt, *pages); + else + ttm_pool_free_page(pool, caching, order, *pages); + } +} + /** * ttm_pool_alloc - Fill a ttm_tt object * @@ -382,12 +419,14 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order, int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, struct ttm_operation_ctx *ctx) { - unsigned long num_pages = tt->num_pages; + pgoff_t num_pages = tt->num_pages; dma_addr_t *dma_addr = tt->dma_address; struct page **caching = tt->pages; struct page **pages = tt->pages; + enum ttm_caching page_caching; gfp_t gfp_flags = GFP_USER; - unsigned int i, order; + pgoff_t caching_divide; + unsigned int order; struct page *p; int r; @@ -410,6 +449,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, order = min_t(unsigned int, order, __fls(num_pages))) { struct ttm_pool_type *pt; + page_caching = tt->caching; pt = ttm_pool_select_type(pool, tt->caching, order); p = pt ? 
ttm_pool_type_take(pt) : NULL; if (p) { @@ -418,6 +458,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, if (r) goto error_free_page; + caching = pages; do { r = ttm_pool_page_allocated(pool, order, p, &dma_addr, @@ -426,14 +467,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, if (r) goto error_free_page; + caching = pages; if (num_pages < (1 << order)) break; p = ttm_pool_type_take(pt); } while (p); - caching = pages; } + page_caching = ttm_cached; while (num_pages >= (1 << order) && (p = ttm_pool_alloc_page(pool, gfp_flags, order))) { @@ -442,6 +484,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, tt->caching); if (r) goto error_free_page; + caching = pages; } r = ttm_pool_page_allocated(pool, order
[PATCH RESEND v3 3/3] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting
When swapping in, or under memory pressure ttm_tt_populate() may sleep for a substantial amount of time. Allow interrupts during the sleep. This will also allow us to inject -EINTR errors during swapin in upcoming patches. Also avoid returning VM_FAULT_OOM, since that will confuse the core mm, making it print a confusing message and retry the fault. Return VM_FAULT_SIGBUS also under OOM conditions. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index ca7744b852f5..4bca6b54520a 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -218,14 +218,21 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, prot = ttm_io_prot(bo, bo->resource, prot); if (!bo->resource->bus.is_iomem) { struct ttm_operation_ctx ctx = { - .interruptible = false, + .interruptible = true, .no_wait_gpu = false, .force_alloc = true }; ttm = bo->ttm; - if (ttm_tt_populate(bdev, bo->ttm, &ctx)) - return VM_FAULT_OOM; + err = ttm_tt_populate(bdev, bo->ttm, &ctx); + if (err) { + if (err == -EINTR || err == -ERESTARTSYS || + err == -EAGAIN) + return VM_FAULT_NOPAGE; + + pr_debug("TTM fault hit %pe.\n", ERR_PTR(err)); + return VM_FAULT_SIGBUS; + } } else { /* Iomem should not be marked encrypted */ prot = pgprot_decrypted(prot); -- 2.39.2
[PATCH RESEND v3 2/3] drm/ttm: Reduce the number of used allocation orders for TTM pages
When swapping out, we will split multi-order pages both in order to move them to the swap-cache and to be able to return memory to the swap cache as soon as possible on a page-by-page basis. Reduce the page max order to the system PMD size, as we can then be nicer to the system and avoid splitting gigantic pages. Looking forward to when we might be able to swap out PMD size folios without splitting, this will also be a benefit. v2: - Include all orders up to the PMD size (Christian König) v3: - Avoid compilation errors for architectures with special PFN_SHIFTs Signed-off-by: Thomas Hellström Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_pool.c | 30 +++--- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index dfce896c4bae..18c342a919a2 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -47,6 +47,11 @@ #include "ttm_module.h" +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT) +#define __TTM_DIM_ORDER (TTM_MAX_ORDER + 1) +/* Some architectures have a weird PMD_SHIFT */ +#define TTM_DIM_ORDER (__TTM_DIM_ORDER <= MAX_ORDER ? 
__TTM_DIM_ORDER : MAX_ORDER) + /** * struct ttm_pool_dma - Helper object for coherent DMA mappings * @@ -65,11 +70,11 @@ module_param(page_pool_size, ulong, 0644); static atomic_long_t allocated_pages; -static struct ttm_pool_type global_write_combined[MAX_ORDER]; -static struct ttm_pool_type global_uncached[MAX_ORDER]; +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER]; +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER]; -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER]; -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER]; +static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER]; +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER]; static spinlock_t shrinker_lock; static struct list_head shrinker_list; @@ -444,7 +449,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, else gfp_flags |= GFP_HIGHUSER; - for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages)); + for (order = min_t(unsigned int, TTM_MAX_ORDER, __fls(num_pages)); num_pages; order = min_t(unsigned int, order, __fls(num_pages))) { struct ttm_pool_type *pt; @@ -563,7 +568,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev, if (use_dma_alloc) { for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) - for (j = 0; j < MAX_ORDER; ++j) + for (j = 0; j < TTM_DIM_ORDER; ++j) ttm_pool_type_init(&pool->caching[i].orders[j], pool, i, j); } @@ -583,7 +588,7 @@ void ttm_pool_fini(struct ttm_pool *pool) if (pool->use_dma_alloc) { for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) - for (j = 0; j < MAX_ORDER; ++j) + for (j = 0; j < TTM_DIM_ORDER; ++j) ttm_pool_type_fini(&pool->caching[i].orders[j]); } @@ -637,7 +642,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m) unsigned int i; seq_puts(m, "\t "); - for (i = 0; i < MAX_ORDER; ++i) + for (i = 0; i < TTM_DIM_ORDER; ++i) seq_printf(m, " ---%2u---", i); seq_puts(m, "\n"); } @@ -648,7 +653,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt, { 
unsigned int i; - for (i = 0; i < MAX_ORDER; ++i) + for (i = 0; i < TTM_DIM_ORDER; ++i) seq_printf(m, " %8u", ttm_pool_type_count(&pt[i])); seq_puts(m, "\n"); } @@ -751,13 +756,16 @@ int ttm_pool_mgr_init(unsigned long num_pages) { unsigned int i; + BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER); + BUILD_BUG_ON(TTM_DIM_ORDER < 1); + if (!page_pool_size) page_pool_size = num_pages; spin_lock_init(&shrinker_lock); INIT_LIST_HEAD(&shrinker_list); - for (i = 0; i < MAX_ORDER; ++i) { + for (i = 0; i < TTM_DIM_ORDER; ++i) { ttm_pool_type_init(&global_write_combined[i], NULL, ttm_write_combined, i); ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i); @@ -790,7 +798,7 @@ void ttm_pool_mgr_fini(void) { unsigned int i; - for (i = 0; i < MAX_ORDER; ++i) { + for (i = 0; i < TTM_DIM_ORDER; ++i) { ttm_pool_type_fini(&global_write_combined[i]); ttm_pool_type_fini(&global_uncached[i]); -- 2.39.2
[PATCH RESEND v3 0/3] drm/ttm: Small fixes / cleanups in prep for shrinking
I collected the, from my POV, uncontroversial patches from V1 of the TTM shrinker series, some corrected after the initial patch submission, one patch added from the Xe RFC ("drm/ttm: Don't print error message if eviction was interrupted"). It would be nice to have these reviewed and merged while reworking the rest. v2: - Simplify __ttm_pool_free(). - Fix the TTM_TT_FLAG bit numbers. - Keep all allocation orders for TTM pages at or below PMD order v3: - Rename __ttm_pool_free() to ttm_pool_free_range(). Document. - Compile-fix. Thomas Hellström (3): drm/ttm/pool: Fix ttm_pool_alloc error path drm/ttm: Reduce the number of used allocation orders for TTM pages drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting drivers/gpu/drm/ttm/ttm_bo_vm.c | 13 +++- drivers/gpu/drm/ttm/ttm_pool.c | 111 2 files changed, 80 insertions(+), 44 deletions(-) -- 2.39.2
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, Apr 04, 2023 at 09:00:59PM +0200, Daniel Vetter wrote: > On Tue, 4 Apr 2023 at 15:10, Christian König wrote: > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > Hi, Christian, > > > > > > On 4/4/23 11:09, Christian König wrote: > > >> Am 04.04.23 um 02:22 schrieb Matthew Brost: > > >>> From: Thomas Hellström > > >>> > > >>> For long-running workloads, drivers either need to open-code completion > > >>> waits, invent their own synchronization primitives or internally use > > >>> dma-fences that do not obey the cross-driver dma-fence protocol, but > > >>> without any lockdep annotation all these approaches are error prone. > > >>> > > >>> So since for example the drm scheduler uses dma-fences it is > > >>> desirable for > > >>> a driver to be able to use it for throttling and error handling also > > >>> with > > >>> internal dma-fences tha do not obey the cros-driver dma-fence protocol. > > >>> > > >>> Introduce long-running completion fences in form of dma-fences, and add > > >>> lockdep annotation for them. In particular: > > >>> > > >>> * Do not allow waiting under any memory management locks. > > >>> * Do not allow to attach them to a dma-resv object. > > >>> * Introduce a new interface for adding callbacks making the helper > > >>> adding > > >>>a callback sign off on that it is aware that the dma-fence may not > > >>>complete anytime soon. Typically this will be the scheduler chaining > > >>>a new long-running fence on another one. > > >> > > >> Well that's pretty much what I tried before: > > >> https://lwn.net/Articles/893704/ > > >> > > >> And the reasons why it was rejected haven't changed. > > >> > > >> Regards, > > >> Christian. > > >> > > > Yes, TBH this was mostly to get discussion going how we'd best tackle > > > this problem while being able to reuse the scheduler for long-running > > > workloads. 
> > > > > > I couldn't see any clear decision on your series, though, but one main > > > difference I see is that this is intended for driver-internal use > > > only. (I'm counting using the drm_scheduler as a helper for > > > driver-private use). This is by no means a way to try to tackle the > > > indefinite fence problem. > > > > Well this was just my latest try to tackle this, but essentially the > > problems are the same as with your approach: When we express such > > operations as dma_fence there is always the chance that we leak that > > somewhere. > > > > My approach of adding a flag noting that this operation is dangerous and > > can't be synced with something memory management depends on tried to > > contain this as much as possible, but Daniel still pretty clearly > > rejected it (for good reasons I think). > > Yeah I still don't like dma_fence that somehow have totally different > semantics in that critical piece of "will it complete or will it > deadlock?" :-) Not going to touch LR dma-fences in this reply, I think we can continue the LR fence discussion in the fork of this thread I just responded to. Have a response to the preempt fence discussion below. > > > > > > > > We could ofc invent a completely different data-type that abstracts > > > the synchronization the scheduler needs in the long-running case, or > > > each driver could hack something up, like sleeping in the > > > prepare_job() or run_job() callback for throttling, but those waits > > > should still be annotated one way or another > > > (and probably in a similar way across drivers) to make sure we don't > > > do anything bad. > > > > > > So any suggestions as to what would be the better solution here would > > > be appreciated. > > > > Mhm, do we really need the GPU scheduler for that? > > > > I mean in the 1 to 1 case you basically just need a component which > > collects the dependencies as dma_fence and if all of them are fulfilled > > schedules a work item. 
> > > > As long as the work item itself doesn't produce a dma_fence it can then > > still just wait for other non-dma_fence dependencies. > > Yeah that's the important thing, for long-running jobs dependencies as > dma_fence should be totally fine. You're just not allowed to have any > outgoing dma_fences at all (except the magic preemption fence). > > > Then the work function could submit the work and wait for the result. > > > > The work item would then pretty much represent what you want, you can > > wait for it to finish and pass it along as long running dependency. > > > > Maybe give it a funky name and wrap it up in a structure, but that's > > basically it. > > Like do we need this? If the kernel ever waits for a long-running > compute job to finish I'd call that a bug. Any functional > dependencies between engines or whatever are userspace's problem only, > which it needs to sort out using userspace memory fences. > > The only things the kernel needs are some way to track dependencies as > dma_fence (because memory management may move the memory away and we need > to move it back in, ideally pipelined)
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, Apr 04, 2023 at 09:25:52PM +0200, Daniel Vetter wrote: > On Tue, Apr 04, 2023 at 07:02:23PM +, Matthew Brost wrote: > > On Tue, Apr 04, 2023 at 08:14:01PM +0200, Thomas Hellström (Intel) wrote: > > > > > > On 4/4/23 15:10, Christian König wrote: > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > > > Hi, Christian, > > > > > > > > > > On 4/4/23 11:09, Christian König wrote: > > > > > > Am 04.04.23 um 02:22 schrieb Matthew Brost: > > > > > > > From: Thomas Hellström > > > > > > > > > > > > > > For long-running workloads, drivers either need to open-code > > > > > > > completion > > > > > > > waits, invent their own synchronization primitives or internally > > > > > > > use > > > > > > > dma-fences that do not obey the cross-driver dma-fence protocol, > > > > > > > but > > > > > > > without any lockdep annotation all these approaches are error > > > > > > > prone. > > > > > > > > > > > > > > So since for example the drm scheduler uses dma-fences it is > > > > > > > desirable for > > > > > > > a driver to be able to use it for throttling and error > > > > > > > handling also with > > > > > > > internal dma-fences tha do not obey the cros-driver > > > > > > > dma-fence protocol. > > > > > > > > > > > > > > Introduce long-running completion fences in form of > > > > > > > dma-fences, and add > > > > > > > lockdep annotation for them. In particular: > > > > > > > > > > > > > > * Do not allow waiting under any memory management locks. > > > > > > > * Do not allow to attach them to a dma-resv object. > > > > > > > * Introduce a new interface for adding callbacks making the > > > > > > > helper adding > > > > > > > a callback sign off on that it is aware that the dma-fence may > > > > > > > not > > > > > > > complete anytime soon. Typically this will be the > > > > > > > scheduler chaining > > > > > > > a new long-running fence on another one. 
> > > > > > > > > > > > Well that's pretty much what I tried before: > > > > > > https://lwn.net/Articles/893704/ > > > > > > > > > > I don't think this is quite the same, this explicitly enforces that we don't > > break the dma-fence rules (in path of memory allocations, exported in > > any way), essentially this is just a SW sync point reusing the dma-fence > > infrastructure for signaling / callbacks. I believe your series tried to > > export these fences to user space (admittedly I haven't fully read your > > series). > > > > In this use case we essentially just want to flow control the ring via > > the dma-scheduler + maintain a list of pending jobs so the TDR can be > > used for cleanup if an LR entity encounters an error. To me this seems > > perfectly reasonable but I know dma-fence rules are akin to a holy war. > > > > If we return NULL in run_job, now we have to be able to sink all jobs > > in the backend regardless of ring space, maintain a list of jobs pending > > for cleanup after errors, and write a different cleanup path as now the > > TDR doesn't work. Seems very, very silly to duplicate all of this code > > when the DRM scheduler provides all of this for us. Also if we go this > > route, now all drivers are going to invent ways to handle LR jobs /w the > > DRM scheduler. > > > > This solution is pretty clear, mark the scheduler as LR, and don't > > export any fences from the scheduler. If you try to export these fences > > a blow up happens. > > The problem is if you mix things up. Like for resets you need all the > schedulers on an engine/set-of-engines to quiesce or things get > potentially hilarious. If you now have a scheduler in forever limbo, the > dma_fence guarantees are right out the window. 
> Right, a GT reset on Xe is: stop all schedulers, do a reset, ban any schedulers which we think caused the GT reset, resubmit all schedulers which we think were good, restart all schedulers. None of this flow depends on LR dma-fences, all of this uses the DRM sched infrastructure and works very well compared to the i915. Rewriting all this with a driver specific implementation is what we are trying to avoid. Similarly if an LR entity hangs on its own (not a GT reset, rather the firmware does the reset for us) we use all the DRM scheduler infrastructure to handle this. Again this works rather well... > But the issue you're having is fairly specific if it's just about > ringspace. I think the dumbest fix is to just block in submit if you run > out of per-ctx ringspace, and call it a day. This notion that somehow the How does that not break the dma-fence rules? A job can publish its finished fence after ARM, if the finished fence waits on ring space that may not free up in a reasonable amount of time we now have broken the dma-fence rules. My understanding is any dma-fence must only depend on other dma-fences, Christian seems to agree and NAK'd just blocking if no space available [1]. IMO this series ensures we don't break dma-fence rules by restricting how the finished fence can be used. > kernel is supposed to provide a bottomle
[PATCH 2/3] drm/fb-helper: drop redundant pixclock check from drm_fb_helper_set_par()
The fb_check_var hook is supposed to validate all this stuff. Any errors from fb_set_par are considered driver/hw issues that result in dmesg warnings. Luckily we do fix up the pixclock already, so this is all fine. Signed-off-by: Daniel Vetter Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann --- drivers/gpu/drm/drm_fb_helper.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index eb4ece3e0027..b9696d13a741 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1647,11 +1647,6 @@ int drm_fb_helper_set_par(struct fb_info *info) if (oops_in_progress) return -EBUSY; - if (var->pixclock != 0) { - drm_err(fb_helper->dev, "PIXEL CLOCK SET\n"); - return -EINVAL; - } - /* * Normally we want to make sure that a kms master takes precedence over * fbdev, to avoid fbdev flickering and occasionally stealing the -- 2.40.0
[PATCH 1/3] drm/fb-helper: set x/yres_virtual in drm_fb_helper_check_var
Drivers are supposed to fix this up if needed if they don't outright reject it. Uncovered by 6c11df58fd1a ("fbmem: Check virtual screen sizes in fb_set_var()"). Reported-by: syzbot+20dcf81733d43ddff...@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=c5faf983bfa4a607de530cd3bb00bf06cefc Cc: sta...@vger.kernel.org # v5.4+ Cc: Daniel Vetter Cc: Javier Martinez Canillas Cc: Thomas Zimmermann Signed-off-by: Daniel Vetter --- drivers/gpu/drm/drm_fb_helper.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index 59409820f424..eb4ece3e0027 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1595,6 +1595,9 @@ int drm_fb_helper_check_var(struct fb_var_screeninfo *var, return -EINVAL; } + var->xres_virtual = fb->width; + var->yres_virtual = fb->height; + /* * Workaround for SDL 1.2, which is known to be setting all pixel format * fields values to zero in some cases. We treat this situation as a -- 2.40.0
[PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var
Apparently drivers need to check all this stuff themselves, which for most things makes sense I guess. And for everything else we luck out, because modern distros stopped supporting any other fbdev drivers than drm ones and I really don't want to argue anymore about who needs to check stuff. Therefore fixing all this just for drm fbdev emulation is good enough. Note that var->active is not set or validated. This is just control flow for fbmem.c and needs to be validated in there as needed. Signed-off-by: Daniel Vetter Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann --- drivers/gpu/drm/drm_fb_helper.c | 49 + 1 file changed, 38 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index b9696d13a741..ef4eb8b12766 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1543,6 +1543,27 @@ static void drm_fb_helper_fill_pixel_fmt(struct fb_var_screeninfo *var, } } +static void __fill_var(struct fb_var_screeninfo *var, + struct drm_framebuffer *fb) +{ + int i; + + var->xres_virtual = fb->width; + var->yres_virtual = fb->height; + var->accel_flags = FB_ACCELF_TEXT; + var->bits_per_pixel = drm_format_info_bpp(fb->format, 0); + + var->height = var->width = 0; + var->left_margin = var->right_margin = 0; + var->upper_margin = var->lower_margin = 0; + var->hsync_len = var->vsync_len = 0; + var->sync = var->vmode = 0; + var->rotate = 0; + var->colorspace = 0; + for (i = 0; i < 4; i++) + var->reserved[i] = 0; +} + /** * drm_fb_helper_check_var - implementation for &fb_ops.fb_check_var * @var: screeninfo to check @@ -1595,8 +1616,22 @@ int drm_fb_helper_check_var(struct fb_var_screeninfo *var, return -EINVAL; } - var->xres_virtual = fb->width; - var->yres_virtual = fb->height; + __fill_var(var, fb); + + /* +* fb_pan_display() validates this, but fb_set_par() doesn't and just +* falls over. Note that __fill_var above adjusts y/res_virtual. 
+*/ + if (var->yoffset > var->yres_virtual - var->yres || + var->xoffset > var->xres_virtual - var->xres) + return -EINVAL; + + /* We neither support grayscale nor FOURCC (also stored in here). */ + if (var->grayscale > 0) + return -EINVAL; + + if (var->nonstd) + return -EINVAL; /* * Workaround for SDL 1.2, which is known to be setting all pixel format @@ -1612,11 +1647,6 @@ int drm_fb_helper_check_var(struct fb_var_screeninfo *var, drm_fb_helper_fill_pixel_fmt(var, format); } - /* -* Likewise, bits_per_pixel should be rounded up to a supported value. -*/ - var->bits_per_pixel = bpp; - /* * drm fbdev emulation doesn't support changing the pixel format at all, * so reject all pixel format changing requests. @@ -2040,12 +2070,9 @@ static void drm_fb_helper_fill_var(struct fb_info *info, } info->pseudo_palette = fb_helper->pseudo_palette; - info->var.xres_virtual = fb->width; - info->var.yres_virtual = fb->height; - info->var.bits_per_pixel = drm_format_info_bpp(format, 0); - info->var.accel_flags = FB_ACCELF_TEXT; info->var.xoffset = 0; info->var.yoffset = 0; + __fill_var(&info->var, fb); info->var.activate = FB_ACTIVATE_NOW; drm_fb_helper_fill_pixel_fmt(&info->var, format); -- 2.40.0
[PATCH] fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
This is an oversight from dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore") - I failed to realize that nasty userspace could set this. It's not pretty to mix up kernel-internal and userspace uapi flags like this, but since the entire fb_var_screeninfo structure is uapi we'd need to either add a new parameter to the ->fb_set_par callback and fb_set_par() function, which has a _lot_ of users. Or some other fairly ugly side-channel int fb_info. Neither is a pretty prospect. Instead just correct the issue at hand by filtering out this kernel-internal flag in the ioctl handling code. Signed-off-by: Daniel Vetter Fixes: dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore") Cc: Alex Deucher Cc: shl...@fastmail.com Cc: Michel Dänzer Cc: Noralf Trønnes Cc: Thomas Zimmermann Cc: Daniel Vetter Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Cc: # v5.7+ Cc: Bartlomiej Zolnierkiewicz Cc: Geert Uytterhoeven Cc: Nathan Chancellor Cc: Qiujun Huang Cc: Peter Rosin Cc: linux-fb...@vger.kernel.org Cc: Helge Deller Cc: Sam Ravnborg Cc: Geert Uytterhoeven Cc: Samuel Thibault Cc: Tetsuo Handa Cc: Shigeru Yoshida --- drivers/video/fbdev/core/fbmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index 875541ff185b..3fd95a79e4c3 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1116,6 +1116,8 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd, case FBIOPUT_VSCREENINFO: if (copy_from_user(&var, argp, sizeof(var))) return -EFAULT; + /* only for kernel-internal use */ + var.activate &= ~FB_ACTIVATE_KD_TEXT; console_lock(); lock_fb_info(info); ret = fbcon_modechange_possible(info, &var); -- 2.40.0
Re: [PATCH 1/3] Revert "drm/lima: add show_fdinfo for drm usage stats"
On Tue, Apr 04, 2023 at 08:25:59AM +0800, yq882...@163.com wrote:
> From: Qiang Yu
>
> This reverts commit 4a66f3da99dcb4dcbd28544110636b50adfb0f0d.
>
> This is because the dependent commit has been reverted upstream:
> baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"")
>
> Signed-off-by: Qiang Yu

A bit of an aside, but it feels like a lot more of the fdinfo scheduler
code should be common, and not just the minimal timekeeping? Just a
thought for the next round.
-Daniel

> ---
>  drivers/gpu/drm/lima/lima_drv.c | 31 +--
>  1 file changed, 1 insertion(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/lima/lima_drv.c b/drivers/gpu/drm/lima/lima_drv.c
> index 3420875d6fc6..f456a471216b 100644
> --- a/drivers/gpu/drm/lima/lima_drv.c
> +++ b/drivers/gpu/drm/lima/lima_drv.c
> @@ -261,36 +261,7 @@ static const struct drm_ioctl_desc lima_drm_driver_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(LIMA_CTX_FREE, lima_ioctl_ctx_free, DRM_RENDER_ALLOW),
>  };
>
> -static void lima_drm_driver_show_fdinfo(struct seq_file *m, struct file *filp)
> -{
> -	struct drm_file *file = filp->private_data;
> -	struct drm_device *dev = file->minor->dev;
> -	struct lima_device *ldev = to_lima_dev(dev);
> -	struct lima_drm_priv *priv = file->driver_priv;
> -	struct lima_ctx_mgr *ctx_mgr = &priv->ctx_mgr;
> -	u64 usage[lima_pipe_num];
> -
> -	lima_ctx_mgr_usage(ctx_mgr, usage);
> -
> -	/*
> -	 * For a description of the text output format used here, see
> -	 * Documentation/gpu/drm-usage-stats.rst.
> -	 */
> -	seq_printf(m, "drm-driver:\t%s\n", dev->driver->name);
> -	seq_printf(m, "drm-client-id:\t%u\n", priv->id);
> -	for (int i = 0; i < lima_pipe_num; i++) {
> -		struct lima_sched_pipe *pipe = &ldev->pipe[i];
> -		struct drm_gpu_scheduler *sched = &pipe->base;
> -
> -		seq_printf(m, "drm-engine-%s:\t%llu ns\n", sched->name, usage[i]);
> -	}
> -}
> -
> -static const struct file_operations lima_drm_driver_fops = {
> -	.owner = THIS_MODULE,
> -	DRM_GEM_FOPS,
> -	.show_fdinfo = lima_drm_driver_show_fdinfo,
> -};
> +DEFINE_DRM_GEM_FOPS(lima_drm_driver_fops);
>
>  /*
>   * Changelog:
> --
> 2.25.1

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH 0/3] Revert lima fdinfo patchset
On Tue, Apr 04, 2023 at 04:17:33PM +0100, Emil Velikov wrote:
> On Tue, 4 Apr 2023 at 08:13, wrote:
> >
> > From: Qiang Yu
> >
> > Upstream has reverted the dependent commit
> > df622729ddbf ("drm/scheduler: track GPU active time per entity""),
> > but this patchset has been pushed to drm-misc-next which still
> > has that commit. To avoid breaking the build on other branches after
> > merging drm-misc-next, just revert the lima patchset on drm-misc-next too.
>
> The bug/revert is unfortunate, although we better keep the build clean
> or Linus will go bananas ^_^
>
> Fwiw for the series:
> Acked-by: Emil Velikov

Can you (either of you really) please push asap and make sure this
doesn't miss the last drm-misc-next pull (-rc6 is this w/e)? Otherwise
we'll have a bit of a mess.
-Daniel

> HTH
> -Emil

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, Apr 04, 2023 at 07:02:23PM +, Matthew Brost wrote: > On Tue, Apr 04, 2023 at 08:14:01PM +0200, Thomas Hellström (Intel) wrote: > > > > On 4/4/23 15:10, Christian König wrote: > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > > Hi, Christian, > > > > > > > > On 4/4/23 11:09, Christian König wrote: > > > > > Am 04.04.23 um 02:22 schrieb Matthew Brost: > > > > > > From: Thomas Hellström > > > > > > > > > > > > For long-running workloads, drivers either need to open-code > > > > > > completion > > > > > > waits, invent their own synchronization primitives or internally use > > > > > > dma-fences that do not obey the cross-driver dma-fence protocol, but > > > > > > without any lockdep annotation all these approaches are error prone. > > > > > > > > > > > > So since for example the drm scheduler uses dma-fences it is > > > > > > desirable for > > > > > > a driver to be able to use it for throttling and error > > > > > > handling also with > > > > > > internal dma-fences tha do not obey the cros-driver > > > > > > dma-fence protocol. > > > > > > > > > > > > Introduce long-running completion fences in form of > > > > > > dma-fences, and add > > > > > > lockdep annotation for them. In particular: > > > > > > > > > > > > * Do not allow waiting under any memory management locks. > > > > > > * Do not allow to attach them to a dma-resv object. > > > > > > * Introduce a new interface for adding callbacks making the > > > > > > helper adding > > > > > > a callback sign off on that it is aware that the dma-fence may > > > > > > not > > > > > > complete anytime soon. Typically this will be the > > > > > > scheduler chaining > > > > > > a new long-running fence on another one. 
> > > > > > > > > > Well that's pretty much what I tried before: > > > > > https://lwn.net/Articles/893704/ > > > > > > > I don't think this quite the same, this explictly enforces that we don't > break the dma-fence rules (in path of memory allocations, exported in > any way), essentially this just SW sync point reusing dma-fence the > infrastructure for signaling / callbacks. I believe your series tried to > export these fences to user space (admittedly I haven't fully read your > series). > > In this use case we essentially just want to flow control the ring via > the dma-scheduler + maintain a list of pending jobs so the TDR can be > used for cleanup if LR entity encounters an error. To me this seems > perfectly reasonable but I know dma-femce rules are akin to a holy war. > > If we return NULL in run_job, now we have to be able to sink all jobs > in the backend regardless on ring space, maintain a list of jobs pending > for cleanup after errors, and write a different cleanup path as now the > TDR doesn't work. Seems very, very silly to duplicate all of this code > when the DRM scheduler provides all of this for us. Also if we go this > route, now all drivers are going to invent ways to handle LR jobs /w the > DRM scheduler. > > This solution is pretty clear, mark the scheduler as LR, and don't > export any fences from the scheduler. If you try to export these fences > a blow up happens. The problem is if you mix things up. Like for resets you need all the schedulers on an engine/set-of-engines to quiescent or things get potentially hilarious. If you now have a scheduler in forever limbo, the dma_fence guarantees are right out the window. But the issue you're having is fairly specific if it's just about ringspace. I think the dumbest fix is to just block in submit if you run out of per-ctx ringspace, and call it a day. 
This notion that somehow the kernel is supposed to provide a bottomless queue of anything userspace submits simply doesn't hold up in reality (as much as userspace standards committees would like it to), and as long as it doesn't have a real-world perf impact it doesn't really matter why we end up blocking in the submit ioctl. It might also be a simple memory allocation that hits a snag in page reclaim. > > > > > And the reasons why it was rejected haven't changed. > > > > > > > > > > Regards, > > > > > Christian. > > > > > > > > > Yes, TBH this was mostly to get discussion going how we'd best > > > > tackle this problem while being able to reuse the scheduler for > > > > long-running workloads. > > > > > > > > I couldn't see any clear decision on your series, though, but one > > > > main difference I see is that this is intended for driver-internal > > > > use only. (I'm counting using the drm_scheduler as a helper for > > > > driver-private use). This is by no means a way to try tackle the > > > > indefinite fence problem. > > > > > > Well this was just my latest try to tackle this, but essentially the > > > problems are the same as with your approach: When we express such > > > operations as dma_fence there is always the change that we leak that > > > somewhere. > > > > > > My approach of adding a flag noting that this operation is dangerous and > > > can't be synced with someth
Re: [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank
On Tue, Apr 04, 2023 at 08:22:05PM +0300, Dmitry Baryshkov wrote:
> On 08/03/2023 17:53, Rob Clark wrote:
> > From: Rob Clark
> >
> > For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> > the next vblank time, and inform the fence(s) of that deadline.
> >
> > v2: Comment typo fix (danvet)
> > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> >
> > Signed-off-by: Rob Clark
> > Reviewed-by: Daniel Vetter
> > Signed-off-by: Rob Clark
> > ---
> >  drivers/gpu/drm/drm_atomic_helper.c | 37 +
> >  1 file changed, 37 insertions(+)
>
> As I started playing with hotplug on RB5 (sm8250, DSI-HDMI bridge), I found
> that this patch introduces the following backtrace on HDMI hotplug. Is there
> anything that I can do to debug/fix the issue? The warning seems harmless,
> but it would probably be good to still fix it. With addresses decoded:

Bit of a shot in the dark, but does the below help?

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index f21b5a74176c..6640d80d84f3 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1528,6 +1528,9 @@ static void set_fence_deadline(struct drm_device *dev,
 	for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
 		ktime_t v;
 
+		if (drm_atomic_crtc_needs_modeset(new_crtc_state))
+			continue;
+
 		if (drm_crtc_next_vblank_start(crtc, &v))
 			continue;
 
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 78a8c51a4abf..7ae38e8e27e8 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -1001,6 +1001,9 @@ int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime)
 	struct drm_display_mode *mode = &vblank->hwmode;
 	u64 vblank_start;
 
+	if (!drm_dev_has_vblank(crtc->dev))
+		return -EINVAL;
+
 	if (!vblank->framedur_ns || !vblank->linedur_ns)
 		return -EINVAL;
 
> 
> [ 31.151348] [ cut here ]
> [ 31.157043] msm_dpu ae01000.display-controller: drm_WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev))
> [ 31.157177] WARNING: CPU: 0 PID: 13 at drivers/gpu/drm/drm_vblank.c:728 drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
> [ 31.180629] Modules linked in:
> [ 31.184106] CPU: 0 PID: 13 Comm: kworker/0:1 Not tainted 6.3.0-rc2-8-gd39e48ca80c0 #542
> [ 31.193358] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
> [ 31.200796] Workqueue: events lt9611uxc_hpd_work
> [ 31.205990] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 31.213722] pc : drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
> [ 31.222032] lr : drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
> [ 31.230341] sp : 880bb8d0
> [ 31.234061] x29: 880bb900 x28: 0038 x27: 61a7956b8d60
> [ 31.242051] x26: x25: x24: 880bb9c4
> [ 31.250038] x23: 0001 x22: bf0033b94ef0 x21: 61a7957901d0
> [ 31.258029] x20: 61a79571 x19: 61a78128b000 x18: fffec278
> [ 31.266014] x17: 00400465 x16: 0020 x15: 0060
> [ 31.274001] x14: 0001 x13: bf00354550e0 x12: 0825
> [ 31.281989] x11: 02b7 x10: bf00354b1208 x9 : bf00354550e0
> [ 31.289976] x8 : efff x7 : bf00354ad0e0 x6 : 02b7
> [ 31.297963] x5 : 61a8feebbe48 x4 : 4000f2b7 x3 : a2a8c9f64000
> [ 31.305947] x2 : x1 : x0 : 61a780283100
> [ 31.313934] Call trace:
> [ 31.316719] drm_crtc_vblank_helper_get_vblank_timestamp_internal (drivers/gpu/drm/drm_vblank.c:728)
> [ 31.324646] drm_crtc_vblank_helper_get_vblank_timestamp (drivers/gpu/drm/drm_vblank.c:843)
> [ 31.331528] drm_crtc_get_last_vbltimestamp (drivers/gpu/drm/drm_vblank.c:884)
> [ 31.337170] drm_crtc_next_vblank_start (drivers/gpu/drm/drm_vblank.c:1006)
> [ 31.342430] drm_atomic_helper_wait_for_fences (drivers/gpu/drm/drm_atomic_helper.c:1531 drivers/gpu/drm/drm_atomic_helper.c:1578)
> [ 31.348561] drm_atomic_helper_commit (drivers/gpu/drm/drm_atomic_helper.c:2007)
> [ 31.353724] drm_atomic_commit (drivers/gpu/drm/drm_atomic.c:1444)
> [ 31.358127] drm_client_modeset_commit_atomic (drivers/gpu/drm/drm_client_modeset.c:1045)
> [ 31.364146] drm_client_modeset_commit_locked (drivers/gpu/drm/drm_client_modeset.c:1148)
> [ 31.370071] drm_client_modeset_commit (drivers/gpu/drm/drm_client_modeset.c:1174)
> [ 31.375233] drm_fb_helper_set_par (drivers/gpu/drm/drm_fb_helper.c:254 drivers/gpu/drm/drm_fb_helper.c:229 drivers/gpu/d
Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
On Tue, 4 Apr 2023 at 19:29, Tvrtko Ursulin wrote: > > > On 04/04/2023 14:52, Matthew Brost wrote: > > On Tue, Apr 04, 2023 at 10:43:03AM +0100, Tvrtko Ursulin wrote: > >> > >> On 04/04/2023 01:22, Matthew Brost wrote: > >>> Hello, > >>> > >>> As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we > >>> have been asked to merge our common DRM scheduler patches first as well > >>> as develop a common solution for long running workloads with the DRM > >>> scheduler. This RFC series is our first attempt at doing this. We > >>> welcome any and all feedback. > >>> > >>> This can we thought of as 4 parts detailed below. > >>> > >>> - DRM scheduler changes for 1 to 1 relationship between scheduler and > >>> entity (patches 1-3) > >>> > >>> In Xe all of the scheduling of jobs is done by a firmware scheduler (the > >>> GuC) which is a new paradigm WRT to the DRM scheduler and presents > >>> severals problems as the DRM was originally designed to schedule jobs on > >>> hardware queues. The main problem being that DRM scheduler expects the > >>> submission order of jobs to be the completion order of jobs even across > >>> multiple entities. This assumption falls apart with a firmware scheduler > >>> as a firmware scheduler has no concept of jobs and jobs can complete out > >>> of order. A novel solution for was originally thought of by Faith during > >>> the initial prototype of Xe, create a 1 to 1 relationship between > >>> scheduler > >>> and entity. I believe the AGX driver [3] is using this approach and > >>> Boris may use approach as well for the Mali driver [4]. > >>> > >>> To support a 1 to 1 relationship we move the main execution function > >>> from a kthread to a work queue and add a new scheduling mode which > >>> bypasses code in the DRM which isn't needed in a 1 to 1 relationship. 
> >>> The new scheduling mode should unify all drivers usage with a 1 to 1 > >>> relationship and can be thought of as using scheduler as a dependency / > >>> infligt job tracker rather than a true scheduler. > >> > >> Once you add capability for a more proper 1:1 via > >> DRM_SCHED_POLICY_SINGLE_ENTITY, do you still have further need to replace > >> kthreads with a wq? > >> > >> Or in other words, what purpose does the offloading of a job picking code > >> to > >> a separate execution context serve? Could it be done directly in the 1:1 > >> mode and leave kthread setup for N:M? > >> > > > > Addressed the other two on my reply to Christian... > > > > For this one basically the concept of a single entity point IMO is a > > very good concept which I'd like to keep. But most important reason > > being the main execution thread (now worker) is kicked when a dependency > > for a job is resolved, dependencies are dma-fences signaled via a > > callback, and these call backs can be signaled in IRQ contexts. We > > absolutely do not want to enter the backend in an IRQ context for a > > variety of reasons. > > Sounds like a fair enough requirement but if drivers will not be > comfortable with the wq conversion, it is probably possible to introduce > some vfuncs for the 1:1 case which would allow scheduler users override > the scheduler wakeup and select a special "pick one job" path. That > could allow 1:1 users do their thing, leaving rest as is. I mean you > already have the special single entity scheduler, you'd just need to add > some more specialization on the init, wake up, etc paths. > > And I will mention once more that I find a wq item with a loop such as: > > while (!READ_ONCE(sched->pause_run_wq)) { > ... > > A bit dodgy. If you piggy back on any system_wq it smells of system wide > starvation so for me any proposal with an option to use a system shared > wq is a no go. 
Yeah I think the argument for wq based scheduler would need a per-drm_scheduler wq, like we currently have a per-scheduler kthread. It might still need some serious work to replace the kthread_stop/start() with a wq-native equivalent (because really this is the tricky stuff we shouldn't hand-roll unless someone is willing to write a few papers on the lockless design that's done), but would look a bunch more reasonable. Having a per-sched workqueue might also help with the big sched_stop/start/fini state transitions, which I really think should still go over all the per-entity schedulers even in the 1:1 case (because otherwise you get some funky code in drivers that do the iterations needed, which probably tosses the fairly nice design the current scheduler has by relying on the kthread_stop/start primitives for this. -Daniel > > Regards, > > Tvrtko > > > >> Apart from those design level questions, low level open IMO still is that > >> default fallback of using the system_wq has the potential to affect latency > >> for other drivers. But that's for those driver owners to approve. > >> > >> Regards, > >> > >> Tvrtko > >> > >>> - Generic messaging interface for DRM scheduler > >>> > >>> Idea is to be able to communicate to the su
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, Apr 04, 2023 at 08:14:01PM +0200, Thomas Hellström (Intel) wrote: > > On 4/4/23 15:10, Christian König wrote: > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > > Hi, Christian, > > > > > > On 4/4/23 11:09, Christian König wrote: > > > > Am 04.04.23 um 02:22 schrieb Matthew Brost: > > > > > From: Thomas Hellström > > > > > > > > > > For long-running workloads, drivers either need to open-code > > > > > completion > > > > > waits, invent their own synchronization primitives or internally use > > > > > dma-fences that do not obey the cross-driver dma-fence protocol, but > > > > > without any lockdep annotation all these approaches are error prone. > > > > > > > > > > So since for example the drm scheduler uses dma-fences it is > > > > > desirable for > > > > > a driver to be able to use it for throttling and error > > > > > handling also with > > > > > internal dma-fences tha do not obey the cros-driver > > > > > dma-fence protocol. > > > > > > > > > > Introduce long-running completion fences in form of > > > > > dma-fences, and add > > > > > lockdep annotation for them. In particular: > > > > > > > > > > * Do not allow waiting under any memory management locks. > > > > > * Do not allow to attach them to a dma-resv object. > > > > > * Introduce a new interface for adding callbacks making the > > > > > helper adding > > > > > a callback sign off on that it is aware that the dma-fence may not > > > > > complete anytime soon. Typically this will be the > > > > > scheduler chaining > > > > > a new long-running fence on another one. > > > > > > > > Well that's pretty much what I tried before: > > > > https://lwn.net/Articles/893704/ > > > > I don't think this quite the same, this explictly enforces that we don't break the dma-fence rules (in path of memory allocations, exported in any way), essentially this just SW sync point reusing dma-fence the infrastructure for signaling / callbacks. 
I believe your series tried to
export these fences to user space (admittedly I haven't fully read your
series).

In this use case we essentially just want to flow control the ring via
the dma-scheduler + maintain a list of pending jobs so the TDR can be
used for cleanup if an LR entity encounters an error. To me this seems
perfectly reasonable but I know dma-fence rules are akin to a holy war.

If we return NULL in run_job, now we have to be able to sink all jobs
in the backend regardless of ring space, maintain a list of jobs pending
for cleanup after errors, and write a different cleanup path as now the
TDR doesn't work. Seems very, very silly to duplicate all of this code
when the DRM scheduler provides all of this for us. Also if we go this
route, now all drivers are going to invent ways to handle LR jobs /w the
DRM scheduler.

This solution is pretty clear, mark the scheduler as LR, and don't
export any fences from the scheduler. If you try to export these fences
a blow up happens.

> > > > And the reasons why it was rejected haven't changed.
> > > >
> > > > Regards,
> > > > Christian.
> > > >
> > > Yes, TBH this was mostly to get discussion going how we'd best
> > > tackle this problem while being able to reuse the scheduler for
> > > long-running workloads.
> > >
> > > I couldn't see any clear decision on your series, though, but one
> > > main difference I see is that this is intended for driver-internal
> > > use only. (I'm counting using the drm_scheduler as a helper for
> > > driver-private use). This is by no means a way to try tackle the
> > > indefinite fence problem.
> >
> > Well this was just my latest try to tackle this, but essentially the
> > problems are the same as with your approach: When we express such
> > operations as dma_fence there is always the change that we leak that
> > somewhere.
> > > > My approach of adding a flag noting that this operation is dangerous and > > can't be synced with something memory management depends on tried to > > contain this as much as possible, but Daniel still pretty clearly > > rejected it (for good reasons I think). > > > > > > > > We could ofc invent a completely different data-type that abstracts > > > the synchronization the scheduler needs in the long-running case, or > > > each driver could hack something up, like sleeping in the > > > prepare_job() or run_job() callback for throttling, but those waits > > > should still be annotated in one way or annotated one way or another > > > (and probably in a similar way across drivers) to make sure we don't > > > do anything bad. > > > > > > So any suggestions as to what would be the better solution here > > > would be appreciated. > > > > Mhm, do we really the the GPU scheduler for that? > > I think we need to solve this within the DRM scheduler one way or another. > > I mean in the 1 to 1 case you basically just need a component which > > collects the dependencies as dma_fence and if all of them are fulfilled > > schedules a work item. > > > > As long as the work item itself doe
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Tue, 4 Apr 2023 at 15:10, Christian König wrote: > > Am 04.04.23 um 14:54 schrieb Thomas Hellström: > > Hi, Christian, > > > > On 4/4/23 11:09, Christian König wrote: > >> Am 04.04.23 um 02:22 schrieb Matthew Brost: > >>> From: Thomas Hellström > >>> > >>> For long-running workloads, drivers either need to open-code completion > >>> waits, invent their own synchronization primitives or internally use > >>> dma-fences that do not obey the cross-driver dma-fence protocol, but > >>> without any lockdep annotation all these approaches are error prone. > >>> > >>> So since for example the drm scheduler uses dma-fences it is > >>> desirable for > >>> a driver to be able to use it for throttling and error handling also > >>> with > >>> internal dma-fences tha do not obey the cros-driver dma-fence protocol. > >>> > >>> Introduce long-running completion fences in form of dma-fences, and add > >>> lockdep annotation for them. In particular: > >>> > >>> * Do not allow waiting under any memory management locks. > >>> * Do not allow to attach them to a dma-resv object. > >>> * Introduce a new interface for adding callbacks making the helper > >>> adding > >>>a callback sign off on that it is aware that the dma-fence may not > >>>complete anytime soon. Typically this will be the scheduler chaining > >>>a new long-running fence on another one. > >> > >> Well that's pretty much what I tried before: > >> https://lwn.net/Articles/893704/ > >> > >> And the reasons why it was rejected haven't changed. > >> > >> Regards, > >> Christian. > >> > > Yes, TBH this was mostly to get discussion going how we'd best tackle > > this problem while being able to reuse the scheduler for long-running > > workloads. > > > > I couldn't see any clear decision on your series, though, but one main > > difference I see is that this is intended for driver-internal use > > only. (I'm counting using the drm_scheduler as a helper for > > driver-private use). 
This is by no means a way to try tackle the > > indefinite fence problem. > > Well this was just my latest try to tackle this, but essentially the > problems are the same as with your approach: When we express such > operations as dma_fence there is always the change that we leak that > somewhere. > > My approach of adding a flag noting that this operation is dangerous and > can't be synced with something memory management depends on tried to > contain this as much as possible, but Daniel still pretty clearly > rejected it (for good reasons I think). Yeah I still don't like dma_fence that somehow have totally different semantics in that critical piece of "will it complete or will it deadlock?" :-) > > > > > We could ofc invent a completely different data-type that abstracts > > the synchronization the scheduler needs in the long-running case, or > > each driver could hack something up, like sleeping in the > > prepare_job() or run_job() callback for throttling, but those waits > > should still be annotated in one way or annotated one way or another > > (and probably in a similar way across drivers) to make sure we don't > > do anything bad. > > > > So any suggestions as to what would be the better solution here would > > be appreciated. > > Mhm, do we really the the GPU scheduler for that? > > I mean in the 1 to 1 case you basically just need a component which > collects the dependencies as dma_fence and if all of them are fulfilled > schedules a work item. > > As long as the work item itself doesn't produce a dma_fence it can then > still just wait for other none dma_fence dependencies. Yeah that's the important thing, for long-running jobs dependencies as dma_fence should be totally fine. You're just not allowed to have any outgoing dma_fences at all (except the magic preemption fence). > Then the work function could submit the work and wait for the result. 
>
> The work item would then pretty much represent what you want, you can
> wait for it to finish and pass it along as long running dependency.
>
> Maybe give it a funky name and wrap it up in a structure, but that's
> basically it.

Like do we need this? If the kernel ever waits for a long-running
compute job to finish I'd call that a bug. Any functional dependencies
between engines or whatever are userspace's problem only, which it
needs to sort out using userspace memory fences.

The only things the kernel needs are some way to track dependencies as
dma_fence (because memory management moves the memory away and we need
to move it back in, ideally pipelined). And it needs the special
preempt fence (if we don't have pagefaults) so that you have a fence to
attach to all the dma_resv for memory management purposes.

Now the scheduler already has almost all the pieces (at least if we
assume there's some magic fw which time-slices these contexts on its
own), and we just need a few minimal changes:
- allowing the scheduler to ignore the completion fence and just
  immediately push the next "job" in if its dependencies are read
Re: [Intel-xe] [PATCH 3/3] drm/xe: Update GuC/HuC firmware autoselect logic
On Mon, Apr 03, 2023 at 06:09:10PM +, Anusha Srivatsa wrote: -Original Message- From: De Marchi, Lucas Sent: Wednesday, March 29, 2023 8:46 PM To: Srivatsa, Anusha Cc: intel...@lists.freedesktop.org; Harrison, John C ; Ceraolo Spurio, Daniele ; dri-devel@lists.freedesktop.org; Daniel Vetter ; Dave Airlie Subject: Re: [PATCH 3/3] drm/xe: Update GuC/HuC firmware autoselect logic On Tue, Mar 28, 2023 at 04:31:13PM -0700, Anusha Srivatsa wrote: > > >> -Original Message- >> From: De Marchi, Lucas >> Sent: Thursday, March 23, 2023 10:18 PM >> To: intel...@lists.freedesktop.org >> Cc: Srivatsa, Anusha ; Harrison, John C >> ; Ceraolo Spurio, Daniele >> ; dri-devel@lists.freedesktop.org; >> Daniel Vetter ; Dave Airlie >> ; De Marchi, Lucas >> Subject: [PATCH 3/3] drm/xe: Update GuC/HuC firmware autoselect logic >> >> Update the logic to autoselect GuC/HuC for the platforms with the >> following >> improvements: >> >> - Document what is the firmware file that is expected to be >> loaded and what is checked from blob headers >> - When the platform is under force-probe it's desired to enforce >> the full-version requirement so the correct firmware is used >> before widespread adoption and backward-compatibility >> >Extra line ^ > >> commitments >> - Directory from which we expect firmware blobs to be available in >> upstream linux-firmware repository depends on the platform: for >> the ones supported by i915 it uses the i915/ directory, but the ones >> expected to be supported by xe, it's on the xe/ directory. This >> means that for platforms in the intersection, the firmware is >> loaded from a different directory, but that is not much important >> in the firmware repo and it avoids firmware duplication. >> >> - Make the table with the firmware definitions clearly state the >> versions being expected. Now with macros to select the version it's >> possible to choose between full-version/major-version for GuC and >> full-version/no-version for HuC. 
These are similar to the macros used >> in i915, but implemented in a slightly different way to avoid >> duplicating the macros for each firmware/type and functionality, >> besides adding the support for different directories. >> >> - There is no check added regarding force-probe since xe should >> reuse the same firmware files published for i915 for past >> platforms. This can be improved later with additional >> kunit checking against a hardcoded list of platforms that >Extra line here. > >> falls in this category. >> - As mentioned in the TODO, the major version fallback was not >> implemented before as currently each platform only supports one >> major. That can be easily added later. >> >> - GuC version for MTL and PVC were updated to 70.6.4, using the exact >> full version, while the >> >> After this the GuC firmware used by PVC changes to pvc_guc_70.5.2.bin >> since it's using a file not published yet. >> >> Signed-off-by: Lucas De Marchi >> --- >> drivers/gpu/drm/xe/xe_uc_fw.c | 315 +--- >> drivers/gpu/drm/xe/xe_uc_fw.h | 2 +- >> drivers/gpu/drm/xe/xe_uc_fw_types.h | 7 + >> 3 files changed, 204 insertions(+), 120 deletions(-) >> >> diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c >> b/drivers/gpu/drm/xe/xe_uc_fw.c index 174c42873ebb..653bc3584cc5 >> 100644 >> --- a/drivers/gpu/drm/xe/xe_uc_fw.c >> +++ b/drivers/gpu/drm/xe/xe_uc_fw.c >> @@ -17,6 +17,137 @@ >> #include "xe_mmio.h" >> #include "xe_uc_fw.h" >> >> +/* >> + * List of required GuC and HuC binaries per-platform. They must be >> +ordered >> + * based on platform, from newer to older. >> + * >> + * Versioning follows the guidelines from >> + * Documentation/driver-api/firmware/firmware-usage-guidelines.rst. >> +There is a >> + * distinction for platforms being officially supported by the driver or not. 
>> + * Platforms not available publicly or not yet officially supported >> +by the >> + * driver (under force-probe), use the mmp_ver(): the firmware >> +autoselect logic >> + * will select the firmware from disk with filename that matches the >> +full >> + * "mpp version", i.e. major.minor.patch. mmp_ver() should only be >> +used for >> + * this case. >> + * >> + * For platforms officially supported by the driver, the filename >> +always only >> + * ever contains the major version (GuC) or no version at all (HuC). >> + * >> + * After loading the file, the driver parses the versions embedded in the blob. >> + * The major version needs to match a major version supported by the >> +driver (if >> + * any). The minor version is also checked and a notice emitted to >> +the log if >> + * the version found is smaller than the version wanted. This is >> +done only for >> + * informational purposes so users may have a chance to upgrade, but >> +the driver >> + * still loads and use the older firmware. >> + * >> + * Examples: >> + * >> + *1) Platform officially supported by i915 - using Tigerlake as
[PATCH] drm/msm/mdp5: set variable msm8x76_config storage-class-specifier to static
smatch reports
drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c:658:26: warning: symbol
'msm8x76_config' was not declared. Should it be static?

This variable is only used in one file so should be static.

Signed-off-by: Tom Rix
---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
index 1f1555aa02d2..2eec2d78f32a 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
@@ -655,7 +655,7 @@ static const struct mdp5_cfg_hw msm8x96_config = {
 	.max_clk = 41250,
 };
 
-const struct mdp5_cfg_hw msm8x76_config = {
+static const struct mdp5_cfg_hw msm8x76_config = {
 	.name = "msm8x76",
 	.mdp = {
 		.count = 1,
-- 
2.27.0