Re: [PATCH] drm/stm: ltdc: attach immutable zpos property to planes

2021-09-06 Thread yannick Fertre

Hi Raphael,
thanks for the patch.

Acked-by: Yannick Fertre 
Reviewed-by: Yannick Fertre 

On 9/2/21 5:30 PM, Raphael GALLAIS-POU - foss wrote:

Define plane ordering by hard-coding an immutable Z position, from the
first plane (used as the primary layer) to the subsequent planes used as
overlays, in order of instantiation.

This zpos is informational only, as it cannot be modified; blending
operations are still applied from the top layer to the bottom layer.

This patch helps to remove a warning message from the Android
Hardware Composer.
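
For illustration, a minimal sketch of assigning immutable zpos values in
instantiation order; the function, plane array and count below are
hypothetical and not taken from the ltdc driver:

#include <drm/drm_blend.h>
#include <drm/drm_plane.h>

/* Sketch only: zpos 0 for the primary plane, 1..n for the overlays. */
static int example_assign_immutable_zpos(struct drm_plane **planes,
					 unsigned int num_planes)
{
	unsigned int i;
	int ret;

	for (i = 0; i < num_planes; i++) {
		/* Read-only for userspace, purely informational. */
		ret = drm_plane_create_zpos_immutable_property(planes[i], i);
		if (ret)
			return ret;
	}

	return 0;
}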

Signed-off-by: Raphael Gallais-Pou 
---
  drivers/gpu/drm/stm/ltdc.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/stm/ltdc.c b/drivers/gpu/drm/stm/ltdc.c
index 195de30eb90c..bd603ef5e935 100644
--- a/drivers/gpu/drm/stm/ltdc.c
+++ b/drivers/gpu/drm/stm/ltdc.c
@@ -1024,6 +1024,8 @@ static int ltdc_crtc_init(struct drm_device *ddev, struct drm_crtc *crtc)
return -EINVAL;
}
  
+	drm_plane_create_zpos_immutable_property(primary, 0);
+
 	ret = drm_crtc_init_with_planes(ddev, crtc, primary, NULL,


Re: [PATCH] drm/stm: ltdc: add layer alpha support

2021-09-06 Thread yannick Fertre

Hi Raphael,
thanks for the patch.

Acked-by: Yannick Fertre 
Reviewed-by: Yannick Fertre 

On 9/3/21 10:58 AM, Raphael GALLAIS-POU - foss wrote:

Android Hardware Composer supports alpha values applied to layers.
Enabling non-opaque layers for the STM CRTC could help offload screen
composition work from the GPU.
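
For context, the DRM core exposes a 16-bit per-plane "alpha" property
(0x0000 transparent, 0xffff opaque), which is why the patch scales it down
with "newstate->alpha >> 8" before writing the 8-bit constant-alpha field.
A generic sketch of the usual pattern, not taken from the ltdc driver:

#include <linux/types.h>
#include <drm/drm_blend.h>
#include <drm/drm_plane.h>

/* Sketch only: expose the standard "alpha" property on a plane. */
static int example_enable_plane_alpha(struct drm_plane *plane)
{
	/* Adds the "alpha" property, defaulting to fully opaque (0xffff). */
	return drm_plane_create_alpha_property(plane);
}

static u8 example_alpha_to_consta(const struct drm_plane_state *state)
{
	/* Scale the 16-bit property value to an 8-bit constant alpha. */
	return state->alpha >> 8;
}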

Signed-off-by: Raphael Gallais-Pou 
---
  drivers/gpu/drm/stm/ltdc.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/stm/ltdc.c b/drivers/gpu/drm/stm/ltdc.c
index 195de30eb90c..e0fef8bacfa8 100644
--- a/drivers/gpu/drm/stm/ltdc.c
+++ b/drivers/gpu/drm/stm/ltdc.c
@@ -845,7 +845,7 @@ static void ltdc_plane_atomic_update(struct drm_plane *plane,
LXCFBLR_CFBLL | LXCFBLR_CFBP, val);
  
  	/* Specifies the constant alpha value */

-   val = CONSTA_MAX;
+   val = newstate->alpha >> 8;
reg_update_bits(ldev->regs, LTDC_L1CACR + lofs, LXCACR_CONSTA, val);
  
  	/* Specifies the blending factors */

@@ -997,6 +997,8 @@ static struct drm_plane *ltdc_plane_create(struct drm_device *ddev,
  
  	drm_plane_helper_add(plane, base.id);
  
  	return plane;




Re: [PATCH 1/5] drm/ttm: remove the outdated kerneldoc section

2021-09-06 Thread Christian König




On 03.09.21 at 16:22, Matthew Auld wrote:

On Fri, 3 Sept 2021 at 13:31, Christian König
 wrote:

Clean up to start over with new and more accurate documentation.

Signed-off-by: Christian König 

For the series,
Reviewed-by: Matthew Auld 


Thanks.



We could maybe also bring in ttm_pool.[ch]? It looks like it already
has near complete kernel-doc?


Yes, I just didn't have time to clean up the remaining fallout yet.

The last and most important remaining beast is the BO documentation, but 
that will still take a while.


Regards,
Christian.




---
  Documentation/gpu/drm-mm.rst | 49 
  1 file changed, 49 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 0198fa43d254..8ca981065e1a 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -30,55 +30,6 @@ The Translation Table Manager (TTM)

  TTM design background and information belongs here.

-TTM initialization
---
-
-**Warning**
-This section is outdated.
-
-Drivers wishing to support TTM must pass a filled :c:type:`ttm_bo_driver
-` structure to ttm_bo_device_init, together with an
-initialized global reference to the memory manager.  The ttm_bo_driver
-structure contains several fields with function pointers for
-initializing the TTM, allocating and freeing memory, waiting for command
-completion and fence synchronization, and memory migration.
-
-The :c:type:`struct drm_global_reference ` is made
-up of several fields:
-
-.. code-block:: c
-
-  struct drm_global_reference {
-  enum ttm_global_types global_type;
-  size_t size;
-  void *object;
-  int (*init) (struct drm_global_reference *);
-  void (*release) (struct drm_global_reference *);
-  };
-
-
-There should be one global reference structure for your memory manager
-as a whole, and there will be others for each object created by the
-memory manager at runtime. Your global TTM should have a type of
-TTM_GLOBAL_TTM_MEM. The size field for the global object should be
-sizeof(struct ttm_mem_global), and the init and release hooks should
-point at your driver-specific init and release routines, which probably
-eventually call ttm_mem_global_init and ttm_mem_global_release,
-respectively.
-
-Once your global TTM accounting structure is set up and initialized by
-calling ttm_global_item_ref() on it, you need to create a buffer
-object TTM to provide a pool for buffer object allocation by clients and
-the kernel itself. The type of this object should be
-TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct
-ttm_bo_global). Again, driver-specific init and release functions may
-be provided, likely eventually calling ttm_bo_global_ref_init() and
-ttm_bo_global_ref_release(), respectively. Also, like the previous
-object, ttm_global_item_ref() is used to create an initial reference
-count for the TTM, which will call your initialization function.
-
-See the radeon_ttm.c file for an example of usage.
-
  The Graphics Execution Manager (GEM)
  

--
2.25.1





Re: [PATCH 3/5] drm/ttm: enable TTM resource object kerneldoc

2021-09-06 Thread Christian König

On 03.09.21 at 15:48, Matthew Auld wrote:

On Fri, 3 Sept 2021 at 13:31, Christian König
 wrote:

Fix the last two remaining warnings and finally enable this.

Signed-off-by: Christian König 
---
  Documentation/gpu/drm-mm.rst   | 9 +
  include/drm/ttm/ttm_resource.h | 6 ++
  2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 56b7b581567d..094e367130db 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -39,6 +39,15 @@ TTM device object reference
  .. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c
 :export:

+TTM resource object reference
+-----------------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_resource.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_resource.c
+   :export:
+
  The Graphics Execution Manager (GEM)
  

diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 32c5edd9e8b5..255fc8169d9a 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -103,10 +103,7 @@ struct ttm_resource_manager_func {
   * struct ttm_resource_manager
   *
   * @use_type: The memory type is enabled.
- * @flags: TTM_MEMTYPE_XX flags identifying the traits of the memory
- * managed by this memory type.
- * @gpu_offset: If used, the GPU offset of the first managed page of
- * fixed memory or the first managed location in an aperture.
+ * @use_tt: If a TT object should be used for the backing store.
   * @size: Size of the managed region.
   * @func: structure pointer implementing the range manager. See above
   * @move_lock: lock for move fence
@@ -144,6 +141,7 @@ struct ttm_resource_manager {
   * @addr:  mapped virtual address
   * @offset:physical addr
   * @is_iomem:  is this io memory ?
+ * @caching:   What CPU caching should be used

Maybe add "See enum ttm_caching" or something, so it generates a link,
once we also add kernel-doc for that?


Good point, going to do that as well.

Thanks,
Christian.




   *
   * Structure indicating the bus placement of an object.
   */
--
2.25.1





Re: [PATCH 2/5] drm/ttm: enable TTM device object kerneldoc

2021-09-06 Thread Christian König

On 03.09.21 at 15:16, Matthew Auld wrote:

On Fri, 3 Sept 2021 at 13:31, Christian König
 wrote:

Fix the remaining warnings, switch to inline structure documentation
and finally enable this.

Signed-off-by: Christian König 
---
  Documentation/gpu/drm-mm.rst |  9 +
  include/drm/ttm/ttm_device.h | 73 +---
  2 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 8ca981065e1a..56b7b581567d 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -30,6 +30,15 @@ The Translation Table Manager (TTM)

  TTM design background and information belongs here.

+TTM device object reference
+---------------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_device.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c
+   :export:
+
  The Graphics Execution Manager (GEM)
  

diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 07d722950d5b..0b31ec731e66 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -39,31 +39,23 @@ struct ttm_operation_ctx;

  /**
   * struct ttm_global - Buffer object driver global data.
- *
- * @dummy_read_page: Pointer to a dummy page used for mapping requests
- * of unpopulated pages.
- * @shrink: A shrink callback object used for buffer object swap.
- * @device_list_mutex: Mutex protecting the device list.
- * This mutex is held while traversing the device list for pm options.
- * @lru_lock: Spinlock protecting the bo subsystem lru lists.
- * @device_list: List of buffer object devices.
- * @swap_lru: Lru list of buffer objects used for swapping.
   */
  extern struct ttm_global {

 /**
-* Constant after init.
+* @dummy_read_page: Pointer to a dummy page used for mapping requests
+* of unpopulated pages. Constant after init.
  */
-
 struct page *dummy_read_page;

 /**
-* Protected by ttm_global_mutex.
+* @device_list: List of buffer object devices. Protected by
+* ttm_global_mutex.

Would it be reasonable to move the ttm_global_mutex into ttm_global
here? That way everything is nicely grouped together, and we can
easily reference it here with @mutex or so?


To be honest, I'm in the process of decomposing the global structure.
That is essentially static information which can be kept inside the
ttm_device.c file.


The only reason we had it in the first place was that we leaked the
BO count and device list into other parts of TTM and even the drivers.


Regards,
Christian.




  */
 struct list_head device_list;

 /**
-* Internal protection.
+* @bo_count: Number of buffer objects allocated by devices.
  */
 atomic_t bo_count;
  } ttm_glob;
@@ -230,50 +222,69 @@ struct ttm_device_funcs {

  /**
   * struct ttm_device - Buffer object driver device-specific data.
- *
- * @device_list: Our entry in the global device list.
- * @funcs: Function table for the device.
- * @sysman: Resource manager for the system domain.
- * @man_drv: An array of resource_managers.
- * @vma_manager: Address space manager.
- * @pool: page pool for the device.
- * @dev_mapping: A pointer to the struct address_space representing the
- * device address space.
- * @wq: Work queue structure for the delayed delete workqueue.
   */
  struct ttm_device {
-   /*
+   /**
+* @device_list: Our entry in the global device list.
  * Constant after bo device init
  */
 struct list_head device_list;
+
+   /**
+* @funcs: Function table for the device.
+* Constant after bo device init
+*/
 struct ttm_device_funcs *funcs;

-   /*
+   /**
+* @sysman: Resource manager for the system domain.
  * Access via ttm_manager_type.
  */
 struct ttm_resource_manager sysman;
+
+   /**
+* @man_drv: An array of resource_managers.
+*/
 struct ttm_resource_manager *man_drv[TTM_NUM_MEM_TYPES];

 /*
  * Protected by internal locks.
  */
+
+   /**
+* @vma_manager: Address space manager for finding BOs to mmap.
+*/
 struct drm_vma_offset_manager *vma_manager;
+
+   /**
+* @pool: page pool for the device.
+*/
 struct ttm_pool pool;

-   /*
-* Protection for the per manager LRU and ddestroy lists.
+   /**
+* @lru_lock: Protection for the per manager LRU and ddestroy lists.
  */
 spinlock_t lru_lock;
+
+   /**
+* @ddestroy: Destroyed but not yet cleaned up buffer objects.
+*/
 struct list_head ddestroy;
+
+   /**
+* @pinned: Buffer object which are pinned and so not on any LRU list.
+*/
 struct list_head pinned;

-   /*
-* Protected by load / firstopen / lastc

[Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

2021-09-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #54 from Jerome C (m...@jeromec.com) ---
Hi James,

After 900 suspend/resume cycles (600 on LLVM, 300 on GCC) using kernel 5.14.1
compiled by LLVM 12.0.1 (LLVM_IAS was unset during compilation) and again by
GCC 11.1.0, there was no crash on resume, awesome. It usually fails within
1-150 suspend/resume cycles.

BRING ON THE RYZEN 6000 SERIES APU

Thanks

Jerome





-------- Original Message --------
On 7 Sep 2021, 03:00, < bugzilla-dae...@bugzilla.kernel.org> wrote:

>
>
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=211277
>
> --- Comment #52 from James Zhu (jam...@amd.com) ---
> Created attachment 298691
> --> https://bugzilla.kernel.org/attachment.cgi?id=298691&action=edit
> Fix for S3 hung issue
>
> Hi Jerome and kolAflash,
>
> I think the iommu device init is put at the wrong place during resume. I
> attached a patch. Please confirm if it works.
> Thanks!
> James
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are on the CC list for the bug.



-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

2021-09-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #55 from Jerome C (m...@jeromec.com) ---
Created attachment 298695
  --> https://bugzilla.kernel.org/attachment.cgi?id=298695&action=edit
signature.asc

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH] backlight: propagate errors from get_brightness()

2021-09-06 Thread Thomas Weißschuh
backlight.h documents "struct backlight_ops->get_brightness()" to return
a negative errno on failure.
So far these errors have not been handled in the backlight core.
This leads to negative values being exposed through sysfs although only
positive values are documented to be reported.
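
To illustrate the contract, a hypothetical driver-side callback is sketched
below; example_read_hw_brightness() is an invented stand-in for an I2C or
register read and not a real API:

#include <linux/backlight.h>
#include <linux/errno.h>

/* Hypothetical hardware accessor standing in for an I2C/register read. */
static int example_read_hw_brightness(struct backlight_device *bd)
{
	return -EIO;	/* pretend the read failed */
}

/*
 * Sketch of a get_brightness() implementation: on success it returns a
 * value in 0..max_brightness, on failure a negative errno, which the
 * core is now expected to handle instead of exposing it through sysfs.
 */
static int example_get_brightness(struct backlight_device *bd)
{
	int level = example_read_hw_brightness(bd);

	if (level < 0)
		return level;	/* negative errno propagated to the core */

	return level;
}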

Signed-off-by: Thomas Weißschuh 
---
 drivers/video/backlight/backlight.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/video/backlight/backlight.c 
b/drivers/video/backlight/backlight.c
index 537fe1b376ad..d681962f8509 100644
--- a/drivers/video/backlight/backlight.c
+++ b/drivers/video/backlight/backlight.c
@@ -292,10 +292,13 @@ static ssize_t actual_brightness_show(struct device *dev,
struct backlight_device *bd = to_backlight_device(dev);
 
mutex_lock(&bd->ops_lock);
-   if (bd->ops && bd->ops->get_brightness)
-   rc = sprintf(buf, "%d\n", bd->ops->get_brightness(bd));
-   else
+   if (bd->ops && bd->ops->get_brightness) {
+   rc = bd->ops->get_brightness(bd);
+   if (rc >= 0)
+   rc = sprintf(buf, "%d\n", rc);
+   } else {
rc = sprintf(buf, "%d\n", bd->props.brightness);
+   }
mutex_unlock(&bd->ops_lock);
 
return rc;
@@ -381,9 +384,18 @@ ATTRIBUTE_GROUPS(bl_device);
 void backlight_force_update(struct backlight_device *bd,
enum backlight_update_reason reason)
 {
+   int brightness;
+
mutex_lock(&bd->ops_lock);
-   if (bd->ops && bd->ops->get_brightness)
-   bd->props.brightness = bd->ops->get_brightness(bd);
+   if (bd->ops && bd->ops->get_brightness) {
+   brightness = bd->ops->get_brightness(bd);
+   if (brightness >= 0)
+   bd->props.brightness = brightness;
+   else
+   dev_warn(&bd->dev,
+"Could not update brightness from device: errno = %d",
+-brightness);
+   }
mutex_unlock(&bd->ops_lock);
backlight_generate_event(bd, reason);
 }

base-commit: 79fad92f2e596f5a8dd085788a24f540263ef887
-- 
2.33.0



[PATCH RESEND] drm/rockchip: cdn-dp-core: Fix cdn_dp_resume unused warning

2021-09-06 Thread Palmer Dabbelt
From: Palmer Dabbelt 

cdn_dp_resume is only used under PM_SLEEP, and now that it is static an
unused-function warning is triggered under !PM_SLEEP. Conditionally
compile the function to avoid the warning.
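
For context, a simplified sketch of why the warning shows up; the wiring
below assumes the typical SET_SYSTEM_SLEEP_PM_OPS() pattern and is not
copied from cdn-dp-core.c:

#include <linux/device.h>
#include <linux/pm.h>

/*
 * Sketch only.  SET_SYSTEM_SLEEP_PM_OPS() expands to nothing when
 * CONFIG_PM_SLEEP is disabled, so a static resume handler loses its
 * only reference and the compiler warns -- unless the handler itself
 * is guarded by #ifdef CONFIG_PM_SLEEP, as done in this patch.
 */
#ifdef CONFIG_PM_SLEEP
static int example_resume(struct device *dev)
{
	return 0;	/* re-initialize the hardware here */
}
#endif

static const struct dev_pm_ops example_pm_ops = {
	SET_SYSTEM_SLEEP_PM_OPS(NULL, example_resume)
};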

Fixes: 7c49abb4c2f8 ("drm/rockchip: cdn-dp-core: Make cdn_dp_core_suspend/resume static")
Signed-off-by: Palmer Dabbelt 
---
I sent this one out in January, but it looks like it got lost in the shuffle.
I'm getting this on a RISC-V allmodconfig now.
---
 drivers/gpu/drm/rockchip/cdn-dp-core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c 
b/drivers/gpu/drm/rockchip/cdn-dp-core.c
index 8ab3247dbc4a..bee0f2d2a9be 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c
@@ -1123,6 +1123,7 @@ static int cdn_dp_suspend(struct device *dev)
return ret;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int cdn_dp_resume(struct device *dev)
 {
struct cdn_dp_device *dp = dev_get_drvdata(dev);
@@ -1135,6 +1136,7 @@ static int cdn_dp_resume(struct device *dev)
 
return 0;
 }
+#endif
 
 static int cdn_dp_probe(struct platform_device *pdev)
 {
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH] drm: panel: tl070wsh30: Add a single error handling block at the end of the function.

2021-09-06 Thread Cai Huoqing
Introduce a single error-handling block at the end of the function to
make the code a little more concise.

Signed-off-by: Cai Huoqing 
---
 drivers/gpu/drm/panel/panel-tdo-tl070wsh30.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-tdo-tl070wsh30.c 
b/drivers/gpu/drm/panel/panel-tdo-tl070wsh30.c
index 820731be7147..5e54ef4b3a9c 100644
--- a/drivers/gpu/drm/panel/panel-tdo-tl070wsh30.c
+++ b/drivers/gpu/drm/panel/panel-tdo-tl070wsh30.c
@@ -59,8 +59,7 @@ static int tdo_tl070wsh30_panel_prepare(struct drm_panel *panel)
err = mipi_dsi_dcs_exit_sleep_mode(tdo_tl070wsh30->link);
if (err < 0) {
dev_err(panel->dev, "failed to exit sleep mode: %d\n", err);
-   regulator_disable(tdo_tl070wsh30->supply);
-   return err;
+   goto err_disable_reg;
}
 
msleep(200);
@@ -68,8 +67,7 @@ static int tdo_tl070wsh30_panel_prepare(struct drm_panel *panel)
err = mipi_dsi_dcs_set_display_on(tdo_tl070wsh30->link);
if (err < 0) {
dev_err(panel->dev, "failed to set display on: %d\n", err);
-   regulator_disable(tdo_tl070wsh30->supply);
-   return err;
+   goto err_disable_reg;
}
 
msleep(20);
@@ -77,6 +75,9 @@ static int tdo_tl070wsh30_panel_prepare(struct drm_panel *panel)
tdo_tl070wsh30->prepared = true;
 
return 0;
+err_disable_reg:
+   regulator_disable(tdo_tl070wsh30->supply);
+   return err;
 }
 
 static int tdo_tl070wsh30_panel_unprepare(struct drm_panel *panel)
-- 
2.25.1



[PATCH] backlight: l4f00242t03: Add a single error handling block at the end of the function.

2021-09-06 Thread Cai Huoqing
Introduce a single error-handling block at the end of the function to
make the code a little more concise.

Signed-off-by: Cai Huoqing 
---
 drivers/video/backlight/l4f00242t03.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/video/backlight/l4f00242t03.c 
b/drivers/video/backlight/l4f00242t03.c
index 46f97d1c3d21..c6f540f1b507 100644
--- a/drivers/video/backlight/l4f00242t03.c
+++ b/drivers/video/backlight/l4f00242t03.c
@@ -65,14 +65,12 @@ static void l4f00242t03_lcd_init(struct spi_device *spi)
ret = regulator_set_voltage(priv->core_reg, 280, 280);
if (ret) {
dev_err(&spi->dev, "failed to set the core regulator voltage.\n");
-   regulator_disable(priv->io_reg);
-   return;
+   goto err_disable_reg;
}
ret = regulator_enable(priv->core_reg);
if (ret) {
dev_err(&spi->dev, "failed to enable the core regulator.\n");
-   regulator_disable(priv->io_reg);
-   return;
+   goto err_disable_reg;
}
 
l4f00242t03_reset(priv->reset);
@@ -80,6 +78,10 @@ static void l4f00242t03_lcd_init(struct spi_device *spi)
gpiod_set_value(priv->enable, 1);
msleep(60);
spi_write(spi, (const u8 *)cmd, ARRAY_SIZE(cmd) * sizeof(u16));
+   return;
+
+err_disable_reg:
+   regulator_disable(priv->io_reg);
 }
 
 static void l4f00242t03_lcd_powerdown(struct spi_device *spi)
-- 
2.25.1



Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread Christian König

Added a Fixes tag and pushed this to drm-misc-fixes.

It will take a while until it cycles back into the development branches, 
so feel free to push some version to amd-staging-drm-next as well. Just 
ping Alex when you do this.


Thanks,
Christian.

On 07.09.21 at 06:08, xinhui pan wrote:

The ret value might be -EBUSY, caller will think lru lock is still
locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
list corruption.

ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
be stuck as we actually did not free any BO memory. This usually happens
when the fence is not signaled for a long time.

Signed-off-by: xinhui pan 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8d7fd65ccced..23f906941ac9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
}
  
  	if (bo->deleted) {

-   ttm_bo_cleanup_refs(bo, false, false, locked);
+   ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
-   return 0;
+   return ret == -EBUSY ? -ENOSPC : ret;
}
  
  	ttm_bo_del_from_lru(bo);

@@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
if (locked)
dma_resv_unlock(bo->base.resv);
ttm_bo_put(bo);
-   return ret;
+   return ret == -EBUSY ? -ENOSPC : ret;
  }
  
  void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)




[resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread xinhui pan
The ret value might be -EBUSY; the caller will then think the lru lock is
still held when it actually is NOT. So return -ENOSPC instead, otherwise
we hit list corruption.

ttm_bo_cleanup_refs might also fail if the BO is not idle. If we return 0,
the caller (ttm_tt_populate -> ttm_global_swapout -> ttm_device_swapout)
will be stuck, as we did not actually free any BO memory. This usually
happens when the fence is not signaled for a long time.
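
To make the failure mode concrete, here is a simplified, hypothetical caller
loop (not the actual ttm_device_swapout() code) written against the
convention that -EBUSY means "lru_lock still held, try the next BO", while
any other return means the lock was already dropped; example_bo and
example_swapout_one() are placeholders:

#include <linux/errno.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct example_bo {
	struct list_head lru_node;
};

/* Placeholder for ttm_bo_swapout(): drops lru_lock unless it returns -EBUSY. */
static int example_swapout_one(struct example_bo *bo);

static int example_swapout_all(spinlock_t *lru_lock, struct list_head *lru)
{
	struct example_bo *bo;
	int ret;

	spin_lock(lru_lock);
	list_for_each_entry(bo, lru, lru_node) {
		ret = example_swapout_one(bo);	/* drops lru_lock unless -EBUSY */
		if (ret != -EBUSY)
			return ret;		/* lock assumed already dropped */
		/* still holding lru_lock, keep walking the list */
	}
	spin_unlock(lru_lock);

	return 0;
}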

Signed-off-by: xinhui pan 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8d7fd65ccced..23f906941ac9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
}
 
if (bo->deleted) {
-   ttm_bo_cleanup_refs(bo, false, false, locked);
+   ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
-   return 0;
+   return ret == -EBUSY ? -ENOSPC : ret;
}
 
ttm_bo_del_from_lru(bo);
@@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
if (locked)
dma_resv_unlock(bo->base.resv);
ttm_bo_put(bo);
-   return ret;
+   return ret == -EBUSY ? -ENOSPC : ret;
 }
 
 void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
-- 
2.25.1



[PATCH v2] drm/tidss: Make use of the helper macro SET_RUNTIME_PM_OPS()

2021-09-06 Thread Cai Huoqing
Use the helper macro SET_RUNTIME_PM_OPS() instead of assigning the
.runtime_suspend/.runtime_resume operators directly; the macro makes
the code a little more concise.

Signed-off-by: Cai Huoqing 
---
v1->v2: *Remove "#ifdef CONFIG_PM" around runtime_suspend|resume().
*Make use of pm_ptr() in the assignment in tidss_platform_driver.
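
The pm_ptr() helper referenced above evaluates to the ops pointer when
CONFIG_PM is enabled and to NULL otherwise, which is what allows dropping
the #ifdef around the .pm assignment. A simplified illustration of the
idea (not the real definition, which lives in include/linux/pm.h):

/* Simplified illustration only. */
#ifdef CONFIG_PM
#define example_pm_ptr(_ptr)	(_ptr)
#else
#define example_pm_ptr(_ptr)	NULL
#endif

/* Usage, mirroring the change below: .pm = example_pm_ptr(&example_pm_ops) */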

v1 comments link:
https://www.spinics.net/lists/dri-devel/msg313178.html
 
drivers/gpu/drm/tidss/tidss_drv.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/tidss/tidss_drv.c 
b/drivers/gpu/drm/tidss/tidss_drv.c
index d620f35688da..4366b5c798e0 100644
--- a/drivers/gpu/drm/tidss/tidss_drv.c
+++ b/drivers/gpu/drm/tidss/tidss_drv.c
@@ -88,16 +88,11 @@ static int __maybe_unused tidss_resume(struct device *dev)
return drm_mode_config_helper_resume(&tidss->ddev);
 }
 
-#ifdef CONFIG_PM
-
 static const struct dev_pm_ops tidss_pm_ops = {
-   .runtime_suspend = tidss_pm_runtime_suspend,
-   .runtime_resume = tidss_pm_runtime_resume,
SET_SYSTEM_SLEEP_PM_OPS(tidss_suspend, tidss_resume)
+   SET_RUNTIME_PM_OPS(tidss_pm_runtime_suspend, tidss_pm_runtime_resume, NULL)
 };
 
-#endif /* CONFIG_PM */
-
 /* DRM device Information */
 
 static void tidss_release(struct drm_device *ddev)
@@ -250,9 +245,7 @@ static struct platform_driver tidss_platform_driver = {
.shutdown   = tidss_shutdown,
.driver = {
.name   = "tidss",
-#ifdef CONFIG_PM
-   .pm = &tidss_pm_ops,
-#endif
+   .pm = pm_ptr(&tidss_pm_ops),
.of_match_table = tidss_of_table,
.suppress_bind_attrs = true,
},
-- 
2.25.1



[PATCH] drm: mxsfb: Fix NULL pointer dereference crash on unload

2021-09-06 Thread Marek Vasut
The mxsfb->crtc.funcs pointer may already be NULL when unloading the
driver, in which case calling mxsfb_irq_disable() via drm_irq_uninstall()
from mxsfb_unload() leads to a NULL pointer dereference.

Since all we care about is masking the IRQ and mxsfb->base is still
valid, just use that to clear and mask the IRQ.

Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper")
Signed-off-by: Marek Vasut 
Cc: Daniel Abrecht 
Cc: Emil Velikov 
Cc: Laurent Pinchart 
Cc: Sam Ravnborg 
Cc: Stefan Agner 
---
 drivers/gpu/drm/mxsfb/mxsfb_drv.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/mxsfb/mxsfb_drv.c 
b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
index ec0432fe1bdf8..86d78634a9799 100644
--- a/drivers/gpu/drm/mxsfb/mxsfb_drv.c
+++ b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
@@ -173,7 +173,11 @@ static void mxsfb_irq_disable(struct drm_device *drm)
struct mxsfb_drm_private *mxsfb = drm->dev_private;
 
mxsfb_enable_axi_clk(mxsfb);
-   mxsfb->crtc.funcs->disable_vblank(&mxsfb->crtc);
+
+   /* Disable and clear VBLANK IRQ */
+   writel(CTRL1_CUR_FRAME_DONE_IRQ_EN, mxsfb->base + LCDC_CTRL1 + REG_CLR);
+   writel(CTRL1_CUR_FRAME_DONE_IRQ, mxsfb->base + LCDC_CTRL1 + REG_CLR);
+
mxsfb_disable_axi_clk(mxsfb);
 }
 
-- 
2.33.0



[PATCH] drm/bridge: ti-sn65dsi83: Implement .detach callback

2021-09-06 Thread Marek Vasut
Move the detach implementation from sn65dsi83_remove() to a dedicated
.detach callback. There is no functional change to the code, but the
detach logic now lives in the correct location.

Signed-off-by: Marek Vasut 
Cc: Jagan Teki 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Robert Foss 
Cc: Sam Ravnborg 
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/bridge/ti-sn65dsi83.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c 
b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
index 4ea71d7f0bfbc..13ee313daba96 100644
--- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c
+++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
@@ -288,6 +288,19 @@ static int sn65dsi83_attach(struct drm_bridge *bridge,
return ret;
 }
 
+static void sn65dsi83_detach(struct drm_bridge *bridge)
+{
+   struct sn65dsi83 *ctx = bridge_to_sn65dsi83(bridge);
+
+   if (!ctx->dsi)
+   return;
+
+   mipi_dsi_detach(ctx->dsi);
+   mipi_dsi_device_unregister(ctx->dsi);
+   drm_bridge_remove(&ctx->bridge);
+   ctx->dsi = NULL;
+}
+
 static void sn65dsi83_atomic_pre_enable(struct drm_bridge *bridge,
struct drm_bridge_state 
*old_bridge_state)
 {
@@ -588,6 +601,7 @@ sn65dsi83_atomic_get_input_bus_fmts(struct drm_bridge *bridge,
 
 static const struct drm_bridge_funcs sn65dsi83_funcs = {
.attach = sn65dsi83_attach,
+   .detach = sn65dsi83_detach,
.atomic_pre_enable  = sn65dsi83_atomic_pre_enable,
.atomic_enable  = sn65dsi83_atomic_enable,
.atomic_disable = sn65dsi83_atomic_disable,
@@ -702,9 +716,6 @@ static int sn65dsi83_remove(struct i2c_client *client)
 {
struct sn65dsi83 *ctx = i2c_get_clientdata(client);
 
-   mipi_dsi_detach(ctx->dsi);
-   mipi_dsi_device_unregister(ctx->dsi);
-   drm_bridge_remove(&ctx->bridge);
of_node_put(ctx->host_node);
 
return 0;
-- 
2.33.0



[PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge

2021-09-06 Thread Marek Vasut
In rare cases, the bridge may not start up correctly, which usually
leads to no display output. If this happens, warn about it in the
kernel log.

Signed-off-by: Marek Vasut 
Cc: Jagan Teki 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Robert Foss 
Cc: Sam Ravnborg 
Cc: dri-devel@lists.freedesktop.org
---
NOTE: See the following:
https://e2e.ti.com/support/interface-group/interface/f/interface-forum/942005/sn65dsi83-dsi83-lvds-bridge---sporadic-behavior---no-video
https://community.nxp.com/t5/i-MX-Processors/i-MX8M-MIPI-DSI-Interface-LVDS-Bridge-Initialization/td-p/1156533
---
 drivers/gpu/drm/bridge/ti-sn65dsi83.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c 
b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
index a32f70bc68ea4..4ea71d7f0bfbc 100644
--- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c
+++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
@@ -520,6 +520,11 @@ static void sn65dsi83_atomic_enable(struct drm_bridge *bridge,
/* Clear all errors that got asserted during initialization. */
regmap_read(ctx->regmap, REG_IRQ_STAT, &pval);
regmap_write(ctx->regmap, REG_IRQ_STAT, pval);
+
+   usleep_range(1, 12000);
+   regmap_read(ctx->regmap, REG_IRQ_STAT, &pval);
+   if (pval)
+   dev_err(ctx->dev, "Unexpected link status 0x%02x\n", pval);
 }
 
 static void sn65dsi83_atomic_disable(struct drm_bridge *bridge,
-- 
2.33.0



[Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

2021-09-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #53 from Anthony Rabbito (ted...@gmail.com) ---
Thanks for chiming in James! A few things I've observed since adding
'pci=noats': the graphic artifacts seem to happen far less often. I did
observe one lockup which required me to hard shut down the computer. This
was a wake-from-suspend scenario.

I used to deal with somewhat similar issues here --
https://bugs.freedesktop.org/show_bug.cgi?id=110674 -- not sure if that's of any
use. Let me know if a fresh bug is warranted.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[drm:i915-display-struct-refactor 22/25] drivers/gpu/drm/i915/gvt/handlers.c:510:34: error: no member named 'dpll' in 'struct drm_i915_private'

2021-09-06 Thread kernel test robot
tree:   git://people.freedesktop.org/~airlied/linux.git 
i915-display-struct-refactor
head:   e183b125871ffdd77b6de15a963e6fc8a47173c9
commit: 5b99cab055595d1b12d7425e560b5a9fcd15c9a3 [22/25] drm/i915/display: move 
dpll struct into display
config: x86_64-randconfig-a016-20210906 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 
6fe2beba7d2a41964af658c8c59dd172683ef739)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git remote add drm git://people.freedesktop.org/~airlied/linux.git
git fetch --no-tags drm i915-display-struct-refactor
git checkout 5b99cab055595d1b12d7425e560b5a9fcd15c9a3
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir 
ARCH=x86_64 SHELL=/bin/bash drivers/gpu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/i915/gvt/handlers.c:510:34: error: no member named 'dpll' in 
>> 'struct drm_i915_private'
   refclk = vgpu->gvt->gt->i915->dpll.ref_clks.ssc;
~~~  ^
   drivers/gpu/drm/i915/gvt/handlers.c:541:36: error: no member named 'dpll' in 
'struct drm_i915_private'
   int refclk = vgpu->gvt->gt->i915->dpll.ref_clks.nssc;
~~~  ^
   2 errors generated.


vim +510 drivers/gpu/drm/i915/gvt/handlers.c

04d348ae3f0aea Zhi Wang 2016-04-25  446  
6a4500c7b83f9e Colin Xu 2021-02-26  447  /*
6a4500c7b83f9e Colin Xu 2021-02-26  448   * Only PIPE_A is enabled in current 
vGPU display and PIPE_A is tied to
6a4500c7b83f9e Colin Xu 2021-02-26  449   *   TRANSCODER_A in HW. DDI/PORT 
could be PORT_x depends on
6a4500c7b83f9e Colin Xu 2021-02-26  450   *   setup_virtual_dp_monitor().
6a4500c7b83f9e Colin Xu 2021-02-26  451   * emulate_monitor_status_change() set 
up PLL for PORT_x as the initial enabled
6a4500c7b83f9e Colin Xu 2021-02-26  452   *   DPLL. Later guest driver may 
setup a different DPLLx when setting mode.
6a4500c7b83f9e Colin Xu 2021-02-26  453   * So the correct sequence to find DP 
stream clock is:
6a4500c7b83f9e Colin Xu 2021-02-26  454   *   Check TRANS_DDI_FUNC_CTL on 
TRANSCODER_A to get PORT_x.
6a4500c7b83f9e Colin Xu 2021-02-26  455   *   Check correct PLLx for PORT_x to 
get PLL frequency and DP bitrate.
6a4500c7b83f9e Colin Xu 2021-02-26  456   * Then Refresh rate then can be 
calculated based on follow equations:
6a4500c7b83f9e Colin Xu 2021-02-26  457   *   Pixel clock = h_total * v_total * 
refresh_rate
6a4500c7b83f9e Colin Xu 2021-02-26  458   *   stream clock = Pixel clock
6a4500c7b83f9e Colin Xu 2021-02-26  459   *   ls_clk = DP bitrate
6a4500c7b83f9e Colin Xu 2021-02-26  460   *   Link M/N = strm_clk / ls_clk
6a4500c7b83f9e Colin Xu 2021-02-26  461   */
6a4500c7b83f9e Colin Xu 2021-02-26  462  
6a4500c7b83f9e Colin Xu 2021-02-26  463  static u32 
bdw_vgpu_get_dp_bitrate(struct intel_vgpu *vgpu, enum port port)
6a4500c7b83f9e Colin Xu 2021-02-26  464  {
6a4500c7b83f9e Colin Xu 2021-02-26  465 u32 dp_br = 0;
6a4500c7b83f9e Colin Xu 2021-02-26  466 u32 ddi_pll_sel = 
vgpu_vreg_t(vgpu, PORT_CLK_SEL(port));
6a4500c7b83f9e Colin Xu 2021-02-26  467  
6a4500c7b83f9e Colin Xu 2021-02-26  468 switch (ddi_pll_sel) {
6a4500c7b83f9e Colin Xu 2021-02-26  469 case PORT_CLK_SEL_LCPLL_2700:
6a4500c7b83f9e Colin Xu 2021-02-26  470 dp_br = 27 * 2;
6a4500c7b83f9e Colin Xu 2021-02-26  471 break;
6a4500c7b83f9e Colin Xu 2021-02-26  472 case PORT_CLK_SEL_LCPLL_1350:
6a4500c7b83f9e Colin Xu 2021-02-26  473 dp_br = 135000 * 2;
6a4500c7b83f9e Colin Xu 2021-02-26  474 break;
6a4500c7b83f9e Colin Xu 2021-02-26  475 case PORT_CLK_SEL_LCPLL_810:
6a4500c7b83f9e Colin Xu 2021-02-26  476 dp_br = 81000 * 2;
6a4500c7b83f9e Colin Xu 2021-02-26  477 break;
6a4500c7b83f9e Colin Xu 2021-02-26  478 case PORT_CLK_SEL_SPLL:
6a4500c7b83f9e Colin Xu 2021-02-26  479 {
6a4500c7b83f9e Colin Xu 2021-02-26  480 switch 
(vgpu_vreg_t(vgpu, SPLL_CTL) & SPLL_FREQ_MASK) {
6a4500c7b83f9e Colin Xu 2021-02-26  481 case SPLL_FREQ_810MHz:
6a4500c7b83f9e Colin Xu 2021-02-26  482 dp_br = 81000 * 
2;
6a4500c7b83f9e Colin Xu 2021-02-26  483 break;
6a4500c7b83f9e Colin Xu 2021-02-26  484 case SPLL_FREQ_1350MHz:
6a4500c7b83f9e Colin Xu 2021-02-26  485 dp_br = 135000 
* 2;
6a4500c7b83f9e Colin Xu 2021-02-26  486 break;
6a4500c7b83f9e

[Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

2021-09-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #52 from James Zhu (jam...@amd.com) ---
Created attachment 298691
  --> https://bugzilla.kernel.org/attachment.cgi?id=298691&action=edit
Fix for S3 hung issue

Hi Jerome and kolAflash,

I think the iommu device init is put at the wrong place during resume. I
attached a patch. Please confirm if it works.
Thanks!
James

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

RE: [PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread Pan, Xinhui
[AMD Official Use Only]

It is the internal staging drm-next.

-Original Message-
From: Koenig, Christian 
Sent: September 6, 2021, 19:26
To: Pan, Xinhui ; amd-...@lists.freedesktop.org
Cc: Deucher, Alexander ; che...@uniontech.com; 
dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not 
idle during swap

Which branch is this patch based on? Please rebase on top of drm-misc-fixes
and resend.

Thanks,
Christian.

On 06.09.21 at 03:12, xinhui pan wrote:
> The ret value might be -EBUSY, caller will think lru lock is still
> locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> list corruption.
>
> ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout)
> will be stuck as we actually did not free any BO memory. This usually
> happens when the fence is not signaled for a long time.
>
> Signed-off-by: xinhui pan 
> Reviewed-by: Christian König 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> b/drivers/gpu/drm/ttm/ttm_bo.c index 1fedd0eb67ba..f1367107925b 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1159,9 +1159,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
> ttm_operation_ctx *ctx,
>   }
>
>   if (bo->deleted) {
> - ttm_bo_cleanup_refs(bo, false, false, locked);
> + ret = ttm_bo_cleanup_refs(bo, false, false, locked);
>   ttm_bo_put(bo);
> - return 0;
> + return ret == -EBUSY ? -ENOSPC : ret;
>   }
>
>   ttm_bo_move_to_pinned(bo);
> @@ -1215,7 +1215,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
> ttm_operation_ctx *ctx,
>   if (locked)
>   dma_resv_unlock(bo->base.resv);
>   ttm_bo_put(bo);
> - return ret;
> + return ret == -EBUSY ? -ENOSPC : ret;
>   }
>
>   void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)



Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-09-06 Thread Rob Clark
On Mon, Sep 6, 2021 at 12:58 PM Amit Pundir  wrote:
>
> On Mon, 6 Sept 2021 at 21:54, Rob Clark  wrote:
> >
> > On Mon, Sep 6, 2021 at 1:02 AM Amit Pundir  wrote:
> > >
> > > On Sat, 4 Sept 2021 at 01:55, Rob Clark  wrote:
> > > >
> > > > On Fri, Sep 3, 2021 at 12:39 PM John Stultz  
> > > > wrote:
> > > > >
> > > > > On Thu, Jul 29, 2021 at 1:49 PM Rob Clark  wrote:
> > > > > > On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> > > > > >  wrote:
> > > > > > > On 29/07/2021 21:24, Rob Clark wrote:
> > > > > > > > On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> > > > > > > >  wrote:
> > > > > > > >>
> > > > > > > >> Hi Rob,
> > > > > > > >>
> > > > > > > >> I've done some more testing! It looks like before that patch 
> > > > > > > >> ("drm/msm: Devfreq tuning") the GPU would never get above
> > > > > > > >> the second frequency in the OPP table (342MHz) (at least, not 
> > > > > > > >> in glxgears). With the patch applied it would more
> > > > > > > >> aggressively jump up to the max frequency which seems to be 
> > > > > > > >> unstable at the default regulator voltages.
> > > > > > > >
> > > > > > > > *ohh*, yeah, ok, that would explain it
> > > > > > > >
> > > > > > > >> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 
> > > > > > > >> 0.988v (instead of the stock 0.516v) makes the GPU stable
> > > > > > > >> at the higher frequencies.
> > > > > > > >>
> > > > > > > >> Applying this patch reverts the behaviour, and the GPU never 
> > > > > > > >> goes above 342MHz in glxgears, losing ~30% performance in
> > > > > > > >> glxgear.
> > > > > > > >>
> > > > > > > >> I think (?) that enabling CPR support would be the proper 
> > > > > > > >> solution to this - that would ensure that the regulators run
> > > > > > > >> at the voltage the hardware needs to be stable.
> > > > > > > >>
> > > > > > > >> Is hacking the voltage higher (although ideally not quite that 
> > > > > > > >> high) an acceptable short term solution until we have
> > > > > > > >> CPR? Or would it be safer to just not make use of the higher 
> > > > > > > >> frequencies on a630 for now?
> > > > > > > >>
> > > > > > > >
> > > > > > > > tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is 
> > > > > > > > already
> > > > > > > > on CC and I added sboyd, maybe one of them knows better.
> > > > > > > >
> > > > > > > > In the short term, removing the higher problematic OPPs from 
> > > > > > > > dts might
> > > > > > > > be a better option than this patch (which I'm dropping), since 
> > > > > > > > there
> > > > > > > > is nothing stopping other workloads from hitting higher OPPs.
> > > > > > > Oh yeah that sounds like a more sensible workaround than mine .
> > > > > > > >
> > > > > > > > I'm slightly curious why I didn't have problems at higher OPPs 
> > > > > > > > on my
> > > > > > > > c630 laptop (sdm850)
> > > > > > > Perhaps you won the silicon lottery - iirc sdm850 is binned for 
> > > > > > > higher clocks as is out of the factory.
> > > > > > >
> > > > > > > Would it be best to drop the OPPs for all devices? Or just those 
> > > > > > > affected? I guess it's possible another c630 might
> > > > > > > crash where yours doesn't?
> > > > > >
> > > > > > I've not heard any reports of similar issues from the handful of 
> > > > > > other
> > > > > > folks with c630's on #aarch64-laptops.. but I can't really say if 
> > > > > > that
> > > > > > is luck or not.
> > > > > >
> > > > > > Maybe just remove it for affected devices?  But I'll defer to Bjorn.
> > > > >
> > > > > Just as another datapoint, I was just marveling at how suddenly smooth
> > > > > the UI was performing on db845c and Caleb pointed me at the "drm/msm:
> > > > > Devfreq tuning" patch as the likely cause of the improvement, and
> > > > > mid-discussion my board crashed into USB crash mode:
> > > > > [  146.157696][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.163303][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.168837][C0] adreno 500.gpu: RBBM | ATB bus overflow
> > > > > [  146.174960][C0] adreno 500.gpu: CP | HW fault | 
> > > > > status=0x
> > > > > [  146.181917][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.187547][C0] adreno 500.gpu: CP illegal instruction 
> > > > > error
> > > > > [  146.194009][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.308909][T9] Internal error: synchronous external abort:
> > > > > 9610 [#1] PREEMPT SMP
> > > > > [  146.317150][T9] Modules linked in:
> > > > > [  146.320941][T9] CPU: 3 PID: 9 Comm: kworker/u16:1 Tainted: G
> > > > > W 5.14.0-mainline-06795-g42b258c2275c #24
> > > > > [  146.331974][T9] Hardware name: Thundercomm Dragonboar
> > > > > Format: Log Type - Time(microsec) - Message - Optional Info
> > > > > Log Type: B - Since Boot(Power On Reset),  D - Delta,  S - Statistic
> > > > > S - QC_IMAGE_VERSION_STRING=BOOT.XF.2.0-00371-SDM845LZB-1
> > > > > S - IMAGE_VARIANT_STRING

Re: [PATCH 1/2] drm/nouveau/ga102-: support ttm buffer moves via copy engine

2021-09-06 Thread Ben Skeggs
On Tue, 7 Sept 2021 at 10:28, Karol Herbst  wrote:
>
> On Tue, Sep 7, 2021 at 1:28 AM Ben Skeggs  wrote:
> >
> > On Tue, 7 Sept 2021 at 09:17, Karol Herbst  wrote:
> > >
> > > ."
> > >
> > >
> > > On Mon, Sep 6, 2021 at 2:56 AM Ben Skeggs  wrote:
> > > >
> > > > From: Ben Skeggs 
> > > >
> > > > We don't currently have any kind of real acceleration on Ampere GPUs,
> > > > but the TTM memcpy() fallback paths aren't really designed to handle
> > > > copies between different devices, such as on Optimus systems, and
> > > > result in a kernel OOPS.
> > > >
> > > > A few options were investigated to try and fix this, but didn't work
> > > > out, and likely would have resulted in a very unpleasant experience
> > > > for users anyway.
> > > >
> > > > This commit adds just enough support for setting up a single channel
> > > > connected to a copy engine, which the kernel can use to accelerate
> > > > the buffer copies between devices.  Userspace has no access to this
> > > > incomplete channel support, but it's suitable for TTM's needs.
> > > >
> > > > A more complete implementation of host(fifo) for Ampere GPUs is in
> > > > the works, but the required changes are far too invasive that they
> > > > would be unsuitable to backport to fix this issue on current kernels.
> > > >
> > > > Signed-off-by: Ben Skeggs 
> > > > Cc: Lyude Paul 
> > > > Cc: Karol Herbst 
> > > > Cc:  # v5.12+
> > > > ---
> > > >  drivers/gpu/drm/nouveau/include/nvif/class.h  |   2 +
> > > >  .../drm/nouveau/include/nvkm/engine/fifo.h|   1 +
> > > >  drivers/gpu/drm/nouveau/nouveau_bo.c  |   1 +
> > > >  drivers/gpu/drm/nouveau/nouveau_chan.c|   6 +-
> > > >  drivers/gpu/drm/nouveau/nouveau_drm.c |   4 +
> > > >  drivers/gpu/drm/nouveau/nv84_fence.c  |   2 +-
> > > >  .../gpu/drm/nouveau/nvkm/engine/device/base.c |   3 +
> > > >  .../gpu/drm/nouveau/nvkm/engine/fifo/Kbuild   |   1 +
> > > >  .../gpu/drm/nouveau/nvkm/engine/fifo/ga102.c  | 308 ++
> > > >  .../gpu/drm/nouveau/nvkm/subdev/top/ga100.c   |   7 +-
> > > >  10 files changed, 329 insertions(+), 6 deletions(-)
> > > >  create mode 100644 drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga102.c
> > > >
> > > > diff --git a/drivers/gpu/drm/nouveau/include/nvif/class.h 
> > > > b/drivers/gpu/drm/nouveau/include/nvif/class.h
> > > > index c68cc957248e..a582c0cb0cb0 100644
> > > > --- a/drivers/gpu/drm/nouveau/include/nvif/class.h
> > > > +++ b/drivers/gpu/drm/nouveau/include/nvif/class.h
> > > > @@ -71,6 +71,7 @@
> > > >  #define PASCAL_CHANNEL_GPFIFO_A   /* cla06f.h */ 
> > > > 0xc06f
> > > >  #define VOLTA_CHANNEL_GPFIFO_A/* clc36f.h */ 
> > > > 0xc36f
> > > >  #define TURING_CHANNEL_GPFIFO_A   /* clc36f.h */ 
> > > > 0xc46f
> > > > +#define AMPERE_CHANNEL_GPFIFO_B   /* clc36f.h */ 
> > > > 0xc76f
> > > >
> > > >  #define NV50_DISP /* cl5070.h */ 
> > > > 0x5070
> > > >  #define G82_DISP  /* cl5070.h */ 
> > > > 0x8270
> > > > @@ -200,6 +201,7 @@
> > > >  #define PASCAL_DMA_COPY_B
> > > > 0xc1b5
> > > >  #define VOLTA_DMA_COPY_A 
> > > > 0xc3b5
> > > >  #define TURING_DMA_COPY_A
> > > > 0xc5b5
> > > > +#define AMPERE_DMA_COPY_B
> > > > 0xc7b5
> > > >
> > > >  #define FERMI_DECOMPRESS 
> > > > 0x90b8
> > > >
> > > > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h 
> > > > b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > > > index 54fab7cc36c1..64ee82c7c1be 100644
> > > > --- a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > > > +++ b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > > > @@ -77,4 +77,5 @@ int gp100_fifo_new(struct nvkm_device *, enum 
> > > > nvkm_subdev_type, int inst, struct
> > > >  int gp10b_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > > inst, struct nvkm_fifo **);
> > > >  int gv100_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > > inst, struct nvkm_fifo **);
> > > >  int tu102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > > inst, struct nvkm_fifo **);
> > > > +int ga102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > > inst, struct nvkm_fifo **);
> > > >  #endif
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> > > > b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > > index 4a7cebac8060..b3e4f555fa05 100644
> > > > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > > @@ -844,6 +844,7 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
> > > > struct ttm_resource *, struct ttm_resource 
> > > > *);
> > > > int (

Re: [PATCH 1/2] drm/nouveau/ga102-: support ttm buffer moves via copy engine

2021-09-06 Thread Karol Herbst
On Tue, Sep 7, 2021 at 1:28 AM Ben Skeggs  wrote:
>
> On Tue, 7 Sept 2021 at 09:17, Karol Herbst  wrote:
> >
> > ."
> >
> >
> > On Mon, Sep 6, 2021 at 2:56 AM Ben Skeggs  wrote:
> > >
> > > From: Ben Skeggs 
> > >
> > > We don't currently have any kind of real acceleration on Ampere GPUs,
> > > but the TTM memcpy() fallback paths aren't really designed to handle
> > > copies between different devices, such as on Optimus systems, and
> > > result in a kernel OOPS.
> > >
> > > A few options were investigated to try and fix this, but didn't work
> > > out, and likely would have resulted in a very unpleasant experience
> > > for users anyway.
> > >
> > > This commit adds just enough support for setting up a single channel
> > > connected to a copy engine, which the kernel can use to accelerate
> > > the buffer copies between devices.  Userspace has no access to this
> > > incomplete channel support, but it's suitable for TTM's needs.
> > >
> > > A more complete implementation of host(fifo) for Ampere GPUs is in
> > > the works, but the required changes are far too invasive that they
> > > would be unsuitable to backport to fix this issue on current kernels.
> > >
> > > Signed-off-by: Ben Skeggs 
> > > Cc: Lyude Paul 
> > > Cc: Karol Herbst 
> > > Cc:  # v5.12+
> > > ---
> > >  drivers/gpu/drm/nouveau/include/nvif/class.h  |   2 +
> > >  .../drm/nouveau/include/nvkm/engine/fifo.h|   1 +
> > >  drivers/gpu/drm/nouveau/nouveau_bo.c  |   1 +
> > >  drivers/gpu/drm/nouveau/nouveau_chan.c|   6 +-
> > >  drivers/gpu/drm/nouveau/nouveau_drm.c |   4 +
> > >  drivers/gpu/drm/nouveau/nv84_fence.c  |   2 +-
> > >  .../gpu/drm/nouveau/nvkm/engine/device/base.c |   3 +
> > >  .../gpu/drm/nouveau/nvkm/engine/fifo/Kbuild   |   1 +
> > >  .../gpu/drm/nouveau/nvkm/engine/fifo/ga102.c  | 308 ++
> > >  .../gpu/drm/nouveau/nvkm/subdev/top/ga100.c   |   7 +-
> > >  10 files changed, 329 insertions(+), 6 deletions(-)
> > >  create mode 100644 drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga102.c
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/include/nvif/class.h 
> > > b/drivers/gpu/drm/nouveau/include/nvif/class.h
> > > index c68cc957248e..a582c0cb0cb0 100644
> > > --- a/drivers/gpu/drm/nouveau/include/nvif/class.h
> > > +++ b/drivers/gpu/drm/nouveau/include/nvif/class.h
> > > @@ -71,6 +71,7 @@
> > >  #define PASCAL_CHANNEL_GPFIFO_A   /* cla06f.h */ 
> > > 0xc06f
> > >  #define VOLTA_CHANNEL_GPFIFO_A/* clc36f.h */ 
> > > 0xc36f
> > >  #define TURING_CHANNEL_GPFIFO_A   /* clc36f.h */ 
> > > 0xc46f
> > > +#define AMPERE_CHANNEL_GPFIFO_B   /* clc36f.h */ 
> > > 0xc76f
> > >
> > >  #define NV50_DISP /* cl5070.h */ 
> > > 0x5070
> > >  #define G82_DISP  /* cl5070.h */ 
> > > 0x8270
> > > @@ -200,6 +201,7 @@
> > >  #define PASCAL_DMA_COPY_B
> > > 0xc1b5
> > >  #define VOLTA_DMA_COPY_A 
> > > 0xc3b5
> > >  #define TURING_DMA_COPY_A
> > > 0xc5b5
> > > +#define AMPERE_DMA_COPY_B
> > > 0xc7b5
> > >
> > >  #define FERMI_DECOMPRESS 
> > > 0x90b8
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h 
> > > b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > > index 54fab7cc36c1..64ee82c7c1be 100644
> > > --- a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > > +++ b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > > @@ -77,4 +77,5 @@ int gp100_fifo_new(struct nvkm_device *, enum 
> > > nvkm_subdev_type, int inst, struct
> > >  int gp10b_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > inst, struct nvkm_fifo **);
> > >  int gv100_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > inst, struct nvkm_fifo **);
> > >  int tu102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > inst, struct nvkm_fifo **);
> > > +int ga102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int 
> > > inst, struct nvkm_fifo **);
> > >  #endif
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> > > b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > index 4a7cebac8060..b3e4f555fa05 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > @@ -844,6 +844,7 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
> > > struct ttm_resource *, struct ttm_resource *);
> > > int (*init)(struct nouveau_channel *, u32 handle);
> > > } _methods[] = {
> > > +   {  "COPY", 4, 0xc7b5, nve0_bo_move_copy, 
> > > nve0_bo_move_init },
> >
> > so, I was looking at the COPY class headers and noticed something strange.
> >
> > "BYPASS_L

Re: [PATCH 2/2] drm/nouveau/kms/tu102-: delay enabling cursor until after assign_windows

2021-09-06 Thread Karol Herbst
On Mon, Sep 6, 2021 at 2:56 AM Ben Skeggs  wrote:
>
> From: Ben Skeggs 
>
> Prevent NVD core channel error code 67 from occurring and hanging the
> display; managed to reproduce this on GA102 while testing suspend/resume
> scenarios.
>
> This required extending an earlier commit that fixed interactions with EFI.
>

Reviewed-by: Karol Herbst 


> Fixes: e78b1b545c6c ("drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences").
> Signed-off-by: Ben Skeggs 
> Cc: Lyude Paul 
> Cc: Karol Herbst 
> Cc:  # v5.12+
> ---
>  drivers/gpu/drm/nouveau/dispnv50/head.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv50/head.c 
> b/drivers/gpu/drm/nouveau/dispnv50/head.c
> index f8438a886b64..c3c57be54e1c 100644
> --- a/drivers/gpu/drm/nouveau/dispnv50/head.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/head.c
> @@ -52,6 +52,7 @@ nv50_head_flush_clr(struct nv50_head *head,
>  void
>  nv50_head_flush_set_wndw(struct nv50_head *head, struct nv50_head_atom *asyh)
>  {
> +   if (asyh->set.curs   ) head->func->curs_set(head, asyh);
> if (asyh->set.olut   ) {
> asyh->olut.offset = nv50_lut_load(&head->olut,
>   asyh->olut.buffer,
> @@ -67,7 +68,6 @@ nv50_head_flush_set(struct nv50_head *head, struct 
> nv50_head_atom *asyh)
> if (asyh->set.view   ) head->func->view(head, asyh);
> if (asyh->set.mode   ) head->func->mode(head, asyh);
> if (asyh->set.core   ) head->func->core_set(head, asyh);
> -   if (asyh->set.curs   ) head->func->curs_set(head, asyh);
> if (asyh->set.base   ) head->func->base(head, asyh);
> if (asyh->set.ovly   ) head->func->ovly(head, asyh);
> if (asyh->set.dither ) head->func->dither  (head, asyh);
> --
> 2.31.1
>



Re: [PATCH v5 01/16] dt-bindings: mediatek: add vdosys1 RDMA definition for mt8195

2021-09-06 Thread Chun-Kuang Hu
Hi, Nancy:

Nancy.Lin wrote on Monday, September 6, 2021 at 3:15 PM:
>
> Add vdosys1 RDMA definition.
>
> Signed-off-by: Nancy.Lin 
> ---
>  .../display/mediatek/mediatek,mdp-rdma.yaml   | 77 +++
>  1 file changed, 77 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-rdma.yaml
>
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-rdma.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-rdma.yaml
> new file mode 100644
> index ..3610093848e1
> --- /dev/null
> +++ 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-rdma.yaml

I've compared the RDMA driver in mdp [1] with the RDMA driver in
display [2]; both are similar. The differences are along the lines of
merge0 versus merge5, so I would like both binding documents to be placed
together. In the display folder? In the media folder? In the SoC folder?
I have no idea which one is better, but at least put them together.

[1] 
https://patchwork.kernel.org/project/linux-mediatek/patch/20210824100027.25989-6-moudy...@mediatek.com/
[2] 
https://patchwork.kernel.org/project/linux-mediatek/patch/20210906071539.12953-12-nancy@mediatek.com/

Regards,
Chun-Kuang.

> @@ -0,0 +1,77 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/display/mediatek/mediatek,mdp-rdma.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: mediatek display MDP RDMA
> +
> +maintainers:
> +  - CK Hu 
> +
> +description: |
> +  The mediatek display MDP RDMA stands for Read Direct Memory Access.
> +  It provides real time data to the back-end panel driver, such as DSI,
> +  DPI and DP_INTF.
> +  It contains one line buffer to store the sufficient pixel data.
> +  RDMA device node must be siblings to the central MMSYS_CONFIG node.
> +  For a description of the MMSYS_CONFIG binding, see
> +  Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml for 
> details.
> +
> +properties:
> +  compatible:
> +oneOf:
> +  - items:
> +  - const: mediatek,mt8195-vdo1-rdma
> +
> +  reg:
> +maxItems: 1
> +
> +  interrupts:
> +maxItems: 1
> +
> +  power-domains:
> +description: A phandle and PM domain specifier as defined by bindings of
> +  the power controller specified by phandle. See
> +  Documentation/devicetree/bindings/power/power-domain.yaml for details.
> +
> +  clocks:
> +items:
> +  - description: RDMA Clock
> +
> +  iommus:
> +description:
> +  This property should point to the respective IOMMU block with master 
> port as argument,
> +  see Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml for 
> details.
> +
> +  mediatek,gce-client-reg:
> +description:
> +  The register of display function block to be set by gce. There are 4 
> arguments,
> +  such as gce node, subsys id, offset and register size. The subsys id 
> that is
> +  mapping to the register of display function blocks is defined in the 
> gce header
> +  include/include/dt-bindings/gce/-gce.h of each chips.
> +$ref: /schemas/types.yaml#/definitions/phandle-array
> +maxItems: 1
> +
> +required:
> +  - compatible
> +  - reg
> +  - power-domains
> +  - clocks
> +  - iommus
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +
> +vdo1_rdma0: vdo1_rdma@1c104000 {
> +compatible = "mediatek,mt8195-vdo1-rdma";
> +reg = <0 0x1c104000 0 0x1000>;
> +interrupts = ;
> +clocks = <&vdosys1 CLK_VDO1_MDP_RDMA0>;
> +power-domains = <&spm MT8195_POWER_DOMAIN_VDOSYS1>;
> +iommus = <&iommu_vdo M4U_PORT_L2_MDP_RDMA0>;
> +mediatek,gce-client-reg = <&gce1 SUBSYS_1c10 0x4000 0x1000>;
> +};
> +
> --
> 2.18.0
>


Re: [PATCH 1/2] drm/nouveau/ga102-: support ttm buffer moves via copy engine

2021-09-06 Thread Ben Skeggs
On Tue, 7 Sept 2021 at 09:17, Karol Herbst  wrote:
>
> ."
>
>
> On Mon, Sep 6, 2021 at 2:56 AM Ben Skeggs  wrote:
> >
> > From: Ben Skeggs 
> >
> > We don't currently have any kind of real acceleration on Ampere GPUs,
> > but the TTM memcpy() fallback paths aren't really designed to handle
> > copies between different devices, such as on Optimus systems, and
> > result in a kernel OOPS.
> >
> > A few options were investigated to try and fix this, but didn't work
> > out, and likely would have resulted in a very unpleasant experience
> > for users anyway.
> >
> > This commit adds just enough support for setting up a single channel
> > connected to a copy engine, which the kernel can use to accelerate
> > the buffer copies between devices.  Userspace has no access to this
> > incomplete channel support, but it's suitable for TTM's needs.
> >
> > A more complete implementation of host(fifo) for Ampere GPUs is in
> > the works, but the required changes are far too invasive to be
> > suitable for backporting to fix this issue on current kernels.
> >
> > Signed-off-by: Ben Skeggs 
> > Cc: Lyude Paul 
> > Cc: Karol Herbst 
> > Cc:  # v5.12+
> > ---
> >  drivers/gpu/drm/nouveau/include/nvif/class.h  |   2 +
> >  .../drm/nouveau/include/nvkm/engine/fifo.h|   1 +
> >  drivers/gpu/drm/nouveau/nouveau_bo.c  |   1 +
> >  drivers/gpu/drm/nouveau/nouveau_chan.c|   6 +-
> >  drivers/gpu/drm/nouveau/nouveau_drm.c |   4 +
> >  drivers/gpu/drm/nouveau/nv84_fence.c  |   2 +-
> >  .../gpu/drm/nouveau/nvkm/engine/device/base.c |   3 +
> >  .../gpu/drm/nouveau/nvkm/engine/fifo/Kbuild   |   1 +
> >  .../gpu/drm/nouveau/nvkm/engine/fifo/ga102.c  | 308 ++
> >  .../gpu/drm/nouveau/nvkm/subdev/top/ga100.c   |   7 +-
> >  10 files changed, 329 insertions(+), 6 deletions(-)
> >  create mode 100644 drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga102.c
> >
> > diff --git a/drivers/gpu/drm/nouveau/include/nvif/class.h 
> > b/drivers/gpu/drm/nouveau/include/nvif/class.h
> > index c68cc957248e..a582c0cb0cb0 100644
> > --- a/drivers/gpu/drm/nouveau/include/nvif/class.h
> > +++ b/drivers/gpu/drm/nouveau/include/nvif/class.h
> > @@ -71,6 +71,7 @@
> >  #define PASCAL_CHANNEL_GPFIFO_A   /* cla06f.h */ 
> > 0xc06f
> >  #define VOLTA_CHANNEL_GPFIFO_A/* clc36f.h */ 
> > 0xc36f
> >  #define TURING_CHANNEL_GPFIFO_A   /* clc36f.h */ 
> > 0xc46f
> > +#define AMPERE_CHANNEL_GPFIFO_B   /* clc36f.h */ 
> > 0xc76f
> >
> >  #define NV50_DISP /* cl5070.h */ 
> > 0x5070
> >  #define G82_DISP  /* cl5070.h */ 
> > 0x8270
> > @@ -200,6 +201,7 @@
> >  #define PASCAL_DMA_COPY_B
> > 0xc1b5
> >  #define VOLTA_DMA_COPY_A 
> > 0xc3b5
> >  #define TURING_DMA_COPY_A
> > 0xc5b5
> > +#define AMPERE_DMA_COPY_B
> > 0xc7b5
> >
> >  #define FERMI_DECOMPRESS 
> > 0x90b8
> >
> > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h 
> > b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > index 54fab7cc36c1..64ee82c7c1be 100644
> > --- a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > +++ b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> > @@ -77,4 +77,5 @@ int gp100_fifo_new(struct nvkm_device *, enum 
> > nvkm_subdev_type, int inst, struct
> >  int gp10b_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> > struct nvkm_fifo **);
> >  int gv100_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> > struct nvkm_fifo **);
> >  int tu102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> > struct nvkm_fifo **);
> > +int ga102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> > struct nvkm_fifo **);
> >  #endif
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> > b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > index 4a7cebac8060..b3e4f555fa05 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -844,6 +844,7 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
> > struct ttm_resource *, struct ttm_resource *);
> > int (*init)(struct nouveau_channel *, u32 handle);
> > } _methods[] = {
> > +   {  "COPY", 4, 0xc7b5, nve0_bo_move_copy, nve0_bo_move_init 
> > },
>
> so, I was looking at the COPY class headers and noticed something strange.
>
> "BYPASS_L2" was moved with MAXWELL_DMA_COPY_A from bit 11 to bit 20.
> It got split out to SRC_ (20) and DST_ (21) with PASCAL_DMA_COPY_A and
> got removed with AMPERE_DMA_COPY_A.
>
> Since MAXWELL_DMA_COPY_A bit 11 is FORCE_RMWDISABLE, I don't know if
> that causes any issues

Re: [PATCH 1/2] drm/nouveau/ga102-: support ttm buffer moves via copy engine

2021-09-06 Thread Karol Herbst
."


On Mon, Sep 6, 2021 at 2:56 AM Ben Skeggs  wrote:
>
> From: Ben Skeggs 
>
> We don't currently have any kind of real acceleration on Ampere GPUs,
> but the TTM memcpy() fallback paths aren't really designed to handle
> copies between different devices, such as on Optimus systems, and
> result in a kernel OOPS.
>
> A few options were investigated to try and fix this, but didn't work
> out, and likely would have resulted in a very unpleasant experience
> for users anyway.
>
> This commit adds just enough support for setting up a single channel
> connected to a copy engine, which the kernel can use to accelerate
> the buffer copies between devices.  Userspace has no access to this
> incomplete channel support, but it's suitable for TTM's needs.
>
> A more complete implementation of host(fifo) for Ampere GPUs is in
> the works, but the required changes are far too invasive to be
> suitable for backporting to fix this issue on current kernels.
>
> Signed-off-by: Ben Skeggs 
> Cc: Lyude Paul 
> Cc: Karol Herbst 
> Cc:  # v5.12+
> ---
>  drivers/gpu/drm/nouveau/include/nvif/class.h  |   2 +
>  .../drm/nouveau/include/nvkm/engine/fifo.h|   1 +
>  drivers/gpu/drm/nouveau/nouveau_bo.c  |   1 +
>  drivers/gpu/drm/nouveau/nouveau_chan.c|   6 +-
>  drivers/gpu/drm/nouveau/nouveau_drm.c |   4 +
>  drivers/gpu/drm/nouveau/nv84_fence.c  |   2 +-
>  .../gpu/drm/nouveau/nvkm/engine/device/base.c |   3 +
>  .../gpu/drm/nouveau/nvkm/engine/fifo/Kbuild   |   1 +
>  .../gpu/drm/nouveau/nvkm/engine/fifo/ga102.c  | 308 ++
>  .../gpu/drm/nouveau/nvkm/subdev/top/ga100.c   |   7 +-
>  10 files changed, 329 insertions(+), 6 deletions(-)
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga102.c
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvif/class.h 
> b/drivers/gpu/drm/nouveau/include/nvif/class.h
> index c68cc957248e..a582c0cb0cb0 100644
> --- a/drivers/gpu/drm/nouveau/include/nvif/class.h
> +++ b/drivers/gpu/drm/nouveau/include/nvif/class.h
> @@ -71,6 +71,7 @@
>  #define PASCAL_CHANNEL_GPFIFO_A   /* cla06f.h */ 
> 0xc06f
>  #define VOLTA_CHANNEL_GPFIFO_A/* clc36f.h */ 
> 0xc36f
>  #define TURING_CHANNEL_GPFIFO_A   /* clc36f.h */ 
> 0xc46f
> +#define AMPERE_CHANNEL_GPFIFO_B   /* clc36f.h */ 
> 0xc76f
>
>  #define NV50_DISP /* cl5070.h */ 
> 0x5070
>  #define G82_DISP  /* cl5070.h */ 
> 0x8270
> @@ -200,6 +201,7 @@
>  #define PASCAL_DMA_COPY_B
> 0xc1b5
>  #define VOLTA_DMA_COPY_A 
> 0xc3b5
>  #define TURING_DMA_COPY_A
> 0xc5b5
> +#define AMPERE_DMA_COPY_B
> 0xc7b5
>
>  #define FERMI_DECOMPRESS 
> 0x90b8
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h 
> b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> index 54fab7cc36c1..64ee82c7c1be 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> @@ -77,4 +77,5 @@ int gp100_fifo_new(struct nvkm_device *, enum 
> nvkm_subdev_type, int inst, struct
>  int gp10b_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
>  int gv100_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
>  int tu102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
> +int ga102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
>  #endif
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 4a7cebac8060..b3e4f555fa05 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -844,6 +844,7 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
> struct ttm_resource *, struct ttm_resource *);
> int (*init)(struct nouveau_channel *, u32 handle);
> } _methods[] = {
> +   {  "COPY", 4, 0xc7b5, nve0_bo_move_copy, nve0_bo_move_init },

so, I was looking at the COPY class headers and noticed something strange.

"BYPASS_L2" was moved with MAXWELL_DMA_COPY_A from bit 11 to bit 20.
It got split out to SRC_ (20) and DST_ (21) with PASCAL_DMA_COPY_A and
got removed with AMPERE_DMA_COPY_A.

Since MAXWELL_DMA_COPY_A bit 11 is FORCE_RMWDISABLE, I don't know if
that causes any issues; I just noticed this while comparing the copy
class headers.

> {  "COPY", 4, 0xc5b5, nve0_bo_move_copy, nve0_bo_move_init },
> {  "GRCE", 0, 0xc5b5, nve0_bo_move_copy, nvc0_bo_move_init },
> {  "COPY", 4, 0xc3b5, nve0_bo_mo

Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-09-06 Thread Rob Clark
On Mon, Sep 6, 2021 at 1:50 PM Rob Clark  wrote:
>
> On Mon, Sep 6, 2021 at 12:58 PM Amit Pundir  wrote:
> >
> > On Mon, 6 Sept 2021 at 21:54, Rob Clark  wrote:
> > >
> > > On Mon, Sep 6, 2021 at 1:02 AM Amit Pundir  wrote:
> > > >
> > > > On Sat, 4 Sept 2021 at 01:55, Rob Clark  wrote:
> > > > >
> > > > > On Fri, Sep 3, 2021 at 12:39 PM John Stultz  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Jul 29, 2021 at 1:49 PM Rob Clark  
> > > > > > wrote:
> > > > > > > On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> > > > > > >  wrote:
> > > > > > > > On 29/07/2021 21:24, Rob Clark wrote:
> > > > > > > > > On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> > > > > > > > >  wrote:
> > > > > > > > >>
> > > > > > > > >> Hi Rob,
> > > > > > > > >>
> > > > > > > > >> I've done some more testing! It looks like before that patch 
> > > > > > > > >> ("drm/msm: Devfreq tuning") the GPU would never get above
> > > > > > > > >> the second frequency in the OPP table (342MHz) (at least, 
> > > > > > > > >> not in glxgears). With the patch applied it would more
> > > > > > > > >> aggressively jump up to the max frequency which seems to be 
> > > > > > > > >> unstable at the default regulator voltages.
> > > > > > > > >
> > > > > > > > > *ohh*, yeah, ok, that would explain it
> > > > > > > > >
> > > > > > > > >> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up 
> > > > > > > > >> to 0.988v (instead of the stock 0.516v) makes the GPU stable
> > > > > > > > >> at the higher frequencies.
> > > > > > > > >>
> > > > > > > > >> Applying this patch reverts the behaviour, and the GPU never 
> > > > > > > > >> goes above 342MHz in glxgears, losing ~30% performance in
> > > > > > > > >> glxgear.
> > > > > > > > >>
> > > > > > > > >> I think (?) that enabling CPR support would be the proper 
> > > > > > > > >> solution to this - that would ensure that the regulators run
> > > > > > > > >> at the voltage the hardware needs to be stable.
> > > > > > > > >>
> > > > > > > > >> Is hacking the voltage higher (although ideally not quite 
> > > > > > > > >> that high) an acceptable short term solution until we have
> > > > > > > > >> CPR? Or would it be safer to just not make use of the higher 
> > > > > > > > >> frequencies on a630 for now?
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > tbh, I'm not sure about the regulator stuff and CPR.. Bjorn 
> > > > > > > > > is already
> > > > > > > > > on CC and I added sboyd, maybe one of them knows better.
> > > > > > > > >
> > > > > > > > > In the short term, removing the higher problematic OPPs from 
> > > > > > > > > dts might
> > > > > > > > > be a better option than this patch (which I'm dropping), 
> > > > > > > > > since there
> > > > > > > > > is nothing stopping other workloads from hitting higher OPPs.
> > > > > > > > Oh yeah that sounds like a more sensible workaround than mine .
> > > > > > > > >
> > > > > > > > > I'm slightly curious why I didn't have problems at higher 
> > > > > > > > > OPPs on my
> > > > > > > > > c630 laptop (sdm850)
> > > > > > > > Perhaps you won the silicon lottery - iirc sdm850 is binned
> > > > > > > > for higher clocks as-is out of the factory.
> > > > > > > >
> > > > > > > > Would it be best to drop the OPPs for all devices? Or just 
> > > > > > > > those affected? I guess it's possible another c630 might
> > > > > > > > crash where yours doesn't?
> > > > > > >
> > > > > > > I've not heard any reports of similar issues from the handful of 
> > > > > > > other
> > > > > > > folks with c630's on #aarch64-laptops.. but I can't really say if 
> > > > > > > that
> > > > > > > is luck or not.
> > > > > > >
> > > > > > > Maybe just remove it for affected devices?  But I'll defer to 
> > > > > > > Bjorn.
> > > > > >
> > > > > > Just as another datapoint, I was just marveling at how suddenly 
> > > > > > smooth
> > > > > > the UI was performing on db845c and Caleb pointed me at the 
> > > > > > "drm/msm:
> > > > > > Devfreq tuning" patch as the likely cause of the improvement, and
> > > > > > mid-discussion my board crashed into USB crash mode:
> > > > > > [  146.157696][C0] adreno 500.gpu: CP | AHB bus error
> > > > > > [  146.163303][C0] adreno 500.gpu: CP | AHB bus error
> > > > > > [  146.168837][C0] adreno 500.gpu: RBBM | ATB bus overflow
> > > > > > [  146.174960][C0] adreno 500.gpu: CP | HW fault | 
> > > > > > status=0x
> > > > > > [  146.181917][C0] adreno 500.gpu: CP | AHB bus error
> > > > > > [  146.187547][C0] adreno 500.gpu: CP illegal instruction 
> > > > > > error
> > > > > > [  146.194009][C0] adreno 500.gpu: CP | AHB bus error
> > > > > > [  146.308909][T9] Internal error: synchronous external abort:
> > > > > > 9610 [#1] PREEMPT SMP
> > > > > > [  146.317150][T9] Modules linked in:
> > > > > > [  146.320941][T9] CPU: 3 PID: 9 Comm: kworker/u16:1 Tainted: G
> > > > > > W 5.14.0-mainline-06795-g42b258c2275c #24
> > > > > > [  146.331974][  

RE: [RFC PATCH v3 1/6] drm/doc: Color Management and HDR10 RFC

2021-09-06 Thread Shankar, Uma



> -Original Message-
> From: sebast...@sebastianwick.net 
> Sent: Monday, August 16, 2021 7:07 PM
> To: Harry Wentland 
> Cc: Brian Starkey ; Sharma, Shashank
> ; amd-...@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; ppaala...@gmail.com; mca...@google.com;
> jsha...@google.com; deepak.sha...@amd.com; shiris...@amd.com;
> vitaly.pros...@amd.com; aric@amd.com; bhawanpreet.la...@amd.com;
> krunoslav.ko...@amd.com; hersenxs...@amd.com;
> nicholas.kazlaus...@amd.com; laurentiu.pa...@oss.nxp.com;
> ville.syrj...@linux.intel.com; n...@arm.com; Shankar, Uma
> 
> Subject: Re: [RFC PATCH v3 1/6] drm/doc: Color Management and HDR10 RFC
> 
> On 2021-08-16 14:40, Harry Wentland wrote:
> > On 2021-08-16 7:10 a.m., Brian Starkey wrote:
> >> On Fri, Aug 13, 2021 at 10:42:12AM +0530, Sharma, Shashank wrote:
> >>> Hello Brian,
> >>> (+Uma in cc)
> >>>

Thanks Shashank for cc'ing me. Apologies for being late here. Now it seems
all stakeholders are back, so we can resume the UAPI discussion on color.

> >>> Thanks for your comments, Let me try to fill-in for Harry to keep
> >>> the design discussion going. Please find my comments inline.
> >>>
> >
> > Thanks, Shashank. I'm back at work now. Had to cut my trip short due
> > to rising Covid cases and concern for my kids.
> >
> >>> On 8/2/2021 10:00 PM, Brian Starkey wrote:
> 
> >>
> >> -- snip --
> >>
> 
>  Android doesn't blend in linear space, so any API shouldn't be
>  built around an assumption of linear blending.
> 
> >
> > This seems incorrect but I guess ultimately the OS is in control of
> > this. If we want to allow blending in non-linear space with the new
> > API we would either need to describe the blending space or the
> > pre/post-blending gamma/de-gamma.
> >
> > Any idea if this blending behavior in Android might get changed in the
> > future?
> 
> There is lots of software which blends in sRGB space, and designers
> adjusted to the incorrect blending in a way that the result looks right.
> Blending in linear space would result in incorrect-looking images.
> 

I feel we should just leave it to userspace to decide rather than forcing
linear or non-linear blending in the driver.
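
To make the point concrete, here is a minimal, driver-agnostic sketch of the
two blend orders being discussed (plain C, purely illustrative and not part
of any patch in this thread; the helper names are made up):

#include <math.h>

/* sRGB EOTF and its inverse for one normalized [0,1] channel value */
static float srgb_to_linear(float c)
{
        return (c <= 0.04045f) ? c / 12.92f : powf((c + 0.055f) / 1.055f, 2.4f);
}

static float linear_to_srgb(float c)
{
        return (c <= 0.0031308f) ? c * 12.92f : 1.055f * powf(c, 1.0f / 2.4f) - 0.055f;
}

/* Blend directly on the sRGB-encoded values (what much software does today) */
static float blend_srgb(float fg, float bg, float alpha)
{
        return fg * alpha + bg * (1.0f - alpha);
}

/* Blend in linear light, then re-encode the result to sRGB */
static float blend_linear(float fg, float bg, float alpha)
{
        float lin = srgb_to_linear(fg) * alpha + srgb_to_linear(bg) * (1.0f - alpha);

        return linear_to_srgb(lin);
}

For a 50% blend of white over black the two paths give 0.5 versus roughly
0.735 in sRGB encoding, which is why content authored against one blend
order looks wrong under the other.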

> >>>
> >>> If I am not wrong, we still need linear buffers for accurate Gamut
> >>> transformation (SRGB -> BT2020 or other way around) isn't it ?
> >>
> >> Yeah, you need to transform the buffer to linear for color gamut
> >> conversions, but then back to non-linear (probably sRGB or gamma 2.2)
> >> for actual blending.
> >>
> >> This is why I'd like to have the per-plane "OETF/GAMMA" separate from
> >> tone-mapping, so that the composition transfer function is
> >> independent.
> >>
> >>>
> >>
> >> ...
> >>
> > +
> > +Tonemapping in this case could be a simple nits value or `EDR`_
> > +to
> > describe
> > +how to scale the :ref:`SDR luminance`.
> > +
> > +Tonemapping could also include the ability to use a 3D LUT which
> > might be
> > +accompanied by a 1D shaper LUT. The shaper LUT is required in
> > order to
> > +ensure a 3D LUT with limited entries (e.g. 9x9x9, or 17x17x17)
> > operates
> > +in perceptual (non-linear) space, so as to evenly spread the
> > limited
> > +entries evenly across the perceived space.
> 
>  Some terminology care may be needed here - up until this point, I
>  think you've been talking about "tonemapping" being luminance
>  adjustment, whereas I'd expect 3D LUTs to be used for gamut
>  adjustment.
> 
> >>>
> >>> IMO, what harry wants to say here is that, which HW block gets
> >>> picked and how tone mapping is achieved can be a very driver/HW
> >>> specific thing, where one driver can use a 1D/Fixed function block,
> >>> whereas another one can choose more complex HW like a 3D LUT for the
> >>> same.
> >>>
> >>> DRM layer needs to define only the property to hook the API with
> >>> core driver, and the driver can decide which HW to pick and
> >>> configure for the activity. So when we have a tonemapping property,
> >>> we might not have a separate 3D-LUT property, or the driver may fail
> >>> the atomic_check() if both of them are programmed for different
> >>> usages.
> >>
> >> I still think that directly exposing the HW blocks and their
> >> capabilities is the right approach, rather than a "magic" tonemapping
> >> property.
> >>
> >> Yes, userspace would need to have a good understanding of how to use
> >> that hardware, but if the pipeline model is standardised that's the
> >> kind of thing a cross-vendor library could handle.
> >>
> >
> > One problem with cross-vendor libraries is that they might struggle to
> > really be cross-vendor when it comes to unique HW behavior. Or they
> > might pick sub-optimal configurations as they're not aware of the
> > power impact of a configuration. What's an optimal configuration might
> > differ greatly between different HW.
> >
> > We're seeing this problem with "universal" planes as well.
>

[RFC v2 22/22] drm/i915/xelpd: Enable plane gamma

2021-09-06 Thread Uma Shankar
Enable the plane gamma feature in the check callbacks. Decide
based on the user input.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/display/skl_universal_plane.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c 
b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 61eff22a3503..139863d4e3a7 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -959,7 +959,9 @@ static u32 glk_plane_color_ctl(const struct 
intel_crtc_state *crtc_state,
struct intel_plane *plane = to_intel_plane(plane_state->uapi.plane);
u32 plane_color_ctl = 0;
 
-   plane_color_ctl |= PLANE_COLOR_PLANE_GAMMA_DISABLE;
+   /* FIXME needs hw.gamma_lut */
+   if (!plane_state->uapi.gamma_lut)
+   plane_color_ctl |= PLANE_COLOR_PLANE_GAMMA_DISABLE;
 
/* FIXME needs hw.degamma_lut */
if (plane_state->uapi.degamma_lut)
-- 
2.26.2



[RFC v2 21/22] drm/i915/xelpd: Program Plane Gamma Registers

2021-09-06 Thread Uma Shankar
Extract the LUT and program the plane gamma registers.
Enable the multi-segmented LUT as well.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/display/intel_color.c | 89 ++
 drivers/gpu/drm/i915/i915_reg.h|  9 ++-
 2 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index b4cad5c92c85..e5f168a32932 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -27,6 +27,9 @@
 #include "intel_de.h"
 #include "intel_display_types.h"
 #include "intel_sprite.h"
+
+#include "skl_universal_plane.h"
+
 #include 
 
 #define CTM_COEFF_SIGN (1ULL << 63)
@@ -2433,16 +2436,102 @@ static void d13_program_plane_degamma_lut(const struct 
drm_plane_state *state,
}
 }
 
+static void d13_program_plane_gamma_lut(const struct drm_plane_state *state,
+   struct drm_color_lut_ext *gamma_lut,
+   u32 offset)
+{
+   struct drm_i915_private *dev_priv = to_i915(state->plane->dev);
+   enum pipe pipe = to_intel_plane(state->plane)->pipe;
+   enum plane_id plane = to_intel_plane(state->plane)->id;
+   u32 i, lut_size;
+
+   if (icl_is_hdr_plane(dev_priv, plane)) {
+   intel_de_write(dev_priv, PLANE_POST_CSC_GAMC_INDEX_ENH(pipe, 
plane, 0),
+  offset | PLANE_PAL_PREC_AUTO_INCREMENT);
+   if (gamma_lut) {
+   lut_size = 32;
+   for (i = 0; i < lut_size; i++) {
+   u64 word = 
drm_color_lut_extract_ext(gamma_lut[i].green, 24);
+   u32 lut_val = (word & 0xff);
+
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA_ENH(pipe, plane, 0),
+  lut_val);
+   }
+
+   do {
+   /* Program the max register to clamp values > 
1.0. */
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA_ENH(pipe, plane, 0),
+  gamma_lut[i].green);
+   } while (i++ < 34);
+   } else {
+   lut_size = 32;
+   for (i = 0; i < lut_size; i++) {
+   u32 v = (i * ((1 << 24) - 1)) / (lut_size - 1);
+
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA_ENH(pipe, plane, 0), v);
+   }
+
+   do {
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA_ENH(pipe, plane, 0),
+  1 << 24);
+   } while (i++ < 34);
+   }
+
+   intel_de_write(dev_priv, PLANE_POST_CSC_GAMC_INDEX_ENH(pipe, 
plane, 0), 0);
+   } else {
+   lut_size = 32;
+   /*
+* First 3 planes are HDR, so reduce by 3 to get to the right
+* SDR plane offset
+*/
+   plane = plane - 3;
+
+   intel_de_write(dev_priv, PLANE_POST_CSC_GAMC_INDEX(pipe, plane, 
0),
+  offset | PLANE_PAL_PREC_AUTO_INCREMENT);
+
+   if (gamma_lut) {
+   for (i = 0; i < lut_size; i++)
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA(pipe, plane, 0),
+  gamma_lut[i].green & 0x);
+   /* Program the max register to clamp values > 1.0. */
+   while (i < 35)
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA(pipe, plane, 0),
+  gamma_lut[i++].green & 0x3);
+   } else {
+   for (i = 0; i < lut_size; i++) {
+   u32 v = (i * ((1 << 16) - 1)) / (lut_size - 1);
+
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA(pipe, plane, 0), v);
+   }
+
+   do {
+   intel_de_write(dev_priv, 
PLANE_POST_CSC_GAMC_DATA(pipe, plane, 0),
+  (1 << 16));
+   } while (i++ < 34);
+   }
+
+   intel_de_write(dev_priv, PLANE_POST_CSC_GAMC_INDEX(pipe, plane, 
0), 0);
+   }
+}
+
 static void d13_plane_load_luts(const struct drm_plane_state *plane_state)
 {
const struct drm_property_blob *degamma_lut_blob =
plane_state->degamma_lut;
+   const struct drm_property_blob *gamma_lut_blob =
+   plane_state->gamma_lut;
struct drm_color_lut_ext *degamma_lut = NULL;
+   struct drm_

[RFC v2 20/22] drm/i915/xelpd: Add register definitions for Plane Gamma

2021-09-06 Thread Uma Shankar
Add macros to define Plane Gamma registers

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/i915_reg.h | 73 +
 1 file changed, 73 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ceee500e64d7..fc4f8b430518 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -11464,6 +11464,79 @@ enum skl_power_gate {
_MMIO_PLANE_GAMC(plane, i, _PLANE_PRE_CSC_GAMC_DATA_1(pipe), \
_PLANE_PRE_CSC_GAMC_DATA_2(pipe))
 
+/* Display13 Plane Gamma Reg */
+#define _PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_1_A0x70160
+#define _PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_1_B0x71160
+#define _PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_2_A0x70260
+#define _PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_2_B0x71260
+#define _PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_1(pipe)_PIPE(pipe, 
_PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_1_A, \
+   
_PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_1_B)
+#define _PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_2(pipe)_PIPE(pipe, 
_PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_2_A, \
+   
_PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_2_B)
+#define PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH(pipe, plane, i) \
+   _MMIO_PLANE_GAMC(plane, i, 
_PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_1(pipe), \
+   _PLANE_POST_CSC_GAMC_SEG0_INDEX_ENH_2(pipe))
+
+#define _PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_1_A 0x70164
+#define _PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_1_B 0x71164
+#define _PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_2_A 0x70264
+#define _PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_2_B 0x71264
+#define _PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_1(pipe) _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_1_A, \
+   
_PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_1_B)
+#define _PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_2(pipe) _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_2_A, \
+   
_PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_2_B)
+#define PLANE_POST_CSC_GAMC_SEG0_DATA_ENH(pipe, plane, i)  \
+   _MMIO_PLANE_GAMC(plane, i, 
_PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_1(pipe), \
+   _PLANE_POST_CSC_GAMC_SEG0_DATA_ENH_2(pipe))
+
+#define _PLANE_POST_CSC_GAMC_INDEX_ENH_1_A 0x701d8
+#define _PLANE_POST_CSC_GAMC_INDEX_ENH_1_B 0x711d8
+#define _PLANE_POST_CSC_GAMC_INDEX_ENH_2_A 0x702d8
+#define _PLANE_POST_CSC_GAMC_INDEX_ENH_2_B 0x712d8
+#define _PLANE_POST_CSC_GAMC_INDEX_ENH_1(pipe) _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_INDEX_ENH_1_A, \
+   
_PLANE_POST_CSC_GAMC_INDEX_ENH_1_B)
+#define _PLANE_POST_CSC_GAMC_INDEX_ENH_2(pipe) _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_INDEX_ENH_2_A, \
+   
_PLANE_POST_CSC_GAMC_INDEX_ENH_2_B)
+#define PLANE_POST_CSC_GAMC_INDEX_ENH(pipe, plane, i)  \
+   _MMIO_PLANE_GAMC(plane, i, 
_PLANE_POST_CSC_GAMC_INDEX_ENH_1(pipe), \
+   _PLANE_POST_CSC_GAMC_INDEX_ENH_2(pipe))
+
+#define _PLANE_POST_CSC_GAMC_DATA_ENH_1_A  0x701dc
+#define _PLANE_POST_CSC_GAMC_DATA_ENH_1_B  0x711dc
+#define _PLANE_POST_CSC_GAMC_DATA_ENH_2_A  0x702dc
+#define _PLANE_POST_CSC_GAMC_DATA_ENH_2_B  0x712dc
+#define _PLANE_POST_CSC_GAMC_DATA_ENH_1(pipe)  _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_DATA_ENH_1_A, \
+   
_PLANE_POST_CSC_GAMC_DATA_ENH_1_B)
+#define _PLANE_POST_CSC_GAMC_DATA_ENH_2(pipe)  _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_DATA_ENH_2_A, \
+   
_PLANE_POST_CSC_GAMC_DATA_ENH_2_B)
+#define PLANE_POST_CSC_GAMC_DATA_ENH(pipe, plane, i)   \
+   _MMIO_PLANE_GAMC(plane, i, 
_PLANE_POST_CSC_GAMC_DATA_ENH_1(pipe), \
+   _PLANE_POST_CSC_GAMC_DATA_ENH_2(pipe))
+
+#define _PLANE_POST_CSC_GAMC_INDEX_1_A 0x704d8
+#define _PLANE_POST_CSC_GAMC_INDEX_1_B 0x714d8
+#define _PLANE_POST_CSC_GAMC_INDEX_2_A 0x705d8
+#define _PLANE_POST_CSC_GAMC_INDEX_2_B 0x715d8
+#define _PLANE_POST_CSC_GAMC_INDEX_1(pipe) _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_INDEX_1_A, \
+   _PLANE_POST_CSC_GAMC_INDEX_1_B)
+#define _PLANE_POST_CSC_GAMC_INDEX_2(pipe) _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_INDEX_2_A, \
+   _PLANE_POST_CSC_GAMC_INDEX_2_B)
+#define PLANE_POST_CSC_GAMC_INDEX(pipe, plane, i)  \
+   _MMIO_PLANE_GAMC(plane, i, _PLANE_POST_CSC_GAMC_INDEX_1(pipe), \
+   _PLANE_POST_CSC_GAMC_INDEX_2(pipe))
+
+#define _PLANE_POST_CSC_GAMC_DATA_1_A  0x704dc
+#define _PLANE_POST_CSC_GAMC_DATA_1_B  0x714dc
+#define _PLANE_POST_CSC_GAMC_DATA_2_A  0x705dc
+#define _PLANE_POST_CSC_GAMC_DATA_2_B  0x715dc
+#define _PLANE_POST_CSC_GAMC_DATA_1(pipe)  _PIPE(pipe, 
_PLANE_POST_CSC_GAMC_DATA_1_A, \
+   _PLAN

[RFC v2 18/22] drm: Add Plane Gamma Lut property

2021-09-06 Thread Uma Shankar
Add Plane Gamma Lut as a blob property.
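
For context, a rough sketch of how userspace could hand a LUT to this
property through libdrm. This is illustrative only: the drm_color_lut_ext
layout is an assumption based on earlier patches in this series, the
property id would come from walking the plane's properties, and a real
compositor would set the blob through an atomic commit rather than the
legacy SetProperty path.

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Assumed layout of the extended LUT entry proposed by this series */
struct drm_color_lut_ext {
        uint64_t red;
        uint64_t green;
        uint64_t blue;
        uint64_t reserved;
};

static int set_plane_gamma_lut(int fd, uint32_t plane_id, uint32_t prop_id,
                               const struct drm_color_lut_ext *lut,
                               unsigned int count)
{
        uint32_t blob_id = 0;
        int ret;

        /* Wrap the LUT entries in a property blob... */
        ret = drmModeCreatePropertyBlob(fd, lut, count * sizeof(*lut), &blob_id);
        if (ret)
                return ret;

        /* ...and point the plane's PLANE_GAMMA_LUT property at it. */
        ret = drmModeObjectSetProperty(fd, plane_id, DRM_MODE_OBJECT_PLANE,
                                       prop_id, blob_id);

        /* The kernel keeps its own reference once the property is set. */
        drmModeDestroyPropertyBlob(fd, blob_id);
        return ret;
}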

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/drm_atomic_state_helper.c |  3 +++
 drivers/gpu/drm/drm_atomic_uapi.c | 10 ++
 drivers/gpu/drm/drm_color_mgmt.c  | 18 ++
 include/drm/drm_plane.h   | 14 ++
 4 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c 
b/drivers/gpu/drm/drm_atomic_state_helper.c
index fafb8af1c9cb..7ddf6e4b956b 100644
--- a/drivers/gpu/drm/drm_atomic_state_helper.c
+++ b/drivers/gpu/drm/drm_atomic_state_helper.c
@@ -316,6 +316,8 @@ void __drm_atomic_helper_plane_duplicate_state(struct 
drm_plane *plane,
drm_property_blob_get(state->degamma_lut);
if (state->ctm)
drm_property_blob_get(state->ctm);
+   if (state->gamma_lut)
+   drm_property_blob_get(state->gamma_lut);
 
state->color_mgmt_changed = false;
 }
@@ -366,6 +368,7 @@ void __drm_atomic_helper_plane_destroy_state(struct 
drm_plane_state *state)
drm_property_blob_put(state->fb_damage_clips);
drm_property_blob_put(state->degamma_lut);
drm_property_blob_put(state->ctm);
+   drm_property_blob_put(state->gamma_lut);
 }
 EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
 
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index b5abf03c5d51..a32557a4e0d3 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -615,6 +615,13 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
return ret;
} else if (property == plane->gamma_mode_property) {
state->gamma_mode = val;
+   } else if (property == plane->gamma_lut_property) {
+   ret = drm_atomic_replace_property_blob_from_id(dev,
+   &state->gamma_lut,
+   val, -1, sizeof(struct 
drm_color_lut_ext),
+   &replaced);
+   state->color_mgmt_changed |= replaced;
+   return ret;
} else if (property == config->prop_fb_damage_clips) {
ret = drm_atomic_replace_property_blob_from_id(dev,
&state->fb_damage_clips,
@@ -690,6 +697,9 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
*val = (state->ctm) ? state->ctm->base.id : 0;
} else if (property == plane->gamma_mode_property) {
*val = state->gamma_mode;
+   } else if (property == plane->gamma_lut_property) {
+   *val = (state->gamma_lut) ?
+   state->gamma_lut->base.id : 0;
} else if (property == config->prop_fb_damage_clips) {
*val = (state->fb_damage_clips) ?
state->fb_damage_clips->base.id : 0;
diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index 02367e691cf3..b5b3ff7f654d 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -613,6 +613,11 @@ EXPORT_SYMBOL(drm_plane_create_color_properties);
  * to query and get the plane gamma color caps and choose the
  * appropriate gamma mode and create lut values accordingly
  *
+ * gamma_lut_property:
+ * Blob property which allows a userspace to provide LUT values
+ * to apply a gamma curve using the h/w plane gamma processing
+ * engine, thereby making the content non-linear.
+ *
  */
 int drm_plane_create_color_mgmt_properties(struct drm_device *dev,
   struct drm_plane *plane,
@@ -648,6 +653,13 @@ int drm_plane_create_color_mgmt_properties(struct 
drm_device *dev,
 
plane->gamma_mode_property = prop;
 
+   prop = drm_property_create(dev, DRM_MODE_PROP_BLOB,
+  "PLANE_GAMMA_LUT", 0);
+   if (!prop)
+   return -ENOMEM;
+
+   plane->gamma_lut_property = prop;
+
return 0;
 }
 EXPORT_SYMBOL(drm_plane_create_color_mgmt_properties);
@@ -685,6 +697,12 @@ void drm_plane_attach_gamma_properties(struct drm_plane 
*plane)
 
drm_object_attach_property(&plane->base,
   plane->gamma_mode_property, 0);
+
+   if (!plane->gamma_lut_property)
+   return;
+
+   drm_object_attach_property(&plane->base,
+  plane->gamma_lut_property, 0);
 }
 EXPORT_SYMBOL(drm_plane_attach_gamma_properties);
 
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index 9081867ecbd1..8b1f506bc5d3 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -270,6 +270,14 @@ struct drm_plane_state {
 */
u32 gamma_mode;
 
+   /* @gamma_lut:
+*
+* Lookup table for converting framebuffer pixel data after applying the
+* color conversion matrix @ctm. See drm_plane_enable_color_mgmt(). The
+* bl

[RFC v2 19/22] drm/i915/xelpd: Define and Initialize Plane Gamma Lut range

2021-09-06 Thread Uma Shankar
Define the structures with the XE_LPD gamma LUT ranges. HDR and SDR planes
have different capabilities, so a dedicated structure is implemented for
the HDR planes. Degamma and gamma have the same LUT caps for SDR planes,
so the existing range table is extended to cover both.

Initialize the mode range caps as well.
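
For reference, a small sketch of how the total number of LUT entries
described by one of these tables can be computed, e.g. when sizing the blob
userspace has to send (illustrative only; it uses just the .count field of
the drm_color_lut_range structure introduced earlier in this series):

/* Sum the entries a drm_color_lut_range[] table expects from userspace. */
static unsigned int lut_range_total_entries(const struct drm_color_lut_range *ranges,
                                            unsigned int num_ranges)
{
        unsigned int i, total = 0;

        for (i = 0; i < num_ranges; i++)
                total += ranges[i].count;

        return total;
}

For the HDR gamma table below this yields 35 entries (32 + 1 + 1 + 1), which
matches the number of values the register programming code consumes.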

Signed-off-by: Uma Shankar 
Signed-off-by: Bhanuprakash Modem 
---
 drivers/gpu/drm/i915/display/intel_color.c | 112 ++---
 1 file changed, 99 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index e9c80ed41466..b4cad5c92c85 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -2247,7 +2247,7 @@ static const struct drm_color_lut_range d13_degamma_hdr[] 
= {
 };
 
  /* FIXME input bpc? */
-static const struct drm_color_lut_range d13_degamma_sdr[] = {
+static const struct drm_color_lut_range d13_gamma_degamma_sdr[] = {
/* segment 1 */
{
.flags = (DRM_MODE_LUT_GAMMA |
@@ -2297,6 +2297,63 @@ static const struct drm_color_lut_range 
d13_degamma_sdr[] = {
},
 };
 
+ /* FIXME input bpc? */
+static const struct drm_color_lut_range d13_gamma_hdr[] = {
+   /*
+* ToDo: Add Segment 1
+* There is an optional fine segment added with 9 lut values
+* Will be added later
+*/
+
+   /* segment 2 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 32,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = 0, .end = (1 << 24) - 1,
+   .min = 0, .max = (1 << 24) - 1,
+   },
+   /* segment 3 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = (1 << 24) - 1, .end = 1 << 24,
+   .min = 0, .max = 1 << 24,
+   },
+   /* Segment 4 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = 1 << 24, .end = 3 << 24,
+   .min = 0, .max = (3 << 24),
+   },
+   /* Segment 5 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = 3 << 24, .end = 7 << 24,
+   .min = 0, .max = (7 << 24),
+   },
+};
+
 static void d13_program_plane_degamma_lut(const struct drm_plane_state *state,
  struct drm_color_lut_ext *degamma_lut,
  u32 offset)
@@ -2406,26 +2463,55 @@ int intel_plane_color_init(struct drm_plane *plane)
ret = drm_plane_color_add_gamma_degamma_mode_range(plane, "no 
degamma",
   NULL, 0,
   
LUT_TYPE_DEGAMMA);
-   if (icl_is_hdr_plane(dev_priv, to_intel_plane(plane)->id))
+   if (ret)
+   return ret;
+
+   ret = drm_plane_color_add_gamma_degamma_mode_range(plane, "no 
gamma",
+  NULL, 0,
+  
LUT_TYPE_GAMMA);
+   if (ret)
+   return ret;
+
+   if (icl_is_hdr_plane(dev_priv, to_intel_plane(plane)->id)) {
ret = 
drm_plane_color_add_gamma_degamma_mode_range(plane, "plane degamma",
   
d13_degamma_hdr,
   
sizeof(d13_degamma_hdr),
   
LUT_TYPE_DEGAMMA);
-   else
-   ret = 
drm_plane_color_add_gamma_degamma_mode_range(plane,
-  
"plane degamma",
-  

[RFC v2 17/22] drm: Add Plane Gamma Mode property

2021-09-06 Thread Uma Shankar
Add Plane Gamma Mode as an enum property. Its values are blob_ids, which
expose the various supported gamma modes and their LUT ranges. By fetching
the blob id in userspace, the user can learn the supported modes along with
each mode's LUT ranges and number of LUT coefficients. Userspace can then
select one of the modes through this enum property.

Lut values will be sent through a separate GAMMA_LUT blob property.
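
As a rough illustration of the uapi flow described above (not part of the
patch; plain libdrm calls, with the decoding of the range blobs left out),
userspace would walk the plane's properties, find PLANE_GAMMA_MODE, and
fetch the blob behind each enum value to learn what that mode supports:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static void dump_plane_gamma_modes(int fd, uint32_t plane_id)
{
        drmModeObjectProperties *props =
                drmModeObjectGetProperties(fd, plane_id, DRM_MODE_OBJECT_PLANE);
        uint32_t i;
        int j;

        if (!props)
                return;

        for (i = 0; i < props->count_props; i++) {
                drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);

                if (prop && !strcmp(prop->name, "PLANE_GAMMA_MODE")) {
                        /* Each enum entry names a mode; its value is a blob id
                         * holding the lut ranges which describe that mode. */
                        for (j = 0; j < prop->count_enums; j++) {
                                drmModePropertyBlobRes *blob =
                                        drmModeGetPropertyBlob(fd, prop->enums[j].value);

                                printf("mode '%s': %u bytes of range data\n",
                                       prop->enums[j].name,
                                       blob ? blob->length : 0);
                                drmModeFreePropertyBlob(blob);
                        }
                }
                drmModeFreeProperty(prop);
        }
        drmModeFreeObjectProperties(props);
}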

Signed-off-by: Uma Shankar 
Signed-off-by: Bhanuprakash Modem 
---
 drivers/gpu/drm/drm_atomic_uapi.c |  4 
 drivers/gpu/drm/drm_color_mgmt.c  | 26 ++
 include/drm/drm_plane.h   | 14 ++
 3 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index e736fd7c1d5b..b5abf03c5d51 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -613,6 +613,8 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
&replaced);
state->color_mgmt_changed |= replaced;
return ret;
+   } else if (property == plane->gamma_mode_property) {
+   state->gamma_mode = val;
} else if (property == config->prop_fb_damage_clips) {
ret = drm_atomic_replace_property_blob_from_id(dev,
&state->fb_damage_clips,
@@ -686,6 +688,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
state->degamma_lut->base.id : 0;
} else if (property == plane->ctm_property) {
*val = (state->ctm) ? state->ctm->base.id : 0;
+   } else if (property == plane->gamma_mode_property) {
+   *val = state->gamma_mode;
} else if (property == config->prop_fb_damage_clips) {
*val = (state->fb_damage_clips) ?
state->fb_damage_clips->base.id : 0;
diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index 5c3138497b9c..02367e691cf3 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -606,6 +606,13 @@ EXPORT_SYMBOL(drm_plane_create_color_properties);
  * Blob property which allows a userspace to provide CTM coefficients
  * to do color space conversion or any other enhancement by doing a
  * matrix multiplication using the h/w CTM processing engine
+ *
+ * gamma_mode_property:
+ * Blob property which advertizes the possible gamma modes and
+ * lut ranges supported by the platform. This  allows userspace
+ * to query and get the plane gamma color caps and choose the
+ * appropriate gamma mode and create lut values accordingly
+ *
  */
 int drm_plane_create_color_mgmt_properties(struct drm_device *dev,
   struct drm_plane *plane,
@@ -634,6 +641,13 @@ int drm_plane_create_color_mgmt_properties(struct 
drm_device *dev,
 
plane->ctm_property = prop;
 
+   prop = drm_property_create(dev, DRM_MODE_PROP_ENUM,
+  "PLANE_GAMMA_MODE", num_values);
+   if (!prop)
+   return -ENOMEM;
+
+   plane->gamma_mode_property = prop;
+
return 0;
 }
 EXPORT_SYMBOL(drm_plane_create_color_mgmt_properties);
@@ -664,6 +678,16 @@ void drm_plane_attach_ctm_property(struct drm_plane *plane)
 }
 EXPORT_SYMBOL(drm_plane_attach_ctm_property);
 
+void drm_plane_attach_gamma_properties(struct drm_plane *plane)
+{
+   if (!plane->gamma_mode_property)
+   return;
+
+   drm_object_attach_property(&plane->base,
+  plane->gamma_mode_property, 0);
+}
+EXPORT_SYMBOL(drm_plane_attach_gamma_properties);
+
 int drm_plane_color_add_gamma_degamma_mode_range(struct drm_plane *plane,
 const char *name,
 const struct 
drm_color_lut_range *ranges,
@@ -676,6 +700,8 @@ int drm_plane_color_add_gamma_degamma_mode_range(struct 
drm_plane *plane,
 
if (type == LUT_TYPE_DEGAMMA)
prop = plane->degamma_mode_property;
+   else
+   prop = plane->gamma_mode_property;
 
if (!prop)
return -EINVAL;
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index 3d329f71d287..9081867ecbd1 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -263,6 +263,13 @@ struct drm_plane_state {
 */
struct drm_property_blob *ctm;
 
+   /**
+* @gamma_mode: This is a blob_id and exposes the platform capabilities
+* wrt to various gamma modes and the respective lut ranges. This also
+* helps user select a gamma mode amongst the supported ones.
+*/
+   u32 gamma_mode;
+
u8 color_mgmt_changed : 1;
 };
 
@@ -794,6 +801,12 @@ struct drm_plane {
 * degamma LUT.
 */
struct drm_property *ctm_property;
+
+   /*

[RFC v2 16/22] drm/i915/xelpd: Enable Plane CSC

2021-09-06 Thread Uma Shankar
Implement plane CSC for ICL+

Signed-off-by: Uma Shankar 
---
 .../gpu/drm/i915/display/intel_atomic_plane.c |  5 +-
 drivers/gpu/drm/i915/display/intel_color.c| 82 +++
 .../drm/i915/display/skl_universal_plane.c|  4 +
 drivers/gpu/drm/i915/i915_reg.h   |  1 +
 4 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c 
b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
index 8796fc86b2e5..1637f7890f42 100644
--- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
+++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
@@ -499,6 +499,7 @@ void skl_update_planes_on_crtc(struct intel_atomic_state 
*state,
intel_atomic_get_new_crtc_state(state, crtc);
struct skl_ddb_entry entries_y[I915_MAX_PLANES];
struct skl_ddb_entry entries_uv[I915_MAX_PLANES];
+   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
u32 update_mask = new_crtc_state->update_planes;
struct intel_plane *plane;
 
@@ -513,8 +514,10 @@ void skl_update_planes_on_crtc(struct intel_atomic_state 
*state,
struct intel_plane_state *new_plane_state =
intel_atomic_get_new_plane_state(state, plane);
 
-   if (new_plane_state->uapi.color_mgmt_changed)
+   if (new_plane_state->uapi.color_mgmt_changed) {
intel_color_load_plane_luts(&new_plane_state->uapi);
+   
dev_priv->display.load_plane_csc_matrix(&new_plane_state->uapi);
+   }
 
if (new_plane_state->uapi.visible ||
new_plane_state->planar_slave) {
diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index f5a9af858d1b..e9c80ed41466 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -2118,6 +2118,83 @@ static void icl_read_luts(struct intel_crtc_state 
*crtc_state)
}
 }
 
+static void icl_load_plane_csc_matrix(const struct drm_plane_state *state)
+{
+   struct drm_i915_private *dev_priv = to_i915(state->plane->dev);
+   enum pipe pipe = to_intel_plane(state->plane)->pipe;
+   enum plane_id plane = to_intel_plane(state->plane)->id;
+   struct drm_color_ctm *ctm;
+   const u64 *input;
+   u16 coeffs[9] = {};
+   u16 postoff = 0;
+   int i;
+
+   if (!icl_is_hdr_plane(dev_priv, plane) || !state->ctm)
+   return;
+
+   ctm = state->ctm->data;
+   input = ctm->matrix;
+
+   /*
+* Convert fixed point S31.32 input to format supported by the
+* hardware.
+*/
+   for (i = 0; i < ARRAY_SIZE(coeffs); i++) {
+   u64 abs_coeff = ((1ULL << 63) - 1) & input[i];
+
+   /*
+* Clamp input value to min/max supported by
+* hardware.
+*/
+   abs_coeff = clamp_val(abs_coeff, 0, CTM_COEFF_4_0 - 1);
+
+   /* sign bit */
+   if (CTM_COEFF_NEGATIVE(input[i]))
+   coeffs[i] |= 1 << 15;
+
+   if (abs_coeff < CTM_COEFF_0_125)
+   coeffs[i] |= (3 << 12) |
+   ILK_CSC_COEFF_FP(abs_coeff, 12);
+   else if (abs_coeff < CTM_COEFF_0_25)
+   coeffs[i] |= (2 << 12) |
+   ILK_CSC_COEFF_FP(abs_coeff, 11);
+   else if (abs_coeff < CTM_COEFF_0_5)
+   coeffs[i] |= (1 << 12) |
+   ILK_CSC_COEFF_FP(abs_coeff, 10);
+   else if (abs_coeff < CTM_COEFF_1_0)
+   coeffs[i] |= ILK_CSC_COEFF_FP(abs_coeff, 9);
+   else if (abs_coeff < CTM_COEFF_2_0)
+   coeffs[i] |= (7 << 12) |
+   ILK_CSC_COEFF_FP(abs_coeff, 8);
+   else
+   coeffs[i] |= (6 << 12) |
+   ILK_CSC_COEFF_FP(abs_coeff, 7);
+   }
+
+   intel_de_write(dev_priv, PLANE_CSC_COEFF(pipe, plane, 0),
+  coeffs[0] << 16 | coeffs[1]);
+   intel_de_write(dev_priv, PLANE_CSC_COEFF(pipe, plane, 1),
+  coeffs[2] << 16);
+
+   intel_de_write(dev_priv, PLANE_CSC_COEFF(pipe, plane, 2),
+  coeffs[3] << 16 | coeffs[4]);
+   intel_de_write(dev_priv, PLANE_CSC_COEFF(pipe, plane, 3),
+  coeffs[5] << 16);
+
+   intel_de_write(dev_priv, PLANE_CSC_COEFF(pipe, plane, 4),
+  coeffs[6] << 16 | coeffs[7]);
+   intel_de_write(dev_priv, PLANE_CSC_COEFF(pipe, plane, 5),
+  coeffs[8] << 16);
+
+   intel_de_write(dev_priv, PLANE_CSC_PREOFF(pipe, plane, 0), 0);
+   intel_de_write(dev_priv, PLANE_CSC_PREOFF(pipe, plane, 1), 0);
+   intel_de_write(dev_priv, PLANE_CSC_PREOFF(pipe, plane, 2), 0);
+
+   intel_de_write(dev

[RFC v2 15/22] drm/i915/xelpd: Define Plane CSC Registers

2021-09-06 Thread Uma Shankar
Define Register macros for plane CSC.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/i915_reg.h | 43 +
 1 file changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0c36a330734f..20c1b8ddded8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7440,6 +7440,49 @@ enum {
 #define PLANE_COLOR_CTL(pipe, plane)   \
_MMIO_PLANE(plane, _PLANE_COLOR_CTL_1(pipe), _PLANE_COLOR_CTL_2(pipe))
 
+/* Plane CSC Registers */
+#define _PLANE_CSC_RY_GY_1_A   0x70210
+#define _PLANE_CSC_RY_GY_2_A   0x70310
+
+#define _PLANE_CSC_RY_GY_1_B   0x71210
+#define _PLANE_CSC_RY_GY_2_B   0x71310
+
+#define _PLANE_CSC_RY_GY_1(pipe)   _PIPE(pipe, _PLANE_CSC_RY_GY_1_A, \
+ _PLANE_CSC_RY_GY_1_B)
+#define _PLANE_CSC_RY_GY_2(pipe)   _PIPE(pipe, _PLANE_INPUT_CSC_RY_GY_2_A, 
\
+ _PLANE_INPUT_CSC_RY_GY_2_B)
+#define PLANE_CSC_COEFF(pipe, plane, index)_MMIO_PLANE(plane, \
+   
_PLANE_CSC_RY_GY_1(pipe) +  (index) * 4, \
+   
_PLANE_CSC_RY_GY_2(pipe) + (index) * 4)
+
+#define _PLANE_CSC_PREOFF_HI_1_A   0x70228
+#define _PLANE_CSC_PREOFF_HI_2_A   0x70328
+
+#define _PLANE_CSC_PREOFF_HI_1_B   0x71228
+#define _PLANE_CSC_PREOFF_HI_2_B   0x71328
+
+#define _PLANE_CSC_PREOFF_HI_1(pipe)   _PIPE(pipe, _PLANE_CSC_PREOFF_HI_1_A, \
+ _PLANE_CSC_PREOFF_HI_1_B)
+#define _PLANE_CSC_PREOFF_HI_2(pipe)   _PIPE(pipe, _PLANE_CSC_PREOFF_HI_2_A, \
+ _PLANE_CSC_PREOFF_HI_2_B)
+#define PLANE_CSC_PREOFF(pipe, plane, index)   _MMIO_PLANE(plane, 
_PLANE_CSC_PREOFF_HI_1(pipe) + \
+   (index) * 4, 
_PLANE_CSC_PREOFF_HI_2(pipe) + \
+   (index) * 4)
+
+#define _PLANE_CSC_POSTOFF_HI_1_A  0x70234
+#define _PLANE_CSC_POSTOFF_HI_2_A  0x70334
+
+#define _PLANE_CSC_POSTOFF_HI_1_B  0x71234
+#define _PLANE_CSC_POSTOFF_HI_2_B  0x71334
+
+#define _PLANE_CSC_POSTOFF_HI_1(pipe)  _PIPE(pipe, _PLANE_CSC_POSTOFF_HI_1_A, \
+ _PLANE_CSC_POSTOFF_HI_1_B)
+#define _PLANE_CSC_POSTOFF_HI_2(pipe)  _PIPE(pipe, _PLANE_CSC_POSTOFF_HI_2_A, \
+ _PLANE_CSC_POSTOFF_HI_2_B)
+#define PLANE_CSC_POSTOFF(pipe, plane, index)  _MMIO_PLANE(plane, 
_PLANE_CSC_POSTOFF_HI_1(pipe) + \
+   (index) * 4, 
_PLANE_CSC_POSTOFF_HI_2(pipe) + \
+   (index) * 4)
+
 #define _SEL_FETCH_PLANE_BASE_1_A  0x70890
 #define _SEL_FETCH_PLANE_BASE_2_A  0x708B0
 #define _SEL_FETCH_PLANE_BASE_3_A  0x708D0
-- 
2.26.2



[RFC v2 14/22] drm: Add helper to attach Plane ctm property

2021-09-06 Thread Uma Shankar
Add a DRM helper to attach the plane CTM property.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/drm_color_mgmt.c | 10 ++
 include/drm/drm_plane.h  |  1 +
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index 83832adf3adf..5c3138497b9c 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -654,6 +654,16 @@ void drm_plane_attach_degamma_properties(struct drm_plane 
*plane)
 }
 EXPORT_SYMBOL(drm_plane_attach_degamma_properties);
 
+void drm_plane_attach_ctm_property(struct drm_plane *plane)
+{
+   if (!plane->ctm_property)
+   return;
+
+   drm_object_attach_property(&plane->base,
+  plane->ctm_property, 0);
+}
+EXPORT_SYMBOL(drm_plane_attach_ctm_property);
+
 int drm_plane_color_add_gamma_degamma_mode_range(struct drm_plane *plane,
 const char *name,
 const struct 
drm_color_lut_range *ranges,
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index c4ed1799ecaf..3d329f71d287 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -889,6 +889,7 @@ int drm_plane_create_color_mgmt_properties(struct 
drm_device *dev,
   struct drm_plane *plane,
   int num_values);
 void drm_plane_attach_degamma_properties(struct drm_plane *plane);
+void drm_plane_attach_ctm_property(struct drm_plane *plane);
 int drm_plane_color_add_gamma_degamma_mode_range(struct drm_plane *plane,
 const char *name,
 const struct 
drm_color_lut_range *ranges,
-- 
2.26.2



[RFC v2 13/22] drm: Add Plane CTM property

2021-09-06 Thread Uma Shankar
Add a blob property for plane CSC usage.
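
For reference, the blob reuses the existing struct drm_color_ctm, i.e. nine
coefficients in sign-magnitude S31.32 fixed point (bit 63 is the sign, the
low 32 bits are the fractional part of the magnitude). A small illustrative
helper showing how userspace could pack a floating-point matrix; this is a
sketch, not part of the patch:

#include <math.h>
#include <stdint.h>
#include <xf86drmMode.h>        /* pulls in struct drm_color_ctm */

/* Pack a double into the sign-magnitude S31.32 format used by
 * drm_color_ctm.matrix[]. */
static uint64_t ctm_coeff_from_double(double v)
{
        uint64_t sign = v < 0.0 ? 1ULL << 63 : 0;

        return sign | (uint64_t)llround(fabs(v) * (double)(1ULL << 32));
}

static void fill_identity_ctm(struct drm_color_ctm *ctm)
{
        int i;

        /* Diagonal entries (0, 4, 8) of the row-major 3x3 matrix are 1.0. */
        for (i = 0; i < 9; i++)
                ctm->matrix[i] = ctm_coeff_from_double(i % 4 == 0 ? 1.0 : 0.0);
}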

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/drm_atomic_state_helper.c |  3 +++
 drivers/gpu/drm/drm_atomic_uapi.c | 10 ++
 drivers/gpu/drm/drm_color_mgmt.c  | 11 +++
 include/drm/drm_plane.h   | 15 +++
 4 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c 
b/drivers/gpu/drm/drm_atomic_state_helper.c
index 6e358067cb7a..fafb8af1c9cb 100644
--- a/drivers/gpu/drm/drm_atomic_state_helper.c
+++ b/drivers/gpu/drm/drm_atomic_state_helper.c
@@ -314,6 +314,8 @@ void __drm_atomic_helper_plane_duplicate_state(struct 
drm_plane *plane,
 
if (state->degamma_lut)
drm_property_blob_get(state->degamma_lut);
+   if (state->ctm)
+   drm_property_blob_get(state->ctm);
 
state->color_mgmt_changed = false;
 }
@@ -363,6 +365,7 @@ void __drm_atomic_helper_plane_destroy_state(struct 
drm_plane_state *state)
 
drm_property_blob_put(state->fb_damage_clips);
drm_property_blob_put(state->degamma_lut);
+   drm_property_blob_put(state->ctm);
 }
 EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
 
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 904291b96ba9..e736fd7c1d5b 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -605,6 +605,14 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
&replaced);
state->color_mgmt_changed |= replaced;
return ret;
+   } else if (property == plane->ctm_property) {
+   ret = drm_atomic_replace_property_blob_from_id(dev,
+   &state->ctm,
+   val,
+   sizeof(struct drm_color_ctm), -1,
+   &replaced);
+   state->color_mgmt_changed |= replaced;
+   return ret;
} else if (property == config->prop_fb_damage_clips) {
ret = drm_atomic_replace_property_blob_from_id(dev,
&state->fb_damage_clips,
@@ -676,6 +684,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
} else if (property == plane->degamma_lut_property) {
*val = (state->degamma_lut) ?
state->degamma_lut->base.id : 0;
+   } else if (property == plane->ctm_property) {
+   *val = (state->ctm) ? state->ctm->base.id : 0;
} else if (property == config->prop_fb_damage_clips) {
*val = (state->fb_damage_clips) ?
state->fb_damage_clips->base.id : 0;
diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index 29d0fc1e52b5..83832adf3adf 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -602,6 +602,10 @@ EXPORT_SYMBOL(drm_plane_create_color_properties);
  * engine, thereby making the content as linear for further color
  * processing.
  *
+ * ctm_property:
+ * Blob property which allows a userspace to provide CTM coefficients
+ * to do color space conversion or any other enhancement by doing a
+ * matrix multiplication using the h/w CTM processing engine
  */
 int drm_plane_create_color_mgmt_properties(struct drm_device *dev,
   struct drm_plane *plane,
@@ -623,6 +627,13 @@ int drm_plane_create_color_mgmt_properties(struct 
drm_device *dev,
 
plane->degamma_lut_property = prop;
 
+   prop = drm_property_create(dev, DRM_MODE_PROP_BLOB,
+  "PLANE_CTM", 0);
+   if (!prop)
+   return -ENOMEM;
+
+   plane->ctm_property = prop;
+
return 0;
 }
 EXPORT_SYMBOL(drm_plane_create_color_mgmt_properties);
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index fbfada0b990d..c4ed1799ecaf 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -255,6 +255,14 @@ struct drm_plane_state {
 */
struct drm_property_blob *degamma_lut;
 
+   /**
+* @ctm:
+*
+* Color transformation matrix. See drm_plane_enable_color_mgmt(). The
+* blob (if not NULL) is a &struct drm_color_ctm.
+*/
+   struct drm_property_blob *ctm;
+
u8 color_mgmt_changed : 1;
 };
 
@@ -779,6 +787,13 @@ struct drm_plane {
 * used to convert the framebuffer's colors to linear gamma.
 */
struct drm_property *degamma_lut_property;
+
+   /**
+* @plane_ctm_property: Optional Plane property to set the
+* matrix used to convert colors after the lookup in the
+* degamma LUT.
+*/
+   struct drm_property *ctm_property;
 };
 
 #define obj_to_plane(x) container_of(x, struct drm_plane, base)
-- 
2.26.2



[RFC v2 12/22] drm/i915/xelpd: Load plane color luts from atomic flip

2021-09-06 Thread Uma Shankar
Load plane color luts as part of atomic plane updates.
This will be done only if the plane color luts are changed.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/display/intel_atomic_plane.c | 3 +++
 drivers/gpu/drm/i915/display/intel_atomic_plane.h | 1 +
 drivers/gpu/drm/i915/display/intel_color.c| 9 +
 3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c 
b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
index 47234d898549..8796fc86b2e5 100644
--- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
+++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
@@ -513,6 +513,9 @@ void skl_update_planes_on_crtc(struct intel_atomic_state 
*state,
struct intel_plane_state *new_plane_state =
intel_atomic_get_new_plane_state(state, plane);
 
+   if (new_plane_state->uapi.color_mgmt_changed)
+   intel_color_load_plane_luts(&new_plane_state->uapi);
+
if (new_plane_state->uapi.visible ||
new_plane_state->planar_slave) {
intel_update_plane(plane, new_crtc_state, 
new_plane_state);
diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.h 
b/drivers/gpu/drm/i915/display/intel_atomic_plane.h
index 854f37b49681..3001f4d69b4d 100644
--- a/drivers/gpu/drm/i915/display/intel_atomic_plane.h
+++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.h
@@ -65,5 +65,6 @@ void intel_plane_set_invisible(struct intel_crtc_state 
*crtc_state,
   struct intel_plane_state *plane_state);
 void intel_plane_helper_add(struct intel_plane *plane);
 int intel_plane_color_init(struct drm_plane *plane);
+void intel_color_load_plane_luts(const struct drm_plane_state *plane_state);
 
 #endif /* __INTEL_ATOMIC_PLANE_H__ */
diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index 62df5122309a..f5a9af858d1b 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -22,6 +22,7 @@
  *
  */
 
+#include "intel_atomic_plane.h"
 #include "intel_color.h"
 #include "intel_de.h"
 #include "intel_display_types.h"
@@ -2310,6 +2311,14 @@ static void d13_plane_load_luts(const struct 
drm_plane_state *plane_state)
}
 }
 
+void intel_color_load_plane_luts(const struct drm_plane_state *plane_state)
+{
+   struct drm_device *dev = plane_state->plane->dev;
+   struct drm_i915_private *dev_priv = to_i915(dev);
+
+   dev_priv->display.load_plane_luts(plane_state);
+}
+
 int intel_plane_color_init(struct drm_plane *plane)
 {
struct drm_i915_private *dev_priv = to_i915(plane->dev);
-- 
2.26.2



[RFC v2 11/22] drm/i915/xelpd: Initialize plane color features

2021-09-06 Thread Uma Shankar
Initialize plane color features for XE_LPD.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/display/intel_atomic_plane.h  | 1 +
 drivers/gpu/drm/i915/display/skl_universal_plane.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.h 
b/drivers/gpu/drm/i915/display/intel_atomic_plane.h
index 62e5a2a77fd4..854f37b49681 100644
--- a/drivers/gpu/drm/i915/display/intel_atomic_plane.h
+++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.h
@@ -64,5 +64,6 @@ int intel_atomic_plane_check_clipping(struct 
intel_plane_state *plane_state,
 void intel_plane_set_invisible(struct intel_crtc_state *crtc_state,
   struct intel_plane_state *plane_state);
 void intel_plane_helper_add(struct intel_plane *plane);
+int intel_plane_color_init(struct drm_plane *plane);
 
 #endif /* __INTEL_ATOMIC_PLANE_H__ */
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c 
b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 4187a670e840..c4e01ae4343c 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -2184,6 +2184,8 @@ skl_universal_plane_create(struct drm_i915_private 
*dev_priv,
BIT(DRM_SCALING_FILTER_DEFAULT) 
|

BIT(DRM_SCALING_FILTER_NEAREST_NEIGHBOR));
 
+   intel_plane_color_init(&plane->base);
+
intel_plane_helper_add(plane);
 
return plane;
-- 
2.26.2



[RFC v2 10/22] drm/i915/xelpd: Add plane color check to glk_plane_color_ctl

2021-09-06 Thread Uma Shankar
Extend glk_plane_color_ctl to include plane color checks. This helps
enable the degamma or gamma block based on user input.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/display/skl_universal_plane.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c 
b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 724e7b04f3b6..4187a670e840 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -960,6 +960,11 @@ static u32 glk_plane_color_ctl(const struct 
intel_crtc_state *crtc_state,
u32 plane_color_ctl = 0;
 
plane_color_ctl |= PLANE_COLOR_PLANE_GAMMA_DISABLE;
+
+   /* FIXME needs hw.degamma_lut */
+   if (plane_state->uapi.degamma_lut)
+   plane_color_ctl |= PLANE_PRE_CSC_GAMMA_ENABLE;
+
plane_color_ctl |= glk_plane_color_ctl_alpha(plane_state);
 
if (fb->format->is_yuv && !icl_is_hdr_plane(dev_priv, plane->id)) {
-- 
2.26.2



[RFC v2 09/22] drm/i915/xelpd: Program Plane Degamma Registers

2021-09-06 Thread Uma Shankar
Extract the LUT and program plane degamma registers.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/display/intel_color.c | 116 +
 drivers/gpu/drm/i915/i915_reg.h|   2 +
 2 files changed, 118 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index fd0bfdf85703..62df5122309a 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -126,6 +126,29 @@ static bool crtc_state_is_legacy_gamma(const struct 
intel_crtc_state *crtc_state
lut_is_legacy(crtc_state->hw.gamma_lut);
 }
 
+/*
+ * Added to accommodate enhanced LUT precision.
+ * Max LUT precision is 32 bits.
+ */
+static u64 drm_color_lut_extract_ext(u64 user_input, u32 bit_precision)
+{
+   u64 val = user_input & 0x;
+   u32 max;
+
+   if (bit_precision > 32)
+   return 0;
+
+   max = 0x >> (32 - bit_precision);
+   /* Round only if we're not using full precision. */
+   if (bit_precision < 32) {
+   val += 1UL << (32 - bit_precision - 1);
+   val >>= 32 - bit_precision;
+   }
+
+   return ((user_input & 0x) |
+   clamp_val(val, 0, max));
+}
+
 /*
  * When using limited range, multiply the matrix given by userspace by
  * the matrix that we would use for the limited range.
@@ -2196,6 +2219,97 @@ static const struct drm_color_lut_range 
d13_degamma_sdr[] = {
},
 };
 
+static void d13_program_plane_degamma_lut(const struct drm_plane_state *state,
+ struct drm_color_lut_ext *degamma_lut,
+ u32 offset)
+{
+   struct drm_i915_private *dev_priv = to_i915(state->plane->dev);
+   enum pipe pipe = to_intel_plane(state->plane)->pipe;
+   enum plane_id plane = to_intel_plane(state->plane)->id;
+   u32 i, lut_size;
+
+   if (icl_is_hdr_plane(dev_priv, plane)) {
+   lut_size = 128;
+
+   intel_de_write(dev_priv, PLANE_PRE_CSC_GAMC_INDEX_ENH(pipe, 
plane, 0),
+  PLANE_PAL_PREC_AUTO_INCREMENT);
+
+   if (degamma_lut) {
+   for (i = 0; i < lut_size; i++) {
+   u64 word = 
drm_color_lut_extract_ext(degamma_lut[i].green, 24);
+   u32 lut_val = (word & 0xff);
+
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA_ENH(pipe, plane, 0),
+  lut_val);
+   }
+
+   /* Program the max register to clamp values > 1.0. */
+   while (i < 131)
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA_ENH(pipe, plane, 0),
+  degamma_lut[i++].green);
+   } else {
+   for (i = 0; i < lut_size; i++) {
+   u32 v = (i * ((1 << 24) - 1)) / (lut_size - 1);
+
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA_ENH(pipe, plane, 0), v);
+   }
+
+   do {
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA_ENH(pipe, plane, 0),
+  1 << 24);
+   } while (i++ < 130);
+   }
+
+   intel_de_write(dev_priv, PLANE_PRE_CSC_GAMC_INDEX_ENH(pipe, 
plane, 0), 0);
+   } else {
+   lut_size = 32;
+
+   /*
+* First 3 planes are HDR, so reduce by 3 to get to the right
+* SDR plane offset
+*/
+   plane = plane - 3;
+
+   intel_de_write(dev_priv, PLANE_PRE_CSC_GAMC_INDEX(pipe, plane, 
0),
+  PLANE_PAL_PREC_AUTO_INCREMENT);
+
+   if (degamma_lut) {
+   for (i = 0; i < lut_size; i++)
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA(pipe, plane, 0),
+  degamma_lut[i].green);
+   /* Program the max register to clamp values > 1.0. */
+   while (i < 35)
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA(pipe, plane, 0),
+  degamma_lut[i++].green);
+   } else {
+   for (i = 0; i < lut_size; i++) {
+   u32 v = (i * ((1 << 16) - 1)) / (lut_size - 1);
+
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA(pipe, plane, 0), v);
+   }
+
+   do {
+   intel_de_write(dev_priv, 
PLANE_PRE_CSC_GAMC_DATA(pipe, plane, 0),
+  1 

[RFC v2 07/22] drm/i915/xelpd: Enable plane color features

2021-09-06 Thread Uma Shankar
Enable and initialize plane color features.
Also initialize the color features of HDR planes.

Signed-off-by: Uma Shankar 
Signed-off-by: Bhanuprakash Modem 
---
 drivers/gpu/drm/i915/display/intel_color.c | 22 +-
 drivers/gpu/drm/i915/display/intel_color.h |  2 ++
 drivers/gpu/drm/i915/i915_drv.h|  3 +++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index 6403bd74324b..2307a2e4d73d 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -25,6 +25,7 @@
 #include "intel_color.h"
 #include "intel_de.h"
 #include "intel_display_types.h"
+#include 
 
 #define CTM_COEFF_SIGN (1ULL << 63)
 
@@ -2093,7 +2094,6 @@ static void icl_read_luts(struct intel_crtc_state 
*crtc_state)
 }
 
  /* FIXME input bpc? */
-__maybe_unused
 static const struct drm_color_lut_range d13_degamma_hdr[] = {
/* segment 1 */
{
@@ -2144,6 +2144,26 @@ static const struct drm_color_lut_range 
d13_degamma_hdr[] = {
},
 };
 
+int intel_plane_color_init(struct drm_plane *plane)
+{
+   struct drm_i915_private *dev_priv = to_i915(plane->dev);
+   int ret = 0;
+
+   if (DISPLAY_VER(dev_priv) >= 13) {
+   drm_plane_create_color_mgmt_properties(plane->dev, plane, 2);
+   ret = drm_plane_color_add_gamma_degamma_mode_range(plane, "no 
degamma",
+  NULL, 0,
+  
LUT_TYPE_DEGAMMA);
+   ret = drm_plane_color_add_gamma_degamma_mode_range(plane, 
"plane degamma",
+  
d13_degamma_hdr,
+  
sizeof(d13_degamma_hdr),
+  
LUT_TYPE_DEGAMMA);
+   drm_plane_attach_degamma_properties(plane);
+   }
+
+   return ret;
+}
+
 void intel_color_init(struct intel_crtc *crtc)
 {
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
diff --git a/drivers/gpu/drm/i915/display/intel_color.h 
b/drivers/gpu/drm/i915/display/intel_color.h
index 173727aaa24d..b8850bb1b0c9 100644
--- a/drivers/gpu/drm/i915/display/intel_color.h
+++ b/drivers/gpu/drm/i915/display/intel_color.h
@@ -10,6 +10,7 @@
 
 struct intel_crtc_state;
 struct intel_crtc;
+struct drm_plane;
 struct drm_property_blob;
 
 void intel_color_init(struct intel_crtc *crtc);
@@ -21,5 +22,6 @@ int intel_color_get_gamma_bit_precision(const struct 
intel_crtc_state *crtc_stat
 bool intel_color_lut_equal(struct drm_property_blob *blob1,
   struct drm_property_blob *blob2,
   u32 gamma_mode, u32 bit_precision);
+int intel_plane_color_init(struct drm_plane *plane);
 
 #endif /* __INTEL_COLOR_H__ */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index be2392bbcecc..a937a20e4c49 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -391,6 +391,9 @@ struct drm_i915_display_funcs {
 */
void (*load_luts)(const struct intel_crtc_state *crtc_state);
void (*read_luts)(struct intel_crtc_state *crtc_state);
+   /* Add Plane Color callbacks */
+   void (*load_plane_csc_matrix)(const struct drm_plane_state 
*plane_state);
+   void (*load_plane_luts)(const struct drm_plane_state *plane_state);
 };
 
 
-- 
2.26.2



[RFC v2 08/22] drm/i915/xelpd: Add color capabilities of SDR planes

2021-09-06 Thread Uma Shankar
Add the Color capabilities of SDR planes.

Signed-off-by: Uma Shankar 
Signed-off-by: Bhanuprakash Modem 
---
 drivers/gpu/drm/i915/display/intel_color.c | 67 --
 1 file changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index 2307a2e4d73d..fd0bfdf85703 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -25,6 +25,7 @@
 #include "intel_color.h"
 #include "intel_de.h"
 #include "intel_display_types.h"
+#include "intel_sprite.h"
 #include 
 
 #define CTM_COEFF_SIGN (1ULL << 63)
@@ -2144,6 +2145,57 @@ static const struct drm_color_lut_range 
d13_degamma_hdr[] = {
},
 };
 
+ /* FIXME input bpc? */
+static const struct drm_color_lut_range d13_degamma_sdr[] = {
+   /* segment 1 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 32,
+   .input_bpc = 16, .output_bpc = 16,
+   .start = 0, .end = (1 << 16) - (1 << 16) / 33,
+   .min = 0, .max = (1 << 16) - 1,
+   },
+   /* segment 2 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 16, .output_bpc = 16,
+   .start = (1 << 16) - (1 << 16) / 33, .end = 1 << 16,
+   .min = 0, .max = 1 << 16,
+   },
+   /* Segment 3 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 16, .output_bpc = 16,
+   .start = 1 << 16, .end = 3 << 16,
+   .min = 0, .max = (8 << 16) - 1,
+   },
+   /* Segment 4 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 16, .output_bpc = 16,
+   .start = 3 << 16, .end = 7 << 16,
+   .min = 0, .max = (8 << 16) - 1,
+   },
+};
+
 int intel_plane_color_init(struct drm_plane *plane)
 {
struct drm_i915_private *dev_priv = to_i915(plane->dev);
@@ -2154,10 +2206,17 @@ int intel_plane_color_init(struct drm_plane *plane)
ret = drm_plane_color_add_gamma_degamma_mode_range(plane, "no 
degamma",
   NULL, 0,
   
LUT_TYPE_DEGAMMA);
-   ret = drm_plane_color_add_gamma_degamma_mode_range(plane, 
"plane degamma",
-  
d13_degamma_hdr,
-  
sizeof(d13_degamma_hdr),
-  
LUT_TYPE_DEGAMMA);
+   if (icl_is_hdr_plane(dev_priv, to_intel_plane(plane)->id))
+   ret = 
drm_plane_color_add_gamma_degamma_mode_range(plane, "plane degamma",
+  
d13_degamma_hdr,
+  
sizeof(d13_degamma_hdr),
+  
LUT_TYPE_DEGAMMA);
+   else
+   ret = 
drm_plane_color_add_gamma_degamma_mode_range(plane,
+  
"plane degamma",
+  
d13_degamma_sdr,
+  
sizeof(d13_degamma_sdr),
+  
LUT_TYPE_DEGAMMA);
drm_plane_attach_degamma_properties(plane);
}
 
-- 
2.26.2



[RFC v2 05/22] drm/i915/xelpd: Define Degamma Lut range struct for HDR planes

2021-09-06 Thread Uma Shankar
Define the structure with the XE_LPD degamma LUT ranges. HDR and SDR
planes have different capabilities; implement the respective
structure for the HDR planes here.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/display/intel_color.c | 52 ++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_color.c 
b/drivers/gpu/drm/i915/display/intel_color.c
index afcb4bf3826c..6403bd74324b 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -2092,6 +2092,58 @@ static void icl_read_luts(struct intel_crtc_state 
*crtc_state)
}
 }
 
+ /* FIXME input bpc? */
+__maybe_unused
+static const struct drm_color_lut_range d13_degamma_hdr[] = {
+   /* segment 1 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 128,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = 0, .end = (1 << 24) - 1,
+   .min = 0, .max = (1 << 24) - 1,
+   },
+   /* segment 2 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = (1 << 24) - 1, .end = 1 << 24,
+   .min = 0, .max = (1 << 27) - 1,
+   },
+   /* Segment 3 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = 1 << 24, .end = 3 << 24,
+   .min = 0, .max = (1 << 27) - 1,
+   },
+   /* Segment 4 */
+   {
+   .flags = (DRM_MODE_LUT_GAMMA |
+ DRM_MODE_LUT_REFLECT_NEGATIVE |
+ DRM_MODE_LUT_INTERPOLATE |
+ DRM_MODE_LUT_REUSE_LAST |
+ DRM_MODE_LUT_NON_DECREASING),
+   .count = 1,
+   .input_bpc = 24, .output_bpc = 16,
+   .start = 3 << 24, .end = 7 << 24,
+   .min = 0, .max = (1 << 27) - 1,
+   },
+};
+
 void intel_color_init(struct intel_crtc *crtc)
 {
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
-- 
2.26.2



[RFC v2 06/22] drm/i915/xelpd: Add register definitions for Plane Degamma

2021-09-06 Thread Uma Shankar
Add macros to define the Plane Degamma registers.

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/i915/i915_reg.h | 52 +
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 313432ed6196..919982c878ac 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -262,6 +262,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
  
INTEL_INFO(dev_priv)->cursor_offsets[PIPE_A] + (reg) + \
  DISPLAY_MMIO_BASE(dev_priv))
 
+/* Plane Gamma Registers */
+#define _MMIO_PLANE_GAMC(plane, i, a, b)  _MMIO(_PIPE(plane, a, b) + (i) * 4)
+
 #define __MASKED_FIELD(mask, value) ((mask) << 16 | (value))
 #define _MASKED_FIELD(mask, value) ({ \
if (__builtin_constant_p(mask))\
@@ -11366,6 +11369,55 @@ enum skl_power_gate {
_PAL_PREC_MULTI_SEG_DATA_A, \
_PAL_PREC_MULTI_SEG_DATA_B)
 
+/* Display13 Plane Degamma Reg */
+#define _PLANE_PRE_CSC_GAMC_INDEX_ENH_1_A  0x701d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_ENH_1_B  0x711d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_ENH_2_A  0x702d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_ENH_2_B  0x712d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_ENH_1(pipe)  _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_INDEX_ENH_1_A, \
+   
_PLANE_PRE_CSC_GAMC_INDEX_ENH_1_B)
+#define _PLANE_PRE_CSC_GAMC_INDEX_ENH_2(pipe)  _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_INDEX_ENH_2_A, \
+   
_PLANE_PRE_CSC_GAMC_INDEX_ENH_2_B)
+#define PLANE_PRE_CSC_GAMC_INDEX_ENH(pipe, plane, i)   \
+   _MMIO_PLANE_GAMC(plane, i, 
_PLANE_PRE_CSC_GAMC_INDEX_ENH_1(pipe), \
+   _PLANE_PRE_CSC_GAMC_INDEX_ENH_2(pipe))
+
+#define _PLANE_PRE_CSC_GAMC_DATA_ENH_1_A   0x701d4
+#define _PLANE_PRE_CSC_GAMC_DATA_ENH_1_B   0x711d4
+#define _PLANE_PRE_CSC_GAMC_DATA_ENH_2_A   0x702d4
+#define _PLANE_PRE_CSC_GAMC_DATA_ENH_2_B   0x712d4
+#define _PLANE_PRE_CSC_GAMC_DATA_ENH_1(pipe)   _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_DATA_ENH_1_A, \
+   
_PLANE_PRE_CSC_GAMC_DATA_ENH_1_B)
+#define _PLANE_PRE_CSC_GAMC_DATA_ENH_2(pipe)   _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_DATA_ENH_2_A, \
+   
_PLANE_PRE_CSC_GAMC_DATA_ENH_2_B)
+#define PLANE_PRE_CSC_GAMC_DATA_ENH(pipe, plane, i)\
+   _MMIO_PLANE_GAMC(plane, i, 
_PLANE_PRE_CSC_GAMC_DATA_ENH_1(pipe), \
+   _PLANE_PRE_CSC_GAMC_DATA_ENH_2(pipe))
+
+#define _PLANE_PRE_CSC_GAMC_INDEX_1_A  0x704d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_1_B  0x714d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_2_A  0x705d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_2_B  0x715d0
+#define _PLANE_PRE_CSC_GAMC_INDEX_1(pipe)  _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_INDEX_1_A, \
+   _PLANE_PRE_CSC_GAMC_INDEX_1_B)
+#define _PLANE_PRE_CSC_GAMC_INDEX_2(pipe)  _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_INDEX_2_A, \
+   _PLANE_PRE_CSC_GAMC_INDEX_2_B)
+#define PLANE_PRE_CSC_GAMC_INDEX(pipe, plane, i)   \
+   _MMIO_PLANE_GAMC(plane, i, _PLANE_PRE_CSC_GAMC_INDEX_1(pipe), \
+   _PLANE_PRE_CSC_GAMC_INDEX_2(pipe))
+
+#define _PLANE_PRE_CSC_GAMC_DATA_1_A   0x704d4
+#define _PLANE_PRE_CSC_GAMC_DATA_1_B   0x714d4
+#define _PLANE_PRE_CSC_GAMC_DATA_2_A   0x705d4
+#define _PLANE_PRE_CSC_GAMC_DATA_2_B   0x715d4
+#define _PLANE_PRE_CSC_GAMC_DATA_1(pipe)   _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_DATA_1_A, \
+   _PLANE_PRE_CSC_GAMC_DATA_1_B)
+#define _PLANE_PRE_CSC_GAMC_DATA_2(pipe)   _PIPE(pipe, 
_PLANE_PRE_CSC_GAMC_DATA_2_A, \
+   _PLANE_PRE_CSC_GAMC_DATA_2_B)
+#define PLANE_PRE_CSC_GAMC_DATA(pipe, plane, i)\
+   _MMIO_PLANE_GAMC(plane, i, _PLANE_PRE_CSC_GAMC_DATA_1(pipe), \
+   _PLANE_PRE_CSC_GAMC_DATA_2(pipe))
+
 /* pipe CSC & degamma/gamma LUTs on CHV */
 #define _CGM_PIPE_A_CSC_COEFF01(VLV_DISPLAY_BASE + 0x67900)
 #define _CGM_PIPE_A_CSC_COEFF23(VLV_DISPLAY_BASE + 0x67904)
-- 
2.26.2



[RFC v2 04/22] drm: Add Plane Degamma Lut property

2021-09-06 Thread Uma Shankar
Add Plane Degamma LUT as a blob property. Userspace will calculate
the LUT values, create the blob and send it to the driver using
this property. The LUT calculation will be based on the gamma mode
chosen out of the gamma modes exposed.
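
For illustration, a minimal userspace sketch of driving this property could
look as follows; it is not part of the patch. It assumes libdrm's atomic
helpers, the struct drm_color_lut_ext layout from the uapi patch in this
series, and hypothetical plane/property ids discovered beforehand by the
caller:

#include <errno.h>
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h> /* pulls in drm_mode.h with struct drm_color_lut_ext */

/*
 * Hypothetical helper: upload a degamma LUT blob and point the plane's
 * PLANE_DEGAMMA_LUT property at it.  plane_id, prop_id and lut_size are
 * assumed to have been discovered beforehand (e.g. via drmModeGetPlane
 * and drmModeObjectGetProperties).
 */
static int set_plane_degamma_lut(int fd, uint32_t plane_id, uint32_t prop_id,
                                 const struct drm_color_lut_ext *lut,
                                 uint32_t lut_size)
{
        drmModeAtomicReq *req;
        uint32_t blob_id = 0;
        int ret;

        ret = drmModeCreatePropertyBlob(fd, lut, lut_size * sizeof(*lut),
                                        &blob_id);
        if (ret)
                return ret;

        req = drmModeAtomicAlloc();
        if (!req)
                return -ENOMEM;

        /* Point PLANE_DEGAMMA_LUT at the freshly created blob. */
        drmModeAtomicAddProperty(req, plane_id, prop_id, blob_id);
        ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);

        drmModeAtomicFree(req);
        return ret;
}

The matching PLANE_DEGAMMA_MODE selection would normally be committed in the
same atomic request.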

Signed-off-by: Uma Shankar 
---
 drivers/gpu/drm/drm_atomic_state_helper.c |  4 
 drivers/gpu/drm/drm_atomic_uapi.c | 10 ++
 drivers/gpu/drm/drm_color_mgmt.c  | 19 +++
 include/drm/drm_plane.h   | 14 ++
 4 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c 
b/drivers/gpu/drm/drm_atomic_state_helper.c
index f26b03853711..6e358067cb7a 100644
--- a/drivers/gpu/drm/drm_atomic_state_helper.c
+++ b/drivers/gpu/drm/drm_atomic_state_helper.c
@@ -312,6 +312,9 @@ void __drm_atomic_helper_plane_duplicate_state(struct 
drm_plane *plane,
state->commit = NULL;
state->fb_damage_clips = NULL;
 
+   if (state->degamma_lut)
+   drm_property_blob_get(state->degamma_lut);
+
state->color_mgmt_changed = false;
 }
 EXPORT_SYMBOL(__drm_atomic_helper_plane_duplicate_state);
@@ -359,6 +362,7 @@ void __drm_atomic_helper_plane_destroy_state(struct 
drm_plane_state *state)
drm_crtc_commit_put(state->commit);
 
drm_property_blob_put(state->fb_damage_clips);
+   drm_property_blob_put(state->degamma_lut);
 }
 EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
 
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 3c952123f747..904291b96ba9 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -598,6 +598,13 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_range = val;
} else if (property == plane->degamma_mode_property) {
state->degamma_mode = val;
+   } else if (property == plane->degamma_lut_property) {
+   ret = drm_atomic_replace_property_blob_from_id(dev,
+   &state->degamma_lut,
+   val, -1, sizeof(struct 
drm_color_lut_ext),
+   &replaced);
+   state->color_mgmt_changed |= replaced;
+   return ret;
} else if (property == config->prop_fb_damage_clips) {
ret = drm_atomic_replace_property_blob_from_id(dev,
&state->fb_damage_clips,
@@ -666,6 +673,9 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
*val = state->color_range;
} else if (property == plane->degamma_mode_property) {
*val = state->degamma_mode;
+   } else if (property == plane->degamma_lut_property) {
+   *val = (state->degamma_lut) ?
+   state->degamma_lut->base.id : 0;
} else if (property == config->prop_fb_damage_clips) {
*val = (state->fb_damage_clips) ?
state->fb_damage_clips->base.id : 0;
diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index 085ed0d0db00..29d0fc1e52b5 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -596,6 +596,12 @@ EXPORT_SYMBOL(drm_plane_create_color_properties);
  * to query and get the plane degamma color caps and choose the
  * appropriate degamma mode and create lut values accordingly
  *
+ * degamma_lut_property:
+ * Blob property which allows a userspace to provide LUT values
+ * to apply degamma curve using the h/w plane degamma processing
+ * engine, thereby making the content as linear for further color
+ * processing.
+ *
  */
 int drm_plane_create_color_mgmt_properties(struct drm_device *dev,
   struct drm_plane *plane,
@@ -610,6 +616,13 @@ int drm_plane_create_color_mgmt_properties(struct 
drm_device *dev,
 
plane->degamma_mode_property = prop;
 
+   prop = drm_property_create(dev, DRM_MODE_PROP_BLOB,
+  "PLANE_DEGAMMA_LUT", 0);
+   if (!prop)
+   return -ENOMEM;
+
+   plane->degamma_lut_property = prop;
+
return 0;
 }
 EXPORT_SYMBOL(drm_plane_create_color_mgmt_properties);
@@ -621,6 +634,12 @@ void drm_plane_attach_degamma_properties(struct drm_plane 
*plane)
 
drm_object_attach_property(&plane->base,
   plane->degamma_mode_property, 0);
+
+   if (!plane->degamma_lut_property)
+   return;
+
+   drm_object_attach_property(&plane->base,
+  plane->degamma_lut_property, 0);
 }
 EXPORT_SYMBOL(drm_plane_attach_degamma_properties);
 
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index b9064101db2b..fbfada0b990d 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -247,6 +247,14 @@ struct drm_plane_state {
 */
u32 degamm

[RFC v2 03/22] drm: Add Plane Degamma Mode property

2021-09-06 Thread Uma Shankar
Add Plane Degamma Mode as an enum property. Create a helper
function for all plane color management features.

This is an enum property whose values are blob_ids, exposing the
various gamma modes supported along with their LUT ranges. By fetching
the blob id in userspace, the user can learn which modes are supported,
the range of each gamma mode and the number of LUT coefficients. It can
then select one of the modes using this enum property.

LUT values will be sent through a separate GAMMA_LUT blob property.
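
As an illustration only (not part of the patch), userspace discovery of this
enum could look roughly like the sketch below. It assumes libdrm, the
drm_color_lut_range layout from the previous patch, and simplified error
handling:

#include <stdio.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/*
 * Hypothetical sketch: walk the PLANE_DEGAMMA_MODE enum and inspect the
 * drm_color_lut_range blob advertised for each mode.
 */
static void dump_degamma_modes(int fd, uint32_t plane_id)
{
        drmModeObjectProperties *props =
                drmModeObjectGetProperties(fd, plane_id, DRM_MODE_OBJECT_PLANE);

        for (uint32_t i = 0; props && i < props->count_props; i++) {
                drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);

                if (!prop || strcmp(prop->name, "PLANE_DEGAMMA_MODE")) {
                        drmModeFreeProperty(prop);
                        continue;
                }

                /* Each enum value is a blob id describing the LUT ranges. */
                for (int j = 0; j < prop->count_enums; j++) {
                        drmModePropertyBlobRes *blob =
                                drmModeGetPropertyBlob(fd, prop->enums[j].value);

                        if (!blob)
                                continue;
                        printf("mode %s: %zu lut range segments\n",
                               prop->enums[j].name,
                               blob->length / sizeof(struct drm_color_lut_range));
                        drmModeFreePropertyBlob(blob);
                }
                drmModeFreeProperty(prop);
        }
        drmModeFreeObjectProperties(props);
}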

Signed-off-by: Uma Shankar 
---
 Documentation/gpu/drm-kms.rst | 90 ++
 drivers/gpu/drm/drm_atomic.c  |  1 +
 drivers/gpu/drm/drm_atomic_state_helper.c |  2 +
 drivers/gpu/drm/drm_atomic_uapi.c |  4 +
 drivers/gpu/drm/drm_color_mgmt.c  | 93 ++-
 include/drm/drm_mode_object.h |  2 +-
 include/drm/drm_plane.h   | 23 ++
 7 files changed, 212 insertions(+), 3 deletions(-)

diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
index 1ef7951ded5e..f4658417bf20 100644
--- a/Documentation/gpu/drm-kms.rst
+++ b/Documentation/gpu/drm-kms.rst
@@ -545,9 +545,99 @@ Damage Tracking Properties
 Color Management Properties
 ---
 
+Below is how a typical hardware pipeline for color
+will look like:
+
+.. kernel-render:: DOT
+   :alt: Display Color Pipeline
+   :caption: Display Color Pipeline Overview
+
+   digraph "KMS" {
+  node [shape=box]
+
+  subgraph cluster_static {
+  style=dashed
+  label="Display Color Hardware Blocks"
+
+  node [bgcolor=grey style=filled]
+  "Plane Degamma A" -> "Plane CSC/CTM A"
+  "Plane CSC/CTM A" -> "Plane Gamma A"
+  "Pipe Blender" [color=lightblue,style=filled, width=5.25, 
height=0.75];
+  "Plane Gamma A" -> "Pipe Blender"
+ "Pipe Blender" -> "Pipe DeGamma"
+  "Pipe DeGamma" -> "Pipe CSC/CTM"
+  "Pipe CSC/CTM" -> "Pipe Gamma"
+  "Pipe Gamma" -> "Pipe Output"
+  }
+
+  subgraph cluster_static {
+  style=dashed
+
+  node [shape=box]
+  "Plane Degamma B" -> "Plane CSC/CTM B"
+  "Plane CSC/CTM B" -> "Plane Gamma B"
+  "Plane Gamma B" -> "Pipe Blender"
+  }
+
+  subgraph cluster_static {
+  style=dashed
+
+  node [shape=box]
+  "Plane Degamma C" -> "Plane CSC/CTM C"
+  "Plane CSC/CTM C" -> "Plane Gamma C"
+  "Plane Gamma C" -> "Pipe Blender"
+  }
+
+  subgraph cluster_fb {
+  style=dashed
+  label="RAM"
+
+  node [shape=box width=1.7 height=0.2]
+
+  "FB 1" -> "Plane Degamma A"
+  "FB 2" -> "Plane Degamma B"
+  "FB 3" -> "Plane Degamma C"
+  }
+   }
+
+In real world usecases,
+
+1. Plane Degamma can be used to linearize a non linear gamma
+encoded framebuffer. This is needed to do any linear math like
+color space conversion. For ex, linearize frames encoded in SRGB
+or by HDR curve.
+
+2. Later Plane CTM block can convert the content to some different
+colorspace. For ex, SRGB to BT2020 etc.
+
+3. Plane Gamma block can be used later to re-apply the non-linear
+curve. This can also be used to apply Tone Mapping for HDR usecases.
+
+All the layers or framebuffers need to be converted to same color
+space and format before blending. The plane color hardware blocks
+can help with this. Once the Data is blended, similar color processing
+can be done on blended output using pipe color hardware blocks.
+
+DRM Properties have been created to define and expose all these
+hardware blocks to userspace. A userspace application (compositor
+or any color app) can use these interfaces and define policies to
+efficiently use the display hardware for such color operations.
+
+Pipe Color Management Properties
+-
+
 .. kernel-doc:: drivers/gpu/drm/drm_color_mgmt.c
:doc: overview
 
+Plane Color Management Properties
+-
+
+.. kernel-doc:: drivers/gpu/drm/drm_color_mgmt.c
+   :doc: Plane Color Properties
+
+.. kernel-doc:: drivers/gpu/drm/drm_color_mgmt.c
+   :doc: export
+
 Tile Group Property
 ---
 
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index ff1416cd609a..fddf9df15cd5 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -709,6 +709,7 @@ static void drm_atomic_plane_print_state(struct drm_printer 
*p,
   drm_get_color_encoding_name(state->color_encoding));
drm_printf(p, "\tcolor-range=%s\n",
   drm_get_color_range_name(state->color_range));
+   drm_printf(p, "\tcolor_mgmt_changed=%d\n", state->color_mgmt_changed);
 
if (plane->funcs->atomic_print_state)
plane->funcs->atomic_print_state(p, state);
diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c 
b/drivers/gpu/drm/drm_atomic_state_helper.c
index ddcf5c2c8e6a..

[RFC v2 02/22] drm: Add Enhanced Gamma and color lut range attributes

2021-09-06 Thread Uma Shankar
The existing LUT precision structure provides only 16 bit
precision. This is not enough for upcoming enhanced hardware and
advanced use cases like HDR processing. Hence, add a new structure
with 32 bit precision values.

This also defines a new structure to describe color LUT ranges,
along with related macro definitions and enums. This will help
describe multi-segmented LUT ranges in the hardware.
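
For reference, a small sketch (an assumption added for illustration, not part
of the patch) of how a value in the [0.0, 1.0] range could be converted by
userspace into this U32.32 fixed-point format:

#include <stdint.h>

/*
 * Hypothetical helper: convert a double to the U32.32 fixed-point
 * representation used by struct drm_color_lut_ext (32 integer bits,
 * 32 fractional bits), so 0.5 becomes 0x80000000.
 */
static uint64_t to_u32_32(double v)
{
        return (uint64_t)(v * (double)(1ULL << 32));
}

/* usage: lut[i].red = lut[i].green = lut[i].blue = to_u32_32(0.5); */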

Signed-off-by: Uma Shankar 
---
 include/uapi/drm/drm_mode.h | 58 +
 1 file changed, 58 insertions(+)

diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 90c55383f1ee..1079794c86c3 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -903,6 +903,64 @@ struct hdr_output_metadata {
};
 };
 
+/*
+ * DRM_MODE_LUT_GAMMA|DRM_MODE_LUT_DEGAMMA is legal and means the LUT
+ * can be used for either purpose, but not simultaneously. To expose
+ * modes that support gamma and degamma simultaneously the gamma mode
+ * must declare distinct DRM_MODE_LUT_GAMMA and DRM_MODE_LUT_DEGAMMA
+ * ranges.
+ */
+/* LUT is for gamma (after CTM) */
+#define DRM_MODE_LUT_GAMMA BIT(0)
+/* LUT is for degamma (before CTM) */
+#define DRM_MODE_LUT_DEGAMMA BIT(1)
+/* linearly interpolate between the points */
+#define DRM_MODE_LUT_INTERPOLATE BIT(2)
+/*
+ * the last value of the previous range is the
+ * first value of the current range.
+ */
+#define DRM_MODE_LUT_REUSE_LAST BIT(3)
+/* the curve must be non-decreasing */
+#define DRM_MODE_LUT_NON_DECREASING BIT(4)
+/* the curve is reflected across origin for negative inputs */
+#define DRM_MODE_LUT_REFLECT_NEGATIVE BIT(5)
+/* the same curve (red) is used for blue and green channels as well */
+#define DRM_MODE_LUT_SINGLE_CHANNEL BIT(6)
+
+struct drm_color_lut_range {
+   /* DRM_MODE_LUT_* */
+   __u32 flags;
+   /* number of points on the curve */
+   __u16 count;
+   /* input/output bits per component */
+   __u8 input_bpc, output_bpc;
+   /* input start/end values */
+   __s32 start, end;
+   /* output min/max values */
+   __s32 min, max;
+};
+
+enum lut_type {
+   LUT_TYPE_DEGAMMA = 0,
+   LUT_TYPE_GAMMA = 1,
+};
+
+/*
+ * Creating 64 bit palette entries for better data
+ * precision. This will be required for HDR and
+ * similar color processing usecases.
+ */
+struct drm_color_lut_ext {
+   /*
+* Data is U32.32 fixed point format.
+*/
+   __u64 red;
+   __u64 green;
+   __u64 blue;
+   __u64 reserved;
+};
+
 #define DRM_MODE_PAGE_FLIP_EVENT 0x01
 #define DRM_MODE_PAGE_FLIP_ASYNC 0x02
 #define DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE 0x4
-- 
2.26.2



[RFC v2 00/22] Add Support for Plane Color Lut and CSC features

2021-09-06 Thread Uma Shankar
This is how a typical display color hardware pipeline looks like:
 +---+
 |RAM|
 |  +--++-++-+   |
 |  | FB 1 ||  FB 2   || FB N|   |
 |  +--++-++-+   |
 +---+
   |  Plane Color Hardware Block |
 ++
 | +---v-+   +---v---+   +---v--+ |
 | | Plane A |   | Plane B   |   | Plane N  | |
 | | DeGamma |   | Degamma   |   | Degamma  | |
 | +---+-+   +---+---+   +---+--+ |
 | | |   ||
 | +---v-+   +---v---+   +---v--+ |
 | |Plane A  |   | Plane B   |   | Plane N  | |
 | |CSC/CTM  |   | CSC/CTM   |   | CSC/CTM  | |
 | +---+-+   ++--+   ++-+ |
 | |  |   |   |
 | +---v-+   +v--+   +v-+ |
 | | Plane A |   | Plane B   |   | Plane N  | |
 | | Gamma   |   | Gamma |   | Gamma| |
 | +---+-+   ++--+   ++-+ |
 | |  |   |   |
 ++
+--v--v---v---|
||   ||
||   Pipe Blender||
+++
|||
|+---v--+ |
||  Pipe DeGamma| |
||  | |
|+---+--+ |
||Pipe Color  |
|+---v--+ Hardware|
||  Pipe CSC/CTM| |
||  | |
|+---+--+ |
|||
|+---v--+ |
||  Pipe Gamma  | |
||  | |
|+---+--+ |
|||
+-+
 |
 v
   Pipe Output

This patch series adds properties for plane color features. It adds
properties for degamma, used to linearize data, and CSC, used for gamut
conversion. It also includes gamma support, used to re-apply a non-linear
curve as per the panel-supported color space. These can be utilized by
userspace to convert planes from one format to another, one color space
to another, etc.

Userspace can take smart blending decisions and utilize these
hardware-supported plane color features to get an accurate color
profile. This helps achieve consistent color quality from source to
panel by taking advantage of the advanced color features in hardware.

These patches add the property interfaces and enable helper functions.
This series adds Intel's XE_LPD hardware-specific plane gamma feature. We
can build up and add other platform/hardware-specific implementations
on top of this series.

Credits: Special mention and credits to Ville Syrjala for coming up
with a design for this feature and inputs. This series is based on
his original design and idea.

Note: Userspace support for this new UAPI will be done on Chrome in
alignment with weston and general opensource community.
Discussion ongoing with Harry Wentland, Pekka and community on color
pipeline and UAPI design. Harry's RFC below:
https://patchwork.freedesktop.org/series/89506/
We need to converge on a common UAPI interface which caters to
all the modern color hardware pipelines. 

ToDo: State readout for this feature will be added next.

v2: Added UAPI description and added the change to the rfc section of
the kernel Documentation folder

Uma Shankar (22):
  drm: RFC for Plane Color Hardware Pipeline
  drm: Add Enhanced Gamma and color lut range attributes
  drm: Add Plane Degamma Mode property
  drm: Add Plane Degamma Lut property
  drm/i915/xelpd: Define Degamma Lut range struct for HDR planes
  drm/i915/xelpd: Add register definitions for Plane Degamma
  drm/i915/xelpd: Enable plane color features
  drm/i915/xelpd: Add color capabilities of SDR planes
  drm/i915/xelpd: Program Plane Degamma Registers
  drm/i915/xelpd: Add plane color check to glk_plane_color_ctl
  drm/i915/xelpd: Initialize plane color features
  drm/i915/xelpd: Load plane color luts from atomic flip
  drm: Add Plane CTM property
  drm: Add helper to attach Plane ctm property
  drm/i915/xelpd: Define Plane CSC Registers
  drm/i915/xelpd: Enable Plane CSC
  drm: Add Plane Gamma Mode property
  drm: Add Plane Gamma Lut property
  drm/i915/xelpd: Define and Initialize Plane Gamma Lut range
  drm/i915/xelpd: Add register definitions for Plane Gamma
  drm/i915/xelpd: Program Plane Gamma Registers
  drm/i915/xelpd: Enable plane gamma

 Documentation/gpu/drm-kms.rst |  90 +++
 Documentati

[RFC v2 01/22] drm: RFC for Plane Color Hardware Pipeline

2021-09-06 Thread Uma Shankar
This is an RFC proposal for plane color hardware blocks.
It exposes the property interface to userspace and calls
out the details of the interfaces created and their intended
purpose.

Credits: Ville Syrjälä 
Signed-off-by: Uma Shankar 
---
 Documentation/gpu/rfc/drm_color_pipeline.rst | 167 +++
 1 file changed, 167 insertions(+)
 create mode 100644 Documentation/gpu/rfc/drm_color_pipeline.rst

diff --git a/Documentation/gpu/rfc/drm_color_pipeline.rst 
b/Documentation/gpu/rfc/drm_color_pipeline.rst
new file mode 100644
index ..0d1ca858783b
--- /dev/null
+++ b/Documentation/gpu/rfc/drm_color_pipeline.rst
@@ -0,0 +1,167 @@
+==
+Display Color Pipeline: Proposed DRM Properties
+==
+
+This is how a typical display color hardware pipeline looks like:
+ +---+
+ |RAM|
+ |  +--++-++-+   |
+ |  | FB 1 ||  FB 2   || FB N|   |
+ |  +--++-++-+   |
+ +---+
+   |  Plane Color Hardware Block |
+ ++
+ | +---v-+   +---v---+   +---v--+ |
+ | | Plane A |   | Plane B   |   | Plane N  | |
+ | | DeGamma |   | Degamma   |   | Degamma  | |
+ | +---+-+   +---+---+   +---+--+ |
+ | | |   ||
+ | +---v-+   +---v---+   +---v--+ |
+ | |Plane A  |   | Plane B   |   | Plane N  | |
+ | |CSC/CTM  |   | CSC/CTM   |   | CSC/CTM  | |
+ | +---+-+   ++--+   ++-+ |
+ | |  |   |   |
+ | +---v-+   +v--+   +v-+ |
+ | | Plane A |   | Plane B   |   | Plane N  | |
+ | | Gamma   |   | Gamma |   | Gamma| |
+ | +---+-+   ++--+   ++-+ |
+ | |  |   |   |
+ ++
++--v--v---v---|
+||   ||
+||   Pipe Blender||
++++
+|||
+|+---v--+ |
+||  Pipe DeGamma| |
+||  | |
+|+---+--+ |
+||Pipe Color  |
+|+---v--+ Hardware|
+||  Pipe CSC/CTM| |
+||  | |
+|+---+--+ |
+|||
+|+---v--+ |
+||  Pipe Gamma  | |
+||  | |
+|+---+--+ |
+|||
++-+
+ |
+ v
+   Pipe Output
+
+Proposal is to have below properties for a plane:
+
+* Plane Degamma or Pre-Curve:
+   * This will be used to linearize the input framebuffer data.
+   * It will apply the reverse of the color transfer function.
+   * It can be a degamma curve or OETF for HDR.
+   * This linear data can be further acted on by the following
+   * color hardware blocks in the display hardware pipeline
+
+UAPI Name: PLANE_DEGAMMA_MODE
+Description: Enum property with values as blob_id's which advertises the
+   possible degamma modes and lut ranges supported by the platform.
+   This allows userspace to query and get the plane degamma color
+   caps and choose the appropriate degamma mode and create lut values
+   accordingly.
+
+UAPI Name: PLANE_DEGAMMA_LUT
+Description: Blob property which allows a userspace to provide LUT values
+to apply degamma curve using the h/w plane degamma processing
+engine, thereby making the content as linear for further color
+processing. Userspace gets the size of LUT and precision etc
+from PLANE_DEGAMMA_MODE_PROPERTY
+   
+* Plane CTM
+   * This is a Property to program the color transformation matrix.
+   * This can be used to perform a color space conversion like
+   * BT2020 to BT709 or BT601 etc.
+   * This block is generally kept after the degamma unit so that
+   * linear data can be fed to it for conversion.
+
+UAPI Name: PLANE_CTM
+Description: Blob property which allows a userspace to provide CTM coefficients
+to do color space conversion or any other enhancement by doing a
+matrix multiplication using the h/w CTM processing engine
+
+* Plane Gamma or Post-Curve
+   * This can be used to perform 2 operations:
+   * non-lineralize the framebuff

Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-09-06 Thread Rob Clark
On Mon, Sep 6, 2021 at 12:58 PM Amit Pundir  wrote:
>
> On Mon, 6 Sept 2021 at 21:54, Rob Clark  wrote:
> >
> > On Mon, Sep 6, 2021 at 1:02 AM Amit Pundir  wrote:
> > >
> > > On Sat, 4 Sept 2021 at 01:55, Rob Clark  wrote:
> > > >
> > > > On Fri, Sep 3, 2021 at 12:39 PM John Stultz  
> > > > wrote:
> > > > >
> > > > > On Thu, Jul 29, 2021 at 1:49 PM Rob Clark  wrote:
> > > > > > On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> > > > > >  wrote:
> > > > > > > On 29/07/2021 21:24, Rob Clark wrote:
> > > > > > > > On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> > > > > > > >  wrote:
> > > > > > > >>
> > > > > > > >> Hi Rob,
> > > > > > > >>
> > > > > > > >> I've done some more testing! It looks like before that patch 
> > > > > > > >> ("drm/msm: Devfreq tuning") the GPU would never get above
> > > > > > > >> the second frequency in the OPP table (342MHz) (at least, not 
> > > > > > > >> in glxgears). With the patch applied it would more
> > > > > > > >> aggressively jump up to the max frequency which seems to be 
> > > > > > > >> unstable at the default regulator voltages.
> > > > > > > >
> > > > > > > > *ohh*, yeah, ok, that would explain it
> > > > > > > >
> > > > > > > >> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 
> > > > > > > >> 0.988v (instead of the stock 0.516v) makes the GPU stable
> > > > > > > >> at the higher frequencies.
> > > > > > > >>
> > > > > > > >> Applying this patch reverts the behaviour, and the GPU never 
> > > > > > > >> goes above 342MHz in glxgears, losing ~30% performance in
> > > > > > > >> glxgear.
> > > > > > > >>
> > > > > > > >> I think (?) that enabling CPR support would be the proper 
> > > > > > > >> solution to this - that would ensure that the regulators run
> > > > > > > >> at the voltage the hardware needs to be stable.
> > > > > > > >>
> > > > > > > >> Is hacking the voltage higher (although ideally not quite that 
> > > > > > > >> high) an acceptable short term solution until we have
> > > > > > > >> CPR? Or would it be safer to just not make use of the higher 
> > > > > > > >> frequencies on a630 for now?
> > > > > > > >>
> > > > > > > >
> > > > > > > > tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is 
> > > > > > > > already
> > > > > > > > on CC and I added sboyd, maybe one of them knows better.
> > > > > > > >
> > > > > > > > In the short term, removing the higher problematic OPPs from 
> > > > > > > > dts might
> > > > > > > > be a better option than this patch (which I'm dropping), since 
> > > > > > > > there
> > > > > > > > is nothing stopping other workloads from hitting higher OPPs.
> > > > > > > Oh yeah that sounds like a more sensible workaround than mine .
> > > > > > > >
> > > > > > > > I'm slightly curious why I didn't have problems at higher OPPs 
> > > > > > > > on my
> > > > > > > > c630 laptop (sdm850)
> > > > > > > Perhaps you won the sillicon lottery - iirc sdm850 is binned for 
> > > > > > > higher clocks as is out of the factory.
> > > > > > >
> > > > > > > Would it be best to drop the OPPs for all devices? Or just those 
> > > > > > > affected? I guess it's possible another c630 might
> > > > > > > crash where yours doesn't?
> > > > > >
> > > > > > I've not heard any reports of similar issues from the handful of 
> > > > > > other
> > > > > > folks with c630's on #aarch64-laptops.. but I can't really say if 
> > > > > > that
> > > > > > is luck or not.
> > > > > >
> > > > > > Maybe just remove it for affected devices?  But I'll defer to Bjorn.
> > > > >
> > > > > Just as another datapoint, I was just marveling at how suddenly smooth
> > > > > the UI was performing on db845c and Caleb pointed me at the "drm/msm:
> > > > > Devfreq tuning" patch as the likely cause of the improvement, and
> > > > > mid-discussion my board crashed into USB crash mode:
> > > > > [  146.157696][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.163303][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.168837][C0] adreno 500.gpu: RBBM | ATB bus overflow
> > > > > [  146.174960][C0] adreno 500.gpu: CP | HW fault | 
> > > > > status=0x
> > > > > [  146.181917][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.187547][C0] adreno 500.gpu: CP illegal instruction 
> > > > > error
> > > > > [  146.194009][C0] adreno 500.gpu: CP | AHB bus error
> > > > > [  146.308909][T9] Internal error: synchronous external abort:
> > > > > 9610 [#1] PREEMPT SMP
> > > > > [  146.317150][T9] Modules linked in:
> > > > > [  146.320941][T9] CPU: 3 PID: 9 Comm: kworker/u16:1 Tainted: G
> > > > > W 5.14.0-mainline-06795-g42b258c2275c #24
> > > > > [  146.331974][T9] Hardware name: Thundercomm Dragonboar
> > > > > Format: Log Type - Time(microsec) - Message - Optional Info
> > > > > Log Type: B - Since Boot(Power On Reset),  D - Delta,  S - Statistic
> > > > > S - QC_IMAGE_VERSION_STRING=BOOT.XF.2.0-00371-SDM845LZB-1
> > > > > S - IMAGE_VARIANT_STRING

Re: [PATCH v1 6/6] drm/mediatek: Add mt8195 DisplayPort driver

2021-09-06 Thread Sam Ravnborg
Hi Markus,

On Mon, Sep 06, 2021 at 09:35:29PM +0200, Markus Schneider-Pargmann wrote:
> This patch adds a DisplayPort driver for the Mediatek mt8195 SoC.
> 
> It supports both functional units on the mt8195, the embedded
> DisplayPort as well as the external DisplayPort units. It offers
> hot-plug-detection, audio up to 8 channels, and DisplayPort 1.4 with up
> to 4 lanes.
> 
> This driver is based on an initial version by
> Jason-JH.Lin .
> 
> Signed-off-by: Markus Schneider-Pargmann 
> ---
> 
> Notes:
> Changes RFC -> v1:
> - Removed unused register definitions.
> - Replaced workqueue with threaded irq.
> - Removed connector code.
> - Move to atomic_* drm functions.
> - General cleanups of the code.
> - Remove unused select GENERIC_PHY.
> 
>  drivers/gpu/drm/mediatek/Kconfig  |6 +
>  drivers/gpu/drm/mediatek/Makefile |2 +
>  drivers/gpu/drm/mediatek/mtk_dp.c | 2881 +
>  drivers/gpu/drm/mediatek/mtk_dp_reg.h |  580 +
>  4 files changed, 3469 insertions(+)
>  create mode 100644 drivers/gpu/drm/mediatek/mtk_dp.c
>  create mode 100644 drivers/gpu/drm/mediatek/mtk_dp_reg.h
> 


> +
> +static const struct drm_bridge_funcs mtk_dp_bridge_funcs = {
> + .atomic_duplicate_state = drm_atomic_helper_bridge_duplicate_state,
> + .atomic_destroy_state = drm_atomic_helper_bridge_destroy_state,
> + .atomic_reset = drm_atomic_helper_bridge_reset,
> + .attach = mtk_dp_bridge_attach,
> + .detach = mtk_dp_bridge_detach,
> + .pre_enable = mtk_dp_bridge_pre_enable,
Use the atomic variant here as pre_enable is deprecated.

> + .atomic_enable = mtk_dp_bridge_atomic_enable,
> + .atomic_disable = mtk_dp_bridge_atomic_disable,
> + .post_disable = mtk_dp_bridge_post_disable,
Use the atomic variant here as .post_disable is deprecated.

> + .get_edid = mtk_dp_get_edid,
> + .detect = mtk_dp_bdg_detect,
> +};

Everything else I skimmed looked fine. But it was a quick skim so..

Sam


Re: [PATCH v1 4/6] video/hdmi: Add audio_infoframe packing for DP

2021-09-06 Thread Sam Ravnborg
Hi Markus,

On Mon, Sep 06, 2021 at 09:35:27PM +0200, Markus Schneider-Pargmann wrote:
Similar to HDMI, DP uses audio infoframes as well, which are structured
very similarly to the HDMI ones.
> 
> This patch adds a helper function to pack the HDMI audio infoframe for
> DP, called hdmi_audio_infoframe_pack_for_dp().
> hdmi_audio_infoframe_pack_only() is split into two parts. One of them
> packs the payload only and can be used for HDMI and DP.
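
A hypothetical caller sketch (not taken from the MediaTek driver) of the new
helper, assuming header/payload buffers sized by the caller and an
illustrative dp_version value:

#include <linux/hdmi.h>

#define EXAMPLE_DP_VERSION 0x11 /* assumed value, for illustration only */

/*
 * Pack a stereo PCM audio infoframe into separate SDP header and
 * payload buffers for transmission over DisplayPort.
 */
static int pack_audio_sdp(u8 *header, size_t header_size,
                          u8 *data, size_t data_size)
{
        struct hdmi_audio_infoframe frame;
        ssize_t ret;

        hdmi_audio_infoframe_init(&frame);
        frame.coding_type = HDMI_AUDIO_CODING_TYPE_PCM;
        frame.channels = 2;

        ret = hdmi_audio_infoframe_pack_for_dp(&frame, header, header_size,
                                               data, data_size,
                                               EXAMPLE_DP_VERSION);
        return ret < 0 ? ret : 0;
}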
> 
> Signed-off-by: Markus Schneider-Pargmann 
> ---
>  drivers/video/hdmi.c | 87 +++-
>  include/linux/hdmi.h |  4 ++
>  2 files changed, 73 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/video/hdmi.c b/drivers/video/hdmi.c
> index 947be761dfa4..59c4341549e4 100644
> --- a/drivers/video/hdmi.c
> +++ b/drivers/video/hdmi.c
> @@ -387,6 +387,28 @@ int hdmi_audio_infoframe_check(struct 
> hdmi_audio_infoframe *frame)
>  }
>  EXPORT_SYMBOL(hdmi_audio_infoframe_check);
>  
> +static void
> +hdmi_audio_infoframe_pack_payload(const struct hdmi_audio_infoframe *frame,
> +   u8 *buffer)
> +{
> + u8 channels;
> +
> + if (frame->channels >= 2)
> + channels = frame->channels - 1;
> + else
> + channels = 0;
> +
> + buffer[0] = ((frame->coding_type & 0xf) << 4) | (channels & 0x7);
> + buffer[1] = ((frame->sample_frequency & 0x7) << 2) |
> +  (frame->sample_size & 0x3);
> + buffer[2] = frame->coding_type_ext & 0x1f;
> + buffer[3] = frame->channel_allocation;
> + buffer[4] = (frame->level_shift_value & 0xf) << 3;
> +
> + if (frame->downmix_inhibit)
> + buffer[4] |= BIT(7);
> +}
> +
>  /**
>   * hdmi_audio_infoframe_pack_only() - write HDMI audio infoframe to binary 
> buffer
>   * @frame: HDMI audio infoframe
> @@ -404,7 +426,6 @@ EXPORT_SYMBOL(hdmi_audio_infoframe_check);
>  ssize_t hdmi_audio_infoframe_pack_only(const struct hdmi_audio_infoframe 
> *frame,
>  void *buffer, size_t size)
>  {
> - unsigned char channels;
>   u8 *ptr = buffer;
>   size_t length;
>   int ret;
> @@ -420,28 +441,13 @@ ssize_t hdmi_audio_infoframe_pack_only(const struct 
> hdmi_audio_infoframe *frame,
>  
>   memset(buffer, 0, size);
>  
> - if (frame->channels >= 2)
> - channels = frame->channels - 1;
> - else
> - channels = 0;
> -
>   ptr[0] = frame->type;
>   ptr[1] = frame->version;
>   ptr[2] = frame->length;
>   ptr[3] = 0; /* checksum */
>  
> - /* start infoframe payload */
> - ptr += HDMI_INFOFRAME_HEADER_SIZE;
> -
> - ptr[0] = ((frame->coding_type & 0xf) << 4) | (channels & 0x7);
> - ptr[1] = ((frame->sample_frequency & 0x7) << 2) |
> -  (frame->sample_size & 0x3);
> - ptr[2] = frame->coding_type_ext & 0x1f;
> - ptr[3] = frame->channel_allocation;
> - ptr[4] = (frame->level_shift_value & 0xf) << 3;
> -
> - if (frame->downmix_inhibit)
> - ptr[4] |= BIT(7);
> + hdmi_audio_infoframe_pack_payload(frame,
> +   ptr + HDMI_INFOFRAME_HEADER_SIZE);
>  
>   hdmi_infoframe_set_checksum(buffer, length);
>  
> @@ -479,6 +485,51 @@ ssize_t hdmi_audio_infoframe_pack(struct 
> hdmi_audio_infoframe *frame,
>  }
>  EXPORT_SYMBOL(hdmi_audio_infoframe_pack);
>  
> +/**
> + * hdmi_audio_infoframe_pack_for_dp - Pack a HDMI Audio infoframe for
> + *displayport
> + *
> + * @frame HDMI Audio infoframe
> + * @header Header buffer to be used
> + * @header_size Size of header buffer
> + * @data Data buffer to be used
> + * @data_size Size of data buffer
> + * @dp_version Display Port version to be encoded in the header
> + *
> + * Packs a HDMI Audio Infoframe to be sent over Display Port. This function
> + * fills both header and data buffer with the required data.
> + *
> + * Return: Number of total written bytes or a negative errno on failure.
> + */
> +ssize_t hdmi_audio_infoframe_pack_for_dp(struct hdmi_audio_infoframe *frame,
> +  void *header, size_t header_size,
> +  void *data, size_t data_size,
> +  u8 dp_version)
> +{
> + int ret;
> + u8 *hdr_ptr = header;
> +
> + ret = hdmi_audio_infoframe_check(frame);
> + if (ret)
> + return ret;
> +
> + if (header_size < 4 || data_size < frame->length)
> + return -ENOSPC;
> +
> + memset(header, 0, header_size);
> + memset(data, 0, data_size);
> +
> + // Secondary-data packet header
> + hdr_ptr[1] = frame->type;
> + hdr_ptr[2] = 0x1B;  // As documented by DP spec for Secondary-data 
> Packets
Any constant we could use or define and use here?
Hard coding 0x1b is the less desirable option.

Sam

> + hdr_ptr[3] = (dp_version & 0x3f) << 2;
> +
> + hdmi_audio_infoframe_pack_payload(frame, data);
> +
> 

[PATCH] drm/msm/dsi: dsi_phy_14nm: Take ready-bit into account in poll_for_ready

2021-09-06 Thread Marijn Suijten
The downstream driver models this PLL lock check as an if-elseif-else.
The only way to reach the else case where pll_locked=true [1] is by
succeeding both readl_poll_timeout_atomic calls (which return zero on
success) in the if _and_ elseif condition. Hence both the "lock" and
"ready" bits need to be tested in the SM_READY_STATUS register before
considering the PLL locked and ready to go.

Tested on the Sony Xperia XA2 Ultra (nile-discovery, sdm630).

[1]: 
https://source.codeaurora.org/quic/la/kernel/msm-4.19/tree/drivers/clk/qcom/mdss/mdss-dsi-pll-14nm-util.c?h=LA.UM.9.2.1.r1-08000-sdm660.0#n302

Fixes: f079f6d999cb ("drm/msm/dsi: Add PHY/PLL for 8x96")
Signed-off-by: Marijn Suijten 
---
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c 
b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c
index 8905f365c932..789b08c24d25 100644
--- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c
+++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c
@@ -110,14 +110,13 @@ static struct dsi_pll_14nm *pll_14nm_list[DSI_MAX];
 static bool pll_14nm_poll_for_ready(struct dsi_pll_14nm *pll_14nm,
u32 nb_tries, u32 timeout_us)
 {
-   bool pll_locked = false;
+   bool pll_locked = false, pll_ready = false;
void __iomem *base = pll_14nm->phy->pll_base;
u32 tries, val;
 
tries = nb_tries;
while (tries--) {
-   val = dsi_phy_read(base +
-  REG_DSI_14nm_PHY_PLL_RESET_SM_READY_STATUS);
+   val = dsi_phy_read(base + 
REG_DSI_14nm_PHY_PLL_RESET_SM_READY_STATUS);
pll_locked = !!(val & BIT(5));
 
if (pll_locked)
@@ -126,23 +125,24 @@ static bool pll_14nm_poll_for_ready(struct dsi_pll_14nm 
*pll_14nm,
udelay(timeout_us);
}
 
-   if (!pll_locked) {
-   tries = nb_tries;
-   while (tries--) {
-   val = dsi_phy_read(base +
-   REG_DSI_14nm_PHY_PLL_RESET_SM_READY_STATUS);
-   pll_locked = !!(val & BIT(0));
+   if (!pll_locked)
+   goto out;
 
-   if (pll_locked)
-   break;
+   tries = nb_tries;
+   while (tries--) {
+   val = dsi_phy_read(base + 
REG_DSI_14nm_PHY_PLL_RESET_SM_READY_STATUS);
+   pll_ready = !!(val & BIT(0));
 
-   udelay(timeout_us);
-   }
+   if (pll_ready)
+   break;
+
+   udelay(timeout_us);
}
 
-   DBG("DSI PLL is %slocked", pll_locked ? "" : "*not* ");
+out:
+   DBG("DSI PLL is %slocked, %sready", pll_locked ? "" : "*not* ", 
pll_ready ? "" : "*not* ");
 
-   return pll_locked;
+   return pll_locked && pll_ready;
 }
 
 static void dsi_pll_14nm_config_init(struct dsi_pll_config *pconf)
-- 
2.33.0



[PATCH] drm/msm/dsi: Use division result from div_u64_rem in 7nm and 14nm PLL

2021-09-06 Thread Marijn Suijten
div_u64_rem provides the result of the division and additionally the
remainder; don't use this function to solely calculate the remainder
while calculating the division again with div_u64.

A similar improvement was applied earlier to the 10nm pll in
5c191fef4ce2 ("drm/msm/dsi_pll_10nm: Fix dividing the same numbers
twice").
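
For clarity, a minimal sketch of the before/after pattern (illustrative only,
not verbatim driver code):

#include <linux/math64.h>

static void example(u64 dec_multiple, u32 multiplier)
{
        u32 frac;
        u64 dec;

        /* before: the same numbers are divided twice */
        div_u64_rem(dec_multiple, multiplier, &frac);
        dec = div_u64(dec_multiple, multiplier);

        /* after: one call yields both quotient and remainder */
        dec = div_u64_rem(dec_multiple, multiplier, &frac);
        (void)dec;
}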

Signed-off-by: Marijn Suijten 
---
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c | 4 +---
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c  | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c 
b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c
index 3c1e2106d962..8905f365c932 100644
--- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c
+++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c
@@ -213,9 +213,7 @@ static void pll_14nm_dec_frac_calc(struct dsi_pll_14nm 
*pll, struct dsi_pll_conf
DBG("vco_clk_rate=%lld ref_clk_rate=%lld", vco_clk_rate, fref);
 
dec_start_multiple = div_u64(vco_clk_rate * multiplier, fref);
-   div_u64_rem(dec_start_multiple, multiplier, &div_frac_start);
-
-   dec_start = div_u64(dec_start_multiple, multiplier);
+   dec_start = div_u64_rem(dec_start_multiple, multiplier, 
&div_frac_start);
 
pconf->dec_start = (u32)dec_start;
pconf->div_frac_start = div_frac_start;
diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c 
b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c
index c77c30628cca..1a5abbd9fb76 100644
--- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c
+++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c
@@ -114,9 +114,7 @@ static void dsi_pll_calc_dec_frac(struct dsi_pll_7nm *pll, 
struct dsi_pll_config
 
multiplier = 1 << FRAC_BITS;
dec_multiple = div_u64(pll_freq * multiplier, divider);
-   div_u64_rem(dec_multiple, multiplier, &frac);
-
-   dec = div_u64(dec_multiple, multiplier);
+   dec = div_u64_rem(dec_multiple, multiplier, &frac);
 
if (!(pll->phy->cfg->quirks & DSI_PHY_7NM_QUIRK_V4_1))
config->pll_clock_inverters = 0x28;
-- 
2.33.0



Re: [PATCH v1 3/6] drm/edid: Add cea_sad helpers for freq/length

2021-09-06 Thread Sam Ravnborg
Hi Markus,

On Mon, Sep 06, 2021 at 09:35:26PM +0200, Markus Schneider-Pargmann wrote:
> This patch adds two helper functions that extract the frequency and word
> length from a struct cea_sad.
> 
> For these helper functions new defines are added that help translate the
> 'freq' and 'byte2' fields into real numbers.
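
A hypothetical usage sketch of the new helpers (not part of the patch),
assuming an EDID has already been fetched and that the word-length helper
is only called for uncompressed (LPCM) SADs:

#include <drm/drm_edid.h>
#include <linux/hdmi.h>
#include <linux/printk.h>
#include <linux/slab.h>

static void dump_sads(struct edid *edid)
{
        struct cea_sad *sads = NULL;
        int i, count;

        count = drm_edid_to_sad(edid, &sads);
        if (count < 0)
                return;

        for (i = 0; i < count; i++) {
                int rate = drm_cea_sad_get_sample_rate(&sads[i]);

                if (sads[i].format == HDMI_AUDIO_CODING_TYPE_PCM) {
                        int bits =
                                drm_cea_sad_get_uncompressed_word_length(&sads[i]);

                        pr_info("SAD %d: PCM, %d Hz, %d bit\n", i, rate, bits);
                } else {
                        pr_info("SAD %d: format %d, %d Hz\n",
                                i, sads[i].format, rate);
                }
        }
        kfree(sads);
}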
> 
> Signed-off-by: Markus Schneider-Pargmann 
> ---
>  drivers/gpu/drm/drm_edid.c | 57 ++
>  include/drm/drm_edid.h | 18 ++--
>  2 files changed, 73 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 81d5f2524246..2389d34ce10e 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -4666,6 +4666,63 @@ int drm_edid_to_speaker_allocation(struct edid *edid, 
> u8 **sadb)
>  }
>  EXPORT_SYMBOL(drm_edid_to_speaker_allocation);
>  
> +/**
> + * drm_cea_sad_get_sample_rate - Extract the sample rate from cea_sad
> + * @sad: Pointer to the cea_sad struct
> + *
> + * Extracts the cea_sad frequency field and returns the sample rate in Hz.
> + *
> + * Return: Sample rate in Hz or a negative errno if parsing failed.
> + */
> +int drm_cea_sad_get_sample_rate(struct cea_sad *sad)

It would be nice to use const struct cea_sad *sad here.

> +{
> + switch (sad->freq) {
> + case CEA_SAD_FREQ_32KHZ:
> + return 32000;
> + case CEA_SAD_FREQ_44KHZ:
> + return 44100;
> + case CEA_SAD_FREQ_48KHZ:
> + return 48000;
> + case CEA_SAD_FREQ_88KHZ:
> + return 88200;
> + case CEA_SAD_FREQ_96KHZ:
> + return 96000;
> + case CEA_SAD_FREQ_176KHZ:
> + return 176400;
> + case CEA_SAD_FREQ_192KHZ:
> + return 192000;
> + default:
> + return -EINVAL;
> + }
> +}
> +EXPORT_SYMBOL(drm_cea_sad_get_sample_rate);
> +
> +/**
> + * drm_cea_sad_get_uncompressed_word_length - Extract word length
> + * @sad: Pointer to the cea_sad struct
> + *
> + * Extracts the cea_sad byte2 field and returns the word length for an
> + * uncompressed stream.
> + *
> + * Note: This function may only be called for uncompressed audio.
Can you check for this and WARN (or drm_WARN) if this is not the case?

> + *
> + * Return: Word length in bits or a negative errno if parsing failed.
> + */
> +int drm_cea_sad_get_uncompressed_word_length(struct cea_sad *sad)
Again, consider using const.

Sam

> +{
> + switch (sad->byte2) {
> + case CEA_SAD_UNCOMPRESSED_WORD_16BIT:
> + return 16;
> + case CEA_SAD_UNCOMPRESSED_WORD_20BIT:
> + return 20;
> + case CEA_SAD_UNCOMPRESSED_WORD_24BIT:
> + return 24;
> + default:
> + return -EINVAL;
> + }
> +}
> +EXPORT_SYMBOL(drm_cea_sad_get_uncompressed_word_length);
> +
>  /**
>   * drm_av_sync_delay - compute the HDMI/DP sink audio-video sync delay
>   * @connector: connector associated with the HDMI/DP sink
> diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
> index 759328a5eeb2..bed091a749ef 100644
> --- a/include/drm/drm_edid.h
> +++ b/include/drm/drm_edid.h
> @@ -361,12 +361,24 @@ struct edid {
>  
>  /* Short Audio Descriptor */
>  struct cea_sad {
> - u8 format;
> + u8 format; /* See HDMI_AUDIO_CODING_TYPE_* */
>   u8 channels; /* max number of channels - 1 */
> - u8 freq;
> + u8 freq; /* See CEA_SAD_FREQ_* */
>   u8 byte2; /* meaning depends on format */
>  };
>  
> +#define CEA_SAD_FREQ_32KHZ  BIT(0)
> +#define CEA_SAD_FREQ_44KHZ  BIT(1)
> +#define CEA_SAD_FREQ_48KHZ  BIT(2)
> +#define CEA_SAD_FREQ_88KHZ  BIT(3)
> +#define CEA_SAD_FREQ_96KHZ  BIT(4)
> +#define CEA_SAD_FREQ_176KHZ BIT(5)
> +#define CEA_SAD_FREQ_192KHZ BIT(6)
> +
> +#define CEA_SAD_UNCOMPRESSED_WORD_16BIT BIT(0)
> +#define CEA_SAD_UNCOMPRESSED_WORD_20BIT BIT(1)
> +#define CEA_SAD_UNCOMPRESSED_WORD_24BIT BIT(2)
> +
>  struct drm_encoder;
>  struct drm_connector;
>  struct drm_connector_state;
> @@ -374,6 +386,8 @@ struct drm_display_mode;
>  
>  int drm_edid_to_sad(struct edid *edid, struct cea_sad **sads);
>  int drm_edid_to_speaker_allocation(struct edid *edid, u8 **sadb);
> +int drm_cea_sad_get_sample_rate(struct cea_sad *sad);
> +int drm_cea_sad_get_uncompressed_word_length(struct cea_sad *sad);
>  int drm_av_sync_delay(struct drm_connector *connector,
> const struct drm_display_mode *mode);
>  
> -- 
> 2.33.0


Re: [PATCH v1 1/6] dt-bindings: mediatek,dpi: Add mt8195 dpintf

2021-09-06 Thread Sam Ravnborg
Hi Markus,

On Mon, Sep 06, 2021 at 09:35:24PM +0200, Markus Schneider-Pargmann wrote:
> DP_INTF is similar to the actual dpi. They differ in some points
> regarding registers and what needs to be set but the function blocks
> itself are similar in design.
> 
> Signed-off-by: Markus Schneider-Pargmann 

I fail to see why they share the same dt-schema, as the main content in
the schema is the clocks and they differ.

A new mediatek,dpintf schema seems more appropriate.

I recall I thought so when reading the RFC variant but failed to comment on it.

Sam

> ---
>  .../display/mediatek/mediatek,dpi.yaml| 43 ---
>  1 file changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
> index dd2896a40ff0..1a158b719ce6 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
> @@ -4,7 +4,7 @@
>  $id: http://devicetree.org/schemas/display/mediatek/mediatek,dpi.yaml#
>  $schema: http://devicetree.org/meta-schemas/core.yaml#
>  
> -title: mediatek DPI Controller Device Tree Bindings
> +title: mediatek DPI/DP_INTF Controller Device Tree Bindings
>  
>  maintainers:
>- CK Hu 
> @@ -13,7 +13,8 @@ maintainers:
>  description: |
>The Mediatek DPI function block is a sink of the display subsystem and
>provides 8-bit RGB/YUV444 or 8/10/10-bit YUV422 pixel data on a parallel
> -  output bus.
> +  output bus. The Mediatek DP_INTF is a similar function block that is
> +  connected to the (embedded) display port function block.
>  
>  properties:
>compatible:
> @@ -23,6 +24,7 @@ properties:
>- mediatek,mt8173-dpi
>- mediatek,mt8183-dpi
>- mediatek,mt8192-dpi
> +  - mediatek,mt8195-dpintf
>  
>reg:
>  maxItems: 1
> @@ -37,10 +39,11 @@ properties:
>- description: DPI PLL
>  
>clock-names:
> -items:
> -  - const: pixel
> -  - const: engine
> -  - const: pll
> +description:
> +  For dpi clocks pixel, engine and pll are required. For dpintf pixel,
> +  hf_fmm and hf_fdp are required.
> +minItems: 3
> +maxItems: 3
>  
>pinctrl-0: true
>pinctrl-1: true
> @@ -64,6 +67,34 @@ required:
>- clock-names
>- port
>  
> +allOf:
> +  - if:
> +  properties:
> +compatible:
> +  contains:
> +enum:
> +  - mediatek,mt8195-dpintf
> +then:
> +  properties:
> +clocks:
> +  minItems: 3
> +  maxItems: 3
> +clock-names:
> +  items:
> +- const: pixel
> +- const: hf_fmm
> +- const: hf_fdp
> +else:
> +  properties:
> +clocks:
> +  minItems: 3
> +  maxItems: 3
> +clock-names:
> +  items:
> +- const: pixel
> +- const: engine
> +- const: pll
> +
>  additionalProperties: false
>  
>  examples:
> -- 
> 2.33.0


Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-09-06 Thread Amit Pundir
On Mon, 6 Sept 2021 at 21:54, Rob Clark  wrote:
>
> On Mon, Sep 6, 2021 at 1:02 AM Amit Pundir  wrote:
> >
> > On Sat, 4 Sept 2021 at 01:55, Rob Clark  wrote:
> > >
> > > On Fri, Sep 3, 2021 at 12:39 PM John Stultz  
> > > wrote:
> > > >
> > > > On Thu, Jul 29, 2021 at 1:49 PM Rob Clark  wrote:
> > > > > On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> > > > >  wrote:
> > > > > > On 29/07/2021 21:24, Rob Clark wrote:
> > > > > > > On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> > > > > > >  wrote:
> > > > > > >>
> > > > > > >> Hi Rob,
> > > > > > >>
> > > > > > >> I've done some more testing! It looks like before that patch 
> > > > > > >> ("drm/msm: Devfreq tuning") the GPU would never get above
> > > > > > >> the second frequency in the OPP table (342MHz) (at least, not in 
> > > > > > >> glxgears). With the patch applied it would more
> > > > > > >> aggressively jump up to the max frequency which seems to be 
> > > > > > >> unstable at the default regulator voltages.
> > > > > > >
> > > > > > > *ohh*, yeah, ok, that would explain it
> > > > > > >
> > > > > > >> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 
> > > > > > >> 0.988v (instead of the stock 0.516v) makes the GPU stable
> > > > > > >> at the higher frequencies.
> > > > > > >>
> > > > > > >> Applying this patch reverts the behaviour, and the GPU never 
> > > > > > >> goes above 342MHz in glxgears, losing ~30% performance in
> > > > > > >> glxgear.
> > > > > > >>
> > > > > > >> I think (?) that enabling CPR support would be the proper 
> > > > > > >> solution to this - that would ensure that the regulators run
> > > > > > >> at the voltage the hardware needs to be stable.
> > > > > > >>
> > > > > > >> Is hacking the voltage higher (although ideally not quite that 
> > > > > > >> high) an acceptable short term solution until we have
> > > > > > >> CPR? Or would it be safer to just not make use of the higher 
> > > > > > >> frequencies on a630 for now?
> > > > > > >>
> > > > > > >
> > > > > > > tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is 
> > > > > > > already
> > > > > > > on CC and I added sboyd, maybe one of them knows better.
> > > > > > >
> > > > > > > In the short term, removing the higher problematic OPPs from dts 
> > > > > > > might
> > > > > > > be a better option than this patch (which I'm dropping), since 
> > > > > > > there
> > > > > > > is nothing stopping other workloads from hitting higher OPPs.
> > > > > > Oh yeah that sounds like a more sensible workaround than mine.
> > > > > > >
> > > > > > > I'm slightly curious why I didn't have problems at higher OPPs on 
> > > > > > > my
> > > > > > > c630 laptop (sdm850)
> > > > > > Perhaps you won the silicon lottery - iirc sdm850 is binned for 
> > > > > > higher clocks as is out of the factory.
> > > > > >
> > > > > > Would it be best to drop the OPPs for all devices? Or just those 
> > > > > > affected? I guess it's possible another c630 might
> > > > > > crash where yours doesn't?
> > > > >
> > > > > I've not heard any reports of similar issues from the handful of other
> > > > > folks with c630's on #aarch64-laptops.. but I can't really say if that
> > > > > is luck or not.
> > > > >
> > > > > Maybe just remove it for affected devices?  But I'll defer to Bjorn.
> > > >
> > > > Just as another datapoint, I was just marveling at how suddenly smooth
> > > > the UI was performing on db845c and Caleb pointed me at the "drm/msm:
> > > > Devfreq tuning" patch as the likely cause of the improvement, and
> > > > mid-discussion my board crashed into USB crash mode:
> > > > [  146.157696][C0] adreno 500.gpu: CP | AHB bus error
> > > > [  146.163303][C0] adreno 500.gpu: CP | AHB bus error
> > > > [  146.168837][C0] adreno 500.gpu: RBBM | ATB bus overflow
> > > > [  146.174960][C0] adreno 500.gpu: CP | HW fault | 
> > > > status=0x
> > > > [  146.181917][C0] adreno 500.gpu: CP | AHB bus error
> > > > [  146.187547][C0] adreno 500.gpu: CP illegal instruction error
> > > > [  146.194009][C0] adreno 500.gpu: CP | AHB bus error
> > > > [  146.308909][T9] Internal error: synchronous external abort:
> > > > 9610 [#1] PREEMPT SMP
> > > > [  146.317150][T9] Modules linked in:
> > > > [  146.320941][T9] CPU: 3 PID: 9 Comm: kworker/u16:1 Tainted: G
> > > > W 5.14.0-mainline-06795-g42b258c2275c #24
> > > > [  146.331974][T9] Hardware name: Thundercomm Dragonboar
> > > > Format: Log Type - Time(microsec) - Message - Optional Info
> > > > Log Type: B - Since Boot(Power On Reset),  D - Delta,  S - Statistic
> > > > S - QC_IMAGE_VERSION_STRING=BOOT.XF.2.0-00371-SDM845LZB-1
> > > > S - IMAGE_VARIANT_STRING=SDM845LA
> > > > S - OEM_IMAGE_VERSION_STRING=TSBJ-FA-PC-02170
> > > >
> > > > So Caleb sent me to this thread. :)
> > > >
> > > > I'm still trying to trip it again, but it does seem like db845c is
> > > > also seeing some stability issues with Linus' HEAD.
> > > >
> > >
> > > 

[PATCH v1 6/6] drm/mediatek: Add mt8195 DisplayPort driver

2021-09-06 Thread Markus Schneider-Pargmann
This patch adds a DisplayPort driver for the Mediatek mt8195 SoC.

It supports both functional units on the mt8195, the embedded
DisplayPort as well as the external DisplayPort units. It offers
hot-plug-detection, audio up to 8 channels, and DisplayPort 1.4 with up
to 4 lanes.

This driver is based on an initial version by
Jason-JH.Lin .

Signed-off-by: Markus Schneider-Pargmann 
---

Notes:
Changes RFC -> v1:
- Removed unused register definitions.
- Replaced workqueue with threaded irq.
- Removed connector code.
- Move to atomic_* drm functions.
- General cleanups of the code.
- Remove unused select GENERIC_PHY.

 drivers/gpu/drm/mediatek/Kconfig  |6 +
 drivers/gpu/drm/mediatek/Makefile |2 +
 drivers/gpu/drm/mediatek/mtk_dp.c | 2881 +
 drivers/gpu/drm/mediatek/mtk_dp_reg.h |  580 +
 4 files changed, 3469 insertions(+)
 create mode 100644 drivers/gpu/drm/mediatek/mtk_dp.c
 create mode 100644 drivers/gpu/drm/mediatek/mtk_dp_reg.h

diff --git a/drivers/gpu/drm/mediatek/Kconfig b/drivers/gpu/drm/mediatek/Kconfig
index 2976d21e9a34..6d6a5b5872f2 100644
--- a/drivers/gpu/drm/mediatek/Kconfig
+++ b/drivers/gpu/drm/mediatek/Kconfig
@@ -28,3 +28,9 @@ config DRM_MEDIATEK_HDMI
select PHY_MTK_HDMI
help
  DRM/KMS HDMI driver for Mediatek SoCs
+
+config MTK_DPTX_SUPPORT
+   tristate "DRM DPTX Support for Mediatek SoCs"
+   depends on DRM_MEDIATEK
+   help
+ DRM/KMS Display Port driver for Mediatek SoCs.
diff --git a/drivers/gpu/drm/mediatek/Makefile 
b/drivers/gpu/drm/mediatek/Makefile
index 3abd27d7c91d..ba6e2228bbf8 100644
--- a/drivers/gpu/drm/mediatek/Makefile
+++ b/drivers/gpu/drm/mediatek/Makefile
@@ -25,3 +25,5 @@ mediatek-drm-hdmi-objs := mtk_cec.o \
  mtk_hdmi_ddc.o
 
 obj-$(CONFIG_DRM_MEDIATEK_HDMI) += mediatek-drm-hdmi.o
+
+obj-$(CONFIG_MTK_DPTX_SUPPORT) += mtk_dp.o
diff --git a/drivers/gpu/drm/mediatek/mtk_dp.c 
b/drivers/gpu/drm/mediatek/mtk_dp.c
new file mode 100644
index ..1bd07c8d2f69
--- /dev/null
+++ b/drivers/gpu/drm/mediatek/mtk_dp.c
@@ -0,0 +1,2881 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 MediaTek Inc.
+ * Copyright (c) 2021 BayLibre
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mtk_dp_reg.h"
+
+#define MTK_DP_AUX_WAIT_REPLY_COUNT2
+#define MTK_DP_CHECK_SINK_CAP_TIMEOUT_COUNT3
+
+#define MTK_DP_MAX_LANES   4
+#define MTK_DP_MAX_LINK_RATE   MTK_DP_LINKRATE_HBR3
+
+#define MTK_DP_TBC_BUF_READ_START_ADDR 0x08
+
+#define MTK_DP_TRAIN_RETRY_LIMIT   8
+#define MTK_DP_TRAIN_MAX_ITERATIONS5
+
+#define MTK_DP_AUX_WRITE_READ_WAIT_TIME_US 20
+
+#define MTK_DP_DP_VERSION_11   0x11
+
+enum mtk_dp_state {
+   MTK_DP_STATE_INITIAL,
+   MTK_DP_STATE_IDLE,
+   MTK_DP_STATE_PREPARE,
+   MTK_DP_STATE_NORMAL,
+};
+
+enum mtk_dp_train_state {
+   MTK_DP_TRAIN_STATE_STARTUP = 0,
+   MTK_DP_TRAIN_STATE_CHECKCAP,
+   MTK_DP_TRAIN_STATE_CHECKEDID,
+   MTK_DP_TRAIN_STATE_TRAINING_PRE,
+   MTK_DP_TRAIN_STATE_TRAINING,
+   MTK_DP_TRAIN_STATE_CHECKTIMING,
+   MTK_DP_TRAIN_STATE_NORMAL,
+   MTK_DP_TRAIN_STATE_POWERSAVE,
+   MTK_DP_TRAIN_STATE_DPIDLE,
+};
+
+struct mtk_dp_timings {
+   struct videomode vm;
+
+   u16 htotal;
+   u16 vtotal;
+   u8 frame_rate;
+   u32 pix_rate_khz;
+};
+
+struct mtk_dp_train_info {
+   bool tps3;
+   bool tps4;
+   bool sink_ssc;
+   bool cable_plugged_in;
+   bool cable_state_change;
+   bool cr_done;
+   bool eq_done;
+
+   // link_rate is in multiple of 0.27Gbps
+   int link_rate;
+   int lane_count;
+
+   int irq_status;
+   int check_cap_count;
+};
+
+// Same values as used by the DP Spec for MISC0 bits 1 and 2
+enum mtk_dp_color_format {
+   MTK_DP_COLOR_FORMAT_RGB_444 = 0,
+   MTK_DP_COLOR_FORMAT_YUV_422 = 1,
+   MTK_DP_COLOR_FORMAT_YUV_444 = 2,
+   MTK_DP_COLOR_FORMAT_YUV_420 = 3,
+   MTK_DP_COLOR_FORMAT_YONLY   = 4,
+   MTK_DP_COLOR_FORMAT_RAW = 5,
+   MTK_DP_COLOR_FORMAT_RESERVED= 6,
+   MTK_DP_COLOR_FORMAT_DEFAULT = MTK_DP_COLOR_FORMAT_RGB_444,
+   MTK_DP_COLOR_FORMAT_UNKNOWN = 15,
+};
+
+// Multiple of 0.27Gbps
+enum mtk_dp_linkrate {
+   MTK_DP_LINKRATE_RBR =  0x6,
+   MTK_DP_LINKRATE_HBR =  0xA,
+   MTK_DP_LINKRATE_HBR2= 0x14,
+   MTK_DP_LINKRATE_HBR25   = 0x19,
+   MTK_DP_LINKRATE_HBR3= 0x1E,
+};
+
+// Same values as used for DP Spec MISC

[PATCH v1 2/6] dt-bindings: mediatek,dp: Add Display Port binding

2021-09-06 Thread Markus Schneider-Pargmann
This controller is present on different mediatek hardware. Currently
mt8195 and mt8395 have this controller without a functional difference,
so only one compatible is added.

The controller can be in two forms, for a normal display port and for
embedded display port.

Signed-off-by: Markus Schneider-Pargmann 
---

Notes:
I added the mediatek maintainers in the list of maintainers as I wasn't sure
whom I should put there. Please let me know if I am supposed to add my mail
there.

 .../display/mediatek/mediatek,dp.yaml | 89 +++
 1 file changed, 89 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/display/mediatek/mediatek,dp.yaml

diff --git 
a/Documentation/devicetree/bindings/display/mediatek/mediatek,dp.yaml 
b/Documentation/devicetree/bindings/display/mediatek/mediatek,dp.yaml
new file mode 100644
index ..f7a35962c23b
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dp.yaml
@@ -0,0 +1,89 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/mediatek/mediatek,dp.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Mediatek Display Port Controller
+
+maintainers:
+  - CK Hu 
+  - Jitao shi 
+
+description: |
+  Device tree bindings for the Mediatek (embedded) Display Port controller
+  present on some Mediatek SoCs.
+
+properties:
+  compatible:
+enum:
+  - mediatek,mt8195-edp_tx
+  - mediatek,mt8195-dp_tx
+
+  reg:
+maxItems: 1
+
+  interrupts:
+maxItems: 1
+
+  clocks:
+items:
+  - description: faxi clock
+
+  clock-names:
+items:
+  - const: faxi
+
+  power-domains:
+maxItems: 1
+
+  ports:
+$ref: /schemas/graph.yaml#/properties/ports
+properties:
+  port@0:
+$ref: /schemas/graph.yaml#/properties/port
+description: Input endpoint of the controller, usually dp_intf
+
+  port@1:
+$ref: /schemas/graph.yaml#/properties/port
+description: Output endpoint of the controller
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - ports
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+dp_tx: edp_tx@1c50 {
+compatible = "mediatek,mt8195-edp_tx";
+reg = <0 0x1c50 0 0x8000>;
+interrupts = ;
+power-domains = <&spm MT8195_POWER_DOMAIN_EPD_TX>;
+pinctrl-names = "default";
+pinctrl-0 = <&edp_pin>;
+status = "okay";
+
+ports {
+#address-cells = <1>;
+#size-cells = <0>;
+
+port@0 {
+reg = <0>;
+edp_in: endpoint {
+remote-endpoint = <&dp_intf0_out>;
+};
+};
+port@1 {
+reg = <1>;
+edp_out: endpoint {
+   remote-endpoint = <&panel_in>;
+};
+};
+};
+};
-- 
2.33.0



[PATCH v1 1/6] dt-bindings: mediatek,dpi: Add mt8195 dpintf

2021-09-06 Thread Markus Schneider-Pargmann
DP_INTF is similar to the actual dpi. They differ in some points
regarding registers and what needs to be set but the function blocks
itself are similar in design.

Signed-off-by: Markus Schneider-Pargmann 
---
 .../display/mediatek/mediatek,dpi.yaml| 43 ---
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml 
b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
index dd2896a40ff0..1a158b719ce6 100644
--- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
@@ -4,7 +4,7 @@
 $id: http://devicetree.org/schemas/display/mediatek/mediatek,dpi.yaml#
 $schema: http://devicetree.org/meta-schemas/core.yaml#
 
-title: mediatek DPI Controller Device Tree Bindings
+title: mediatek DPI/DP_INTF Controller Device Tree Bindings
 
 maintainers:
   - CK Hu 
@@ -13,7 +13,8 @@ maintainers:
 description: |
   The Mediatek DPI function block is a sink of the display subsystem and
   provides 8-bit RGB/YUV444 or 8/10/10-bit YUV422 pixel data on a parallel
-  output bus.
+  output bus. The Mediatek DP_INTF is a similar function block that is
+  connected to the (embedded) display port function block.
 
 properties:
   compatible:
@@ -23,6 +24,7 @@ properties:
   - mediatek,mt8173-dpi
   - mediatek,mt8183-dpi
   - mediatek,mt8192-dpi
+  - mediatek,mt8195-dpintf
 
   reg:
 maxItems: 1
@@ -37,10 +39,11 @@ properties:
   - description: DPI PLL
 
   clock-names:
-items:
-  - const: pixel
-  - const: engine
-  - const: pll
+description:
+  For dpi clocks pixel, engine and pll are required. For dpintf pixel,
+  hf_fmm and hf_fdp are required.
+minItems: 3
+maxItems: 3
 
   pinctrl-0: true
   pinctrl-1: true
@@ -64,6 +67,34 @@ required:
   - clock-names
   - port
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - mediatek,mt8195-dpintf
+then:
+  properties:
+clocks:
+  minItems: 3
+  maxItems: 3
+clock-names:
+  items:
+- const: pixel
+- const: hf_fmm
+- const: hf_fdp
+else:
+  properties:
+clocks:
+  minItems: 3
+  maxItems: 3
+clock-names:
+  items:
+- const: pixel
+- const: engine
+- const: pll
+
 additionalProperties: false
 
 examples:
-- 
2.33.0



[PATCH v1 5/6] drm/mediatek: dpi: Add dpintf support

2021-09-06 Thread Markus Schneider-Pargmann
dpintf is the displayport interface hardware unit. This unit is similar
to dpi and can reuse most of the code.

This patch adds support for mt8195-dpintf to this dpi driver. Main
differences are:
 - Some features/functional components are not available for dpintf
   which are now excluded from code execution once is_dpintf is set
 - dpintf can and needs to choose between different clockdividers based
   on the clockspeed. This is done by choosing a different clock parent.
 - There are two additional clocks that need to be managed. These are
   only set for dpintf and will be set to NULL if not supplied. The
   clk_* calls handle these as normal clocks then (see the sketch after
   this list).
 - Some register contents differ slightly between the two components. To
   work around this I added register bits/masks with a DPINTF_ prefix
   and use them where different.

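As a rough illustration only (not necessarily how this patch implements it),
the two extra clocks could be requested as optional, so that the clk_* calls
simply see NULL on plain dpi hardware; "dev" and the field names follow the
patch, the rest is a sketch:

        dpi->hf_fmm_clk = devm_clk_get_optional(dev, "hf_fmm");
        if (IS_ERR(dpi->hf_fmm_clk))
                return PTR_ERR(dpi->hf_fmm_clk);

        dpi->hf_fdp_clk = devm_clk_get_optional(dev, "hf_fdp");
        if (IS_ERR(dpi->hf_fdp_clk))
                return PTR_ERR(dpi->hf_fdp_clk);
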
Based on a separate driver for dpintf created by
Jason-JH.Lin .

Signed-off-by: Markus Schneider-Pargmann 
---

Notes:
Changes RFC -> v1:
- Remove setting parents and fully rely on the clock tree instead which 
already
  models a mux at the important place.
- Integrated mtk_dpi dpintf changes into the mediatek drm driver.

 drivers/gpu/drm/mediatek/mtk_dpi.c  | 248 
 drivers/gpu/drm/mediatek/mtk_dpi_regs.h |  12 +
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c |   4 +
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h |   1 +
 drivers/gpu/drm/mediatek/mtk_drm_drv.c  |   3 +
 5 files changed, 217 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_dpi.c 
b/drivers/gpu/drm/mediatek/mtk_dpi.c
index e94738fe4db8..986c7f72483f 100644
--- a/drivers/gpu/drm/mediatek/mtk_dpi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dpi.c
@@ -71,6 +71,8 @@ struct mtk_dpi {
void __iomem *regs;
struct device *dev;
struct clk *engine_clk;
+   struct clk *hf_fmm_clk;
+   struct clk *hf_fdp_clk;
struct clk *pixel_clk;
struct clk *tvd_clk;
int irq;
@@ -125,6 +127,7 @@ struct mtk_dpi_conf {
bool edge_sel_en;
const u32 *output_fmts;
u32 num_output_fmts;
+   bool is_dpintf;
 };
 
 static void mtk_dpi_mask(struct mtk_dpi *dpi, u32 offset, u32 val, u32 mask)
@@ -153,30 +156,52 @@ static void mtk_dpi_disable(struct mtk_dpi *dpi)
 static void mtk_dpi_config_hsync(struct mtk_dpi *dpi,
 struct mtk_dpi_sync_param *sync)
 {
-   mtk_dpi_mask(dpi, DPI_TGEN_HWIDTH,
-sync->sync_width << HPW, HPW_MASK);
-   mtk_dpi_mask(dpi, DPI_TGEN_HPORCH,
-sync->back_porch << HBP, HBP_MASK);
-   mtk_dpi_mask(dpi, DPI_TGEN_HPORCH, sync->front_porch << HFP,
-HFP_MASK);
+   if (dpi->conf->is_dpintf) {
+   mtk_dpi_mask(dpi, DPI_TGEN_HWIDTH,
+sync->sync_width << HPW, DPINTF_HPW_MASK);
+   mtk_dpi_mask(dpi, DPI_TGEN_HPORCH,
+sync->back_porch << HBP, DPINTF_HBP_MASK);
+   mtk_dpi_mask(dpi, DPI_TGEN_HPORCH, sync->front_porch << HFP,
+DPINTF_HFP_MASK);
+   } else {
+   mtk_dpi_mask(dpi, DPI_TGEN_HWIDTH,
+sync->sync_width << HPW, HPW_MASK);
+   mtk_dpi_mask(dpi, DPI_TGEN_HPORCH,
+sync->back_porch << HBP, HBP_MASK);
+   mtk_dpi_mask(dpi, DPI_TGEN_HPORCH, sync->front_porch << HFP,
+HFP_MASK);
+   }
 }
 
 static void mtk_dpi_config_vsync(struct mtk_dpi *dpi,
 struct mtk_dpi_sync_param *sync,
 u32 width_addr, u32 porch_addr)
 {
-   mtk_dpi_mask(dpi, width_addr,
-sync->sync_width << VSYNC_WIDTH_SHIFT,
-VSYNC_WIDTH_MASK);
mtk_dpi_mask(dpi, width_addr,
 sync->shift_half_line << VSYNC_HALF_LINE_SHIFT,
 VSYNC_HALF_LINE_MASK);
-   mtk_dpi_mask(dpi, porch_addr,
-sync->back_porch << VSYNC_BACK_PORCH_SHIFT,
-VSYNC_BACK_PORCH_MASK);
-   mtk_dpi_mask(dpi, porch_addr,
-sync->front_porch << VSYNC_FRONT_PORCH_SHIFT,
-VSYNC_FRONT_PORCH_MASK);
+
+   if (dpi->conf->is_dpintf) {
+   mtk_dpi_mask(dpi, width_addr,
+sync->sync_width << VSYNC_WIDTH_SHIFT,
+DPINTF_VSYNC_WIDTH_MASK);
+   mtk_dpi_mask(dpi, porch_addr,
+sync->back_porch << VSYNC_BACK_PORCH_SHIFT,
+DPINTF_VSYNC_BACK_PORCH_MASK);
+   mtk_dpi_mask(dpi, porch_addr,
+sync->front_porch << VSYNC_FRONT_PORCH_SHIFT,
+DPINTF_VSYNC_FRONT_PORCH_MASK);
+   } else {
+   mtk_dpi_mask(dpi, width_addr,
+sync->sync_width << VSYNC_WIDTH_SHIFT,
+

[PATCH v1 3/6] drm/edid: Add cea_sad helpers for freq/length

2021-09-06 Thread Markus Schneider-Pargmann
This patch adds two helper functions that extract the frequency and word
length from a struct cea_sad.

For these helper functions new defines are added that help translate the
'freq' and 'byte2' fields into real numbers.

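For illustration, a hypothetical caller would combine these with
drm_edid_to_sad() roughly as follows (names and error handling are only a
sketch):

        struct cea_sad *sads;
        int i, count;

        count = drm_edid_to_sad(edid, &sads);
        for (i = 0; i < count; i++) {
                int rate = drm_cea_sad_get_sample_rate(&sads[i]);
                int bits = drm_cea_sad_get_uncompressed_word_length(&sads[i]);

                if (rate > 0 && bits > 0)
                        pr_debug("SAD %d: %d Hz, %d bit\n", i, rate, bits);
        }
        if (count > 0)
                kfree(sads);
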
Signed-off-by: Markus Schneider-Pargmann 
---
 drivers/gpu/drm/drm_edid.c | 57 ++
 include/drm/drm_edid.h | 18 ++--
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 81d5f2524246..2389d34ce10e 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -4666,6 +4666,63 @@ int drm_edid_to_speaker_allocation(struct edid *edid, u8 
**sadb)
 }
 EXPORT_SYMBOL(drm_edid_to_speaker_allocation);
 
+/**
+ * drm_cea_sad_get_sample_rate - Extract the sample rate from cea_sad
+ * @sad: Pointer to the cea_sad struct
+ *
+ * Extracts the cea_sad frequency field and returns the sample rate in Hz.
+ *
+ * Return: Sample rate in Hz or a negative errno if parsing failed.
+ */
+int drm_cea_sad_get_sample_rate(struct cea_sad *sad)
+{
+   switch (sad->freq) {
+   case CEA_SAD_FREQ_32KHZ:
+   return 32000;
+   case CEA_SAD_FREQ_44KHZ:
+   return 44100;
+   case CEA_SAD_FREQ_48KHZ:
+   return 48000;
+   case CEA_SAD_FREQ_88KHZ:
+   return 88200;
+   case CEA_SAD_FREQ_96KHZ:
+   return 96000;
+   case CEA_SAD_FREQ_176KHZ:
+   return 176400;
+   case CEA_SAD_FREQ_192KHZ:
+   return 192000;
+   default:
+   return -EINVAL;
+   }
+}
+EXPORT_SYMBOL(drm_cea_sad_get_sample_rate);
+
+/**
+ * drm_cea_sad_get_uncompressed_word_length - Extract word length
+ * @sad: Pointer to the cea_sad struct
+ *
+ * Extracts the cea_sad byte2 field and returns the word length for an
+ * uncompressed stream.
+ *
+ * Note: This function may only be called for uncompressed audio.
+ *
+ * Return: Word length in bits or a negative errno if parsing failed.
+ */
+int drm_cea_sad_get_uncompressed_word_length(struct cea_sad *sad)
+{
+   switch (sad->byte2) {
+   case CEA_SAD_UNCOMPRESSED_WORD_16BIT:
+   return 16;
+   case CEA_SAD_UNCOMPRESSED_WORD_20BIT:
+   return 20;
+   case CEA_SAD_UNCOMPRESSED_WORD_24BIT:
+   return 24;
+   default:
+   return -EINVAL;
+   }
+}
+EXPORT_SYMBOL(drm_cea_sad_get_uncompressed_word_length);
+
 /**
  * drm_av_sync_delay - compute the HDMI/DP sink audio-video sync delay
  * @connector: connector associated with the HDMI/DP sink
diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
index 759328a5eeb2..bed091a749ef 100644
--- a/include/drm/drm_edid.h
+++ b/include/drm/drm_edid.h
@@ -361,12 +361,24 @@ struct edid {
 
 /* Short Audio Descriptor */
 struct cea_sad {
-   u8 format;
+   u8 format; /* See HDMI_AUDIO_CODING_TYPE_* */
u8 channels; /* max number of channels - 1 */
-   u8 freq;
+   u8 freq; /* See CEA_SAD_FREQ_* */
u8 byte2; /* meaning depends on format */
 };
 
+#define CEA_SAD_FREQ_32KHZ  BIT(0)
+#define CEA_SAD_FREQ_44KHZ  BIT(1)
+#define CEA_SAD_FREQ_48KHZ  BIT(2)
+#define CEA_SAD_FREQ_88KHZ  BIT(3)
+#define CEA_SAD_FREQ_96KHZ  BIT(4)
+#define CEA_SAD_FREQ_176KHZ BIT(5)
+#define CEA_SAD_FREQ_192KHZ BIT(6)
+
+#define CEA_SAD_UNCOMPRESSED_WORD_16BIT BIT(0)
+#define CEA_SAD_UNCOMPRESSED_WORD_20BIT BIT(1)
+#define CEA_SAD_UNCOMPRESSED_WORD_24BIT BIT(2)
+
 struct drm_encoder;
 struct drm_connector;
 struct drm_connector_state;
@@ -374,6 +386,8 @@ struct drm_display_mode;
 
 int drm_edid_to_sad(struct edid *edid, struct cea_sad **sads);
 int drm_edid_to_speaker_allocation(struct edid *edid, u8 **sadb);
+int drm_cea_sad_get_sample_rate(struct cea_sad *sad);
+int drm_cea_sad_get_uncompressed_word_length(struct cea_sad *sad);
 int drm_av_sync_delay(struct drm_connector *connector,
  const struct drm_display_mode *mode);
 
-- 
2.33.0



[PATCH v1 4/6] video/hdmi: Add audio_infoframe packing for DP

2021-09-06 Thread Markus Schneider-Pargmann
Similar to HDMI, DP uses audio infoframes as well, which are structured
very similarly to the HDMI ones.

This patch adds a helper function to pack the HDMI audio infoframe for
DP, called hdmi_audio_infoframe_pack_for_dp().
hdmi_audio_infoframe_pack_only() is split into two parts. One of them
packs the payload only and can be used for HDMI and DP.

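For illustration, a DP encoder driver could use the new helper roughly like
this (buffer sizes and the 0x11 DP version value are only examples, not taken
from this patch):

        struct hdmi_audio_infoframe frame;
        u8 header[4], payload[32];
        ssize_t len;

        hdmi_audio_infoframe_init(&frame);
        frame.coding_type = HDMI_AUDIO_CODING_TYPE_PCM;
        frame.channels = 2;

        len = hdmi_audio_infoframe_pack_for_dp(&frame, header, sizeof(header),
                                               payload, sizeof(payload), 0x11);
        if (len < 0)
                return len;
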
Signed-off-by: Markus Schneider-Pargmann 
---
 drivers/video/hdmi.c | 87 +++-
 include/linux/hdmi.h |  4 ++
 2 files changed, 73 insertions(+), 18 deletions(-)

diff --git a/drivers/video/hdmi.c b/drivers/video/hdmi.c
index 947be761dfa4..59c4341549e4 100644
--- a/drivers/video/hdmi.c
+++ b/drivers/video/hdmi.c
@@ -387,6 +387,28 @@ int hdmi_audio_infoframe_check(struct hdmi_audio_infoframe 
*frame)
 }
 EXPORT_SYMBOL(hdmi_audio_infoframe_check);
 
+static void
+hdmi_audio_infoframe_pack_payload(const struct hdmi_audio_infoframe *frame,
+ u8 *buffer)
+{
+   u8 channels;
+
+   if (frame->channels >= 2)
+   channels = frame->channels - 1;
+   else
+   channels = 0;
+
+   buffer[0] = ((frame->coding_type & 0xf) << 4) | (channels & 0x7);
+   buffer[1] = ((frame->sample_frequency & 0x7) << 2) |
+(frame->sample_size & 0x3);
+   buffer[2] = frame->coding_type_ext & 0x1f;
+   buffer[3] = frame->channel_allocation;
+   buffer[4] = (frame->level_shift_value & 0xf) << 3;
+
+   if (frame->downmix_inhibit)
+   buffer[4] |= BIT(7);
+}
+
 /**
  * hdmi_audio_infoframe_pack_only() - write HDMI audio infoframe to binary 
buffer
  * @frame: HDMI audio infoframe
@@ -404,7 +426,6 @@ EXPORT_SYMBOL(hdmi_audio_infoframe_check);
 ssize_t hdmi_audio_infoframe_pack_only(const struct hdmi_audio_infoframe 
*frame,
   void *buffer, size_t size)
 {
-   unsigned char channels;
u8 *ptr = buffer;
size_t length;
int ret;
@@ -420,28 +441,13 @@ ssize_t hdmi_audio_infoframe_pack_only(const struct 
hdmi_audio_infoframe *frame,
 
memset(buffer, 0, size);
 
-   if (frame->channels >= 2)
-   channels = frame->channels - 1;
-   else
-   channels = 0;
-
ptr[0] = frame->type;
ptr[1] = frame->version;
ptr[2] = frame->length;
ptr[3] = 0; /* checksum */
 
-   /* start infoframe payload */
-   ptr += HDMI_INFOFRAME_HEADER_SIZE;
-
-   ptr[0] = ((frame->coding_type & 0xf) << 4) | (channels & 0x7);
-   ptr[1] = ((frame->sample_frequency & 0x7) << 2) |
-(frame->sample_size & 0x3);
-   ptr[2] = frame->coding_type_ext & 0x1f;
-   ptr[3] = frame->channel_allocation;
-   ptr[4] = (frame->level_shift_value & 0xf) << 3;
-
-   if (frame->downmix_inhibit)
-   ptr[4] |= BIT(7);
+   hdmi_audio_infoframe_pack_payload(frame,
+ ptr + HDMI_INFOFRAME_HEADER_SIZE);
 
hdmi_infoframe_set_checksum(buffer, length);
 
@@ -479,6 +485,51 @@ ssize_t hdmi_audio_infoframe_pack(struct 
hdmi_audio_infoframe *frame,
 }
 EXPORT_SYMBOL(hdmi_audio_infoframe_pack);
 
+/**
+ * hdmi_audio_infoframe_pack_for_dp - Pack a HDMI Audio infoframe for
+ *displayport
+ *
+ * @frame HDMI Audio infoframe
+ * @header Header buffer to be used
+ * @header_size Size of header buffer
+ * @data Data buffer to be used
+ * @data_size Size of data buffer
+ * @dp_version Display Port version to be encoded in the header
+ *
+ * Packs a HDMI Audio Infoframe to be sent over Display Port. This function
+ * fills both header and data buffer with the required data.
+ *
+ * Return: Number of total written bytes or a negative errno on failure.
+ */
+ssize_t hdmi_audio_infoframe_pack_for_dp(struct hdmi_audio_infoframe *frame,
+void *header, size_t header_size,
+void *data, size_t data_size,
+u8 dp_version)
+{
+   int ret;
+   u8 *hdr_ptr = header;
+
+   ret = hdmi_audio_infoframe_check(frame);
+   if (ret)
+   return ret;
+
+   if (header_size < 4 || data_size < frame->length)
+   return -ENOSPC;
+
+   memset(header, 0, header_size);
+   memset(data, 0, data_size);
+
+   // Secondary-data packet header
+   hdr_ptr[1] = frame->type;
+   hdr_ptr[2] = 0x1B;  // As documented by DP spec for Secondary-data 
Packets
+   hdr_ptr[3] = (dp_version & 0x3f) << 2;
+
+   hdmi_audio_infoframe_pack_payload(frame, data);
+
+   return frame->length + 4;
+}
+EXPORT_SYMBOL(hdmi_audio_infoframe_pack_for_dp);
+
 /**
  * hdmi_vendor_infoframe_init() - initialize an HDMI vendor infoframe
  * @frame: HDMI vendor infoframe
diff --git a/include/linux/hdmi.h b/include/linux/hdmi.h
index c8ec982ff498..f576a0b08c85 100644
--- a/include/linux/hdmi.h
+++ b/include/linux/hdmi.h
@@ -334,6

[PATCH v1 0/6] drm/mediatek: Add mt8195 DisplayPort driver

2021-09-06 Thread Markus Schneider-Pargmann
Hi everyone,

this series is built around the DisplayPort driver. The dpi/dpintf driver and
the added helper functions are required for the DisplayPort driver to work.

It is version 1 of the patch series following the RFC version:
https://lore.kernel.org/linux-mediatek/20210816192523.1739365-1-...@baylibre.com/

Note: This patch series is currently tested on v5.10; I am working on
testing it on v5.14.

The series is now based on these patch series and its dependencies:
- Add Mediatek Soc DRM (vdosys0) support for mt8195
  
https://lore.kernel.org/linux-mediatek/20210825144833.7757-1-jason-jh@mediatek.com/
- Add MediaTek SoC DRM (vdosys1) support for mt8195
  
https://lore.kernel.org/linux-mediatek/20210825100531.5653-1-nancy@mediatek.com/

Changes in v1:
- Added DP binding documentation.
- Addressed feedback from the RFC.
- General cleanups in DPI and DP drivers.
- See individual patches for details on the changes done.

Thanks in advance for any feedback and comments.

Best,
Markus


Markus Schneider-Pargmann (6):
  dt-bindings: mediatek,dpi: Add mt8195 dpintf
  dt-bindings: mediatek,dp: Add Display Port binding
  drm/edid: Add cea_sad helpers for freq/length
  video/hdmi: Add audio_infoframe packing for DP
  drm/mediatek: dpi: Add dpintf support
  drm/mediatek: Add mt8195 DisplayPort driver

 .../display/mediatek/mediatek,dp.yaml |   89 +
 .../display/mediatek/mediatek,dpi.yaml|   43 +-
 drivers/gpu/drm/drm_edid.c|   57 +
 drivers/gpu/drm/mediatek/Kconfig  |6 +
 drivers/gpu/drm/mediatek/Makefile |2 +
 drivers/gpu/drm/mediatek/mtk_dp.c | 2881 +
 drivers/gpu/drm/mediatek/mtk_dp_reg.h |  580 
 drivers/gpu/drm/mediatek/mtk_dpi.c|  248 +-
 drivers/gpu/drm/mediatek/mtk_dpi_regs.h   |   12 +
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c   |4 +
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h   |1 +
 drivers/gpu/drm/mediatek/mtk_drm_drv.c|3 +
 drivers/video/hdmi.c  |   87 +-
 include/drm/drm_edid.h|   18 +-
 include/linux/hdmi.h  |4 +
 15 files changed, 3958 insertions(+), 77 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/display/mediatek/mediatek,dp.yaml
 create mode 100644 drivers/gpu/drm/mediatek/mtk_dp.c
 create mode 100644 drivers/gpu/drm/mediatek/mtk_dp_reg.h

-- 
2.33.0



[PATCH v2 6/6] drm/fourcc: Add the ADL-P specific pitch requirements of CCS modifiers

2021-09-06 Thread Imre Deak
On Alderlake-P, for all CCS modifiers, the main surface pitch must be
either 8 Y-tile widths or a multiple of 16 Y-tile widths. The CCS
surface pitch must be rounded up to a power of two.

Adjust the modifier descriptions accordingly.

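As a worked example (numbers are illustrative, not from the patch): take a
3840-pixel-wide ARGB8888 main surface on Alderlake-P, where one Y-tile is
128 bytes wide:

        u32 main_pitch, ccs_pitch;

        main_pitch = 3840 * 4;                      /* 15360 bytes = 120 tiles */
        main_pitch = roundup(main_pitch, 16 * 128); /* -> 16384 bytes = 128 tiles */
        ccs_pitch = main_pitch / 512 * 64;          /* -> 2048 bytes */
        ccs_pitch = roundup_pow_of_two(ccs_pitch);  /* already a power of two */
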
Cc: Nanley G Chery 
Cc: Juha-Pekka Heikkila 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Imre Deak 
---
 include/uapi/drm/drm_fourcc.h | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 45a914850be0d..b63b7fa8bbac6 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -522,8 +522,16 @@ extern "C" {
  * The main surface is Y-tiled and at plane index 0, the CCS is linear and
  * at index 1. A 64B CCS cache line corresponds to an area of 4x1 tiles in
  * main surface. In other words, 4 bits in CCS map to a main surface cache
- * line pair. The main surface pitch is required to be a multiple of four
- * Y-tile widths.
+ * line pair.
+ *
+ * The pitch of the main surface is required to be either 8 or a multiple of
+ * 16 Y-tile widths on Alderlake-P and a multiple of 4 Y-tile widths on other
+ * platforms.
+ *
+ * The pitch of the CCS surface must be calculated using the
+ *ccs_surface_pitch=main_surface_pitch_in_bytes / 512 * 64.
+ * formula. On Alderlake-P this pitch must be rounded up to be power-of-two
+ * sized.
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS fourcc_mod_code(INTEL, 6)
 
@@ -533,10 +541,12 @@ extern "C" {
  * The main surface is Y-tiled and at plane index 0, the CCS is linear and
  * at index 1. A 64B CCS cache line corresponds to an area of 4x1 tiles in
  * main surface. In other words, 4 bits in CCS map to a main surface cache
- * line pair. The main surface pitch is required to be a multiple of four
- * Y-tile widths. For semi-planar formats like NV12, CCS planes follow the
+ * line pair.  For semi-planar formats like NV12, CCS planes follow the
  * Y and UV planes i.e., planes 0 and 1 are used for Y and UV surfaces,
  * planes 2 and 3 for the respective CCS.
+ *
+ * About the requirement on the main and CCS surface pitches see the
+ * description for I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS.
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS fourcc_mod_code(INTEL, 7)
 
@@ -554,8 +564,10 @@ extern "C" {
  * Clear Color value when applicable. The Converted Clear Color values are
  * consumed by the DE. The last 64 bits are used to store Color Discard Enable
  * and Depth Clear Value Valid which are ignored by the DE. A CCS cache line
- * corresponds to an area of 4x1 tiles in the main surface. The main surface
- * pitch is required to be a multiple of 4 tile widths.
+ * corresponds to an area of 4x1 tiles in the main surface.
+ *
+ * About the requirement on the main and CCS surface pitches see the
+ * description for I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS.
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
 
-- 
2.27.0



Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug

2021-09-06 Thread jim . cromie
> I'll try to extract the "executive summary" from this, you tell me if I
> got it right.
>
> So using or not using dynamic debug for DRM debug ends up being about
> shifting the cost between kernel binary size (data section grows by each
> pr_debug call site) and runtime conditionals?

Yes.

> Since the table sizes you mention seem significant enough, I think that
> justifies existence of DRM_USE_DYNAMIC_DEBUG. It would probably be a
> good idea to put some commentary on that there. Ideally including some
> rough estimates both including space cost per call site and space cost
> for a typical distro kernel build?

yeah, agreed.  I presume you mean in Kconfig entry,
since commits have some size info now - I have i915, amdgpu, nouveau;
I can see some prose improvements for 5/8




> Regards,
>
> Tvrtko

thanks
Jim


Re: [Intel-gfx] [PATCH v7 3/8] i915/gvt: use DEFINE_DYNAMIC_DEBUG_CATEGORIES to create "gvt:core:" etc categories

2021-09-06 Thread jim . cromie
On Mon, Sep 6, 2021 at 6:26 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:
>
>
> On 03/09/2021 20:22, jim.cro...@gmail.com wrote:
> > On Fri, Sep 3, 2021 at 5:07 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 31/08/2021 21:21, Jim Cromie wrote:
> >>> The gvt component of this driver has ~120 pr_debugs, in 9 categories
> >>> quite similar to those in DRM.  Following the interface model of
> >>> drm.debug, add a parameter to map bits to these categorizations.
> >>>
> >>> DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
> >>>"dyndbg bitmap desc",
> >>>{ "gvt:cmd: ",  "command processing" },

> >>> v7:
> >>> . move ccflags addition up to i915/Makefile from i915/gvt
> >>> ---
> >>>drivers/gpu/drm/i915/Makefile  |  4 
> >>>drivers/gpu/drm/i915/i915_params.c | 35
++
> >>
> >> Can this work if put under gvt/ or at least intel_gvt.h|c?

I tried this.
I moved the code block into gvt/debug.c (new file) and
added it to the Makefile's GVT_SOURCES,
but I don't know why it won't build.
A frustrating basic error I'm not seeing.
It does seem like the proper placement, will resolve...


> >>
> >
> > I thought it belonged here more, at least according to the name of the
> > config.var
>
> Hmm bear with me please - the categories this patch creates are intended
> to be used explicitly from the GVT "sub-module", or they somehow even
> get automatically used with no further intervention to callers required?
>

From 2009 through v5.9.0 the only users were admins reading/echoing
/proc/dynamic_debug/control,
presumably because they wanted more info in the logs, episodically.
v5.9.0 exported dynamic_debug_exec_queries for in-kernel use,
reusing the stringy "echo $query_command > control" idiom.
My intention was to let in-kernel users roll their own drm.debug type
interface, or whatever else they needed.  Nobody is using it yet.

patch 1/8 implements that drm.debug interface.
5/8 is the primary use case
3/8 (this patch) & 4/8 are patches of opportunity, test cases, proof of
function/utility.
Its value as such is easier control of those pr-debugs than is given by
echo > control.

Sean Paul  seanp...@chromium.org worked up a patchset to do runtime
steering of the drm-debug stream,
in particular watching for drm:atomic:fail: type activity (a subcategory
which doesn't exist yet).
5/8 conflicts with his patchset; I have an rfc approach to that, so his
concerns are mine too.


note:  if this patchset goes in, we don't *really* need the export anymore,
since the main use case is covered.  We could un-export, and re-add later
if it's needed for a different use case.  Further, it seems likely that the
callbacks (refactored) would be a better basis for new in-kernel users.
If not that, then full exposure of struct ddebug_query to in-kernel use.


Not quite sure how we got 2 chunks, but there's one more question below.

On Mon, Sep 6, 2021 at 6:26 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
> On 03/09/2021 20:22, jim.cro...@gmail.com wrote:
> > On Fri, Sep 3, 2021 at 5:07 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 31/08/2021 21:21, Jim Cromie wrote:
> >>> The gvt component of this driver has ~120 pr_debugs, in 9 categories
> >>> quite similar to those in DRM.  Following the interface model of
> >>> drm.debug, add a parameter to map bits to these categorizations.
> >>>
> >>> DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
> >>>"dyndbg bitmap desc",
> >>>{ "gvt:cmd: ",  "command processing" },
> >>>{ "gvt:core: ", "core help" },
> >>>{ "gvt:dpy: ",  "display help" },
> >>>{ "gvt:el: ",   "help" },
> >>>{ "gvt:irq: ",  "help" },
> >>>{ "gvt:mm: ",   "help" },
> >>>{ "gvt:mmio: ", "help" },
> >>>{ "gvt:render: ", "help" },
> >>>{ "gvt:sched: " "help" });
> >>>
> >
> > BTW, Ive dropped the help field, its already handled, dont need to
> clutter.
> >
> >
> >>> The actual patch has a few details different, cmd_help() macro emits
> >>> the initialization construct.
> >>>
> >>> if CONFIG_DRM_USE_DYNAMIC_DEBUG, then -DDYNAMIC_DEBUG_MODULE is added
> >>> cflags, by gvt/Makefile.
> >>>
> >>> Signed-off-by: Jim Cromie 
> >>> ---
> >>> v5:
> >>> . static decl of vector of bit->class descriptors - Emil.V
> >>> . relocate gvt-makefile chunk from elsewhere
> >>> v7:
> >>> . move ccflags addition up to i915/Makefile from i915/gvt
> >>> ---
> >>>drivers/gpu/drm/i915/Makefile  |  4 
> >>>drivers/gpu/drm/i915/i915_params.c | 35
> ++
> >>
> >> Can this work if put under gvt/ or at least intel_gvt.h|c?
> >>
> >
> > I thought it belonged here more, at least according to the name of the
> > config.var
>
> Hmm bear with me please - the categories this patch creates are intended
> to be used explicitly from the GVT "sub-module", or they somehow even
> get automatically used with no further intervention to callers required?
>
> > CONFIG_DRM_USE_DYNAMIC_DEBUG.
> >
> > I suppose its not a great name, its narrow purpos

Re: [PATCH] doc: gpu: Add document describing buffer exchange

2021-09-06 Thread Robert Beckett




On 05/09/2021 13:27, Daniel Stone wrote:

Since there's a lot of confusion around this, document both the rules
and the best practice around negotiating, allocating, importing, and
using buffers when crossing context/process/device/subsystem boundaries.

This ties up all of dmabuf, formats and modifiers, and their usage.

Signed-off-by: Daniel Stone 
---

This is just a quick first draft, inspired by:
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637

It's not complete or perfect, but I'm off to eat a roast then have a
nice walk in the sun, so figured it'd be better to dash it off rather
than let it rot on my hard drive.


  .../gpu/exchanging-pixel-buffers.rst  | 285 ++
  Documentation/gpu/index.rst   |   1 +
  2 files changed, 286 insertions(+)
  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst

diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst 
b/Documentation/gpu/exchanging-pixel-buffers.rst
new file mode 100644
index ..75c4de13d5c8
--- /dev/null
+++ b/Documentation/gpu/exchanging-pixel-buffers.rst
@@ -0,0 +1,285 @@
+.. Copyright 2021 Collabora Ltd.
+
+
+Exchanging pixel buffers
+
+
+As originally designed, the Linux graphics subsystem had extremely limited
+support for sharing pixel-buffer allocations between processes, devices, and
+subsystems. Modern systems require extensive integration between all three
+classes; this document details how applications and kernel subsystems should
+approach this sharing for two-dimensional image data.
+
+It is written with reference to the DRM subsystem for GPU and display devices,
+V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
+support, however any other subsystems should also follow this design and 
advice.
+
+
+Formats and modifiers
+=
+
+Each buffer must have an underlying format. This format describes the data 
which
+can be stored and loaded for each pixel. Although each subsystem has its own
+format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
+reused wherever possible, as they are the standard descriptions used for
+interchange.
+
+Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
+the translation between one or more pixels in memory, and the color data
+contained within that memory. The number and type of color channels are
+described: whether they are RGB or YUV, integer or floating-point, the size
+of each channel and their locations within the pixel memory, and the
+relationship between color planes.
+
+For example, `DRM_FORMAT_ARGB` describes a format in which each pixel has a
+single 32-bit value in memory. Alpha, red, green, and blue, color channels are
+available at 8-byte precision per channel, ordered respectively from most to


think you meant 8-bit there


+least significant bits in little-endian storage. As a more complex example,
+`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
+stored in separate memory planes, where the chroma plane is stored at half the
+resolution in both dimensions (i.e. one U/V chroma sample is stored for each 
2x2
+pixel grouping).
+
+Format modifiers describe a translation mechanism between these per-pixel 
memory
+samples, and the actual memory storage for the buffer. The most straightforward
+modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel 
has
+contiguous storage beginning at (0,0); each pixel's location in memory will be
+`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
+format, and most convenient for CPU access.
+
+Modern hardware employs much more sophisticated access mechanisms, typically
+making use of tiled access and possibly also compression. For example, the
+`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
+are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
+memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
+stores pixels (4,0) to (7,3) inclusive.
+
+Some modifiers may modify the number of memory buffers required to store the
+data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
+memory buffer to RGB formats in which it stores data about the status of every
+tile, notably including whether the tile is fully populated with pixel data, or
+can be expanded from a single solid color.
+
+These extended layouts are highly vendor-specific, and even specific to
+particular generations or configurations of devices per-vendor. For this 
reason,
+support of modifiers must be explicitly enumerated and negotiated by all users
+in order to ensure a compatible and optimal pipeline, as discussed below.
+
+
+Dimensions and size
+===
+
+Each pixel buffer must be accompanied by logical pixel dimensions. This refers
+to the number of unique samples whic

[PATCH v2 6/6] drm/i915: Reduce the number of objects subject to memcpy recover

2021-09-06 Thread Thomas Hellström
We really only need memcpy restore for objects that affect the
operability of the migrate context. That is, primarily the page-table
objects of the migrate VM.

Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early
restores using memcpy and a way to assign LMEM page-table object flags
to be used by the vms.

Restore objects without this flag with the gpu blitter and only objects
carrying the flag using TTM memcpy.

Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and
defer for a later audit which vms actually need it. Most importantly, user-
allocated vms with pinned page-table objects can be restored using the
blitter.

Performance-wise memcpy restore is probably as fast as gpu restore if not
faster, but using gpu restore will help tackling future restrictions in
mappable LMEM size.

Signed-off-by: Thomas Hellström 
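For illustration only (hypothetical, not taken from this patch), a page-table
object that must be restored early with memcpy would be allocated with the
new flag, e.g.:

        obj = i915_gem_object_create_lmem(i915, SZ_4K, I915_BO_ALLOC_PM_EARLY);
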
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c  |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  9 ++---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c   |  6 --
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c   |  6 --
 drivers/gpu/drm/i915/gem/selftests/huge_pages.c  |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c |  2 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c |  5 +++--
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h |  4 +++-
 drivers/gpu/drm/i915/gt/intel_ggtt.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c  |  3 ++-
 drivers/gpu/drm/i915/gt/intel_gtt.h  |  9 +++--
 drivers/gpu/drm/i915/gt/intel_migrate.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c| 13 -
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c |  2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c|  4 ++--
 17 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fd169cf2f75a..3dbebced0950 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1312,7 +1312,7 @@ i915_gem_create_context(struct drm_i915_private *i915,
} else if (HAS_FULL_PPGTT(i915)) {
struct i915_ppgtt *ppgtt;
 
-   ppgtt = i915_ppgtt_create(&i915->gt);
+   ppgtt = i915_ppgtt_create(&i915->gt, 0);
if (IS_ERR(ppgtt)) {
drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
PTR_ERR(ppgtt));
@@ -1490,7 +1490,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void 
*data,
if (args->flags)
return -EINVAL;
 
-   ppgtt = i915_ppgtt_create(&i915->gt);
+   ppgtt = i915_ppgtt_create(&i915->gt, 0);
if (IS_ERR(ppgtt))
return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 66123ba46247..477b98b656b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -294,13 +294,16 @@ struct drm_i915_gem_object {
 #define I915_BO_ALLOC_USERBIT(3)
 /* Object may lose its contents on suspend / resume */
 #define I915_BO_ALLOC_PM_VOLATILE BIT(4)
+/* Object needs to be restored early using memcpy during resume */
+#define I915_BO_ALLOC_PM_EARLYBIT(5)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
 I915_BO_ALLOC_USER | \
-I915_BO_ALLOC_PM_VOLATILE)
-#define I915_BO_READONLY  BIT(5)
-#define I915_TILING_QUIRK_BIT 6 /* unknown swizzling; do not release! */
+I915_BO_ALLOC_PM_VOLATILE | \
+I915_BO_ALLOC_PM_EARLY)
+#define I915_BO_READONLY  BIT(6)
+#define I915_TILING_QUIRK_BIT 7 /* unknown swizzling; do not release! */
 
/**
 * @mem_flags - Mutable placement-related flags
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 9746c255ddcc..cdd344f64404 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -98,9 +98,11 @@ int i915_gem_backup_suspend(struct drm_i915_private *i915)
 * More objects may have become unpinned as requests were
 * retired. Now try to evict again. The gt may be wedged here
 * in which case we automatically fall back to memcpy.
+* We allow also backing up pinned objects that have not been
+* marked for early recover, and that may contain, for example,
+* page-tables for the migrate context.
 */
-
-   ret = lmem_suspend(i915, true, false);
+   ret = lmem_suspend(i915, true, t

[PATCH v2 3/6] drm/i915 Implement LMEM backup and restore for suspend / resume

2021-09-06 Thread Thomas Hellström
Just evict unpinned objects to system. For pinned LMEM objects,
make a backup system object and blit the contents to that.

Backup is performed in three steps,
1: Opportunistically evict evictable objects using the gpu blitter.
2: After gt idle, evict evictable objects using the gpu blitter. This will
be modified in an upcoming patch to backup pinned objects that are not used
by the blitter itself.
3: Backup remaining pinned objects using memcpy.

Also move uC suspend to after 2) to make sure we have a functional GuC
during 2) if using GuC submission.

v2:
- Major refactor to make sure gem_exec_suspend@hang-SX subtests work, and
  suspend / resume works with a slightly modified GuC submission enabling
  patch series.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/Makefile |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pm.c|  92 +++-
 drivers/gpu/drm/i915/gem/i915_gem_pm.h|   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  29 ++-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  10 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c| 205 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.h|  24 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |   4 +-
 drivers/gpu/drm/i915/i915_drv.c   |  10 +-
 drivers/gpu/drm/i915/i915_drv.h   |   2 +-
 11 files changed, 364 insertions(+), 17 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index c36c8a4f0716..3379a0a6c91e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
gem/i915_gem_ttm.o \
+   gem/i915_gem_ttm_pm.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
gem/i915_gemfs.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 2471f36aaff3..734cc8e16481 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -534,6 +534,7 @@ struct drm_i915_gem_object {
struct {
struct sg_table *cached_io_st;
struct i915_gem_object_page_iter get_io_page;
+   struct drm_i915_gem_object *backup;
bool created:1;
} ttm;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 8b9d7d14c4bd..9746c255ddcc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -5,6 +5,7 @@
  */
 
 #include "gem/i915_gem_pm.h"
+#include "gem/i915_gem_ttm_pm.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_gt_requests.h"
@@ -39,7 +40,79 @@ void i915_gem_suspend(struct drm_i915_private *i915)
i915_gem_drain_freed_objects(i915);
 }
 
-void i915_gem_suspend_late(struct drm_i915_private *i915)
+static int lmem_restore(struct drm_i915_private *i915, bool allow_gpu)
+{
+   struct intel_memory_region *mr;
+   int ret = 0, id;
+
+   for_each_memory_region(mr, i915, id) {
+   if (mr->type == INTEL_MEMORY_LOCAL) {
+   ret = i915_ttm_restore_region(mr, allow_gpu);
+   if (ret)
+   break;
+   }
+   }
+
+   return ret;
+}
+
+static int lmem_suspend(struct drm_i915_private *i915, bool allow_gpu,
+   bool backup_pinned)
+{
+   struct intel_memory_region *mr;
+   int ret = 0, id;
+
+   for_each_memory_region(mr, i915, id) {
+   if (mr->type == INTEL_MEMORY_LOCAL) {
+   ret = i915_ttm_backup_region(mr, allow_gpu, 
backup_pinned);
+   if (ret)
+   break;
+   }
+   }
+
+   return ret;
+}
+
+static void lmem_recover(struct drm_i915_private *i915)
+{
+   struct intel_memory_region *mr;
+   int id;
+
+   for_each_memory_region(mr, i915, id)
+   if (mr->type == INTEL_MEMORY_LOCAL)
+   i915_ttm_recover_region(mr);
+}
+
+int i915_gem_backup_suspend(struct drm_i915_private *i915)
+{
+   int ret;
+
+   /* Opportunistically try to evict unpinned objects */
+   ret = lmem_suspend(i915, true, false);
+   if (ret)
+   goto out_recover;
+
+   i915_gem_suspend(i915);
+
+   /*
+* More objects may have become unpinned as requests were
+* retired. Now try to evict again. The gt may be wedged here
+* in which case we automatically fall back to memcpy.
+*/
+
+   ret = lmem_suspend(i915, true, false);
+   if (ret)
+   goto out_recover;
+
+   return 0;
+
+out_recover:
+   lmem_re

[PATCH v2 4/6] drm/i915/gt: Register the migrate contexts with their engines

2021-09-06 Thread Thomas Hellström
Pinned contexts, like the migrate contexts need reset after resume
since their context image may have been lost. Also the GuC needs to
register pinned contexts.

Add a list to struct intel_engine_cs where we add all pinned contexts on
creation, and traverse that list at resume time to reset the pinned
contexts.

This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest for now,
but proper LMEM backup / restore is needed for full suspend functionality.
However, note that even with full LMEM backup / restore it may be
desirable to keep the reset since backing up the migrate context images
must happen using memcpy() after the migrate context has become inactive,
and for performance- and other reasons we want to avoid memcpy() from
LMEM.

Also traverse the list at guc_init_lrc_mapping() calling
guc_kernel_context_pin() for the pinned contexts, like is already done
for the kernel context.

v2:
- Don't reset the contexts on each __engine_unpark() but rather at
  resume time (Chris Wilson).
v3:
- Reset contexts in the engine sanitize callback. (Chris Wilson)

Cc: Tvrtko Ursulin 
Cc: Matthew Auld 
Cc: Maarten Lankhorst 
Cc: Brost Matthew 
Cc: Chris Wilson 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  8 +++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  4 
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 23 +++
 drivers/gpu/drm/i915/gt/intel_engine_pm.h |  2 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  7 ++
 .../drm/i915/gt/intel_execlists_submission.c  |  2 ++
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  3 +++
 drivers/gpu/drm/i915/gt/mock_engine.c |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 +++---
 9 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e54351a170e2..a63631ea0ec4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -152,6 +152,14 @@ struct intel_context {
/** sseu: Control eu/slice partitioning */
struct intel_sseu sseu;
 
+   /**
+* pinned_contexts_link: List link for the engine's pinned contexts.
+* This is only used if this is a perma-pinned kernel context and
+* the list is assumed to only be manipulated during driver load
+* or unload time so no mutex protection currently.
+*/
+   struct list_head pinned_contexts_link;
+
u8 wa_bb_page; /* if set, page num reserved for context workarounds */
 
struct {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 332efea696a5..c606a4714904 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -320,6 +320,7 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
intel_engine_id id)
 
BUILD_BUG_ON(BITS_PER_TYPE(engine->mask) < I915_NUM_ENGINES);
 
+   INIT_LIST_HEAD(&engine->pinned_contexts_list);
engine->id = id;
engine->legacy_idx = INVALID_ENGINE;
engine->mask = BIT(id);
@@ -875,6 +876,8 @@ intel_engine_create_pinned_context(struct intel_engine_cs 
*engine,
return ERR_PTR(err);
}
 
+   list_add_tail(&ce->pinned_contexts_link, &engine->pinned_contexts_list);
+
/*
 * Give our perma-pinned kernel timelines a separate lockdep class,
 * so that we can use them from within the normal user timelines
@@ -897,6 +900,7 @@ void intel_engine_destroy_pinned_context(struct 
intel_context *ce)
list_del(&ce->timeline->engine_link);
mutex_unlock(&hwsp->vm->mutex);
 
+   list_del(&ce->pinned_contexts_link);
intel_context_unpin(ce);
intel_context_put(ce);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 1f07ac4e0672..dacd62773735 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -298,6 +298,29 @@ void intel_engine_init__pm(struct intel_engine_cs *engine)
intel_engine_init_heartbeat(engine);
 }
 
+/**
+ * intel_engine_reset_pinned_contexts - Reset the pinned contexts of
+ * an engine.
+ * @engine: The engine whose pinned contexts we want to reset.
+ *
+ * Typically the pinned context LMEM images lose or get their content
+ * corrupted on suspend. This function resets their images.
+ */
+void intel_engine_reset_pinned_contexts(struct intel_engine_cs *engine)
+{
+   struct intel_context *ce;
+
+   list_for_each_entry(ce, &engine->pinned_contexts_list,
+   pinned_contexts_link) {
+   /* kernel context gets reset at __engine_unpark() */
+   if (ce == engine->kernel_context)
+   continue;
+
+   dbg_poison_ce(ce);
+   ce->ops->reset(ce);
+   }
+}
+
 #if 

[PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend

2021-09-06 Thread Thomas Hellström
Pinned context images are now reset during resume. Don't back them up,
and since ring contents can be assumed empty at suspend, don't back the
rings up either.

Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an
object is allowed to lose its content on suspend.

Signed-off-by: Thomas Hellström 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h| 17 ++---
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c  |  3 +++
 drivers/gpu/drm/i915/gt/intel_lrc.c |  3 ++-
 drivers/gpu/drm/i915/gt/intel_ring.c|  3 ++-
 4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 734cc8e16481..66123ba46247 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -288,16 +288,19 @@ struct drm_i915_gem_object {
I915_SELFTEST_DECLARE(struct list_head st_link);
 
unsigned long flags;
-#define I915_BO_ALLOC_CONTIGUOUS BIT(0)
-#define I915_BO_ALLOC_VOLATILE   BIT(1)
-#define I915_BO_ALLOC_CPU_CLEAR  BIT(2)
-#define I915_BO_ALLOC_USER   BIT(3)
+#define I915_BO_ALLOC_CONTIGUOUS  BIT(0)
+#define I915_BO_ALLOC_VOLATILEBIT(1)
+#define I915_BO_ALLOC_CPU_CLEAR   BIT(2)
+#define I915_BO_ALLOC_USERBIT(3)
+/* Object may lose its contents on suspend / resume */
+#define I915_BO_ALLOC_PM_VOLATILE BIT(4)
 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
-I915_BO_ALLOC_USER)
-#define I915_BO_READONLY BIT(4)
-#define I915_TILING_QUIRK_BIT5 /* unknown swizzling; do not release! */
+I915_BO_ALLOC_USER | \
+I915_BO_ALLOC_PM_VOLATILE)
+#define I915_BO_READONLY  BIT(5)
+#define I915_TILING_QUIRK_BIT 6 /* unknown swizzling; do not release! */
 
/**
 * @mem_flags - Mutable placement-related flags
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 3884bf45dab8..eaceecfc3f19 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -61,6 +61,9 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region 
*apply,
if (!pm_apply->backup_pinned)
return 0;
 
+   if (obj->flags & I915_BO_ALLOC_PM_VOLATILE)
+   return 0;
+
sys_region = i915->mm.regions[INTEL_REGION_SMEM];
backup = i915_gem_object_create_region(sys_region,
   obj->base.size,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6ba8daea2f56..3ef9eaf8c50e 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -942,7 +942,8 @@ __lrc_alloc_state(struct intel_context *ce, struct 
intel_engine_cs *engine)
context_size += PAGE_SIZE;
}
 
-   obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
+   obj = i915_gem_object_create_lmem(engine->i915, context_size,
+ I915_BO_ALLOC_PM_VOLATILE);
if (IS_ERR(obj))
obj = i915_gem_object_create_shmem(engine->i915, context_size);
if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c 
b/drivers/gpu/drm/i915/gt/intel_ring.c
index 7c4d5158e03b..2fdd52b62092 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -112,7 +112,8 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt 
*ggtt, int size)
struct drm_i915_gem_object *obj;
struct i915_vma *vma;
 
-   obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE);
+   obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
+ I915_BO_ALLOC_PM_VOLATILE);
if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(obj))
-- 
2.31.1



[PATCH v2 2/6] drm/i915/gem: Implement a function to process all gem objects of a region

2021-09-06 Thread Thomas Hellström
An upcoming common pattern is to traverse the region object list and
perform certain actions on all objects in a region. It's a little tricky
to get the list locking right, in particular since a gem object may
change region unless it's pinned or the object lock is held.

Define a function that does this for us and that takes an argument that
defines the action to be performed on each object.
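
For illustration only (not part of the patch), a caller embeds struct
i915_gem_apply_to_region in a private struct and supplies a process_obj
callback; the hypothetical object counter below sketches the intended usage:

/* Hypothetical usage sketch; the wrapper struct and ops are invented. */
struct count_arg {
	struct i915_gem_apply_to_region base;
	unsigned long count;
};

static int count_one(struct i915_gem_apply_to_region *apply,
		     struct drm_i915_gem_object *obj)
{
	struct count_arg *arg = container_of(apply, struct count_arg, base);

	/* Called with the object locked and its region stable. */
	arg->count++;
	return 0;
}

static const struct i915_gem_apply_to_region_ops count_ops = {
	.process_obj = count_one,
};

static unsigned long count_region_objects(struct intel_memory_region *mr)
{
	struct count_arg arg = {
		.base = { .ops = &count_ops, .interruptible = true },
	};

	i915_gem_process_region(mr, &arg.base);
	return arg.count;
}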

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_region.c | 70 ++
 drivers/gpu/drm/i915/gem/i915_gem_region.h | 33 ++
 2 files changed, 103 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c 
b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index 1f557b2178ed..a016ccec36f3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -80,3 +80,73 @@ i915_gem_object_create_region(struct intel_memory_region 
*mem,
i915_gem_object_free(obj);
return ERR_PTR(err);
 }
+
+/**
+ * i915_gem_process_region - Iterate over all objects of a region using ops
+ * to process and optionally skip objects
+ * @mr: The memory region
+ * @apply: ops and private data
+ *
+ * This function can be used to iterate over a region's object list,
+ * checking whether to skip objects, and, if not, lock the objects and
+ * process them using the supplied ops. Note that this function temporarily
+ * removes objects from the region list while iterating, so if run
+ * concurrently with itself it may not iterate over all objects.
+ *
+ * Return: 0 if successful, negative error code on failure.
+ */
+int i915_gem_process_region(struct intel_memory_region *mr,
+   struct i915_gem_apply_to_region *apply)
+{
+   const struct i915_gem_apply_to_region_ops *ops = apply->ops;
+   struct drm_i915_gem_object *obj;
+   struct list_head still_in_list;
+   int ret = 0;
+
+   /*
+* In the future, a non-NULL apply->ww could mean the caller is
+* already in a locking transaction and provides its own context.
+*/
+   GEM_WARN_ON(apply->ww);
+
+   INIT_LIST_HEAD(&still_in_list);
+   mutex_lock(&mr->objects.lock);
+   for (;;) {
+   struct i915_gem_ww_ctx ww;
+
+   obj = list_first_entry_or_null(&mr->objects.list, typeof(*obj),
+  mm.region_link);
+   if (!obj)
+   break;
+
+   list_move_tail(&obj->mm.region_link, &still_in_list);
+   if (!kref_get_unless_zero(&obj->base.refcount))
+   continue;
+
+   /*
+* Note: Someone else might be migrating the object at this
+* point. The object's region is not stable until we lock
+* the object.
+*/
+   mutex_unlock(&mr->objects.lock);
+   apply->ww = &ww;
+   for_i915_gem_ww(&ww, ret, apply->interruptible) {
+   ret = i915_gem_object_lock(obj, apply->ww);
+   if (ret)
+   continue;
+
+   if (obj->mm.region == mr)
+   ret = ops->process_obj(apply, obj);
+   /* Implicit object unlock */
+   }
+
+   i915_gem_object_put(obj);
+   mutex_lock(&mr->objects.lock);
+   if (ret)
+   break;
+   }
+   list_splice_tail(&still_in_list, &mr->objects.list);
+   mutex_unlock(&mr->objects.lock);
+
+   return ret;
+}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.h 
b/drivers/gpu/drm/i915/gem/i915_gem_region.h
index 1008e580a89a..f62195847056 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.h
@@ -12,6 +12,37 @@ struct intel_memory_region;
 struct drm_i915_gem_object;
 struct sg_table;
 
+struct i915_gem_apply_to_region;
+
+/**
+ * struct i915_gem_apply_to_region_ops - ops to use when iterating over all
+ * region objects.
+ */
+struct i915_gem_apply_to_region_ops {
+   /**
+* process_obj - Process the current object
+* @apply: Embed this for private data
+* @obj: The current object.
+*/
+   int (*process_obj)(struct i915_gem_apply_to_region *apply,
+  struct drm_i915_gem_object *obj);
+};
+
+/**
+ * struct i915_gem_apply_to_region - Argument to the struct
+ * i915_gem_apply_to_region_ops functions.
+ * @ops: The ops for the operation.
+ * @ww: Locking context used for the transaction.
+ * @interruptible: Whether to perform object locking interruptible.
+ *
+ * This structure is intended to be embedded in a private struct if needed
+ */
+struct i915_gem_apply_to_region {
+   const struct i915_gem_apply_to_region_ops *ops;
+   struct i915_gem_ww_ctx *ww;
+   u32 interruptible:1;
+};
+
 void i915_gem_object_init_memory_region(struct drm_i9

[PATCH v2 1/6] drm/i915/ttm: Implement a function to copy the contents of two TTM-base objects

2021-09-06 Thread Thomas Hellström
When backing up or restoring contents of pinned objects at suspend /
resume time we need to allocate a new object as the backup. Add a function
to facilitate copies between the two. Some data needs to be copied before
the migration context is ready for operation, so make sure we can
disable accelerated copies.
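
As a hedged usage sketch (not from the patch), with both objects already
locked and their backing store populated as required, a backup step could
look roughly like this; the wrapper function and its arguments are invented:

/* Illustrative only: "backup" is a fresh smem object, "obj" the lmem one. */
static int example_backup_copy(struct drm_i915_gem_object *backup,
			       struct drm_i915_gem_object *obj,
			       bool gpu_ready)
{
	/* Fall back to TTM memcpy while the migrate context isn't usable. */
	return i915_gem_obj_copy_ttm(backup, obj, gpu_ready /* allow_accel */,
				     false /* intr */);
}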

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 69 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h |  4 ++
 2 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 59ca53a3ef6a..df2dcbad1eb9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -432,6 +432,7 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
   bool clear,
   struct ttm_resource *dst_mem,
+  struct ttm_tt *dst_ttm,
   struct sg_table *dst_st)
 {
struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
@@ -441,14 +442,14 @@ static int i915_ttm_accel_move(struct ttm_buffer_object 
*bo,
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
struct sg_table *src_st;
struct i915_request *rq;
-   struct ttm_tt *ttm = bo->ttm;
+   struct ttm_tt *src_ttm = bo->ttm;
enum i915_cache_level src_level, dst_level;
int ret;
 
if (!i915->gt.migrate.context)
return -EINVAL;
 
-   dst_level = i915_ttm_cache_level(i915, dst_mem, ttm);
+   dst_level = i915_ttm_cache_level(i915, dst_mem, dst_ttm);
if (clear) {
if (bo->type == ttm_bo_type_kernel)
return -EINVAL;
@@ -465,10 +466,10 @@ static int i915_ttm_accel_move(struct ttm_buffer_object 
*bo,
}
intel_engine_pm_put(i915->gt.migrate.context->engine);
} else {
-   src_st = src_man->use_tt ? i915_ttm_tt_get_st(ttm) :
+   src_st = src_man->use_tt ? i915_ttm_tt_get_st(src_ttm) :
obj->ttm.cached_io_st;
 
-   src_level = i915_ttm_cache_level(i915, bo->resource, ttm);
+   src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
intel_engine_pm_get(i915->gt.migrate.context->engine);
ret = intel_context_migrate_copy(i915->gt.migrate.context,
 NULL, src_st->sgl, src_level,
@@ -488,11 +489,14 @@ static int i915_ttm_accel_move(struct ttm_buffer_object 
*bo,
 
 static void __i915_ttm_move(struct ttm_buffer_object *bo, bool clear,
struct ttm_resource *dst_mem,
-   struct sg_table *dst_st)
+   struct ttm_tt *dst_ttm,
+   struct sg_table *dst_st,
+   bool allow_accel)
 {
-   int ret;
+   int ret = -EINVAL;
 
-   ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_st);
+   if (allow_accel)
+   ret = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm, dst_st);
if (ret) {
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
struct intel_memory_region *dst_reg, *src_reg;
@@ -507,7 +511,7 @@ static void __i915_ttm_move(struct ttm_buffer_object *bo, 
bool clear,
GEM_BUG_ON(!dst_reg || !src_reg);
 
dst_iter = !cpu_maps_iomem(dst_mem) ?
-   ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
+   ttm_kmap_iter_tt_init(&_dst_iter.tt, dst_ttm) :
ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
 dst_st, dst_reg->region.start);
 
@@ -562,7 +566,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool 
evict,
 
clear = !cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
if (!(clear && ttm && !(ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)))
-   __i915_ttm_move(bo, clear, dst_mem, dst_st);
+   __i915_ttm_move(bo, clear, dst_mem, bo->ttm, dst_st, true);
 
ttm_bo_move_sync_cleanup(bo, dst_mem);
i915_ttm_adjust_domains_after_move(obj);
@@ -973,3 +977,50 @@ i915_gem_ttm_system_setup(struct drm_i915_private *i915,
intel_memory_region_set_name(mr, "system-ttm");
return mr;
 }
+
+/**
+ * i915_gem_obj_copy_ttm - Copy the contents of one ttm-based gem object to
+ * another
+ * @dst: The destination object
+ * @src: The source object
+ * @allow_accel: Allow using the blitter. Otherwise TTM memcpy is used.
+ * @intr: Whether to perform waits interruptibly.
+ *
+ * Note: The caller is responsible for assuring that the underlying
+ * TTM objects are populated if needed and locked.
+ *
+ * Return: Zero on succ

[PATCH v2 0/6] drm/i915: Suspend / resume backup- and restore of LMEM.

2021-09-06 Thread Thomas Hellström
Implement backup and restore of LMEM during suspend / resume.
What complicates things a bit is handling of pinned LMEM memory during
suspend and the fact that we might be dealing with unmappable LMEM in
the future, which makes us want to restrict the number of pinned objects that
need memcpy resume.

The first two patches are prereq patches implementing object content copy
and a generic means of iterating through all objects in a region.
The third patch adds the backup / recover / restore functions and the
two last patches deal with restricting the number of objects we need to
use memcpy for.

v2:
- Some polishing of patch 4/6, see patch commit message for details (Chris
  Wilson)
- Rework of patch 3/6.

Thomas Hellström (6):
  drm/i915/ttm: Implement a function to copy the contents of two
TTM-base objects
  drm/i915/gem: Implement a function to process all gem objects of a
region
  drm/i915 Implement LMEM backup and restore for suspend / resume
  drm/i915/gt: Register the migrate contexts with their engines
  drm/i915: Don't back up pinned LMEM context images and rings during
suspend
  drm/i915: Reduce the number of objects subject to memcpy recover

 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   4 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_pm.c|  94 +++-
 drivers/gpu/drm/i915/gem/i915_gem_pm.h|   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_region.c|  70 ++
 drivers/gpu/drm/i915/gem/i915_gem_region.h|  33 +++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  98 ++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  14 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c| 210 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.h|  24 ++
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |   2 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |   5 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |   4 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   8 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |   4 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |  23 ++
 drivers/gpu/drm/i915/gt/intel_engine_pm.h |   2 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 +
 .../drm/i915/gt/intel_execlists_submission.c  |   2 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |   2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c|   2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c   |   3 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h   |   9 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c   |   3 +-
 drivers/gpu/drm/i915/gt/intel_migrate.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  13 +-
 drivers/gpu/drm/i915/gt/intel_ring.c  |   3 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   3 +
 drivers/gpu/drm/i915/gt/mock_engine.c |   2 +
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  12 +-
 drivers/gpu/drm/i915/gvt/scheduler.c  |   2 +-
 drivers/gpu/drm/i915/i915_drv.c   |  10 +-
 drivers/gpu/drm/i915/i915_drv.h   |   2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   4 +-
 38 files changed, 649 insertions(+), 60 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.h

-- 
2.31.1



Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-09-06 Thread Rob Clark
On Mon, Sep 6, 2021 at 1:02 AM Amit Pundir  wrote:
>
> On Sat, 4 Sept 2021 at 01:55, Rob Clark  wrote:
> >
> > On Fri, Sep 3, 2021 at 12:39 PM John Stultz  wrote:
> > >
> > > On Thu, Jul 29, 2021 at 1:49 PM Rob Clark  wrote:
> > > > On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> > > >  wrote:
> > > > > On 29/07/2021 21:24, Rob Clark wrote:
> > > > > > On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> > > > > >  wrote:
> > > > > >>
> > > > > >> Hi Rob,
> > > > > >>
> > > > > >> I've done some more testing! It looks like before that patch 
> > > > > >> ("drm/msm: Devfreq tuning") the GPU would never get above
> > > > > >> the second frequency in the OPP table (342MHz) (at least, not in 
> > > > > >> glxgears). With the patch applied it would more
> > > > > >> aggressively jump up to the max frequency which seems to be 
> > > > > >> unstable at the default regulator voltages.
> > > > > >
> > > > > > *ohh*, yeah, ok, that would explain it
> > > > > >
> > > > > >> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 
> > > > > >> 0.988v (instead of the stock 0.516v) makes the GPU stable
> > > > > >> at the higher frequencies.
> > > > > >>
> > > > > >> Applying this patch reverts the behaviour, and the GPU never goes 
> > > > > >> above 342MHz in glxgears, losing ~30% performance in
> > > > > >> glxgear.
> > > > > >>
> > > > > >> I think (?) that enabling CPR support would be the proper solution 
> > > > > >> to this - that would ensure that the regulators run
> > > > > >> at the voltage the hardware needs to be stable.
> > > > > >>
> > > > > >> Is hacking the voltage higher (although ideally not quite that 
> > > > > >> high) an acceptable short term solution until we have
> > > > > >> CPR? Or would it be safer to just not make use of the higher 
> > > > > >> frequencies on a630 for now?
> > > > > >>
> > > > > >
> > > > > > tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is 
> > > > > > already
> > > > > > on CC and I added sboyd, maybe one of them knows better.
> > > > > >
> > > > > > In the short term, removing the higher problematic OPPs from dts 
> > > > > > might
> > > > > > be a better option than this patch (which I'm dropping), since there
> > > > > > is nothing stopping other workloads from hitting higher OPPs.
> > > > > Oh yeah that sounds like a more sensible workaround than mine .
> > > > > >
> > > > > > I'm slightly curious why I didn't have problems at higher OPPs on my
> > > > > > c630 laptop (sdm850)
> > > > > Perhaps you won the silicon lottery - iirc sdm850 is binned for 
> > > > > higher clocks as-is out of the factory.
> > > > >
> > > > > Would it be best to drop the OPPs for all devices? Or just those 
> > > > > affected? I guess it's possible another c630 might
> > > > > crash where yours doesn't?
> > > >
> > > > I've not heard any reports of similar issues from the handful of other
> > > > folks with c630's on #aarch64-laptops.. but I can't really say if that
> > > > is luck or not.
> > > >
> > > > Maybe just remove it for affected devices?  But I'll defer to Bjorn.
> > >
> > > Just as another datapoint, I was just marveling at how suddenly smooth
> > > the UI was performing on db845c and Caleb pointed me at the "drm/msm:
> > > Devfreq tuning" patch as the likely cause of the improvement, and
> > > mid-discussion my board crashed into USB crash mode:
> > > [  146.157696][C0] adreno 500.gpu: CP | AHB bus error
> > > [  146.163303][C0] adreno 500.gpu: CP | AHB bus error
> > > [  146.168837][C0] adreno 500.gpu: RBBM | ATB bus overflow
> > > [  146.174960][C0] adreno 500.gpu: CP | HW fault | 
> > > status=0x
> > > [  146.181917][C0] adreno 500.gpu: CP | AHB bus error
> > > [  146.187547][C0] adreno 500.gpu: CP illegal instruction error
> > > [  146.194009][C0] adreno 500.gpu: CP | AHB bus error
> > > [  146.308909][T9] Internal error: synchronous external abort:
> > > 9610 [#1] PREEMPT SMP
> > > [  146.317150][T9] Modules linked in:
> > > [  146.320941][T9] CPU: 3 PID: 9 Comm: kworker/u16:1 Tainted: G
> > > W 5.14.0-mainline-06795-g42b258c2275c #24
> > > [  146.331974][T9] Hardware name: Thundercomm Dragonboar
> > > Format: Log Type - Time(microsec) - Message - Optional Info
> > > Log Type: B - Since Boot(Power On Reset),  D - Delta,  S - Statistic
> > > S - QC_IMAGE_VERSION_STRING=BOOT.XF.2.0-00371-SDM845LZB-1
> > > S - IMAGE_VARIANT_STRING=SDM845LA
> > > S - OEM_IMAGE_VERSION_STRING=TSBJ-FA-PC-02170
> > >
> > > So Caleb sent me to this thread. :)
> > >
> > > I'm still trying to trip it again, but it does seem like db845c is
> > > also seeing some stability issues with Linus' HEAD.
> > >
> >
> > Caleb's original pastebin seems to have expired (or at least require
> > some sort of ubuntu login to access).. were the crashes he was seeing
> > also 'AHB bus error'?
>
> I can reproduce this hard crash
> https://www.irccloud.com/pastebin/Cu6UJntE/ and a gpu lockup
> https://www.i

Re: [Intel-gfx] [PATCH] drm/i915/selftests: fixup igt_shrink_thp

2021-09-06 Thread Tvrtko Ursulin



On 06/09/2021 14:48, Matthew Auld wrote:

On 06/09/2021 13:53, Tvrtko Ursulin wrote:


On 06/09/2021 13:30, Matthew Auld wrote:

On 06/09/2021 13:19, Tvrtko Ursulin wrote:


On 06/09/2021 10:17, Matthew Auld wrote:
Since the object might still be active here, the shrink_all will 
simply

ignore it, which blows up in the test, since the pages will still be
there. Currently THP is disabled which should result in the test being
skipped, but if we ever re-enable THP we might start seeing the 
failure.

Fix this by forcing I915_SHRINK_ACTIVE.

v2: Some machine in the shard runs doesn't seem to have any available
swap when running this test. Try to handle this.

Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Reviewed-by: Tvrtko Ursulin  #v1
---
  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 31 
++-

  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c

index a094f3ce1a90..46ea1997c114 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1519,6 +1519,7 @@ static int igt_shrink_thp(void *arg)
  struct i915_vma *vma;
  unsigned int flags = PIN_USER;
  unsigned int n;
+    bool should_swap;
  int err = 0;
  /*
@@ -1567,23 +1568,39 @@ static int igt_shrink_thp(void *arg)
  break;
  }
  i915_gem_context_unlock_engines(ctx);
+    /*
+ * Nuke everything *before* we unpin the pages so we can be 
reasonably
+ * sure that when later checking get_nr_swap_pages() that some 
random

+ * leftover object doesn't steal the remaining swap space.
+ */
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
  i915_vma_unpin(vma);
  if (err)
  goto out_put;
+
  /*
- * Now that the pages are *unpinned* shrink-all should invoke
- * shmem to truncate our pages.
+ * Now that the pages are *unpinned* shrinking should invoke
+ * shmem to truncate our pages, if we have available swap.
   */
-    i915_gem_shrink_all(i915);
-    if (i915_gem_object_has_pages(obj)) {
-    pr_err("shrink-all didn't truncate the pages\n");
+    should_swap = get_nr_swap_pages() > 0;
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
+    if (should_swap == i915_gem_object_has_pages(obj)) {


Hmm is there any value running the test if no swap (given objects 
used by the test are "willneed"), or you could simplify and just do 
early skip?


Maybe. My thinking was that this adds some coverage if say the device 
is not configured with swap. i.e assert that the pages don't 
magically disappear, and that their contents still persist etc.


Happy to make it skip instead though?


So reducing it to a basic shrinker test in that case. Hm.. do you know 
if we have a non THP specific tests for that already somewhere in 
selftests (I can't spot any), or just in IGT?


Just IGT I think, outside of some cases where we call gem_shrink in very 
specific places, which would be hard to do from an IGT.




If we indeed don't have it in selftests, then I guess question is 
whether it is warranted to "hide" such a basic test in the THP 
"drawer", or instead adding a generic shrinker test should be 
considered. (And one could then follow with a question should a basic 
generic test have a THP sub-test.)


The reason for the selftest vs IGT is mostly because userspace doesn't 
have any knowledge of the underlying pages, or whether THP is used. IIRC 
there was some issue with THP + our shmem backend in the past, so also 
adding some basic coverage for THP + i915-gem shrinker seemed 
reasonable. Even if we don't have swap space, I think it still makes 
some sense to call into gem_shrink with our target THP object.


Okay, as you have probably guessed I have no strong feelings either way, 
so you can freely upgrade my r-b to current.


Regards,

Tvrtko





It's hard to say where the boundary for selftests-vs-IGT coverage 
should be in this case. I mean would it be warranted to add such a 
generic shrinker selftest. It is mostly testable from userspace, but 
kernel can do a few more introspections and sanity checks at cost of 
growing kernel code.


Regards,

Tvrtko





Regards,

Tvrtko


+    pr_err("unexpected pages mismatch, should_swap=%s\n",
+   yesno(should_swap));
  err = -EINVAL;
  goto out_put;
  }
-    if (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys) {
-    pr_err("residual page-size bits left\n");
+    if (should_swap == (obj->mm.page_sizes.sg || 
obj->mm.page_sizes.phys)) {
+    pr_err("unexpected residual page-size bits, 
should_swap=%s\n",

+   yesno(should_swap));
  err = -EINVAL;
  goto out_put

Re: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

2021-09-06 Thread Jingwen Chen
Hi Christian/Andrey/Daniel,

I read Boris's patch about ordered workqueue and I think maybe we can
leverage this change.
https://lore.kernel.org/dri-devel/20210625133327.2598825-2-boris.brezil...@collabora.com/

The TDR race condition we are talking about is caused by a bailing job
being deleted from the pending list. If we use an ordered workqueue for the
timedout (TDR) work in the driver, there will be no bailing job.

Do you have any suggestions?

Best Regards,
JingWen Chen

On Mon Sep 06, 2021 at 02:36:52PM +0800, Liu, Monk wrote:
> [AMD Official Use Only]
> 
> > I'm fearing that just repeating what Alex said, but to make it clear 
> > once more: That is *not* necessary!
> >
> > The shared repository is owned by upstream maintainers and they are 
> > usually free to do restructuring work without getting acknowledge from 
> > every single driver maintainer.
> 
> Hi Daniel
> 
> Anyway, thanks for officially confirming the community's working model and 
> policy for me. I won't push my own opinion here since that's not my call to 
> change in any case.
> I only want this diagnostic TDR scheme to reach a good outcome for AMD, or 
> even for all DRM vendors.
> 
> How about this: we still have a final patch that has not landed in the DRM 
> scheduler, and I would like Jingwen to present it to you and 
> AlexD/Christian/Andrey. I believe you will have concerns or objections 
> regarding this patch, but that's fine; let us figure out together how to 
> make it acceptable to you and the other vendors working with the DRM scheduler.
> 
> P.S.: I have to repeat myself again: we are not popping up a new idea 
> suddenly, it is a disconnection issue. We didn't have changes (or plan to 
> have changes) in the DRM scheduler before, but eventually we found we must 
> make job_timeout and sched_main work in a serialized manner, otherwise it 
> won't work given the current scheduler's code structure.
> 
> Thanks 
> 
> --
> Monk Liu | Cloud-GPU Core team
> --
> 
> -Original Message-
> From: Daniel Vetter  
> Sent: Friday, September 3, 2021 12:11 AM
> To: Koenig, Christian 
> Cc: Liu, Monk ; Dave Airlie ; Alex 
> Deucher ; Grodzovsky, Andrey 
> ; Chen, JingWen ; DRI 
> Development ; amd-...@lists.freedesktop.org
> Subject: Re: [diagnostic TDR mode patches] unify our solution 
> opinions/suggestions in one thread
> 
> On Thu, Sep 2, 2021 at 1:00 PM Christian König  
> wrote:
> >
> > Hi Monk,
> >
> > Am 02.09.21 um 07:52 schrieb Liu, Monk:
> > > [AMD Official Use Only]
> > >
> > > I'm not sure I can add much to help this along, I'm sure Alex has 
> > > some internal training, Once your driver is upstream, it belongs to 
> > > upstream, you can maintain it, but you no longer control it 100%, it's a 
> > > tradeoff, it's not one companies always understand.
> > > Usually people are fine developing away internally, but once interaction 
> > > with other parts of the kernel/subsystem is required they have the 
> > > realisation that they needed to work upstream 6 months earlier.
> > > The best time to interact with upstream was 6 months ago, the second best 
> > > time is now.
> > > <<<
> > >
> > > Daniel/AlexD
> > >
> > > I didn't mean your changes on AMD driver need my personal approval 
> > > or review ... and  I'm totally already get used that our driver is not 
> > > 100% under control by AMDers, but supposedly any one from community 
> > > (including you) who tend to change AMD's driver need at least to get 
> > > approvement from someone in AMD, e.g.: AlexD or Christian, doesn't that 
> > > reasonable?
> >
> > I'm fearing that just repeating what Alex said, but to make it clear 
> > once more: That is *not* necessary!
> >
> > The shared repository is owned by upstream maintainers and they are 
> > usually free to do restructuring work without getting acknowledge from 
> > every single driver maintainer.
> >
> > Anybody can of course technically object to upstream design decisions, 
> > but that means that you need to pay attention to the mailing lists in 
> > the first place.
> >
> > > just like we need your approve if we try to modify DRM-sched, or need 
> > > panfrost's approval if we need to change panfrost code ...
> > >
> > > by only CC AMD's engineers looks not quite properly, how do you know if 
> > > your changes (on AMD code part) are conflicting with AMD's on-going 
> > > internal features/refactoring or not ?
> >
> > Well because AMD is supposed to work in public as much as possible and 
> > ask upstream before doing changes to the code base.
> >
> > Additional to that design decisions are supposed to be discussed on 
> > the mailing list and *not* internally.
> 
> Yeah I'm honestly really surprised about the course of this discussion here. 
> With Alex, Christian and others amd has a lot of folks with years/decades of 
> experience in how to collaborate in upstream, when to pull in others 
> proactively and when that's not needed, and in general how to 

Re: [Intel-gfx] [PATCH] drm/i915/selftests: fixup igt_shrink_thp

2021-09-06 Thread Matthew Auld

On 06/09/2021 13:53, Tvrtko Ursulin wrote:


On 06/09/2021 13:30, Matthew Auld wrote:

On 06/09/2021 13:19, Tvrtko Ursulin wrote:


On 06/09/2021 10:17, Matthew Auld wrote:

Since the object might still be active here, the shrink_all will simply
ignore it, which blows up in the test, since the pages will still be
there. Currently THP is disabled which should result in the test being
skipped, but if we ever re-enable THP we might start seeing the 
failure.

Fix this by forcing I915_SHRINK_ACTIVE.

v2: Some machine in the shard runs doesn't seem to have any available
swap when running this test. Try to handle this.

Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Reviewed-by: Tvrtko Ursulin  #v1
---
  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 31 
++-

  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c

index a094f3ce1a90..46ea1997c114 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1519,6 +1519,7 @@ static int igt_shrink_thp(void *arg)
  struct i915_vma *vma;
  unsigned int flags = PIN_USER;
  unsigned int n;
+    bool should_swap;
  int err = 0;
  /*
@@ -1567,23 +1568,39 @@ static int igt_shrink_thp(void *arg)
  break;
  }
  i915_gem_context_unlock_engines(ctx);
+    /*
+ * Nuke everything *before* we unpin the pages so we can be 
reasonably
+ * sure that when later checking get_nr_swap_pages() that some 
random

+ * leftover object doesn't steal the remaining swap space.
+ */
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
  i915_vma_unpin(vma);
  if (err)
  goto out_put;
+
  /*
- * Now that the pages are *unpinned* shrink-all should invoke
- * shmem to truncate our pages.
+ * Now that the pages are *unpinned* shrinking should invoke
+ * shmem to truncate our pages, if we have available swap.
   */
-    i915_gem_shrink_all(i915);
-    if (i915_gem_object_has_pages(obj)) {
-    pr_err("shrink-all didn't truncate the pages\n");
+    should_swap = get_nr_swap_pages() > 0;
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
+    if (should_swap == i915_gem_object_has_pages(obj)) {


Hmm is there any value running the test if no swap (given objects 
used by the test are "willneed"), or you could simplify and just do 
early skip?


Maybe. My thinking was that this adds some coverage if say the device 
is not configured with swap. i.e assert that the pages don't magically 
disappear, and that their contents still persist etc.


Happy to make it skip instead though?


So reducing it to a basic shrinker test in that case. Hm.. do you know 
if we have a non THP specific tests for that already somewhere in 
selftests (I can't spot any), or just in IGT?


Just IGT I think, outside of some cases where we call gem_shrink in very 
specific places, which would be hard to do from an IGT.




If we indeed don't have it in selftests, then I guess question is 
whether it is warranted to "hide" such a basic test in the THP "drawer", 
or instead adding a generic shrinker test should be considered. (And one 
could then follow with a question should a basic generic test have a THP 
sub-test.)


The reason for the selftest vs IGT is mostly because userspace doesn't 
have any knowledge of the underlying pages, or whether THP is used. IIRC 
there was some issue with THP + our shmem backend in the past, so also 
adding some basic coverage for THP + i915-gem shrinker seemed 
reasonable. Even if we don't have swap space, I think it still makes 
some sense to call into gem_shrink with our target THP object.




It's hard to say where the boundary for selftests-vs-IGT coverage should 
be in this case. I mean would it be warranted to add such a generic 
shrinker selftest. It is mostly testable from userspace, but kernel can 
do a few more introspections and sanity checks at cost of growing kernel 
code.


Regards,

Tvrtko





Regards,

Tvrtko


+    pr_err("unexpected pages mismatch, should_swap=%s\n",
+   yesno(should_swap));
  err = -EINVAL;
  goto out_put;
  }
-    if (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys) {
-    pr_err("residual page-size bits left\n");
+    if (should_swap == (obj->mm.page_sizes.sg || 
obj->mm.page_sizes.phys)) {

+    pr_err("unexpected residual page-size bits, should_swap=%s\n",
+   yesno(should_swap));
  err = -EINVAL;
  goto out_put;
  }



Re: [Intel-gfx] [PATCH] drm/i915/selftests: fixup igt_shrink_thp

2021-09-06 Thread Tvrtko Ursulin



On 06/09/2021 13:30, Matthew Auld wrote:

On 06/09/2021 13:19, Tvrtko Ursulin wrote:


On 06/09/2021 10:17, Matthew Auld wrote:

Since the object might still be active here, the shrink_all will simply
ignore it, which blows up in the test, since the pages will still be
there. Currently THP is disabled which should result in the test being
skipped, but if we ever re-enable THP we might start seeing the failure.
Fix this by forcing I915_SHRINK_ACTIVE.

v2: Some machine in the shard runs doesn't seem to have any available
swap when running this test. Try to handle this.

Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Reviewed-by: Tvrtko Ursulin  #v1
---
  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 31 ++-
  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c

index a094f3ce1a90..46ea1997c114 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1519,6 +1519,7 @@ static int igt_shrink_thp(void *arg)
  struct i915_vma *vma;
  unsigned int flags = PIN_USER;
  unsigned int n;
+    bool should_swap;
  int err = 0;
  /*
@@ -1567,23 +1568,39 @@ static int igt_shrink_thp(void *arg)
  break;
  }
  i915_gem_context_unlock_engines(ctx);
+    /*
+ * Nuke everything *before* we unpin the pages so we can be 
reasonably
+ * sure that when later checking get_nr_swap_pages() that some 
random

+ * leftover object doesn't steal the remaining swap space.
+ */
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
  i915_vma_unpin(vma);
  if (err)
  goto out_put;
+
  /*
- * Now that the pages are *unpinned* shrink-all should invoke
- * shmem to truncate our pages.
+ * Now that the pages are *unpinned* shrinking should invoke
+ * shmem to truncate our pages, if we have available swap.
   */
-    i915_gem_shrink_all(i915);
-    if (i915_gem_object_has_pages(obj)) {
-    pr_err("shrink-all didn't truncate the pages\n");
+    should_swap = get_nr_swap_pages() > 0;
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
+    if (should_swap == i915_gem_object_has_pages(obj)) {


Hmm is there any value running the test if no swap (given objects used 
by the test are "willneed"), or you could simplify and just do early 
skip?


Maybe. My thinking was that this adds some coverage if say the device is 
not configured with swap. i.e assert that the pages don't magically 
disappear, and that their contents still persist etc.


Happy to make it skip instead though?


So reducing it to a basic shrinker test in that case. Hm.. do you know 
if we have a non THP specific tests for that already somewhere in 
selftests (I can't spot any), or just in IGT?


If we indeed don't have it in selftests, then I guess question is 
whether it is warranted to "hide" such a basic test in the THP "drawer", 
or instead adding a generic shrinker test should be considered. (And one 
could then follow with a question should a basic generic test have a THP 
sub-test.)


It's hard to say where the boundary for selftests-vs-IGT coverage should 
be in this case. I mean would it be warranted to add such a generic 
shrinker selftest. It is mostly testable from userspace, but kernel can 
do a few more introspections and sanity checks at cost of growing kernel 
code.


Regards,

Tvrtko





Regards,

Tvrtko


+    pr_err("unexpected pages mismatch, should_swap=%s\n",
+   yesno(should_swap));
  err = -EINVAL;
  goto out_put;
  }
-    if (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys) {
-    pr_err("residual page-size bits left\n");
+    if (should_swap == (obj->mm.page_sizes.sg || 
obj->mm.page_sizes.phys)) {

+    pr_err("unexpected residual page-size bits, should_swap=%s\n",
+   yesno(should_swap));
  err = -EINVAL;
  goto out_put;
  }



Re: [Intel-gfx] [PATCH] drm/i915/selftests: fixup igt_shrink_thp

2021-09-06 Thread Matthew Auld

On 06/09/2021 13:19, Tvrtko Ursulin wrote:


On 06/09/2021 10:17, Matthew Auld wrote:

Since the object might still be active here, the shrink_all will simply
ignore it, which blows up in the test, since the pages will still be
there. Currently THP is disabled which should result in the test being
skipped, but if we ever re-enable THP we might start seeing the failure.
Fix this by forcing I915_SHRINK_ACTIVE.

v2: Some machine in the shard runs doesn't seem to have any available
swap when running this test. Try to handle this.

Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Reviewed-by: Tvrtko Ursulin  #v1
---
  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 31 ++-
  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c

index a094f3ce1a90..46ea1997c114 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1519,6 +1519,7 @@ static int igt_shrink_thp(void *arg)
  struct i915_vma *vma;
  unsigned int flags = PIN_USER;
  unsigned int n;
+    bool should_swap;
  int err = 0;
  /*
@@ -1567,23 +1568,39 @@ static int igt_shrink_thp(void *arg)
  break;
  }
  i915_gem_context_unlock_engines(ctx);
+    /*
+ * Nuke everything *before* we unpin the pages so we can be 
reasonably
+ * sure that when later checking get_nr_swap_pages() that some 
random

+ * leftover object doesn't steal the remaining swap space.
+ */
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
  i915_vma_unpin(vma);
  if (err)
  goto out_put;
+
  /*
- * Now that the pages are *unpinned* shrink-all should invoke
- * shmem to truncate our pages.
+ * Now that the pages are *unpinned* shrinking should invoke
+ * shmem to truncate our pages, if we have available swap.
   */
-    i915_gem_shrink_all(i915);
-    if (i915_gem_object_has_pages(obj)) {
-    pr_err("shrink-all didn't truncate the pages\n");
+    should_swap = get_nr_swap_pages() > 0;
+    i915_gem_shrink(NULL, i915, -1UL, NULL,
+    I915_SHRINK_BOUND |
+    I915_SHRINK_UNBOUND |
+    I915_SHRINK_ACTIVE);
+    if (should_swap == i915_gem_object_has_pages(obj)) {


Hmm is there any value running the test if no swap (given objects used 
by the test are "willneed"), or you could simplify and just do early skip?


Maybe. My thinking was that this adds some coverage if say the device is 
not configured with swap. i.e assert that the pages don't magically 
disappear, and that their contents still persist etc.


Happy to make it skip instead though?



Regards,

Tvrtko


+    pr_err("unexpected pages mismatch, should_swap=%s\n",
+   yesno(should_swap));
  err = -EINVAL;
  goto out_put;
  }
-    if (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys) {
-    pr_err("residual page-size bits left\n");
+    if (should_swap == (obj->mm.page_sizes.sg || 
obj->mm.page_sizes.phys)) {

+    pr_err("unexpected residual page-size bits, should_swap=%s\n",
+   yesno(should_swap));
  err = -EINVAL;
  goto out_put;
  }



Re: [PATCH] doc: gpu: Add document describing buffer exchange

2021-09-06 Thread Simon Ser
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
>
> This ties up all of dmabuf, formats and modifiers, and their usage.
>
> Signed-off-by: Daniel Stone 

Thanks a lot for this write-up! This looks very good to me, a few comments
below.

> ---
>
> This is just a quick first draft, inspired by:
>   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
>
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.
>
>
>  .../gpu/exchanging-pixel-buffers.rst  | 285 ++
>  Documentation/gpu/index.rst   |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
>
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst 
> b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index ..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display 
> devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and 
> advice.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the data 
> which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should 
> be

RST uses double backticks for inline code blocks (applies to the whole 
document).

> +reused wherever possible, as they are the standard descriptions used for
> +interchange.

Maybe mention that the canonical source of formats and modifiers can be found
in include/uapi/drm/drm_fourcc.h.

> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are

Pekka uses the term "color value", which I find a bit better than repeating
"data".

> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel 
> has a
> +single 32-bit value in memory. Alpha, red, green, and blue color channels 
> are
> +available at 8-bit precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. As a more complex example,
> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> +stored in separate memory planes, where the chroma plane is stored at half 
> the
> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 
> 2x2
> +pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel 
> memory
> +samples, and the actual memory storage for the buffer. The most 
> straightforward
> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel 
> has
> +contiguous storage beginning at (0,0); each pixel's location in memory will 
> be
> +`base + (y * stride) + (x * bpp)`. This is considered the baseline 
> interchange
> +format, and most convenient for CPU access.

Hm, maybe in more simple terms we could explain that the pixels are stored
sequentially row-by-row from the top-left corner to the bottom-right one?

Maybe we can drop the "base" from the formula and say that each pixel's
location in memory will be at offset `y * stride + x * bpp`? Or maybe this is
confusing with offset being mentioned below as an additional parameter?
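
For what it's worth, a tiny sketch of that formula (illustrative only,
assuming a linear ARGB8888 buffer and a stride given in bytes):

/* Byte offset of pixel (x, y) in a DRM_FORMAT_MOD_LINEAR ARGB8888 buffer. */
static inline size_t linear_pixel_offset(unsigned int x, unsigned int y,
					 unsigned int stride)
{
	return (size_t)y * stride + (size_t)x * 4; /* 4 bytes per pixel */
}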

> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile 
> in
> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> +stores pixels (4,0) to (7,3) i

Re: [Intel-gfx] [PATCH v7 3/8] i915/gvt: use DEFINE_DYNAMIC_DEBUG_CATEGORIES to create "gvt:core:" etc categories

2021-09-06 Thread Tvrtko Ursulin



On 03/09/2021 20:22, jim.cro...@gmail.com wrote:

On Fri, Sep 3, 2021 at 5:07 AM Tvrtko Ursulin
 wrote:



On 31/08/2021 21:21, Jim Cromie wrote:

The gvt component of this driver has ~120 pr_debugs, in 9 categories
quite similar to those in DRM.  Following the interface model of
drm.debug, add a parameter to map bits to these categorizations.

DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
   "dyndbg bitmap desc",
   { "gvt:cmd: ",  "command processing" },
   { "gvt:core: ", "core help" },
   { "gvt:dpy: ",  "display help" },
   { "gvt:el: ",   "help" },
   { "gvt:irq: ",  "help" },
   { "gvt:mm: ",   "help" },
   { "gvt:mmio: ", "help" },
   { "gvt:render: ", "help" },
   { "gvt:sched: " "help" });



BTW, I've dropped the help field; it's already handled, no need to clutter.



The actual patch differs in a few details; the cmd_help() macro emits
the initialization construct.

If CONFIG_DRM_USE_DYNAMIC_DEBUG is set, then -DDYNAMIC_DEBUG_MODULE is added
to cflags by gvt/Makefile.

Signed-off-by: Jim Cromie 
---
v5:
. static decl of vector of bit->class descriptors - Emil.V
. relocate gvt-makefile chunk from elsewhere
v7:
. move ccflags addition up to i915/Makefile from i915/gvt
---
   drivers/gpu/drm/i915/Makefile  |  4 
   drivers/gpu/drm/i915/i915_params.c | 35 ++


Can this work if put under gvt/ or at least intel_gvt.h|c?



I thought it belonged here more, at least according to the name of the
config.var


Hmm bear with me please - the categories this patch creates are intended 
to be used explicitly from the GVT "sub-module", or they somehow even 
get automatically used with no further intervention to callers required?



CONFIG_DRM_USE_DYNAMIC_DEBUG.

I suppose it's not a great name; its narrow purpose is to swap the
drm-debug api over to dyndbg.  drm-everything already "uses"
dyndbg if CONFIG_DYNAMIC_DEBUG=y, those gvt/pr_debugs in particular.

Theres also CONFIG_DYNAMIC_DEBUG_CORE=y,
which drm basically ignores currently.

So with the name CONFIG_DRM_USE_DYNAMIC_DEBUG
it seemed proper to arrange for that  to be true on DD-CORE=y builds,
by adding -DDYNAMIC_DEBUG_MODULE

Does that make some sense ?
How to best resolve the frictions ?
new CONFIG names ?


   2 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 4f22cac1c49b..5a4e371a3ec2 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -30,6 +30,10 @@ CFLAGS_display/intel_fbdev.o = $(call cc-disable-warning, 
override-init)

   subdir-ccflags-y += -I$(srctree)/$(src)

+#ifdef CONFIG_DRM_USE_DYNAMIC_DEBUG
+ccflags-y += -DDYNAMIC_DEBUG_MODULE
+#endif


Ignores whether CONFIG_DRM_I915_GVT is enabled or not?



Not intentionally.
I think there's 2 things you're noting:

1 - make frag into gvt/Makefile
I had it there earlier, not sure why I moved it up.
maybe some confusion on proper scope of the flag.


2 - move new declaration code in i915-param.c inside the gvt ifdef

Im good with that.
I'll probably copy the ifdef wrapper down rather than move the decl up.
ie:

#if __and(IS_ENABLED(CONFIG_DRM_I915_GVT),
   IS_ENABLED(CONFIG_DRM_USE_DYNAMIC_DEBUG))

unsigned long __gvt_debug;
EXPORT_SYMBOL(__gvt_debug);



+
   # Please keep these build lists sorted!

   # core driver code
diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index e07f4cfea63a..e645e149485e 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -265,3 +265,38 @@ void i915_params_free(struct i915_params *params)
   I915_PARAMS_FOR_EACH(FREE);
   #undef FREE
   }
+
+#ifdef CONFIG_DRM_USE_DYNAMIC_DEBUG
+/* todo: needs DYNAMIC_DEBUG_MODULE in some cases */
+
+unsigned long __gvt_debug;
+EXPORT_SYMBOL(__gvt_debug);
+
+#define _help(key)   "\t\"" key "\"\t: help for " key "\n"
+
+#define I915_GVT_CATEGORIES(name) \
+ " Enable debug output via /sys/module/i915/parameters/" #name   \
+ ", where each bit enables a debug category.\n"  \
+ _help("gvt:cmd:")   \
+ _help("gvt:core:")  \
+ _help("gvt:dpy:")   \
+ _help("gvt:el:")\
+ _help("gvt:irq:")   \
+ _help("gvt:mm:")\
+ _help("gvt:mmio:")  \
+ _help("gvt:render:")\
+ _help("gvt:sched:")
+
+DEFINE_DYNAMIC_DEBUG_CATEGORIES(debug_gvt, __gvt_debug,
+ I915_GVT_CATEGORIES(debug_gvt),
+ _DD_cat_("gvt:cmd:"),
+ _DD_cat_("gvt:core:"),
+ _DD_cat_("gvt:dpy:"),
+ _

Re: [Intel-gfx] [PATCH] drm/i915/selftests: fixup igt_shrink_thp

2021-09-06 Thread Tvrtko Ursulin



On 06/09/2021 10:17, Matthew Auld wrote:

Since the object might still be active here, the shrink_all will simply
ignore it, which blows up in the test, since the pages will still be
there. Currently THP is disabled which should result in the test being
skipped, but if we ever re-enable THP we might start seeing the failure.
Fix this by forcing I915_SHRINK_ACTIVE.

v2: Some machine in the shard runs doesn't seem to have any available
swap when running this test. Try to handle this.

Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Reviewed-by: Tvrtko Ursulin  #v1
---
  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 31 ++-
  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index a094f3ce1a90..46ea1997c114 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1519,6 +1519,7 @@ static int igt_shrink_thp(void *arg)
struct i915_vma *vma;
unsigned int flags = PIN_USER;
unsigned int n;
+   bool should_swap;
int err = 0;
  
  	/*

@@ -1567,23 +1568,39 @@ static int igt_shrink_thp(void *arg)
break;
}
i915_gem_context_unlock_engines(ctx);
+   /*
+* Nuke everything *before* we unpin the pages so we can be reasonably
+* sure that when later checking get_nr_swap_pages() that some random
+* leftover object doesn't steal the remaining swap space.
+*/
+   i915_gem_shrink(NULL, i915, -1UL, NULL,
+   I915_SHRINK_BOUND |
+   I915_SHRINK_UNBOUND |
+   I915_SHRINK_ACTIVE);
i915_vma_unpin(vma);
if (err)
goto out_put;
  
+

/*
-* Now that the pages are *unpinned* shrink-all should invoke
-* shmem to truncate our pages.
+* Now that the pages are *unpinned* shrinking should invoke
+* shmem to truncate our pages, if we have available swap.
 */
-   i915_gem_shrink_all(i915);
-   if (i915_gem_object_has_pages(obj)) {
-   pr_err("shrink-all didn't truncate the pages\n");
+   should_swap = get_nr_swap_pages() > 0;
+   i915_gem_shrink(NULL, i915, -1UL, NULL,
+   I915_SHRINK_BOUND |
+   I915_SHRINK_UNBOUND |
+   I915_SHRINK_ACTIVE);
+   if (should_swap == i915_gem_object_has_pages(obj)) {


Hmm, is there any value in running the test if there is no swap (given the
objects used by the test are "willneed")? Or you could simplify and just do an
early skip?
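
Something like this at the top of the test, say (a rough sketch only, reusing
the get_nr_swap_pages() check the patch already relies on; the message text is
made up):

	if (!get_nr_swap_pages()) {
		pr_info("no swap available, skipping\n");
		return 0;	/* nothing left to verify without swap */
	}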


Regards,

Tvrtko


+   pr_err("unexpected pages mismatch, should_swap=%s\n",
+  yesno(should_swap));
err = -EINVAL;
goto out_put;
}
  
-	if (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys) {

-   pr_err("residual page-size bits left\n");
+   if (should_swap == (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys)) {
+   pr_err("unexpected residual page-size bits, should_swap=%s\n",
+  yesno(should_swap));
err = -EINVAL;
goto out_put;
}



Re: [PATCH v2 1/2] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-06 Thread Christian König
Which branch is this patch based on? Please rebase on top of drm-misc-fixes
and resend.


Thanks,
Christian.

On 06.09.21 at 03:12, xinhui pan wrote:

The ret value might be -EBUSY, so the caller will think the lru lock is still
locked while it actually is NOT. Return -ENOSPC instead, otherwise we hit
list corruption.

ttm_bo_cleanup_refs might fail too if the BO is not idle. If we return 0, the
caller (ttm_tt_populate -> ttm_global_swapout -> ttm_device_swapout) will get
stuck as we actually did not free any BO memory. This usually happens when
the fence is not signaled for a long time.

Signed-off-by: xinhui pan 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 1fedd0eb67ba..f1367107925b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1159,9 +1159,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
}
  
  	if (bo->deleted) {

-   ttm_bo_cleanup_refs(bo, false, false, locked);
+   ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
-   return 0;
+   return ret == -EBUSY ? -ENOSPC : ret;
}
  
  	ttm_bo_move_to_pinned(bo);

@@ -1215,7 +1215,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
if (locked)
dma_resv_unlock(bo->base.resv);
ttm_bo_put(bo);
-   return ret;
+   return ret == -EBUSY ? -ENOSPC : ret;
  }
  
  void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)




Re: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-06 Thread Christian König

On 06.09.21 at 12:16, Pan, Xinhui wrote:

On 6 Sep 2021, at 17:04, Christian König wrote:



On 06.09.21 at 03:12, xinhui pan wrote:

A long time ago, someone reported that the system got hung during a memory
test. In recent days, I have been trying to find and understand the potential
deadlock in the ttm/amdgpu code.

This patchset aims to fix the deadlock during ttm populate.

TTM has a parameter called pages_limit; when allocated GTT memory reaches
this limit, swapout is triggered. As ttm_bo_swapout does not return the
correct retval, populate might get hung.

The UVD ib test uses GTT, which might be insufficient. So a gpu recovery
would hang if populate hangs.

Ah, now I understand what you are trying to do.

Problem is that won't work either. Allocating VRAM can easily land you inside 
the same deadlock.

We need to avoid the allocation altogether for this to work correctly.

Looks like we need to reserve some pages at sw init.


Yeah, something like that should do it.

But keep in mind that you then need a lock or similar when using the 
resource to prevent concurrent use.
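
Roughly what that could look like for the UVD ib test, sketched with made-up
field names (ib_test_bo/ib_test_msg/ib_test_lock are not existing members):
reserve the buffer once at sw_init, then only lock it around the test so
recovery never has to allocate:

	/* at sw_init time, before any memory pressure can bite */
	r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
				    AMDGPU_GEM_DOMAIN_GTT,
				    &adev->uvd.ib_test_bo, NULL,
				    (void **)&adev->uvd.ib_test_msg);
	if (r)
		return r;
	mutex_init(&adev->uvd.ib_test_lock);

	/* later, in the ib test itself */
	mutex_lock(&adev->uvd.ib_test_lock);
	/* ... fill adev->uvd.ib_test_msg, submit, wait for the fence ... */
	mutex_unlock(&adev->uvd.ib_test_lock);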





I have made one drm test which alloc two GTT BOs, submit gfx copy
commands and free these BOs without waiting fence. What's more, these
gfx copy commands will cause gfx ring hang. So gpu recovery would be
triggered.

Mhm, that should never be possible. It is perfectly valid for an application
to terminate without waiting for the GFX submission to be completed.

The gfx ring hangs because the command is illegal.
The packet is COMMAND [30:21] | BYTE_COUNT [20:0];
I use 0xFF << 20 to hang the ring on purpose.


Ok that makes more sense.

Thanks,
Christian.




Going to push patch #1 to drm-misc-fixes or drm-misc-next-fixes in a moment.

Thanks,
Christian.


Now here is one possible deadlock case.
gpu_recovery
  -> stop drm scheduler
  -> asic reset
-> ib test
   -> tt populate (uvd ib test)
->  ttm_bo_swapout (BO A) // this always fails as the fence of
BO A would not be signaled by the scheduler or HW. Hit deadlock.

I paste the drm test patch below.
#modprobe ttm pages_limit=65536
#amdgpu_test -s 1 -t 4
---
  tests/amdgpu/basic_tests.c | 32 ++--
  1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index dbf02fee..f85ed340 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -65,13 +65,16 @@ static void amdgpu_direct_gma_test(void);
  static void amdgpu_command_submission_write_linear_helper(unsigned ip_type);
  static void amdgpu_command_submission_const_fill_helper(unsigned ip_type);
  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type);
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request);
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat);
   +#define amdgpu_test_exec_cs_helper(...) \
+   _amdgpu_test_exec_cs_helper(__VA_ARGS__, 1, 1)
+
  CU_TestInfo basic_tests[] = {
{ "Query Info Test",  amdgpu_query_info_test },
{ "Userptr Test",  amdgpu_userptr_test },
@@ -1341,12 +1344,12 @@ static void amdgpu_command_submission_compute(void)
   * pm4_src, resources, ib_info, and ibs_request
   * submit command stream described in ibs_request and wait for this IB 
accomplished
   */
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request)
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat)
  {
int r;
uint32_t expired;
@@ -1395,12 +1398,15 @@ static void 
amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
CU_ASSERT_NOT_EQUAL(ibs_request, NULL);
/* submit CS */
-   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
+   while (repeat--)
+   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
CU_ASSERT_EQUAL(r, 0);
r = amdgpu_bo_list_destroy(ibs_request->resources);
CU_ASSERT_EQUAL(r, 0);
  + if (!sync)
+   return;
fence_status.ip_type 

Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-09-06 Thread Amit Pundir
On Sat, 4 Sept 2021 at 01:55, Rob Clark  wrote:
>
> On Fri, Sep 3, 2021 at 12:39 PM John Stultz  wrote:
> >
> > On Thu, Jul 29, 2021 at 1:49 PM Rob Clark  wrote:
> > > On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> > >  wrote:
> > > > On 29/07/2021 21:24, Rob Clark wrote:
> > > > > On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> > > > >  wrote:
> > > > >>
> > > > >> Hi Rob,
> > > > >>
> > > > >> I've done some more testing! It looks like before that patch 
> > > > >> ("drm/msm: Devfreq tuning") the GPU would never get above
> > > > >> the second frequency in the OPP table (342MHz) (at least, not in 
> > > > >> glxgears). With the patch applied it would more
> > > > >> aggressively jump up to the max frequency which seems to be unstable 
> > > > >> at the default regulator voltages.
> > > > >
> > > > > *ohh*, yeah, ok, that would explain it
> > > > >
> > > > >> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 
> > > > >> 0.988v (instead of the stock 0.516v) makes the GPU stable
> > > > >> at the higher frequencies.
> > > > >>
> > > > >> Applying this patch reverts the behaviour, and the GPU never goes 
> > > > >> above 342MHz in glxgears, losing ~30% performance in
> > > > >> glxgear.
> > > > >>
> > > > >> I think (?) that enabling CPR support would be the proper solution 
> > > > >> to this - that would ensure that the regulators run
> > > > >> at the voltage the hardware needs to be stable.
> > > > >>
> > > > >> Is hacking the voltage higher (although ideally not quite that high) 
> > > > >> an acceptable short term solution until we have
> > > > >> CPR? Or would it be safer to just not make use of the higher 
> > > > >> frequencies on a630 for now?
> > > > >>
> > > > >
> > > > > tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is already
> > > > > on CC and I added sboyd, maybe one of them knows better.
> > > > >
> > > > > In the short term, removing the higher problematic OPPs from dts might
> > > > > be a better option than this patch (which I'm dropping), since there
> > > > > is nothing stopping other workloads from hitting higher OPPs.
> > > > Oh yeah that sounds like a more sensible workaround than mine .
> > > > >
> > > > > I'm slightly curious why I didn't have problems at higher OPPs on my
> > > > > c630 laptop (sdm850)
> > > > Perhaps you won the silicon lottery - iirc sdm850 is binned for higher
> > > > clocks out of the factory.
> > > >
> > > > Would it be best to drop the OPPs for all devices? Or just those 
> > > > affected? I guess it's possible another c630 might
> > > > crash where yours doesn't?
> > >
> > > I've not heard any reports of similar issues from the handful of other
> > > folks with c630's on #aarch64-laptops.. but I can't really say if that
> > > is luck or not.
> > >
> > > Maybe just remove it for affected devices?  But I'll defer to Bjorn.
> >
> > Just as another datapoint, I was just marveling at how suddenly smooth
> > the UI was performing on db845c and Caleb pointed me at the "drm/msm:
> > Devfreq tuning" patch as the likely cause of the improvement, and
> > mid-discussion my board crashed into USB crash mode:
> > [  146.157696][C0] adreno 500.gpu: CP | AHB bus error
> > [  146.163303][C0] adreno 500.gpu: CP | AHB bus error
> > [  146.168837][C0] adreno 500.gpu: RBBM | ATB bus overflow
> > [  146.174960][C0] adreno 500.gpu: CP | HW fault | status=0x
> > [  146.181917][C0] adreno 500.gpu: CP | AHB bus error
> > [  146.187547][C0] adreno 500.gpu: CP illegal instruction error
> > [  146.194009][C0] adreno 500.gpu: CP | AHB bus error
> > [  146.308909][T9] Internal error: synchronous external abort:
> > 9610 [#1] PREEMPT SMP
> > [  146.317150][T9] Modules linked in:
> > [  146.320941][T9] CPU: 3 PID: 9 Comm: kworker/u16:1 Tainted: G
> > W 5.14.0-mainline-06795-g42b258c2275c #24
> > [  146.331974][T9] Hardware name: Thundercomm Dragonboar
> > Format: Log Type - Time(microsec) - Message - Optional Info
> > Log Type: B - Since Boot(Power On Reset),  D - Delta,  S - Statistic
> > S - QC_IMAGE_VERSION_STRING=BOOT.XF.2.0-00371-SDM845LZB-1
> > S - IMAGE_VARIANT_STRING=SDM845LA
> > S - OEM_IMAGE_VERSION_STRING=TSBJ-FA-PC-02170
> >
> > So Caleb sent me to this thread. :)
> >
> > I'm still trying to trip it again, but it does seem like db845c is
> > also seeing some stability issues with Linus' HEAD.
> >
>
> Caleb's original pastebin seems to have expired (or at least require
> some sort of ubuntu login to access).. were the crashes he was seeing
> also 'AHB bus error'?

I can reproduce this hard crash
https://www.irccloud.com/pastebin/Cu6UJntE/ and a gpu lockup
https://www.irccloud.com/pastebin/6Ryd2Pug/ at times reliably, by
running antutu benchmark on pocof1.

Reverting 9bc95570175a ("drm/msm: Devfreq tuning") helps and I no
longer see these errors.

Complete dmesg for hardcrash https://pastebin.com/raw/GLZVQFQN

Regards,
Amit Pundir

>
> If you have

Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug

2021-09-06 Thread Tvrtko Ursulin



On 03/09/2021 22:57, jim.cro...@gmail.com wrote:

On Fri, Sep 3, 2021 at 5:15 AM Tvrtko Ursulin
 wrote:



On 31/08/2021 21:21, Jim Cromie wrote:

drm's debug system writes 10 distinct categories of messages to syslog
using a small API[1]: drm_dbg*(10 names), DRM_DEV_DEBUG*(3 names),
DRM_DEBUG*(8 names).  There are thousands of these callsites, each
categorized in this systematized way.

These callsites can be enabled at runtime by their category, each
controlled by a bit in drm.debug (/sys/modules/drm/parameter/debug).
In the current "basic" implementation, drm_debug_enabled() tests these
bits in __drm_debug each time an API[1] call is executed; while cheap
individually, the costs accumulate with uptime.

This patch uses dynamic-debug with jump-label to patch enabled calls
onto their respective NOOP slots, avoiding all runtime bit-checks of
__drm_debug by drm_debug_enabled().

Dynamic debug has no concept of category, but we can emulate one by
replacing enum categories with a set of prefix-strings; "drm:core:",
"drm:kms:" "drm:driver:" etc, and prepend them (at compile time) to
the given formats.

Then we can use:
`echo module drm format "^drm:core: " +p > control`

to enable the whole category with one query.


Probably stupid question - enabling stuff at boot time still works as
described in Documentation/admin-guide/dynamic-debug-howto.rst?



Yes. It's turned on in early init, and cmdline args are processed then,
and again when modules are added.
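
(So e.g. booting with drm.dyndbg="+p" on the kernel command line, or a
format-prefix query like the one in the cover letter, should light the
callsites up before userspace is even running; the exact boot-arg quoting is
per dynamic-debug-howto.rst, so treat that spelling as approximate.)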



Second question, which perhaps has been covered in the past so apologies
if redundant - what is the advantage of allowing this to be
configurable, versus perhaps always enabling it? Like what would be the
reasons someone wouldn't just want to have CONFIG_DYNAMIC_DEBUG compiled
in? Kernel binary size?



I'm unaware of anything on this topic, but I can opine :-)
It's been configurable since I saw it and thought "jump-labels are cool!"

The code is small:
[jimc@frodo local-i915m]$ size lib/dynamic_debug.o
   text    data     bss     dec     hex filename
  24016    8041      64   32121    7d79 lib/dynamic_debug.o

Its data tables are big, particularly the __dyndbg section.
For the builtins:
dyndbg: 108 debug prints in module mptcp
dyndbg:   2 debug prints in module i386
dyndbg:   2 debug prints in module xen
dyndbg:   2 debug prints in module fixup
dyndbg:   7 debug prints in module irq
dyndbg: 3039 prdebugs in 283 modules, 11 KiB in ddebug tables, 166 kiB
in __dyndbg section

bash-5.1#
bash-5.1# for m in i915 amdgpu ; do modprobe $m dyndbg=+_ ; done
dyndbg: 384 debug prints in module drm
dyndbg: 211 debug prints in module drm_kms_helper
dyndbg:   2 debug prints in module ttm
dyndbg:   8 debug prints in module video
dyndbg: 1727 debug prints in module i915
dyndbg: processed 1 queries, with 3852 matches, 0 errs
dyndbg: 3852 debug prints in module amdgpu
[drm] amdgpu kernel modesetting enabled.
amdgpu: CRAT table disabled by module option
amdgpu: Virtual CRAT table created for CPU
amdgpu: Topology: Add CPU node
bash-5.1#

At 56 bytes / callsite, it adds up.
And teaching DRM to use it enlarges its use dramatically,
not just in drm itself, but in many drivers.

amdgpu has 3852 callsites (vs 3039 in my kernel), so it has ~240 kB.
It has extra (large chunks generated by macros) to trim,
but i915 has ~1700, and drm has ~380.

I have WIP to reduce the table space by splitting it into 2 separate ones:
guts and decorations (module, function, file pointers).
The decoration recs are redundant, 45% are copies of the previous one
(the function changes fastest).
It needs much rework, but it should get ~20% overall;
decorations are 24/56 of the footprint.
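
For reference, those numbers line up with the per-callsite record, roughly as
it stands in include/linux/dynamic_debug.h (paraphrased from memory, 64-bit
sizes, so take the exact layout with a grain of salt):

	struct _ddebug {
		/* the "decorations", largely repeated between neighbours */
		const char *modname;			/*  8 bytes */
		const char *function;			/*  8 bytes */
		const char *filename;			/*  8 bytes */
		/* the "guts", unique per callsite */
		const char *format;			/*  8 bytes */
		unsigned int lineno:18;
		unsigned int flags:8;			/*  4 bytes + padding */
	#if defined(CONFIG_JUMP_LABEL)
		union {
			struct static_key_true dd_key_true;
			struct static_key_false dd_key_false;
		} key;					/* 16 bytes */
	#endif
	} __attribute__((aligned(8)));			/* ~56 bytes total */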


I'll try to extract the "executive summary" from this; tell me if I
got it right.


So using or not using dynamic debug for DRM debug ends up being about 
shifting the cost between kernel binary size (data section grows by each 
pr_debug call site) and runtime conditionals?
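
Schematically (descriptor being the per-callsite struct _ddebug that the
dynamic-debug machinery emits; this is a sketch of the two paths, not the
series' exact expansion):

	/* current drm-debug path: a load-and-test on every call */
	if (drm_debug_enabled(DRM_UT_KMS))	/* tests a bit in __drm_debug */
		__drm_dbg(DRM_UT_KMS, "...");

	/* dyndbg + jump-label path: a patched NOP until the callsite is
	 * enabled via dynamic_debug/control, no runtime bit test */
	if (DYNAMIC_DEBUG_BRANCH(descriptor))
		__dynamic_pr_debug(&descriptor, "...");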


Since the table sizes you mention seem significant enough, I think that
justifies the existence of DRM_USE_DYNAMIC_DEBUG. It would probably be a
good idea to put some commentary on that in the Kconfig help text, ideally
including some rough estimates of both the space cost per call site and the
space cost for a typical distro kernel build.


Regards,

Tvrtko


Re: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-06 Thread Pan, Xinhui


> On 6 Sep 2021, at 17:04, Christian König wrote:
> 
> 
> 
> On 06.09.21 at 03:12, xinhui pan wrote:
>> A long time ago, someone reported that the system got hung during a memory
>> test. In recent days, I have been trying to find and understand the
>> potential deadlock in the ttm/amdgpu code.
>> 
>> This patchset aims to fix the deadlock during ttm populate.
>> 
>> TTM has a parameter called pages_limit; when allocated GTT memory reaches
>> this limit, swapout is triggered. As ttm_bo_swapout does not return the
>> correct retval, populate might get hung.
>> 
>> The UVD ib test uses GTT, which might be insufficient. So a gpu recovery
>> would hang if populate hangs.
> 
> Ah, now I understand what you are trying to do.
> 
> Problem is that won't work either. Allocating VRAM can easily land you inside 
> the same deadlock.
> 
> We need to avoid the allocation altogether for this to work correctly.

Looks like we need to reserve some pages at sw init.

> 
>> 
>> I have made one drm test which alloc two GTT BOs, submit gfx copy
>> commands and free these BOs without waiting fence. What's more, these
>> gfx copy commands will cause gfx ring hang. So gpu recovery would be
>> triggered.
> 
> Mhm, that should never be possible. It is perfectly valid for an application
> to terminate without waiting for the GFX submission to be completed.

The gfx ring hangs because the command is illegal.
The packet is COMMAND [30:21] | BYTE_COUNT [20:0];
I use 0xFF << 20 to hang the ring on purpose.

> 
> Going to push patch #1 to drm-misc-fixes or drm-misc-next-fixes in a moment.
> 
> Thanks,
> Christian.
> 
>> 
>> Now here is one possible deadlock case.
>> gpu_recovery
>>  -> stop drm scheduler
>>  -> asic reset
>>-> ib test
>>   -> tt populate (uvd ib test)
>>  ->  ttm_bo_swapout (BO A) // this always fails as the fence of
>>  BO A would not be signaled by the scheduler or HW. Hit deadlock.
>> 
>> I paste the drm test patch below.
>> #modprobe ttm pages_limit=65536
>> #amdgpu_test -s 1 -t 4
>> ---
>>  tests/amdgpu/basic_tests.c | 32 ++--
>>  1 file changed, 14 insertions(+), 18 deletions(-)
>> 
>> diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
>> index dbf02fee..f85ed340 100644
>> --- a/tests/amdgpu/basic_tests.c
>> +++ b/tests/amdgpu/basic_tests.c
>> @@ -65,13 +65,16 @@ static void amdgpu_direct_gma_test(void);
>>  static void amdgpu_command_submission_write_linear_helper(unsigned ip_type);
>>  static void amdgpu_command_submission_const_fill_helper(unsigned ip_type);
>>  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type);
>> -static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
>> +static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle 
>> context_handle,
>> unsigned ip_type,
>> int instance, int pm4_dw, uint32_t 
>> *pm4_src,
>> int res_cnt, amdgpu_bo_handle *resources,
>> struct amdgpu_cs_ib_info *ib_info,
>> -   struct amdgpu_cs_request *ibs_request);
>> +   struct amdgpu_cs_request *ibs_request, 
>> int sync, int repeat);
>>   +#define amdgpu_test_exec_cs_helper(...) \
>> +_amdgpu_test_exec_cs_helper(__VA_ARGS__, 1, 1)
>> +
>>  CU_TestInfo basic_tests[] = {
>>  { "Query Info Test",  amdgpu_query_info_test },
>>  { "Userptr Test",  amdgpu_userptr_test },
>> @@ -1341,12 +1344,12 @@ static void amdgpu_command_submission_compute(void)
>>   * pm4_src, resources, ib_info, and ibs_request
>>   * submit command stream described in ibs_request and wait for this IB 
>> accomplished
>>   */
>> -static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
>> +static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle 
>> context_handle,
>> unsigned ip_type,
>> int instance, int pm4_dw, uint32_t 
>> *pm4_src,
>> int res_cnt, amdgpu_bo_handle *resources,
>> struct amdgpu_cs_ib_info *ib_info,
>> -   struct amdgpu_cs_request *ibs_request)
>> +   struct amdgpu_cs_request *ibs_request, 
>> int sync, int repeat)
>>  {
>>  int r;
>>  uint32_t expired;
>> @@ -1395,12 +1398,15 @@ static void 
>> amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
>>  CU_ASSERT_NOT_EQUAL(ibs_request, NULL);
>>  /* submit CS */
>> -r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
>> +while (repeat--)
>> +r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
>>  CU_ASSERT_EQUAL(r, 0);
>>  r = amdgpu_bo_list_destroy(ibs_request->resources);
>>  CU_ASSERT_EQUAL(r, 0);
>>  +   if (!sync)
>> +return;
>>  fence_status.ip_type = ip_type;
>>  fence_statu

Re: [PATCH v3 03/16] drm/edid: Allow the querying/working with the panel ID from the EDID

2021-09-06 Thread Jani Nikula
On Wed, 01 Sep 2021, Douglas Anderson  wrote:
> EDIDs have 32-bits worth of data which is intended to be used to
> uniquely identify the make/model of a panel. This has historically
> been used only internally in the EDID processing code to identify
> quirks with panels.
>
> We'd like to use this panel ID in panel-simple to identify which panel
> is hooked up and from that information figure out power sequence
> timings. Let's expose this information from the EDID code and also
> allow it to be accessed early, before a connector has been created.
>
> To make matching in the panel-simple code easier, we'll return the
> panel ID as a 32-bit value. We'll provide some functions for
> converting this value back and forth to something more human readable.
>
> Signed-off-by: Douglas Anderson 
> ---
>
> Changes in v3:
> - Decode hex product ID w/ same endianness as everyone else.
>
>  drivers/gpu/drm/drm_edid.c | 59 ++
>  include/drm/drm_edid.h | 47 ++
>  2 files changed, 106 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index a22c38482a90..ac128bc3478a 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -2086,6 +2086,65 @@ struct edid *drm_get_edid(struct drm_connector 
> *connector,
>  }
>  EXPORT_SYMBOL(drm_get_edid);
>  
> +/**
> + * drm_get_panel_id - Get a panel's ID through DDC
> + * @adapter: I2C adapter to use for DDC
> + *
> + * This function reads the first block of the EDID of a panel and (assuming
> + * that the EDID is valid) extracts the ID out of it. The ID is a 32-bit 
> value
> + * (16 bits of manufacturer ID and 16 bits of per-manufacturer ID) that's
> + * supposed to be different for each different model of panel.
> + *
> + * This function is intended to be used during early probing on devices where
> + * more than one panel might be present. Because of its intended use it must
> + * assume that the EDID of the panel is correct, at least as far as the ID
> + * is concerned (in other words, we don't process any overrides here).
> + *
> + * NOTE: it's expected that this function and drm_do_get_edid() will both
> + * read the EDID, but there is no caching between them. Since we're only
> + * reading the first block, hopefully this extra overhead won't be too big.
> + *
> + * Return: A 32-bit ID that should be different for each make/model of panel.
> + * See the functions encode_edid_id() and decode_edid_id() for some
> + * details on the structure of this ID.
> + */
> +u32 drm_get_panel_id(struct i2c_adapter *adapter)

Please call it drm_edid_get_panel_id() because that's what it is, and
this is in drm_edid.[ch].

> +{
> + struct edid *edid;
> + u32 val;
> +
> + edid = drm_do_get_edid_blk0(drm_do_probe_ddc_edid, adapter, NULL, NULL);
> +
> + /*
> +  * There are no manufacturer IDs of 0, so if there is a problem reading
> +  * the EDID then we'll just return 0.
> +  */
> + if (IS_ERR_OR_NULL(edid))
> + return 0;
> +
> + /*
> +  * In theory we could try to de-obfuscate this like edid_get_quirks()
> +  * does, but it's easier to just deal with a 32-bit number.

Hmm, but is it, really? AFAICT this is just an internal representation
for a table, where it could just as well be stored in a struct that
could be just as compact now, but extensible later. You populate the
table via an encoding macro, then decode the id using a function - while
it could be in a format that's directly usable without the decode. If
suitably chosen, the struct could perhaps be reused between the quirks
code and your code.
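
For what it's worth, the decode in question is tiny either way; the
three-letter PNP vendor code is just three 5-bit fields in the big-endian
16-bit half. A sketch, not the patch's actual helper:

	static void sketch_decode_vendor(u32 panel_id, char vend[4])
	{
		u16 mfg = panel_id >> 16;	/* the big-endian mfg_id half */

		vend[0] = '@' + ((mfg >> 10) & 0x1f);	/* 1 == 'A' */
		vend[1] = '@' + ((mfg >> 5) & 0x1f);
		vend[2] = '@' + (mfg & 0x1f);
		vend[3] = '\0';
	}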

> +  *
> +  * NOTE that we deal with endianness differently for the top half
> +  * of this ID than for the bottom half. The bottom half (the product
> +  * id) gets decoded as little endian by the EDID_PRODUCT_ID because
> +  * that's how everyone seems to interpret it. The top half (the mfg_id)
> +  * gets stored as big endian because that makes encode_edid_id() and
> +  * decode_edid_id() easier to write (it's easier to extract the ASCII).
> +  * It doesn't really matter, though, as long as the number here is
> +  * unique.
> +  */
> + val = (u32)edid->mfg_id[0] << 24   |
> +   (u32)edid->mfg_id[1] << 16   |
> +   (u32)EDID_PRODUCT_ID(edid);
> +
> + kfree(edid);
> +
> + return val;
> +}
> +EXPORT_SYMBOL(drm_get_panel_id);
> +
>  /**
>   * drm_get_edid_switcheroo - get EDID data for a vga_switcheroo output
>   * @connector: connector we're probing
> diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
> index deccfd39e6db..73da40d0b5d1 100644
> --- a/include/drm/drm_edid.h
> +++ b/include/drm/drm_edid.h
> @@ -508,6 +508,52 @@ static inline u8 drm_eld_get_conn_type(const uint8_t 
> *eld)
>   return eld[DRM_ELD_SAD_COUNT_CONN_TYPE] & DRM_ELD_CONN_TYPE_MASK;
>  }
>  
> +/**
> + * encode_edid_id - Encode an ID for match

Re: [PATCH v3 02/16] drm/edid: Break out reading block 0 of the EDID

2021-09-06 Thread Jani Nikula
On Wed, 01 Sep 2021, Douglas Anderson  wrote:
> A future change wants to be able to read just block 0 of the EDID, so
> break it out of drm_do_get_edid() into a sub-function.
>
> This is intended to be a no-op change--just code movement.
>
> Signed-off-by: Douglas Anderson 
> ---
>
> (no changes since v1)
>
>  drivers/gpu/drm/drm_edid.c | 62 +++---
>  1 file changed, 44 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 6325877c5fd6..a22c38482a90 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -1905,6 +1905,43 @@ int drm_add_override_edid_modes(struct drm_connector 
> *connector)
>  }
>  EXPORT_SYMBOL(drm_add_override_edid_modes);
>  
> +static struct edid *drm_do_get_edid_blk0(

Maybe base_block instead of blk0?

> + int (*get_edid_block)(void *data, u8 *buf, unsigned int block,
> +   size_t len),
> + void *data, bool *edid_corrupt, int *null_edid_counter)
> +{
> + int i;
> + u8 *edid;

With void *edid, this function wouldn't need the cast internally.

> +
> + if ((edid = kmalloc(EDID_LENGTH, GFP_KERNEL)) == NULL)
> + return NULL;

Could split the allocation and NULL check to two separate lines per
coding style, while at it?
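
Combining both points, the top of the helper would then read something like:

	void *edid;

	edid = kmalloc(EDID_LENGTH, GFP_KERNEL);
	if (!edid)
		return NULL;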

BR,
Jani.

> +
> + /* base block fetch */
> + for (i = 0; i < 4; i++) {
> + if (get_edid_block(data, edid, 0, EDID_LENGTH))
> + goto out;
> + if (drm_edid_block_valid(edid, 0, false, edid_corrupt))
> + break;
> + if (i == 0 && drm_edid_is_zero(edid, EDID_LENGTH)) {
> + if (null_edid_counter)
> + (*null_edid_counter)++;
> + goto carp;
> + }
> + }
> + if (i == 4)
> + goto carp;
> +
> + return (struct edid *)edid;
> +
> +carp:
> + kfree(edid);
> + return ERR_PTR(-EINVAL);
> +
> +out:
> + kfree(edid);
> + return NULL;
> +}
> +
>  /**
>   * drm_do_get_edid - get EDID data using a custom EDID block read function
>   * @connector: connector we're probing
> @@ -1938,25 +1975,16 @@ struct edid *drm_do_get_edid(struct drm_connector 
> *connector,
>   if (override)
>   return override;
>  
> - if ((edid = kmalloc(EDID_LENGTH, GFP_KERNEL)) == NULL)
> + edid = (u8 *)drm_do_get_edid_blk0(get_edid_block, data,
> +   &connector->edid_corrupt,
> +   &connector->null_edid_counter);
> + if (IS_ERR_OR_NULL(edid)) {
> + if (IS_ERR(edid))
> + connector_bad_edid(connector, edid, 1);
>   return NULL;
> -
> - /* base block fetch */
> - for (i = 0; i < 4; i++) {
> - if (get_edid_block(data, edid, 0, EDID_LENGTH))
> - goto out;
> - if (drm_edid_block_valid(edid, 0, false,
> -  &connector->edid_corrupt))
> - break;
> - if (i == 0 && drm_edid_is_zero(edid, EDID_LENGTH)) {
> - connector->null_edid_counter++;
> - goto carp;
> - }
>   }
> - if (i == 4)
> - goto carp;
>  
> - /* if there's no extensions, we're done */
> + /* if there's no extensions or no connector, we're done */
>   valid_extensions = edid[0x7e];
>   if (valid_extensions == 0)
>   return (struct edid *)edid;
> @@ -2010,8 +2038,6 @@ struct edid *drm_do_get_edid(struct drm_connector 
> *connector,
>  
>   return (struct edid *)edid;
>  
> -carp:
> - connector_bad_edid(connector, edid, 1);
>  out:
>   kfree(edid);
>   return NULL;

-- 
Jani Nikula, Intel Open Source Graphics Center


[PATCH] drm/amd/display: make configure_lttpr_mode_transparent and configure_lttpr_mode_non_transparent static

2021-09-06 Thread Jiapeng Chong
From: chongjiapeng 

These symbols are not used outside of dc_link_dp.c, so mark them static.

Fix the following sparse warnings:

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1766:16:
warning: symbol 'configure_lttpr_mode_non_transparent' was not declared.
Should it be static?

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1755:16:
warning: symbol 'configure_lttpr_mode_transparent' was not declared.
Should it be static?

Reported-by: Abaci Robot 
Signed-off-by: chongjiapeng 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index a666401..4e2cf8f 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -1752,7 +1752,7 @@ uint8_t dp_convert_to_count(uint8_t lttpr_repeater_count)
return 0; // invalid value
 }
 
-enum dc_status configure_lttpr_mode_transparent(struct dc_link *link)
+static enum dc_status configure_lttpr_mode_transparent(struct dc_link *link)
 {
uint8_t repeater_mode = DP_PHY_REPEATER_MODE_TRANSPARENT;
 
@@ -1763,7 +1763,7 @@ enum dc_status configure_lttpr_mode_transparent(struct 
dc_link *link)
sizeof(repeater_mode));
 }
 
-enum dc_status configure_lttpr_mode_non_transparent(
+static enum dc_status configure_lttpr_mode_non_transparent(
struct dc_link *link,
const struct link_training_settings *lt_settings)
 {
-- 
1.8.3.1



[PATCH] drm/amd/display: Fix warning comparing pointer to 0

2021-09-06 Thread Jiapeng Chong
From: chongjiapeng 

Fix the following coccicheck warning:

./drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c:643:35-36:
WARNING comparing pointer to 0.

Reported-by: Abaci Robot 
Signed-off-by: chongjiapeng 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
index 4a4894e..15491e3 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
@@ -640,7 +640,7 @@ void dcn31_clk_mgr_construct(
sizeof(struct dcn31_watermarks),
&clk_mgr->smu_wm_set.mc_address.quad_part);
 
-   if (clk_mgr->smu_wm_set.wm_set == 0) {
+   if (!clk_mgr->smu_wm_set.wm_set) {
clk_mgr->smu_wm_set.wm_set = &dummy_wms;
clk_mgr->smu_wm_set.mc_address.quad_part = 0;
}
-- 
1.8.3.1



[PATCH] drm/i915/selftests: fixup igt_shrink_thp

2021-09-06 Thread Matthew Auld
Since the object might still be active here, the shrink_all will simply
ignore it, which blows up in the test, since the pages will still be
there. Currently THP is disabled which should result in the test being
skipped, but if we ever re-enable THP we might start seeing the failure.
Fix this by forcing I915_SHRINK_ACTIVE.

v2: Some machines in the shard runs don't seem to have any available
swap when running this test. Try to handle this.

Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Reviewed-by: Tvrtko Ursulin  #v1
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   | 31 ++-
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index a094f3ce1a90..46ea1997c114 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1519,6 +1519,7 @@ static int igt_shrink_thp(void *arg)
struct i915_vma *vma;
unsigned int flags = PIN_USER;
unsigned int n;
+   bool should_swap;
int err = 0;
 
/*
@@ -1567,23 +1568,39 @@ static int igt_shrink_thp(void *arg)
break;
}
i915_gem_context_unlock_engines(ctx);
+   /*
+* Nuke everything *before* we unpin the pages so we can be reasonably
+* sure that when later checking get_nr_swap_pages() that some random
+* leftover object doesn't steal the remaining swap space.
+*/
+   i915_gem_shrink(NULL, i915, -1UL, NULL,
+   I915_SHRINK_BOUND |
+   I915_SHRINK_UNBOUND |
+   I915_SHRINK_ACTIVE);
i915_vma_unpin(vma);
if (err)
goto out_put;
 
+
/*
-* Now that the pages are *unpinned* shrink-all should invoke
-* shmem to truncate our pages.
+* Now that the pages are *unpinned* shrinking should invoke
+* shmem to truncate our pages, if we have available swap.
 */
-   i915_gem_shrink_all(i915);
-   if (i915_gem_object_has_pages(obj)) {
-   pr_err("shrink-all didn't truncate the pages\n");
+   should_swap = get_nr_swap_pages() > 0;
+   i915_gem_shrink(NULL, i915, -1UL, NULL,
+   I915_SHRINK_BOUND |
+   I915_SHRINK_UNBOUND |
+   I915_SHRINK_ACTIVE);
+   if (should_swap == i915_gem_object_has_pages(obj)) {
+   pr_err("unexpected pages mismatch, should_swap=%s\n",
+  yesno(should_swap));
err = -EINVAL;
goto out_put;
}
 
-   if (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys) {
-   pr_err("residual page-size bits left\n");
+   if (should_swap == (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys)) {
+   pr_err("unexpected residual page-size bits, should_swap=%s\n",
+  yesno(should_swap));
err = -EINVAL;
goto out_put;
}
-- 
2.26.3



Re: [PATCH v2 0/2] Fix a hung during memory pressure test

2021-09-06 Thread Christian König




On 06.09.21 at 03:12, xinhui pan wrote:

A long time ago, someone reported that the system got hung during a memory
test. In recent days, I have been trying to find and understand the potential
deadlock in the ttm/amdgpu code.

This patchset aims to fix the deadlock during ttm populate.

TTM has a parameter called pages_limit; when allocated GTT memory reaches
this limit, swapout is triggered. As ttm_bo_swapout does not return the
correct retval, populate might get hung.

The UVD ib test uses GTT, which might be insufficient. So a gpu recovery
would hang if populate hangs.


Ah, now I understand what you are trying to do.

Problem is that won't work either. Allocating VRAM can easily land you 
inside the same deadlock.


We need to avoid the allocation altogether for this to work correctly.



I have made one drm test which alloc two GTT BOs, submit gfx copy
commands and free these BOs without waiting fence. What's more, these
gfx copy commands will cause gfx ring hang. So gpu recovery would be
triggered.


Mhm, that should never be possible. It is perfectly valid for an
application to terminate without waiting for the GFX submission to be
completed.


Going to push patch #1 to drm-misc-fixes or drm-misc-next-fixes in a moment.

Thanks,
Christian.



Now here is one possible deadlock case.
gpu_recovery
  -> stop drm scheduler
  -> asic reset
-> ib test
   -> tt populate (uvd ib test)
->  ttm_bo_swapout (BO A) // this always fails as the fence of
BO A would not be signaled by the scheduler or HW. Hit deadlock.

I paste the drm test patch below.
#modprobe ttm pages_limit=65536
#amdgpu_test -s 1 -t 4
---
  tests/amdgpu/basic_tests.c | 32 ++--
  1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index dbf02fee..f85ed340 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -65,13 +65,16 @@ static void amdgpu_direct_gma_test(void);
  static void amdgpu_command_submission_write_linear_helper(unsigned ip_type);
  static void amdgpu_command_submission_const_fill_helper(unsigned ip_type);
  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type);
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request);
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat);
   
+#define amdgpu_test_exec_cs_helper(...) \

+   _amdgpu_test_exec_cs_helper(__VA_ARGS__, 1, 1)
+
  CU_TestInfo basic_tests[] = {
{ "Query Info Test",  amdgpu_query_info_test },
{ "Userptr Test",  amdgpu_userptr_test },
@@ -1341,12 +1344,12 @@ static void amdgpu_command_submission_compute(void)
   * pm4_src, resources, ib_info, and ibs_request
   * submit command stream described in ibs_request and wait for this IB 
accomplished
   */
-static void amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
+static void _amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
   unsigned ip_type,
   int instance, int pm4_dw, uint32_t 
*pm4_src,
   int res_cnt, amdgpu_bo_handle *resources,
   struct amdgpu_cs_ib_info *ib_info,
-  struct amdgpu_cs_request *ibs_request)
+  struct amdgpu_cs_request *ibs_request, 
int sync, int repeat)
  {
int r;
uint32_t expired;
@@ -1395,12 +1398,15 @@ static void 
amdgpu_test_exec_cs_helper(amdgpu_context_handle context_handle,
CU_ASSERT_NOT_EQUAL(ibs_request, NULL);
  
  	/* submit CS */

-   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
+   while (repeat--)
+   r = amdgpu_cs_submit(context_handle, 0, ibs_request, 1);
CU_ASSERT_EQUAL(r, 0);
  
  	r = amdgpu_bo_list_destroy(ibs_request->resources);

CU_ASSERT_EQUAL(r, 0);
  
+	if (!sync)

+   return;
fence_status.ip_type = ip_type;
fence_status.ip_instance = 0;
fence_status.ring = ibs_request->ring;
@@ -1667,7 +1673,7 @@ static void 
amdgpu_command_submission_sdma_const_fill(void)
  
  static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type)

  {
-   const int sdma_write_length = 1024;
+   const int sdma_write_length = (255) << 20;
const int pm4_dw = 256;
amdgpu_context_handle context_handle;
amdgpu_bo_handle bo1, bo2;
@@ -

Re: [PATCH v2 2/2] drm/amdpgu: Use VRAM domain in UVD IB test

2021-09-06 Thread Christian König

On 06.09.21 at 03:12, xinhui pan wrote:

Like vce/vcn do, visible VRAM is OK for the ib test.
However, commit a11d9ff3ebe0 ("drm/amdgpu: use GTT for
uvd_get_create/destory_msg") says VRAM is not mapped correctly on his
platform, which is likely an arm64.

So let's change back to using VRAM on the x86_64 platform.


That's still a rather clear NAK. This issue is not related to ARM at all 
and you are trying to fix a problem which is independent of the platform.


Christian.



Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index d451c359606a..e4b75f33ccc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1178,7 +1178,11 @@ int amdgpu_uvd_get_create_msg(struct amdgpu_ring *ring, 
uint32_t handle,
int r, i;
  
  	r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,

+#ifdef CONFIG_X86_64
+ AMDGPU_GEM_DOMAIN_VRAM,
+#else
  AMDGPU_GEM_DOMAIN_GTT,
+#endif
  &bo, NULL, (void **)&msg);
if (r)
return r;
@@ -1210,7 +1214,11 @@ int amdgpu_uvd_get_destroy_msg(struct amdgpu_ring *ring, 
uint32_t handle,
int r, i;
  
  	r = amdgpu_bo_create_reserved(adev, 1024, PAGE_SIZE,

+#ifdef CONFIG_X86_64
+ AMDGPU_GEM_DOMAIN_VRAM,
+#else
  AMDGPU_GEM_DOMAIN_GTT,
+#endif
  &bo, NULL, (void **)&msg);
if (r)
return r;



