RE: [CI 2/3] drm/i915: Use vblank worker to unpin old legacy cursor fb safely

2024-02-04 Thread Borah, Chaitanya Kumar
Hello Maarten,

> -Original Message-
> From: Intel-gfx  On Behalf Of
> Maarten Lankhorst
> Sent: Wednesday, January 31, 2024 5:40 PM
> To: intel...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org
> Subject: [CI 2/3] drm/i915: Use vblank worker to unpin old legacy cursor fb
> safely
> 
> From: Ville Syrjälä 
> 
> The cursor hardware only does sync updates, and thus the hardware will be
> scanning out from the old fb until the next start of vblank.
> So in order to make the legacy cursor fastpath actually safe we should not
> unpin the old fb until we're sure the hardware has ceased accessing it. The
> simplest approach is to just use a vblank work here to do the delayed unpin.
> 
> Not 100% sure it's a good idea to put this onto the same high priority vblank
> worker as eg. our timing critical gamma updates.
> But let's keep it simple for now, and it we later discover that this is 
> causing
> problems we can think about adding a lower priority worker for such things.
> 
> This patch is slightly reworked by Maarten
> 

We have been looking into the CI regression which were seen with this patch 
series.

As far as we understand, there are multiple code paths which manipulate the 
plane state. They can be called "asynchronously" at any time which is leading 
to the
random dump stacks and NULL pointer references. [1] and [2] are some of the 
examples. We have identified the following code paths.

1. vblank fb unpin worker
2. intel_atomic_cleanup_work
3. legacy cursor ioctl
4. atomic commit ioctls

Another interesting finding was that, in at least the ADL-P, the vblank unpin 
worker was never scheduled from the legacy cursor update call. In other words, 
the following condition was never true within intel_cursor_unpin_work

if (old_plane_state->ggtt_vma != new_plane_state->ggtt_vma)

The vblank worker was exclusively scheduled from intel_pipe_update_end.

We tried a few things to work around these issues.

1. add a check in the plane state destroy hook to not move forward if the 
vblank worker is scheduled.
2. add checks before accessing frame buffer object (we are not sure yet how 
much this helps but we have found that this operation causes dump stacks)
3. do not defer the intel atomic cleanup into a work queue, instead execute it 
at the end of atomic commit tail.

With these changes we were able to get good results in trybot.[3] There were 
some "Potential atomic update failure" issues but in a discussion with Ville we 
concluded that these can be ignored.

I will float an RFC series with these changes to trigger a discussion.

Regards

Chaitanya

[1]
<4> [436.749743] Call Trace:
<4> [436.749746]  
<4> [436.749748]  ? __die_body+0x1a/0x60
<4> [436.749756]  ? die_addr+0x38/0x60
<4> [436.749759]  ? exc_general_protection+0x1a2/0x400
<4> [436.749767]  ? asm_exc_general_protection+0x26/0x30
<4> [436.749773]  ? process_scheduled_works+0x264/0x530
<4> [436.749778]  ? __pfx_intel_cleanup_plane_fb+0x10/0x10 [i915]
<4> [436.750085]  ? intel_cleanup_plane_fb+0x10/0x90 [i915]
<4> [436.750391]  drm_atomic_helper_cleanup_planes+0x42/0x60
<4> [436.750398]  intel_atomic_cleanup_work+0x66/0xb0 [i915]
<4> [436.750704]  ? process_scheduled_works+0x264/0x530
<4> [436.750709]  process_scheduled_works+0x2db/0x530
<4> [436.750715]  ? __pfx_worker_thread+0x10/0x10
<4> [436.750718]  worker_thread+0x18c/0x350
<4> [436.750722]  ? __pfx_worker_thread+0x10/0x10
<4> [436.750725]  kthread+0xfe/0x130
<4> [436.750730]  ? __pfx_kthread+0x10/0x10
<4> [436.750735]  ret_from_fork+0x2c/0x50
<4> [436.750739]  ? __pfx_kthread+0x10/0x10
<4> [436.750743]  ret_from_fork_asm+0x1b/0x30
<4> [436.750750]  

[2]

  613.496510] Call Trace:
[  613.496514]  
[  613.496516]  ? __die_body+0x1a/0x60
[  613.496525]  ? page_fault_oops+0x156/0x450
[  613.496530]  ? do_user_addr_fault+0x65/0x9e0
[  613.496536]  ? exc_page_fault+0x68/0x1a0
[  613.496543]  ? asm_exc_page_fault+0x26/0x30
[  613.496551]  ? intel_display_rps_mark_interactive+0x4/0x40 [i915]
[  613.496860]  intel_cleanup_plane_fb+0x5d/0xc0 [i915]
[  613.497173]  drm_atomic_helper_cleanup_planes+0x42/0x60
[  613.497181]  intel_atomic_cleanup_work+0x70/0xc0 [i915]
[  613.497493]  ? process_scheduled_works+0x264/0x530
[  613.497498]  process_scheduled_works+0x2db/0x530
[  613.497504]  ? __pfx_worker_thread+0x10/0x10
[  613.497507]  worker_thread+0x18c/0x350
[  613.497511]  ? __pfx_worker_thread+0x10/0x10
[  613.497514]  kthread+0xfe/0x130
[  613.497520]  ? __pfx_kthread+0x10/0x10
[  613.497524]  ret_from_fork+0x2c/0x50
[  613.497528]  ? __pfx_kthread+0x10/0x10
[  613.497532]  ret_from_fork_asm+0x1b/0x30
[  613.497540]  

[3] https://patchwork.freedesktop.org/series/128430/#rev7

> Cc: Maarten Lankhorst 
> Signed-off-by: Ville Syrjälä 
> Signed-off-by

[CI 2/3] drm/i915: Use vblank worker to unpin old legacy cursor fb safely

2024-02-02 Thread Maarten Lankhorst
From: Ville Syrjälä 

The cursor hardware only does sync updates, and thus the hardware
will be scanning out from the old fb until the next start of vblank.
So in order to make the legacy cursor fastpath actually safe we
should not unpin the old fb until we're sure the hardware has
ceased accessing it. The simplest approach is to just use a vblank
work here to do the delayed unpin.

Not 100% sure it's a good idea to put this onto the same high
priority vblank worker as eg. our timing critical gamma updates.
But let's keep it simple for now, and it we later discover that
this is causing problems we can think about adding a lower
priority worker for such things.

This patch is slightly reworked by Maarten

Cc: Maarten Lankhorst 
Signed-off-by: Ville Syrjälä 
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/display/intel_cursor.c   | 26 +--
 drivers/gpu/drm/i915/display/intel_display.c  |  3 +++
 .../drm/i915/display/intel_display_types.h|  3 +++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cursor.c 
b/drivers/gpu/drm/i915/display/intel_cursor.c
index f8b33999d43fc..9021c0c1683d6 100644
--- a/drivers/gpu/drm/i915/display/intel_cursor.c
+++ b/drivers/gpu/drm/i915/display/intel_cursor.c
@@ -654,6 +654,17 @@ static bool intel_cursor_format_mod_supported(struct 
drm_plane *_plane,
return format == DRM_FORMAT_ARGB;
 }
 
+static void intel_cursor_unpin_work(struct kthread_work *base)
+{
+   struct drm_vblank_work *work = to_drm_vblank_work(base);
+   struct intel_plane_state *plane_state =
+   container_of(work, typeof(*plane_state), unpin_work);
+   struct intel_plane *plane = to_intel_plane(plane_state->uapi.plane);
+
+   intel_plane_unpin_fb(plane_state);
+   intel_plane_destroy_state(>base, _state->uapi);
+}
+
 static int
 intel_legacy_cursor_update(struct drm_plane *_plane,
   struct drm_crtc *_crtc,
@@ -797,14 +808,25 @@ intel_legacy_cursor_update(struct drm_plane *_plane,
 
intel_psr_unlock(crtc_state);
 
-   intel_plane_unpin_fb(old_plane_state);
+   if (old_plane_state->ggtt_vma != new_plane_state->ggtt_vma) {
+   drm_vblank_work_init(_plane_state->unpin_work, >base,
+intel_cursor_unpin_work);
+
+   drm_vblank_work_schedule(_plane_state->unpin_work,
+
drm_crtc_accurate_vblank_count(>base) + 1,
+false);
+
+   old_plane_state = NULL;
+   } else {
+   intel_plane_unpin_fb(old_plane_state);
+   }
 
 out_free:
if (new_crtc_state)
intel_crtc_destroy_state(>base, _crtc_state->uapi);
if (ret)
intel_plane_destroy_state(>base, _plane_state->uapi);
-   else
+   else if (old_plane_state)
intel_plane_destroy_state(>base, _plane_state->uapi);
return ret;
 
diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index a92e959c8ac7b..36425889b5bc2 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -64,6 +64,7 @@
 #include "intel_crt.h"
 #include "intel_crtc.h"
 #include "intel_crtc_state_dump.h"
+#include "intel_cursor.h"
 #include "intel_ddi.h"
 #include "intel_de.h"
 #include "intel_display_driver.h"
@@ -6780,6 +6781,8 @@ static void intel_commit_modeset_disables(struct 
intel_atomic_state *state)
continue;
 
intel_crtc_disable_planes(state, crtc);
+
+   drm_vblank_work_flush_all(>base);
}
 
/* Only disable port sync and MST slaves */
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h 
b/drivers/gpu/drm/i915/display/intel_display_types.h
index ae2e8cff9d691..5fdb9eccab5a6 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -713,6 +713,9 @@ struct intel_plane_state {
 
struct intel_fb_view view;
 
+   /* for legacy cursor fb unpin */
+   struct drm_vblank_work unpin_work;
+
/* Plane pxp decryption state */
bool decrypt;
 
-- 
2.43.0



[CI 2/3] drm/i915: Use vblank worker to unpin old legacy cursor fb safely

2024-01-31 Thread Maarten Lankhorst
From: Ville Syrjälä 

The cursor hardware only does sync updates, and thus the hardware
will be scanning out from the old fb until the next start of vblank.
So in order to make the legacy cursor fastpath actually safe we
should not unpin the old fb until we're sure the hardware has
ceased accessing it. The simplest approach is to just use a vblank
work here to do the delayed unpin.

Not 100% sure it's a good idea to put this onto the same high
priority vblank worker as eg. our timing critical gamma updates.
But let's keep it simple for now, and it we later discover that
this is causing problems we can think about adding a lower
priority worker for such things.

This patch is slightly reworked by Maarten

Cc: Maarten Lankhorst 
Signed-off-by: Ville Syrjälä 
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/display/intel_cursor.c   | 26 +--
 drivers/gpu/drm/i915/display/intel_display.c  |  3 +++
 .../drm/i915/display/intel_display_types.h|  3 +++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cursor.c 
b/drivers/gpu/drm/i915/display/intel_cursor.c
index 926e2de00eb58..64bdf0eb7943c 100644
--- a/drivers/gpu/drm/i915/display/intel_cursor.c
+++ b/drivers/gpu/drm/i915/display/intel_cursor.c
@@ -635,6 +635,17 @@ static bool intel_cursor_format_mod_supported(struct 
drm_plane *_plane,
return format == DRM_FORMAT_ARGB;
 }
 
+static void intel_cursor_unpin_work(struct kthread_work *base)
+{
+   struct drm_vblank_work *work = to_drm_vblank_work(base);
+   struct intel_plane_state *plane_state =
+   container_of(work, typeof(*plane_state), unpin_work);
+   struct intel_plane *plane = to_intel_plane(plane_state->uapi.plane);
+
+   intel_plane_unpin_fb(plane_state);
+   intel_plane_destroy_state(>base, _state->uapi);
+}
+
 static int
 intel_legacy_cursor_update(struct drm_plane *_plane,
   struct drm_crtc *_crtc,
@@ -762,14 +773,25 @@ intel_legacy_cursor_update(struct drm_plane *_plane,
 
local_irq_enable();
 
-   intel_plane_unpin_fb(old_plane_state);
+   if (old_plane_state->ggtt_vma != new_plane_state->ggtt_vma) {
+   drm_vblank_work_init(_plane_state->unpin_work, >base,
+intel_cursor_unpin_work);
+
+   drm_vblank_work_schedule(_plane_state->unpin_work,
+
drm_crtc_accurate_vblank_count(>base) + 1,
+false);
+
+   old_plane_state = NULL;
+   } else {
+   intel_plane_unpin_fb(old_plane_state);
+   }
 
 out_free:
if (new_crtc_state)
intel_crtc_destroy_state(>base, _crtc_state->uapi);
if (ret)
intel_plane_destroy_state(>base, _plane_state->uapi);
-   else
+   else if (old_plane_state)
intel_plane_destroy_state(>base, _plane_state->uapi);
return ret;
 
diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index b10aad15a63d9..b3d73ded097c4 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -64,6 +64,7 @@
 #include "intel_crt.h"
 #include "intel_crtc.h"
 #include "intel_crtc_state_dump.h"
+#include "intel_cursor.h"
 #include "intel_ddi.h"
 #include "intel_de.h"
 #include "intel_display_driver.h"
@@ -6771,6 +6772,8 @@ static void intel_commit_modeset_disables(struct 
intel_atomic_state *state)
continue;
 
intel_crtc_disable_planes(state, crtc);
+
+   drm_vblank_work_flush_all(>base);
}
 
/* Only disable port sync and MST slaves */
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h 
b/drivers/gpu/drm/i915/display/intel_display_types.h
index 3fdd8a5179831..b7faeeb5ddc1c 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -713,6 +713,9 @@ struct intel_plane_state {
 
struct intel_fb_view view;
 
+   /* for legacy cursor fb unpin */
+   struct drm_vblank_work unpin_work;
+
/* Plane pxp decryption state */
bool decrypt;
 
-- 
2.43.0



[[CI] 2/3] drm/i915: Use vblank worker to unpin old legacy cursor fb safely

2024-01-31 Thread Maarten Lankhorst
From: Ville Syrjälä 

The cursor hardware only does sync updates, and thus the hardware
will be scanning out from the old fb until the next start of vblank.
So in order to make the legacy cursor fastpath actually safe we
should not unpin the old fb until we're sure the hardware has
ceased accessing it. The simplest approach is to just use a vblank
work here to do the delayed unpin.

Not 100% sure it's a good idea to put this onto the same high
priority vblank worker as eg. our timing critical gamma updates.
But let's keep it simple for now, and it we later discover that
this is causing problems we can think about adding a lower
priority worker for such things.

This patch is slightly reworked by Maarten

Cc: Maarten Lankhorst 
Signed-off-by: Ville Syrjälä 
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/display/intel_cursor.c   | 26 +--
 drivers/gpu/drm/i915/display/intel_display.c  |  3 +++
 .../drm/i915/display/intel_display_types.h|  3 +++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cursor.c 
b/drivers/gpu/drm/i915/display/intel_cursor.c
index 926e2de00eb58..64bdf0eb7943c 100644
--- a/drivers/gpu/drm/i915/display/intel_cursor.c
+++ b/drivers/gpu/drm/i915/display/intel_cursor.c
@@ -635,6 +635,17 @@ static bool intel_cursor_format_mod_supported(struct 
drm_plane *_plane,
return format == DRM_FORMAT_ARGB;
 }
 
+static void intel_cursor_unpin_work(struct kthread_work *base)
+{
+   struct drm_vblank_work *work = to_drm_vblank_work(base);
+   struct intel_plane_state *plane_state =
+   container_of(work, typeof(*plane_state), unpin_work);
+   struct intel_plane *plane = to_intel_plane(plane_state->uapi.plane);
+
+   intel_plane_unpin_fb(plane_state);
+   intel_plane_destroy_state(>base, _state->uapi);
+}
+
 static int
 intel_legacy_cursor_update(struct drm_plane *_plane,
   struct drm_crtc *_crtc,
@@ -762,14 +773,25 @@ intel_legacy_cursor_update(struct drm_plane *_plane,
 
local_irq_enable();
 
-   intel_plane_unpin_fb(old_plane_state);
+   if (old_plane_state->ggtt_vma != new_plane_state->ggtt_vma) {
+   drm_vblank_work_init(_plane_state->unpin_work, >base,
+intel_cursor_unpin_work);
+
+   drm_vblank_work_schedule(_plane_state->unpin_work,
+
drm_crtc_accurate_vblank_count(>base) + 1,
+false);
+
+   old_plane_state = NULL;
+   } else {
+   intel_plane_unpin_fb(old_plane_state);
+   }
 
 out_free:
if (new_crtc_state)
intel_crtc_destroy_state(>base, _crtc_state->uapi);
if (ret)
intel_plane_destroy_state(>base, _plane_state->uapi);
-   else
+   else if (old_plane_state)
intel_plane_destroy_state(>base, _plane_state->uapi);
return ret;
 
diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index b10aad15a63d9..b3d73ded097c4 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -64,6 +64,7 @@
 #include "intel_crt.h"
 #include "intel_crtc.h"
 #include "intel_crtc_state_dump.h"
+#include "intel_cursor.h"
 #include "intel_ddi.h"
 #include "intel_de.h"
 #include "intel_display_driver.h"
@@ -6771,6 +6772,8 @@ static void intel_commit_modeset_disables(struct 
intel_atomic_state *state)
continue;
 
intel_crtc_disable_planes(state, crtc);
+
+   drm_vblank_work_flush_all(>base);
}
 
/* Only disable port sync and MST slaves */
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h 
b/drivers/gpu/drm/i915/display/intel_display_types.h
index 3fdd8a5179831..b7faeeb5ddc1c 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -713,6 +713,9 @@ struct intel_plane_state {
 
struct intel_fb_view view;
 
+   /* for legacy cursor fb unpin */
+   struct drm_vblank_work unpin_work;
+
/* Plane pxp decryption state */
bool decrypt;
 
-- 
2.43.0