Re: [Intel-gfx] [PATCH v4 5/6] drm/i915/dp_link_training: Set all downstream MST ports to BAD before retrying

2023-09-01 Thread Gil Dekel
On Fri, Sep 1, 2023 at 5:13 PM Gil Dekel  wrote:
>
> On Fri, Sep 1, 2023 at 2:55 PM Rodrigo Vivi  wrote:
> >
> > On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote:
> > > Before sending a uevent to userspace in order to trigger a corrective
> > > modeset, we change the failing connector's link-status to BAD. However,
> > > the downstream MST branch ports are left in their original GOOD state.
> > >
> > > This patch utilizes the drm helper function
> > > drm_dp_set_mst_topology_link_status() to rectify this and set all
> > > downstream MST connectors' link-status to BAD before emitting the uevent
> > > to userspace.
> > >
> > > Signed-off-by: Gil Dekel 
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_dp.c | 16 ++--
> > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > >
> > > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> > > index 42353b1ac487..e8b10f59e141 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_dp.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> > > @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct 
> > > work_struct *work)
> > >   struct intel_dp *intel_dp =
> > >   container_of(work, typeof(*intel_dp), modeset_retry_work);
> > >   struct drm_connector *connector = 
> > > &intel_dp->attached_connector->base;
> > > - drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", 
> > > connector->base.id,
> > > - connector->name);
> > >
> > > - /* Grab the locks before changing connector property*/
> > > - mutex_lock(&connector->dev->mode_config.mutex);
> > > - /* Set connector link status to BAD and send a Uevent to notify
> > > -  * userspace to do a modeset.
> > > > + /* Set the connector's (and possibly all its downstream MST ports') link
> > > > +  * status to BAD.
> > > >    */
> > > > + mutex_lock(&connector->dev->mode_config.mutex);
> > > > + drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
> > > > + connector->base.id, connector->name,
> > > > + connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
> > >   drm_connector_set_link_status_property(connector,
> > >  DRM_MODE_LINK_STATUS_BAD);
> > > + if (intel_dp->is_mst) {
> > > > + drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
> > > > + DRM_MODE_LINK_STATUS_BAD);
> >
> > Something is weird with the locking here.
> > I noticed that on patch 3 this new function also gets the same
> > mutex_lock(&connector->dev->mode_config.mutex);
> >
> > Since you didn't reach the deadlock, I'm clearly missing something
> > on the flow. But regardless of what I could be missing, I believe
> > this is totally not future proof and we will for sure hit dead-lock
> > cases.
> >
> You are not wrong.
>
> Something must have been wrong in my workflow: I was positive I had
> tested the code with this lock, but I must be misremembering. I retested
> my current code and it deadlocked immediately, as you expected.
> So thank you for catching this.
>
> Lyude's original patch didn't include drm_dp_set_mst_topology_link_status()
> as an exposed drm helper function, so when I adjusted it for this series, I
> decided to add locks similar to how her other function using
> drm_dp_set_mst_topology_link_status() did. However, I failed to use the
> right lock, which is:
> drm_modeset_lock(&connector->dev->mode_config.connection_mutex, NULL);
> drm_modeset_unlock(&connector->dev->mode_config.connection_mutex);
> This is similar to how drm_connector_set_link_status_property() locks
> before writing to link_status.
>
> I made sure to test my code with the above locks, and it runs well. Here's
> an instrumented log excerpt for failing link-training with an MST hub
> (I hacked the driver to always fail non eDP connectors and print the
> raw pointer addresses of the drm_device and mutex right before locking):
> [   43.466329] i915 :00:02.0: [drm] *ERROR* Link Training Unsuccessful via gildekel HACK - (not eDP)
> [   43.594950] i915 :00:02.0: [drm] *ERROR* Link Training Unsuccessful via gildekel HACK - (not eDP)
> [   43.594979] i915 :00:02.0: [drm] *ERROR* Link Training Unsuccessful
> [   43.595023] i915 :00:02.0: [drm] *ERROR* [CONNECTOR:273:DP-3]:
> [   43.595028] i915 :00:02.0: [drm] *ERROR* connector->dev=d4850450
> [   43.595033] i915 :00:02.0: [drm] *ERROR* connector->dev->mode_config.mutex=aac3fe45
> [   44.771091] i915 :00:02.0: [drm] *ERROR* [MST-CONNECTOR:300:DP-5]:
> [   44.771108] i915 :00:02.0: [drm] *ERROR* connector->dev=3fb97435
> [   44.771115] i915 :00:02.0: [drm] *ERROR* &connector->dev->mode_config.connection_mutex=9aece20e
> [   44.771127] i915 :00:02.0: [drm] *ERROR* [MST-CONNECTOR:303:DP-6]:
> [   44.771132

Re: [Intel-gfx] [PATCH v4 5/6] drm/i915/dp_link_training: Set all downstream MST ports to BAD before retrying

2023-09-01 Thread Gil Dekel
On Fri, Sep 1, 2023 at 2:55 PM Rodrigo Vivi  wrote:
>
> On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote:
> > Before sending a uevent to userspace in order to trigger a corrective
> > modeset, we change the failing connector's link-status to BAD. However,
> > the downstream MST branch ports are left in their original GOOD state.
> >
> > This patch utilizes the drm helper function
> > drm_dp_set_mst_topology_link_status() to rectify this and set all
> > downstream MST connectors' link-status to BAD before emitting the uevent
> > to userspace.
> >
> > Signed-off-by: Gil Dekel 
> > ---
> >  drivers/gpu/drm/i915/display/intel_dp.c | 16 ++--
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> > index 42353b1ac487..e8b10f59e141 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dp.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> > @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct 
> > work_struct *work)
> >   struct intel_dp *intel_dp =
> >   container_of(work, typeof(*intel_dp), modeset_retry_work);
> >   struct drm_connector *connector = &intel_dp->attached_connector->base;
> > - drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
> > - connector->name);
> >
> > - /* Grab the locks before changing connector property*/
> > - mutex_lock(&connector->dev->mode_config.mutex);
> > - /* Set connector link status to BAD and send a Uevent to notify
> > -  * userspace to do a modeset.
> > > + /* Set the connector's (and possibly all its downstream MST ports') link
> > > +  * status to BAD.
> > >    */
> > > + mutex_lock(&connector->dev->mode_config.mutex);
> > > + drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
> > + connector->base.id, connector->name,
> > + connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
> >   drm_connector_set_link_status_property(connector,
> >  DRM_MODE_LINK_STATUS_BAD);
> > + if (intel_dp->is_mst) {
> > + drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
> > + DRM_MODE_LINK_STATUS_BAD);
>
> Something is weird with the locking here.
> I noticed that on patch 3 this new function also gets the same
> mutex_lock(&connector->dev->mode_config.mutex);
>
> Since you didn't reach the deadlock, I'm clearly missing something
> on the flow. But regardless of what I could be missing, I believe
> this is totally not future proof and we will for sure hit dead-lock
> cases.
>
You are not wrong.

Something must have been wrong in my workflow: I was positive I had
tested the code with this lock, but I must be misremembering. I retested
my current code and it deadlocked immediately, as you expected.
So thank you for catching this.

Lyude's original patch didn't include drm_dp_set_mst_topology_link_status()
as an exposed drm helper function, so when I adjusted it for this series, I
decided to add locks similar to how her other function using
drm_dp_set_mst_topology_link_status() did. However, I failed to use the
right lock, which is:
drm_modeset_lock(&connector->dev->mode_config.connection_mutex, NULL);
drm_modeset_unlock(&connector->dev->mode_config.connection_mutex);
This is similar to how drm_connector_set_link_status_property() locks
before writing to link_status.

I made sure to test my code with the above locks, and it runs well. Here's
an instrumented log excerpt for failing link-training with an MST hub
(I hacked the driver to always fail non eDP connectors and print the
raw pointer addresses of the drm_device and mutex right before locking):
[   43.466329] i915 :00:02.0: [drm] *ERROR* Link Training Unsuccessful via gildekel HACK - (not eDP)
[   43.594950] i915 :00:02.0: [drm] *ERROR* Link Training Unsuccessful via gildekel HACK - (not eDP)
[   43.594979] i915 :00:02.0: [drm] *ERROR* Link Training Unsuccessful
[   43.595023] i915 :00:02.0: [drm] *ERROR* [CONNECTOR:273:DP-3]:
[   43.595028] i915 :00:02.0: [drm] *ERROR* connector->dev=d4850450
[   43.595033] i915 :00:02.0: [drm] *ERROR* connector->dev->mode_config.mutex=aac3fe45
[   44.771091] i915 :00:02.0: [drm] *ERROR* [MST-CONNECTOR:300:DP-5]:
[   44.771108] i915 :00:02.0: [drm] *ERROR* connector->dev=3fb97435
[   44.771115] i915 :00:02.0: [drm] *ERROR* &connector->dev->mode_config.connection_mutex=9aece20e
[   44.771127] i915 :00:02.0: [drm] *ERROR* [MST-CONNECTOR:303:DP-6]:
[   44.771132] i915 :00:02.0: [drm] *ERROR* connector->dev=75236b75
[   44.771137] i915 :00:02.0: [drm] *ERROR* &connector->dev->mode_config.connection_mutex=9aece20e

Also, I was under the assumption that all connectors in an MST topology
should reference the 

Re: [Intel-gfx] [PATCH v4 5/6] drm/i915/dp_link_training: Set all downstream MST ports to BAD before retrying

2023-09-01 Thread Rodrigo Vivi
On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote:
> Before sending a uevent to userspace in order to trigger a corrective
> modeset, we change the failing connector's link-status to BAD. However,
> the downstream MST branch ports are left in their original GOOD state.
> 
> This patch utilizes the drm helper function
> drm_dp_set_mst_topology_link_status() to rectify this and set all
> downstream MST connectors' link-status to BAD before emitting the uevent
> to userspace.
> 
> Signed-off-by: Gil Dekel 
> ---
>  drivers/gpu/drm/i915/display/intel_dp.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> index 42353b1ac487..e8b10f59e141 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
>   struct intel_dp *intel_dp =
>   container_of(work, typeof(*intel_dp), modeset_retry_work);
>   struct drm_connector *connector = &intel_dp->attached_connector->base;
> - drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
> - connector->name);
> 
> - /* Grab the locks before changing connector property*/
> - mutex_lock(&connector->dev->mode_config.mutex);
> - /* Set connector link status to BAD and send a Uevent to notify
> -  * userspace to do a modeset.
> + /* Set the connector's (and possibly all its downstream MST ports') link
> +  * status to BAD.
>    */
> + mutex_lock(&connector->dev->mode_config.mutex);
> + drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
> + connector->base.id, connector->name,
> + connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
>   drm_connector_set_link_status_property(connector,
>  DRM_MODE_LINK_STATUS_BAD);
> + if (intel_dp->is_mst) {
> + drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
> + DRM_MODE_LINK_STATUS_BAD);

Something is weird with the locking here.
I noticed that on patch 3 this new function also gets the same
mutex_lock(&connector->dev->mode_config.mutex);

Since you didn't reach the deadlock, I'm clearly missing something
on the flow. But regardless of what I could be missing, I believe
this is totally not future proof and we will for sure hit dead-lock
cases.

> + }
>   mutex_unlock(&connector->dev->mode_config.mutex);
>   /* Send Hotplug uevent so userspace can reprobe */
>   drm_kms_helper_connector_hotplug_event(connector);
> --
> Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics


[PATCH v4 5/6] drm/i915/dp_link_training: Set all downstream MST ports to BAD before retrying

2023-08-24 Thread Gil Dekel
Before sending a uevent to userspace in order to trigger a corrective
modeset, we change the failing connector's link-status to BAD. However,
the downstream MST branch ports are left in their original GOOD state.

This patch utilizes the drm helper function
drm_dp_set_mst_topology_link_status() to rectify this and set all
downstream MST connectors' link-status to BAD before emitting the uevent
to userspace.

Signed-off-by: Gil Dekel 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 42353b1ac487..e8b10f59e141 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
 	struct intel_dp *intel_dp =
 		container_of(work, typeof(*intel_dp), modeset_retry_work);
 	struct drm_connector *connector = &intel_dp->attached_connector->base;
-	drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
-		    connector->name);

-	/* Grab the locks before changing connector property*/
-	mutex_lock(&connector->dev->mode_config.mutex);
-	/* Set connector link status to BAD and send a Uevent to notify
-	 * userspace to do a modeset.
+	/* Set the connector's (and possibly all its downstream MST ports') link
+	 * status to BAD.
 	 */
+	mutex_lock(&connector->dev->mode_config.mutex);
+	drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
+		    connector->base.id, connector->name,
+		    connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
 	drm_connector_set_link_status_property(connector,
 					       DRM_MODE_LINK_STATUS_BAD);
+	if (intel_dp->is_mst) {
+		drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
+						    DRM_MODE_LINK_STATUS_BAD);
+	}
 	mutex_unlock(&connector->dev->mode_config.mutex);
 	/* Send Hotplug uevent so userspace can reprobe */
 	drm_kms_helper_connector_hotplug_event(connector);
--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics