[Intel-gfx] [PATCH v4 0/6] drm/i915/dp_link_training: Define a final failure state when link training fails

2023-08-24 Thread Gil Dekel
Next version of https://patchwork.freedesktop.org/series/122850/

v4:
  Another blunder. I uploaded the patches from my ChromeiumOS kernel dev repo
  instead of drm-tip/drm-tip. Apologies for the noise :(

v3:
  Still learning the ropes of upstream workflow. Apologies for mucking up v2.
  This is just a re-upload.

v2:
  Reorganize into:
  1) Add for final failure state for SST and MST link training fallback.
  2) Add a DRM helper for setting downstream MST ports' link-status state.
  3) Make handling SST and MST connectors simpler via intel_dp.
  4) Update link-status for downstream MST ports.
  5) Emit a uevent with the "link-status" trigger property.

v1:
Currently, when link training fails after all fallback values have been
exhausted, the i915 driver seizes to send uevents to userspace. This leave
userspace thinking that the last passing atomic commit was successful, and that
all connectors (displays) are connected and operational, when in fact, the last
link failed to train and the displays remain dark. This manifests as "zombie"
displays in userspace, in which users observe the displays appear in their
display settings page, but they are dark and unresponsive.

Since, at the time of writing, MST link training fallback is not implemented,
failing MST link training is a significantly more common case then a complete
SST link training failure. And with users using MST hubs more than ever to
connect multiple displays via their USB-C ports we observe this case often.

This patchset series suggest a solution, in which a final failure state is
defined. In this final state, the connector's bit rate capabilities, namely
max_link_rate and max_link_lane_count, are set to 0. This effectively set the
connector's bandwidth to 0Gbps, thus causing all its modes to be pruned in the
following connector probing.

Next, with this state defined, we emit a link-status=Bad uevent. The next time
userspace probes the connector, it should recognize that the connector has no
modes and ignore it since it is in a bad state.

I am aware that always sending a uevent and never stopping may result in some
userspaces having their expectations broken and enter an infinite loop of
modesets and link-training attempts. However, per DRM link-status spec:
```
 * link-status:
 *  Connector link-status property to indicate the status of link. The
 *  default value of link-status is "GOOD". If something fails during or
 *  after modeset, the kernel driver may set this to "BAD" and issue a
 *  hotplug uevent. Drivers should update this value using
 *  drm_connector_set_link_status_property().
 *
 *  When user-space receives the hotplug uevent and detects a "BAD"
 *  link-status, the sink doesn't receive pixels anymore (e.g. the screen
 *  becomes completely black). The list of available modes may have
 *  changed. User-space is expected to pick a new mode if the current one
 *  has disappeared and perform a new modeset with link-status set to
 *  "GOOD" to re-enable the connector.
```
(form drivers/gpu/drm/drm_connector.c - DOC: standard connector properties)

it seems reasonable to assume that the suggested state is an extension of the
spec's guidelines, in which the next new mode userspace picks for a connector
with no modes is - none, thus breaking the cycle of failed link-training
attempts.

I suspect that, maybe, zeroing out the bit rate capabilities is not the right
way to go, and perhaps marking the connector as disconnected instead may be a
better solution. However, if marking a connector disconnected is the way to go,
We will have to iterate over all MST ports in the MST case and mark the spawned
connectors as disconnected as well.

As a final note I should add that this approach was tested with ChromeOS as
userspace, and we observed that the zombie displays stop showing up once the
connectors are pruned of all their modes and are ignored by userspace.

For your consideration and guidance.
Thanks,

Gil Dekel (6):
  drm/i915/dp_link_training: Add a final failing state to link training
fallback
  drm/i915/dp_link_training: Add a final failing state to link training
fallback for MST
  drm/dp_mst: Add drm_dp_set_mst_topology_link_status()
  drm/i915: Move DP modeset_retry_work into intel_dp
  drm/i915/dp_link_training: Set all downstream MST ports to BAD before
retrying
  drm/i915/dp_link_training: Emit a link-status=Bad uevent with trigger
property

 drivers/gpu/drm/display/drm_dp_mst_topology.c | 39 ++
 drivers/gpu/drm/i915/display/intel_display.c  | 14 +++-
 .../drm/i915/display/intel_display_types.h|  6 +-
 drivers/gpu/drm/i915/display/intel_dp.c   | 75 ---
 drivers/gpu/drm/i915/display/intel_dp.h   |  2 +-
 .../drm/i915/display/intel_dp_link_training.c | 11 ++-
 include/drm/display/drm_dp_mst_helper.h   |  3 +
 7 files changed, 110 insertions(+), 40 deletions(-)

--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics

Re: [Intel-gfx] [PATCH v4 0/6] drm/i915/dp_link_training: Define a final failure state when link training fails

2023-08-30 Thread Lyude Paul
Other then the name typo (s/Pual/Paul):

Signed-off-by: Lyude Paul  (just since I co-authored
things~)
Reviewed-by: Lyude Paul 

I think we definitely want to make sure we get intel's opinions on this
though, especially regarding the usage of link-status. I think we're close
enough to link-status's intended purpose, but I definitely would like to know
what others think about that since userspace will definitely have to handle
situations like this a bit differently than with SST.

Also - definitely make sure you take a look at Imre's patch series that's
currently on the list (I just finished reviewing it), since it adds some
things to the helpers that might end up being useful here :)

https://patchwork.freedesktop.org/series/122589/

On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote:
> Next version of https://patchwork.freedesktop.org/series/122850/
> 
> v4:
>   Another blunder. I uploaded the patches from my ChromeiumOS kernel dev repo
>   instead of drm-tip/drm-tip. Apologies for the noise :(
> 
> v3:
>   Still learning the ropes of upstream workflow. Apologies for mucking up v2.
>   This is just a re-upload.
> 
> v2:
>   Reorganize into:
>   1) Add for final failure state for SST and MST link training fallback.
>   2) Add a DRM helper for setting downstream MST ports' link-status state.
>   3) Make handling SST and MST connectors simpler via intel_dp.
>   4) Update link-status for downstream MST ports.
>   5) Emit a uevent with the "link-status" trigger property.
> 
> v1:
> Currently, when link training fails after all fallback values have been
> exhausted, the i915 driver seizes to send uevents to userspace. This leave
> userspace thinking that the last passing atomic commit was successful, and 
> that
> all connectors (displays) are connected and operational, when in fact, the 
> last
> link failed to train and the displays remain dark. This manifests as "zombie"
> displays in userspace, in which users observe the displays appear in their
> display settings page, but they are dark and unresponsive.
> 
> Since, at the time of writing, MST link training fallback is not implemented,
> failing MST link training is a significantly more common case then a complete
> SST link training failure. And with users using MST hubs more than ever to
> connect multiple displays via their USB-C ports we observe this case often.
> 
> This patchset series suggest a solution, in which a final failure state is
> defined. In this final state, the connector's bit rate capabilities, namely
> max_link_rate and max_link_lane_count, are set to 0. This effectively set the
> connector's bandwidth to 0Gbps, thus causing all its modes to be pruned in the
> following connector probing.
> 
> Next, with this state defined, we emit a link-status=Bad uevent. The next time
> userspace probes the connector, it should recognize that the connector has no
> modes and ignore it since it is in a bad state.
> 
> I am aware that always sending a uevent and never stopping may result in some
> userspaces having their expectations broken and enter an infinite loop of
> modesets and link-training attempts. However, per DRM link-status spec:
> ```
>  * link-status:
>  *  Connector link-status property to indicate the status of link. The
>  *  default value of link-status is "GOOD". If something fails during or
>  *  after modeset, the kernel driver may set this to "BAD" and issue a
>  *  hotplug uevent. Drivers should update this value using
>  *  drm_connector_set_link_status_property().
>  *
>  *  When user-space receives the hotplug uevent and detects a "BAD"
>  *  link-status, the sink doesn't receive pixels anymore (e.g. the screen
>  *  becomes completely black). The list of available modes may have
>  *  changed. User-space is expected to pick a new mode if the current one
>  *  has disappeared and perform a new modeset with link-status set to
>  *  "GOOD" to re-enable the connector.
> ```
> (form drivers/gpu/drm/drm_connector.c - DOC: standard connector properties)
> 
> it seems reasonable to assume that the suggested state is an extension of the
> spec's guidelines, in which the next new mode userspace picks for a connector
> with no modes is - none, thus breaking the cycle of failed link-training
> attempts.
> 
> I suspect that, maybe, zeroing out the bit rate capabilities is not the right
> way to go, and perhaps marking the connector as disconnected instead may be a
> better solution. However, if marking a connector disconnected is the way to 
> go,
> We will have to iterate over all MST ports in the MST case and mark the 
> spawned
> connectors as disconnected as well.

I -think- this is probably fine, that's likely how I'd 

> 
> As a final note I should add that this approach was tested with ChromeOS as
> userspace, and we observed that the zombie displays stop showing up once the
> connectors are pruned of all their modes and are ignored by userspace.
> 
> For your consideration and guidance

Re: [Intel-gfx] [PATCH v4 0/6] drm/i915/dp_link_training: Define a final failure state when link training fails

2023-09-01 Thread Rodrigo Vivi
On Wed, Aug 30, 2023 at 05:41:37PM -0400, Lyude Paul wrote:
> Other then the name typo (s/Pual/Paul):
> 
> Signed-off-by: Lyude Paul  (just since I co-authored
> things~)

I believe having the Co-developed-by: in the patches that you helped
out would be nice.

> Reviewed-by: Lyude Paul 
> 
> I think we definitely want to make sure we get intel's opinions on this
> though, especially regarding the usage of link-status. I think we're close
> enough to link-status's intended purpose, but I definitely would like to know
> what others think about that since userspace will definitely have to handle
> situations like this a bit differently than with SST.
> 
> Also - definitely make sure you take a look at Imre's patch series that's
> currently on the list (I just finished reviewing it), since it adds some
> things to the helpers that might end up being useful here :)
> 
> https://patchwork.freedesktop.org/series/122589/
> 
> On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote:
> > Next version of https://patchwork.freedesktop.org/series/122850/
> > 
> > v4:
> >   Another blunder. I uploaded the patches from my ChromeiumOS kernel dev 
> > repo
> >   instead of drm-tip/drm-tip. Apologies for the noise :(
> > 
> > v3:
> >   Still learning the ropes of upstream workflow. Apologies for mucking up 
> > v2.
> >   This is just a re-upload.
> > 
> > v2:
> >   Reorganize into:
> >   1) Add for final failure state for SST and MST link training fallback.
> >   2) Add a DRM helper for setting downstream MST ports' link-status state.
> >   3) Make handling SST and MST connectors simpler via intel_dp.
> >   4) Update link-status for downstream MST ports.
> >   5) Emit a uevent with the "link-status" trigger property.
> > 
> > v1:
> > Currently, when link training fails after all fallback values have been
> > exhausted, the i915 driver seizes to send uevents to userspace. This leave
> > userspace thinking that the last passing atomic commit was successful, and 
> > that
> > all connectors (displays) are connected and operational, when in fact, the 
> > last
> > link failed to train and the displays remain dark. This manifests as 
> > "zombie"
> > displays in userspace, in which users observe the displays appear in their
> > display settings page, but they are dark and unresponsive.
> > 
> > Since, at the time of writing, MST link training fallback is not 
> > implemented,
> > failing MST link training is a significantly more common case then a 
> > complete
> > SST link training failure. And with users using MST hubs more than ever to
> > connect multiple displays via their USB-C ports we observe this case often.
> > 
> > This patchset series suggest a solution, in which a final failure state is
> > defined. In this final state, the connector's bit rate capabilities, namely
> > max_link_rate and max_link_lane_count, are set to 0. This effectively set 
> > the
> > connector's bandwidth to 0Gbps, thus causing all its modes to be pruned in 
> > the
> > following connector probing.
> > 
> > Next, with this state defined, we emit a link-status=Bad uevent. The next 
> > time
> > userspace probes the connector, it should recognize that the connector has 
> > no
> > modes and ignore it since it is in a bad state.
> > 
> > I am aware that always sending a uevent and never stopping may result in 
> > some
> > userspaces having their expectations broken and enter an infinite loop of
> > modesets and link-training attempts. However, per DRM link-status spec:
> > ```
> >  * link-status:
> >  *  Connector link-status property to indicate the status of link. The
> >  *  default value of link-status is "GOOD". If something fails during or
> >  *  after modeset, the kernel driver may set this to "BAD" and issue a
> >  *  hotplug uevent. Drivers should update this value using
> >  *  drm_connector_set_link_status_property().
> >  *
> >  *  When user-space receives the hotplug uevent and detects a "BAD"
> >  *  link-status, the sink doesn't receive pixels anymore (e.g. the 
> > screen
> >  *  becomes completely black). The list of available modes may have
> >  *  changed. User-space is expected to pick a new mode if the current 
> > one
> >  *  has disappeared and perform a new modeset with link-status set to
> >  *  "GOOD" to re-enable the connector.
> > ```
> > (form drivers/gpu/drm/drm_connector.c - DOC: standard connector properties)
> > 
> > it seems reasonable to assume that the suggested state is an extension of 
> > the
> > spec's guidelines, in which the next new mode userspace picks for a 
> > connector
> > with no modes is - none, thus breaking the cycle of failed link-training
> > attempts.
> > 
> > I suspect that, maybe, zeroing out the bit rate capabilities is not the 
> > right
> > way to go, and perhaps marking the connector as disconnected instead may be 
> > a
> > better solution. However, if marking a connector disconnected is the way to 
> > go,
> > We will have to iterate over all MST

Re: [Intel-gfx] [PATCH v4 0/6] drm/i915/dp_link_training: Define a final failure state when link training fails

2023-09-01 Thread Rodrigo Vivi
On Fri, Sep 01, 2023 at 02:38:11PM -0400, Rodrigo Vivi wrote:
> On Wed, Aug 30, 2023 at 05:41:37PM -0400, Lyude Paul wrote:
> > Other then the name typo (s/Pual/Paul):
> > 
> > Signed-off-by: Lyude Paul  (just since I co-authored
> > things~)
> 
> I believe having the Co-developed-by: in the patches that you helped
> out would be nice.
> 
> > Reviewed-by: Lyude Paul 
> > 
> > I think we definitely want to make sure we get intel's opinions on this
> > though, especially regarding the usage of link-status. I think we're close
> > enough to link-status's intended purpose, but I definitely would like to 
> > know
> > what others think about that since userspace will definitely have to handle
> > situations like this a bit differently than with SST.

I'd like to get Jani, or Ville, or Imre's eyes here. I believe this
series has a good potential to solve some bad lingering MST issues,
but I'm indeed concerned with the impact on depending on userspace
behavior here.

(Besides that potential dead-lock...)

> > 
> > Also - definitely make sure you take a look at Imre's patch series that's
> > currently on the list (I just finished reviewing it), since it adds some
> > things to the helpers that might end up being useful here :)
> > 
> > https://patchwork.freedesktop.org/series/122589/
> > 
> > On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote:
> > > Next version of https://patchwork.freedesktop.org/series/122850/
> > > 
> > > v4:
> > >   Another blunder. I uploaded the patches from my ChromeiumOS kernel dev 
> > > repo
> > >   instead of drm-tip/drm-tip. Apologies for the noise :(
> > > 
> > > v3:
> > >   Still learning the ropes of upstream workflow. Apologies for mucking up 
> > > v2.
> > >   This is just a re-upload.
> > > 
> > > v2:
> > >   Reorganize into:
> > >   1) Add for final failure state for SST and MST link training fallback.
> > >   2) Add a DRM helper for setting downstream MST ports' link-status state.
> > >   3) Make handling SST and MST connectors simpler via intel_dp.
> > >   4) Update link-status for downstream MST ports.
> > >   5) Emit a uevent with the "link-status" trigger property.
> > > 
> > > v1:
> > > Currently, when link training fails after all fallback values have been
> > > exhausted, the i915 driver seizes to send uevents to userspace. This leave
> > > userspace thinking that the last passing atomic commit was successful, 
> > > and that
> > > all connectors (displays) are connected and operational, when in fact, 
> > > the last
> > > link failed to train and the displays remain dark. This manifests as 
> > > "zombie"
> > > displays in userspace, in which users observe the displays appear in their
> > > display settings page, but they are dark and unresponsive.
> > > 
> > > Since, at the time of writing, MST link training fallback is not 
> > > implemented,
> > > failing MST link training is a significantly more common case then a 
> > > complete
> > > SST link training failure. And with users using MST hubs more than ever to
> > > connect multiple displays via their USB-C ports we observe this case 
> > > often.
> > > 
> > > This patchset series suggest a solution, in which a final failure state is
> > > defined. In this final state, the connector's bit rate capabilities, 
> > > namely
> > > max_link_rate and max_link_lane_count, are set to 0. This effectively set 
> > > the
> > > connector's bandwidth to 0Gbps, thus causing all its modes to be pruned 
> > > in the
> > > following connector probing.
> > > 
> > > Next, with this state defined, we emit a link-status=Bad uevent. The next 
> > > time
> > > userspace probes the connector, it should recognize that the connector 
> > > has no
> > > modes and ignore it since it is in a bad state.
> > > 
> > > I am aware that always sending a uevent and never stopping may result in 
> > > some
> > > userspaces having their expectations broken and enter an infinite loop of
> > > modesets and link-training attempts. However, per DRM link-status spec:
> > > ```
> > >  * link-status:
> > >  *  Connector link-status property to indicate the status of link. The
> > >  *  default value of link-status is "GOOD". If something fails during 
> > > or
> > >  *  after modeset, the kernel driver may set this to "BAD" and issue a
> > >  *  hotplug uevent. Drivers should update this value using
> > >  *  drm_connector_set_link_status_property().
> > >  *
> > >  *  When user-space receives the hotplug uevent and detects a "BAD"
> > >  *  link-status, the sink doesn't receive pixels anymore (e.g. the 
> > > screen
> > >  *  becomes completely black). The list of available modes may have
> > >  *  changed. User-space is expected to pick a new mode if the current 
> > > one
> > >  *  has disappeared and perform a new modeset with link-status set to
> > >  *  "GOOD" to re-enable the connector.
> > > ```
> > > (form drivers/gpu/drm/drm_connector.c - DOC: standard connector 
> > > properties)
> > > 
> > > it seems reas

Re: [Intel-gfx] [PATCH v4 0/6] drm/i915/dp_link_training: Define a final failure state when link training fails

2023-09-01 Thread Gil Dekel
On Fri, Sep 1, 2023 at 3:05 PM Rodrigo Vivi  wrote:
>
> On Fri, Sep 01, 2023 at 02:38:11PM -0400, Rodrigo Vivi wrote:
> > On Wed, Aug 30, 2023 at 05:41:37PM -0400, Lyude Paul wrote:
> > > Other then the name typo (s/Pual/Paul):
> > >
> > > Signed-off-by: Lyude Paul  (just since I co-authored
> > > things~)
> >
> > I believe having the Co-developed-by: in the patches that you helped
> > out would be nice.
> >
Agreed. I think I'll set Lyude as the author of the two patches she originally
wrote, and set myself as the co-developer. Thank you both for pointing this out.

> > > Reviewed-by: Lyude Paul 
> > >
> > > I think we definitely want to make sure we get intel's opinions on this
> > > though, especially regarding the usage of link-status. I think we're close
> > > enough to link-status's intended purpose, but I definitely would like to 
> > > know
> > > what others think about that since userspace will definitely have to 
> > > handle
> > > situations like this a bit differently than with SST.
>
> I'd like to get Jani, or Ville, or Imre's eyes here. I believe this
> series has a good potential to solve some bad lingering MST issues,
> but I'm indeed concerned with the impact on depending on userspace
> behavior here.
>
I would love to have other userspaces reviewing (or at least ACKing)
this series. Any thoughts on who should be added on the next revision?

> (Besides that potential dead-lock...)
>
Response is on patch 4/5.
> > >
> > > Also - definitely make sure you take a look at Imre's patch series that's
> > > currently on the list (I just finished reviewing it), since it adds some
> > > things to the helpers that might end up being useful here :)
> > >
> > > https://patchwork.freedesktop.org/series/122589/
> > >
Do you have anything particular in mind?

> > > On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote:
> > > > Next version of https://patchwork.freedesktop.org/series/122850/
> > > >
> > > > v4:
> > > >   Another blunder. I uploaded the patches from my ChromeiumOS kernel 
> > > > dev repo
> > > >   instead of drm-tip/drm-tip. Apologies for the noise :(
> > > >
> > > > v3:
> > > >   Still learning the ropes of upstream workflow. Apologies for mucking 
> > > > up v2.
> > > >   This is just a re-upload.
> > > >
> > > > v2:
> > > >   Reorganize into:
> > > >   1) Add for final failure state for SST and MST link training fallback.
> > > >   2) Add a DRM helper for setting downstream MST ports' link-status 
> > > > state.
> > > >   3) Make handling SST and MST connectors simpler via intel_dp.
> > > >   4) Update link-status for downstream MST ports.
> > > >   5) Emit a uevent with the "link-status" trigger property.
> > > >
> > > > v1:
> > > > Currently, when link training fails after all fallback values have been
> > > > exhausted, the i915 driver seizes to send uevents to userspace. This 
> > > > leave
> > > > userspace thinking that the last passing atomic commit was successful, 
> > > > and that
> > > > all connectors (displays) are connected and operational, when in fact, 
> > > > the last
> > > > link failed to train and the displays remain dark. This manifests as 
> > > > "zombie"
> > > > displays in userspace, in which users observe the displays appear in 
> > > > their
> > > > display settings page, but they are dark and unresponsive.
> > > >
> > > > Since, at the time of writing, MST link training fallback is not 
> > > > implemented,
> > > > failing MST link training is a significantly more common case then a 
> > > > complete
> > > > SST link training failure. And with users using MST hubs more than ever 
> > > > to
> > > > connect multiple displays via their USB-C ports we observe this case 
> > > > often.
> > > >
> > > > This patchset series suggest a solution, in which a final failure state 
> > > > is
> > > > defined. In this final state, the connector's bit rate capabilities, 
> > > > namely
> > > > max_link_rate and max_link_lane_count, are set to 0. This effectively 
> > > > set the
> > > > connector's bandwidth to 0Gbps, thus causing all its modes to be pruned 
> > > > in the
> > > > following connector probing.
> > > >
> > > > Next, with this state defined, we emit a link-status=Bad uevent. The 
> > > > next time
> > > > userspace probes the connector, it should recognize that the connector 
> > > > has no
> > > > modes and ignore it since it is in a bad state.
> > > >
> > > > I am aware that always sending a uevent and never stopping may result 
> > > > in some
> > > > userspaces having their expectations broken and enter an infinite loop 
> > > > of
> > > > modesets and link-training attempts. However, per DRM link-status spec:
> > > > ```
> > > >  * link-status:
> > > >  *  Connector link-status property to indicate the status of link. 
> > > > The
> > > >  *  default value of link-status is "GOOD". If something fails 
> > > > during or
> > > >  *  after modeset, the kernel driver may set this to "BAD" and 
> > > > issue a
> > > >  *  hotplug uevent. Drivers sho