Re: [Intel-gfx] [PATCH] drm/dp: Power cycle display if LINK_ADDRESS fails.
On Thu, 2018-01-04 at 23:46 +, Pandiyan, Dhinakaran wrote: > On Thu, 2018-01-04 at 18:21 -0500, Lyude Paul wrote: > > Sorry for the late reply, I've been having very similar issues on my own > > MST hub > > and I wanted to make sure that they were the same issue, although it seems > > like > > they aren't. > > > > So; I've been doing a lot of MST debugging this week and last and something > > I've > > discovered needs to be taken into account sometimes with MST hubs is the > > actual > > state that they're in at the point that the DRM driver detects them. I've > > managed to on multiple occasions, get my hub into a weird state by: > > > > - Plugging in MST displays into the hub > > - Turning on the machine > > - Unplugging MST displays from the hub (while still in the BIOS) > > - Booting into linux > > - Plugging MST displays into the hub > > - Everything times out, the world explodes, the economy collapses, etc. > > > > I think maybe, especially since this should be perfectly valid behavior and > > not > > break well or poor behaving hubs, we should do a power cycle with the > > display > > like this when the DP port initially detects an MST hub and before we start > > doing any serious communication with it. Could you see if this fixes your > > issue > > instead of the patch you've got here? > > > > Well, my reasoning to power cycle after a failure was to not affect hubs > that work. Besides that I also saw an odd cycle of timeouts and late > replies when I did this. > > 1. Detect hub > 2. Toggle power state and send LINK_ADDRESS req. > 3. LINK_ADDRESS req times out. > 4. Toggle power state and send LINK_ADDRESS req. > 5. Get late response for the first (and expired) LINK_ADDRESS req. > 6. Go to step 3 > > It seems like toggling the power states flushes out some message buffers > in the MST hub. > > But, in the retry approach, power cycling resulted in getting the > response for the current LINK_ADDRESS request along with the previous > one that timed out. > > I could not come up with an explanation for all the behavior. So, I > decided to get back to this later. > > > > As well, mind attaching your full dmesg with drm.debug=0x6? > > I don't have the old logs anymore, will try to get something for you. Here you go Setup: Laptop -> ThinkPad dock -> Dell MST monitor -> Dell monitor Hotplugged display to dock[passed] - https://pastebin.com/CemRGsCb Connected boot[failed] - https://pastebin.com/jjXP6HzB > > -DK > ___ > Intel-gfx mailing list > intel-...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH] drm/dp: Power cycle display if LINK_ADDRESS fails.
On Fri, 2017-12-22 at 00:48 +, Pandiyan, Dhinakaran wrote: > On Thu, 2017-12-21 at 08:53 +0200, Jani Nikula wrote: > > On Wed, 20 Dec 2017, Dhinakaran Pandiyan> > wrote: > > > Occasionally there are LINK_ADDRESS sideband messages timing out with the > > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These > > > failures lead to the display not coming up on boot. Power cycling the port > > > corresponding to the MST monitor's branch device and resending the message > > > fixes the issue. I am not entirely sure if this is specific to my setup. > > > However, as the power state is toggled conditionally on LINK_ADDRESS > > > timeouts, this should not affect the working cases. > > > > With stuff like this, I always wonder if catering for a failing setup > > blocks us from improving working setups, because once we add this, we > > can't regress it. For example, is there a valid scenario where we'd want > > to fail fast here, instead of retrying? > > Link address failure would result in not probing a branch device that > already has been detected. I guess the fail fast case will be applicable > if the said device was not really present but the parent branch told us > otherwise. > The other option is we could check the device ID of the dock and implement this as a quirk. Btw I found some relevant information in the spec. "The Message Transaction originator must perform the reply timeout check. If an error to a request causes the system to be in an invalid state (e.g., all nodes failed to delete a virtual channel, it is the responsibility of the Message Transaction originator to return the system to a valid state). The Message Transaction originator is responsible for any retries." and "SET_POWER_STATE Must be programmed to 001 (binary) upon power-on reset or an upstream device disconnect. 001 = Set local Sink device and all downstream Sink devices to D0 (normal operation mode)." I wonder if the dock is missing this step because the monitor seems to work well with a discrete MST hub. > > > > Some nits below. > > > > > Cc: Lyude > > > Cc: Dave Airlie > > > Cc: Jani Nikula > > > Signed-off-by: Dhinakaran Pandiyan > > > --- > > > drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++-- > > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > > > b/drivers/gpu/drm/drm_dp_mst_topology.c > > > index 70dcfa58d3c2..e06defcdcf18 100644 > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct > > > drm_dp_mst_topology_mgr *mgr, > > > int len; > > > struct drm_dp_sideband_msg_tx *txmsg; > > > int ret; > > > + int attempts = 5; > > > > > > - txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > > +retry: txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > > if (!txmsg) > > > return; > > > > > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct > > > drm_dp_mst_topology_mgr *mgr, > > > } > > > (*mgr->cbs->hotplug)(mgr); > > > } > > > + } else if (attempts--) { > > > > You'll end up doing (attempts + 1) attempts, including the first one. > Yeah, that is what I intended to do :) I renamed it from 'retry' to > 'attempt' at the last moment, which made it a bit confusing I suppose. > > > I am stressing testing my setup more to see how well this recovery works > and update this patch. > Here's what I got with 266 rounds of reboots. Correct link address at 1st try180 Retry 145 Retry 232 Retry 30 Retry 40 Retry 52 Giving up 3 Incorrect link address 4 The retries help about 30% of the cases. > > > > > > > + kfree(txmsg); > > > > How about memset(txmsg, 0, sizoef(*txmsg)); here and move the goto label > > down to avoid repeated allocations? > Absolutely. > > > > > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > > + false); > > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > > + true); > > > + DRM_DEBUG_KMS("link address failed %d, retrying\n", ret); > > > > Maybe do the debug message before you power down/up? > Ok. > > > > BR, > > Jani. > > > > > + goto retry; > > > } else { > > > mstb->link_address_sent = false; > > > - DRM_DEBUG_KMS("link address failed %d\n", ret); > > > + DRM_DEBUG_KMS("link address failed %d, giving up\n", ret); > > > } > > > > > > kfree(txmsg); > > > ___ > Intel-gfx mailing list > intel-...@lists.freedesktop.org >
Re: [Intel-gfx] [PATCH] drm/dp: Power cycle display if LINK_ADDRESS fails.
On Thu, Dec 21, 2017 at 05:32:43PM -0800, Manasi Navare wrote: > On Thu, Dec 21, 2017 at 05:06:22PM -0800, Pandiyan, Dhinakaran wrote: > > On Thu, 2017-12-21 at 10:52 -0800, Manasi Navare wrote: > > > On Wed, Dec 20, 2017 at 10:36:24PM -0800, Dhinakaran Pandiyan wrote: > > > > Occasionally there are LINK_ADDRESS sideband messages timing out with > > > > the > > > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. > > > > These > > > > failures lead to the display not coming up on boot. Power cycling the > > > > port > > > > corresponding to the MST monitor's branch device and resending the > > > > message > > > > fixes the issue. I am not entirely sure if this is specific to my setup. > > > > However, as the power state is toggled conditionally on LINK_ADDRESS > > > > timeouts, this should not affect the working cases. > > > > > > > > > > > Cc: Lyude> > > > Cc: Dave Airlie > > > > Cc: Jani Nikula > > > > Signed-off-by: Dhinakaran Pandiyan > > > > --- > > > > drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++-- > > > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > > > > b/drivers/gpu/drm/drm_dp_mst_topology.c > > > > index 70dcfa58d3c2..e06defcdcf18 100644 > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > > > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct > > > > drm_dp_mst_topology_mgr *mgr, > > > > int len; > > > > struct drm_dp_sideband_msg_tx *txmsg; > > > > int ret; > > > > + int attempts = 5; > > > > > > > > > > Does the spec say this should be retried 5 times or is this just an > > > experimental number? > > > > The spec. does not say how many times to retry, but it does say the > > source is responsible for retrying. > > > > > We have such magical numbers for retries all over the DP code and that > > > makes debugging > > > harder later, so atleast a comment of why its 5 would help. > > > > > Takes about 22 seconds from boot to complete 5 retries on my SKL, I > > think that's enough trying from the kernel side before pulling out the > > DP cable makes sense :) > > > > It still appears to be a magical number so better to comment it properly. > Helps in the debug > > > > > > > > - txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > > > +retry: txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > > > if (!txmsg) > > > > return; > > > > > > > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct > > > > drm_dp_mst_topology_mgr *mgr, > > > > } > > > > (*mgr->cbs->hotplug)(mgr); > > > > } > > > > + } else if (attempts--) { > > > > > > This should be --attempts if you want the total attempts to be 5 > > I don't. > > Yes the variable attempts is misleading in that case. Probably call it "tries" > I meant retries.. Manasi > Manasi > > > > > > > Manasi > > > > > > > + kfree(txmsg); > > > > + drm_dp_send_power_updown_phy(mstb->mgr, > > > > mstb->port_parent, > > > > +false); > > > > + drm_dp_send_power_updown_phy(mstb->mgr, > > > > mstb->port_parent, > > > > +true); > > > > + DRM_DEBUG_KMS("link address failed %d, retrying\n", > > > > ret); > > > > + goto retry; > > > > } else { > > > > mstb->link_address_sent = false; > > > > - DRM_DEBUG_KMS("link address failed %d\n", ret); > > > > + DRM_DEBUG_KMS("link address failed %d, giving up\n", > > > > ret); > > > > } > > > > > > > > kfree(txmsg); > > > > -- > > > > 2.11.0 > > > > > > > > ___ > > > > dri-devel mailing list > > > > dri-devel@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > ___ > > > Intel-gfx mailing list > > > intel-...@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH] drm/dp: Power cycle display if LINK_ADDRESS fails.
On Thu, Dec 21, 2017 at 05:06:22PM -0800, Pandiyan, Dhinakaran wrote: > On Thu, 2017-12-21 at 10:52 -0800, Manasi Navare wrote: > > On Wed, Dec 20, 2017 at 10:36:24PM -0800, Dhinakaran Pandiyan wrote: > > > Occasionally there are LINK_ADDRESS sideband messages timing out with the > > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These > > > failures lead to the display not coming up on boot. Power cycling the port > > > corresponding to the MST monitor's branch device and resending the message > > > fixes the issue. I am not entirely sure if this is specific to my setup. > > > However, as the power state is toggled conditionally on LINK_ADDRESS > > > timeouts, this should not affect the working cases. > > > > > > > > Cc: Lyude> > > Cc: Dave Airlie > > > Cc: Jani Nikula > > > Signed-off-by: Dhinakaran Pandiyan > > > --- > > > drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++-- > > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > > > b/drivers/gpu/drm/drm_dp_mst_topology.c > > > index 70dcfa58d3c2..e06defcdcf18 100644 > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct > > > drm_dp_mst_topology_mgr *mgr, > > > int len; > > > struct drm_dp_sideband_msg_tx *txmsg; > > > int ret; > > > + int attempts = 5; > > > > > > > Does the spec say this should be retried 5 times or is this just an > > experimental number? > > The spec. does not say how many times to retry, but it does say the > source is responsible for retrying. > > > We have such magical numbers for retries all over the DP code and that > > makes debugging > > harder later, so atleast a comment of why its 5 would help. > > Takes about 22 seconds from boot to complete 5 retries on my SKL, I > think that's enough trying from the kernel side before pulling out the > DP cable makes sense :) > It still appears to be a magical number so better to comment it properly. Helps in the debug > > > > > - txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > > +retry: txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > > if (!txmsg) > > > return; > > > > > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct > > > drm_dp_mst_topology_mgr *mgr, > > > } > > > (*mgr->cbs->hotplug)(mgr); > > > } > > > + } else if (attempts--) { > > > > This should be --attempts if you want the total attempts to be 5 > I don't. Yes the variable attempts is misleading in that case. Probably call it "tries" Manasi > > > > Manasi > > > > > + kfree(txmsg); > > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > > + false); > > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > > + true); > > > + DRM_DEBUG_KMS("link address failed %d, retrying\n", ret); > > > + goto retry; > > > } else { > > > mstb->link_address_sent = false; > > > - DRM_DEBUG_KMS("link address failed %d\n", ret); > > > + DRM_DEBUG_KMS("link address failed %d, giving up\n", ret); > > > } > > > > > > kfree(txmsg); > > > -- > > > 2.11.0 > > > > > > ___ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > ___ > > Intel-gfx mailing list > > intel-...@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH] drm/dp: Power cycle display if LINK_ADDRESS fails.
On Thu, 2017-12-21 at 10:52 -0800, Manasi Navare wrote: > On Wed, Dec 20, 2017 at 10:36:24PM -0800, Dhinakaran Pandiyan wrote: > > Occasionally there are LINK_ADDRESS sideband messages timing out with the > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These > > failures lead to the display not coming up on boot. Power cycling the port > > corresponding to the MST monitor's branch device and resending the message > > fixes the issue. I am not entirely sure if this is specific to my setup. > > However, as the power state is toggled conditionally on LINK_ADDRESS > > timeouts, this should not affect the working cases. > > > > > Cc: Lyude> > Cc: Dave Airlie > > Cc: Jani Nikula > > Signed-off-by: Dhinakaran Pandiyan > > --- > > drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > > b/drivers/gpu/drm/drm_dp_mst_topology.c > > index 70dcfa58d3c2..e06defcdcf18 100644 > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct > > drm_dp_mst_topology_mgr *mgr, > > int len; > > struct drm_dp_sideband_msg_tx *txmsg; > > int ret; > > + int attempts = 5; > > > > Does the spec say this should be retried 5 times or is this just an > experimental number? The spec. does not say how many times to retry, but it does say the source is responsible for retrying. > We have such magical numbers for retries all over the DP code and that makes > debugging > harder later, so atleast a comment of why its 5 would help. Takes about 22 seconds from boot to complete 5 retries on my SKL, I think that's enough trying from the kernel side before pulling out the DP cable makes sense :) > > > - txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > +retry: txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > if (!txmsg) > > return; > > > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct > > drm_dp_mst_topology_mgr *mgr, > > } > > (*mgr->cbs->hotplug)(mgr); > > } > > + } else if (attempts--) { > > This should be --attempts if you want the total attempts to be 5 I don't. > > Manasi > > > + kfree(txmsg); > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > +false); > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > +true); > > + DRM_DEBUG_KMS("link address failed %d, retrying\n", ret); > > + goto retry; > > } else { > > mstb->link_address_sent = false; > > - DRM_DEBUG_KMS("link address failed %d\n", ret); > > + DRM_DEBUG_KMS("link address failed %d, giving up\n", ret); > > } > > > > kfree(txmsg); > > -- > > 2.11.0 > > > > ___ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > ___ > Intel-gfx mailing list > intel-...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH] drm/dp: Power cycle display if LINK_ADDRESS fails.
On Thu, 2017-12-21 at 08:53 +0200, Jani Nikula wrote: > On Wed, 20 Dec 2017, Dhinakaran Pandiyan> wrote: > > Occasionally there are LINK_ADDRESS sideband messages timing out with the > > Lenovo MST dock + Dell MST monitor(w/ in-built branch) setup I have. These > > failures lead to the display not coming up on boot. Power cycling the port > > corresponding to the MST monitor's branch device and resending the message > > fixes the issue. I am not entirely sure if this is specific to my setup. > > However, as the power state is toggled conditionally on LINK_ADDRESS > > timeouts, this should not affect the working cases. > > With stuff like this, I always wonder if catering for a failing setup > blocks us from improving working setups, because once we add this, we > can't regress it. For example, is there a valid scenario where we'd want > to fail fast here, instead of retrying? Link address failure would result in not probing a branch device that already has been detected. I guess the fail fast case will be applicable if the said device was not really present but the parent branch told us otherwise. > > Some nits below. > > > Cc: Lyude > > Cc: Dave Airlie > > Cc: Jani Nikula > > Signed-off-by: Dhinakaran Pandiyan > > --- > > drivers/gpu/drm/drm_dp_mst_topology.c | 13 +++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > > b/drivers/gpu/drm/drm_dp_mst_topology.c > > index 70dcfa58d3c2..e06defcdcf18 100644 > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > > @@ -1596,8 +1596,9 @@ static void drm_dp_send_link_address(struct > > drm_dp_mst_topology_mgr *mgr, > > int len; > > struct drm_dp_sideband_msg_tx *txmsg; > > int ret; > > + int attempts = 5; > > > > - txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > +retry: txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); > > if (!txmsg) > > return; > > > > @@ -1635,9 +1636,17 @@ static void drm_dp_send_link_address(struct > > drm_dp_mst_topology_mgr *mgr, > > } > > (*mgr->cbs->hotplug)(mgr); > > } > > + } else if (attempts--) { > > You'll end up doing (attempts + 1) attempts, including the first one. Yeah, that is what I intended to do :) I renamed it from 'retry' to 'attempt' at the last moment, which made it a bit confusing I suppose. I am stressing testing my setup more to see how well this recovery works and update this patch. > > > + kfree(txmsg); > > How about memset(txmsg, 0, sizoef(*txmsg)); here and move the goto label > down to avoid repeated allocations? Absolutely. > > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > +false); > > + drm_dp_send_power_updown_phy(mstb->mgr, mstb->port_parent, > > +true); > > + DRM_DEBUG_KMS("link address failed %d, retrying\n", ret); > > Maybe do the debug message before you power down/up? Ok. > > BR, > Jani. > > > + goto retry; > > } else { > > mstb->link_address_sent = false; > > - DRM_DEBUG_KMS("link address failed %d\n", ret); > > + DRM_DEBUG_KMS("link address failed %d, giving up\n", ret); > > } > > > > kfree(txmsg); > ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel