Re: [Nouveau] [PATCH v5 07/14] drm/dp_mst: Don't forget to update port->input in drm_dp_mst_handle_conn_stat()

2019-10-22 Thread Sean Paul
On Mon, Oct 21, 2019 at 10:36:02PM -0400, Lyude Paul wrote:
> This probably hasn't caused any problems up until now since it's
> probably nearly impossible to encounter this in the wild, however if we
> were to receive a connection status notification from the MST hub after
> resume while we're in the middle of reprobing the link addresses for a
> topology then there's a much larger chance that a port could have
> changed from being an output port to input port (or vice versa). If we
> forget to update this bit of information, we'll potentially ignore a
> valid PDT change on a downstream port because we think it's an input
> port.
> 
> So, make sure we read the input_port field in connection status
> notifications in drm_dp_mst_handle_conn_stat() to prevent this from
> happening once we've implemented suspend/resume reprobing.
> 
> Cc: Juston Li 
> Cc: Imre Deak 
> Cc: Ville Syrjälä 
> Cc: Harry Wentland 
> Cc: Daniel Vetter 
> Signed-off-by: Lyude Paul 
> Reviewed-by: Sean Paul 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 52 +++
>  1 file changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 7bf4db91ff90..c8e218b902ae 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2079,18 +2079,40 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch 
> *mstb,
>  {
>   struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
>   struct drm_dp_mst_port *port;
> - int old_ddps;
> - bool dowork = false;
> + int old_ddps, ret;
> + u8 new_pdt;
> + bool dowork = false, create_connector = false;
>  
>   port = drm_dp_get_port(mstb, conn_stat->port_number);
>   if (!port)
>   return;
>  
> - /* Locking is only needed if the port's exposed to userspace */
> - if (port->connector)
> + if (port->connector) {
> + if (!port->input && conn_stat->input_port) {
> + /*
> +  * We can't remove a connector from an already exposed
> +  * port, so just throw the port out and make sure we
> +  * reprobe the link address of it's parent MSTB
> +  */
> + drm_dp_mst_topology_unlink_port(mgr, port);
> + mstb->link_address_sent = false;
> + dowork = true;
> + goto out;
> + }
> +
> + /*
> +  * Locking is only needed if the port's exposed to userspace
> +  */

optional nit: this will still fit on one line

>   drm_modeset_lock(>base.lock, NULL);
> + } else if (port->input && !conn_stat->input_port) {
> + create_connector = true;
> + /* Reprobe link address so we get num_sdp_streams */
> + mstb->link_address_sent = false;
> + dowork = true;
> + }
>  
>   old_ddps = port->ddps;
> + port->input = conn_stat->input_port;
>   port->mcs = conn_stat->message_capability_status;
>   port->ldps = conn_stat->legacy_device_plug_status;
>   port->ddps = conn_stat->displayport_device_plug_status;
> @@ -2103,21 +2125,23 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch 
> *mstb,
>   }
>   }
>  
> - if (!port->input) {
> - int ret = drm_dp_port_set_pdt(port,
> -   conn_stat->peer_device_type);
> - if (ret == 1) {
> - dowork = true;
> - } else if (ret < 0) {
> - DRM_ERROR("Failed to change PDT for port %p: %d\n",
> -   port, ret);
> - dowork = false;
> - }
> + new_pdt = port->input ? DP_PEER_DEVICE_NONE : 
> conn_stat->peer_device_type;
> +
> + ret = drm_dp_port_set_pdt(port, new_pdt);
> + if (ret == 1) {
> + dowork = true;
> + } else if (ret < 0) {
> + DRM_ERROR("Failed to change PDT for port %p: %d\n",
> +   port, ret);
> + dowork = false;
>   }
>  
>   if (port->connector)
>   drm_modeset_unlock(>base.lock);
> + else if (create_connector)
> + drm_dp_mst_port_add_connector(mstb, port);
>  
> +out:
>   drm_dp_mst_topology_put_port(port);
>   if (dowork)
>   queue_work(system_long_wq, >mgr->work);
> -- 
> 2.21.0
> 

-- 
Sean Paul, Software Engineer, Google / Chromium OS
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [Bug 112070] H/W Acceleration sufficiently buggy on Debian to hard lock machine GeForce 7600 GO

2019-10-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=112070

--- Comment #23 from Ilia Mirkin  ---
Looks like a crash in glmark2 itself. Probably some unexpected condition...

-- 
You are receiving this mail because:
You are the assignee for the bug.___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [Bug 112070] H/W Acceleration sufficiently buggy on Debian to hard lock machine GeForce 7600 GO

2019-10-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=112070

--- Comment #22 from Doobz  ---
Created attachment 145795
  --> https://bugs.freedesktop.org/attachment.cgi?id=145795=edit
Backtrace glmark2 segfault nv4b mesa1836

Let me know if I messed up. And thanks for taking the time to look into this.

-- 
You are receiving this mail because:
You are the assignee for the bug.___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [Bug 112070] H/W Acceleration sufficiently buggy on Debian to hard lock machine GeForce 7600 GO

2019-10-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=112070

--- Comment #21 from Ilia Mirkin  ---
b000 range is the media range. Just remove dri/libvdpau_nouveau.so -- I'm sure
that's not helping. And any similar va-api thing.

-- 
You are receiving this mail because:
You are the assignee for the bug.___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH v5 05/14] drm/dp_mst: Add probe_lock

2019-10-22 Thread Sean Paul
On Mon, Oct 21, 2019 at 10:36:00PM -0400, Lyude Paul wrote:
> Currently, MST lacks locking in a lot of places that really should have
> some sort of locking. Hotplugging and link address code paths are some
> of the offenders here, as there is actually nothing preventing us from
> running a link address probe while at the same time handling a
> connection status update request - something that's likely always been
> possible but never seen in the wild because hotplugging has been broken
> for ages now (with the exception of amdgpu, for reasons I don't think
> are worth digging into very far).
> 
> Note: I'm going to start using the term "in-memory topology layout" here
> to refer to drm_dp_mst_port->mstb and drm_dp_mst_branch->ports.
> 
> Locking in these places is a little tougher then it looks though.
> Generally we protect anything having to do with the in-memory topology
> layout under >lock. But this becomes nearly impossible to do from
> the context of link address probes due to the fact that >lock is
> usually grabbed under random various modesetting locks, meaning that
> there's no way we can just invert the >lock order and keep it
> locked throughout the whole process of updating the topology.
> 
> Luckily there are only two workers which can modify the in-memory
> topology layout: drm_dp_mst_up_req_work() and
> drm_dp_mst_link_probe_work(), meaning as long as we prevent these two
> workers from traveling the topology layout in parallel with the intent
> of updating it we don't need to worry about grabbing >lock in these
> workers for reads. We only need to grab >lock in these workers for
> writes, so that readers outside these two workers are still protected
> from the topology layout changing beneath them.
> 
> So, add the new >probe_lock and use it in both
> drm_dp_mst_link_probe_work() and drm_dp_mst_up_req_work(). Additionally,
> add some more detailed explanations for how this locking is intended to
> work to drm_dp_mst_port->mstb and drm_dp_mst_branch->ports.

This seems like a good solution to me, thanks for working this through!

Any chance we could add a WARN_ON(!mutex_is_locked(>probe_lock)); somewhere
centrally called by all paths modifying the in-memory topology layout?
drm_dp_add_port perhaps? That way we have a fallback in case something else
starts mucking with the topology.

Other than that,

Reviewed-by: Sean Paul 


> 
> Signed-off-by: Lyude Paul 
> Cc: Juston Li 
> Cc: Imre Deak 
> Cc: Ville Syrjälä 
> Cc: Harry Wentland 
> Cc: Daniel Vetter 
> Cc: Sean Paul 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 28 ++-
>  include/drm/drm_dp_mst_helper.h   | 32 +++
>  2 files changed, 46 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 08c316a727df..11d842f0bff5 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2147,37 +2147,40 @@ static void drm_dp_check_and_send_link_address(struct 
> drm_dp_mst_topology_mgr *m
>  struct drm_dp_mst_branch *mstb)
>  {
>   struct drm_dp_mst_port *port;
> - struct drm_dp_mst_branch *mstb_child;
> +
>   if (!mstb->link_address_sent)
>   drm_dp_send_link_address(mgr, mstb);
>  
>   list_for_each_entry(port, >ports, next) {
> - if (port->input)
> - continue;
> + struct drm_dp_mst_branch *mstb_child = NULL;
>  
> - if (!port->ddps)
> + if (port->input || !port->ddps)
>   continue;
>  
>   if (!port->available_pbn)
>   drm_dp_send_enum_path_resources(mgr, mstb, port);
>  
> - if (port->mstb) {
> + if (port->mstb)
>   mstb_child = drm_dp_mst_topology_get_mstb_validated(
>   mgr, port->mstb);
> - if (mstb_child) {
> - drm_dp_check_and_send_link_address(mgr, 
> mstb_child);
> - drm_dp_mst_topology_put_mstb(mstb_child);
> - }
> +
> + if (mstb_child) {
> + drm_dp_check_and_send_link_address(mgr, mstb_child);
> + drm_dp_mst_topology_put_mstb(mstb_child);
>   }
>   }
>  }
>  
>  static void drm_dp_mst_link_probe_work(struct work_struct *work)
>  {
> - struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct 
> drm_dp_mst_topology_mgr, work);
> + struct drm_dp_mst_topology_mgr *mgr =
> + container_of(work, struct drm_dp_mst_topology_mgr, work);
> + struct drm_device *dev = mgr->dev;
>   struct drm_dp_mst_branch *mstb;
>   int ret;
>  
> + mutex_lock(>probe_lock);
> +
>   mutex_lock(>lock);
>   mstb = mgr->mst_primary;
>   if (mstb) {
> @@ -2190,6 +2193,7 @@ static void drm_dp_mst_link_probe_work(struct 
> 

[Nouveau] [Bug 112070] H/W Acceleration sufficiently buggy on Debian to hard lock machine GeForce 7600 GO

2019-10-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=112070

--- Comment #20 from Doobz  ---
Created attachment 145794
  --> https://bugs.freedesktop.org/attachment.cgi?id=145794=edit
dmesg output from NV63 GF7100 desktop

-- 
You are receiving this mail because:
You are the assignee for the bug.___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [Bug 112070] H/W Acceleration sufficiently buggy on Debian to hard lock machine GeForce 7600 GO

2019-10-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=112070

--- Comment #19 from Doobz  ---
Not managed to do the backtrace yet, but thought I'd attach dmesg from
7100-based desktop. Had it sitting cycling gl screensavers for a while.

Lots of these:-
[   45.639455] nouveau :00:10.0: bus: MMIO write of 00540001 FAULT at
00b000

And some of these:-
[  825.282398] perf: interrupt took too long (5006 > 4996), lowering
kernel.perf_event_max_sample_rate to 39750

In other news, same machine hard locked as soon as I tried to play an local
video file.

-- 
You are receiving this mail because:
You are the assignee for the bug.___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-22 Thread Karol Herbst
On Tue, Oct 22, 2019 at 2:45 PM Mika Westerberg
 wrote:
>
> On Tue, Oct 22, 2019 at 11:16:14AM +0200, Karol Herbst wrote:
> > I think there is something I totally forgot about:
> >
> > When there was never a driver bound to the GPU, and if runtime power
> > management gets enabled on that device, runtime suspend/resume works
> > as expected (I am not 100% sure on if that always works, but I will
> > recheck that).
>
> AFAIK, if there is no driver bound to the PCI device it is left to D0
> regardless of the runtime PM state which could explain why it works in
> that case (it is never put into D3hot).
>
> I looked at the acpidump you sent and there is one thing that may
> explain the differences between Windows and Linux. Not sure if you were
> aware of this already, though. The power resource PGOF() method has
> this:
>
>If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05 {
>   ...
>}
>

I think this is the fallback to some older method of runtime
suspending the device, and I think it will end up touching different
registers on the bridge controller which do not show the broken
behaviour.

You'll find references to following variables which all cause a link
to be powered down: Q0L2 (newest), P0L2, P0LD (oldest, I think).

Maybe I remember incorrectly and have to read the code again... okay,
the fallback path uses P0LD indeed. That's actually the only register
of those being documented by Intel afaik.

> If I read it right, the later condition tries to detect Linux which
> fails nowadays but if you have acpi_rev_override in the command line (or
> the machine is listed in acpi_rev_dmi_table) this check passes and does
> some magic which is not clear to me. There is similar in PGON() side
> which is used to turn the device back on.
>
> You can check what actually happens when _ON()/_OFF() is called by
> passing something like below to the kernel command line:
>
>   acpi.trace_debug_layer=0x80 acpi.trace_debug_level=0x10 
> acpi.trace_method_name=\_SB.PCI0.PEG0.PG00._ON acpi.trace_state=method
>
> (See also Documentation/firmware-guide/acpi/method-tracing.rst).
>
> Trace goes to system dmesg.

This sounds to be very helpful, I'll give it a try.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-22 Thread Karol Herbst
I think there is something I totally forgot about:

When there was never a driver bound to the GPU, and if runtime power
management gets enabled on that device, runtime suspend/resume works
as expected (I am not 100% sure on if that always works, but I will
recheck that).
In the past I know that at some point I "bisected" the nouveau driver
to figure out what actually breaks it and found out that some script
executed with the help of an on-chip engine (signed script, signed
firmware, both vbios provided) makes it break. Debugging the script
lead me to the PCIe link speed changes done inside the script breaking
it.

But as "reverting" the speed change didn't make it work reliably
again, I think I need to get back on that and check if it's something
else. I will try to convert the script into C or python code to make
it more accessible to debug and hopefully I'll find something I
overlooked the last time.

On Mon, Oct 21, 2019 at 6:40 PM Karol Herbst  wrote:
>
> On Mon, Oct 21, 2019 at 5:46 PM Mika Westerberg
>  wrote:
> >
> > On Mon, Oct 21, 2019 at 04:49:09PM +0200, Karol Herbst wrote:
> > > On Mon, Oct 21, 2019 at 4:09 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Mon, Oct 21, 2019 at 03:54:09PM +0200, Karol Herbst wrote:
> > > > > > I really would like to provide you more information about such
> > > > > > workaround but I'm not aware of any ;-) I have not seen any issues 
> > > > > > like
> > > > > > this when D3cold is properly implemented in the platform.  That's 
> > > > > > why
> > > > > > I'm bit skeptical that this has anything to do with specific Intel 
> > > > > > PCIe
> > > > > > ports. More likely it is some power sequence in the _ON/_OFF() 
> > > > > > methods
> > > > > > that is run differently on Windows.
> > > > >
> > > > > yeah.. maybe. I really don't know what's the actual root cause. I just
> > > > > know that with this workaround it works perfectly fine on my and some
> > > > > other systems it was tested on. Do you know who would be best to
> > > > > approach to get proper documentation about those methods and what are
> > > > > the actual prerequisites of those methods?
> > > >
> > > > Those should be documented in the ACPI spec. Chapter 7 should explain
> > > > power resources and the device power methods in detail.
> > >
> > > either I looked up the wrong spec or the documentation isn't really
> > > saying much there.
> >
> > Well it explains those methods, _PSx, _PRx and _ON()/_OFF(). In case of
> > PCIe device you also want to check PCIe spec. PCIe 5.0 section 5.8 "PCI
> > Function Power State Transitions" has a picture about the supported
> > power state transitions and there we can find that function must be in
> > D3hot before it can be transitioned into D3cold so if the _OFF() for
> > example blindly assumes that the device is in D0 when it is called, it
> > is a bug in the BIOS.
> >
> > BTW, where can I find acpidump of such system?
>
> I am sure it's uploaded somewhere already. But it's not an issue of
> just one system. It's essentially hitting every single laptop with a
> skylake or kaby lake CPU + Nvidia GPU. I didn't see any system where
> it's actually working right now (and we are pestering nvidia about
> this issue for over a year already with no solution)
>
> I've attached an acpidump from my system.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau