Re: [Nouveau] [PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the Lenovo ThinkPad P50

2019-03-13 Thread Lyude Paul
[note to David Ober: you -should- be able to reply to this, hopefully, but I
haven't actually tested that so results may vary]

Hi again! Sorry I didn't fully answer all of the questions you originally
asked in this email. I had to get in contact with Lenovo to make sure that it
was OK for me to disclose more details on this bug (and I had PTO scheduled
immediately after I asked). I've added David Ober from Lenovo to this thread
as well. Now that I've got Lenovo's approval, I can answer those questions
and give some better answers for the others! (see below)


On Fri, 2019-02-15 at 16:17 -0500, Lyude Paul wrote:
> On Thu, 2019-02-14 at 18:43 -0600, Bjorn Helgaas wrote:
> > Hi Lyude,
> > 
> > On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> > > On a very specific subset of ThinkPad P50 SKUs, particularly ones that
> > > come with a Quadro M1000M chip instead of the M2000M variant, the BIOS
> > > seems to have a very nasty habit of not always resetting the secondary
> > > Nvidia GPU between full reboots if the laptop is configured in Hybrid
> > > Graphics mode. The reason for this happening is unknown, but the
> > > following steps and possibly a good bit of patience will reproduce the
> > > issue:
> > > 
> > > 1. Boot up the laptop normally in Hybrid Graphics mode
> > > 2. Make sure nouveau is loaded and that the GPU is awake
> > > 3. Allow the Nvidia GPU to runtime suspend itself after being idle
> > > 4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
> > > 5. If nouveau loads up properly, reboot the machine again and go back to
> > > step 2 until you reproduce the issue
> > > 
> > > This results in some very strange behavior: the GPU will
> > > quite literally be left in exactly the same state it was in when the
> > > previously booted kernel started the reboot. This has all sorts of bad
> > > side effects: for starters, this completely breaks nouveau starting with
> > > a mysterious EVO channel failure that happens well before we've actually
> > > used the EVO channel for anything:
> > 
> > There are a lot of moving parts here that are probably obvious to you
> > but not to me.  I need help untangling this a bit so I'm comfortable
> > that we got to the root cause and that we're doing something logical
> > as opposed to something that just happens to make things work.  I
> > really don't know enough to even ask the right questions...
> 
> I completely understand! I'm pretty sure I'd be just as skeptical if I was in
> your position reviewing a patch like this :P
> 
> > Is there a bug report for this?  Bugzilla.kernel.org would be ideal,
> > including "lspci -vvxxx" and dmidecode for the system.
> > 
> Not yet, but there has been discussion about this between nouveau developers
> on our IRC channel.
I lied: yes, there actually is a bug report for this, but it's currently on the
Red Hat bugzilla. I can get more information from it if you need (with
Lenovo's approval of course).

> 
> > Is this running a current BIOS?  The date in your log below looks
> > pretty recent, so I assume it is current.
> 
> Yes, this is the most up to date BIOS available for this system.

And additionally: I've been working with Lenovo on this issue for a couple of
months now, and we've gone through dozens of different trial BIOSes with no
success thus far. Lenovo is currently working on adding this workaround to
their BIOS, but I've been told that this change is going to take a decent
amount of time since they need to test it across multiple operating systems.
I'd be happy to come back and add a conditional later to turn this workaround
off for later BIOS versions once Lenovo has released a proper fix.
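
For what it's worth, here is a rough userspace sketch of what such a
conditional could look like. It is only an illustration, not the kernel patch
itself: `read_dmi`, `workaround_needed` and the `fixed_bios_versions` set are
hypothetical names, and the set is left empty because no fixed BIOS has been
released yet. It also assumes the model string shows up in the DMI
`product_version` field, which as far as I know is the usual case on ThinkPads.

```python
from pathlib import Path

# Standard Linux sysfs location for DMI strings.
DMI_DIR = Path("/sys/class/dmi/id")


def read_dmi(field, dmi_dir=DMI_DIR):
    """Return one DMI string (e.g. "bios_version"), or "" if it is unreadable."""
    try:
        return (Path(dmi_dir) / field).read_text().strip()
    except OSError:
        return ""


def workaround_needed(product_version, bios_version,
                      fixed_bios_versions=frozenset()):
    """Decide whether the boot-time GPU reset should still be applied.

    fixed_bios_versions is a hypothetical allowlist of BIOS releases known to
    contain Lenovo's fix. It is empty here because no such release exists yet.
    """
    if "ThinkPad P50" not in product_version:
        return False  # some other machine, so the quirk does not apply
    return bios_version not in fixed_bios_versions


if __name__ == "__main__":
    needed = workaround_needed(read_dmi("product_version"),
                               read_dmi("bios_version"))
    print("apply reset workaround:", needed)
```

An allowlist of exact version strings is used instead of a "newer than X"
comparison so the sketch doesn't have to guess at how BIOS version strings are
ordered. It also doesn't check which GPU is installed, so a real conditional
would need to be narrower than this.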

With all of that being said, how do you think we should proceed?

> 
> > I assume "hybrid graphics" means you have two GPUs.  Do you select
> > hybrid graphics mode in the BIOS?
> 
> Yes, the P50 has two available modes in the BIOS: Dedicated (i.e. only
> the Nvidia GPU is used for everything), and Hybrid (i915 drives the
> built-in display panel, nouveau drives everything else). This bug only
> seems to occur in Hybrid mode.
> 
> > I assume when you say the Nvidia GPU doesn't get reset on a full
> > reboot, you're talking about a "warm reboot", and that if you actually
> > remove the power and do a cold reboot, there's no problem?
> 
> If you meant "unplugging the power adapter" when you said removing the
> power, we don't need to go that far, but shutting down the machine and
> restarting it by hand does avoid the problem, yes.
> 
> > I assume Nvidia GPU being active means you are using the performance
> > GPU.  Does that mean the integrated GPU is completely unused and Linux
> > does nothing at all with it?  Is Linux doing any switching between
> > them?  If so, how?  I am not 100% confident in the code I've seen that
> > does the switching.
> 
> "Switching" isn't really the right word to describe it these days as the
> process for how this is handled has changed quit

[Nouveau] [Bug 109996] New: Bug

2019-03-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109996

    Bug ID: 109996
   Summary: Bug
   Product: xorg
   Version: 7.6 (2010.12)
  Hardware: x86-64 (AMD64)
        OS: Linux (All)
    Status: NEW
  Severity: normal
  Priority: medium
 Component: Driver/nouveau
  Assignee: nouveau@lists.freedesktop.org
  Reporter: zohaibshehz...@gmail.com
QA Contact: xorg-t...@lists.x.org
        CC: dejavu-b...@lists.freedesktop.org,
            musharib@gmail.com, zohaibshehz...@gmail.com
Depends on: 109995

+++ This bug was initially created as a clone of Bug #109995 +++

Boaeen Rola ayy raja g thk krop g


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=109995
[Bug 109995] Bug
-- 
You are receiving this mail because:
You are the assignee for the bug.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau