On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:
> Hi Peter,
>
> On Fri, May 27, 2016 at 03:07:33AM +0200, Peter Wu wrote:
> > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > > nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
> > > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
> > > We therefore leak a runtime pm ref whenever nouveau is loaded with
> > > runpm=0 and then unloaded. The GPU will subsequently never runtime
> > > suspend even if nouveau is loaded again with runpm=1.
> > >
> > > Fix by taking the runtime pm ref under the same condition that it was
> > > released on driver load.
> > >
> > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > > Cc: Dave Airlie <airlied at redhat.com>
> > > Reported-by: Karol Herbst <nouveau at karolherbst.de>
> > > Tested-by: Karol Herbst <nouveau at karolherbst.de>
> > > Signed-off-by: Lukas Wunner <lukas at wunner.de>
> >
> > Looks good, I tested this scenario:
> >
> >     ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;}
> >     ru # reports 1
> >     modprobe nouveau runpm=0
> >     ru # reports 2
> >     rmmod nouveau
> >     ru # reports 1
> >
> > Without runpm=0 the count drops to 0 in the second step and stays 0 in
> > the third step. After applying patch 2/9, this correctly reports 1 as
> > expected (this is the same as manually setting power/control to on).
>
> How exactly did you reach the situation where the root port didn't wake
> up when you tried to load nouveau again? (IRC conversation this week.)

Ensure that the pci/pm patches are applied, then:

 0. Unload nouveau (I have blacklisted it for testing).
 1. Enable rpm for the root port and children (control = auto).
 2. Verify in the kernel logs that the devices are sleeping:
    pcieport 0000:00:01.0: power state changed by ACPI to D3cold
 3. (Optional, to rule out issues with delays:) Disable rpm for the
    Nvidia device (control = on).
 4. modprobe nouveau.

The above test with v4.6 + 4 pci/pm patches (8b71f565) gives:

    50.245795 MXM: GUID detected in BIOS
    50.245948 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246044 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246110 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246289 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246457 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
    50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
    50.247084 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D
    50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0
    50.491893 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
    50.492285 pcieport 0000:00:01.0: PME# disabled
    50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff)
    50.492687 nouveau: probe of 0000:01:00.0 failed with error -12
    50.501990 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._S0W] at AML address ffffc90000010a8e length 2
    50.502403 pcieport 0000:00:01.0: PME# enabled
    50.502601 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
    50.513005 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PG00._OFF] at AML address ffffc90000010994 length 6D
    50.533258 pcieport 0000:00:01.0: power state changed by ACPI to D3cold

(Note that this patch is not included.) When nouveau is operating
normally, I see that _PS0 is also called (which does not happen above).

If you think that mixing power resources with DSM causes this issue, I
also tried to apply my power resources work for nouveau but it gives the
same problem:

    20.183306 MXM: GUID detected in BIOS
    20.183606 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.184158 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.184547 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    20.185152 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
    20.185351 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
    20.185384 nouveau: detected PR support, will not use DSM
    20.185552 nouveau 0000:01:00.0: enabling device (0000 -> 0003)
    20.185873 nouveau 0000:01:00.0: unknown chipset (ffffffff)
    20.185946 nouveau: probe of 0000:01:00.0 failed with error -12

> What's happening is, the PCI core will keep unbound devices (i.e.,
> without driver) in D0 but the runtime status is allowed to change
> to "suspended". So it'll appear to the kernel as if it was suspended
> but in reality it stays in D0.
>
> Once runtime pm for PCIe ports gets merged, the root port above the
> GPU will indeed go to D3 in such a situation because the check
> pm_children_suspended() (called from rpm_check_suspend_allowed())
> returns true.
>
> I'm not sure if this is desirable or not. If we keep unbound devices
> in D0, should we allow ports above them to go to D3?

Maybe Rafael (linux-pm / linux-pci) can answer this question better? The
comments in local_pci_probe, pci_pm_runtime_suspend and
pci_pm_runtime_resume suggest that unbound devices are assumed in D0
which is apparently not the case when runtime PM is enabled.

> In any case, when nouveau is loaded again, local_pci_probe() will
> call pm_runtime_get_sync(), which will implicitly set the runtime
> status to "active" and which should also wake parents. So how did
> you ever reach a point where you loaded nouveau and the root port
> stayed asleep? Clearly we have a bug there, question is where.
> This shouldn't work only if pm_runtime_forbid() was called on
> driver unload.
>
> Thanks for the extensive testing!
> Lukas

Both devices (root port and Nvidia) were resumed, but somehow the Nvidia
card was not fully initialized/ready (as you can see in the above logs).
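
To make the question about unbound devices concrete, here is a rough
userspace model of the situation Lukas describes. It is only a sketch, not
kernel code: every toy_* name is hypothetical, and the D3 transition for a
bound device is a simplifying assumption. The point is that the parent
check consults only the bookkeeping status, so a port may be allowed to
suspend while an unbound child is physically still in D0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: a toy model, not kernel code. */
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };
enum toy_power_state { TOY_D0, TOY_D3 };

struct toy_child {
    bool bound;                  /* does a driver own the device? */
    enum toy_rpm_status status;  /* what the runtime PM core believes */
    enum toy_power_state state;  /* what the hardware is actually in */
};

/*
 * Runtime-suspend a child. Following the quoted description, an unbound
 * device only has its status flipped while the hardware stays in D0.
 */
void toy_child_suspend(struct toy_child *c)
{
    assert(c != NULL);
    c->status = TOY_RPM_SUSPENDED;
    if (c->bound)
        c->state = TOY_D3;  /* assumption: a bound driver powers it down */
}

/*
 * Stand-in for the pm_children_suspended() check: it looks only at the
 * bookkeeping status, never at the real power state.
 */
bool toy_port_may_suspend(const struct toy_child *children, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (children[i].status != TOY_RPM_SUSPENDED)
            return false;
    return true;
}
```

In this model, an unbound child that starts out active and in D0 ends up
with status "suspended" and state D0 after toy_child_suspend(), and
toy_port_may_suspend() then returns true for its parent.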
Peter

> >
> > Peter
> >
> > > ---
> > >  drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > index 11f8dd9..faf7438 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev)
> > >  {
> > >          struct nouveau_drm *drm = nouveau_drm(dev);
> > >
> > > -        pm_runtime_get_sync(dev->dev);
> > > +        if (nouveau_runtime_pm != 0) {
> > > +                pm_runtime_get_sync(dev->dev);
> > > +        }
> > > +
> > >          nouveau_fbcon_fini(dev);
> > >          nouveau_accel_fini(drm);
> > >          nouveau_hwmon_fini(dev);
> > > --
> > > 2.8.1
> > >
> > > _______________________________________________
> > > Nouveau mailing list
> > > Nouveau at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/nouveau
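
For reference, the imbalance that the quoted patch addresses can be
sketched as a small userspace model. This is an illustration under stated
assumptions, not the driver code: the toy_* names are hypothetical, the
counter stands in for power/runtime_usage, and only the driver's own
get/put calls are tracked (the references taken by the PCI core around
probe and remove are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device's runtime PM usage counter. */
struct toy_dev {
    int usage_count;
};

/* Mirrors nouveau_drm_load(): drop a ref only when runtime PM is enabled. */
void toy_load(struct toy_dev *dev, int runpm)
{
    if (runpm != 0)
        dev->usage_count--;  /* stands in for pm_runtime_put() */
}

/*
 * Mirrors nouveau_drm_unload(). With patched == false the ref is taken
 * unconditionally (old behaviour); with patched == true it is taken under
 * the same condition that toy_load() released it on.
 */
void toy_unload(struct toy_dev *dev, int runpm, bool patched)
{
    if (!patched || runpm != 0)
        dev->usage_count++;  /* stands in for pm_runtime_get_sync() */
}

/* Net change of the counter over one load/unload cycle. */
int toy_cycle_delta(int runpm, bool patched)
{
    struct toy_dev dev = { .usage_count = 1 };
    int before = dev.usage_count;

    toy_load(&dev, runpm);
    assert(dev.usage_count >= 0);  /* counter never goes negative here */
    toy_unload(&dev, runpm, patched);
    return dev.usage_count - before;
}
```

In this model, toy_cycle_delta(0, false) is 1, i.e. one leaked reference
per load/unload cycle with runpm=0, while the patched variant and both
runpm=1 cases come out balanced at 0.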