Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-06-24 Thread Karol Herbst
Hi Bjorn,

I managed to trigger the issue from userspace without having to call
into any ACPI code. I wrote a little script[1] which essentially does
(and prints) the following (comments added by me):

PCI SCAN
# set the PCIe link speed to 2.5 GT/s; the bug doesn't trigger with 5.0 or 8.0
# NVA is a small library we use to poke at the GPU's MMIO registers to make
# reverse engineering them easier.
NVA R 0x8c040 80005800
NVA W 0x8c040 80085800
NVA R 0x8c040 80085800
NVA W 0x8c040 80085801
NVA R 0x8c040 80085800
# put the GPU into D3hot via its PCI config space
PCI R :01:00.0 0x64 0008
PCI W :01:00.0 0x64 000b
PCI R :01:00.0 0x64 000b
# slot ctrl/status
PCI R :00:01.0 0xb8 0048
# link ctrl/status 2
PCI R :00:01.0 0xd0 001e0043
# bit 7 in 0x248 is called Q0L2 inside the ACPI code; it is set by the ACPI
# platform method that runtime suspends the GPU
# (\_SB.PCI0.PEG0.PG00._OFF on my machine, for the parent port)
PCI W :00:01.0 0x248 0080
PCI R :00:01.0 0x248 
# now things are super messed up 
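
For reference, the PCI-config half of that trace can be reproduced with plain
sysfs config-space accesses. Below is a minimal C sketch under the same
assumptions as the trace above (GPU at 0000:01:00.0, root port at 0000:00:01.0,
offsets 0x64 and 0x248 taken from the log); the NVA MMIO pokes are GPU-internal
and are not shown. As described above, poking these registers can wedge the
machine.

/*
 * Minimal sketch of the PCI config-space pokes from the trace above.
 * Assumptions: the GPU is 0000:01:00.0, the root port is 0000:00:01.0,
 * and offsets 0x64 (PM control) and 0x248 ("Q0L2" bit) are as in the log.
 * Must run as root.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static uint32_t cfg_rw(const char *dev, off_t off, int width,
                       int do_write, uint32_t val)
{
    char path[128];
    uint32_t buf = val;

    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", dev);
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror(path);
        return ~0u;
    }
    if (do_write)
        pwrite(fd, &buf, width, off);
    else
        pread(fd, &buf, width, off);
    close(fd);
    return buf;
}

int main(void)
{
    const char *gpu  = "0000:01:00.0";
    const char *port = "0000:00:01.0";
    uint32_t v;

    /* put the GPU into D3hot via its PM control register at 0x64 */
    printf("PMCSR: %04x\n", (unsigned)cfg_rw(gpu, 0x64, 2, 0, 0));
    cfg_rw(gpu, 0x64, 2, 1, 0x000b);
    printf("PMCSR: %04x\n", (unsigned)cfg_rw(gpu, 0x64, 2, 0, 0));

    /* set bit 7 ("Q0L2") in the root port's 0x248 register */
    v = cfg_rw(port, 0x248, 4, 0, 0);
    cfg_rw(port, 0x248, 4, 1, v | 0x80);
    printf("0x248: %08x\n", (unsigned)cfg_rw(port, 0x248, 4, 0, 0));
    return 0;
}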
> ohh nvm. It was a mistake on my end. Sorry for the noise
>
> On Wed, Jun 19, 2019 at 2:07 PM Karol Herbst  wrote:
> >
> > Hi Bjorn,
> >
> > I was playing around with some older information again (write into the
> > PCI config to put the card into d3 state). And there is something
> > which made me very curious:
> > If I put the card manually into any other state besides D0 via the
> > 0x64 pci config register, the card just dies and pci core seems to
> > expect this to not happen. pci_raw_set_power_state has this
> > "pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);" call,
> > and reads the value back later, but if the card is already gone, maybe
> > we can't do this for nvidia GPUs?
> >
> > No idea why I didn't play around more with that register, but if the
> > card already dies there then this kind of shows there is indeed an
> > issue on a PCI level, no?
> >
> > On Mon, Jun 3, 2019 at 8:10 PM Bjorn Helgaas  wrote:
> > >
> > > On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> > > > @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> > > > this issue inside Nouveau and leave it there until we have a better
> > > > understanding or see non-Nouveau cases of this issue.
> > >
> > > Nope, I have no more ideas.
> > >
> > > > On Tue, May 21, 2019 at 7:48 PM Karol Herbst  wrote:
> > > > >
> > > > > doing the same on the bridge controller with my workarounds applied:
> > > > >
> > > > > please note some differences:
> > > > > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > > > > SltSta: PresDet+ vs PresDet-
> > > > > LnkSta2: Equalization stuff
> > > > > Virtual channel: NegoPending- vs NegoPending+
> > > > >
> > > > > both times I executed lspci while the GPU was still suspended.
> > > > >
> > > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > > decode])
> > > > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > SERR-  > > > > Latency: 0
> > 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-06-19 Thread Karol Herbst
ohh nvm. It was a mistake on my end. Sorry for the noise

On Wed, Jun 19, 2019 at 2:07 PM Karol Herbst  wrote:
>
> Hi Bjorn,
>
> I was playing around with some older information again (write into the
> PCI config to put the card into d3 state). And there is something
> which made me very curious:
> If I put the card manually into any other state besides D0 via the
> 0x64 pci config register, the card just dies and pci core seems to
> expect this to not happen. pci_raw_set_power_state has this
> "pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);" call,
> and reads the value back later, but if the card is already gone, maybe
> we can't do this for nvidia GPUs?
>
> No idea why I didn't play around more with that register, but if the
> card already dies there then this kind of shows there is indeed an
> issue on a PCI level, no?
>
> On Mon, Jun 3, 2019 at 8:10 PM Bjorn Helgaas  wrote:
> >
> > On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> > > @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> > > this issue inside Nouveau and leave it there until we have a better
> > > understanding or see non-Nouveau cases of this issue.
> >
> > Nope, I have no more ideas.
> >
> > > On Tue, May 21, 2019 at 7:48 PM Karol Herbst  wrote:
> > > >
> > > > doing the same on the bridge controller with my workarounds applied:
> > > >
> > > > please note some differences:
> > > > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > > > SltSta: PresDet+ vs PresDet-
> > > > LnkSta2: Equalization stuff
> > > > Virtual channel: NegoPending- vs NegoPending+
> > > >
> > > > both times I executed lspci while the GPU was still suspended.
> > > >
> > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > decode])
> > > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > SERR-  > > > Latency: 0
> > > > Interrupt: pin A routed to IRQ 16
> > > > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > > I/O behind bridge: e000-efff [size=4K]
> > > > Memory behind bridge: ec00-ed0f [size=17M]
> > > > Prefetchable memory behind bridge:
> > > > c000-d1ff [size=288M]
> > > > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > >  > > > BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- 
> > > > FastB2B-
> > > > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > > Capabilities: [88] Subsystem: Dell Device 07be
> > > > Capabilities: [80] Power Management version 3
> > > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > > Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > > Address:   Data: 
> > > > Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > > DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > > ExtTag- RBE+
> > > > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > > MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > AuxPwr- TransPend-
> > > > LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > Exit Latency L0s <256ns, L1 <8us
> > > > ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> > > > TrErr- Train- SlotClk+ DLActive- BWMgmt+ 
> > > > ABWMgmt+
> > > > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > HotPlug- Surprise-
> > > > Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > > > CmdCplt- HPIrq- LinkChg-
> > > > Control: AttnInd Unknown, PwrInd Unknown,
> > > > Power- Interlock-
> > > > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > PresDet+ Interlock-
> > > > Changed: MRL- PresDet+ LinkState-
> > > > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > PMEIntEna- CRSVisible-
> > > > RootCap: CRSVisible-
> > > > RootSta: PME ReqID , PMEStatus- PMEPending-
> > > > DevCap2: 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-06-19 Thread Karol Herbst
Hi Bjorn,

I was playing around with some older information again (writing into the
PCI config space to put the card into a D3 state), and something made me
very curious:
if I put the card manually into any state other than D0 via the 0x64 PCI
config register, the card just dies, and the PCI core doesn't seem to
expect that to happen. pci_raw_set_power_state has this
"pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);" call
and reads the value back later, but if the card is already gone, maybe
we can't do this for Nvidia GPUs?

No idea why I didn't play around more with that register, but if the
card already dies there, then this kind of shows there is indeed an
issue at the PCI level, no?
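
For context, here is a simplified sketch of that write/read-back sequence,
paraphrasing pci_raw_set_power_state() in drivers/pci/pci.c rather than
quoting it verbatim: if the device has dropped off the bus, the config read
returns all ones, the read-back state stays D3, and the kernel prints the
"Refused to change power state" message that shows up in the dmesg later in
this thread.

#include <linux/delay.h>
#include <linux/pci.h>

/* Simplified sketch, not the exact kernel code: write the requested state
 * into PMCSR, wait, then read the register back to see which state the
 * device actually reports.
 */
static void set_power_state_sketch(struct pci_dev *dev, pci_power_t state)
{
	u16 pmcsr;

	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
	pmcsr |= state;
	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);

	/* settle time around D3hot transitions */
	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
		msleep(10);

	/* read back: a device that is gone returns 0xffff, i.e. "still D3" */
	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
	if (dev->current_state != state)
		pci_info(dev, "Refused to change power state, currently in D%d\n",
			 dev->current_state);
}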

On Mon, Jun 3, 2019 at 8:10 PM Bjorn Helgaas  wrote:
>
> On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> > @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> > this issue inside Nouveau and leave it there until we have a better
> > understanding or see non-Nouveau cases of this issue.
>
> Nope, I have no more ideas.
>
> > On Tue, May 21, 2019 at 7:48 PM Karol Herbst  wrote:
> > >
> > > doing the same on the bridge controller with my workarounds applied:
> > >
> > > please note some differences:
> > > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > > SltSta: PresDet+ vs PresDet-
> > > LnkSta2: Equalization stuff
> > > Virtual channel: NegoPending- vs NegoPending+
> > >
> > > both times I executed lspci while the GPU was still suspended.
> > >
> > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > decode])
> > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > SERR-  > > Latency: 0
> > > Interrupt: pin A routed to IRQ 16
> > > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > I/O behind bridge: e000-efff [size=4K]
> > > Memory behind bridge: ec00-ed0f [size=17M]
> > > Prefetchable memory behind bridge:
> > > c000-d1ff [size=288M]
> > > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > >  > > BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- 
> > > FastB2B-
> > > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > Capabilities: [88] Subsystem: Dell Device 07be
> > > Capabilities: [80] Power Management version 3
> > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > Address:   Data: 
> > > Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > ExtTag- RBE+
> > > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > AuxPwr- TransPend-
> > > LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > Exit Latency L0s <256ns, L1 <8us
> > > ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> > > TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > HotPlug- Surprise-
> > > Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > > CmdCplt- HPIrq- LinkChg-
> > > Control: AttnInd Unknown, PwrInd Unknown,
> > > Power- Interlock-
> > > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > PresDet+ Interlock-
> > > Changed: MRL- PresDet+ LinkState-
> > > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > PMEIntEna- CRSVisible-
> > > RootCap: CRSVisible-
> > > RootSta: PME ReqID , PMEStatus- PMEPending-
> > > DevCap2: Completion Timeout: Not Supported,
> > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > >  AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > DevCtl2: Completion Timeout: 50us to 50ms,
> > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > >  

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-06-03 Thread Bjorn Helgaas
On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> this issue inside Nouveau and leave it there until we have a better
> understanding or see non-Nouveau cases of this issue.

Nope, I have no more ideas.

> On Tue, May 21, 2019 at 7:48 PM Karol Herbst  wrote:
> >
> > doing the same on the bridge controller with my workarounds applied:
> >
> > please note some differences:
> > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > SltSta: PresDet+ vs PresDet-
> > LnkSta2: Equalization stuff
> > Virtual channel: NegoPending- vs NegoPending+
> >
> > both times I executed lspci while the GPU was still suspended.
> >
> > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > decode])
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > SERR-  > Latency: 0
> > Interrupt: pin A routed to IRQ 16
> > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > I/O behind bridge: e000-efff [size=4K]
> > Memory behind bridge: ec00-ed0f [size=17M]
> > Prefetchable memory behind bridge:
> > c000-d1ff [size=288M]
> > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >  > BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > Capabilities: [88] Subsystem: Dell Device 07be
> > Capabilities: [80] Power Management version 3
> > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > Address:   Data: 
> > Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > DevCap: MaxPayload 256 bytes, PhantFunc 0
> > ExtTag- RBE+
> > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > MaxPayload 256 bytes, MaxReadReq 128 bytes
> > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > AuxPwr- TransPend-
> > LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > Exit Latency L0s <256ns, L1 <8us
> > ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> > TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > HotPlug- Surprise-
> > Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > CmdCplt- HPIrq- LinkChg-
> > Control: AttnInd Unknown, PwrInd Unknown,
> > Power- Interlock-
> > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > PresDet+ Interlock-
> > Changed: MRL- PresDet+ LinkState-
> > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > PMEIntEna- CRSVisible-
> > RootCap: CRSVisible-
> > RootSta: PME ReqID , PMEStatus- PMEPending-
> > DevCap2: Completion Timeout: Not Supported,
> > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> >  AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > DevCtl2: Completion Timeout: 50us to 50ms,
> > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> >  AtomicOpsCtl: ReqEn- EgressBlck-
> > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- 
> > SpeedDis-
> >  Transmit Margin: Normal Operating Range,
> > EnterModifiedCompliance- ComplianceSOS-
> >  Compliance De-emphasis: -6dB
> > LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete+, EqualizationPhase1+
> >  EqualizationPhase2+, EqualizationPhase3+,
> > LinkEqualizationRequest-
> > Capabilities: [100 v1] Virtual Channel
> > Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > Arb:Fixed- WRR32- WRR64- WRR128-
> > Ctrl:   ArbSelect=Fixed
> > Status: InProgress-
> > VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > Arb:Fixed+ WRR32- WRR64- WRR128- TWRR128- 
> > WRR256-
> >   

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-06-03 Thread Karol Herbst
@bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
this issue inside Nouveau and leave it there until we have a better
understanding or see non-Nouveau cases of this issue.

On Tue, May 21, 2019 at 7:48 PM Karol Herbst  wrote:
>
> doing the same on the bridge controller with my workarounds applied:
>
> please note some differences:
> LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> SltSta: PresDet+ vs PresDet-
> LnkSta2: Equalization stuff
> Virtual channel: NegoPending- vs NegoPending+
>
> both times I executed lspci while the GPU was still suspended.
>
> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR-  Latency: 0
> Interrupt: pin A routed to IRQ 16
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: e000-efff [size=4K]
> Memory behind bridge: ec00-ed0f [size=17M]
> Prefetchable memory behind bridge:
> c000-d1ff [size=288M]
> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
>  BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [88] Subsystem: Dell Device 07be
> Capabilities: [80] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> Address:   Data: 
> Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> DevCap: MaxPayload 256 bytes, PhantFunc 0
> ExtTag- RBE+
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 256 bytes, MaxReadReq 128 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> AuxPwr- TransPend-
> LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> Exit Latency L0s <256ns, L1 <8us
> ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> HotPlug- Surprise-
> Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> CmdCplt- HPIrq- LinkChg-
> Control: AttnInd Unknown, PwrInd Unknown,
> Power- Interlock-
> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> PresDet+ Interlock-
> Changed: MRL- PresDet+ LinkState-
> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> PMEIntEna- CRSVisible-
> RootCap: CRSVisible-
> RootSta: PME ReqID , PMEStatus- PMEPending-
> DevCap2: Completion Timeout: Not Supported,
> TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
>  AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> DevCtl2: Completion Timeout: 50us to 50ms,
> TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
>  AtomicOpsCtl: ReqEn- EgressBlck-
> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>  Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
>  Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete+, EqualizationPhase1+
>  EqualizationPhase2+, EqualizationPhase3+,
> LinkEqualizationRequest-
> Capabilities: [100 v1] Virtual Channel
> Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> Arb:Fixed- WRR32- WRR64- WRR128-
> Ctrl:   ArbSelect=Fixed
> Status: InProgress-
> VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> Arb:Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> Status: NegoPending- InProgress-
> Capabilities: [140 v1] Root Complex Link
> Desc:   PortNumber=02 ComponentID=01 EltType=Config
> Link0:  Desc:   

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Karol Herbst
doing the same on the bridge controller with my workarounds applied:

please note some differences:
LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
SltSta: PresDet+ vs PresDet-
LnkSta2: Equalization stuff
Virtual channel: NegoPending- vs NegoPending+

both times I executed lspci while the GPU was still suspended.

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR- TAbort-
Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [88] Subsystem: Dell Device 07be
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Address:   Data: 
Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <256ns, L1 <8us
ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet+ Interlock-
Changed: MRL- PresDet+ LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID , PMEStatus- PMEPending-
DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
 AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
 AtomicOpsCtl: ReqEn- EgressBlck-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
 Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete+, EqualizationPhase1+
 EqualizationPhase2+, EqualizationPhase3+,
LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
Arb:Fixed- WRR32- WRR64- WRR128-
Ctrl:   ArbSelect=Fixed
Status: InProgress-
VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb:Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [140 v1] Root Complex Link
Desc:   PortNumber=02 ComponentID=01 EltType=Config
Link0:  Desc:   TargetPort=00 TargetComponent=01
AssocRCRB- LinkType=MemMapped LinkValid+
Addr:   fed19000
Capabilities: [d94 v1] Secondary PCI Express 
Kernel driver in use: pcieport
00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
c0: 00 00 00 00 80 0b 08 00 00 64 00 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Karol Herbst
I was able to get the lspci prints via ssh. The machine rebooted
automatically each time, though.

relevant dmesg:
kernel: nouveau :01:00.0: Refused to change power state, currently in D3
kernel: nouveau :01:00.0: Refused to change power state, currently in D3
kernel: nouveau :01:00.0: Refused to change power state, currently in D3
kernel: nouveau :01:00.0: tmr: stalled at 

(the last one is a 64-bit MMIO read to get the on-GPU timer value)

# lspci -vvxxx -s 0:01.00
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
decode])
   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR- TAbort-
Reset- FastB2B-
   PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
   Capabilities: [88] Subsystem: Dell Device 07be
   Capabilities: [80] Power Management version 3
   Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
   Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
   Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
   Address:   Data: 
   Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
   DevCap: MaxPayload 256 bytes, PhantFunc 0
   ExtTag- RBE+
   DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
   RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
   MaxPayload 256 bytes, MaxReadReq 128 bytes
   DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
   LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <256ns, L1 <8us
   ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
   LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
   TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
   SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
   Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
   SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
HPIrq- LinkChg-
   Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
   Changed: MRL- PresDet+ LinkState-
   RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
   RootCap: CRSVisible-
   RootSta: PME ReqID , PMEStatus- PMEPending-
   DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
   DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
LTR+, OBFF Via WAKE# ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
   LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
   LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
   Capabilities: [100 v1] Virtual Channel
   Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
   Arb:Fixed- WRR32- WRR64- WRR128-
   Ctrl:   ArbSelect=Fixed
   Status: InProgress-
   VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
   Arb:Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
   Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
   Status: NegoPending+ InProgress-
   Capabilities: [140 v1] Root Complex Link
   Desc:   PortNumber=02 ComponentID=01 EltType=Config
   Link0:  Desc:   TargetPort=00 TargetComponent=01
AssocRCRB- LinkType=MemMapped LinkValid+
   Addr:   fed19000
   Capabilities: [d94 v1] Secondary PCI Express 
   Kernel driver in use: pcieport
00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 42 01 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Karol Herbst
On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas  wrote:
>
> On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas  wrote:
> > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas  
> > > > wrote:
> > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > Apparently things go south if we suspend the device with a
> > > > > > different PCIe link speed set than the one it booted with. Fixes
> > > > > > runtime suspend on my gp107.
> > > > > >
> > > > > > This all looks like some bug inside the pci subsystem and I would 
> > > > > > prefer a
> > > > > > fix there instead of nouveau, but maybe there is no real nice way 
> > > > > > of doing
> > > > > > that outside of drivers?
> > > > >
> > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > feasible.
> > > > >
> > > > > It looks like this driver changes the PCIe link speed using some
> > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > comes up at the boot speed because nothing did that device-specific
> > > > > magic to change it, so you probably end up with the link being slow
> > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > combination doesn't work.
> > > > >
> > > > > If it requires something device-specific to change that link speed, I
> > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > something?
> > > > >
> > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > >
> > > > >   Initialization – During hardware initialization, each PCI Express
> > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > >   or operating system software is involved.
> > > > >
> > > > > I have been assuming that this means device-specific link speed
> > > > > management is out of spec, but it seems pretty common that devices
> > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > speed.
> > > >
> > > > I would expect that devices kind of have to figure out what they can
> > > > operate on and the operating system kind of just checks what the
> > > > current state is and doesn't try to "restore" the old state or
> > > > something?
> > >
> > > The devices at each end of the link negotiate the width and speed of
> > > the link.  This is done directly by the hardware without any help from
> > > the OS.
> > >
> > > The OS can read the current link state (Current Link Speed and
> > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > very little control over that state.  It can't directly restore the
> > > state because the hardware has to negotiate a width & speed that
> > > result in reliable operation.
> > >
> > > > We don't do anything in the driver after the device was suspended. And
> > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > essentially. I have no idea how much of this is part of the actual pci
> > > > standard and how much is driver specific. But the driver also wants to
> > > > have some control over the link speed as it's tied to performance
> > > > states on GPU.
> > >
> > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > influence the link width or speed.  If the GPU driver needs to do
> > > that, it would be via some device-specific mechanism.
> > >
> > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > > the communication with the controller is broken. But if we set it to
> > > > the boot speed, resuming the GPU just works. So my assumption was,
> > > > that _something_ (might it be the controller or the pci subsystem)
> > > > tries to force to operate on an invalid link speed and because the
> > > > bridge controller is actually powered down as well (as all children
> > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > actually restores the state which lets the controller fail to
> > > > establish communication again?
> > >
> > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > >  without OS/driver intervention.
> > >
> > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > >  requires driver intervention or at least some ACPI method.
> >
> > there is no driver intervention and Nouveau doesn't care at all. It's
> > all done on the GPU. We just upload a script and some firmware on to
> > the GPU. The script runs then on the PMU inside the GPU and this

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Bjorn Helgaas
On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas  wrote:
> > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas  wrote:
> > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > Apparently things go south if we suspend the device with a different
> > > > > PCIe link speed set than the one it booted with. Fixes runtime suspend
> > > > > on my gp107.
> > > > >
> > > > > This all looks like some bug inside the pci subsystem and I would 
> > > > > prefer a
> > > > > fix there instead of nouveau, but maybe there is no real nice way of 
> > > > > doing
> > > > > that outside of drivers?
> > > >
> > > > I agree it would be nice to fix this in the PCI core if that's
> > > > feasible.
> > > >
> > > > It looks like this driver changes the PCIe link speed using some
> > > > device-specific mechanism.  When we suspend, we put the device in
> > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > comes up at the boot speed because nothing did that device-specific
> > > > magic to change it, so you probably end up with the link being slow
> > > > but the driver thinking it's configured to be fast, and maybe that
> > > > combination doesn't work.
> > > >
> > > > If it requires something device-specific to change that link speed, I
> > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > something?
> > > >
> > > > Per the PCIe spec (r4.0, sec 1.2):
> > > >
> > > >   Initialization – During hardware initialization, each PCI Express
> > > >   Link is set up following a negotiation of Lane widths and frequency
> > > >   of operation by the two agents at each end of the Link. No firmware
> > > >   or operating system software is involved.
> > > >
> > > > I have been assuming that this means device-specific link speed
> > > > management is out of spec, but it seems pretty common that devices
> > > > don't come up by themselves at the fastest possible link speed.  So
> > > > maybe the spec just intends that devices can operate at *some* valid
> > > > speed.
> > >
> > > I would expect that devices kind of have to figure out what they can
> > > operate on and the operating system kind of just checks what the
> > > current state is and doesn't try to "restore" the old state or
> > > something?
> >
> > The devices at each end of the link negotiate the width and speed of
> > the link.  This is done directly by the hardware without any help from
> > the OS.
> >
> > The OS can read the current link state (Current Link Speed and
> > Negotiated Link Width, both in the Link Status register).  The OS has
> > very little control over that state.  It can't directly restore the
> > state because the hardware has to negotiate a width & speed that
> > result in reliable operation.
> >
> > > We don't do anything in the driver after the device was suspended. And
> > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > essentially. I have no idea how much of this is part of the actual pci
> > > standard and how much is driver specific. But the driver also wants to
> > > have some control over the link speed as it's tied to performance
> > > states on GPU.
> >
> > As far as I'm aware, there is no generic PCIe way for the OS to
> > influence the link width or speed.  If the GPU driver needs to do
> > that, it would be via some device-specific mechanism.
> >
> > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > the communication with the controller is broken. But if we set it to
> > > the boot speed, resuming the GPU just works. So my assumption was,
> > > that _something_ (might it be the controller or the pci subsystem)
> > > tries to force to operate on an invalid link speed and because the
> > > bridge controller is actually powered down as well (as all children
> > > are in D3cold) I could imagine that something in the pci subsystem
> > > actually restores the state which lets the controller fail to
> > > establish communication again?
> >
> >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> >  without OS/driver intervention.
> >
> >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> >  requires driver intervention or at least some ACPI method.
> 
> there is no driver intervention and Nouveau doesn't care at all. It's
> all done on the GPU. We just upload a script and some firmware on to
> the GPU. The script runs then on the PMU inside the GPU and this
> script also causes changing the PCIe link settings. But from a Nouveau
> point of view we don't care about the link before or after that script
> was invoked. Also there is no ACPI method involved.
> 
> But if there is something we should 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Karol Herbst
On Tue, May 21, 2019 at 3:51 PM Ilia Mirkin  wrote:
>
> On Tue, May 21, 2019 at 9:29 AM Karol Herbst  wrote:
> >
> > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas  wrote:
> > >
> > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas  
> > > > wrote:
> > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > Apparently things go south if we suspend the device with a
> > > > > > different PCIe link speed set than the one it booted with. Fixes
> > > > > > runtime suspend on my gp107.
> > > > > >
> > > > > > This all looks like some bug inside the pci subsystem and I would 
> > > > > > prefer a
> > > > > > fix there instead of nouveau, but maybe there is no real nice way 
> > > > > > of doing
> > > > > > that outside of drivers?
> > > > >
> > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > feasible.
> > > > >
> > > > > It looks like this driver changes the PCIe link speed using some
> > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > comes up at the boot speed because nothing did that device-specific
> > > > > magic to change it, so you probably end up with the link being slow
> > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > combination doesn't work.
> > > > >
> > > > > If it requires something device-specific to change that link speed, I
> > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > something?
> > > > >
> > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > >
> > > > >   Initialization – During hardware initialization, each PCI Express
> > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > >   or operating system software is involved.
> > > > >
> > > > > I have been assuming that this means device-specific link speed
> > > > > management is out of spec, but it seems pretty common that devices
> > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > speed.
> > > >
> > > > I would expect that devices kind of have to figure out what they can
> > > > operate on and the operating system kind of just checks what the
> > > > current state is and doesn't try to "restore" the old state or
> > > > something?
> > >
> > > The devices at each end of the link negotiate the width and speed of
> > > the link.  This is done directly by the hardware without any help from
> > > the OS.
> > >
> > > The OS can read the current link state (Current Link Speed and
> > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > very little control over that state.  It can't directly restore the
> > > state because the hardware has to negotiate a width & speed that
> > > result in reliable operation.
> > >
> > > > We don't do anything in the driver after the device was suspended. And
> > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > essentially. I have no idea how much of this is part of the actual pci
> > > > standard and how much is driver specific. But the driver also wants to
> > > > have some control over the link speed as it's tied to performance
> > > > states on GPU.
> > >
> > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > influence the link width or speed.  If the GPU driver needs to do
> > > that, it would be via some device-specific mechanism.
> > >
> > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > > the communication with the controller is broken. But if we set it to
> > > > the boot speed, resuming the GPU just works. So my assumption was,
> > > > that _something_ (might it be the controller or the pci subsystem)
> > > > tries to force to operate on an invalid link speed and because the
> > > > bridge controller is actually powered down as well (as all children
> > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > actually restores the state which lets the controller fail to
> > > > establish communication again?
> > >
> > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > >  without OS/driver intervention.
> > >
> > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > >  requires driver intervention or at least some ACPI method.
> > >
> >
> > there is no driver intervention and Nouveau doesn't care at all. It's
> > all done on the GPU. We just upload a script and some firmware on to
> > the GPU. The script runs then on the PMU inside the GPU and 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Ilia Mirkin
On Tue, May 21, 2019 at 9:29 AM Karol Herbst  wrote:
>
> On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas  wrote:
> >
> > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas  wrote:
> > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > Apparently things go south if we suspend the device with a different
> > > > > PCIe link speed set than the one it booted with. Fixes runtime suspend
> > > > > on my gp107.
> > > > >
> > > > > This all looks like some bug inside the pci subsystem and I would 
> > > > > prefer a
> > > > > fix there instead of nouveau, but maybe there is no real nice way of 
> > > > > doing
> > > > > that outside of drivers?
> > > >
> > > > I agree it would be nice to fix this in the PCI core if that's
> > > > feasible.
> > > >
> > > > It looks like this driver changes the PCIe link speed using some
> > > > device-specific mechanism.  When we suspend, we put the device in
> > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > comes up at the boot speed because nothing did that device-specific
> > > > magic to change it, so you probably end up with the link being slow
> > > > but the driver thinking it's configured to be fast, and maybe that
> > > > combination doesn't work.
> > > >
> > > > If it requires something device-specific to change that link speed, I
> > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > something?
> > > >
> > > > Per the PCIe spec (r4.0, sec 1.2):
> > > >
> > > >   Initialization – During hardware initialization, each PCI Express
> > > >   Link is set up following a negotiation of Lane widths and frequency
> > > >   of operation by the two agents at each end of the Link. No firmware
> > > >   or operating system software is involved.
> > > >
> > > > I have been assuming that this means device-specific link speed
> > > > management is out of spec, but it seems pretty common that devices
> > > > don't come up by themselves at the fastest possible link speed.  So
> > > > maybe the spec just intends that devices can operate at *some* valid
> > > > speed.
> > >
> > > I would expect that devices kind of have to figure out what they can
> > > operate on and the operating system kind of just checks what the
> > > current state is and doesn't try to "restore" the old state or
> > > something?
> >
> > The devices at each end of the link negotiate the width and speed of
> > the link.  This is done directly by the hardware without any help from
> > the OS.
> >
> > The OS can read the current link state (Current Link Speed and
> > Negotiated Link Width, both in the Link Status register).  The OS has
> > very little control over that state.  It can't directly restore the
> > state because the hardware has to negotiate a width & speed that
> > result in reliable operation.
> >
> > > We don't do anything in the driver after the device was suspended. And
> > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > essentially. I have no idea how much of this is part of the actual pci
> > > standard and how much is driver specific. But the driver also wants to
> > > have some control over the link speed as it's tied to performance
> > > states on GPU.
> >
> > As far as I'm aware, there is no generic PCIe way for the OS to
> > influence the link width or speed.  If the GPU driver needs to do
> > that, it would be via some device-specific mechanism.
> >
> > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > the communication with the controller is broken. But if we set it to
> > > the boot speed, resuming the GPU just works. So my assumption was,
> > > that _something_ (might it be the controller or the pci subsystem)
> > > tries to force to operate on an invalid link speed and because the
> > > bridge controller is actually powered down as well (as all children
> > > are in D3cold) I could imagine that something in the pci subsystem
> > > actually restores the state which lets the controller fail to
> > > establish communication again?
> >
> >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> >  without OS/driver intervention.
> >
> >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> >  requires driver intervention or at least some ACPI method.
> >
>
> there is no driver intervention and Nouveau doesn't care at all. It's
> all done on the GPU. We just upload a script and some firmware on to
> the GPU. The script runs then on the PMU inside the GPU and this
> script also causes changing the PCIe link settings. But from a Nouveau
> point of view we don't care about the link before or after that script
> was invoked. Also there is no ACPI method involved.
>
> But if there is something we should 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Karol Herbst
On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas  wrote:
>
> On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas  wrote:
> > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > Apparently things go south if we suspend the device with a different
> > > > PCIe link speed set than the one it booted with. Fixes runtime suspend
> > > > on my gp107.
> > > >
> > > > This all looks like some bug inside the pci subsystem and I would 
> > > > prefer a
> > > > fix there instead of nouveau, but maybe there is no real nice way of 
> > > > doing
> > > > that outside of drivers?
> > >
> > > I agree it would be nice to fix this in the PCI core if that's
> > > feasible.
> > >
> > > It looks like this driver changes the PCIe link speed using some
> > > device-specific mechanism.  When we suspend, we put the device in
> > > D3cold, so it loses all its state.  When we resume, the link probably
> > > comes up at the boot speed because nothing did that device-specific
> > > magic to change it, so you probably end up with the link being slow
> > > but the driver thinking it's configured to be fast, and maybe that
> > > combination doesn't work.
> > >
> > > If it requires something device-specific to change that link speed, I
> > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > something?
> > >
> > > Per the PCIe spec (r4.0, sec 1.2):
> > >
> > >   Initialization – During hardware initialization, each PCI Express
> > >   Link is set up following a negotiation of Lane widths and frequency
> > >   of operation by the two agents at each end of the Link. No firmware
> > >   or operating system software is involved.
> > >
> > > I have been assuming that this means device-specific link speed
> > > management is out of spec, but it seems pretty common that devices
> > > don't come up by themselves at the fastest possible link speed.  So
> > > maybe the spec just intends that devices can operate at *some* valid
> > > speed.
> >
> > I would expect that devices kind of have to figure out what they can
> > operate on and the operating system kind of just checks what the
> > current state is and doesn't try to "restore" the old state or
> > something?
>
> The devices at each end of the link negotiate the width and speed of
> the link.  This is done directly by the hardware without any help from
> the OS.
>
> The OS can read the current link state (Current Link Speed and
> Negotiated Link Width, both in the Link Status register).  The OS has
> very little control over that state.  It can't directly restore the
> state because the hardware has to negotiate a width & speed that
> result in reliable operation.
>
> > We don't do anything in the driver after the device was suspended. And
> > the 0x88000 is a mirror of the PCI config space, but we also got some
> > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > essentially. I have no idea how much of this is part of the actual pci
> > standard and how much is driver specific. But the driver also wants to
> > have some control over the link speed as it's tied to performance
> > states on GPU.
>
> As far as I'm aware, there is no generic PCIe way for the OS to
> influence the link width or speed.  If the GPU driver needs to do
> that, it would be via some device-specific mechanism.
>
> > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > the communication with the controller is broken. But if we set it to
> > the boot speed, resuming the GPU just works. So my assumption was,
> > that _something_ (might it be the controller or the pci subsystem)
> > tries to force to operate on an invalid link speed and because the
> > bridge controller is actually powered down as well (as all children
> > are in D3cold) I could imagine that something in the pci subsystem
> > actually restores the state which lets the controller fail to
> > establish communication again?
>
>   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
>  without OS/driver intervention.
>
>   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
>  requires driver intervention or at least some ACPI method.
>

there is no driver intervention, and Nouveau doesn't care at all. It's
all done on the GPU. We just upload a script and some firmware onto
the GPU; the script then runs on the PMU inside the GPU and also
changes the PCIe link settings. But from a Nouveau point of view we
don't care about the link before or after that script was invoked.
Also, there is no ACPI method involved.

But if there is something we should notify the PCI core about, maybe
that's something we have to do then?

>   3) Suspend puts GPU into D3cold (powered off).
>
>   4) Resume restores GPU to D0, and the Port and GPU hardware again
>  negotiate 8.0 GT/s without OS/driver intervention, just 

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-21 Thread Bjorn Helgaas
On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas  wrote:
> > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > Apparently things go south if we suspend the device with a different PCIe
> > > link speed set than the one it booted with. Fixes runtime suspend on my gp107.
> > >
> > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > that outside of drivers?
> >
> > I agree it would be nice to fix this in the PCI core if that's
> > feasible.
> >
> > It looks like this driver changes the PCIe link speed using some
> > device-specific mechanism.  When we suspend, we put the device in
> > D3cold, so it loses all its state.  When we resume, the link probably
> > comes up at the boot speed because nothing did that device-specific
> > magic to change it, so you probably end up with the link being slow
> > but the driver thinking it's configured to be fast, and maybe that
> > combination doesn't work.
> >
> > If it requires something device-specific to change that link speed, I
> > don't know how to put that in the PCI core.  But maybe I'm missing
> > something?
> >
> > Per the PCIe spec (r4.0, sec 1.2):
> >
> >   Initialization – During hardware initialization, each PCI Express
> >   Link is set up following a negotiation of Lane widths and frequency
> >   of operation by the two agents at each end of the Link. No firmware
> >   or operating system software is involved.
> >
> > I have been assuming that this means device-specific link speed
> > management is out of spec, but it seems pretty common that devices
> > don't come up by themselves at the fastest possible link speed.  So
> > maybe the spec just intends that devices can operate at *some* valid
> > speed.
> 
> I would expect that devices kind of have to figure out what they can
> operate on and the operating system kind of just checks what the
> current state is and doesn't try to "restore" the old state or
> something?

The devices at each end of the link negotiate the width and speed of
the link.  This is done directly by the hardware without any help from
the OS.

The OS can read the current link state (Current Link Speed and
Negotiated Link Width, both in the Link Status register).  The OS has
very little control over that state.  It can't directly restore the
state because the hardware has to negotiate a width & speed that
result in reliable operation.
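
(For reference, a minimal sketch of reading that state with the kernel's
PCIe capability helpers; "pdev" is just an assumed valid struct pci_dev
pointer and "report_link_state" is a made-up name, this is not something
Nouveau currently does:)

  #include <linux/pci.h>

  static void report_link_state(struct pci_dev *pdev)
  {
          u16 lnksta;

          /* Link Status register of the PCIe capability */
          pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);

          /* Current Link Speed encoding: 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s */
          pci_info(pdev, "link speed field %u, negotiated width x%u\n",
                   lnksta & PCI_EXP_LNKSTA_CLS,
                   (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT);
  }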

> We don't do anything in the driver after the device is suspended. The
> range at 0x88000 is a mirror of the PCI config space, but we also have
> some PCIe stuff at 0x8c000 which newer GPUs use essentially for gen3
> support. I have no idea how much of this is part of the actual PCI
> standard and how much is driver specific. But the driver also wants to
> have some control over the link speed, as it's tied to the performance
> states on the GPU.

As far as I'm aware, there is no generic PCIe way for the OS to
influence the link width or speed.  If the GPU driver needs to do
that, it would be via some device-specific mechanism.

> The big issue here is that the GPU boots at 8.0 GT/s and some on-GPU
> init mechanism decreases that to 2.5 GT/s. If we then suspend, the GPU,
> or at least the communication with the controller, is broken. But if we
> set the link back to the boot speed first, resuming the GPU just works.
> So my assumption was that _something_ (be it the controller or the PCI
> subsystem) tries to force operation at an invalid link speed, and
> because the bridge controller is powered down as well (as all its
> children are in D3cold), I could imagine that something in the PCI
> subsystem actually restores a state which lets the controller fail to
> establish communication again?

  1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
 without OS/driver intervention.

  2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
 requires driver intervention or at least some ACPI method.

  3) Suspend puts GPU into D3cold (powered off).

  4) Resume restores GPU to D0, and the Port and GPU hardware again
 negotiate 8.0 GT/s without OS/driver intervention, just like at
 initial boot.

  5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
 8.0 GT/s.

Without knowing more about the transition to 2.5 GT/s, I can't guess
why the GPU wouldn't work after resume.  From a PCIe point of view,
the link is supposed to work and the device should be reachable
independent of the link speed.  But maybe there's some weird
dependency between the GPU and the driver here.

It sounds like if you return to 8.0 GT/s before suspend, things work.
That would make sense to me because then the driver's idea of the link
state after resume would match the actual state.

But I don't see a way to deal with this in the PCI core.  The PCI core
does save and restore most of the architected config space.
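
(To illustrate what "architected" covers here, a rough sketch of the
generic save/restore pattern; the helper names are made up for the
example, they are not real pci_driver callbacks, and error handling is
omitted:)

  #include <linux/pci.h>

  static int example_suspend(struct pci_dev *pdev)
  {
          pci_save_state(pdev);                 /* standard config and capability registers */
          pci_set_power_state(pdev, PCI_D3hot);
          return 0;
  }

  static int example_resume(struct pci_dev *pdev)
  {
          pci_set_power_state(pdev, PCI_D0);
          pci_restore_state(pdev);              /* writes the saved registers back */
          return 0;
  }

Device-specific settings, like a vendor mechanism that changed the link
speed, are outside of what pci_save_state()/pci_restore_state() know
about.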

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-20 Thread Karol Herbst
On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas  wrote:
>
> On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > Apparently things go south if we suspend the device with a different PCIe
> > link speed set than the one it booted with. Fixes runtime suspend on my gp107.
> >
> > This all looks like some bug inside the PCI subsystem, and I would prefer a
> > fix there instead of in nouveau, but maybe there is no really nice way of
> > doing that outside of drivers?
>
> I agree it would be nice to fix this in the PCI core if that's
> feasible.
>
> It looks like this driver changes the PCIe link speed using some
> device-specific mechanism.  When we suspend, we put the device in
> D3cold, so it loses all its state.  When we resume, the link probably
> comes up at the boot speed because nothing did that device-specific
> magic to change it, so you probably end up with the link being slow
> but the driver thinking it's configured to be fast, and maybe that
> combination doesn't work.
>
> If it requires something device-specific to change that link speed, I
> don't know how to put that in the PCI core.  But maybe I'm missing
> something?
>
> Per the PCIe spec (r4.0, sec 1.2):
>
>   Initialization – During hardware initialization, each PCI Express
>   Link is set up following a negotiation of Lane widths and frequency
>   of operation by the two agents at each end of the Link. No firmware
>   or operating system software is involved.
>
> I have been assuming that this means device-specific link speed
> management is out of spec, but it seems pretty common that devices
> don't come up by themselves at the fastest possible link speed.  So
> maybe the spec just intends that devices can operate at *some* valid
> speed.
>

I would expect that devices kind of have to figure out themselves what
they can operate at, and that the operating system just checks what the
current state is and doesn't try to "restore" the old state or
something?

We don't do anything in the driver after the device is suspended. The
range at 0x88000 is a mirror of the PCI config space, but we also have
some PCIe stuff at 0x8c000 which newer GPUs use essentially for gen3
support. I have no idea how much of this is part of the actual PCI
standard and how much is driver specific. But the driver also wants to
have some control over the link speed, as it's tied to the performance
states on the GPU.
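
(Rough illustration only, assuming, as described above, that BAR0 is the
GPU's register aperture and that 0x88000 mirrors config space; the
helper name is made up:)

  #include <linux/pci.h>
  #include <linux/io.h>

  static void check_config_mirror(struct pci_dev *pdev)
  {
          void __iomem *mmio = pci_iomap(pdev, 0, 0);   /* map BAR0 */
          u32 via_cfg, via_mmio;

          if (!mmio)
                  return;

          /* vendor/device ID dword, once through config space ... */
          pci_read_config_dword(pdev, 0x0, &via_cfg);
          /* ... and once through the assumed MMIO mirror at 0x88000 */
          via_mmio = ioread32(mmio + 0x88000);

          pci_info(pdev, "cfg %08x mmio %08x\n", via_cfg, via_mmio);
          pci_iounmap(pdev, mmio);
  }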

The big issue here is that the GPU boots at 8.0 GT/s and some on-GPU
init mechanism decreases that to 2.5 GT/s. If we then suspend, the GPU,
or at least the communication with the controller, is broken. But if we
set the link back to the boot speed first, resuming the GPU just works.
So my assumption was that _something_ (be it the controller or the PCI
subsystem) tries to force operation at an invalid link speed, and
because the bridge controller is powered down as well (as all its
children are in D3cold), I could imagine that something in the PCI
subsystem actually restores a state which lets the controller fail to
establish communication again?

Just something which kind of would make sense to me.

> > v2: squashed together patch 4 and 5
> >
> > Signed-off-by: Karol Herbst 
> > Reviewed-by: Lyude Paul 
> > ---
> >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> >  drm/nouveau/nvkm/subdev/pci/base.c|  9 +++--
> >  drm/nouveau/nvkm/subdev/pci/pcie.c| 24 
> >  drm/nouveau/nvkm/subdev/pci/priv.h|  2 ++
> >  4 files changed, 32 insertions(+), 8 deletions(-)
> >
> > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h 
> > b/drm/nouveau/include/nvkm/subdev/pci.h
> > index 1fdf3098..b23793a2 100644
> > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > @@ -26,8 +26,9 @@ struct nvkm_pci {
> >   } agp;
> >
> >   struct {
> > - enum nvkm_pcie_speed speed;
> > - u8 width;
> > + enum nvkm_pcie_speed cur_speed;
> > + enum nvkm_pcie_speed def_speed;
> > + u8 cur_width;
> >   } pcie;
> >
> >   bool msi;
> > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c 
> > b/drm/nouveau/nvkm/subdev/pci/base.c
> > index ee2431a7..d9fb5a83 100644
> > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> >
> >   if (pci->agp.bridge)
> >   nvkm_agp_fini(pci);
> > + else if (pci_is_pcie(pci->pdev))
> > + nvkm_pcie_fini(pci);
> >
> >   return 0;
> >  }
> > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> >   struct nvkm_pci *pci = nvkm_pci(subdev);
> >   if (pci->agp.bridge)
> >   nvkm_agp_preinit(pci);
> > + else if (pci_is_pcie(pci->pdev))
> > + nvkm_pcie_preinit(pci);
> >   return 0;
> >  }
> >
> > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct 
> > nvkm_device *device,
> >   pci->func = func;
> >   pci->pdev = device->func->pci(device)->pdev;

Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

2019-05-20 Thread Bjorn Helgaas
On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> Apparently things go south if we suspend the device with a different PCIe
> link speed set than the one it booted with. Fixes runtime suspend on my gp107.
> 
> This all looks like some bug inside the PCI subsystem, and I would prefer a
> fix there instead of in nouveau, but maybe there is no really nice way of
> doing that outside of drivers?

I agree it would be nice to fix this in the PCI core if that's
feasible.

It looks like this driver changes the PCIe link speed using some
device-specific mechanism.  When we suspend, we put the device in
D3cold, so it loses all its state.  When we resume, the link probably
comes up at the boot speed because nothing did that device-specific
magic to change it, so you probably end up with the link being slow
but the driver thinking it's configured to be fast, and maybe that
combination doesn't work.

If it requires something device-specific to change that link speed, I
don't know how to put that in the PCI core.  But maybe I'm missing
something?

Per the PCIe spec (r4.0, sec 1.2):

  Initialization – During hardware initialization, each PCI Express
  Link is set up following a negotiation of Lane widths and frequency
  of operation by the two agents at each end of the Link. No firmware
  or operating system software is involved.

I have been assuming that this means device-specific link speed
management is out of spec, but it seems pretty common that devices
don't come up by themselves at the fastest possible link speed.  So
maybe the spec just intends that devices can operate at *some* valid
speed.
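
(One way to see that gap in practice, as a sketch: compare the maximum
speed the device advertises with the speed the link actually trained to.
"pdev" is again just an assumed struct pci_dev pointer and the function
name is made up:)

  #include <linux/pci.h>

  static void compare_link_speeds(struct pci_dev *pdev)
  {
          u32 lnkcap;
          u16 lnksta;

          pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &lnkcap);
          pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);

          /* both speed fields use the same encoding: 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s */
          pci_info(pdev, "max supported speed field %u, current speed field %u\n",
                   lnkcap & PCI_EXP_LNKCAP_SLS,
                   lnksta & PCI_EXP_LNKSTA_CLS);
  }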

> v2: squashed together patch 4 and 5
> 
> Signed-off-by: Karol Herbst 
> Reviewed-by: Lyude Paul 
> ---
>  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
>  drm/nouveau/nvkm/subdev/pci/base.c|  9 +++--
>  drm/nouveau/nvkm/subdev/pci/pcie.c| 24 
>  drm/nouveau/nvkm/subdev/pci/priv.h|  2 ++
>  4 files changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/drm/nouveau/include/nvkm/subdev/pci.h 
> b/drm/nouveau/include/nvkm/subdev/pci.h
> index 1fdf3098..b23793a2 100644
> --- a/drm/nouveau/include/nvkm/subdev/pci.h
> +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> @@ -26,8 +26,9 @@ struct nvkm_pci {
>   } agp;
>  
>   struct {
> - enum nvkm_pcie_speed speed;
> - u8 width;
> + enum nvkm_pcie_speed cur_speed;
> + enum nvkm_pcie_speed def_speed;
> + u8 cur_width;
>   } pcie;
>  
>   bool msi;
> diff --git a/drm/nouveau/nvkm/subdev/pci/base.c 
> b/drm/nouveau/nvkm/subdev/pci/base.c
> index ee2431a7..d9fb5a83 100644
> --- a/drm/nouveau/nvkm/subdev/pci/base.c
> +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
>  
>   if (pci->agp.bridge)
>   nvkm_agp_fini(pci);
> + else if (pci_is_pcie(pci->pdev))
> + nvkm_pcie_fini(pci);
>  
>   return 0;
>  }
> @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
>   struct nvkm_pci *pci = nvkm_pci(subdev);
>   if (pci->agp.bridge)
>   nvkm_agp_preinit(pci);
> + else if (pci_is_pcie(pci->pdev))
> + nvkm_pcie_preinit(pci);
>   return 0;
>  }
>  
> @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct 
> nvkm_device *device,
>   pci->func = func;
>   pci->pdev = device->func->pci(device)->pdev;
>   pci->irq = -1;
> - pci->pcie.speed = -1;
> - pci->pcie.width = -1;
> + pci->pcie.cur_speed = -1;
> + pci->pcie.def_speed = -1;
> + pci->pcie.cur_width = -1;
>  
>   if (device->type == NVKM_DEVICE_AGP)
>   nvkm_agp_ctor(pci);
> diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c 
> b/drm/nouveau/nvkm/subdev/pci/pcie.c
> index 70ccbe0d..731dd30e 100644
> --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
>   return 0;
>  }
>  
> +int
> +nvkm_pcie_preinit(struct nvkm_pci *pci)
> +{
> + pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> + return 0;
> +}
> +
>  int
>  nvkm_pcie_init(struct nvkm_pci *pci)
>  {
> @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
>   if (pci->func->pcie.init)
>   pci->func->pcie.init(pci);
>  
> - if (pci->pcie.speed != -1)
> - nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> + if (pci->pcie.cur_speed != -1)
> + nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> +pci->pcie.cur_width);
>  
>   return 0;
>  }
>  
> +int
> +nvkm_pcie_fini(struct nvkm_pci *pci)
> +{
> + if (!IS_ERR_VALUE(pci->pcie.def_speed))
> + return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> + return 0;
> +}
> +
>  int
>  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 
> width)
>  {
> @@ -146,8 +162,8 @@