On Fri, 2014-02-07 at 01:22 +0100, Maik Broemme wrote:
> Interesting is the diff between 1st and 2nd boot, so if I do the lspci
> prior to the booting. The only difference between 1st start and 2nd
> start are:
>
> --- 001-lspci.290x.before.1st.log 2014-02-07 01:13:41.498827928 +0100
> +++ 004-lspci.290x.before.2nd.log 2014-02-07 01:16:50.966611282 +0100
> @@ -24,7 +24,7 @@
>   ClockPM- Surprise- LLActRep- BwNot-
>   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> - LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> + LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
>   DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
> OBFF Not Supported
>   DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
> OBFF Disabled
>   LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> @@ -33,13 +33,13 @@
>   LnkSta2: Current De-emphasis Level: -3.5dB,
> EqualizationComplete-, EqualizationPhase1-
>   EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
>   Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> - Address: 0000000000000000 Data: 0000
> + Address: 00000000fee00000 Data: 0000
>   Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
>   Capabilities: [150 v2] Advanced Error Reporting
>   UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>   UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>   UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> - CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> + CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>   CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>   AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>   Capabilities: [270 v1] #19
>
> After that if I do suspend-to-ram / resume trick I have again lspci
> output from before 1st boot.

The Link Status change after X is stopped seems the most interesting to
me.  The MSI change is probably explained by the MSI save/restore of the
device, but should be harmless since MSI is disabled.  I'm a bit
surprised the Correctable Error Status in the AER capability didn't get
cleared.  I would have thought that a bus reset would have caused the
link to retrain back to the original speed/width as well.  Let's check
that we're actually getting a bus reset; try this in addition to the
previous qemu patch.  This just enables debug logging for the bus reset
function.  Thanks,

Alex

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 8db182f..7fec259 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -2927,6 +2927,10 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *hos
             host1->slot == host2->slot && host1->function == host2->function);
 }
 
+#undef DPRINTF
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
+
 static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
 {
     VFIOGroup *group;
@@ -3104,6 +3108,15 @@ out_single:
     return ret;
 }
 
+#undef DPRINTF
+#ifdef DEBUG_VFIO
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 /*
  * We want to differentiate hot reset of mulitple in-use devices vs hot reset
  * of a single in-use device. VFIO_DEVICE_RESET will already handle the case