Re: [REGRESSION] xHCI host controller not responding, assume dead on Dell XPS 13 9360

2018-05-03 Thread Esokrates

Sure. Done.

On 05/03/2018 07:04 PM, Mika Westerberg wrote:
> Could you then attach full dmesg of the failure without revert to the
> bugzilla bug?

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [REGRESSION] xHCI host controller not responding, assume dead on Dell XPS 13 9360

2018-05-03 Thread Mika Westerberg
On Thu, May 03, 2018 at 06:53:13PM +0200, Esokrates wrote:
> Hi,
> 
> Thanks very much for pointing out that commit!
> Indeed, reverting makes the problem go away!
> Also, interestingly it also makes the errors in
> https://bugzilla.kernel.org/show_bug.cgi?id=199557
> go away, tested using 4.16.7!

OK, good.

Could you then attach full dmesg of the failure without revert to the
bugzilla bug?


Re: [REGRESSION] xHCI host controller not responding, assume dead on Dell XPS 13 9360

2018-05-03 Thread Esokrates

Hi,

Thanks very much for pointing out that commit!
Indeed, reverting makes the problem go away!
Also, interestingly it also makes the errors in
https://bugzilla.kernel.org/show_bug.cgi?id=199557
go away, tested using 4.16.7!

On 05/03/2018 05:18 PM, Mika Westerberg wrote:
> Could you try to revert:
>
>    13d3047c8150 ("ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()")
>
> and see if the problem goes away?



Re: [REGRESSION] xHCI host controller not responding, assume dead on Dell XPS 13 9360

2018-05-03 Thread Mika Westerberg
On Thu, May 03, 2018 at 04:13:05PM +0300, Mathias Nyman wrote:
> On 03.05.2018 12:30, Esokrates wrote:
> > Hi,
> > 
> > Beginning with Linux 4.16-rc7 (4.16-rc6 was NOT affected), I do
> > find the following regularly in dmesg (often it does not happen during
> > boot, but after suspend to ram / resume):
> > 
> 
> Hi
> 
> Can you try 'git bisect' to find the patch that causes the issues?
> (Adding Mika to Cc)

Could you try to revert:

  13d3047c8150 ("ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()")

and see if the problem goes away?

> > [  216.443309] pcieport :00:1c.0: AER: Corrected error received: id=00e0
> > [  216.443951] pcieport :00:1c.0: PCIe Bus Error: severity=Corrected, 
> > type=Physical Layer, id=00e0(Receiver ID)
> > [  216.444607] pcieport :00:1c.0:   device [8086:9d10] error 
> > status/mask=0001/2000
> > [  216.445300] pcieport :00:1c.0:    [ 0] Receiver Error (First)
> > [  216.517886] xhci_hcd :39:00.0: remove, state 4
> > [  216.518573] usb usb4: USB disconnect, device number 1
> > [  216.519438] xhci_hcd :39:00.0: USB bus 4 deregistered
> > [  216.520320] xhci_hcd :39:00.0: xHCI host controller not responding, 
> > assume dead
> > [  216.521908] xhci_hcd :39:00.0: remove, state 4
> > [  216.522950] usb usb3: USB disconnect, device number 1
> > [  216.523891] xhci_hcd :39:00.0: Host halt failed, -19
> > [  216.524994] xhci_hcd :39:00.0: Host not accessible, reset failed.
> > [  216.526153] xhci_hcd :39:00.0: USB bus 3 deregistered
> > 
> > 
> > Running 4.16.0 I also observed
> > 
> > [   31.509282] ACPI: Waking up from system sleep state S3
> > [   31.809429] ACPI: EC: interrupt unblocked
> > [   31.828849] pci_raw_set_power_state: 62 callbacks suppressed
> > [   31.828852] pcieport :01:00.0: Refused to change power state, 
> > currently in D3
> > [   31.830422] pcieport :02:01.0: Refused to change power state, 
> > currently in D3
> > [   31.830423] pcieport :02:02.0: Refused to change power state, 
> > currently in D3
> > [   31.848853] pcieport :02:00.0: Refused to change power state, 
> > currently in D3
> > [   31.852520] xhci_hcd :39:00.0: Refused to change power state, 
> > currently in D3
> > [   31.872529] thunderbolt :03:00.0: Refused to change power state, 
> > currently in D3
> > [   31.933970] thunderbolt :03:00.0: control channel starting...
> > [   31.937403] ACPI: EC: event unblocked
> > [   31.938385] sd 2:0:0:0: [sda] Starting disk
> > [   31.938886] ACPI: button: The lid device is not compliant to SW_LID.
> > [   31.956574] xhci_hcd :39:00.0: Refused to change power state, 
> > currently in D3
> > [   31.956624] xhci_hcd :39:00.0: WARN: xHC restore state timeout
> > [   31.956631] xhci_hcd :39:00.0: PCI post-resume error -110!
> > [   31.956656] xhci_hcd :39:00.0: HC died; cleaning up
> > [   31.956658] xhci_hcd :39:00.0: HC died; cleaning up
> > [   31.956664] dpm_run_callback(): pci_pm_resume+0x0/0xb0 returns -110
> > [   31.956668] PM: Device :39:00.0 failed to resume async: error -110
> > 
> > Furthermore sometimes I also get a bunch of these errors before the errors 
> > above:
> > May 03 10:58:49 debian kernel: usb 1-3: device descriptor read/64, error -71
> > May 03 10:58:49 debian kernel: usb 1-3: device descriptor read/64, error -71
> > May 03 10:58:50 debian kernel: usb 1-3: device descriptor read/64, error -71
> > May 03 10:58:50 debian kernel: usb 1-3: device descriptor read/64, error -71
> > May 03 10:58:51 debian kernel: usb 1-3: device not accepting address 2, 
> > error -71
> > 
> > All of this never happened before 4.16-rc7. All kernels since 4.16-rc7 are 
> > reproducibly affected (suspend/resume helps triggering), including 
> > 4.17-rc3.
> > 
> > My hardware is an XPS 13 9360 Kaby Lake, lsusb and lspci output are attached.
> > 
> > I am not subscribed to the mailing list, so please CC me when replying to 
> > the list.
> > 
> > Thanks very much!


Re: [REGRESSION] xHCI host controller not responding, assume dead on Dell XPS 13 9360

2018-05-03 Thread Mathias Nyman

On 03.05.2018 12:30, Esokrates wrote:
> Hi,
>
> Beginning with Linux 4.16-rc7 (4.16-rc6 was NOT affected), I do find the following regularly in dmesg (often it does not happen during boot, but after suspend to ram / resume):

Hi

Can you try 'git bisect' to find the patch that causes the issues?
(Adding Mika to Cc)
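
For anyone following along: a minimal sketch of the search that 'git bisect' automates, assuming a linear history that runs from a known-good kernel to a known-bad one. The commit names and the is_bad check are hypothetical placeholders; in practice is_bad would mean "build and boot this commit, suspend/resume, look for the xHCI errors". Real git also copes with merges, which this sketch does not. With git itself the same search is driven by 'git bisect start', 'git bisect good <rev>' and 'git bisect bad <rev>'.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good
    and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical linear history between a good and a bad kernel.
history = ["good-base", "c1", "c2", "c3", "c4", "bad-tip"]
culprit = "c3"  # pretend this commit introduced the regression
found = first_bad(history, lambda c: history.index(c) >= history.index(culprit))
print(found)
```

Each step halves the remaining range, so even a few thousand commits between two release candidates need only a dozen or so test boots.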


> [  216.443309] pcieport :00:1c.0: AER: Corrected error received: id=00e0
> [  216.443951] pcieport :00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
> [  216.444607] pcieport :00:1c.0:   device [8086:9d10] error status/mask=0001/2000
> [  216.445300] pcieport :00:1c.0:    [ 0] Receiver Error (First)
> [  216.517886] xhci_hcd :39:00.0: remove, state 4
> [  216.518573] usb usb4: USB disconnect, device number 1
> [  216.519438] xhci_hcd :39:00.0: USB bus 4 deregistered
> [  216.520320] xhci_hcd :39:00.0: xHCI host controller not responding, assume dead
> [  216.521908] xhci_hcd :39:00.0: remove, state 4
> [  216.522950] usb usb3: USB disconnect, device number 1
> [  216.523891] xhci_hcd :39:00.0: Host halt failed, -19
> [  216.524994] xhci_hcd :39:00.0: Host not accessible, reset failed.
> [  216.526153] xhci_hcd :39:00.0: USB bus 3 deregistered
>
>
> Running 4.16.0 I also observed
>
> [   31.509282] ACPI: Waking up from system sleep state S3
> [   31.809429] ACPI: EC: interrupt unblocked
> [   31.828849] pci_raw_set_power_state: 62 callbacks suppressed
> [   31.828852] pcieport :01:00.0: Refused to change power state, currently in D3
> [   31.830422] pcieport :02:01.0: Refused to change power state, currently in D3
> [   31.830423] pcieport :02:02.0: Refused to change power state, currently in D3
> [   31.848853] pcieport :02:00.0: Refused to change power state, currently in D3
> [   31.852520] xhci_hcd :39:00.0: Refused to change power state, currently in D3
> [   31.872529] thunderbolt :03:00.0: Refused to change power state, currently in D3
> [   31.933970] thunderbolt :03:00.0: control channel starting...
> [   31.937403] ACPI: EC: event unblocked
> [   31.938385] sd 2:0:0:0: [sda] Starting disk
> [   31.938886] ACPI: button: The lid device is not compliant to SW_LID.
> [   31.956574] xhci_hcd :39:00.0: Refused to change power state, currently in D3
> [   31.956624] xhci_hcd :39:00.0: WARN: xHC restore state timeout
> [   31.956631] xhci_hcd :39:00.0: PCI post-resume error -110!
> [   31.956656] xhci_hcd :39:00.0: HC died; cleaning up
> [   31.956658] xhci_hcd :39:00.0: HC died; cleaning up
> [   31.956664] dpm_run_callback(): pci_pm_resume+0x0/0xb0 returns -110
> [   31.956668] PM: Device :39:00.0 failed to resume async: error -110
>
> Furthermore sometimes I also get a bunch of these errors before the errors above:
> May 03 10:58:49 debian kernel: usb 1-3: device descriptor read/64, error -71
> May 03 10:58:49 debian kernel: usb 1-3: device descriptor read/64, error -71
> May 03 10:58:50 debian kernel: usb 1-3: device descriptor read/64, error -71
> May 03 10:58:50 debian kernel: usb 1-3: device descriptor read/64, error -71
> May 03 10:58:51 debian kernel: usb 1-3: device not accepting address 2, error -71
>
> All of this never happened before 4.16-rc7. All kernels since 4.16-rc7 are
> reproducibly affected (suspend/resume helps triggering), including 4.17-rc3.
>
> My hardware is an XPS 13 9360 Kaby Lake, lsusb and lspci output are attached.
>
> I am not subscribed to the mailing list, so please CC me when replying to the
> list.
>
> Thanks very much!




[REGRESSION] xHCI host controller not responding, assume dead on Dell XPS 13 9360

2018-05-03 Thread Esokrates

Hi,

Beginning with Linux 4.16-rc7 (4.16-rc6 was NOT affected), I do find the 
following regularly in dmesg (often it does not happen during boot, but 
after suspend to RAM / resume):


[  216.443309] pcieport :00:1c.0: AER: Corrected error received: id=00e0
[  216.443951] pcieport :00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[  216.444607] pcieport :00:1c.0:   device [8086:9d10] error status/mask=0001/2000
[  216.445300] pcieport :00:1c.0:    [ 0] Receiver Error (First)
[  216.517886] xhci_hcd :39:00.0: remove, state 4
[  216.518573] usb usb4: USB disconnect, device number 1
[  216.519438] xhci_hcd :39:00.0: USB bus 4 deregistered
[  216.520320] xhci_hcd :39:00.0: xHCI host controller not responding, assume dead
[  216.521908] xhci_hcd :39:00.0: remove, state 4
[  216.522950] usb usb3: USB disconnect, device number 1
[  216.523891] xhci_hcd :39:00.0: Host halt failed, -19
[  216.524994] xhci_hcd :39:00.0: Host not accessible, reset failed.
[  216.526153] xhci_hcd :39:00.0: USB bus 3 deregistered


Running 4.16.0 I also observed

[   31.509282] ACPI: Waking up from system sleep state S3
[   31.809429] ACPI: EC: interrupt unblocked
[   31.828849] pci_raw_set_power_state: 62 callbacks suppressed
[   31.828852] pcieport :01:00.0: Refused to change power state, currently in D3
[   31.830422] pcieport :02:01.0: Refused to change power state, currently in D3
[   31.830423] pcieport :02:02.0: Refused to change power state, currently in D3
[   31.848853] pcieport :02:00.0: Refused to change power state, currently in D3
[   31.852520] xhci_hcd :39:00.0: Refused to change power state, currently in D3
[   31.872529] thunderbolt :03:00.0: Refused to change power state, currently in D3
[   31.933970] thunderbolt :03:00.0: control channel starting...
[   31.937403] ACPI: EC: event unblocked
[   31.938385] sd 2:0:0:0: [sda] Starting disk
[   31.938886] ACPI: button: The lid device is not compliant to SW_LID.
[   31.956574] xhci_hcd :39:00.0: Refused to change power state, currently in D3
[   31.956624] xhci_hcd :39:00.0: WARN: xHC restore state timeout
[   31.956631] xhci_hcd :39:00.0: PCI post-resume error -110!
[   31.956656] xhci_hcd :39:00.0: HC died; cleaning up
[   31.956658] xhci_hcd :39:00.0: HC died; cleaning up
[   31.956664] dpm_run_callback(): pci_pm_resume+0x0/0xb0 returns -110
[   31.956668] PM: Device :39:00.0 failed to resume async: error -110

Furthermore sometimes I also get a bunch of these errors before the 
errors above:

May 03 10:58:49 debian kernel: usb 1-3: device descriptor read/64, error -71
May 03 10:58:49 debian kernel: usb 1-3: device descriptor read/64, error -71
May 03 10:58:50 debian kernel: usb 1-3: device descriptor read/64, error -71
May 03 10:58:50 debian kernel: usb 1-3: device descriptor read/64, error -71
May 03 10:58:51 debian kernel: usb 1-3: device not accepting address 2, error -71
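
The negative numbers in the logs above (-19, -71, -110) are negated Linux errno values. A small sketch for decoding them follows; the table is hardcoded with the Linux values on purpose, since Python's errno module reports the values of whatever platform runs the script. The helper name decode is just for illustration.

```python
# Linux errno values and their standard strerror() texts, hardcoded so the
# lookup does not depend on the platform running this script.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    71: ("EPROTO", "Protocol error"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode(code: int) -> str:
    """Map a kernel-style negative error code to its errno name and text."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


for code in (-19, -71, -110):
    print(decode(code))
```

So "Host halt failed, -19" means the device was no longer there, the -71 lines are protocol errors during enumeration, and -110 is the resume path timing out.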


All of this never happened before 4.16-rc7. All kernels since 4.16-rc7 
are reproducibly affected (suspend/resume helps triggering), including 
4.17-rc3.
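
When comparing kernels, a saved dmesg can be checked for the failure messages quoted above. The following is only a sketch: the signature strings are copied from the logs in this report, the function name find_failures is made up for illustration, and it assumes each kernel message sits on a single line with the usual "[ seconds ]" prefix.

```python
import re

# Failure signatures copied from the logs quoted in this report.
SIGNATURES = (
    "xHCI host controller not responding, assume dead",
    "HC died; cleaning up",
    "PCI post-resume error",
)

# Matches the "[  216.520320] " timestamp prefix that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def find_failures(log_text):
    """Return (timestamp, message) pairs for lines matching a signature."""
    hits = []
    for line in log_text.splitlines():
        m = TIMESTAMP.match(line)
        if m and any(sig in m.group(2) for sig in SIGNATURES):
            hits.append((float(m.group(1)), m.group(2)))
    return hits


sample = """\
[   31.956624] xhci_hcd :39:00.0: WARN: xHC restore state timeout
[   31.956631] xhci_hcd :39:00.0: PCI post-resume error -110!
[   31.956656] xhci_hcd :39:00.0: HC died; cleaning up
"""
hits = find_failures(sample)
for ts, msg in hits:
    print(ts, msg)
```

An empty result after a few suspend/resume cycles would suggest the kernel under test is not affected, though a clean log is not proof.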


My hardware is an XPS 13 9360 Kaby Lake, lsusb and lspci output are attached.

I am not subscribed to the mailing list, so please CC me when replying 
to the list.


Thanks very much!

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
Subsystem: Dell Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information: Len=10 

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02) (prog-if 00 [VGA controller])
Subsystem: Dell HD Graphics 620
Flags: bus master, fast devsel, latency 0, IRQ 128
Memory at db00 (64-bit, non-prefetchable) [size=16M]
Memory at 9000 (64-bit, prefetchable) [size=256M]
I/O ports at f000 [size=64]
[virtual] Expansion ROM at 000c [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c 
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Kernel driver in use: i915
Kernel modules: i915

00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 02)
Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
Flags: fast devsel, IRQ 16
Memory at dc22 (64-bit, non-prefetchable) [size=32K]
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-