Re: NEC uPD720200 xHCI Controller dies when Runtime PM enabled

2016-08-01 Thread Mike Murdoch
Hello,

On 2016-08-01 13:57, Durval Menezes wrote:
> Hi Mathias,
>
> On Mon, Aug 1, 2016 at 8:20 AM, Mathias Nyman <mathias.ny...@linux.intel.com> 
> wrote:
>> On 01.08.2016 13:15, Durval Menezes wrote:
>>> Hello Mike, Mathias, list,
>>>
>>> On 06.02.2016 19:08, Mike Murdoch wrote:
>>>> Bug ID: 111251
>>>>
>>>> I have a NEC uPD720200 USB3.0 controller in a Thinkpad W520 laptop on
>>>> kernel 4.4.1-gentoo.
>>>>
>>>> 0e:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host
>>>> Controller (rev 04) (prog-if 30 [XHCI])
>>>>  Subsystem: Lenovo uPD720200 USB 3.0 Host Controller
>>>>  Flags: bus master, fast devsel, latency 0
>>>>  Memory at f380 (64-bit, non-prefetchable) [size=8K]
>>>>  Capabilities: [50] Power Management version 3
>>>>  Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
>>>>  Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
>>>>  Capabilities: [a0] Express Endpoint, MSI 00
>>>>  Capabilities: [100] Advanced Error Reporting
>>>>  Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
>>>>  Capabilities: [150] Latency Tolerance Reporting
>>>>  Kernel driver in use: xhci_hcd
>>>>  Kernel modules: xhci_pci
>>>>
>>>> When runtime power control for this controller is disabled
>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = on), the controller
>>>> works fine and reaches over 120MB/s transfer rates.
>>>>
>>>> When runtime power control for this controller is enabled
>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = auto), two effects
>>>> can be observed:
>>>>
>>>> - Transfer rates are much lower at around 30MB/s
>>>> - During transfers, the controller dies after a couple of seconds:
>>>
>>> I found this message in the list archives, and I have the exact same
>>> issues on exactly the same hardware (Thinkpad W520 laptop with the same
>>> USB3 controller showing on lspci -v); otherwise, I'm running distro kernel
>>> 2.6.32-573.7.1.el6.x86_64 on a Springdale Linux 6.7 (RHEL6) install.
>>>
>>> I just verified that my controller's PM was set by default to "auto":
>>> cat /sys/bus/pci/devices/\:0e\:00.0/power/control
>>> auto
>>> I have now set it to "on" and will test whether this will work around
>>> the issue (I'm waiting for my USB3.0 "heavy duty" disk docks to be
>>> released from another system that is using them right now).
>>>
>>> I have one question for Mike: have you upgraded your uPD720200 controller
>>> firmware (as per [1], [2]) or are you still running stock?
>>>
>>> Also, one question for Mathias: do you know whether your patches at [3]
>>> can be applied to kernel 2.6.32?
>> The last patch in [3] is faulty. So don't use the patches from the mail.
>>
>> I just force updated that branch, so if you like you can try to backport
>> patches from:
>>
>>  git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git 
>> bug_usb3_enum_rtresume
>>
>> only 2 patches are relevant:
>>
>> 8caabe9 xhci: Don't suspend the xhci bus it there is a pending event.
>> 4427456 xhci: resume USB 3 roothub first
> Thanks Mathias. Now I only need Mike's response concerning the firmware
> in order to proceed.
>
> Cheers,
No, I haven't tried updating the firmware. Feel free to give it a go,
I'm curious if it'll make a difference.

As for the patches: all three of them did fix this bug, but introduced
other problems (I don't remember details, sorry). As Mathias said, the
last one is faulty. However, using only the first two patches is *not*
enough to completely fix this bug (I verified it just now).

Unfortunately I don't have the time to do much testing. The Thinkpad is
used by someone else and I only have access to it on the weekends. A
workaround is to just disable runtime power management.
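
For anyone who wants to apply that workaround, here is a minimal shell
sketch. It assumes the usual sysfs layout referred to earlier in this
thread (a power/control file under the PCI device directory). The function
name and the optional second argument, which overrides the sysfs root, are
made up for illustration. Pass the PCI address exactly as it is listed
under /sys/bus/pci/devices on your system. Writing the file needs root, and
the setting does not survive a reboot.

```shell
# Hypothetical helper: keep a PCI device permanently powered ("on") instead
# of letting the kernel runtime-suspend it ("auto").
#   $1 = PCI address as listed under /sys/bus/pci/devices
#   $2 = sysfs PCI device root (optional, only there to ease testing)
disable_runtime_pm() {
    dev="$1"
    root="${2:-/sys/bus/pci/devices}"
    ctl="$root/$dev/power/control"

    # Bail out early if the control file is missing or not writable.
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (wrong address, or not root?)" >&2
        return 1
    fi

    printf 'on\n' > "$ctl"
    # Read the value back so the caller sees what is now in effect.
    cat "$ctl"
}
```

Called as root with the controller's address as the only argument, it
should print "on" when the write succeeded.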

Let me know how things work for you!

Cheers,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NEC uPD720200 xHCI Controller dies when Runtime PM enabled

2016-03-14 Thread Mike Murdoch


On 2016-03-14 10:06, Mathias Nyman wrote:
> On 13.03.2016 11:16, Mike Murdoch wrote:
>>
>>
>> On 2016-03-01 16:32, Mathias Nyman wrote:
>>> On 18.02.2016 18:34, Mike Murdoch wrote:
>>>>
>>>>
>>>> On 2016-02-18 16:12, Mathias Nyman wrote:
>>>>> On 16.02.2016 23:58, main.ha...@googlemail.com wrote:
>>>>>>
>>>>>>
>>>>>> On 2016-02-08 15:31, Mathias Nyman wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> On 06.02.2016 19:08, Mike Murdoch wrote:
>>>>>>>> Bug ID: 111251
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have a NEC uPD720200 USB3.0 controller in a Thinkpad W520 laptop on
>>>>>>>> kernel 4.4.1-gentoo.
>>>>>>>>
>>>>>>>> 0e:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host
>>>>>>>> Controller (rev 04) (prog-if 30 [XHCI])
>>>>>>>> Subsystem: Lenovo uPD720200 USB 3.0 Host Controller
>>>>>>>>
>>>>>>>> When runtime power control for this controller is disabled
>>>>>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = on), the controller
>>>>>>>> works fine and reaches over 120MB/s transfer rates.
>>>>>>>>
>>>>>>>> When runtime power control for this controller is enabled
>>>>>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = auto), two effects
>>>>>>>> can be observed:
>>>>>>>>
>>>>>>>> - Transfer rates are much lower at around 30MB/s
>>>>>>>> - During transfers, the controller dies after a couple of seconds:
>>>>>>>>
>>>>>>>> At this point, a reboot is required to reactivate the controller,
>>>>>>>> unloading and reloading the xhci_* modules does not work.
>>>>>>>>
>>>>>>>
>>>
>>> ...
>>>
>>> I did some more digging, there are a few things that need to be
>>> addressed:
>>> 1. We should resume USB3 bus before USB2 bus to let devices enumerate
>>>    as USB3 better, this gives them more time to finish the link
>>>    training.
>>> 2. After resuming xhci we don't see any port changes immediately, hub
>>>    thinks nothing happened and stops polling the ports, hub will
>>>    suspend again -> xhci will try to suspend.
>>> 3. Roothubs will autosuspend immediately after autoresume
>>>    (autosuspend timeout = 0). This could be a reason why we see the
>>>    "xhci_suspend" entry in the log. We either need to increase the
>>>    autosuspend timeout, or prevent suspend if we can see the pending
>>>    event in a xhci status register.
>>>
>>> inserting usb3 storage device:
>>> Feb 16 20:03:33 xhci_hcd :0e:00.0: // Setting command ring address to 0xe001
>>> Feb 16 20:03:33 xhci_hcd :0e:00.0: xhci_resume: starting port polling.
>>> Feb 16 20:03:33 xhci_hcd :0e:00.0: xhci_hub_status_data: stopping port polling.
>>> Feb 16 20:03:33 xhci_hcd :0e:00.0: xhci_suspend: stopping port polling.
>>>
>>> I got a few patches, attached. They both partially try to fix the
>>> issue, and add more logging.
>>> Same changes can be found in a topic branch in:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
>>> bug_usb3_enum_rtresume
>>>
>>> Any chance to try them out?
>>>
>>> -Mathias
>>
>> Hello,
>>
>> I've gotten around to testing these patches. I applied them all at once
>> (did you want me to test them individually?) and they appear to fix this
>> issue completely! Full speed and no dead controllers. Do you need any
>> further logs?
>>
>
> That's good news.
>
> Can I add your "Tested-by:" tag to two of the patches?
> I'll send them as fixes after rc1 is out.
>
> No more logs needed as it works. I'll send the third patch, which only
> adds debug info, to usb-next later. It will be useful for future
> debugging.
>
> Thanks
> Mathias
>
Hello,

yes, feel free to add the tag. Thanks for everything!

- Mike


Re: NEC uPD720200 xHCI Controller dies when Runtime PM enabled

2016-03-13 Thread Mike Murdoch


On 2016-03-01 16:32, Mathias Nyman wrote:
> On 18.02.2016 18:34, Mike Murdoch wrote:
>>
>>
>> On 2016-02-18 16:12, Mathias Nyman wrote:
>>> On 16.02.2016 23:58, main.ha...@googlemail.com wrote:
>>>>
>>>>
>>>> On 2016-02-08 15:31, Mathias Nyman wrote:
>>>>> Hi
>>>>>
>>>>> On 06.02.2016 19:08, Mike Murdoch wrote:
>>>>>> Bug ID: 111251
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a NEC uPD720200 USB3.0 controller in a Thinkpad W520 laptop on
>>>>>> kernel 4.4.1-gentoo.
>>>>>>
>>>>>> 0e:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host
>>>>>> Controller (rev 04) (prog-if 30 [XHCI])
>>>>>> Subsystem: Lenovo uPD720200 USB 3.0 Host Controller
>>>>>>
>>>>>> When runtime power control for this controller is disabled
>>>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = on), the controller
>>>>>> works fine and reaches over 120MB/s transfer rates.
>>>>>>
>>>>>> When runtime power control for this controller is enabled
>>>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = auto), two effects
>>>>>> can be observed:
>>>>>>
>>>>>> - Transfer rates are much lower at around 30MB/s
>>>>>> - During transfers, the controller dies after a couple of seconds:
>>>>>>
>>>>>> At this point, a reboot is required to reactivate the controller,
>>>>>> unloading and reloading the xhci_* modules does not work.
>>>>>>
>>>>>
>
> ...
>
> I did some more digging, there are a few things that need to be
> addressed:
> 1. We should resume USB3 bus before USB2 bus to let devices enumerate
>    as USB3 better, this gives them more time to finish the link
>    training.
> 2. After resuming xhci we don't see any port changes immediately, hub
>    thinks nothing happened and stops polling the ports, hub will
>    suspend again -> xhci will try to suspend.
> 3. Roothubs will autosuspend immediately after autoresume
>    (autosuspend timeout = 0). This could be a reason why we see the
>    "xhci_suspend" entry in the log. We either need to increase the
>    autosuspend timeout, or prevent suspend if we can see the pending
>    event in a xhci status register.
>
> inserting usb3 storage device:
> Feb 16 20:03:33 xhci_hcd :0e:00.0: // Setting command ring address to 0xe001
> Feb 16 20:03:33 xhci_hcd :0e:00.0: xhci_resume: starting port polling.
> Feb 16 20:03:33 xhci_hcd :0e:00.0: xhci_hub_status_data: stopping port polling.
> Feb 16 20:03:33 xhci_hcd :0e:00.0: xhci_suspend: stopping port polling.
>
> I got a few patches, attached. They both partially try to fix the
> issue, and add more logging.
> Same changes can be found in a topic branch in:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
> bug_usb3_enum_rtresume
>
> Any chance to try them out?
>
> -Mathias

Hello,

I've gotten around to testing these patches. I applied them all at once
(did you want me to test them individually?) and they appear to fix this
issue completely! Full speed and no dead controllers. Do you need any
further logs?

Many thanks so far! :)

Cheers,
- Mike


Re: NEC uPD720200 xHCI Controller dies when Runtime PM enabled

2016-02-18 Thread Mike Murdoch


On 2016-02-18 16:12, Mathias Nyman wrote:
> On 16.02.2016 23:58, main.ha...@googlemail.com wrote:
>>
>>
>> On 2016-02-08 15:31, Mathias Nyman wrote:
>>> Hi
>>>
>>> On 06.02.2016 19:08, Mike Murdoch wrote:
>>>> Bug ID: 111251
>>>>
>>>> Hello,
>>>>
>>>> I have a NEC uPD720200 USB3.0 controller in a Thinkpad W520 laptop on
>>>> kernel 4.4.1-gentoo.
>>>>
>>>> 0e:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host
>>>> Controller (rev 04) (prog-if 30 [XHCI])
>>>>   Subsystem: Lenovo uPD720200 USB 3.0 Host Controller
>>>>
>>>> When runtime power control for this controller is disabled
>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = on), the controller
>>>> works fine and reaches over 120MB/s transfer rates.
>>>>
>>>> When runtime power control for this controller is enabled
>>>> (/sys/bus/pci/devices/:0e:00.0/power/control = auto), two effects
>>>> can be observed:
>>>>
>>>> - Transfer rates are much lower at around 30MB/s
>>>> - During transfers, the controller dies after a couple of seconds:
>>>>
>>>> xhci_hcd :0e:00.0: xHCI host not responding to stop endpoint command.
>>>> xhci_hcd :0e:00.0: Assuming host is dying, halting host.
>>>> xhci_hcd :0e:00.0: Host not halted after 16000 microseconds.
>>>> xhci_hcd :0e:00.0: Non-responsive xHCI host is not halting.
>>>> xhci_hcd :0e:00.0: Completing active URBs anyway.
>>>> xhci_hcd :0e:00.0: HC died; cleaning up
>>>> sd 9:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
>>>> sd 9:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 19 a9 00 00 00 f0 00
>>>> blk_update_request: I/O error, dev sdc, sector 1681664
>>>> xhci_hcd :0e:00.0: Stopped the command ring failed, maybe the host is dead
>>>> xhci_hcd :0e:00.0: Host not halted after 16000 microseconds.
>>>> xhci_hcd :0e:00.0: Abort command ring failed
>>>> xhci_hcd :0e:00.0: HC died; cleaning up
>>>>
>>>> At this point, a reboot is required to reactivate the controller,
>>>> unloading and reloading the xhci_* modules does not work.
>>>>
>>>
>>> With 120MB/s I assume it was a USB3 device.
>>> Was there any USB 2 device connected as well?
>>> Does this occur with only a USB2 device connected to xhci?
>>>
>>> xhci handles suspend/resume a bit differently for USB2 and USB3
>>> roothubs.
>>>
>>> Does this happen on older kernels as well? 4.3 or 4.2 based?
>>>
>>> For more xhci debugging, do:
>>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
>>> and check dmesg for more xhci info.
>>>
>>> If reloading the module did not help it is more likely that the
>>> controller is in some unexpected state.
>>> If, however, it is just bad timeout timer handling, we could return
>>> immediately in the timeout handler and check if the usb device(s)
>>> continue to work normally.
>>>
>>> This could be done by editing drivers/usb/host/xhci-ring.c
>>>
>>> +++ b/drivers/usb/host/xhci-ring.c
>>> @@ -831,6 +831,7 @@ void xhci_stop_endpoint_command_watchdog(unsigned long arg)
>>>         struct xhci_virt_ep *ep;
>>>         int ret, i, j;
>>>         unsigned long flags;
>>> +       return;
>>>
>>> -Mathias
>>>
>>>
>> Hello Mat,
>>
>> thanks for your response. I have experimented with your suggestions.
>>
>> As for your questions: No, there was only one USB3 stick connected to
>> the host controller during the tests. USB2 devices work fine too.
>>
>> Yes, I encountered this problem on a 4.1 series kernel as well as the
>> 4.4 series.
>>
>> I have enabled the debug controls and attached the results to this mail,
>> along with some commentary. I am hoping this works in the mailing list.
>>
>> I've also tried your suggested modification, and it does seem to work!
>> With it, the controller does not die, but it still sacrifices a lot of
>> speed (as I had mentioned in the first mail of this thread)
>>
>>
>> I hope this is helpful!
>>
>
> Thanks, it is helpful
>
> Looks like when the USB3 device is inserted it i

NEC uPD720200 xHCI Controller dies when Runtime PM enabled

2016-02-06 Thread Mike Murdoch
Bug ID: 111251

Hello,

I have a NEC uPD720200 USB3.0 controller in a Thinkpad W520 laptop on
kernel 4.4.1-gentoo.

0e:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) (prog-if 30 [XHCI])
        Subsystem: Lenovo uPD720200 USB 3.0 Host Controller
        Flags: bus master, fast devsel, latency 0
        Memory at f380 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150] Latency Tolerance Reporting
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

When runtime power control for this controller is disabled
(/sys/bus/pci/devices/:0e:00.0/power/control = on), the controller
works fine and reaches over 120MB/s transfer rates.

When runtime power control for this controller is enabled
(/sys/bus/pci/devices/:0e:00.0/power/control = auto), two effects
can be observed:

- Transfer rates are much lower at around 30MB/s
- During transfers, the controller dies after a couple of seconds:

xhci_hcd :0e:00.0: xHCI host not responding to stop endpoint command.
xhci_hcd :0e:00.0: Assuming host is dying, halting host.
xhci_hcd :0e:00.0: Host not halted after 16000 microseconds.
xhci_hcd :0e:00.0: Non-responsive xHCI host is not halting.
xhci_hcd :0e:00.0: Completing active URBs anyway.
xhci_hcd :0e:00.0: HC died; cleaning up
sd 9:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
sd 9:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 19 a9 00 00 00 f0 00
blk_update_request: I/O error, dev sdc, sector 1681664
xhci_hcd :0e:00.0: Stopped the command ring failed, maybe the host is dead
xhci_hcd :0e:00.0: Host not halted after 16000 microseconds.
xhci_hcd :0e:00.0: Abort command ring failed
xhci_hcd :0e:00.0: HC died; cleaning up

At this point, a reboot is required to reactivate the controller;
unloading and reloading the xhci_* modules does not work.
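
To spot this failure without reading the whole log, something like the
sketch below could be used. It only looks for the "HC died" line quoted
above. The function name is invented here, and it reads log text on
standard input (for example piped from dmesg) rather than assuming any
particular log file location.

```shell
# Hypothetical helper: read kernel log text on stdin and print the
# identifier of every xhci_hcd device that logged "HC died", once each.
# It matches lines of the form "xhci_hcd <device>: HC died; cleaning up".
list_dead_xhci() {
    sed -n 's/.*xhci_hcd \([^ ]*\): HC died.*/\1/p' | sort -u
}
```

A typical call would be "dmesg | list_dead_xhci"; no output means no
such line was found in the log.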

I'll be happy to assist in getting this fixed :)