RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Alan Stern
On Wed, 10 Sep 2014, Kasberger Andreas wrote:

> We have a problem with a CDC device that simulates a serial
> interface. Problem is that the host stops CDC BULK IN transfers after
> a while on those USB2.0 ports that are connected to the chip set's
> internal rate matching hub, others seem to work fine.

Here you say that ports connected to the rate-matching hub _fail_.

> ... to prevent confusion please note: Problem can only be reproduced
> on those USB2.0 ports that are NOT HANDLED by chip set's rate
> matching hub.
> 
> If we disable USB3.0 within the BIOS (or remove USB3.0 Linux drivers)
> all ports are handled via rate matching hub and therefore all ports
> seem to work, in that case the error cannot be reproduced anymore at
> any port.

Here you say that ports connected to the rate-matching hub _work_.

Note that the xHCI host controller does not use the rate-matching hub.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Lu, Baolu

Hi Andreas,

I'd like to reproduce this problem in the lab. How can I get a CDC device?

Thanks,
-baolu

Re: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Oliver Neukum
On Wed, 2014-09-10 at 09:48 +, Kasberger Andreas wrote:
> Yes, I tried Sarah's suggestions and disabled power management,
> but with the same results.
> The time until XHCI stops responding is sometimes only 1
> second, sometimes several minutes.

How did you disable power management?

Regards
Oliver




RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Kasberger Andreas
Yes, I tried Sarah's suggestions and disabled power management,
but with the same results.
The time until XHCI stops responding is sometimes only 1
second, sometimes several minutes.




Re: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Oliver Neukum
On Wed, 2014-09-10 at 07:04 +, Kasberger Andreas wrote:
> So I am back with more tests on this problem.
> Intel itself told us it is a problem in the driver for the XHCI host 
> controller. I will put here some of what I got from Intel:
> 
> Let me explain why I came back to you with this problem.
> We have already tried the same thing with an "echo device" from Ellisys to 
> see if the problem came from our board. 
> With this very simple device (it just echoes) we got the same problem:
> the XHCI host controller does not respond anymore.
> 
> So it has nothing to do with our board; it is a bug existing in the 
> current Linux kernel.
> 
> Here is some short mail traffic between Intel and our company. I removed the 
> unhelpful parts.

Your trace hasn't made it through the list. Anyway, have you tried
following Sarah's suggestion and tried without LPM?

Regards
Oliver




RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Kasberger Andreas
So I am back with more tests on this problem.
Intel itself told us it is a problem in the driver for the XHCI host 
controller. I will put here some of what I got from Intel:

Let me explain why I came back to you with this problem.
We have already tried the same thing with an "echo device" from Ellisys to see 
if the problem came from our board. 
With this very simple device (it just echoes) we got the same problem:
the XHCI host controller does not respond anymore.

So it has nothing to do with our board; it is a bug existing in the current 
Linux kernel.

Here is some short mail traffic between Intel and our company. I removed the 
unhelpful parts.

First Mail from us :

We have a problem with a CDC device that simulates a serial interface. The problem 
is that the host stops CDC BULK IN transfers after a while on those USB2.0 
ports that are connected to the chip set's internal rate matching hub; others 
seem to work fine.
We tried it with our own host hardware (using the PCH 82QM87 [Lynx Point] chip set) 
as well as with your "CRB Emerald Lake 2". We also sent our hardware to a 
consulting company (http://thesycon.de/eng/home.shtml) for analysis. They tried 
our CDC device and their own "CDC echo device", both with the same result. Tests 
have been done with unchanged Kubuntu 3.13.0-24-generic and our own in-house 
Linux (kernel 3.6 with RT patches), without any differences. It also makes no 
difference whether the device is connected directly or via an external USB2.0 hub; 
the behavior is the same.
We realized that it has to do with file open/close on Linux: if we open 
/dev/ttyACM0 once, communication works fine for hours, but if we re-open 
/dev/ttyACM0 for each message, CDC BULK IN transfer stops within seconds 
(CDC BULK OUT still works). The problem could be reproduced simply with "cat" 
and "echo" in a loop, as well as with our own tool that just opens 
/dev/ttyACM0, writes something, expects an answer and closes the file again in 
a loop.
Please find attached a log file made with the USB logger from Ellisys (the software 
is available for free on the Ellisys homepage if necessary: 
http://www.ellisys.com/products/usbex200/download.php) that contains the whole CDC 
device transfer from being plugged in until CDC BULK IN transfer has 
stopped. Furthermore, find attached Thesycon’s device descriptor file. The two CDC 
devices differ in that our CDC device has only one interface, whereas 
Thesycon’s has two.
So far we are not sure whether the problem lies inside the host controller hardware 
or in any of the Linux device drivers. Have you ever heard of this problem? Do you 
have any suggestions, bug fixes or workarounds?
... to prevent confusion please note: the problem can only be reproduced on those 
USB2.0 ports that are NOT HANDLED by the chip set's rate matching hub.

If we disable USB3.0 in the BIOS (or remove the USB3.0 Linux drivers), all ports 
are handled via the rate matching hub and therefore all ports seem to work; in that 
case the error cannot be reproduced anymore on any port.
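The failing access pattern described above (open /dev/ttyACM0, write a request, wait for an answer, close, repeat) can be sketched roughly as follows. This is a hypothetical stand-in for the in-house tool, not its actual code; the function names, the 1-second timeout and the assumption that the tty has already been put into raw mode (for example with `stty -F /dev/ttyACM0 raw -echo`) are all illustrative.

```python
import os
import select


def open_write_read_close(path, request, timeout=1.0):
    """One cycle: open the tty, send a request, wait for a reply, close it again.

    Returns the bytes read, or None if nothing arrived within `timeout` seconds,
    which is how a stalled BULK IN pipe would show up from user space.
    """
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    try:
        os.write(fd, request)
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            return None
        return os.read(fd, 4096)
    finally:
        # Closing the tty on every cycle is the pattern reported to trigger the hang.
        os.close(fd)


def run_loop(path, request, cycles, timeout=1.0):
    """Repeat the cycle; return the index of the first unanswered cycle, or None."""
    for i in range(cycles):
        if open_write_read_close(path, request, timeout) is None:
            return i
    return None
```

Against the real device one would call something like `run_loop("/dev/ttyACM0", b"ping\r\n", 10000)`, where the request bytes are a placeholder for whatever command the firmware answers.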

Responses from Intel
Sorry for the delay. I was not able to get in contact with Sarah Sharp as you 
told me; she seems to be out on vacation. Anyway, I see the other issue was 
rejected in the Linux channel because they don't deal with peripheral drivers, 
just graphics drivers. I think we can veer towards the Linux community at this 
point. As we've discussed earlier, since the issue does not happen under the 
Windows environment, this is more of a Linux driver issue than a hardware issue.


We can conclude the same if we take into consideration that you see the same 
issue with the Intel CRB. Most of the time, Linux issues related to drivers 
have already been addressed and solved in the community; that's why we strongly 
encourage you to ping them about it.


I've found a USB Linux drivers website that offers device driver support and it 
lists a bunch of USB devices as well as CDC class drivers. Perhaps this could 
help you out.


http://www.linux-usb.org/devices.html
Best Regards,



RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-13 Thread Kasberger Andreas
>> Does my log mean it has nothing to do with the kernel itself?
>
> Maybe you're experiencing a problem with link power management. Some
> changes were just merged into Greg KH's development tree (the usb-linus
> branch), and they should appear in the next 3.14-rc release. You could
> try either one of those. Or you could try building a kernel without
> CONFIG_PM_RUNTIME, which will disable link power management.
>

I will give the latest kernel a try in a few days. I have already done the test 
with power management disabled, but maybe something will get better with the 
new kernel anyway.


Andreas


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-13 Thread Alan Stern
On Thu, 13 Feb 2014, Kasberger Andreas wrote:

> > Ah, I see the reason for your confusion.
> >
> > On USB buses, Wireshark captures URBs, not packets. An URB can contain
> > multiple packets. In this case, there was a single URB containing two
> > data packets. Each packet was 64 bytes, and the URB was 128 bytes.
> 
> Ok Alan. Now that is clear for me.
> 
> I have been staring at the usbmon data like crazy, but no error jumps out 
> at me.
> I get no status values back except 0 and -2, and both are what I would 
> expect.
> Other status values like -115 only show up in SUBMISSION events, but this is 
> normal.
> 
> The only thing I can think of is a HW bug in the Intel EHCI part. With the NEC 
> controller everything runs fine for hours.
> 
> I fear that without a real USB analyzer I will not be able to see anything that 
> would help me. I cannot even see any ACK/NAK or the like in 
> usbmon.
> 
> Does my log mean it has nothing to do with the kernel itself?
>   
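The status values in the quoted message are negative errno codes. A small helper like the one below can make a usbmon trace easier to read. It is a sketch: the notes summarize the usual meaning of these codes in usbmon output (-115 is -EINPROGRESS, printed on every submission; -2 is -ENOENT, typically a URB killed or unlinked by its driver, for example when the tty is closed; -32 is -EPIPE, a stall), and the numeric mapping relies on Python's `errno` module and therefore on Linux errno values.

```python
import errno
import os

# Typical meaning of a few statuses in a usbmon trace (summary, not exhaustive).
USBMON_NOTES = {
    0: "completed successfully",
    -errno.EINPROGRESS: "URB submitted, still in progress (normal on 'S' lines)",
    -errno.ENOENT: "URB killed/unlinked by its driver (e.g. on close)",
    -errno.EPIPE: "endpoint stalled",
}


def describe_status(status):
    """Turn a usbmon status such as -2 into text like 'ENOENT: ...'."""
    if status == 0:
        return "0: " + USBMON_NOTES[0]
    name = errno.errorcode.get(-status, "E?")
    # Fall back to the C library's message for codes not listed above.
    note = USBMON_NOTES.get(status, os.strerror(-status))
    return f"{name}: {note}"
```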

Maybe you're experiencing a problem with link power management.  Some
changes were just merged into Greg KH's development tree (the usb-linus
branch), and they should appear in the next 3.14-rc release.  You could
try either one of those.  Or you could try building a kernel without
CONFIG_PM_RUNTIME, which will disable link power management.
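Before rebuilding, one way to see whether the running kernel has CONFIG_PM_RUNTIME enabled is to read its configuration. The sketch below assumes the kernel exposes /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); many distributions ship /boot/config-$(uname -r) instead, and `config_value` accepts the text of either. The helper names are made up for this example.

```python
import gzip
import re


def config_value(config_text, option):
    """Return the value ('y', 'm', ...) of a Kconfig option, or None if unset."""
    for line in config_text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m and m.group(1) == option:
            return m.group(2)
    # "# CONFIG_X is not set" and a missing line both mean disabled.
    return None


def running_kernel_config(path="/proc/config.gz"):
    """Read the running kernel's config (requires CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

Usage: `config_value(running_kernel_config(), "CONFIG_PM_RUNTIME")` returns None when runtime PM was built out.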

Alan Stern



RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-11 Thread Alan Stern
On Tue, 11 Feb 2014, Kasberger Andreas wrote:

> > >> I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of 
> > >> "0040".
> >>
> >> As far as I understand, "packet 7092" in Wireshark with URB data
> >> length 128 should not be possible? What happens with such packet sizes?
> >> Or is Wireshark just joking with me?
> >
> > Wireshark adds 64 bytes of overhead to each packet it captures.
> >
> 
> Yes Alan, overall Wireshark shows 192 bytes for that packet.

Ah, I see the reason for your confusion.

On USB buses, Wireshark captures URBs, not packets.  An URB can contain
multiple packets.  In this case, there was a single URB containing two
data packets.  Each packet was 64 bytes, and the URB was 128 bytes.
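Alan's arithmetic can be checked in a few lines. This is a sketch: the 64-byte figure is the per-capture usbmon overhead mentioned in the thread, and the helper ignores any trailing zero-length packet that may follow a transfer whose length is an exact multiple of wMaxPacketSize.

```python
import math

USBMON_HEADER_LEN = 64  # per-capture overhead Wireshark shows for usbmon (per the thread)


def packets_in_transfer(data_len, max_packet):
    """Number of data packets needed for one URB, ignoring a trailing zero-length packet."""
    if data_len == 0:
        return 1  # a zero-length transfer is still one (empty) packet
    return math.ceil(data_len / max_packet)


max_packet = int("0040", 16)  # wMaxPacketSize of ep82/ep83, i.e. 64 bytes
print(packets_in_transfer(128, max_packet))  # 2 packets of 64 bytes
print(USBMON_HEADER_LEN + 128)               # 192, the frame length Wireshark reported
```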

Alan Stern



RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-10 Thread Kasberger Andreas
>> I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of "0040".
>>
>> As far as I understand, "packet 7092" in Wireshark with URB data
>> length 128 should not be possible? What happens with such packet sizes?
>> Or is Wireshark just joking with me?
>
> Wireshark adds 64 bytes of overhead to each packet it captures.
>

Yes Alan, overall Wireshark shows 192 bytes for that packet.


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-10 Thread Alan Stern
On Mon, 10 Feb 2014, Kasberger Andreas wrote:

> I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of "0040".
> 
> As far as I understand, "packet 7092" in Wireshark with URB data
> length 128 should not be possible? What happens with such packet sizes?
> Or is Wireshark just joking with me?

Wireshark adds 64 bytes of overhead to each packet it captures.

Alan Stern



RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-10 Thread Kasberger Andreas
I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of "0040".

As far as I understand, "packet 7092" in Wireshark with URB data length 128 
should not be possible? What happens with such packet sizes? Or is Wireshark 
just joking with me?


Re: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-05 Thread Sarah Sharp
On Wed, Feb 05, 2014 at 07:33:15AM +, Kasberger Andreas wrote:
> Hello Peter,
> 
> many many thanks for your long and detailed answer. 
> 
> > On the protocol design:
> >
> > First, using CDC-ACM means sacrificing all structured communication
> > offered by the USB packet bus and settling for such primitive use of
> > USB is not a decision that should be made lightly. Almost all
> > applications can benefit quite significantly both in end-user
> > usability and in ease of implementation from an application-specific
> > protocol which takes advantage of what USB offers.
> 
> 
> Yes, you are absolutely right. Not the best idea. This protocol is used 
> for firmware updates. In normal operation the device is a simple keyboard. And 
> being able to send out bulk messages is the great advantage of CDC/ACM.
> 
> 
> What is still puzzling me is the fact that the host controller stops any 
> communication.

Perhaps looking at the USB mon trace would help?  I'm wondering if
the driver is continuing to submit URBs even though the next ones are
NAKed?

http://lxr.free-electrons.com/source/Documentation/usb/usbmon.txt
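To check from a usbmon trace whether URBs keep being submitted without ever completing, one could count submissions ('S') against completions ('C') and submission errors ('E') per endpoint. This sketch assumes the text format described in usbmon.txt, where the third whitespace-separated field is the event type and the fourth is the address word (for example Bi:1:005:2); it is an illustration, not an existing tool.

```python
from collections import Counter


def outstanding_urbs(lines):
    """Return {address: submitted - finished} for a usbmon text trace."""
    pending = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        event, address = fields[2], fields[3]
        if event == "S":
            pending[address] += 1
        elif event in ("C", "E"):  # completion, or error at submission time
            pending[address] -= 1
    # Only report endpoints whose count did not return to zero.
    return {addr: n for addr, n in pending.items() if n}
```

In practice one would first capture a trace, e.g. `cat /sys/kernel/debug/usb/usbmon/1u > trace.txt`, and then pass the lines of that file; a count that keeps growing on the bulk endpoint would support Sarah's suspicion.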

It could also be a problem with the xHCI cancellation code.  ISTR that
we have an issue in the driver where if no URBs get completed, the
dequeue pointer gets left on the last TRB that completed, and we keep
expanding the rings indefinitely.  When you turn on xHCI debugging
(either through CONFIG_USB_XHCI_HCD_DEBUG or by running
`echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control`)
do you see messages about URB cancellation?

> That means there is really, electrically, no communication (bulk_out) from the 
> HC to the device anymore. It seems that the host controller has shut down the 
> communication port to one particular device. Unbinding and rebinding the host 
> controller solves the problem.

Do you have auto-suspend enabled for the device?  Perhaps the CDC-ACM
driver is suspending the device because there's no reader?

You can see if auto-suspend is enabled by finding the device in
/sys/bus/usb/devices and seeing if power/control is set to 'auto'.
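Sarah's check can also be scripted, which also answers how runtime power management can be turned off for a single device. A minimal sketch that walks /sys/bus/usb/devices and reports devices whose power/control reads "auto"; writing "on" to the same file keeps the device from being runtime-suspended. The function names are made up for this example, and writing requires root.

```python
import os

SYSFS_USB = "/sys/bus/usb/devices"


def autosuspend_enabled(root=SYSFS_USB):
    """List USB devices whose power/control is 'auto' (runtime autosuspend allowed)."""
    found = []
    for name in sorted(os.listdir(root)):
        control = os.path.join(root, name, "power", "control")
        try:
            with open(control) as f:
                if f.read().strip() == "auto":
                    found.append(name)
        except OSError:
            continue  # entries without a readable power/control file are skipped
    return found


def disable_autosuspend(name, root=SYSFS_USB):
    """Write 'on' to power/control so the device is kept active (needs root)."""
    with open(os.path.join(root, name, "power", "control"), "w") as f:
        f.write("on")
```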

> But anyway, I will try my best to find out the root cause of the 
> miscommunication between both sides.
> 
> 
> > You mention device-side buffering and that the device at some point
> > can't accept anything more from the host. With USB this means that
> > you must ensure that the host will know when it must not send more.
> 
> 
> I thought sending NAK in response to each packet is the correct way to tell 
> the host "not now, but maybe later. Please try again". Once the internal 
> device queue is no longer completely full, communication continues in the 
> normal way. But after some time the HC completely stops any communication. 
> In real life a huge firmware update takes a long time, so it can 
> happen that the internal device queue is full. And a broken firmware update is 
> a bad thing.
> 
> 
> > The USB way to do this, were you using an application-specific
> > protocol instead of serial port simulation, would be to stall the
> > endpoint. Unfortunately CDC-ACM doesn't allow doing that.
> 
> OK. I will think about whether another way is possible.
> 
> 
> > So you have to include some kind of in-band signalling for this. :\
> >
> > This is just one reason why ACM is a poor choice for when you need
> > structured communication.
> 
> 
> Anyway many many thanks for your precious time and detailed answers.
> 
> My conclusions and to-do list:
> 
> 1. Think about the design
> 2. Keep trying to find out the main reason why the host controller shuts down 
> the connection
> 
> Arrrghhh, I just saw that USB 2.0 also has some problems. The host controller is 
> resetting after some hours but not getting back into a working state.

By USB 2.0, do you mean your device under an EHCI host doesn't work as
well?  There may be USB 2.0 only ports under your xHCI host, and the
best way to find out which host you're under is to run `sudo lsusb -t`.
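Besides `lsusb -t`, the same information is available in sysfs: a device named like 2-1.3 sits on bus 2, whose root hub usb2 is a child of the host controller's PCI device, and that PCI device's driver link names the HCD (xhci_hcd for an xHCI controller, an EHCI driver such as ehci-pci otherwise). The sketch below relies on that layout, which is an assumption that holds for PCI host controllers; it is not how lsusb itself is implemented.

```python
import os

SYSFS_USB = "/sys/bus/usb/devices"


def host_controller_driver(device, root=SYSFS_USB):
    """Return the driver name of the host controller a USB device is attached to.

    `device` is a sysfs name such as "2-1.3", or a root hub name such as "usb2".
    """
    bus = device[3:] if device.startswith("usb") else device.split("-")[0]
    root_hub = os.path.realpath(os.path.join(root, "usb" + bus))
    controller = os.path.dirname(root_hub)  # PCI device that owns the root hub
    driver = os.path.realpath(os.path.join(controller, "driver"))
    return os.path.basename(driver)
```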

Sarah Sharp


Re: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-05 Thread Bjørn Mork
Kasberger Andreas  writes:

>> On the protocol design:
>>
>> First, using CDC-ACM means sacrificing all structured communication
>> offered by the USB packet bus and settling for such primitive use of
>> USB is not a decision that should be made lightly. Almost all
>> applications can benefit quite significantly both in end-user
>> usability and in ease of implementation from an application-specific
>> protocol which takes advantage of what USB offers.
>
> Yes, you are absolutely right. Not the best idea. This protocol
> is used for firmware updates. 

Maybe consider DFU instead?


Bjørn


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-05 Thread David Laight
From: Kasberger Andreas
> What is still puzzling me is the fact that the host controller stops any 
> communication.
> That means there is really, electrically, no communication (bulk_out) from the 
> HC to the device anymore. It
> seems that the host controller has shut down the communication port to one 
> particular device. Unbinding and
> rebinding the host controller solves the problem.

The complete stop is probably a bug in the error recovery code.

> But anyway, I will try my best to find out the root cause of the 
> miscommunication between both
> sides.
> 
> 
> > You mention device-side buffering and that the device at some point
> > can't accept anything more from the host. With USB this means that
> > you must ensure that the host will know when it must not send more.
> 
> 
> I thought sending NAK in response to each packet is the correct way to tell 
> the host "not now, but
> maybe later. Please try again". Once the internal device queue is no longer 
> completely full, the
> communication continues in the normal way. But after some time the HC completely 
> stops any communication.
> In real life a huge firmware update takes a long time, so it can 
> happen that the internal
> device queue is full. And a broken firmware update is a bad thing.

IIRC the host controller should repeat any transfer that is responded to
with a NAK. But I'm not sure how long it will do this for before signalling
an error.

The fact that you caused the ring to be expanded several times does rather
indicate that you are not waiting for earlier transfers to finish before
adding the subsequent ones.
You should at least limit the number of in-flight transfers.
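David's last point, capping the number of transfers queued at once, could look like this in a host-side tool or a model of a driver. The class and its method names are hypothetical and only illustrate the bookkeeping; in a real kernel driver the same idea would be applied around usb_submit_urb() and the URB completion handler.

```python
from collections import deque


class InFlightLimiter:
    """Hypothetical bookkeeping that keeps at most `limit` transfers queued at once."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = set()
        self.waiting = deque()

    def submit(self, transfer_id):
        """Queue a transfer; it is only 'submitted' if a slot is free."""
        if len(self.in_flight) < self.limit:
            self.in_flight.add(transfer_id)
            return True
        self.waiting.append(transfer_id)  # held back instead of growing the ring
        return False

    def complete(self, transfer_id):
        """Called from the completion path; frees a slot and submits the next waiter."""
        self.in_flight.discard(transfer_id)
        if self.waiting and len(self.in_flight) < self.limit:
            self.in_flight.add(self.waiting.popleft())
```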

David





RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-05 Thread Kasberger Andreas
Hello Peter,

one short remark

> Application-specific or vendor-specific are often frowned upon in
> other contexts but if the protocol is documented publically then it
> is a great way to take advantage of all that USB offers, and it is
> explicitly supported by the specification. Use bDeviceClass or
> bInterfaceClass 0xff.

Of course, we try to use only the CDC/ACM standard and not any custom 
communication.

Best regards
   Andreas


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-04 Thread Kasberger Andreas
Hello Peter,

many many thanks for your long and detailed answer. 

> On the protocol design:
>
> First, using CDC-ACM means sacrificing all structured communication
> offered by the USB packet bus and settling for such primitive use of
> USB is not a decision that should be made lightly. Almost all
> applications can benefit quite significantly both in end-user
> usability and in ease of implementation from an application-specific
> protocol which takes advantage of what USB offers.


Yes, you are absolutely right. Not the best idea. This protocol is used for 
firmware updates. In normal operation the device is a simple keyboard. And being 
able to send out bulk messages is the great advantage of CDC/ACM.


What is still puzzling me is the fact that the host controller stops any 
communication.
That means there is really, electrically, no communication (bulk_out) from the HC 
to the device anymore. It seems that the host controller has shut down the 
communication port to one particular device. Unbinding and rebinding the host 
controller solves the problem.

But anyway, I will try my best to find out the root cause of the 
miscommunication between both sides.


> You mention device-side buffering and that the device at some point
> can't accept anything more from the host. With USB this means that
> you must ensure that the host will know when it must not send more.


I thought sending NAK in response to each packet is the correct way to tell 
the host "not now, but maybe later. Please try again". Once the internal device 
queue is no longer completely full, communication continues in the normal way. 
But after some time the HC completely stops any communication. 
In real life a huge firmware update takes a long time, so it can happen that 
the internal device queue is full. And a broken firmware update is a 
bad thing.
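The device-side behaviour described in this thread (commands are queued, expire after a timeout, and every further BULK OUT packet is NAKed while the queue is full) can be modelled in a few lines to reason about the traffic pattern. This is a hypothetical model, not the RX621 firmware; the class name, capacity and timeout values are placeholders.

```python
from collections import deque


class DeviceQueue:
    """Model of a device that NAKs BULK OUT while its command queue is full."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity
        self.timeout = timeout
        self.items = deque()  # (arrival_time, command), oldest first

    def _expire(self, now):
        # Drop commands that waited too long for a reader.
        while self.items and now - self.items[0][0] >= self.timeout:
            self.items.popleft()

    def bulk_out(self, command, now):
        """Handshake the device returns for one BULK OUT packet."""
        self._expire(now)
        if len(self.items) >= self.capacity:
            return "NAK"  # the host is expected to retry later
        self.items.append((now, command))
        return "ACK"

    def read(self, now):
        """A BULK IN read by the host consumes the oldest pending command."""
        self._expire(now)
        return self.items.popleft()[1] if self.items else None
```

With no reader, the queue refills as soon as an entry expires, so nearly every BULK OUT packet is answered with NAK, which is the traffic pattern that preceded the hang.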


> The USB way to do this, were you using an application-specific
> protocol instead of serial port simulation, would be to stall the
> endpoint. Unfortunately CDC-ACM doesn't allow doing that.

OK. I will think about whether another way is possible.


> So you have to include some kind of in-band signalling for this. :\
>
> This is just one reason why ACM is a poor choice for when you need
> structured communication.


Anyway many many thanks for your precious time and detailed answers.

My conclusions and to-do list:

1. Think about the design
2. Keep trying to find out the main reason why the host controller shuts down the connection

Arrrghhh, I just saw that USB 2.0 also has some problems. The host controller is resetting 
after some hours but not getting back into a working state.

I hope in future I can make more sensible contributions to the list

Best Regards
   Andreas


Re: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-04 Thread Peter Stuge
Hi Andreas,

Kasberger Andreas wrote:
> XHCI Host Controller on Intel Panther Point with CDC/ACM dead after
> massive NAK
> PCH 82HM76 (PantherPoint) chipset connected to a Renesas RX621
> How to reproduce:
> No reader on device /dev/ttyACM0 connected
> Writer sends the same command in an endless loop
> : echo "readhik">/dev/ttyACM0
> 
> Function: the Renesas RX621 receives the command, puts it into an internal
> queue and waits for a reader. Until a reader comes, the command stays
> stored in the queue. If a command reaches its timeout, it is
> removed from the queue. If the queue is full, every command
> is answered with NAK.
> 
> The response will nearly always be NAK because nobody
> reads from /dev/ttyACM0.
> 
> After some time (between seconds and several hours) the host
> controller stops sending anything to the device with the
> Renesas RX621. 
> We verified this with an analyzer directly on the bus. Other devices
> connected to the host controller are still alive and working.

I'm sorry, but this protocol design is rather broken use of USB.

That said, the HC must certainly be more robust in dealing with it.


On the protocol design:

First, using CDC-ACM means sacrificing all structured communication
offered by the USB packet bus and settling for such primitive use of
USB is not a decision that should be made lightly. Almost all
applications can benefit quite significantly both in end-user
usability and in ease of implementation from an application-specific
protocol which takes advantage of what USB offers.

Application-specific or vendor-specific are often frowned upon in
other contexts, but if the protocol is documented publicly then it
is a great way to take advantage of all that USB offers, and it is
explicitly supported by the specification. Use bDeviceClass or
bInterfaceClass 0xff.
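For illustration, a vendor-specific interface with two bulk endpoints could be described as below. The byte layout follows the standard 9-byte interface descriptor and 7-byte endpoint descriptor of the USB 2.0 specification, and the 64-byte wMaxPacketSize matches the "0040" mentioned elsewhere in this thread; the endpoint addresses, interface number, subclass, protocol, string index and bInterval are placeholders.

```python
import struct

USB_DT_INTERFACE = 0x04
USB_DT_ENDPOINT = 0x05
USB_CLASS_VENDOR_SPEC = 0xFF
USB_ENDPOINT_XFER_BULK = 0x02


def interface_descriptor(number, num_endpoints, subclass=0x00, protocol=0x00, string_index=0):
    """Standard 9-byte interface descriptor with bInterfaceClass = 0xff."""
    return struct.pack("<9B", 9, USB_DT_INTERFACE, number, 0, num_endpoints,
                       USB_CLASS_VENDOR_SPEC, subclass, protocol, string_index)


def bulk_endpoint_descriptor(address, max_packet=64):
    """Standard 7-byte endpoint descriptor for a bulk endpoint (bInterval 0, placeholder)."""
    # wMaxPacketSize is a little-endian 16-bit field.
    return struct.pack("<BBBBHB", 7, USB_DT_ENDPOINT, address,
                       USB_ENDPOINT_XFER_BULK, max_packet, 0)


config_tail = (interface_descriptor(0, 2)
               + bulk_endpoint_descriptor(0x81)    # IN endpoint 1 (placeholder)
               + bulk_endpoint_descriptor(0x01))   # OUT endpoint 1 (placeholder)
print(config_tail.hex(" "))
```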


But more importantly, regardless of the application protocol, with
USB it is the absolute and complete responsibility of the host-side
software to communicate with the device only *exactly* in the way
that the device supports.

You mention device-side buffering and that the device at some point
can't accept anything more from the host. With USB this means that
you must ensure that the host will know when it must not send more.

The USB way to do this, were you using an application-specific
protocol instead of serial port simulation, would be to stall the
endpoint. Unfortunately CDC-ACM doesn't allow doing that.

So you have to include some kind of in-band signalling for this. :\

This is just one reason why ACM is a poor choice for when you need
structured communication.
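One possible form of the in-band signalling mentioned above is a credit scheme: the device tells the host how many more messages it can buffer, and the host never sends more than that, so the device does not have to NAK for long periods. The message format (a line such as `CREDIT 4`) and the class below are hypothetical, invented for this sketch; they are not part of CDC-ACM.

```python
from collections import deque


class CreditSender:
    """Host side of a hypothetical in-band credit protocol over a CDC-ACM tty."""

    def __init__(self, write):
        self.write = write      # callable that writes bytes to the tty
        self.credits = 0        # messages the device has said it can accept
        self.pending = deque()

    def send(self, message):
        self.pending.append(message)
        self._pump()

    def on_line_from_device(self, line):
        """Handle a line read from the tty; b'CREDIT n' grants n more messages."""
        parts = line.split()
        if len(parts) == 2 and parts[0] == b"CREDIT" and parts[1].isdigit():
            self.credits += int(parts[1])
            self._pump()

    def _pump(self):
        # Only write while the device has announced free buffer space.
        while self.pending and self.credits > 0:
            self.write(self.pending.popleft())
            self.credits -= 1
```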


//Peter