Re: g_mass_storage bug ?

2014-09-26 Thread Felipe Balbi
On Thu, Sep 25, 2014 at 06:12:51PM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Thu, Sep 25, 2014 at 09:57:59PM +, Paul Zimmerman wrote:
> > > From: Felipe Balbi [mailto:ba...@ti.com]
> > > 
> > > On Thu, Sep 25, 2014 at 07:08:12PM +, Paul Zimmerman wrote:
> > > > > From: Alan Stern [mailto:st...@rowland.harvard.edu]
> > > > > Sent: Thursday, September 25, 2014 11:08 AM
> > > > >
> > > > > On Thu, 25 Sep 2014, Paul Zimmerman wrote:
> > > > >
> > > > > > That's why I don't understand how this can happen for IN. AFAIK, a 
> > > > > > STALL
> > > > > > is only sent in response to something the host sent in the CBW. At 
> > > > > > that
> > > > > > point, there shouldn't be any IN transfers active.
> > > > >
> > > > > The gadget may send a partial response to the CBW.  The end of the
> > > > > response is marked with a STALL.  The mass-storage gadget driver
> > > > > submits the partial response and then calls usb_ep_set_halt() without
> > > > > waiting for the IN data to be delivered.  It relies on the UDC driver
> > > > > returning -EAGAIN if any data is still pending.
> > > >
> > > > I guess you're referring to the code under the DATA_DIR_TO_HOST case
> > > > in finish_reply().
> > > >
> > > > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge()
> > > > functions check the request queue for the endpoint, and if it is not
> > > > empty, they return -EAGAIN.
> > > 
> > > is that really enough ? The request is deleted (rather, moved to another
> > > list) as soon as StartTransfer is issued. I can certainly check both
> > > lists (and already do) but I fear that we might still race with the HW
> > > because there's no documentation to guarantee that I won't :-)
> > > 
> > > > I see your patch for the dwc3 driver does add that, in addition to the
> > > > FIFO empty check. Does it still work OK if you remove the FIFO empty
> > > > check portion?
> > > 
> > > It probably does, but I can't be sure there won't be a corner case where
> > > the list is empty (Xfercomplete was issued and request given back) but
> > > host hasn't moved all the data. Synopsys databook doesn't guarantee that
> > > will never be the case.
> > > 
> > > Can you confirm, without a shadow of a doubt, that XferComplete will
> > > only be issued after host has moved every single bit of data ? If that
> > > can be updated to the databook, then sure, I can remove the FIFO space
> > > check; otherwise we might leave a corner case that will take us a few
> > > days to debug again because it will be very difficult to trigger :-)
> > 
> > From the 2.70a databook, section 8.2.2:
> > 
> > "While processing TRBs, two conditions may cause the core to write out an
> >  event and raise an interrupt line:
> > - TRB Complete:
> > - For OUT endpoints, a packet is received which reduces the
> >   remaining byte count in the TRB buffer to zero.
> > - For IN endpoints, an acknowledgement is received for a
> >   transmitted packet which reduces the remaining byte count
> >   in the TRB buffer to zero.
> > - Short Packet Received:
> > - For OUT endpoints only. While writing to a TRB buffer,
> >   the endpoint receives a packet that is smaller than the
> >   endpoint's MaxPacketSize.
> > "
> > 
> > So for an IN, the completion event doesn't come until the TRB has been
> > ACKed, which means all the data has been sent.
> 
> a very good point indeed. I completely missed that. Well, the patch can
> become a lot smaller, which makes my life easier when backporting :-)
> 
> I'll test that out tomorrow, for sure :-)
> 
> > But, I'm not saying you *have* to remove the FIFO space check. I think
> > we now know the problem was caused by the missing -EAGAIN. So having
> 
> sure, that's certainly what was missing.
> 
> > the FIFO check there shouldn't hurt anything, it's not masking some
> 
> it's also completely useless. A few register accesses for no reason :-)
> 
> > other problem. And you may need it in the future for one of those
> > other cases the databook talks about.
> 
> I'll keep that in the back of my head. Perhaps I can repurpose these
> FIFO space accessors to implement usb_ep_fifo_status(). I'll consider
> that.

alright, removed all that and it still works fine. Even passes Halt
Endpoint test on usb20cv.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-25 Thread Felipe Balbi
Hi,

On Thu, Sep 25, 2014 at 09:57:59PM +, Paul Zimmerman wrote:
> > From: Felipe Balbi [mailto:ba...@ti.com]
> > 
> > On Thu, Sep 25, 2014 at 07:08:12PM +, Paul Zimmerman wrote:
> > > > From: Alan Stern [mailto:st...@rowland.harvard.edu]
> > > > Sent: Thursday, September 25, 2014 11:08 AM
> > > >
> > > > On Thu, 25 Sep 2014, Paul Zimmerman wrote:
> > > >
> > > > > That's why I don't understand how this can happen for IN. AFAIK, a 
> > > > > STALL
> > > > > is only sent in response to something the host sent in the CBW. At 
> > > > > that
> > > > > point, there shouldn't be any IN transfers active.
> > > >
> > > > The gadget may send a partial response to the CBW.  The end of the
> > > > response is marked with a STALL.  The mass-storage gadget driver
> > > > submits the partial response and then calls usb_ep_set_halt() without
> > > > waiting for the IN data to be delivered.  It relies on the UDC driver
> > > > returning -EAGAIN if any data is still pending.
> > >
> > > I guess you're referring to the code under the DATA_DIR_TO_HOST case
> > > in finish_reply().
> > >
> > > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge()
> > > functions check the request queue for the endpoint, and if it is not
> > > empty, they return -EAGAIN.
> > 
> > is that really enough ? The request is deleted (rather, moved to another
> > list) as soon as StartTransfer is issued. I can certainly check both
> > lists (and already do) but I fear that we might still race with the HW
> > because there's no documentation to guarantee that I won't :-)
> > 
> > > I see your patch for the dwc3 driver does add that, in addition to the
> > > FIFO empty check. Does it still work OK if you remove the FIFO empty
> > > check portion?
> > 
> > It probably does, but I can't be sure there won't be a corner case where
> > the list is empty (Xfercomplete was issued and request given back) but
> > host hasn't moved all the data. Synopsys databook doesn't guarantee that
> > will never be the case.
> > 
> > Can you confirm, without a shadow of a doubt, that XferComplete will
> > only be issued after host has moved every single bit of data ? If that
> > can be updated to the databook, then sure, I can remove the FIFO space
> > check; otherwise we might leave a corner case that will take us a few
> > days to debug again because it will be very difficult to trigger :-)
> 
> From the 2.70a databook, section 8.2.2:
> 
> "While processing TRBs, two conditions may cause the core to write out an
>  event and raise an interrupt line:
>   - TRB Complete:
>   - For OUT endpoints, a packet is received which reduces the
> remaining byte count in the TRB buffer to zero.
>   - For IN endpoints, an acknowledgement is received for a
> transmitted packet which reduces the remaining byte count
> in the TRB buffer to zero.
>   - Short Packet Received:
>   - For OUT endpoints only. While writing to a TRB buffer,
> the endpoint receives a packet that is smaller than the
> endpoint's MaxPacketSize.
> "
> 
> So for an IN, the completion event doesn't come until the TRB has been
> ACKed, which means all the data has been sent.

a very good point indeed. I completely missed that. Well, the patch can
become a lot smaller, which makes my life easier when backporting :-)

I'll test that out tomorrow, for sure :-)

> But, I'm not saying you *have* to remove the FIFO space check. I think
> we now know the problem was caused by the missing -EAGAIN. So having

sure, that's certainly what was missing.

> the FIFO check there shouldn't hurt anything, it's not masking some

it's also completely useless. A few register accesses for no reason :-)

> other problem. And you may need it in the future for one of those
> other cases the databook talks about.

I'll keep that in the back of my head. Perhaps I can repurpose these
FIFO space accessors to implement usb_ep_fifo_status(). I'll consider
that.

cheers

-- 
balbi


signature.asc
Description: Digital signature


RE: g_mass_storage bug ?

2014-09-25 Thread Paul Zimmerman
> From the 2.70a databook, section 8.2.2:

Sorry, it's databook section 8.2.3, not 8.2.2.

-- 
Paul

> From: Paul Zimmerman
> Sent: Thursday, September 25, 2014 2:58 PM
> 
> > From: Felipe Balbi [mailto:ba...@ti.com]
> >
> > On Thu, Sep 25, 2014 at 07:08:12PM +, Paul Zimmerman wrote:
> > > > From: Alan Stern [mailto:st...@rowland.harvard.edu]
> > > > Sent: Thursday, September 25, 2014 11:08 AM
> > > >
> > > > On Thu, 25 Sep 2014, Paul Zimmerman wrote:
> > > >
> > > > > That's why I don't understand how this can happen for IN. AFAIK, a 
> > > > > STALL
> > > > > is only sent in response to something the host sent in the CBW. At 
> > > > > that
> > > > > point, there shouldn't be any IN transfers active.
> > > >
> > > > The gadget may send a partial response to the CBW.  The end of the
> > > > response is marked with a STALL.  The mass-storage gadget driver
> > > > submits the partial response and then calls usb_ep_set_halt() without
> > > > waiting for the IN data to be delivered.  It relies on the UDC driver
> > > > returning -EAGAIN if any data is still pending.
> > >
> > > I guess you're referring to the code under the DATA_DIR_TO_HOST case
> > > in finish_reply().
> > >
> > > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge()
> > > functions check the request queue for the endpoint, and if it is not
> > > empty, they return -EAGAIN.
> >
> > is that really enough ? The request is deleted (rather, moved to another
> > list) as soon as StartTransfer is issued. I can certainly check both
> > lists (and already do) but I fear that we might still race with the HW
> > because there's no documentation to guarantee that I won't :-)
> >
> > > I see your patch for the dwc3 driver does add that, in addition to the
> > > FIFO empty check. Does it still work OK if you remove the FIFO empty
> > > check portion?
> >
> > It probably does, but I can't be sure there won't be a corner case where
> > the list is empty (Xfercomplete was issued and request given back) but
> > host hasn't moved all the data. Synopsys databook doesn't guarantee that
> > will never be the case.
> >
> > Can you confirm, without a shadow of a doubt, that XferComplete will
> > only be issued after host has moved every single bit of data ? If that
> > can be updated to the databook, then sure, I can remove the FIFO space
> > check; otherwise we might leave a corner case that will take us a few
> > days to debug again because it will be very difficult to trigger :-)
> 
> From the 2.70a databook, section 8.2.2:
> 
> "While processing TRBs, two conditions may cause the core to write out an
>  event and raise an interrupt line:
>   - TRB Complete:
>   - For OUT endpoints, a packet is received which reduces the
> remaining byte count in the TRB buffer to zero.
>   - For IN endpoints, an acknowledgement is received for a
> transmitted packet which reduces the remaining byte count
> in the TRB buffer to zero.
>   - Short Packet Received:
>   - For OUT endpoints only. While writing to a TRB buffer,
> the endpoint receives a packet that is smaller than the
> endpoint's MaxPacketSize.
> "
> 
> So for an IN, the completion event doesn't come until the TRB has been
> ACKed, which means all the data has been sent.
> 
> But, I'm not saying you *have* to remove the FIFO space check. I think
> we now know the problem was caused by the missing -EAGAIN. So having
> the FIFO check there shouldn't hurt anything, it's not masking some
> other problem. And you may need it in the future for one of those
> other cases the databook talks about.
> 
> --
> Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: g_mass_storage bug ?

2014-09-25 Thread Paul Zimmerman
> From: Felipe Balbi [mailto:ba...@ti.com]
> 
> On Thu, Sep 25, 2014 at 07:08:12PM +, Paul Zimmerman wrote:
> > > From: Alan Stern [mailto:st...@rowland.harvard.edu]
> > > Sent: Thursday, September 25, 2014 11:08 AM
> > >
> > > On Thu, 25 Sep 2014, Paul Zimmerman wrote:
> > >
> > > > That's why I don't understand how this can happen for IN. AFAIK, a STALL
> > > > is only sent in response to something the host sent in the CBW. At that
> > > > point, there shouldn't be any IN transfers active.
> > >
> > > The gadget may send a partial response to the CBW.  The end of the
> > > response is marked with a STALL.  The mass-storage gadget driver
> > > submits the partial response and then calls usb_ep_set_halt() without
> > > waiting for the IN data to be delivered.  It relies on the UDC driver
> > > returning -EAGAIN if any data is still pending.
> >
> > I guess you're referring to the code under the DATA_DIR_TO_HOST case
> > in finish_reply().
> >
> > Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge()
> > functions check the request queue for the endpoint, and if it is not
> > empty, they return -EAGAIN.
> 
> is that really enough ? The request is deleted (rather, moved to another
> list) as soon as StartTransfer is issued. I can certainly check both
> lists (and already do) but I fear that we might still race with the HW
> because there's no documentation to guarantee that I won't :-)
> 
> > I see your patch for the dwc3 driver does add that, in addition to the
> > FIFO empty check. Does it still work OK if you remove the FIFO empty
> > check portion?
> 
> It probably does, but I can't be sure there won't be a corner case where
> the list is empty (Xfercomplete was issued and request given back) but
> host hasn't moved all the data. Synopsys databook doesn't guarantee that
> will never be the case.
> 
> Can you confirm, without a shadow of a doubt, that XferComplete will
> only be issued after host has moved every single bit of data ? If that
> can be updated to the databook, then sure, I can remove the FIFO space
> check; otherwise we might leave a corner case that will take us a few
> days to debug again because it will be very difficult to trigger :-)

>From the 2.70a databook, section 8.2.2:

"While processing TRBs, two conditions may cause the core to write out an
 event and raise an interrupt line:
- TRB Complete:
- For OUT endpoints, a packet is received which reduces the
  remaining byte count in the TRB buffer to zero.
- For IN endpoints, an acknowledgement is received for a
  transmitted packet which reduces the remaining byte count
  in the TRB buffer to zero.
- Short Packet Received:
- For OUT endpoints only. While writing to a TRB buffer,
  the endpoint receives a packet that is smaller than the
  endpoint's MaxPacketSize.
"

So for an IN, the completion event doesn't come until the TRB has been
ACKed, which means all the data has been sent.

But, I'm not saying you *have* to remove the FIFO space check. I think
we now know the problem was caused by the missing -EAGAIN. So having
the FIFO check there shouldn't hurt anything, it's not masking some
other problem. And you may need it in the future for one of those
other cases the databook talks about.

-- 
Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-25 Thread Felipe Balbi
Hi,

On Thu, Sep 25, 2014 at 07:08:12PM +, Paul Zimmerman wrote:
> > From: Alan Stern [mailto:st...@rowland.harvard.edu]
> > Sent: Thursday, September 25, 2014 11:08 AM
> > 
> > On Thu, 25 Sep 2014, Paul Zimmerman wrote:
> > 
> > > That's why I don't understand how this can happen for IN. AFAIK, a STALL
> > > is only sent in response to something the host sent in the CBW. At that
> > > point, there shouldn't be any IN transfers active.
> > 
> > The gadget may send a partial response to the CBW.  The end of the
> > response is marked with a STALL.  The mass-storage gadget driver
> > submits the partial response and then calls usb_ep_set_halt() without
> > waiting for the IN data to be delivered.  It relies on the UDC driver
> > returning -EAGAIN if any data is still pending.
> 
> I guess you're referring to the code under the DATA_DIR_TO_HOST case
> in finish_reply().
> 
> Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge()
> functions check the request queue for the endpoint, and if it is not
> empty, they return -EAGAIN.

is that really enough ? The request is deleted (rather, moved to another
list) as soon as StartTransfer is issued. I can certainly check both
lists (and already do) but I fear that we might still race with the HW
because there's no documentation to guarantee that I won't :-)

> I see your patch for the dwc3 driver does add that, in addition to the
> FIFO empty check. Does it still work OK if you remove the FIFO empty
> check portion?

It probably does, but I can't be sure there won't be a corner case where
the list is empty (Xfercomplete was issued and request given back) but
host hasn't moved all the data. Synopsys databook doesn't guarantee that
will never be the case.

Can you confirm, without a shadow of a doubt, that XferComplete will
only be issued after host has moved every single bit of data ? If that
can be updated to the databook, then sure, I can remove the FIFO space
check; otherwise we might leave a corner case that will take us a few
days to debug again because it will be very difficult to trigger :-)

cheers

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-25 Thread Felipe Balbi
Hi,

On Thu, Sep 25, 2014 at 05:50:59PM +, Paul Zimmerman wrote:
> > From: Felipe Balbi [mailto:ba...@ti.com]
> > Sent: Wednesday, September 24, 2014 7:41 PM
> > 
> > On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > > > Then, after the host cleared the halt, the gadget apparently sent 
> > > > > > the
> > > > > > data that _should_ have been sent previously.  The host was 
> > > > > > expecting
> > > > > > to receive the CSW at this point, so there was an overflow error.
> > > > > > That's what caused the host to perform a reset.
> > > > > >
> > > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > > >
> > > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if 
> > > > > > any
> > > > > >  * transfer requests are still queued, or if the controller hardware
> > > > > >  * (usually a FIFO) still holds bytes that the host hasn't 
> > > > > > collected.
> > > > >
> > > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > >
> > > > alright fixed. Below you can see a combined diff which I'll still split
> > > > into patches so they can be applied properly.
> > >
> > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> > >
> > > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> > 
> > you need to make sure that both there are no pending transfers and FIFO
> > is empty in case of IN endpoints. Databook itself says (sorry, forgot
> > the section and no access to the docs right now) that to figure out if
> > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> > better ideas ?
> 
> I guess I'm just afraid this may just be papering over the real cause.

Well, the API documentation, as pointed out by Alan, calls for:

380  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
381  * transfer requests are still queued, or if the controller hardware
382  * (usually a FIFO) still holds bytes that the host hasn't collected.

There's no other way to ensuring FIFO is empty with this core.

> > > fine with the mass-storage gadget (with stall=1) without doing anything
> > > like that.
> > >
> > > When did the dwc3 driver start showing this problem? Wouldn't it be
> > > better to find the change which caused this? Or has dwc3 always had
> > 
> > it's probably been like that since ever :-) I just figured that my
> > scripts always had stall=0 and ended up never testing the other way.
> > 
> > > this problem, but you never tested with stall=1 before so didn't see
> > > it?
> > 
> > yup. Note that it's not enough to checked for TRB completion because
> > there could still be data in the FIFO which the host hasn't moved yet,
> > unless it's 100% guaranteed that the core won't fire XferComplete until
> > every single bit of data has been moved by the host.
> 
> That is supposed to be 100% guaranteed. The IN XferInProgress and
> XferComplete events are only delivered after the host has ACKed the
> packet.
> 
> > But the way things are, I'd expect core to be firing transfer complete
> > as soon as all data has reached the FIFO (in case of TX).
> 
> Nope.
> 
> That's why I don't understand how this can happen for IN. AFAIK, a STALL
> is only sent in response to something the host sent in the CBW. At that
> point, there shouldn't be any IN transfers active.

you could race with the HW, right ? You just issued Start Transfer for
that IN transfer and control returns to the gadget driver which can, at
that very moment, issue a Set Stall. SW *must* make sure that there's
nothing pending yet. We haven't received XferComplete for that IN we
just started.

> BTW, about a year ago, I fixed a similar issue in our vendor driver.
> But I don't remember what the cause was, I will search the Git log to
> see if I can find the commit that fixed it.

ok, thanks.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-25 Thread Felipe Balbi
On Thu, Sep 25, 2014 at 07:43:35PM +, Paul Zimmerman wrote:
> > From: Paul Zimmerman
> > 
> > > From: Felipe Balbi [mailto:ba...@ti.com]
> > >
> > > On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote:
> > > > On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > > > > > Then, after the host cleared the halt, the gadget apparently 
> > > > > > > > sent the
> > > > > > > > data that _should_ have been sent previously.  The host was 
> > > > > > > > expecting
> > > > > > > > to receive the CSW at this point, so there was an overflow 
> > > > > > > > error.
> > > > > > > > That's what caused the host to perform a reset.
> > > > > > > >
> > > > > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > > > > >
> > > > > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) 
> > > > > > > > if any
> > > > > > > >  * transfer requests are still queued, or if the controller 
> > > > > > > > hardware
> > > > > > > >  * (usually a FIFO) still holds bytes that the host hasn't 
> > > > > > > > collected.
> > > > > > >
> > > > > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > > > >
> > > > > > alright fixed. Below you can see a combined diff which I'll still 
> > > > > > split
> > > > > > into patches so they can be applied properly.
> > > > >
> > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> > > > >
> > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> > > >
> > > > you need to make sure that both there are no pending transfers and FIFO
> > > > is empty in case of IN endpoints. Databook itself says (sorry, forgot
> > > > the section and no access to the docs right now) that to figure out if
> > > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> > > > better ideas ?
> > >
> > > oh, and btw... I only noticed the failure with Linux. I mentioned that
> > > it, though flakey during start, worked very stably against Mac OS X.
> > > Perhaps Windows is also more forgiving ?
> > 
> > Yeah, I see you said it only fails with 3.17-rc on an embedded platform
> > you have. Can you say which platform that is? Do you know whose xHCI
> > controller it uses?
> > 
> > > Can you share a little more details on what you guys did to get it to
> > > work without making sure FIFO is empty ? I can't really understand how
> > > you'd cover all cases otherwise...
> > 
> > See my other email, I will try to find the fix for the similar issue
> > that we saw.
> 
> OK, it looks like that issue is not related to this one. We were
> accidentally setting the Stream Capable bit in the endpoint
> configuration for all bulk EPs, not just stream-enabled ones. That
> caused several strange behaviors, including the Set Stall command
> not working.

ok, this it not about Set Stall not working, however, it's about making
sure we don't stall at the wrong time :-)

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-25 Thread Felipe Balbi
On Thu, Sep 25, 2014 at 05:54:10PM +, Paul Zimmerman wrote:
> > From: Felipe Balbi [mailto:ba...@ti.com]
> > 
> > On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote:
> > > On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > > > > Then, after the host cleared the halt, the gadget apparently sent 
> > > > > > > the
> > > > > > > data that _should_ have been sent previously.  The host was 
> > > > > > > expecting
> > > > > > > to receive the CSW at this point, so there was an overflow error.
> > > > > > > That's what caused the host to perform a reset.
> > > > > > >
> > > > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > > > >
> > > > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if 
> > > > > > > any
> > > > > > >  * transfer requests are still queued, or if the controller 
> > > > > > > hardware
> > > > > > >  * (usually a FIFO) still holds bytes that the host hasn't 
> > > > > > > collected.
> > > > > >
> > > > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > > >
> > > > > alright fixed. Below you can see a combined diff which I'll still 
> > > > > split
> > > > > into patches so they can be applied properly.
> > > >
> > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> > > >
> > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> > >
> > > you need to make sure that both there are no pending transfers and FIFO
> > > is empty in case of IN endpoints. Databook itself says (sorry, forgot
> > > the section and no access to the docs right now) that to figure out if
> > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> > > better ideas ?
> > 
> > oh, and btw... I only noticed the failure with Linux. I mentioned that
> > it, though flakey during start, worked very stably against Mac OS X.
> > Perhaps Windows is also more forgiving ?
> 
> Yeah, I see you said it only fails with 3.17-rc on an embedded platform
> you have. Can you say which platform that is? Do you know whose xHCI

AM473x with DWC3 2.40a (2 instances of it).

> controller it uses?

yours :-)

> > Can you share a little more details on what you guys did to get it to
> > work without making sure FIFO is empty ? I can't really understand how
> > you'd cover all cases otherwise...
> 
> See my other email, I will try to find the fix for the similar issue
> that we saw.

alright, tks

-- 
balbi


signature.asc
Description: Digital signature


RE: g_mass_storage bug ?

2014-09-25 Thread Paul Zimmerman
> From: Paul Zimmerman
> 
> > From: Felipe Balbi [mailto:ba...@ti.com]
> >
> > On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote:
> > > On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > > > > Then, after the host cleared the halt, the gadget apparently sent 
> > > > > > > the
> > > > > > > data that _should_ have been sent previously.  The host was 
> > > > > > > expecting
> > > > > > > to receive the CSW at this point, so there was an overflow error.
> > > > > > > That's what caused the host to perform a reset.
> > > > > > >
> > > > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > > > >
> > > > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if 
> > > > > > > any
> > > > > > >  * transfer requests are still queued, or if the controller 
> > > > > > > hardware
> > > > > > >  * (usually a FIFO) still holds bytes that the host hasn't 
> > > > > > > collected.
> > > > > >
> > > > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > > >
> > > > > alright fixed. Below you can see a combined diff which I'll still 
> > > > > split
> > > > > into patches so they can be applied properly.
> > > >
> > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> > > >
> > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> > >
> > > you need to make sure that both there are no pending transfers and FIFO
> > > is empty in case of IN endpoints. Databook itself says (sorry, forgot
> > > the section and no access to the docs right now) that to figure out if
> > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> > > better ideas ?
> >
> > oh, and btw... I only noticed the failure with Linux. I mentioned that
> > it, though flakey during start, worked very stably against Mac OS X.
> > Perhaps Windows is also more forgiving ?
> 
> Yeah, I see you said it only fails with 3.17-rc on an embedded platform
> you have. Can you say which platform that is? Do you know whose xHCI
> controller it uses?
> 
> > Can you share a little more details on what you guys did to get it to
> > work without making sure FIFO is empty ? I can't really understand how
> > you'd cover all cases otherwise...
> 
> See my other email, I will try to find the fix for the similar issue
> that we saw.

OK, it looks like that issue is not related to this one. We were
accidentally setting the Stream Capable bit in the endpoint
configuration for all bulk EPs, not just stream-enabled ones. That
caused several strange behaviors, including the Set Stall command
not working.

-- 
Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: g_mass_storage bug ?

2014-09-25 Thread Paul Zimmerman
> From: Alan Stern [mailto:st...@rowland.harvard.edu]
> Sent: Thursday, September 25, 2014 11:08 AM
> 
> On Thu, 25 Sep 2014, Paul Zimmerman wrote:
> 
> > That's why I don't understand how this can happen for IN. AFAIK, a STALL
> > is only sent in response to something the host sent in the CBW. At that
> > point, there shouldn't be any IN transfers active.
> 
> The gadget may send a partial response to the CBW.  The end of the
> response is marked with a STALL.  The mass-storage gadget driver
> submits the partial response and then calls usb_ep_set_halt() without
> waiting for the IN data to be delivered.  It relies on the UDC driver
> returning -EAGAIN if any data is still pending.

I guess you're referring to the code under the DATA_DIR_TO_HOST case
in finish_reply().

Felipe, in our vendor driver, the ep_set_halt() and ep_set_wedge()
functions check the request queue for the endpoint, and if it is not
empty, they return -EAGAIN.

I see your patch for the dwc3 driver does add that, in addition to the
FIFO empty check. Does it still work OK if you remove the FIFO empty
check portion?

-- 
Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: g_mass_storage bug ?

2014-09-25 Thread Alan Stern
On Thu, 25 Sep 2014, Paul Zimmerman wrote:

> That's why I don't understand how this can happen for IN. AFAIK, a STALL
> is only sent in response to something the host sent in the CBW. At that
> point, there shouldn't be any IN transfers active.

The gadget may send a partial response to the CBW.  The end of the 
response is marked with a STALL.  The mass-storage gadget driver 
submits the partial response and then calls usb_ep_set_halt() without 
waiting for the IN data to be delivered.  It relies on the UDC driver 
returning -EAGAIN if any data is still pending.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: g_mass_storage bug ?

2014-09-25 Thread Paul Zimmerman
> From: Felipe Balbi [mailto:ba...@ti.com]
> 
> On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote:
> > On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > > > Then, after the host cleared the halt, the gadget apparently sent 
> > > > > > the
> > > > > > data that _should_ have been sent previously.  The host was 
> > > > > > expecting
> > > > > > to receive the CSW at this point, so there was an overflow error.
> > > > > > That's what caused the host to perform a reset.
> > > > > >
> > > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > > >
> > > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if 
> > > > > > any
> > > > > >  * transfer requests are still queued, or if the controller hardware
> > > > > >  * (usually a FIFO) still holds bytes that the host hasn't 
> > > > > > collected.
> > > > >
> > > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > >
> > > > alright fixed. Below you can see a combined diff which I'll still split
> > > > into patches so they can be applied properly.
> > >
> > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> > >
> > > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> >
> > you need to make sure that both there are no pending transfers and FIFO
> > is empty in case of IN endpoints. Databook itself says (sorry, forgot
> > the section and no access to the docs right now) that to figure out if
> > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> > better ideas ?
> 
> oh, and btw... I only noticed the failure with Linux. I mentioned that
> it, though flakey during start, worked very stably against Mac OS X.
> Perhaps Windows is also more forgiving ?

Yeah, I see you said it only fails with 3.17-rc on an embedded platform
you have. Can you say which platform that is? Do you know whose xHCI
controller it uses?

> Can you share a little more details on what you guys did to get it to
> work without making sure FIFO is empty ? I can't really understand how
> you'd cover all cases otherwise...

See my other email, I will try to find the fix for the similar issue
that we saw.

-- 
Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: g_mass_storage bug ?

2014-09-25 Thread Paul Zimmerman
> From: Felipe Balbi [mailto:ba...@ti.com]
> Sent: Wednesday, September 24, 2014 7:41 PM
> 
> On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > > Then, after the host cleared the halt, the gadget apparently sent the
> > > > > data that _should_ have been sent previously.  The host was expecting
> > > > > to receive the CSW at this point, so there was an overflow error.
> > > > > That's what caused the host to perform a reset.
> > > > >
> > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > >
> > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > > > >  * transfer requests are still queued, or if the controller hardware
> > > > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > > >
> > > > damn old bugs :-) I'll fix that up and Cc stable.
> > >
> > > alright fixed. Below you can see a combined diff which I'll still split
> > > into patches so they can be applied properly.
> >
> > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> >
> > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> 
> you need to make sure that both there are no pending transfers and FIFO
> is empty in case of IN endpoints. Databook itself says (sorry, forgot
> the section and no access to the docs right now) that to figure out if
> the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> better ideas ?

I guess I'm just afraid this may just be papering over the real cause.

> > fine with the mass-storage gadget (with stall=1) without doing anything
> > like that.
> >
> > When did the dwc3 driver start showing this problem? Wouldn't it be
> > better to find the change which caused this? Or has dwc3 always had
> 
> it's probably been like that since ever :-) I just figured that my
> scripts always had stall=0 and ended up never testing the other way.
> 
> > this problem, but you never tested with stall=1 before so didn't see
> > it?
> 
> yup. Note that it's not enough to checked for TRB completion because
> there could still be data in the FIFO which the host hasn't moved yet,
> unless it's 100% guaranteed that the core won't fire XferComplete until
> every single bit of data has been moved by the host.

That is supposed to be 100% guaranteed. The IN XferInProgress and
XferComplete events are only delivered after the host has ACKed the
packet.

> But the way things are, I'd expect core to be firing transfer complete
> as soon as all data has reached the FIFO (in case of TX).

Nope.

That's why I don't understand how this can happen for IN. AFAIK, a STALL
is only sent in response to something the host sent in the CBW. At that
point, there shouldn't be any IN transfers active.

BTW, about a year ago, I fixed a similar issue in our vendor driver.
But I don't remember what the cause was, I will search the Git log to
see if I can find the commit that fixed it.

-- 
Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi again,

On Wed, Sep 24, 2014 at 09:40:37PM -0500, Felipe Balbi wrote:
> On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > > Then, after the host cleared the halt, the gadget apparently sent the
> > > > > data that _should_ have been sent previously.  The host was expecting
> > > > > to receive the CSW at this point, so there was an overflow error.
> > > > > That's what caused the host to perform a reset.
> > > > >
> > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > >
> > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > > > >  * transfer requests are still queued, or if the controller hardware
> > > > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > > >
> > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > 
> > > alright fixed. Below you can see a combined diff which I'll still split
> > > into patches so they can be applied properly.
> > 
> > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> > 
> > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> 
> you need to make sure that both there are no pending transfers and FIFO
> is empty in case of IN endpoints. Databook itself says (sorry, forgot
> the section and no access to the docs right now) that to figure out if
> the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> better ideas ?

oh, and btw... I only noticed the failure with Linux. I mentioned that
it, though flakey during start, worked very stably against Mac OS X.
Perhaps Windows is also more forgiving ?

Can you share a little more details on what you guys did to get it to
work without making sure FIFO is empty ? I can't really understand how
you'd cover all cases otherwise...

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi,

On Thu, Sep 25, 2014 at 01:37:05AM +, Paul Zimmerman wrote:
> > > > Then, after the host cleared the halt, the gadget apparently sent the
> > > > data that _should_ have been sent previously.  The host was expecting
> > > > to receive the CSW at this point, so there was an overflow error.
> > > > That's what caused the host to perform a reset.
> > > >
> > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > According to the kerneldoc for usb_ep_set_halt:
> > > >
> > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > > >  * transfer requests are still queued, or if the controller hardware
> > > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > >
> > > damn old bugs :-) I'll fix that up and Cc stable.
> > 
> > alright fixed. Below you can see a combined diff which I'll still split
> > into patches so they can be applied properly.
> 
> [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> 
> Ick. That shouldn't be necessary. The Synopsys vendor driver works

you need to make sure that both there are no pending transfers and FIFO
is empty in case of IN endpoints. Databook itself says (sorry, forgot
the section and no access to the docs right now) that to figure out if
the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
better ideas ?

> fine with the mass-storage gadget (with stall=1) without doing anything
> like that.
> 
> When did the dwc3 driver start showing this problem? Wouldn't it be
> better to find the change which caused this? Or has dwc3 always had

it's probably been like that since ever :-) I just figured that my
scripts always had stall=0 and ended up never testing the other way.

> this problem, but you never tested with stall=1 before so didn't see
> it?

yup. Note that it's not enough to checked for TRB completion because
there could still be data in the FIFO which the host hasn't moved yet,
unless it's 100% guaranteed that the core won't fire XferComplete until
every single bit of data has been moved by the host.

But the way things are, I'd expect core to be firing transfer complete
as soon as all data has reached the FIFO (in case of TX).

cheers

-- 
balbi


signature.asc
Description: Digital signature


RE: g_mass_storage bug ?

2014-09-24 Thread Paul Zimmerman
> From: linux-usb-ow...@vger.kernel.org 
> [mailto:linux-usb-ow...@vger.kernel.org] On Behalf Of Felipe Balbi
> Sent: Wednesday, September 24, 2014 12:18 PM
> 
> On Wed, Sep 24, 2014 at 01:08:15PM -0500, Felipe Balbi wrote:
> > On Wed, Sep 24, 2014 at 01:53:31PM -0400, Alan Stern wrote:
> > > On Wed, 24 Sep 2014, Felipe Balbi wrote:
> > >
> > > > > I'll capture usbmon and send here shortly.
> > > >
> > > > here it is... Interesting part starts at line 73 (114 on this email)
> > > > where the data transport received EPIPE (due to Stall). This time
> > > > however, I was eventually able to talk to the device and managed to
> > > > issue quite a few writes to it.
> > >
> > > Here's where the unexpected stuff begins:
> > >
> > > > ed2541c0 1237463240 S Bo:003:01 -115 31 = 55534243 0600 c000 
> > > > 861a 003f00c0 
>  00
> > > > ed2541c0 1237463431 C Bo:003:01 0 31 >
> > > > ec1a8540 1237463873 S Bi:003:01 -115 192 <
> > > > ec1a8540 1237464053 C Bi:003:01 -32 0
> > > > ed2541c0 1237464158 S Co:003:00 s 02 01  0081  0
> > > > ed2541c0 1237464359 C Co:003:00 0 0
> > > > ed2541c0 1237468607 S Bi:003:01 -115 13 <
> > > > ed2541c0 1237468802 C Bi:003:01 -75 0
> > >
> > > This is the first MODE SENSE command.  The gadget should send as much
> > > data as it can before halting the bulk-IN endpoint.  Instead, the
> > > endpoint was halted first.
> > >
> > > Then, after the host cleared the halt, the gadget apparently sent the
> > > data that _should_ have been sent previously.  The host was expecting
> > > to receive the CSW at this point, so there was an overflow error.
> > > That's what caused the host to perform a reset.
> > >
> > > Evidently this UDC implements the set_halt method incorrectly.
> > > According to the kerneldoc for usb_ep_set_halt:
> > >
> > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > >  * transfer requests are still queued, or if the controller hardware
> > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> >
> > damn old bugs :-) I'll fix that up and Cc stable.
> 
> alright fixed. Below you can see a combined diff which I'll still split
> into patches so they can be applied properly.

[ snipped patch which grubs around in the GDBGFIFOSPACE registers ]

Ick. That shouldn't be necessary. The Synopsys vendor driver works
fine with the mass-storage gadget (with stall=1) without doing anything
like that.

When did the dwc3 driver start showing this problem? Wouldn't it be
better to find the change which caused this? Or has dwc3 always had
this problem, but you never tested with stall=1 before so didn't see
it?

-- 
Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 03:06:15PM -0500, Felipe Balbi wrote:
> On Wed, Sep 24, 2014 at 02:40:12PM -0500, Felipe Balbi wrote:
> > On Wed, Sep 24, 2014 at 03:31:13PM -0400, Alan Stern wrote:
> > > On Wed, 24 Sep 2014, Felipe Balbi wrote:
> > > 
> > > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > > > 
> > > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if 
> > > > > > any
> > > > > >  * transfer requests are still queued, or if the controller hardware
> > > > > >  * (usually a FIFO) still holds bytes that the host hasn't 
> > > > > > collected.
> > > > > 
> > > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > > 
> > > > alright fixed. Below you can see a combined diff which I'll still split
> > > > into patches so they can be applied properly.
> > > 
> > > And this eliminates the problems you saw with g_mass_storage?
> > 
> > yup, working with and without stall=0, with and without debugging on. On
> > all three systems I tested before ;-)
> 
> there is still one detail which I just caught and not sure if it's
> something we should care. When I run my msc.sh/msc.c tests [1], after
> each test I see a new "sdX: unknown partition table". This doesn't
> happen with my USB stick.

it happens with the USB stick after I destroy the partition table. So I
guess that's normal.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 02:40:12PM -0500, Felipe Balbi wrote:
> On Wed, Sep 24, 2014 at 03:31:13PM -0400, Alan Stern wrote:
> > On Wed, 24 Sep 2014, Felipe Balbi wrote:
> > 
> > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > > 
> > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > > > >  * transfer requests are still queued, or if the controller hardware
> > > > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > > > 
> > > > damn old bugs :-) I'll fix that up and Cc stable.
> > > 
> > > alright fixed. Below you can see a combined diff which I'll still split
> > > into patches so they can be applied properly.
> > 
> > And this eliminates the problems you saw with g_mass_storage?
> 
> yup, working with and without stall=0, with and without debugging on. On
> all three systems I tested before ;-)

there is still one detail which I just caught and not sure if it's
something we should care. When I run my msc.sh/msc.c tests [1], after
each test I see a new "sdX: unknown partition table". This doesn't
happen with my USB stick.

I'll fire up my sniffer again and see if I find anything peculiar.

[1] 
https://gitorious.org/usb/usb-tools/source/7eb7ef21de6cd124e0e0d0e7df9ddfff0e2f548e:msc.sh

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 03:48:29PM -0400, Alan Stern wrote:
> On Wed, 24 Sep 2014, Felipe Balbi wrote:
> 
> > > alright fixed. Below you can see a combined diff which I'll still split
> > > into patches so they can be applied properly.
> > 
> > OTOH, there's really no way to split it. It's all needed to fix a single
> > bug. Do you want me to add Reported-by: Alan Stern ?
> 
> What we really need is a "Diagnosed-by:" tag.  :-)

heh, that's right.

> I'll settle for Suggested-by:.

alright, I'll add that :-)

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Alan Stern
On Wed, 24 Sep 2014, Felipe Balbi wrote:

> > alright fixed. Below you can see a combined diff which I'll still split
> > into patches so they can be applied properly.
> 
> OTOH, there's really no way to split it. It's all needed to fix a single
> bug. Do you want me to add Reported-by: Alan Stern ?

What we really need is a "Diagnosed-by:" tag.  :-)

I'll settle for Suggested-by:.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 03:31:13PM -0400, Alan Stern wrote:
> On Wed, 24 Sep 2014, Felipe Balbi wrote:
> 
> > > > According to the kerneldoc for usb_ep_set_halt:
> > > > 
> > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > > >  * transfer requests are still queued, or if the controller hardware
> > > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > > 
> > > damn old bugs :-) I'll fix that up and Cc stable.
> > 
> > alright fixed. Below you can see a combined diff which I'll still split
> > into patches so they can be applied properly.
> 
> And this eliminates the problems you saw with g_mass_storage?

yup, working with and without stall=0, with and without debugging on. On
all three systems I tested before ;-)

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Alan Stern
On Wed, 24 Sep 2014, Felipe Balbi wrote:

> > > According to the kerneldoc for usb_ep_set_halt:
> > > 
> > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > >  * transfer requests are still queued, or if the controller hardware
> > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > 
> > damn old bugs :-) I'll fix that up and Cc stable.
> 
> alright fixed. Below you can see a combined diff which I'll still split
> into patches so they can be applied properly.

And this eliminates the problems you saw with g_mass_storage?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi,

On Wed, Sep 24, 2014 at 02:18:14PM -0500, Felipe Balbi wrote:
> On Wed, Sep 24, 2014 at 01:08:15PM -0500, Felipe Balbi wrote:
> > On Wed, Sep 24, 2014 at 01:53:31PM -0400, Alan Stern wrote:
> > > On Wed, 24 Sep 2014, Felipe Balbi wrote:
> > > 
> > > > > I'll capture usbmon and send here shortly.
> > > > 
> > > > here it is... Interesting part starts at line 73 (114 on this email)
> > > > where the data transport received EPIPE (due to Stall). This time
> > > > however, I was eventually able to talk to the device and managed to
> > > > issue quite a few writes to it.
> > > 
> > > Here's where the unexpected stuff begins:
> > > 
> > > > ed2541c0 1237463240 S Bo:003:01 -115 31 = 55534243 0600 c000 
> > > > 861a 003f00c0   00
> > > > ed2541c0 1237463431 C Bo:003:01 0 31 >
> > > > ec1a8540 1237463873 S Bi:003:01 -115 192 <
> > > > ec1a8540 1237464053 C Bi:003:01 -32 0
> > > > ed2541c0 1237464158 S Co:003:00 s 02 01  0081  0
> > > > ed2541c0 1237464359 C Co:003:00 0 0
> > > > ed2541c0 1237468607 S Bi:003:01 -115 13 <
> > > > ed2541c0 1237468802 C Bi:003:01 -75 0
> > > 
> > > This is the first MODE SENSE command.  The gadget should send as much
> > > data as it can before halting the bulk-IN endpoint.  Instead, the
> > > endpoint was halted first.
> > > 
> > > Then, after the host cleared the halt, the gadget apparently sent the
> > > data that _should_ have been sent previously.  The host was expecting
> > > to receive the CSW at this point, so there was an overflow error.
> > > That's what caused the host to perform a reset.
> > > 
> > > Evidently this UDC implements the set_halt method incorrectly.  
> > > According to the kerneldoc for usb_ep_set_halt:
> > > 
> > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > >  * transfer requests are still queued, or if the controller hardware
> > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > 
> > damn old bugs :-) I'll fix that up and Cc stable.
> 
> alright fixed. Below you can see a combined diff which I'll still split
> into patches so they can be applied properly.

OTOH, there's really no way to split it. It's all needed to fix a single
bug. Do you want me to add Reported-by: Alan Stern ?

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi,

On Wed, Sep 24, 2014 at 01:08:15PM -0500, Felipe Balbi wrote:
> On Wed, Sep 24, 2014 at 01:53:31PM -0400, Alan Stern wrote:
> > On Wed, 24 Sep 2014, Felipe Balbi wrote:
> > 
> > > > I'll capture usbmon and send here shortly.
> > > 
> > > here it is... Interesting part starts at line 73 (114 on this email)
> > > where the data transport received EPIPE (due to Stall). This time
> > > however, I was eventually able to talk to the device and managed to
> > > issue quite a few writes to it.
> > 
> > Here's where the unexpected stuff begins:
> > 
> > > ed2541c0 1237463240 S Bo:003:01 -115 31 = 55534243 0600 c000 
> > > 861a 003f00c0   00
> > > ed2541c0 1237463431 C Bo:003:01 0 31 >
> > > ec1a8540 1237463873 S Bi:003:01 -115 192 <
> > > ec1a8540 1237464053 C Bi:003:01 -32 0
> > > ed2541c0 1237464158 S Co:003:00 s 02 01  0081  0
> > > ed2541c0 1237464359 C Co:003:00 0 0
> > > ed2541c0 1237468607 S Bi:003:01 -115 13 <
> > > ed2541c0 1237468802 C Bi:003:01 -75 0
> > 
> > This is the first MODE SENSE command.  The gadget should send as much
> > data as it can before halting the bulk-IN endpoint.  Instead, the
> > endpoint was halted first.
> > 
> > Then, after the host cleared the halt, the gadget apparently sent the
> > data that _should_ have been sent previously.  The host was expecting
> > to receive the CSW at this point, so there was an overflow error.
> > That's what caused the host to perform a reset.
> > 
> > Evidently this UDC implements the set_halt method incorrectly.  
> > According to the kerneldoc for usb_ep_set_halt:
> > 
> >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> >  * transfer requests are still queued, or if the controller hardware
> >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> 
> damn old bugs :-) I'll fix that up and Cc stable.

alright fixed. Below you can see a combined diff which I'll still split
into patches so they can be applied properly.

diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 66f6256..834f524 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -210,6 +210,14 @@
 #define DWC3_GHWPARAMS4_HIBER_SCRATCHBUFS(n)   (((n) & (0x0f << 13)) >> 13)
 #define DWC3_MAX_HIBER_SCRATCHBUFS 15
 
+/* Global FIFO Space Register */
+#define DWC3_GDBGFIFOSPACE_TXFIFO  (0 << 5)
+#define DWC3_GDBGFIFOSPACE_RXFIFO  (1 << 5)
+#define DWC3_GDBGFIFOSPACE_TXREQ_Q (2 << 5)
+#define DWC3_GDBGFIFOSPACE_RXREQ_Q (3 << 5)
+
+#define DWC3_GDBGFIFOSPACE_SPACE_AVAIL(num)(((num) & 0x) >> 16)
+
 /* Device Configuration Register */
 #define DWC3_DCFG_DEVADDR(addr)((addr) << 3)
 #define DWC3_DCFG_DEVADDR_MASK DWC3_DCFG_DEVADDR(0x7f)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index de53361..5e89913 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -145,6 +145,75 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum 
dwc3_link_state state)
 }
 
 /**
+ * dwc3_gadget_ep_fifo_space - returns currently available space on FIFO
+ * @dwc: pointer to our context struct
+ * @dep: the endpoint to fetch FIFO space
+ *
+ * This function will return the currently available FIFO space
+ */
+static int dwc3_gadget_ep_fifo_space(struct dwc3 *dwc, struct dwc3_ep *dep)
+{
+   u32 current_fifo;
+   u32 reg;
+
+   if (dep->direction)
+   reg = DWC3_GDBGFIFOSPACE_TXFIFO;
+   else
+   reg = DWC3_GDBGFIFOSPACE_RXFIFO;
+
+   /* remove direction bit */
+   reg |= dep->number >> 1;
+
+   dwc3_writel(dwc->regs, DWC3_GDBGFIFOSPACE, reg);
+   reg = dwc3_readl(dwc->regs, DWC3_GDBGFIFOSPACE);
+   current_fifo = DWC3_GDBGFIFOSPACE_SPACE_AVAIL(reg);
+
+   return current_fifo;
+}
+
+/**
+ * dwc3_gadget_ep_fifo_size - returns allocated FIFO size
+ * @dwc: pointer to our context struct
+ * @dep: TX endpoint to fetch allocated FIFO size
+ *
+ * This function will read the correct TX FIFO register and
+ * return the FIFO size
+ */
+static int dwc3_gadget_ep_fifo_size(struct dwc3 *dwc, struct dwc3_ep *dep)
+{
+   if (WARN_ON(!dep->direction))
+   return 0;
+
+   return dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(dep->number)) & 0x;
+}
+
+/**
+ * dwc3_gadget_ep_fifo_empty - returns true when FIFO is empty
+ * @dwc: pointer to our context struct
+ * @dep: endpoint to request FIFO space
+ *
+ * This function will return a TRUE when FIFO is empty and FALSE
+ * otherwise.
+ */
+static int dwc3_gadget_ep_fifo_empty(struct dwc3 *dwc, struct dwc3_ep *dep)
+{
+   u32 allocated_fifo;
+   u32 current_fifo;
+
+   /* should not be called for RX endpoints */
+   if (WARN_ON(!dep->direction))
+   return false;
+
+   current_fifo = dwc3_gadget_ep_fifo_space(dwc, dep);
+   allocated_fifo = dwc3_gadget_ep_fifo_size(dwc, dep);
+
+   dev_vdbg(dwc->dev, "%s: FIFO space %u/%u\n", dep->name,
+ 

Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 01:53:31PM -0400, Alan Stern wrote:
> On Wed, 24 Sep 2014, Felipe Balbi wrote:
> 
> > > I'll capture usbmon and send here shortly.
> > 
> > here it is... Interesting part starts at line 73 (114 on this email)
> > where the data transport received EPIPE (due to Stall). This time
> > however, I was eventually able to talk to the device and managed to
> > issue quite a few writes to it.
> 
> Here's where the unexpected stuff begins:
> 
> > ed2541c0 1237463240 S Bo:003:01 -115 31 = 55534243 0600 c000 
> > 861a 003f00c0   00
> > ed2541c0 1237463431 C Bo:003:01 0 31 >
> > ec1a8540 1237463873 S Bi:003:01 -115 192 <
> > ec1a8540 1237464053 C Bi:003:01 -32 0
> > ed2541c0 1237464158 S Co:003:00 s 02 01  0081  0
> > ed2541c0 1237464359 C Co:003:00 0 0
> > ed2541c0 1237468607 S Bi:003:01 -115 13 <
> > ed2541c0 1237468802 C Bi:003:01 -75 0
> 
> This is the first MODE SENSE command.  The gadget should send as much
> data as it can before halting the bulk-IN endpoint.  Instead, the
> endpoint was halted first.
> 
> Then, after the host cleared the halt, the gadget apparently sent the
> data that _should_ have been sent previously.  The host was expecting
> to receive the CSW at this point, so there was an overflow error.
> That's what caused the host to perform a reset.
> 
> Evidently this UDC implements the set_halt method incorrectly.  
> According to the kerneldoc for usb_ep_set_halt:
> 
>  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
>  * transfer requests are still queued, or if the controller hardware
>  * (usually a FIFO) still holds bytes that the host hasn't collected.

damn old bugs :-) I'll fix that up and Cc stable.

> This didn't happen; the endpoint was halted before the host collected 
> the pending data.
> 
> Incidentally, even though the URB completed with -EOVERFLOW status, we
> still should see the first 13 bytes of data (i.e., the portion that
> could fit into the data buffer).  But the actual_length value is 0.  I
> don't know if this is a quirk of the xHCI hardware or a bug in
> xhci-hcd.
> 
> > ec1a8540 1237469361 S Co:001:00 s 23 03 0004 0001  0
> > ec1a8540 1237471551 C Co:001:00 0 0
> > ed2209c0 1237534064 S Ci:001:00 s a3 00  0001 0004 4 <
> > ed2209c0 1237537012 C Ci:001:00 0 4 = 03051000
> > ed2209c0 1237594113 S Co:001:00 s 23 01 0014 0001  0
> > ed2209c0 1237595037 C Co:001:00 0 0
> > ed2209c0 1237595434 S Co:001:00 s 23 01 0001 0001  0
> > ed2209c0 1237597480 C Co:001:00 0 0
> 
> Immediately after resetting the port, the host disabled it.  No
> indication of why.
> 
> > ed2209c0 1237597823 S Co:001:00 s 23 03 0004 0001  0
> > ed2209c0 1237597890 C Co:001:00 0 0
> > ed2209c0 1237654005 S Ci:001:00 s a3 00  0001 0004 4 <
> > ed2209c0 1237654098 C Ci:001:00 0 4 = 03051000
> > ed2209c0 1237714084 S Co:001:00 s 23 01 0014 0001  0
> > ed2209c0 1237714151 C Co:001:00 0 0
> > ed2209c0 1237715894 S Co:001:00 s 23 01 0001 0001  0
> > ed2209c0 1237715985 C Co:001:00 0 0
> 
> Another reset followed by another port disable.
> 
> > ed2209c0 1237716244 S Co:001:00 s 23 03 0004 0001  0
> > ed2209c0 1237716308 C Co:001:00 0 0
> > ed2209c0 1237774094 S Ci:001:00 s a3 00  0001 0004 4 <
> > ed2209c0 1237775327 C Ci:001:00 0 4 = 03051000
> > ed2209c0 1237834107 S Co:001:00 s 23 01 0014 0001  0
> > ed2209c0 1237834183 C Co:001:00 0 0
> > ed2209c0 1237854094 S Ci:003:00 s 80 06 0100  0008 8 <
> > ed2209c0 1237854455 C Ci:003:00 0 8 = 12011002 0040
> > ed2209c0 1237854963 S Ci:003:00 s 80 06 0100  0012 18 <
> > ed2209c0 1237855219 C Ci:003:00 0 18 = 12011002 0040 2505a5a4 17030304 
> > 0001
> > ed2209c0 1237855544 S Ci:003:00 s 80 06 0f00  0005 5 <
> > ed2209c0 1237855771 C Ci:003:00 0 5 = 050f1600 02
> > ed2209c0 1237856062 S Ci:003:00 s 80 06 0f00  0016 22 <
> > ed2209c0 1237856265 C Ci:003:00 0 22 = 050f1600 02071002 0200 0a100300 
> > 0f000101 f401
> > ed2209c0 1237856548 S Ci:003:00 s 80 06 0200  0020 32 <
> > ed2209c0 1237858430 C Ci:003:00 0 32 = 09022000 010100c0 01090400 00020806 
> > 50010705 81020002 00070501 02000201
> > ed2200c0 1237860245 S Co:003:00 s 00 09 0001   0
> > ed2200c0 1237861785 C Co:003:00 0 0
> 
> Then a normal reset, like we should have seen originally.  I have no
> idea what was going on.  Maybe something involving warm vs. cold 
> resets?
> 
> > ed2541c0 1237875505 S Bo:003:01 -115 31 = 55534243 0700 c000 
> > 861a 003f00c0   00
> > ed2541c0 1237875778 C Bo:003:01 0 31 >
> > ed2200c0 1237876448 S Bi:003:01 -115 192 <
> > ed2200c0 1237876534 C Bi:003:01 -32 0
> > ed2541c0 1237876703 S Co:003:00 s 02 01  0081  0
> > ed2541c0 1237876883 C Co:003:00 0 0
> > ed2541c0 1237876987 S Bi:003:01 -115 13 <
> > ed2541c0 1237877114 C Bi:003:01 0 0
> > ed2541c0 1237877486 S Bi:003:01 -115 13 <
> > ed2541c0 1237877572 C Bi:003:01 0 13 = 55534253 0700 c000 01
> 
> Here the MODE SENSE

Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi,

On Wed, Sep 24, 2014 at 01:56:21PM -0400, Alan Stern wrote:
> On Wed, 24 Sep 2014, Felipe Balbi wrote:
> 
> > so here's sequence of events so far:
> > 
> > - Enumration goes fine
> > - Get Max Lun   -> 0 (single lun)
> > - Inquiry   -> Passed
> > - Test Unit Ready   -> Failed
> > - Request Sense (Unit Attention)-> Passed
> > - Test Unit Ready   -> Passed
> > - Mode Sense-> Stall of Data transport.
> > - Clear Endpoint Feature (HALT) EP1 IN
> > - After clear feature, a 16 bulk in completes. Shouldn't gadget
> >   driver have cancelled that ?
> 
> No.  The 16-byte transfer (which I presume was the response to the MODE
> SENSE) should have completed _before_ the halt feature was set.  The
> UDC driver is buggy.

ahaaa, now to figure out how to synchronize that.

> > - Bus reset
> > 
> > This remains for a few iterations. One thing is very interesting ...
> > 
> > [ snip ]
> > 
> > > ed2541c0 1239906485 S Bo:003:01 -115 31 = 55534243 1e00 1200 
> > > 8603 0012   00
> > > ed2541c0 1239906590 C Bo:003:01 0 31 >
> > > ec1a8740 1239906770 S Bi:003:01 -115 18 <
> > > ec1a8740 1239906871 C Bi:003:01 0 18 = 7600 000a  
> > > 2900 
> > > ed2541c0 1239906975 S Bi:003:01 -115 13 <
> > > ed2541c0 1239907026 C Bi:003:01 0 13 = 55534253 1e00  00
> > > ed2541c0 1239907803 S Bo:003:01 -115 31 = 55534243 1f00 0002 
> > > 8ca1 082e0001  ec00 00
> > 
> > 0xa1 ? What is this ? Looks like XHCI corrupted the packet ? I can see
> > the same SCSI opcode (0xa1) with my sniffer.
> 
> 0xa1 is an ATA pass-through command.

ok, thanks.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Alan Stern
On Wed, 24 Sep 2014, Felipe Balbi wrote:

> so here's sequence of events so far:
> 
> - Enumration goes fine
> - Get Max Lun -> 0 (single lun)
> - Inquiry -> Passed
> - Test Unit Ready -> Failed
> - Request Sense (Unit Attention)  -> Passed
> - Test Unit Ready -> Passed
> - Mode Sense  -> Stall of Data transport.
>   - Clear Endpoint Feature (HALT) EP1 IN
>   - After clear feature, a 16 bulk in completes. Shouldn't gadget
> driver have cancelled that ?

No.  The 16-byte transfer (which I presume was the response to the MODE
SENSE) should have completed _before_ the halt feature was set.  The
UDC driver is buggy.

> - Bus reset
> 
> This remains for a few iterations. One thing is very interesting ...
> 
> [ snip ]
> 
> > ed2541c0 1239906485 S Bo:003:01 -115 31 = 55534243 1e00 1200 
> > 8603 0012   00
> > ed2541c0 1239906590 C Bo:003:01 0 31 >
> > ec1a8740 1239906770 S Bi:003:01 -115 18 <
> > ec1a8740 1239906871 C Bi:003:01 0 18 = 7600 000a  2900 
> > 
> > ed2541c0 1239906975 S Bi:003:01 -115 13 <
> > ed2541c0 1239907026 C Bi:003:01 0 13 = 55534253 1e00  00
> > ed2541c0 1239907803 S Bo:003:01 -115 31 = 55534243 1f00 0002 
> > 8ca1 082e0001  ec00 00
> 
> 0xa1 ? What is this ? Looks like XHCI corrupted the packet ? I can see
> the same SCSI opcode (0xa1) with my sniffer.

0xa1 is an ATA pass-through command.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-24 Thread Alan Stern
On Wed, 24 Sep 2014, Felipe Balbi wrote:

> > I'll capture usbmon and send here shortly.
> 
> here it is... Interesting part starts at line 73 (114 on this email)
> where the data transport received EPIPE (due to Stall). This time
> however, I was eventually able to talk to the device and managed to
> issue quite a few writes to it.

Here's where the unexpected stuff begins:

> ed2541c0 1237463240 S Bo:003:01 -115 31 = 55534243 0600 c000 861a 
> 003f00c0   00
> ed2541c0 1237463431 C Bo:003:01 0 31 >
> ec1a8540 1237463873 S Bi:003:01 -115 192 <
> ec1a8540 1237464053 C Bi:003:01 -32 0
> ed2541c0 1237464158 S Co:003:00 s 02 01  0081  0
> ed2541c0 1237464359 C Co:003:00 0 0
> ed2541c0 1237468607 S Bi:003:01 -115 13 <
> ed2541c0 1237468802 C Bi:003:01 -75 0

This is the first MODE SENSE command.  The gadget should send as much
data as it can before halting the bulk-IN endpoint.  Instead, the
endpoint was halted first.

Then, after the host cleared the halt, the gadget apparently sent the
data that _should_ have been sent previously.  The host was expecting
to receive the CSW at this point, so there was an overflow error.
That's what caused the host to perform a reset.

Evidently this UDC implements the set_halt method incorrectly.  
According to the kerneldoc for usb_ep_set_halt:

 * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
 * transfer requests are still queued, or if the controller hardware
 * (usually a FIFO) still holds bytes that the host hasn't collected.

This didn't happen; the endpoint was halted before the host collected 
the pending data.

Incidentally, even though the URB completed with -EOVERFLOW status, we
still should see the first 13 bytes of data (i.e., the portion that
could fit into the data buffer).  But the actual_length value is 0.  I
don't know if this is a quirk of the xHCI hardware or a bug in
xhci-hcd.

> ec1a8540 1237469361 S Co:001:00 s 23 03 0004 0001  0
> ec1a8540 1237471551 C Co:001:00 0 0
> ed2209c0 1237534064 S Ci:001:00 s a3 00  0001 0004 4 <
> ed2209c0 1237537012 C Ci:001:00 0 4 = 03051000
> ed2209c0 1237594113 S Co:001:00 s 23 01 0014 0001  0
> ed2209c0 1237595037 C Co:001:00 0 0
> ed2209c0 1237595434 S Co:001:00 s 23 01 0001 0001  0
> ed2209c0 1237597480 C Co:001:00 0 0

Immediately after resetting the port, the host disabled it.  No
indication of why.

> ed2209c0 1237597823 S Co:001:00 s 23 03 0004 0001  0
> ed2209c0 1237597890 C Co:001:00 0 0
> ed2209c0 1237654005 S Ci:001:00 s a3 00  0001 0004 4 <
> ed2209c0 1237654098 C Ci:001:00 0 4 = 03051000
> ed2209c0 1237714084 S Co:001:00 s 23 01 0014 0001  0
> ed2209c0 1237714151 C Co:001:00 0 0
> ed2209c0 1237715894 S Co:001:00 s 23 01 0001 0001  0
> ed2209c0 1237715985 C Co:001:00 0 0

Another reset followed by another port disable.

> ed2209c0 1237716244 S Co:001:00 s 23 03 0004 0001  0
> ed2209c0 1237716308 C Co:001:00 0 0
> ed2209c0 1237774094 S Ci:001:00 s a3 00  0001 0004 4 <
> ed2209c0 1237775327 C Ci:001:00 0 4 = 03051000
> ed2209c0 1237834107 S Co:001:00 s 23 01 0014 0001  0
> ed2209c0 1237834183 C Co:001:00 0 0
> ed2209c0 1237854094 S Ci:003:00 s 80 06 0100  0008 8 <
> ed2209c0 1237854455 C Ci:003:00 0 8 = 12011002 0040
> ed2209c0 1237854963 S Ci:003:00 s 80 06 0100  0012 18 <
> ed2209c0 1237855219 C Ci:003:00 0 18 = 12011002 0040 2505a5a4 17030304 
> 0001
> ed2209c0 1237855544 S Ci:003:00 s 80 06 0f00  0005 5 <
> ed2209c0 1237855771 C Ci:003:00 0 5 = 050f1600 02
> ed2209c0 1237856062 S Ci:003:00 s 80 06 0f00  0016 22 <
> ed2209c0 1237856265 C Ci:003:00 0 22 = 050f1600 02071002 0200 0a100300 
> 0f000101 f401
> ed2209c0 1237856548 S Ci:003:00 s 80 06 0200  0020 32 <
> ed2209c0 1237858430 C Ci:003:00 0 32 = 09022000 010100c0 01090400 00020806 
> 50010705 81020002 00070501 02000201
> ed2200c0 1237860245 S Co:003:00 s 00 09 0001   0
> ed2200c0 1237861785 C Co:003:00 0 0

Then a normal reset, like we should have seen originally.  I have no
idea what was going on.  Maybe something involving warm vs. cold 
resets?

> ed2541c0 1237875505 S Bo:003:01 -115 31 = 55534243 0700 c000 861a 
> 003f00c0   00
> ed2541c0 1237875778 C Bo:003:01 0 31 >
> ed2200c0 1237876448 S Bi:003:01 -115 192 <
> ed2200c0 1237876534 C Bi:003:01 -32 0
> ed2541c0 1237876703 S Co:003:00 s 02 01  0081  0
> ed2541c0 1237876883 C Co:003:00 0 0
> ed2541c0 1237876987 S Bi:003:01 -115 13 <
> ed2541c0 1237877114 C Bi:003:01 0 0
> ed2541c0 1237877486 S Bi:003:01 -115 13 <
> ed2541c0 1237877572 C Bi:003:01 0 13 = 55534253 0700 c000 01

Here the MODE SENSE command was sent again, and this time the gadget
responded in a way that the host could accept.  I think it still
wasn't _right_, because it appears the gadget tried to send a 0-length
reply after the reset.  But at least it didn't provoke another reset.

> ed2541c0 1237877915 S Bo:003:01 -115 31 = 55534243 0800 12

Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 12:19:13PM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Wed, Sep 24, 2014 at 11:20:31AM -0500, Felipe Balbi wrote:
> > > > > > Therefore stalling is appropriate.  Why it causes it problem for 
> > > > > > your 
> > > > > > system is a different matter.  Is your UDC hardware capable of 
> > > > > > halting 
> > > > > > bulk endpoints?
> > > > > 
> > > > > yeah, that part is just fine; I also verified with my sniffer that 
> > > > > bulk
> > > > > halt is happening as it should. The problem, however, is that after 
> > > > > that
> > > > > halt condition happens, host (same board has xhci too, Linux 3.17-rc5)
> > > > > issues a reset recovery
> > > > 
> > > > It shouldn't; there's no reason for it to do so.  Unless something 
> > > > else is going wrong on the host side.  Have you tried capturing a 
> > > > usbmon trace on the host?
> > > 
> > > I'll capture usbmon and send here shortly.
> > 
> > here it is... Interesting part starts at line 73 (114 on this email)
> > where the data transport received EPIPE (due to Stall). This time
> > however, I was eventually able to talk to the device and managed to
> > issue quite a few writes to it.
> 
> so here's sequence of events so far:
> 
> - Enumration goes fine
> - Get Max Lun -> 0 (single lun)
> - Inquiry -> Passed
> - Test Unit Ready -> Failed
> - Request Sense (Unit Attention)  -> Passed
> - Test Unit Ready -> Passed
> - Mode Sense  -> Stall of Data transport.
>   - Clear Endpoint Feature (HALT) EP1 IN
>   - After clear feature, a 16 bulk in completes. Shouldn't gadget
> driver have cancelled that ?
> - Bus reset

looking into the SCSI glue, it seems like scsi is calling
->eh_bus_reset_handler() instead of ->eh_device_reset_handler(). Digging
a little more.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi,

On Wed, Sep 24, 2014 at 11:20:31AM -0500, Felipe Balbi wrote:
> > > > > Therefore stalling is appropriate.  Why it causes it problem for your 
> > > > > system is a different matter.  Is your UDC hardware capable of 
> > > > > halting 
> > > > > bulk endpoints?
> > > > 
> > > > yeah, that part is just fine; I also verified with my sniffer that bulk
> > > > halt is happening as it should. The problem, however, is that after that
> > > > halt condition happens, host (same board has xhci too, Linux 3.17-rc5)
> > > > issues a reset recovery
> > > 
> > > It shouldn't; there's no reason for it to do so.  Unless something 
> > > else is going wrong on the host side.  Have you tried capturing a 
> > > usbmon trace on the host?
> > 
> > I'll capture usbmon and send here shortly.
> 
> here it is... Interesting part starts at line 73 (114 on this email)
> where the data transport received EPIPE (due to Stall). This time
> however, I was eventually able to talk to the device and managed to
> issue quite a few writes to it.

so here's sequence of events so far:

- Enumration goes fine
- Get Max Lun   -> 0 (single lun)
- Inquiry   -> Passed
- Test Unit Ready   -> Failed
- Request Sense (Unit Attention)-> Passed
- Test Unit Ready   -> Passed
- Mode Sense-> Stall of Data transport.
- Clear Endpoint Feature (HALT) EP1 IN
- After clear feature, a 16 bulk in completes. Shouldn't gadget
  driver have cancelled that ?
- Bus reset

This remains for a few iterations. One thing is very interesting ...

[ snip ]

> ed2541c0 1239906485 S Bo:003:01 -115 31 = 55534243 1e00 1200 8603 
> 0012   00
> ed2541c0 1239906590 C Bo:003:01 0 31 >
> ec1a8740 1239906770 S Bi:003:01 -115 18 <
> ec1a8740 1239906871 C Bi:003:01 0 18 = 7600 000a  2900 
> 
> ed2541c0 1239906975 S Bi:003:01 -115 13 <
> ed2541c0 1239907026 C Bi:003:01 0 13 = 55534253 1e00  00
> ed2541c0 1239907803 S Bo:003:01 -115 31 = 55534243 1f00 0002 8ca1 
> 082e0001  ec00 00

0xa1 ? What is this ? Looks like XHCI corrupted the packet ? I can see
the same SCSI opcode (0xa1) with my sniffer.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi,

On Wed, Sep 24, 2014 at 11:40:54AM -0400, Alan Stern wrote:
> On Wed, 24 Sep 2014, Felipe Balbi wrote:
> 
> > > Therefore stalling is appropriate.  Why it causes it problem for your 
> > > system is a different matter.  Is your UDC hardware capable of halting 
> > > bulk endpoints?
> > 
> > yeah, that part is just fine; I also verified with my sniffer that bulk
> > halt is happening as it should. The problem, however, is that after that
> > halt condition happens, host (same board has xhci too, Linux 3.17-rc5)
> > issues a reset recovery
> 
> It shouldn't; there's no reason for it to do so.  Unless something 
> else is going wrong on the host side.  Have you tried capturing a 
> usbmon trace on the host?

I'll capture usbmon and send here shortly.

> > and it all happens again. I stay in that loop
> > for a while until it finally enumerates correctly, but when I try to
> > write to the block device with dd, it resets again.
> > 
> > I'll try the same test against my desktop (3.16.1) and a Mac OS X I have
> > here, and see if the same behavior shows up.
> 
> > It seems to work better against my v3.16.1 desktop and Mac OS X then it
> > does against v3.17-rc5 (running on the development board).
> 
> Indicating that this really is a host-side problem.

right.

> > I'll try using a USB stick attached to the board.
> 
> USB sticks probably won't generate the Unit Attention condition in 
> response to a reset.  They tend not to adhere terribly closely to the 
> SCSI standard.

yeah, I figured :-s

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Alan Stern
On Wed, 24 Sep 2014, Felipe Balbi wrote:

> > Therefore stalling is appropriate.  Why it causes it problem for your 
> > system is a different matter.  Is your UDC hardware capable of halting 
> > bulk endpoints?
> 
> yeah, that part is just fine; I also verified with my sniffer that bulk
> halt is happening as it should. The problem, however, is that after that
> halt condition happens, host (same board has xhci too, Linux 3.17-rc5)
> issues a reset recovery

It shouldn't; there's no reason for it to do so.  Unless something 
else is going wrong on the host side.  Have you tried capturing a 
usbmon trace on the host?

> and it all happens again. I stay in that loop
> for a while until it finally enumerates correctly, but when I try to
> write to the block device with dd, it resets again.
> 
> I'll try the same test against my desktop (3.16.1) and a Mac OS X I have
> here, and see if the same behavior shows up.

> It seems to work better against my v3.16.1 desktop and Mac OS X then it
> does against v3.17-rc5 (running on the development board).

Indicating that this really is a host-side problem.

> I'll try using a USB stick attached to the board.

USB sticks probably won't generate the Unit Attention condition in 
response to a reset.  They tend not to adhere terribly closely to the 
SCSI standard.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 10:30:17AM -0500, Felipe Balbi wrote:
> On Wed, Sep 24, 2014 at 11:17:18AM -0400, Alan Stern wrote:
> > On Wed, 24 Sep 2014, Alan Stern wrote:
> > 
> > > > Case (6) is when Hi == Di, looking at my logs, I have:
> > > > 
> > > > 720 [  286.843965] SCSI CDB: 1a 00 3f 00 c0 00
> > > > 721 [  286.844000] g_mass_storage gadget: SCSI command: MODE SENSE(6); 
> > > > Dc=6, Di=192;  Hc=6, Hi=192
> > > > 722 [  286.844018] g_mass_storage gadget: bulk-in set halt
> > > > 723 [  286.844034] g_mass_storage gadget: sending command-failure status
> > > > 724 [  286.844045] g_mass_storage gadget:   sense data: SK x06, ASC 
> > > > x29, ASCQ x00;  info x0
> > > > 
> > > > Isn't it wrong to halt in this condition ?
> > > 
> > > No, it's correct.  SK = 6 and ASC = 0x29 means Unit Attention, Reset
> > > Occurred.  It occurs because this is the first command the gadget has
> > > received since starting up, which certainly is a form of reset.  In
> > > effect, this is how the device tells the host that it was just powered
> > > on.
> > 
> > Actually, looking at your log again, it seems more like this followed a 
> > genuine reset, not a power-on.  Regardless, it's still appropriate.
> 
> It seems to work better against my v3.16.1 desktop and Mac OS X then it
> does against v3.17-rc5 (running on the development board).
> 
> I'll try using a USB stick attached to the board.

alright, USB stick attached to the board behaves perfectly, but then
again I don't see any stalls from that device. Any chance I can persuade
you into running a similar test with your net2272/2280 UDC ? I'm using a
500MiB tmpfs as storage backend, then just connecting to an XHCI host
and looking at dmesg. A dd also triggers a few resets with v3.17-rc5
which don't happen on 3.16.1.

cheers

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
On Wed, Sep 24, 2014 at 11:17:18AM -0400, Alan Stern wrote:
> On Wed, 24 Sep 2014, Alan Stern wrote:
> 
> > > Case (6) is when Hi == Di, looking at my logs, I have:
> > > 
> > > 720 [  286.843965] SCSI CDB: 1a 00 3f 00 c0 00
> > > 721 [  286.844000] g_mass_storage gadget: SCSI command: MODE SENSE(6); 
> > > Dc=6, Di=192;  Hc=6, Hi=192
> > > 722 [  286.844018] g_mass_storage gadget: bulk-in set halt
> > > 723 [  286.844034] g_mass_storage gadget: sending command-failure status
> > > 724 [  286.844045] g_mass_storage gadget:   sense data: SK x06, ASC x29, 
> > > ASCQ x00;  info x0
> > > 
> > > Isn't it wrong to halt in this condition ?
> > 
> > No, it's correct.  SK = 6 and ASC = 0x29 means Unit Attention, Reset
> > Occurred.  It occurs because this is the first command the gadget has
> > received since starting up, which certainly is a form of reset.  In
> > effect, this is how the device tells the host that it was just powered
> > on.
> 
> Actually, looking at your log again, it seems more like this followed a 
> genuine reset, not a power-on.  Regardless, it's still appropriate.

It seems to work better against my v3.16.1 desktop and Mac OS X then it
does against v3.17-rc5 (running on the development board).

I'll try using a USB stick attached to the board.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Felipe Balbi
Hi,

On Wed, Sep 24, 2014 at 11:05:58AM -0400, Alan Stern wrote:
> On Tue, 23 Sep 2014, Felipe Balbi wrote:
> 
> > Hi Alan,
> > 
> > Need your help looking over this detail here. When I run g_mass_storage
> > with stall=0 everything works fine. As soon as I remove it, things go
> > bonkers.
> > 
> > Looking at the bulk-only spec, I see:
> > 
> > "6.7.2 Hi - Host expects to receive data from the device
> > 
> > [ ... ]
> > 
> > The specific device requirements are:
> > 
> > 1. The device shall receive a CBW.
> > 2. When the CBW is valid and meaningful, then:
> > . The device shall attempt the command.
> > . [Case (6)]
> > If the device intends to send dCBWDataTransferLength, then:
> > The device shall send dCBWDataTransferLength bytes of
> > data.
> > 
> > The device shall set bCSWStatus to 00h or 01h.
> > 
> > The device shall set dCSWDataResidue to zero.
> > "
> > 
> > Case (6) is when Hi == Di, looking at my logs, I have:
> > 
> > 720 [  286.843965] SCSI CDB: 1a 00 3f 00 c0 00
> > 721 [  286.844000] g_mass_storage gadget: SCSI command: MODE SENSE(6); 
> > Dc=6, Di=192;  Hc=6, Hi=192
> > 722 [  286.844018] g_mass_storage gadget: bulk-in set halt
> > 723 [  286.844034] g_mass_storage gadget: sending command-failure status
> > 724 [  286.844045] g_mass_storage gadget:   sense data: SK x06, ASC x29, 
> > ASCQ x00;  info x0
> > 
> > Isn't it wrong to halt in this condition ?
> 
> No, it's correct.  SK = 6 and ASC = 0x29 means Unit Attention, Reset
> Occurred.  It occurs because this is the first command the gadget has
> received since starting up, which certainly is a form of reset.  In
> effect, this is how the device tells the host that it was just powered
> on.
> 
> So the case you should be looking at is Case 5: Hi > Di.  I realize the 
> debugging output indicates Di = 192, but that line gets written before 
> the driver checks for a Unit Attention condition.  (The line is written 
> near the start of check_command() in f_mass_storage.c, and the Unit 
> Attention check is near the end of the same function.)  The printed Di 
> value indicates how much data the gadget thinks the command is asking 
> for, not the amount of data the gadget actually intends to send.
> 
> Therefore stalling is appropriate.  Why it causes it problem for your 
> system is a different matter.  Is your UDC hardware capable of halting 
> bulk endpoints?

yeah, that part is just fine; I also verified with my sniffer that bulk
halt is happening as it should. The problem, however, is that after that
halt condition happens, host (same board has xhci too, Linux 3.17-rc5)
issues a reset recovery and it all happens again. I stay in that loop
for a while until it finally enumerates correctly, but when I try to
write to the block device with dd, it resets again.

I'll try the same test against my desktop (3.16.1) and a Mac OS X I have
here, and see if the same behavior shows up.

-- 
balbi


signature.asc
Description: Digital signature


Re: g_mass_storage bug ?

2014-09-24 Thread Alan Stern
On Wed, 24 Sep 2014, Alan Stern wrote:

> > Case (6) is when Hi == Di, looking at my logs, I have:
> > 
> > 720 [  286.843965] SCSI CDB: 1a 00 3f 00 c0 00
> > 721 [  286.844000] g_mass_storage gadget: SCSI command: MODE SENSE(6); 
> > Dc=6, Di=192;  Hc=6, Hi=192
> > 722 [  286.844018] g_mass_storage gadget: bulk-in set halt
> > 723 [  286.844034] g_mass_storage gadget: sending command-failure status
> > 724 [  286.844045] g_mass_storage gadget:   sense data: SK x06, ASC x29, 
> > ASCQ x00;  info x0
> > 
> > Isn't it wrong to halt in this condition ?
> 
> No, it's correct.  SK = 6 and ASC = 0x29 means Unit Attention, Reset
> Occurred.  It occurs because this is the first command the gadget has
> received since starting up, which certainly is a form of reset.  In
> effect, this is how the device tells the host that it was just powered
> on.

Actually, looking at your log again, it seems more like this followed a 
genuine reset, not a power-on.  Regardless, it's still appropriate.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: g_mass_storage bug ?

2014-09-24 Thread Alan Stern
On Tue, 23 Sep 2014, Felipe Balbi wrote:

> Hi Alan,
> 
> Need your help looking over this detail here. When I run g_mass_storage
> with stall=0 everything works fine. As soon as I remove it, things go
> bonkers.
> 
> Looking at the bulk-only spec, I see:
> 
> "6.7.2 Hi - Host expects to receive data from the device
> 
> [ ... ]
> 
> The specific device requirements are:
> 
> 1. The device shall receive a CBW.
> 2. When the CBW is valid and meaningful, then:
>   . The device shall attempt the command.
>   . [Case (6)]
>   If the device intends to send dCBWDataTransferLength, then:
>   The device shall send dCBWDataTransferLength bytes of
>   data.
> 
>   The device shall set bCSWStatus to 00h or 01h.
> 
>   The device shall set dCSWDataResidue to zero.
> "
> 
> Case (6) is when Hi == Di, looking at my logs, I have:
> 
> 720 [  286.843965] SCSI CDB: 1a 00 3f 00 c0 00
> 721 [  286.844000] g_mass_storage gadget: SCSI command: MODE SENSE(6); Dc=6, 
> Di=192;  Hc=6, Hi=192
> 722 [  286.844018] g_mass_storage gadget: bulk-in set halt
> 723 [  286.844034] g_mass_storage gadget: sending command-failure status
> 724 [  286.844045] g_mass_storage gadget:   sense data: SK x06, ASC x29, ASCQ 
> x00;  info x0
> 
> Isn't it wrong to halt in this condition ?

No, it's correct.  SK = 6 and ASC = 0x29 means Unit Attention, Reset
Occurred.  It occurs because this is the first command the gadget has
received since starting up, which certainly is a form of reset.  In
effect, this is how the device tells the host that it was just powered
on.

So the case you should be looking at is Case 5: Hi > Di.  I realize the 
debugging output indicates Di = 192, but that line gets written before 
the driver checks for a Unit Attention condition.  (The line is written 
near the start of check_command() in f_mass_storage.c, and the Unit 
Attention check is near the end of the same function.)  The printed Di 
value indicates how much data the gadget thinks the command is asking 
for, not the amount of data the gadget actually intends to send.

Therefore stalling is appropriate.  Why it causes it problem for your 
system is a different matter.  Is your UDC hardware capable of halting 
bulk endpoints?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


g_mass_storage bug ?

2014-09-23 Thread Felipe Balbi
Hi Alan,

Need your help looking over this detail here. When I run g_mass_storage
with stall=0 everything works fine. As soon as I remove it, things go
bonkers.

Looking at the bulk-only spec, I see:

"6.7.2 Hi - Host expects to receive data from the device

[ ... ]

The specific device requirements are:

1. The device shall receive a CBW.
2. When the CBW is valid and meaningful, then:
. The device shall attempt the command.
. [Case (6)]
If the device intends to send dCBWDataTransferLength, then:
The device shall send dCBWDataTransferLength bytes of
data.

The device shall set bCSWStatus to 00h or 01h.

The device shall set dCSWDataResidue to zero.
"

Case (6) is when Hi == Di, looking at my logs, I have:

720 [  286.843965] SCSI CDB: 1a 00 3f 00 c0 00
721 [  286.844000] g_mass_storage gadget: SCSI command: MODE SENSE(6); Dc=6, 
Di=192;  Hc=6, Hi=192
722 [  286.844018] g_mass_storage gadget: bulk-in set halt
723 [  286.844034] g_mass_storage gadget: sending command-failure status
724 [  286.844045] g_mass_storage gadget:   sense data: SK x06, ASC x29, ASCQ 
x00;  info x0

Isn't it wrong to halt in this condition ?

cheers

-- 
balbi


signature.asc
Description: Digital signature