On Fri, Oct 02, 2015 at 01:29:05PM -0400, Alan Stern wrote: > On Fri, 2 Oct 2015, Felipe Balbi wrote: > > > Hi Alan, > > > > here's a question which I hope you can help me understand :-) > > > > Why do we have that kthread for the mass storage gadgets ? I noticed a very > > interesting behavior. > > Basically, the mass-storage gadget uses a kthread in order to access > the backing file. > > Obviously a kernel driver doesn't _need_ to use a separate thread to > read or write a file (think of /dev/loop, for example). But doing it > that way is very easy because the driver merely has to imitate read(2) > and write(2) system calls, which have a simple and well defined > interface. To do anything more direct would mean getting deeply > involved in the block and filesystem layers, and that wasn't my goal -- > I wanted to write a USB gadget driver, not re-implement the loop > driver. > > Also, remember that at the time the MS Gadget was written, speed over > USB wasn't a pressing issue. You couldn't transfer more than about 40 > MB/s anyway, so efficiency in the gadget wasn't a terribly high > priority. > > > Basically, the MSC class works in a loop like so: > > > > CBW > > Data Transfer > > CSW > > > > In our implemention, what we do is: > > > > CBW > > wake_up_process() > > Data Transfer > > wake_up_process() > > CSW > > wake_up_process() > > The wake_up_process() call is the notification from the completion > routine telling the kthread that a request has completed. The kthread > doesn't necessarily stop and wait for these notifications; it can > continue processing in parallel. > > > Now here's the interesting bit. Every time we wake_up_process(), we > > basically > > don't do anything until MSC's kthread gets finally scheduled and has a > > chance of > > doing its job. > > That's not entirely true; the kthread may already be running. For > example, the default MS gadget uses two I/O buffers (there's a Kconfig > option to use more if you want). When carrying out a READ, the kthread > fills up the first buffer and submits it to the UDC. But it doesn't > stop and wait for wake_up_process(); instead it goes on to fill up the > second buffer while the first is being sent back to the host. The > wake_up_process() may very well occur while the second buffer is being > filled, in which case it won't do anything (since the kthread isn't > asleep). > > > This means that the host keeps sending us tokens but the UDC > > doesn't have any request queued to start a transfer. This happens specially > > with > > IN endpoints, not so much on OUT directions. See figure one [1] we can see > > that > > host issues over 7 POLLs before UDC has finally started a usb_request, > > sometimes > > this goes for even longer (see image [3]). > > Figure [1] is misleading. The 7 POLLs you see are at the very start of > a READ. It's not surprising that the host can poll 7 times before the > gadget manages to read the first block of data from the backing file. > This isn't a case where the kthread could have been doing something > useful instead of waiting around to hear from the host. > > Figure [3] is difficult to interpret because it doesn't include the > transfer lengths. I can't tell what was going on during the 37 + 50 > POLLs before the first OUT transfer. If that was a CDB starting a new
yeah, sorry about that. The 37 + 50 POLLs out is a CBW (31-byte). Before those we had a CSW. > command, I don't see why it would take that long to schedule the > kthread unless the CPU was busy with other tasks. there's nothing else running on this board (other than the normal PID1 and stuff like that). > > On figure two we can see that on this particular session, I had as much as > > 15% > > of the bandwidth wasted on POLLs. With this current setup I'm 34MB/sec and > > with > > the added 15% that would get really close to 40MB/sec. > > So high speed, right? Are the numbers in the figure handshake _counts_ counts > or handshake _times_? A simple NAK doesn't use much bandwidth. Even > if 15% of the handshakes are NAKs, it doesn't mean you're wasting 15% > of the bandwidth. sure it means. Given a uFrame, I can stuff (theoretically) 13 bulk transactions in it. If I have a token (IN/OUT) which gets a NAK, that's one less transaction I can stuff in this uFrame, right ? Note that when I have the 37 POLLs, you can see 8 SOFs. This is just the sniffer SW trying to aggregate SOFs (I should drop that aggregation, it's really annoying). > > So the question is, why do we have to wait for that kthread to get > > scheduled ? > > Why couldn't we skip it completely ? Is there really anything left in there > > that > > couldn't be done from within usb_request->complete() itself ? > > The real answer is the calls to vfs_read() and vfs_write() -- those > have to occur in process context. would a threaded IRQ handler be enough ? Fort he sake of argument, let's assume that all UDC drivers use threaded irqs and they are properly masking their IRQs in in the top half for the bottom half (the IRQ thread) to run. This means, unless I'm missing something, that we could switch a chunk of the gadget framework to mutexes instead of spin locks (not all of it though, but let's ignore that side for now). In that case, we could safely remove spin locks from ->complete() and use mutexes and we then we wouldn't need this extra kthread at all, right ? > In theory, the CBW packet doesn't have to be processed by the kthread. > But processing it in the completion routine wouldn't help, because the > kthread would still have to be scheduled in order to carry out the READ > or WRITE command. In other words, changing: > > Completion handler Kthread > ------------------ ------- > Receive CBW > Wake up kthread > Process CBW > Start READ or WRITE > > into: > > Completion handler Kthread > ------------------ ------- > Receive CBW > Process CBW > Wake up kthread > Start READ or WRITE > > doesn't really give any significant advantage. > > > I'll spend some time on that today and really dig that thing up, but if you > > know > > the answer off the top of your head, I'd be happy to hear. > > I hope this explains the issues clearly enough. sure does, thanks a lot :-) -- balbi
signature.asc
Description: PGP signature