On Friday 18 February 2005 8:40 am, Alan Stern wrote: > > What if CONFIG_USB_SUSPEND isn't set? Yes, I know we're trying to move > away from this possibility. But for the moment we still need to handle it > correctly.
Let me turn that around: how could just _removing_ states from the system make those full-fledged USB_SUSPEND rules stop applying to the HCD? Remember: when USB_SUSPEND isn't set, that just means there's no (selective) usb suspend. So HCDs would suspend only during system sleep state transitions. (Internal low power states don't count as "suspend", and by definition they shouldn't be visible to usbcore ...) > > ); but once > > suspended, neither is allowed. > > (Bear in mind that we may eventually autosuspend HCs, with autoresume upon > demand. The "demand" might be submission of an URB. Not relevant just > now, though...) I can't see any kind of autosuspend policy relieving the suspending component of the responsibility for resuming! Since the only way an HC could autosuspend is if its children were all suspended, no URB could go to any of its children before the HC were resumed. QED. :) > > And (e) is off the edge of the picture ... there's no active HC, so it > > has no state known to usbcore. If usbcore has a handle on such a > > data structure, that's a BUG(). > > Maybe there's no state, but there is a data structure. It hangs around > until the refcount drops to 0, which might not be for some time after the > bus is deregistered. The existence of pointers to such things isn't a > BUG, and so long as the pointers exist they can be dereferenced. OK, I mis"spoke". If usbcore tries to _use_ such a pointer it's a bug. Before the HCD deregisters its bus, it had to get rid of all internal references ... which involved telling usbcore to kick all drivers off all devices left on the bus. Clearly, the only possible place that a pointer to the bus could still linger is in a misbehaving driver. So as I said: If usbcore has such a pointer, it's a BUG(). If you want to extend the scope of what you were saying and include drivers above usbcore, then I'd amend what I said slightly: such misbehavior from drivers doesn't involve a BUG(), but a KERN_WARNING message might be appropriate for all operations other than dropping reference counts. That'd not be a case of usbcore keeping a handle on such a bus though! It'd be a driver keeping a device pointer after it returned from disconnect(), and trying to submit an URB through that bus ... not a mortal sin, but a sin nonetheless. > > > For the most part, the tests are intended to prevent actions from taking > > > place when the HC is not in an appropriate state. The big problem is > > > that > > > the state transitions are not synchronized in any way with the tests. > > > > At one point, it was true that all the tests in usbcore could handle > > the transitions locklessly, because of the nature of the transitions > > and the fact that the HCDs had explicit tests on the submit path. > > Maybe we need to adopt that kind of strategy again. If we're not there already ... I don't have time to audit the code lately. (Though I did just notice that UHCI is missing that test...) > > > This isn't surprising, since only the HCD is in a position to do such > > > synchronization. > > > > > > States a and b don't require any special action at all. Eveything is > > > legal in those states. > > > > Submitting urbs shouldn't be allowed in any of those states except > > "running normally". > > I can't tell if you are agreeing with me or disagreeing. State a _is_ > "running normally". State b is "root hub suspended", which means the root > hub isn't running normally but the HC is. (In state b, URBs won't be > submitted unless CONFIG_USB_SUSPEND isn't set -- but we do need to allow > for that possibility. Especially for the root hub polling URB.) I think I'm still thinking that this definition of "root hub suspended" is troublesome. If it's getting any kind of URB, it's not suspended in the sense that any other USB device would be suspended. That would seem to be the root of the confusion. That, and the fact that the root hub timer (possibly needed _only_ for UHCI at this point) is serving a dual role in that case: both detecting port change events, and initiating some sort of wakeup. If we could stop having usbcore muck around in those internal HCD state transitions, things would probably get a lot simpler. > > When it dies, the HCD ought to set the state with its internal > > data structures locked. At one point I checked all the code to > > make sure the HCD's spinlock was held whenever hcd->state was set. > > "When it dies..." The HCD can't do anything at the moment the HC dies, > because it won't learn about the death until the next interrupt comes. > Obviously. Before that interrupt arrives the HCD has to be able to handle > requests even though the HC is dead. That's true whether locking is used > or not. There's a distinction between HC dying and HCD noticing it, yes; but I was talking about the latter. As a rule the next event from the HC is going to involve a death notice, which the HCD will handle by marking itself as dead. :) > Furthermore, what good does it do for the HCD to hold a spinlock while > setting hcd->state if usbcore doesn't hold the same spinlock while testing > it? I'm concerned about making usbcore work right -- my assumption is > that the HCDs are already in good shape. I think the HCDs are mostly in good shape, but remember that these issues are all along the interface between HCDs and usbcore ... and that what we're discussing here is changes to that interface. The good it does for the HCD to hold a spinlock is in conjunction with that policy I sketched before: the state transitions being set up so that usbcore can do this locklessly. Should anything come down from usbcore before the HCD is marked as dead, that'd just need cleanup. But given the lock, the HCD will know to reject new requests from other contexts, starting right then; that places a ceiling on the cleanup that'll be needed. That should sound fine except for a certain gaping hole, about the shape of a "but usbcore doesn't clean them up yet". ;) > > > We should consider d-6: detecting that the HC has died. I'm not so sure > > > that we're doing it the best way. First, an HCD might decide that the HC > > > is dead when a watchdog timer fires, not when an interrupt is received. > > > Second, it should be just about as easy for an HCD to call usb_hc_died > > > directly as to set hcd->state. So action 6 could be eliminated. > > > > If the HCD sets the state with its internal spinlock held, then it > > doesn't matter whether the IRQ is from a timer or the controller. > > Yes it does. It matters because if the IRQ is from the controller then > usb_hcd_irq will automatically call usb_hc_died, but if the IRQ is from a > timer then the HCD has to call usb_hc_died. _If_ the HCD would call > usb_hc_died in both cases, _then_ it wouldn't matter. And we could > simplify usb_hcd_irq. The root hub timer calls the hub_status_data() method. No HCD marks the HC as dead in that path right now ... so there's no difference! If some HCD did such marking, I'd agree with you. That said, I agree that the "gone to heaven" transition after "died" needs work. Right now it doesn't really work; things stay in limbo. > > There was also the issue that HALTED state can be entered through normal > > state transitions ... "died" was supposed to be an abnormal transition. > > True, but I don't think it's important for the purposes of this > discussion. It's important not to assume that for example "rmmod" doesn't go through HALTED ... > > > So the whole thing boils down to state c. While the HC is suspended is > > > when we really want to avoid trying to access the controller. However > > > lack of synchronization means that we aren't really doing this correctly > > > now. Either we need to find a way of preventing 1-4 precisely while the > > > HC is suspended or we might as well give up and just rely on the HCDs to > > > do the necessary synchronizing. Whatever this way is, it will have to > > > involve the HCDs somehow because the suspend/resume callbacks for non-PCI > > > HC devices don't pass through usbcore. > > > > Remember though that there are supposed to be system-wide invariants for > > the device tree during suspend processing. Specifically, that no device > > gets suspended after its parent, or resumed before it ... so how would > > one of those operations 1-4 get past the usb_device checks?? > > Well, if CONFIG_USB_SUSPEND isn't set then we certainly _can_ have HCs and > root hubs suspended without any of their children being suspended. So how do we move more quickly to having USB_SUSPEND always be available when PM is configured? > And > operation #4 is get_frame_number, which doesn't have any usb_device > checks since it's not associated with a USB device. Not that I think we should remove it, but I don't think get_frame_number gets used outside of debugging ... Linux isn't being particularly paranoid about synchronizing isochronous transfers with external clocks. (Yet!) And in any case, I think usb_get_current_frame_number() should be checking the state of the device it's passed. The HCD glue checks the HC state, so it should be mostly OK except for a small SMP race. I agree there's something to fix in this area though. > > > My conclusion is that on the whole, hcd->state can be eliminated from > > > hcd.c. The exceptions are: > > > > > > hcd->state, or something like it, will be useful to prevent IRQ > > > forwarding at times when we know the HC isn't active. > > > > For PCI, the IRQ handlers get deregistered during suspend. There's a > > small window between requesting the IRQ and calling hcd->start(), and > > likewise after stop() and before freeing it. > > That doesn't mean the test is useless. Even when the HC is incapable of > generating IRQs, there might be another device sharing the same IRQ line. > We would like to prevent those other interrupts being sent to the HCD, > just as a simple optimization. Right, I was just saying that particular mechanism could only kick in during those small windows. In general I'd be happier to have usbcore out of the business of adding a stack frame to IRQ handlers; but meanwhile, there's still the issue of how to make it clean up after dead HCs. > > > Those two warnings (7 and 8) seem to need it. But do we need > > > the warnings? I don't think I've ever seen them triggered. > > > > We don't want to see #7, those tend to indicate ACPI or other > > IRQ setup problems ... > > #7 is warning about unlink while the HC isn't running. ... Sorry, I wasn't clear on which message you meant. This one is just a usbcore integrity assertion like #8. > > Re #8 that's just to catch any bugs internal to usbcore/hcds; > > as a rule, it's safe to treat all WARN_ON(x) as assertions > > about how things are expected to behave. Which means that if > > they ever appear, there's a problem to fix ASAP ... this one > > isn't like #7. > > It isn't like the "Unlink after no-IRQ" warning, but it is like #7. So now that we're both clear on what's meant ... are you proposing to remove these integrity assertions? > > > Blocking calls to the HCD while the HC is suspended is the > > > single major stumbling block. It should be done properly > > > (which hcd->state can't do) or not at all. > > > > I'm not sure what you mean by that. Again, a lot of those > > tests are more at the level of integrety assertions ... they > > don't fire, we'd worry if they did. > > The warnings #7 and #8 are like that. But #1 - #4 aren't: preventing URBs > being submitted, preventing root hub polling, and preventing > get_frame_number whenever the HC isn't running. They are not integrity > assertions. Re submitting URBs -- HCDs should be preventing that already, I'd call it a bug in the HCD if it isn't. Re timer polling -- yes, that's a mess, and I seem to recall you were looking at a way to let HCDs (like OHCI and EHCI) say they don't need a root hub timer. (Hmm, maybe just add a hc_driver.flags bit to say if the HCD wants the timer, and some way for HCDs with IRQ-driven root hubs to tell khubd to check their root hub. It might be more efficient to have me take a whack at that first, letting you focus on any issues that turns up with UHCI.) Preventing get_frame_number() ... right now there's a small race that should appear only on SMP, and that API is hardly ever used, but this spot could stand some more work. My preferred fix would be making USB_SUSPEND be the only mode for usbcore when PM is enabled. > #5 is more of an optimization: don't forward an IRQ when we know the HC > didn't generate it. And #6 is merely a convenience: call usb_hc_died on > behalf of the HCD so the HCD doesn't have to. Agreed. - Dave > Alan Stern > > ------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel