Thanks for the update!

It occurred to me: what if for #3, instead of the driver not handling the
error, it's simply expecting a different/new (type of) error to be raised
in order to go through a code path that leads to it not getting borked?

Bah ... I'm sure you've thought of this ☺

Cheers!

On Sat, 2019-09-21 at 20:14 -0500, Mike Isely wrote:
> An update on this...
>
> 1. There are two kernel threads involved. One manages contexts, the
> other is involved in a kernel work queue for managing the hardware. A
> week ago I first thought it was that context-managing thread, but now
> it appears to be that second thread which is jamming, triggering a
> kernel oops and then aborting, leaving the driver in a fubar state.
>
> 2. The v5.3.1 kernel happens to now include an upstream fix that deals
> with a potential null pointer dereference problem in the sysfs part of
> the pvrusb2 driver. This is a new change, since at least 5.2.13 (the
> version I'm focusing on right now). This would be something that gets
> hit on tear-down so without that fix things MIGHT go awry. But right
> now I don't know if that is the same problem we're looking at here.
> This is because...
>
> 3. After turning on additional trace print, I've noticed another
> problem that might be masking things. Some background... The pvrusb2
> driver doesn't do "everything" on its own. Rather, like many v4l
> drivers, it relies on common external v4l chip-level drivers to
> self-manage various parts of the video pipeline. In these cases the
> pvrusb2 driver provides a datapath for all these things to reach the
> hardware, via an I2C master interface that is carried over the USB
> cable (tunneled, effectively). Every chip-level driver in v4l that
> accesses stuff on the pvrusb2-related hardware does so through this
> pvrusb2-provided I2C interface. Well when you unplug the device / kill
> its power / whatever, obviously that datapath is severed.
> When this happens any further attempts to access that I2C master
> interface are met with an EIO error back to the caller (and you'll see
> a kernel log message "pvrusb2: Attempted to execute control transfer
> when device not ok"). During tear-down that's actually expected.
> However the tear-down can't complete until all these chip-level
> drivers in v4l stop trying to use this interface. And somebody there
> isn't giving up - the driver is getting into what appears to be an
> infinite loop of these errors and never getting out. This leads me to
> suspect a v4l chip-level driver may have a problem dealing with a
> hot-unplug situation. Given that those drivers are managed outside of
> the pvrusb2 driver (for obvious reasons), it's possible that a change
> in one of those might be a contributor to the problem here.
>
> So I'm trying to suss out #3 above first. That should hopefully clear
> the air to solve #1 and figure out if #2 is related to any of this.
>
> -Mike
>
> On Mon, 9 Sep 2019, Mike Isely wrote:
> > Stay tuned. And pester me again if I go quiet for too long.
> >
> > The pvrusb2 driver sets up a single internal kernel thread to take
> > care of various bits of background activity. That thread also
> > performs part of the setup and most of the tear-down when a device
> > is hotplugged / hot-unplugged. The oops is definitely happening in
> > that thread - which is a good thing because it means that it should
> > be possible to rule out lots of bizarre interactions involving other
> > threads calling into the driver. I am going to add printk's before
> > each step of the tear-down process so I can start to get an idea
> > where it is going awry. I hope to do that tonight.
> >
> > -Mike
> >
> > On Sun, 8 Sep 2019, Diego Rivera wrote:
> > > No problem! I can imagine how normal life has you pegged down,
> > > just like it does with us all!
> > >
> > > Thanks for circling back to it, though.
> > > Is there anything I can do on my end to help you?
> > >
> > > Cheers!
> > >
> > > On Sat, 2019-09-07 at 14:26 -0500, [email protected] wrote:
> > > > Hi Diego,
> > > >
> > > > I am sorry. I had gotten completely distracted away from this.
> > > >
> > > > I just updated to the latest kernel and have confirmed that it's
> > > > still getting an oops when the device is hot-unplugged. I'm
> > > > looking at it right now. At first glance this looks like a
> > > > fairly nasty tear-down race - which long ago didn't used to be
> > > > there. So there has to be some kind of environmental change
> > > > leading to this behavior.
> > > >
> > > > -Mike
> > > >
> > > > On Wed, 21 Aug 2019, Diego Rivera wrote:
> > > > > Hi, Mike!
> > > > >
> > > > > Any luck with this? I haven't poked you in some time so I
> > > > > figured I'd check to see if you've had the opportunity to
> > > > > debug this anymore, and if there's any way I can help with
> > > > > the process...
> > > > >
> > > > > Let me know!
> > > > >
> > > > > Cheers!
> > > > >
> > > > > On Sat, 2019-04-20 at 20:16 -0600, Diego Rivera wrote:
> > > > > > This is the result of a 2nd attempt with a hot-unplug. I
> > > > > > don't see many differences beyond the values of some
> > > > > > registers changing between one instance and the other.
> > > > > >
> > > > > > Cheers!
> > > > > > --
> > > > > > Diego Rivera
> > > > > >
> > > > > > On Sat, 2019-04-20 at 20:09 -0600, Diego Rivera wrote:
> > > > > > > Guinea pig #1 responding as ordered, sir! ☺
> > > > > > >
> > > > > > > One is the kernel log from connection, the other is what
> > > > > > > happens if I try to do a modprobe -r. I noticed there's a
> > > > > > > call trace with registers - I'm wondering if I need to
> > > > > > > add more symbols packages so that trace can be more
> > > > > > > verbose and offer up more info. Thoughts?
> > > > > > >
> > > > > > > Let me know if you want me to try anything else.
> > > > > > > I'm going to produce the output now for hot-unplug of the
> > > > > > > same device, see how that differs.
> > > > > > >
> > > > > > > Cheers!
> > > > > > > --
> > > > > > > Diego Rivera
> > > > > > >
> > > > > > > On Sat, 2019-04-20 at 20:26 -0500, [email protected] wrote:
> > > > > > > > Status update. Nothing really useful to report except
> > > > > > > > that I am seeing some screwy behavior just on hotplug /
> > > > > > > > hotunplug operations with the device just sitting idle
> > > > > > > > not being touched by anything. In this case I tested an
> > > > > > > > old 29032 model - a very early module but it's a useful
> > > > > > > > test subject because it is simpler than the HVR-1950
> > > > > > > > yet still exercises most of the key pieces of the
> > > > > > > > driver. I ran a freshly compiled 5.0.9 kernel (latest
> > > > > > > > stable) for this test.
> > > > > > > >
> > > > > > > > Sorry this has taken so long. As was guessed earlier, I
> > > > > > > > haven't worked on this in a very long time and I had to
> > > > > > > > unbox a lot of stuff. I also spent far too much time
> > > > > > > > today setting up a separate purpose-built computer
> > > > > > > > which I can trash / crash / hang with wild abandon
> > > > > > > > without losing anything of value. This approach allows
> > > > > > > > me to keep my dev environment on a machine separate
> > > > > > > > from the one that is running test kernels.
> > > > > > > >
> > > > > > > > I was able to cleanly modprobe -r pvrusb2 every time so
> > > > > > > > far, but if the issue is on the DVB side of the fence,
> > > > > > > > then the old 29032 model I've just tried won't exhibit
> > > > > > > > that issue.
> > > > > > > > So a lot more characterization to do.
> > > > > > > >
> > > > > > > > Diego: It would be useful if you could post to me the
> > > > > > > > section of your /var/log/kern.log (or equivalent)
> > > > > > > > showing all the kernel messages from the point when you
> > > > > > > > plug in the device to when the fireworks are happening
> > > > > > > > after trying to tear down. If I find that same pattern
> > > > > > > > here then we'll know for sure that we are chasing the
> > > > > > > > same issue.
> > > > > > > >
> > > > > > > > -Mike
> > > > --
> > Mike Isely
> > isely @ isely (dot) net
> > PGP: 03 54 43 4D 75 E5 CC 92 71 16 01 E2 B5 F5 C1 E8
> > _______________________________________________
> > pvrusb2 mailing list
> > [email protected]
> > http://www.isely.net/cgi-bin/mailman/listinfo/pvrusb2
> > --
Diego Rivera
_______________________________________________
pvrusb2 mailing list
[email protected]
http://www.isely.net/cgi-bin/mailman/listinfo/pvrusb2
