Re: kernel condvars: how to use?
Okay, I seem to have things at least mostly working. For anyone interested, I've created a new3/ directory, parallel to the new2/ I mentioned upthread, with my current version in it. I think I've fixed most of the issues people have raised; the major exception that comes to mind is that it's still not MPSAFE. (I've commented the places I noticed that depend on the giantlock for correctness; there are doubtless others I haven't noticed.) /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: kernel condvars: how to use?
> Date: Mon, 11 Dec 2017 11:13:16 -0500 (EST) > From: Mouse> > > If read does > > > while (sc->sc_foo < sc->sc_bar) > > cv_wait(>sc_cv, >sc_lock); > > > then whoever changes sc_foo or sc_bar might test whether they changed > > from sc->sc_foo < sc->sc_bar to !(sc->sc_foo < sc->sc_bar) before > > they cv_broadcast. > > ...then you have the same test in two places, leaving them vulnerable > to one but not the other being changed. (Of course, only some changes > will break functionality, but that's a separate issue.) > > Furthermore, they and LPT_RF_WAITING are testing/describing different > things: "is anyone actually waiting?" versus "if anyone does happen to > be waiting, could this make a difference?". Correct. Your responsibility as a user of condvar(9) is to make sure that if you ever make a difference for a waiter, then you wake. That's why I advise either issuing wakeups for any changes to sc_foo/bar, or if you're going to conditionalize them to avoid unnecessary wakeups, then conditionalizing them only on changes to sc_foo/bar that transition from a state that blocks waiters to a state that doesn't block waiters. On the other side, condvar(9) handles skipping some potentially expensive logic if there isn't actually anyone waiting. You shouldn't worry about testing for the presence of waiters unless you actually observe a performance impact. That's why I advise _against_ using LPT_RF_WAITING. (One exception: It's OK to KASSERT(!cv_has_waiters(cv)) in cases when you need to be sure there are no waiters. Aside: Maybe cv_destroy should do this itself anyway.) > > In any case, neither __insn_barrier nor volatile is sufficient in the > > multiprocessor model, > > Not sufficient, true, but necessary (though, depending on the > implementations of things like C and mutexes, possibly implicitly > provided). mutex_enter/exit guarantees the appropriate memory barriers to make the mutual exclusion sections for a single lock object appear globally ordered on all CPUs, in thread context and in interrupt context.
Re: kernel condvars: how to use?
>>> If anything, you should just test rather whether you changed from a >>> state that would block a read to a state in which a read can make >>> progress. >> That's kind-of what LPT_RF_WAITING records. > What I meant is the condition that read actually loops on. [...] Yes, that's what I thought you meant. But... > If read does > while (sc->sc_foo < sc->sc_bar) > cv_wait(>sc_cv, >sc_lock); > then whoever changes sc_foo or sc_bar might test whether they changed > from sc->sc_foo < sc->sc_bar to !(sc->sc_foo < sc->sc_bar) before > they cv_broadcast. ...then you have the same test in two places, leaving them vulnerable to one but not the other being changed. (Of course, only some changes will break functionality, but that's a separate issue.) Furthermore, they and LPT_RF_WAITING are testing/describing different things: "is anyone actually waiting?" versus "if anyone does happen to be waiting, could this make a difference?". (Each one is a useful test under some circumstances; it just seems to me that, while each can be a useful test to apply there, confusing the tested-for conditions is likely to lead to muddy thinking and thence muddy code.) >> Looking at the implementation of __insn_barrier(), it looks like >> overkill: is forces _everything_ out of registers, [etc]. Using >> volatile does so only for the things for which it actually matters > Yes, that's right. On the other hand, volatile also applies to every > access to fields in question, whereas __insn_barrier applies a > barrier only in one specific place. That's why I write struct lpt_softc *sc; volatile struct lpt_softc *vsc; vsc = sc; and use vsc-> when volatile is appropriate and sc-> when not. It would be better if C provided a way to mark a sequence point as volatile-synchronizing, relaxed (as compared to the current rules) so that object accesses are synchronized not as of the next sequence point but rather as of the next _marked_ sequence point. Possibly, instead, a way to name objects and force their values to be synchronized. But we don't have either, and, with no namespace for sequence points (many of which have no explicit textual representation in the source), I'm not sure how doable the former would be. > In any case, neither __insn_barrier nor volatile is sufficient in the > multiprocessor model, Not sufficient, true, but necessary (though, depending on the implementations of things like C and mutexes, possibly implicitly provided). /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: kernel condvars: how to use?
> Date: Fri, 8 Dec 2017 14:00:40 -0500 (EST) > From: Mouse> > > If anything, you should just test rather whether you changed from a > > state that would block a read to a state in which a read can make > > progress. > > That's kind-of what LPT_RF_WAITING records. What I meant is the condition that read actually loops on. If read does while (sc->sc_foo < sc->sc_bar) cv_wait(>sc_cv, >sc_lock); then whoever changes sc_foo or sc_bar might test whether they changed from sc->sc_foo < sc->sc_bar to !(sc->sc_foo < sc->sc_bar) before they cv_broadcast. Or they might just unconditionally broadcast -- simpler and safer. LPT_RF_WAITING is an _additional_ bit of state that may or may not correspond exactly with what read is really waiting for. In your case, there's no `while (sc->sc_flags & LPT_RF_WAITING) cv_wait(...)'. It doesn't add anything; I recommend avoiding it altogether as unnecessary complexity. > Looking at the implementation of __insn_barrier(), it looks like > overkill: is forces _everything_ out of registers, stack temporaries, > and the like, into wherever it nominally lives. Using volatile does so > only for the things for which it actually matters (well, when used > correctly, at least; like anything else, volatile can be misused). Yes, that's right. On the other hand, volatile also applies to every access to fields in question, whereas __insn_barrier applies a barrier only in one specific place. In any case, neither __insn_barrier nor volatile is sufficient in the multiprocessor model, which is what all new code in NetBSD should be prepared for unless it has a good reason otherwise, usually to do with performance. > > That's helpful, but I should have phrased my question a little more > > clearly: Can you write down the states that the lpt_softc data > > structure can be in? Given a diagram of something like this: [...] > > Ah! > > sc_state is easy: if RAW is not set, the only piece of my new code that > [...] Great! Now write this down in comments in the code, and make sure the cv_wait occurs only in the state where you intend to wait, and every state transition from one on which anyone might wait is associated with a cv_broadcast. > > But from the perspective of the next human to read your code, [using > > splhigh() is] expressing a confusing message to them: `This code MUST > > exclude ALL interrupts on the current CPU!' > > There's actually another possible semantic behind it, which > unfortunately there is no way to express short of a comment (which > usually isn't used): "this code must lock against itself, and we don't > know what the highest spl level it may be called at is". Note that that means `must lock against itself _on the current CPU due to interrupt_' but not on any other CPUs, and what it further means is `I intend this API to be usable at IPL_HIGH'. For debugging tools, that may be appropriate (and if it isn't usable at IPL_HIGH anyway for some other reason, then there's no sense in splhigh). lptopen is not a debugging tool, though, and there only reason it can rely on having to exclude only interrupts on the current CPU and not other parallel invocations of lptopen is that it currently runs giant-locked. That's an assumption that new code should generally avoid making.
Re: kernel condvars: how to use?
>>> Where do you set LPT_RF_WAITING? >> Hm, now that you mention it, I think I don't. > Note that a flag like this is not normally necessary, Of course. > or even particularly helpful -- cv_broadcast already has essentially > the same optimization internally. That doesn't surprise me, but it wasn't clear from the manpage - at least not the 5.2 manpage I had on hand - so I didn't want to depend on it. (I don't like to depend on accidents of the implementation, though of course in a lot of cases I end up doing so anyway.) > If anything, you should just test rather whether you changed from a > state that would block a read to a state in which a read can make > progress. That's kind-of what LPT_RF_WAITING records. > In this case, you might test whether sc->rbptr is newly less than > sc->rbfill. Yes, that's the test for whether a read can make progress...now. But wiring that into the wakeup code is a case of keeping the same information ("when can a read progress?") in two different places and thus vulnerable to getting out of sync. LPT_RF_WAITING is basically a way for the read code to communicate that status to its paired wakeup. > But I'd just skip the test altogether and always broadcast whenever > sc->rbfill changes. I may change it to do that >>> [...], and definitely avoid volatile voodoo. >> volatile is hardly voodoo. It is required by C for data accessed by >> both the main line and the interrupt line and mutated by at least >> one of them. (I didn't explicitly say, but this is a case of "necessary but likely not sufficient". volatile is needed from a C perspective, but there may be other things, like interprocessor cache protocols, that need attention as well.) > However, we still normally use __insn_barrier to enforce ordering > where it is important in those cases where we aren't using mutexes or > splfoo for some reason. What are the semantics of __insn_barrier (ie, as an API definition - I can certainly look up the semantics of the implementation)? "man -k barrier" shows me the bus_space stuff, mn, membar_*, and a handful of pthread_* pages, nothing more. > In the multiprocessor model, volatile is basically always red > herring. Looking at the implementation of __insn_barrier(), it looks like overkill: is forces _everything_ out of registers, stack temporaries, and the like, into wherever it nominally lives. Using volatile does so only for the things for which it actually matters (well, when used correctly, at least; like anything else, volatile can be misused). >> (If the mutex implementation uses hooks into the C implementation >> such that volatile is guaranteed unnecessary, fine, but that really >> needs to be clearly spelled out in the documentation - and probably >> would be very fragile, as a new compiler, or a new compiler version, >> could break it.) It also has the problem sketched above, in that it sledgehammers _everything_ out of temporaries, rather than confining the attention to just the things that actually need it. >>> Can you write down all the states you expect this thing to be in, >>> and the state transitions that each action [...] >> I think so, yes: [...] > That's helpful, but I should have phrased my question a little more > clearly: Can you write down the states that the lpt_softc data > structure can be in? Given a diagram of something like this: [...] Ah! sc_state is easy: if RAW is not set, the only piece of my new code that should ever run is the tests in lptopen() to see if we're opening the raw device and my other new stuff in the softc is all don't care, except for the condvar and mutex set up at attach time and thus aren't really don't-care, though they should never be touched when RAW is clear.. RAW set without OPEN also set, or with any of the other bits set, is a can't-happen. When open raw (sc_state == OPEN|RAW), the semantics of the pieces are: rawbuf buffers values read from the hardware but not yet passed back to userland. Elements subscripted < rbptr or >= rbfill are don't-care. rbfill being <0 or >=LPT_RAWBUFSIZE is a can't-happen. So is rbptr being <0 or >rbfill. laststat is used to detect changed status bits. Except for its initialization, it is never touched except by the callout ticker. LPT_NO_STAT is (by intent, at least) a value that cannot match anything read from lpt_status. (The code is, admittedly, broken if unsigned int and unsigned char are the same; while permitted by C, I think that's unlikely enough to be ignored on any hardware where this driver is useful. I really should comment it and/or switch to a separate flag.) > What are all the possible classes of tuples of values that sc_state, > rbptr, rbfill, , can have? Each class is a discrete state. Sounds as though the states you're looking for are (1) Buffer empty: rbptr == rbfill (2) Something buffered: rbptr < rbfill with transitions (elaborated below) (1)->(1) or (2)->(2): lpt_raw_read_tick finds status bits unchanged. (2)->(2):
Re: kernel condvars: how to use?
> Date: Fri, 8 Dec 2017 09:04:25 -0500 (EST) > From: Mouse> > > Where do you set LPT_RF_WAITING? > > Hm, now that you mention it, I think I don't. Note that a flag like this is not normally necessary, or even particularly helpful -- cv_broadcast already has essentially the same optimization internally. If anything, you should just test rather whether you changed from a state that would block a read to a state in which a read can make progress. In this case, you might test whether sc->rbptr is newly less than sc->rbfill. But I'd just skip the test altogether and always broadcast whenever sc->rbfill changes. > > It's unclear to me why you have splhigh in lptopen. > > To prevent sc->sc_state from being changed by anything else while the > open-testing logic runs. > > > splhigh should be reserved for very special purposes that are clearly > > justified. > > What is the correct way to interlock against anyone else changing > sc_state, then? Who else changes sc_state? Whoever does so must agree with lptopen about how to exclude access. In new code, this should generally be done with a mutex that all the callers agree on, since the callers might run on any CPU in parallel. E.g., you might just use a general kmutex_t sc_lock in lpt_softc which is configured for the highest-IPL interrupt handler that needs access to it. Only in legacy code running under the giant lock is it sufficient to raise the IPL without any interprocessor locks, and only if all the other callers are also giant-locked. > > In this case, everything will run giant-locked, but if you're writing > > new code you should avoid relying on that, and definitely avoid > > volatile voodoo. > > volatile is hardly voodoo. It is required by C for data accessed by > both the main line and the interrupt line and mutated by at least one > of them. (In current implementations, the interrupt line doesn't need > it, but that's not the same as the language promising it will work.) In the uniprocessor model of interrupts, and in special cases that we know are limited to it (like CPU-bound threads synchronizing with CPU-interrupts bound on the sam CPU), that's roughly right. However, we still normally use __insn_barrier to enforce ordering where it is important in those cases where we aren't using mutexes or splfoo for some reason. In the multiprocessor model, volatile is basically always red herring. > > For instance, in lpt_raw_read_tick, I advise that you unconditionally > > acquire the lock and use it for all access to the new fields you > > added: rawbuf, rflags, rbfill, rbptr, laststat. > > Yes, I daresay I should. But, locked or not, they still need to be > accessed through volatile names, or the new values could be cached > somewhere not visible to other concurrent accesses. (If the mutex > implementation uses hooks into the C implementation such that volatile > is guaranteed unnecessary, fine, but that really needs to be clearly > spelled out in the documentation - and probably would be very fragile, > as a new compiler, or a new compiler version, could break it.) Fixed in mutex(9). > > Can you write down all the states you expect this thing to be in, and > > the state transitions that each action -- lpt_raw_read_tick, > > lpt_raw_read -- can take, and in what states lpt_raw_read must wait > > for a tick to transition to another state before it can complete? > > I think so, yes: > > (1) Idle, not open > (2) Open, no read() posted > (3) Someone inside read(), not sleeping > [...] That's helpful, but I should have phrased my question a little more clearly: Can you write down the states that the lpt_softc data structure can be in? Given a diagram of something like this: +--+---+--+-+ sc_state| OPEN | OBUSY | INIT | RAW | +--+---+--+-+ ++ rawbuf | data...| +|---|---+ \ rbptr \ rbfill\ LPT_RAWBUFSIZE What are all the possible classes of tuples of values that sc_state, rbptr, rbfill, , can have? Each class is a discrete state. In each state, what can read do, and what state does it transition to? In which states must read block? In each state, what can read_tick do, and what state does it transition to? Which state transitions does read_tick cause that may unblock read? > Back in the days of using spl*() for mutual exclusion, the only harm > from using too-high an spl was that you might block more than you need > to (and, in some cases, can lock out whatever would normally wake you > up). Based on that, I would expect the use of a too-high IPL when > creating a mutex to be a relatively benign mistake. Is that inference > wrong? Just from the perspective of the program's function, you're mostly right. From the perspective of performance, using spin locks unnecessarily
Re: kernel condvars: how to use?
> For lptclose, note that callout_stop does not wait for the callout to > stop if it was in the middle of executing, and does not destroy it. > You must use callout_halt to wait, and then use callout_destroy to > destroy it. Okay, manpage bug - 5.2's callout(9) does not mention callout_halt (though it exists in and sys/kern/kern_timeout.c), so I was not aware of it. I should probably fix the manpage. (Also, its second argument is typed as void * rather than kmutex_t *. Weird.) I think I found the actual problem. My write routine was missing a break condition, so it was infinite-looping in kernel mode with the giantlock held, thereby blocking practically everything. (I found this by keeping a trace of line numbers in a ring buffer, for dumping with ddb; once I added that tracing to the write routine it became obvious. And, once noticed, it's obvious in the code too. And, if I'd thought it through propertly, I'd've realized that my userland test program doesn't yet do reads, only writes, so, while the callout ticker was perhaps relevant, the read routine couldn't've been because it was never getting called.) I was focusing on the callout/mutex stuff because it was the piece I felt experimental about. Perhaps I should have used sleep/wakeup until I had it working that way and then switched - "first make it work, then make it better". Thus, turns out most of the condvar/mutex stuff was a red herring. But 'twas not wasted, as I fixed at least two bugs waiting to happen and have a significantly better understanding of this stuff thanks you people! And, now, when I try to read (which I eventually will need to do), it has a fighting chance of actually working. In case anyone cares, I've dropped my latest version (I just rebuilt and am about to test it) in a new2/ directory in the anonymous ftp area I mentioned upthread. Many thanks once again. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: kernel condvars: how to use?
> Where do you set LPT_RF_WAITING? Hm, now that you mention it, I think I don't. That should prevent read() from ever being woken up, but I don't think it should deadlock the whole system. > Some other general comments: > Are you sure that's the code you're using? lpt_raw_read seems to > refer to a variable s in splx(s) without defining it -- seems to have > just `spllpt()' where it should have `s = spllpt()'. I sure thought it was, but you're right, that doesn't build. And I find an editor backup file that, as a diff base, shows the mystery spllpt() and two splx(s) calls as being added. So I suspect I may have started to edit something into it and somehow got only partway into it. > It's unclear to me why you have splhigh in lptopen. To prevent sc->sc_state from being changed by anything else while the open-testing logic runs. > splhigh should be reserved for very special purposes that are clearly > justified. What is the correct way to interlock against anyone else changing sc_state, then? > For lptclose, note that callout_stop does not wait for the callout to > stop if it was in the middle of executing, and does not destroy it. > You must use callout_halt to wait, and then use callout_destroy to > destroy it. Hm, okay, I msut have misread the manpage. > In this case, everything will run giant-locked, but if you're writing > new code you should avoid relying on that, and definitely avoid > volatile voodoo. volatile is hardly voodoo. It is required by C for data accessed by both the main line and the interrupt line and mutated by at least one of them. (In current implementations, the interrupt line doesn't need it, but that's not the same as the language promising it will work.) > For instance, in lpt_raw_read_tick, I advise that you unconditionally > acquire the lock and use it for all access to the new fields you > added: rawbuf, rflags, rbfill, rbptr, laststat. Yes, I daresay I should. But, locked or not, they still need to be accessed through volatile names, or the new values could be cached somewhere not visible to other concurrent accesses. (If the mutex implementation uses hooks into the C implementation such that volatile is guaranteed unnecessary, fine, but that really needs to be clearly spelled out in the documentation - and probably would be very fragile, as a new compiler, or a new compiler version, could break it.) > Can you write down all the states you expect this thing to be in, and > the state transitions that each action -- lpt_raw_read_tick, > lpt_raw_read -- can take, and in what states lpt_raw_read must wait > for a tick to transition to another state before it can complete? I think so, yes: (1) Idle, not open (2) Open, no read() posted (3) Someone inside read(), not sleeping (4) Someone inside read(), sleeping for more data (5) Inside lpt_raw_read_tick(), during (5a) state (2) (5b) state (3) (5c) state (4) The state transitions: (1)->(2): first userland open() (2)->(3): userland read() (3)->(4): read() finds it needs to sleep (3)->(2): read() finishes (4)->(3): read() woken up by lpt_raw_read_tick() (4)->(3): a signal is posted to the process (2)->(5a): interrupt ends up calling lpt_raw_read_tick() (3)->(5b): interrupt ends up calling lpt_raw_read_tick() (4)->(5c): interrupt ends up calling lpt_raw_read_tick() (5a)->(2): lpt_raw_read_tick() returns (5b)->(3): lpt_raw_read_tick() returns (5c)->(4): lpt_raw_read_tick() returns without cv_broadcast (5c)->(3): lpt_raw_read_tick() calls cv_broadcast and returns (4)->(3): lpt_raw_read_tick() savse a new sample and calls cv_broadcast (2)->(1): last userland close() >> ...I used IPL_VM, because splvm() is the same as spllpt(), and >> because the manpage says that results in a spin mutex, which I >> thought was what I needed. > You should set the IPL to the same one that the interrupt handler > runs at. In this case, it's a callout, so it's IPL_SOFTCLOCK, as the > callout(9) man page says. Back in the days of using spl*() for mutual exclusion, the only harm from using too-high an spl was that you might block more than you need to (and, in some cases, can lock out whatever would normally wake you up). Based on that, I would expect the use of a too-high IPL when creating a mutex to be a relatively benign mistake. Is that inference wrong? >> I don't know who holds it; I'm not sure how to find out. I should >> probably go peek under the hood. > With LOCKDEBUG, if you know the lock address you can use `show lock' > in ddb. ddb's lock == kernel's kmutex_t? Or do I need to work out the address of the mutex's mtx_lock, or what? I'll experiment a bit more and see what I can find. Thanks again! /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: kernel condvars: how to use?
On 12/08/17 09:04, Nick Hudson wrote: On 12/08/17 03:26, Mouse wrote: [Brian Buhrow] 1. [...]. Mutexes that use spin locks can't be used in interrupt context. Sure you don't have that backwards? I _think_ mutex(9) says that spin mutexes are the only ones that _can_ be used from an interrupt. Brain did get it backwards. To quote mutex(9)... My brain got Brian backwards... IPL_VM, IPL_SCHED, IPL_HIGH A spin mutex will be returned. Spin mutexes provide mutual exclusion between LWPs, and between LWPs and interrupt handlers. Nick
Re: kernel condvars: how to use?
On 12/08/17 03:26, Mouse wrote: [Brian Buhrow] 1. [...]. Mutexes that use spin locks can't be used in interrupt context. Sure you don't have that backwards? I _think_ mutex(9) says that spin mutexes are the only ones that _can_ be used from an interrupt. Brain did get it backwards. To quote mutex(9)... IPL_VM, IPL_SCHED, IPL_HIGH A spin mutex will be returned. Spin mutexes provide mutual exclusion between LWPs, and between LWPs and interrupt handlers. Nick
Re: kernel condvars: how to use?
> Date: Thu, 7 Dec 2017 22:26:54 -0500 (EST) > From: Mouse> > DIAGNOSTIC and DEBUG are both on already. None of the LOCK* options > are, though - see Paul Goyette's response, below. My usual debug-all-the-things kernel configuration is: makeoptions DBG=-"-g" options DEBUG options DIAGNOSTIC options LOCKDEBUG > ftp.rodents-montreal.org:/mouse/misc/lpt/ holds the code. base/ has > the code I started with (which is stock 5.2, for these files); new/ has > my current version. lptreg.h is identical; lptvar.h and lpt.c have > changes. diffs is output from "diff -u -r base new". (All these are > relative to sys/dev/ic/.) Where do you set LPT_RF_WAITING? Some other general comments: Are you sure that's the code you're using? lpt_raw_read seems to refer to a variable s in splx(s) without defining it -- seems to have just `spllpt()' where it should have `s = spllpt()'. Of course, all that is unnecessary with a lock at the right IPL, but it makes me suspicious that I'm not looking at the code that reproduces your bug. It's unclear to me why you have splhigh in lptopen. splhigh should be reserved for very special purposes that are clearly justified. If you need to defer the first tick until after the rest of the lpt is initialized, just use callout_setfunc first, and then after it's initialized at the end of lptopen do callout_schedule. For lptclose, note that callout_stop does not wait for the callout to stop if it was in the middle of executing, and does not destroy it. You must use callout_halt to wait, and then use callout_destroy to destroy it. In this case, everything will run giant-locked, but if you're writing new code you should avoid relying on that, and definitely avoid volatile voodoo. For instance, in lpt_raw_read_tick, I advise that you unconditionally acquire the lock and use it for all access to the new fields you added: rawbuf, rflags, rbfill, rbptr, laststat. (If not, then whoever wants to remove the giant lock will have to figure out the right places to put in memory ordering barriers, and that's not a fun surprise to give someone!) Can you write down all the states you expect this thing to be in, and the state transitions that each action -- lpt_raw_read_tick, lpt_raw_read -- can take, and in what states lpt_raw_read must wait for a tick to transition to another state before it can complete? The condition variable represents exactly that set of state transitions of interest under that mutex, and it's helpful to identify what it is in order to assess whether all paths that effect those state transitions also notify the condition variable. > ...I used IPL_VM, because splvm() is the same as spllpt(), and because > the manpage says that results in a spin mutex, which I thought was what > I needed. Is there some way to tell what IPL the hardware interrupt > for the device comes in on? You should set the IPL to the same one that the interrupt handler runs at. In this case, it's a callout, so it's IPL_SOFTCLOCK, as the callout(9) man page says. (Under the hood, in this case, there actually is _no_ splsoftclock() issued by mutex_enter, but that's not important to you -- what is important is that you pass IPL_SOFTCLOCK to mutex_init so that mutex_enter works in thread or interrupt context to exclude one another.) > Or, in my case, hang? I don't know who holds it; I'm not sure how to > find out. I should probably go peek under the hood. With LOCKDEBUG, if you know the lock address you can use `show lock' in ddb.
Re: kernel condvars: how to use?
Omnibus reply here to some half-dozen of the replies; I'll try to get all the attributions right. First, a big thank you to everyone who replied. I was not expecting this many, nor this helpful, responses to something about NetBSD as old as 5.2! [me - all the double-quoted text here is mine] >> db{0}> tr >> breakpoint() at netbsd:breakpoint+0x5 >> comintr() at netbsd:comintr+0x53a >> Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xeb >> --- interrupt --- >> x86_pause() at netbsd:x86_pause >> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16 >> Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xf3 >> --- interrupt --- >> x86_pause() at netbsd:x86_pause+0x2 >> cdev_poll() at netbsd:cdev_poll+0x6d [Taylor R Campbell] > This stack trace suggests that you're in the middle of a busy-wait > loop somewhere inside your d_poll routine I didn't mention that because I knew it was false (and neglected to say anything about it - my fault). Why am I sure? Because the driver in question uses nopoll as its poll routine. :) > It may be helpful to build with all debug options enabled. DIAGNOSTIC and DEBUG are both on already. None of the LOCK* options are, though - see Paul Goyette's response, below. >> On reflection, I think I know why. Userland's syscall handler took >> the mutex in preparation for cv_wait_sig(), the interrupt happens, >> my code is called (verified with a printf), and it tries to take the >> same mutex so it can cv_broadcast(). > cv_wait_sig releases the lock while it waits. I'm positing an interrupt that strikes after taking the mutex and before enterint cv_wait_sig. (See also your other message, below.) >> I can of course provide more information if it would help, but I'm >> not sure what would be useful to mention here. > Can you provide the code you're using? Hard to guess otherwise. Heh. Fair enough. First, a brief sketch of my intent What I was/am trying to do was to take the lpt driver and give it a mode in which it uses the four output and five input bits in the control and status registers for low-data-rate parallel digital input and output. The design is that, on output, userland write()s a buffer of even length and the kernel shoves alternate bytes out the data and control ports, with no particular timing constraints (in the anticipated use case there will be only one pair of bytes per write, and writes are only occasional); for input, I have a callout that, once per hz, checks to see if the status bits have changed, and, if so, saves them in a buffer for read(). (The input data rate is low - in the anticipated application, those input bits are driven off mechanical switches operated by human actions - and only changes matter.) I actually may change this to provide, somehow, different interfaces for the control bits and the data bits; I'm not sure the "alternate bytes" paradigm is all that good a one for my anticipated use. Rather than replace the lpt driver entirely, I had it grow another flag bit in the device minor number: 0x100 indicates this `raw' mode. As for the code ftp.rodents-montreal.org:/mouse/misc/lpt/ holds the code. base/ has the code I started with (which is stock 5.2, for these files); new/ has my current version. lptreg.h is identical; lptvar.h and lpt.c have changes. diffs is output from "diff -u -r base new". (All these are relative to sys/dev/ic/.) >> So I wrote some code using a condvar and a mutex, and the system >> promptly deadlocked. [...] [Joerg Sonnenberger] > Did you set the IPL for your mutex correctly? I don't know. I tried to... > Adaptive mutexes must not be shared with interrupts. ...I used IPL_VM, because splvm() is the same as spllpt(), and because the manpage says that results in a spin mutex, which I thought was what I needed. Is there some way to tell what IPL the hardware interrupt for the device comes in on? [Taylor R Campbell again, different email] > Note that if you use a mutex in a thread _and_ an interrupt handler, > you must initialize the mutex with the ipl at which the interrupt > handler runs. That way, while you hold the lock, it will block the > interrupt handler too, which avoids the scenario you described. Oh! That's very important; it was not clear to me from the manpage that mutex_enter() implies spl*(). Yes, then, as you say, my scenario is impossible if the mutex is correctly set up. [Brian Buhrow] > 1. [...]. Mutexes that use spin locks can't be used in interrupt > context. Sure you don't have that backwards? I _think_ mutex(9) says that spin mutexes are the only ones that _can_ be used from an interrupt. > 2. Initialize your convar with cv_init(). Done. (I think.) > 4. If you run into lock contention when debugging your code, pay > careful attention to who holds the lock at the time of the panic. Or, in my case, hang? I don't know who holds it; I'm not sure how to find out. I should probably go peek under the hood. [me] >> [...my scenario outline...]
Re: kernel condvars: how to use?
On Thu, 7 Dec 2017, Brian Buhrow wrote: 4. If you run into lock contention when debugging your code, pay careful attention to who holds the lock at the time of the panic. I found times when I was locking against myself in non-obvious manners. You might well find LOCKDEBUG to be your friend here! +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++
Re: kernel condvars: how to use?
hello. I don't consider myself an expert with this stuff, but I've spent quite a bit of time converting kernel code from using spl()/splx() in 5.2 to using mutexes and convars with some success. Here are some notes that may be helpful in your work: 1. You must initialize mutexes, as Taylor noted, and pick the kind of mutex you want and the level at which you want it to run. Mutexes that use spin locks can't be used in interrupt context. If you're sharing your IPL with other drivers that may want to hold mutexes at the same level, particularly ones that might be doing some operation like copying data into or out of user space, it's good to structure your code in such a way that you won't try to run when they do. 2. Initialize your convar with cv_init(). 3. Many times, you can replace s = spl(ipl) with mutex_enter(my_mutex) and splx(s) with mutex_exit(my_mutex). It's generally considered to be a very bad idea to sleep while holding a mutex, much as it is while executing code inside an spl'd stanza. 4. If you run into lock contention when debugging your code, pay careful attention to who holds the lock at the time of the panic. I found times when I was locking against myself in non-obvious manners. 5. Read the manual pages for mutex(9), convar(9) and rwlock(9), and, when done, read them again. Then, look at working examples in the code. Eventually, it will click in your head and begin to make sense. Hope these imprecise notes are somewhat helpful. -Brian On Dec 7, 6:24pm, Mouse wrote: } Subject: kernel condvars: how to use? } I'm trying to write some kernel code, interlocking between an interrupt } (in my case, a callout()-called function) and a driver read() function. } I'm using 5.2, so if this is because of bugs that have been fixed since } then, that's useful information. (And anyone who isn't interested } because I'm on such an old version need read no further.) } } I noted that the interfaces I have historically used for this - spl*(), } sleep(), and wakeup() - are documented as deprecated in favour of } condvar(9), mutex(9), and rwlock(9). So I wrote some code using a } condvar and a mutex, and the system promptly deadlocked. I got into } ddb, which told me it was inside intr_biglock_wrapper(): } } db{0}> tr } breakpoint() at netbsd:breakpoint+0x5 } comintr() at netbsd:comintr+0x53a } Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xeb } --- interrupt --- } x86_pause() at netbsd:x86_pause } intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16 } Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xf3 } --- interrupt --- } x86_pause() at netbsd:x86_pause+0x2 } cdev_poll() at netbsd:cdev_poll+0x6d } VOP_POLL() at netbsd:VOP_POLL+0x5e } pollcommon() at netbsd:pollcommon+0x265 } sys_poll() at netbsd:sys_poll+0x5b } syscall() at netbsd:syscall+0xb9 } db{0}> } } On reflection, I think I know why. Userland's syscall handler took the } mutex in preparation for cv_wait_sig(), the interrupt happens, my code } is called (verified with a printf), and it tries to take the same mutex } so it can cv_broadcast(). Of course, the mutex is held and, because } it's held by code which can't run until the interrupt handler exits, } will never be released. Then, when a hardware interrupt hit it found } the biglock held } } Clearly, I'm doing something wrong. But I can't see what. I can't see } how to use the condvar/mutex primitives without provoking the above } failure mode. And they appear to still be the current recommended way, } based on what I could find, so I'm presumably just missing something. } Any hints what? } } I can of course provide more information if it would help, but I'm not } sure what would be useful to mention here. } } /~\ The ASCII Mouse } \ / Ribbon Campaign } X Against HTML mo...@rodents-montreal.org } / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B >-- End of excerpt from Mouse
Re: kernel condvars: how to use?
> Date: Thu, 7 Dec 2017 18:24:22 -0500 (EST) > From: Mouse> > On reflection, I think I know why. Userland's syscall handler took the > mutex in preparation for cv_wait_sig(), the interrupt happens, my code > is called (verified with a printf), and it tries to take the same mutex > so it can cv_broadcast(). Of course, the mutex is held and, because > it's held by code which can't run until the interrupt handler exits, > will never be released. Then, when a hardware interrupt hit it found > the biglock held Note that if you use a mutex in a thread _and_ an interrupt handler, you must initialize the mutex with the ipl at which the interrupt handler runs. That way, while you hold the lock, it will block the interrupt handler too, which avoids the scenario you described. It's possible that's what happened in the stack trace you showed if you passed the a too-low IPL_* to mutex_init, but still hard to tell without reference to the code.
Re: kernel condvars: how to use?
> Date: Thu, 7 Dec 2017 18:24:22 -0500 (EST) > From: Mouse> > db{0}> tr > breakpoint() at netbsd:breakpoint+0x5 > comintr() at netbsd:comintr+0x53a > Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xeb > --- interrupt --- > x86_pause() at netbsd:x86_pause > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16 > Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xf3 > --- interrupt --- > x86_pause() at netbsd:x86_pause+0x2 > cdev_poll() at netbsd:cdev_poll+0x6d This stack trace suggests that you're in the middle of a busy-wait loop somewhere inside your d_poll routine -- x86_pause is used between iterations in a busy-wait. I suspect there may be an intermediate call frame that ddb isn't picking up. It may be helpful to build with all debug options enabled. > On reflection, I think I know why. Userland's syscall handler took the > mutex in preparation for cv_wait_sig(), the interrupt happens, my code > is called (verified with a printf), and it tries to take the same mutex > so it can cv_broadcast(). Of course, the mutex is held and, because > it's held by code which can't run until the interrupt handler exits, > will never be released. Then, when a hardware interrupt hit it found > the biglock held cv_wait_sig releases the lock while it waits. > I can of course provide more information if it would help, but I'm not > sure what would be useful to mention here. Can you provide the code you're using? Hard to guess otherwise.
kernel condvars: how to use?
I'm trying to write some kernel code, interlocking between an interrupt (in my case, a callout()-called function) and a driver read() function. I'm using 5.2, so if this is because of bugs that have been fixed since then, that's useful information. (And anyone who isn't interested because I'm on such an old version need read no further.) I noted that the interfaces I have historically used for this - spl*(), sleep(), and wakeup() - are documented as deprecated in favour of condvar(9), mutex(9), and rwlock(9). So I wrote some code using a condvar and a mutex, and the system promptly deadlocked. I got into ddb, which told me it was inside intr_biglock_wrapper(): db{0}> tr breakpoint() at netbsd:breakpoint+0x5 comintr() at netbsd:comintr+0x53a Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xeb --- interrupt --- x86_pause() at netbsd:x86_pause intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16 Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xf3 --- interrupt --- x86_pause() at netbsd:x86_pause+0x2 cdev_poll() at netbsd:cdev_poll+0x6d VOP_POLL() at netbsd:VOP_POLL+0x5e pollcommon() at netbsd:pollcommon+0x265 sys_poll() at netbsd:sys_poll+0x5b syscall() at netbsd:syscall+0xb9 db{0}> On reflection, I think I know why. Userland's syscall handler took the mutex in preparation for cv_wait_sig(), the interrupt happens, my code is called (verified with a printf), and it tries to take the same mutex so it can cv_broadcast(). Of course, the mutex is held and, because it's held by code which can't run until the interrupt handler exits, will never be released. Then, when a hardware interrupt hit it found the biglock held Clearly, I'm doing something wrong. But I can't see what. I can't see how to use the condvar/mutex primitives without provoking the above failure mode. And they appear to still be the current recommended way, based on what I could find, so I'm presumably just missing something. Any hints what? I can of course provide more information if it would help, but I'm not sure what would be useful to mention here. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B