Re: locking against myself panic (cprng_strongreseed, filt_rndread)
> Date: Fri, 10 Nov 2017 13:58:51 +0100 > From: Edgar Fuß> > Short: Your patch seems to work. Pullup request submitted: https://releng.netbsd.org/cgi-bin/req-6.cgi?show=1512 > Long: I don't quite understand the behaviour, but at least, it doesn't panic. > > I wrote a test programm (my first encounter with kqueue, so probably wrong) > to exercise EVFILT_READ. > I created a named pipe and it did approximately what I expected (on start, > it hung in fiford, which I didn't expect). > I wanted to test in on /dev/urandom, it read 512 bytes, then timed out > (program wrote dots) and paniced (same traceback as originally) after > about 15 of them. > On /dev/random, it read 512 bytes, wrote dots and didn't panic (at least > not for 20 seconds). On a second run on /dev/random, it read 512 bytes > and paniced immediately after. There are multiple compounding bugs here. Here's how it works: 1. - When you register an EVFILT_READ event for /dev/random, it draws a PRNG seed from the entropy pool (`depleting it'), and records that _if_ entropy is ever added to the pool then the waiter should be notified. - Same for /dev/urandom, except it doesn't draw a new seed if there already was a per-CPU PRNG initialized. (Note that the caller blocks if there already is entropy in the pool -- the event only waits for _adding_ entropy to the pool. This is an independent bug that was also fixed in netbsd-7.) 2. When someone later adds entropy to the pool (e.g., uvm fault, or network packet arrival, or disk seek, or `echo htthththttthtthh > /dev/random'), cprng_strong_reseed gets called. 3. cprng_strong_reseed does a locking dance with a pair of cyclic mutexes (also gone in netbsd-7). If it wins the dance, it calls cprng_strong_doreseed with c->reseed.mtx and c->mtx held. 4. cprng_strong_doreseed calls selnotify, which notifies all the kqueue events by... 5. ...calling filt_rndread, which calls cprng_strong_ready, which acquires c->mtx again --> panic. The patch changes cprng_strong_ready so that it requires the caller to hold c->mtx (which was necessary anyway for the caller to make sensible decisions on the basis of invariants preserved by c->mtx), and enables two ways to call filt_rndread: - with c->mtx not held, indicated by having NOTE_SUBMIT clear, when the waiter is registering the kevent and about to sleep; and - with c->mtx held, indicated by having NOTE_SUBMIT set, when the event happens. (This kind of crazy NOTE_SUBMIT protocol turns out to be the standard way to write kqueue filters. Go figure.) > With the patch, it read behaved identically on the first run; on the second, > it read about twenty 512-byte chunks and continued writing dots (if I > remember correctly). In any case, I couldn't make it panic. Right -- all of the _other_ bugs in netbsd-6 /dev/random are still there. All that the patch changed is the instapanic on adding entropy to the pool when someone is waiting with kqueue to read from /dev/u?random.
Re: locking against myself panic (cprng_strongreseed, filt_rndread)
Short: Your patch seems to work. Long: I don't quite understand the behaviour, but at least, it doesn't panic. I wrote a test programm (my first encounter with kqueue, so probably wrong) to exercise EVFILT_READ. I created a named pipe and it did approximately what I expected (on start, it hung in fiford, which I didn't expect). I wanted to test in on /dev/urandom, it read 512 bytes, then timed out (program wrote dots) and paniced (same traceback as originally) after about 15 of them. On /dev/random, it read 512 bytes, wrote dots and didn't panic (at least not for 20 seconds). On a second run on /dev/random, it read 512 bytes and paniced immediately after. With the patch, it read behaved identically on the first run; on the second, it read about twenty 512-byte chunks and continued writing dots (if I remember correctly). In any case, I couldn't make it panic. #include #include #include #include #include #include int main(int argc, char *argv[]) { int fd; int kq; struct kevent ev; int nev; const struct timespec tout = { 1, 0 }; int i; char buf[1024]; if (argc != 2) errx(1, "argc"); if ((fd = open(argv[1], O_RDONLY)) == -1) err(1, "open %s", argv[1]); if ((kq = kqueue()) == -1) err(1, "kqueue"); EV_SET(, fd, EVFILT_READ, EV_ADD | EV_ENABLE | EV_CLEAR, 0, 0, 0); if (kevent(kq, , 1, NULL, 0, ) == -1) err(1, "kevent1"); for (;;) { if ((nev = kevent(kq, NULL, 0, , 1, )) == -1) err(1, "kevent2"); if (nev == 0) { fputc('.', stdout); fflush(stdout); continue; } i = ev.data; printf("%d", i); fflush(stdout); if (i > sizeof buf) i = sizeof buf; if (read(fd, buf, i) == -1) err(1, "read"); } (void)close(kq); (void)close(fd); }
Re: locking against myself panic (cprng_strongreseed, filt_rndread)
> Date: Wed, 8 Nov 2017 11:22:26 +0100 > From: Edgar Fuß> > > Not surprising: cprng locking was completely hosed in netbsd-6 until > > it got rewritten for netbsd-7. > So I can expect all of my servers to panic at any time? > Can I mitigate the probability of the panic? If you can guarantee either (a) you have a hardware RNG, or (b) the system is updating and using the /var/db/entropy file, then you can safely replace /dev/random by a symlink to /dev/urandom, and that should avoid the panic for certain use patterns. Specifically, processes that first try to read from /dev/random, and then wait with select/poll/kqueue only if it returns EAGAIN, will avoid triggering that code path by reading exclusively from (what is actually) /dev/urandom, which never returns EAGAIN. Processes that start by select/poll/kqueue, however, may still trigger the bad code path. I threw together a patch, attached, that _might_ fix this particular code path, but (a) I don't have time to test or even compile-test it at the moment, and (b) there are lots of other things wrong with the netbsd-6 cprng stuff, so don't get too excited. If you'd like to test it, we can see about applying it as a `pullup' (not that it has any connection to the changes in netbsd-7 that were a near-total rewrite). diff --git a/sys/dev/rndpseudo.c b/sys/dev/rndpseudo.c index 5501263020d5..68eeaa700d3b 100644 --- a/sys/dev/rndpseudo.c +++ b/sys/dev/rndpseudo.c @@ -673,13 +673,13 @@ rnd_poll(struct file *fp, int events) } } + mutex_enter(>cprng->mtx); if (cprng_strong_ready(ctx->cprng)) { revents |= events & (POLLIN | POLLRDNORM); } else { - mutex_enter(>cprng->mtx); selrecord(curlwp, >cprng->selq); - mutex_exit(>cprng->mtx); } + mutex_exit(>cprng->mtx); return (revents); } @@ -731,12 +731,24 @@ static int filt_rndread(struct knote *kn, long hint) { cprng_strong_t *c = kn->kn_hook; + int ret; + if (hint & NOTE_SUBMIT) + KASSERT(mutex_owned(>mtx)); + else + mutex_enter(>mtx); if (cprng_strong_ready(c)) { kn->kn_data = RND_TEMP_BUFFER_SIZE; - return 1; + ret = 1; + } else { + ret = 0; } - return 0; + if (hint & NOTE_SUBMIT) + KASSERT(mutex_owned(>mtx)); + else + mutex_exit(>mtx); + + return ret; } static const struct filterops rnd_seltrue_filtops = diff --git a/sys/kern/subr_cprng.c b/sys/kern/subr_cprng.c index 168a83066844..cdb382b84c81 100644 --- a/sys/kern/subr_cprng.c +++ b/sys/kern/subr_cprng.c @@ -95,7 +95,7 @@ cprng_strong_doreseed(cprng_strong_t *const c) if (c->flags & CPRNG_USE_CV) { cv_broadcast(>cv); } - selnotify(>selq, 0, 0); + selnotify(>selq, 0, NOTE_SUBMIT); } static void @@ -397,7 +397,7 @@ cprng_strong_setflags(cprng_strong_t *const c, int flags) if (c->flags & CPRNG_USE_CV) { cv_broadcast(>cv); } - selnotify(>selq, 0, 0); + selnotify(>selq, 0, NOTE_SUBMIT); } } c->flags = flags; diff --git a/sys/sys/cprng.h b/sys/sys/cprng.h index 984a06fb1b11..278ca3ab2885 100644 --- a/sys/sys/cprng.h +++ b/sys/sys/cprng.h @@ -121,12 +121,11 @@ static inline int cprng_strong_ready(cprng_strong_t *c) { int ret = 0; - - mutex_enter(>mtx); + + KASSERT(mutex_owned(>mtx)); if (c->drbg.reseed_counter < NIST_CTR_DRBG_RESEED_INTERVAL) { ret = 1; } - mutex_exit(>mtx); return ret; }
Re: locking against myself panic (cprng_strongreseed, filt_rndread)
> Not surprising: cprng locking was completely hosed in netbsd-6 until > it got rewritten for netbsd-7. So I can expect all of my servers to panic at any time? Can I mitigate the probability of the panic?
Re: locking against myself panic (cprng_strongreseed, filt_rndread)
> Date: Tue, 7 Nov 2017 22:58:39 +0100 > From: Edgar Fuß> > I got the following panic on a mostly idle 6.1/amd64: > > Mutex error: mutex_vector_enter: locking against myself > > and then the machine got stuck. It's a development server, so I can > leave it stuck until tomorrow in case someone wants me to examine > anything in particular. Also, no need to keep it around for diagnostics; it's clear what the lock recursion is: cprng_strong_reseed acquires c->mtx and calls selnotify, which calls filt_rndread, which calls cprng_strong_ready, which tries to recursively acquire c->mtx.
Re: locking against myself panic (cprng_strongreseed, filt_rndread)
> Date: Tue, 7 Nov 2017 22:58:39 +0100 > From: Edgar =?iso-8859-1?B?RnXf?=> > I got the following panic on a mostly idle 6.1/amd64: > > Mutex error: mutex_vector_enter: locking against myself > [...] > mutex_vector_enter() at netbsd:mutex_vector_enter+0x3df > filt_rndread() at netbsd:filt_rndread+0x16 > knote() at netbsd:knote+0x33 > selnotify() at netbsd:selnotify+0x24 > cprng_strong_doreseed() at netbsd:cprng_strong_doreseed+0xbd > cprng_strong_reseed() at netbsd:cprng_strong_reseed+0x2d > rnd_wakeup_readers() at netbsd:rnd_wakeup_readers+0x171 > callout_softclock() at netbsd:callout_softclock+0x174 Not surprising: cprng locking was completely hosed in netbsd-6 until it got rewritten for netbsd-7. If this is particularly important, we can maybe find a way to pull up a simpler change to netbsd-6, or pull up all of the netbsd-7 entropy subsystem wholesale, but it may not be easy either way.