Re: vfs_shutdown would like to do polled I/O at least on panic
> Date: Mon, 11 May 2015 16:54:54 +0200
> From: Mike Belopuhov
> 
> On Fri, May 08, 2015 at 20:28 +0200, Mark Kettenis wrote:
> > > Date: Fri, 8 May 2015 20:15:58 +0200
> > > From: Mike Belopuhov
> > > 
> > > On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > > > > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > > > > kernel lock in the "cold || panicstr" case.
> > > > 
> > > > Well, it's not hard to do really, but...
> > > > 
> > > > > We might need this for
> > > > > handling interrupts during autoconf as soon as we start distributing
> > > > > interrupts over CPUs.
> > > > 
> > > > ...cold might mean that interrupts are not ready yet.  So then we might
> > > > need another flag for shutdown?
> > > 
> > > This is what I have come up with.  Chunks were taken directly from
> > > mi_switch and it seems to do the job just fine.  Right now I'm not
> > > using any additional flags and it seems to work here.  I'll resume
> > > testing on Monday, but it looks fairly complete.  Any comments?
> > 
> > Makes sense to me.  The msleep/tsleep code could be simplified to:
> > 
> > 	if (__mp_lock_held(&kernel_lock)) {
> > 		hold_count = __mp_release_all(&kernel_lock);
> > 		__mp_acquire_count(&kernel_lock, hold_count);
> > 	}
> 
> Indeed.
> 
> > I'm also wondering whether we should change __mp_release_all() to
> > simply return 0 if the current CPU does not hold the lock, such that
> > the __mp_lock_held() check isn't needed anymore.  But that's a
> > separate issue.
> 
> It would have made the __mp_release_all safer as well since it
> wouldn't require an external check.
> 
> I don't have any further input on this, diff works fine here.
> 
> OK?
ok kettenis@

> diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
> index 03308b4..6f573fc 100644
> --- sys/kern/kern_synch.c
> +++ sys/kern/kern_synch.c
> @@ -103,10 +103,13 @@ extern int safepri;
>  int
>  tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>  {
>  	struct sleep_state sls;
>  	int error, error1;
> +#ifdef MULTIPROCESSOR
> +	int hold_count;
> +#endif
>  
>  	KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
>  
>  #ifdef MULTIPROCESSOR
>  	KASSERT(timo || __mp_lock_held(&kernel_lock));
> @@ -120,10 +123,16 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>  		 * don't run any other procs or panic below,
>  		 * in case this is the idle process and already asleep.
>  		 */
>  		s = splhigh();
>  		splx(safepri);
> +#ifdef MULTIPROCESSOR
> +		if (__mp_lock_held(&kernel_lock)) {
> +			hold_count = __mp_release_all(&kernel_lock);
> +			__mp_acquire_count(&kernel_lock, hold_count);
> +		}
> +#endif
>  		splx(s);
>  		return (0);
>  	}
>  
>  	sleep_setup(&sls, ident, priority, wmesg);
> @@ -149,10 +158,13 @@ int
>  msleep(const volatile void *ident, struct mutex *mtx, int priority,
>      const char *wmesg, int timo)
>  {
>  	struct sleep_state sls;
>  	int error, error1, spl;
> +#ifdef MULTIPROCESSOR
> +	int hold_count;
> +#endif
>  
>  	KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
>  	KASSERT(mtx != NULL);
>  
>  	if (cold || panicstr) {
> @@ -163,10 +175,16 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
>  		 * in case this is the idle process and already asleep.
>  		 */
>  		spl = MUTEX_OLDIPL(mtx);
>  		MUTEX_OLDIPL(mtx) = safepri;
>  		mtx_leave(mtx);
> +#ifdef MULTIPROCESSOR
> +		if (__mp_lock_held(&kernel_lock)) {
> +			hold_count = __mp_release_all(&kernel_lock);
> +			__mp_acquire_count(&kernel_lock, hold_count);
> +		}
> +#endif
>  		if ((priority & PNORELOCK) == 0) {
>  			mtx_enter(mtx);
>  			MUTEX_OLDIPL(mtx) = spl;
>  		} else
>  			splx(spl);
> diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
> index a26fbe2..a373789 100644
> --- sys/kern/vfs_subr.c
> +++ sys/kern/vfs_subr.c
> @@ -1664,10 +1664,13 @@ int
>  vfs_syncwait(int verbose)
>  {
>  	struct buf *bp;
>  	int iter, nbusy, dcount, s;
>  	struct proc *p;
> +#ifdef MULTIPROCESSOR
> +	int hold_count;
> +#endif
>  
>  	p = curproc? curproc : &proc0;
>  	sys_sync(p, (void *)0, (register_t *)0);
>  
>  	/* Wait for sync to finish. */
> @@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
>  		}
>  		if (nbusy == 0)
>  			break;
>  		if (verbose)
>  			printf("%d ", nbusy);
> +#ifdef MULTIPROCESSOR
> +		if (__mp_lock_held(&kernel_lock))
> +			hold_count = __mp_release_all(&kernel_lock);
> +		else
> +			hold_count = 0;
> +#endif
>  		DELAY(4 * iter);
> +#ifdef MULTIPROCESSOR
> +		if (hold_count)
> +			__mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>  	}
>  
>  	return nbusy;
>  }
Re: vfs_shutdown would like to do polled I/O at least on panic
On Fri, May 08, 2015 at 20:28 +0200, Mark Kettenis wrote:
> > Date: Fri, 8 May 2015 20:15:58 +0200
> > From: Mike Belopuhov
> > 
> > On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > > > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > > > kernel lock in the "cold || panicstr" case.
> > > 
> > > Well, it's not hard to do really, but...
> > > 
> > > > We might need this for
> > > > handling interrupts during autoconf as soon as we start distributing
> > > > interrupts over CPUs.
> > > 
> > > ...cold might mean that interrupts are not ready yet.  So then we might
> > > need another flag for shutdown?
> > 
> > This is what I have come up with.  Chunks were taken directly from
> > mi_switch and it seems to do the job just fine.  Right now I'm not
> > using any additional flags and it seems to work here.  I'll resume
> > testing on Monday, but it looks fairly complete.  Any comments?
> 
> Makes sense to me.  The msleep/tsleep code could be simplified to:
> 
> 	if (__mp_lock_held(&kernel_lock)) {
> 		hold_count = __mp_release_all(&kernel_lock);
> 		__mp_acquire_count(&kernel_lock, hold_count);
> 	}

Indeed.

> I'm also wondering whether we should change __mp_release_all() to
> simply return 0 if the current CPU does not hold the lock, such that
> the __mp_lock_held() check isn't needed anymore.  But that's a
> separate issue.

It would have made the __mp_release_all safer as well since it
wouldn't require an external check.

I don't have any further input on this, diff works fine here.

OK?
diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
index 03308b4..6f573fc 100644
--- sys/kern/kern_synch.c
+++ sys/kern/kern_synch.c
@@ -103,10 +103,13 @@ extern int safepri;
 int
 tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 {
 	struct sleep_state sls;
 	int error, error1;
+#ifdef MULTIPROCESSOR
+	int hold_count;
+#endif
 
 	KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
 
 #ifdef MULTIPROCESSOR
 	KASSERT(timo || __mp_lock_held(&kernel_lock));
@@ -120,10 +123,16 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 		 * don't run any other procs or panic below,
 		 * in case this is the idle process and already asleep.
 		 */
 		s = splhigh();
 		splx(safepri);
+#ifdef MULTIPROCESSOR
+		if (__mp_lock_held(&kernel_lock)) {
+			hold_count = __mp_release_all(&kernel_lock);
+			__mp_acquire_count(&kernel_lock, hold_count);
+		}
+#endif
 		splx(s);
 		return (0);
 	}
 
 	sleep_setup(&sls, ident, priority, wmesg);
@@ -149,10 +158,13 @@ int
 msleep(const volatile void *ident, struct mutex *mtx, int priority,
     const char *wmesg, int timo)
 {
 	struct sleep_state sls;
 	int error, error1, spl;
+#ifdef MULTIPROCESSOR
+	int hold_count;
+#endif
 
 	KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
 	KASSERT(mtx != NULL);
 
 	if (cold || panicstr) {
@@ -163,10 +175,16 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
 		 * in case this is the idle process and already asleep.
 		 */
 		spl = MUTEX_OLDIPL(mtx);
 		MUTEX_OLDIPL(mtx) = safepri;
 		mtx_leave(mtx);
+#ifdef MULTIPROCESSOR
+		if (__mp_lock_held(&kernel_lock)) {
+			hold_count = __mp_release_all(&kernel_lock);
+			__mp_acquire_count(&kernel_lock, hold_count);
+		}
+#endif
 		if ((priority & PNORELOCK) == 0) {
 			mtx_enter(mtx);
 			MUTEX_OLDIPL(mtx) = spl;
 		} else
 			splx(spl);
diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
index a26fbe2..a373789 100644
--- sys/kern/vfs_subr.c
+++ sys/kern/vfs_subr.c
@@ -1664,10 +1664,13 @@ int
 vfs_syncwait(int verbose)
 {
 	struct buf *bp;
 	int iter, nbusy, dcount, s;
 	struct proc *p;
+#ifdef MULTIPROCESSOR
+	int hold_count;
+#endif
 
 	p = curproc? curproc : &proc0;
 	sys_sync(p, (void *)0, (register_t *)0);
 
 	/* Wait for sync to finish. */
@@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
 		}
 		if (nbusy == 0)
 			break;
 		if (verbose)
 			printf("%d ", nbusy);
+#ifdef MULTIPROCESSOR
+		if (__mp_lock_held(&kernel_lock))
+			hold_count = __mp_release_all(&kernel_lock);
+		else
+			hold_count = 0;
+#endif
 		DELAY(4 * iter);
+#ifdef MULTIPROCESSOR
+		if (hold_count)
+			__mp_acquire_count(&kernel_lock, hold_count);
+#endif
 	}
 
 	return nbusy;
 }
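[Editor's note: the release-all / re-acquire pattern the diff above applies to tsleep(9), msleep(9) and vfs_syncwait() can be illustrated with a small userland model of a recursive big lock. This is only a sketch: `struct big_lock`, `lock_enter()`, `lock_release_all()` and `lock_acquire_count()` are invented names mirroring the roles of `__mp_lock`, `__mp_release_all()` and `__mp_acquire_count()`, not the kernel implementation in sys/kern/kern_lock.c.]

```c
#include <assert.h>

/*
 * Toy model of a recursive "big lock" such as kernel_lock.
 * owner is -1 when unowned, otherwise a CPU number; holds is
 * the recursion depth of the owning CPU.
 */
struct big_lock {
	int owner;
	int holds;
};

/* Take the lock once, recursively if this CPU already owns it. */
static void
lock_enter(struct big_lock *l, int cpu)
{
	if (l->owner == cpu) {
		l->holds++;
		return;
	}
	/* A real implementation would spin here until owner == -1. */
	assert(l->owner == -1);
	l->owner = cpu;
	l->holds = 1;
}

/*
 * Drop the lock completely and report the recursion depth, so the
 * caller can later restore exactly that many holds.  Returning 0
 * when this CPU does not own the lock is the behavior kettenis@
 * suggested for __mp_release_all(), making the external
 * __mp_lock_held() check unnecessary.
 */
static int
lock_release_all(struct big_lock *l, int cpu)
{
	int count;

	if (l->owner != cpu)
		return 0;
	count = l->holds;
	l->holds = 0;
	l->owner = -1;
	return count;
}

/* Re-acquire with a saved recursion depth, like __mp_acquire_count(). */
static void
lock_acquire_count(struct big_lock *l, int cpu, int count)
{
	while (count-- > 0)
		lock_enter(l, cpu);
}
```

In this model the window between `lock_release_all()` and `lock_acquire_count()` is where another CPU can grab the lock and, for example, run a disk interrupt handler while the panicking CPU sleeps.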
Re: vfs_shutdown would like to do polled I/O at least on panic
> Date: Fri, 8 May 2015 20:15:58 +0200
> From: Mike Belopuhov
> 
> On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > > kernel lock in the "cold || panicstr" case.
> > 
> > Well, it's not hard to do really, but...
> > 
> > > We might need this for
> > > handling interrupts during autoconf as soon as we start distributing
> > > interrupts over CPUs.
> > 
> > ...cold might mean that interrupts are not ready yet.  So then we might
> > need another flag for shutdown?
> 
> This is what I have come up with.  Chunks were taken directly from
> mi_switch and it seems to do the job just fine.  Right now I'm not
> using any additional flags and it seems to work here.  I'll resume
> testing on Monday, but it looks fairly complete.  Any comments?

Makes sense to me.  The msleep/tsleep code could be simplified to:

	if (__mp_lock_held(&kernel_lock)) {
		hold_count = __mp_release_all(&kernel_lock);
		__mp_acquire_count(&kernel_lock, hold_count);
	}

I'm also wondering whether we should change __mp_release_all() to
simply return 0 if the current CPU does not hold the lock, such that
the __mp_lock_held() check isn't needed anymore.  But that's a
separate issue.

> diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
> index 03308b4..4089adf 100644
> --- sys/kern/kern_synch.c
> +++ sys/kern/kern_synch.c
> @@ -103,10 +103,13 @@ extern int safepri;
>  int
>  tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>  {
>  	struct sleep_state sls;
>  	int error, error1;
> +#ifdef MULTIPROCESSOR
> +	int hold_count;
> +#endif
>  
>  	KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
>  
>  #ifdef MULTIPROCESSOR
>  	KASSERT(timo || __mp_lock_held(&kernel_lock));
> @@ -120,10 +123,18 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
>  		 * don't run any other procs or panic below,
>  		 * in case this is the idle process and already asleep.
>  		 */
>  		s = splhigh();
>  		splx(safepri);
> +#ifdef MULTIPROCESSOR
> +		if (__mp_lock_held(&kernel_lock))
> +			hold_count = __mp_release_all(&kernel_lock);
> +		else
> +			hold_count = 0;
> +		if (hold_count)
> +			__mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>  		splx(s);
>  		return (0);
>  	}
>  
>  	sleep_setup(&sls, ident, priority, wmesg);
> @@ -149,10 +160,13 @@ int
>  msleep(const volatile void *ident, struct mutex *mtx, int priority,
>      const char *wmesg, int timo)
>  {
>  	struct sleep_state sls;
>  	int error, error1, spl;
> +#ifdef MULTIPROCESSOR
> +	int hold_count;
> +#endif
>  
>  	KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
>  	KASSERT(mtx != NULL);
>  
>  	if (cold || panicstr) {
> @@ -163,10 +177,18 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
>  		 * in case this is the idle process and already asleep.
>  		 */
>  		spl = MUTEX_OLDIPL(mtx);
>  		MUTEX_OLDIPL(mtx) = safepri;
>  		mtx_leave(mtx);
> +#ifdef MULTIPROCESSOR
> +		if (__mp_lock_held(&kernel_lock))
> +			hold_count = __mp_release_all(&kernel_lock);
> +		else
> +			hold_count = 0;
> +		if (hold_count)
> +			__mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>  		if ((priority & PNORELOCK) == 0) {
>  			mtx_enter(mtx);
>  			MUTEX_OLDIPL(mtx) = spl;
>  		} else
>  			splx(spl);
> diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
> index a26fbe2..a373789 100644
> --- sys/kern/vfs_subr.c
> +++ sys/kern/vfs_subr.c
> @@ -1664,10 +1664,13 @@ int
>  vfs_syncwait(int verbose)
>  {
>  	struct buf *bp;
>  	int iter, nbusy, dcount, s;
>  	struct proc *p;
> +#ifdef MULTIPROCESSOR
> +	int hold_count;
> +#endif
>  
>  	p = curproc? curproc : &proc0;
>  	sys_sync(p, (void *)0, (register_t *)0);
>  
>  	/* Wait for sync to finish. */
> @@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
>  		}
>  		if (nbusy == 0)
>  			break;
>  		if (verbose)
>  			printf("%d ", nbusy);
> +#ifdef MULTIPROCESSOR
> +		if (__mp_lock_held(&kernel_lock))
> +			hold_count = __mp_release_all(&kernel_lock);
> +		else
> +			hold_count = 0;
> +#endif
>  		DELAY(4 * iter);
> +#ifdef MULTIPROCESSOR
> +		if (hold_count)
> +			__mp_acquire_count(&kernel_lock, hold_count);
> +#endif
>  	}
>  
>  	return nbusy;
>  }
Re: vfs_shutdown would like to do polled I/O at least on panic
On Fri, May 08, 2015 at 12:34 +0200, Mike Belopuhov wrote:
> > I think tsleep(9) and msleep(9) need to release and re-acquire the
> > kernel lock in the "cold || panicstr" case.
> 
> Well, it's not hard to do really, but...
> 
> > We might need this for
> > handling interrupts during autoconf as soon as we start distributing
> > interrupts over CPUs.
> 
> ...cold might mean that interrupts are not ready yet.  So then we might
> need another flag for shutdown?

This is what I have come up with.  Chunks were taken directly from
mi_switch and it seems to do the job just fine.  Right now I'm not
using any additional flags and it seems to work here.  I'll resume
testing on Monday, but it looks fairly complete.  Any comments?

diff --git sys/kern/kern_synch.c sys/kern/kern_synch.c
index 03308b4..4089adf 100644
--- sys/kern/kern_synch.c
+++ sys/kern/kern_synch.c
@@ -103,10 +103,13 @@ extern int safepri;
 int
 tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 {
 	struct sleep_state sls;
 	int error, error1;
+#ifdef MULTIPROCESSOR
+	int hold_count;
+#endif
 
 	KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
 
 #ifdef MULTIPROCESSOR
 	KASSERT(timo || __mp_lock_held(&kernel_lock));
@@ -120,10 +123,18 @@ tsleep(const volatile void *ident, int priority, const char *wmesg, int timo)
 		 * don't run any other procs or panic below,
 		 * in case this is the idle process and already asleep.
 		 */
 		s = splhigh();
 		splx(safepri);
+#ifdef MULTIPROCESSOR
+		if (__mp_lock_held(&kernel_lock))
+			hold_count = __mp_release_all(&kernel_lock);
+		else
+			hold_count = 0;
+		if (hold_count)
+			__mp_acquire_count(&kernel_lock, hold_count);
+#endif
 		splx(s);
 		return (0);
 	}
 
 	sleep_setup(&sls, ident, priority, wmesg);
@@ -149,10 +160,13 @@ int
 msleep(const volatile void *ident, struct mutex *mtx, int priority,
     const char *wmesg, int timo)
 {
 	struct sleep_state sls;
 	int error, error1, spl;
+#ifdef MULTIPROCESSOR
+	int hold_count;
+#endif
 
 	KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
 	KASSERT(mtx != NULL);
 
 	if (cold || panicstr) {
@@ -163,10 +177,18 @@ msleep(const volatile void *ident, struct mutex *mtx, int priority,
 		 * in case this is the idle process and already asleep.
 		 */
 		spl = MUTEX_OLDIPL(mtx);
 		MUTEX_OLDIPL(mtx) = safepri;
 		mtx_leave(mtx);
+#ifdef MULTIPROCESSOR
+		if (__mp_lock_held(&kernel_lock))
+			hold_count = __mp_release_all(&kernel_lock);
+		else
+			hold_count = 0;
+		if (hold_count)
+			__mp_acquire_count(&kernel_lock, hold_count);
+#endif
 		if ((priority & PNORELOCK) == 0) {
 			mtx_enter(mtx);
 			MUTEX_OLDIPL(mtx) = spl;
 		} else
 			splx(spl);
diff --git sys/kern/vfs_subr.c sys/kern/vfs_subr.c
index a26fbe2..a373789 100644
--- sys/kern/vfs_subr.c
+++ sys/kern/vfs_subr.c
@@ -1664,10 +1664,13 @@ int
 vfs_syncwait(int verbose)
 {
 	struct buf *bp;
 	int iter, nbusy, dcount, s;
 	struct proc *p;
+#ifdef MULTIPROCESSOR
+	int hold_count;
+#endif
 
 	p = curproc? curproc : &proc0;
 	sys_sync(p, (void *)0, (register_t *)0);
 
 	/* Wait for sync to finish. */
@@ -1698,11 +1701,21 @@ vfs_syncwait(int verbose)
 		}
 		if (nbusy == 0)
 			break;
 		if (verbose)
 			printf("%d ", nbusy);
+#ifdef MULTIPROCESSOR
+		if (__mp_lock_held(&kernel_lock))
+			hold_count = __mp_release_all(&kernel_lock);
+		else
+			hold_count = 0;
+#endif
 		DELAY(4 * iter);
+#ifdef MULTIPROCESSOR
+		if (hold_count)
+			__mp_acquire_count(&kernel_lock, hold_count);
+#endif
 	}
 
 	return nbusy;
 }
Re: vfs_shutdown would like to do polled I/O at least on panic
On 8 May 2015 at 04:51, Masao Uebayashi wrote:
> By doing complex VFS shutdown operation, the system's memory image will
> be modified a lot since a panic was triggered.  I'd totally skip
> vfs_shutdown() after a panic [1], then do the best to dump a kernel core
> for analysis.
> 
> [1] OpenBSD's panic(9) sets RB_NOSYNC only when panicstr is already set
> (== recursive panic()) at this moment.

This is also a valid point of view and in fact I was surprised to see
sys_sync called from vfs_syncwait, esp. since it's doing uvm_vnp_sync.
But on the other hand it has been there for 20 years in OpenBSD and who
knows how many years before that in *BSD, and we got used to relying on
this behavior, didn't we?
Re: vfs_shutdown would like to do polled I/O at least on panic
> From: Mike Belopuhov
> Date: Fri, 8 May 2015 13:53:56 +0200
> 
> On 8 May 2015 at 12:37, Martin Pieuchot wrote:
> > On 07/05/15(Thu) 20:58, Mike Belopuhov wrote:
> >> As I've pointed out before, on panic we can be running on any
> >> CPU and our disk controller's interrupts can interrupt on the
> >> other one.  Since we'll most likely be holding a kernel lock,
> >> dealing with unlocking it might get hairy very fast.  Instead
> >> what we could do to improve the chances of a clean shutdown on
> >> panic is to instruct our disk subsystem to do polled I/O that
> >> will be run on the same CPU as the panic.
> > 
> > Did you consider executing ddb's boot commands on cpu0?  I mean doing
> > an implicit "machine ddbcpu 0" before executing any "boot" command?
> 
> But panic can't really do it.

And not all our architectures support the "machine ddbcpu" command.
Re: vfs_shutdown would like to do polled I/O at least on panic
On 8 May 2015 at 12:37, Martin Pieuchot wrote:
> On 07/05/15(Thu) 20:58, Mike Belopuhov wrote:
>> As I've pointed out before, on panic we can be running on any
>> CPU and our disk controller's interrupts can interrupt on the
>> other one.  Since we'll most likely be holding a kernel lock,
>> dealing with unlocking it might get hairy very fast.  Instead
>> what we could do to improve the chances of a clean shutdown on
>> panic is to instruct our disk subsystem to do polled I/O that
>> will be run on the same CPU as the panic.
> 
> Did you consider executing ddb's boot commands on cpu0?  I mean doing
> an implicit "machine ddbcpu 0" before executing any "boot" command?

But panic can't really do it.
Re: vfs_shutdown would like to do polled I/O at least on panic
On 07/05/15(Thu) 20:58, Mike Belopuhov wrote:
> As I've pointed out before, on panic we can be running on any
> CPU and our disk controller's interrupts can interrupt on the
> other one.  Since we'll most likely be holding a kernel lock,
> dealing with unlocking it might get hairy very fast.  Instead
> what we could do to improve the chances of a clean shutdown on
> panic is to instruct our disk subsystem to do polled I/O that
> will be run on the same CPU as the panic.

Did you consider executing ddb's boot commands on cpu0?  I mean doing
an implicit "machine ddbcpu 0" before executing any "boot" command?

> Initially I wanted to move "cold = 1" earlier in boot(), but
> after talking to Miod, it started to look like a bad idea.
> 
> Thoughts?
> 
> diff --git sys/dev/ata/ata_wdc.c sys/dev/ata/ata_wdc.c
> index 1f52488..aea9ec1 100644
> --- sys/dev/ata/ata_wdc.c
> +++ sys/dev/ata/ata_wdc.c
> @@ -199,20 +199,22 @@ wd_hibernate_io(dev_t dev, daddr_t blkno, vaddr_t addr, size_t size, int op, voi
>   */
>  int
>  wdc_ata_bio(struct ata_drive_datas *drvp, struct ata_bio *ata_bio)
>  {
>  	struct wdc_xfer *xfer;
>  	struct channel_softc *chp = drvp->chnl_softc;
>  
>  	xfer = wdc_get_xfer(WDC_NOSLEEP);
>  	if (xfer == NULL)
>  		return WDC_TRY_AGAIN;
> +	if (panicstr)
> +		ata_bio->flags |= ATA_POLL;
>  	if (ata_bio->flags & ATA_POLL)
>  		xfer->c_flags |= C_POLL;
>  	if (!(ata_bio->flags & ATA_POLL) &&
>  	    (drvp->drive_flags & (DRIVE_DMA | DRIVE_UDMA)) &&
>  	    (ata_bio->flags & ATA_SINGLE) == 0 &&
>  	    (ata_bio->bcount > 512 ||
>  	    (chp->wdc->quirks & WDC_QUIRK_NOSHORTDMA) == 0))
>  		xfer->c_flags |= C_DMA;
>  	xfer->drive = drvp->drive;
>  	xfer->cmd = ata_bio;
> diff --git sys/scsi/scsi_base.c sys/scsi/scsi_base.c
> index 9cf6b45..3afcc29 100644
> --- sys/scsi/scsi_base.c
> +++ sys/scsi/scsi_base.c
> @@ -1267,20 +1267,22 @@ scsi_report_luns(struct scsi_link *sc_link, int selectreport,
>  	return (error);
>  }
>  
>  void
>  scsi_xs_exec(struct scsi_xfer *xs)
>  {
>  	xs->error = XS_NOERROR;
>  	xs->resid = xs->datalen;
>  	xs->status = 0;
>  	CLR(xs->flags, ITSDONE);
> +	if (panicstr)
> +		SET(xs->flags, SCSI_AUTOCONF);
>  
>  #ifdef SCSIDEBUG
>  	if (xs->sc_link->flags & SDEV_DB1) {
>  		scsi_xs_show(xs);
>  		if (xs->datalen && (xs->flags & SCSI_DATA_OUT))
>  			scsi_show_mem(xs->data, min(64, xs->datalen));
>  	}
>  #endif
>  
>  	/* The adapter's scsi_cmd() is responsible for calling scsi_done(). */
Re: vfs_shutdown would like to do polled I/O at least on panic
On 8 May 2015 at 11:43, Mark Kettenis wrote:
>> Date: Thu, 7 May 2015 20:58:53 +0200
>> From: Mike Belopuhov
>> 
>> As I've pointed out before, on panic we can be running on any
>> CPU and our disk controller's interrupts can interrupt on the
>> other one.  Since we'll most likely be holding a kernel lock,
>> dealing with unlocking it might get hairy very fast.  Instead
>> what we could do to improve the chances of a clean shutdown on
>> panic is to instruct our disk subsystem to do polled I/O that
>> will be run on the same CPU as the panic.
>> 
>> Initially I wanted to move "cold = 1" earlier in boot(), but
>> after talking to Miod, it started to look like a bad idea.
>> 
>> Thoughts?
> 
> I'm not necessarily against forcing polled I/O in this case, but it
> sucks a bit that we have to add checks in fairly low-level code to
> accomplish this.

Yes, but it's the place where we decide what kind of I/O we want
to do.  The SCSI I/O path in the controller or stack never falls back
to polled I/O (unless it's a specific method, i.e. sddump).

> And I'm not sure if disk I/O is the only thing that
> we should worry about here.

What else?

> I think tsleep(9) and msleep(9) need to release and re-acquire the
> kernel lock in the "cold || panicstr" case.

Well, it's not hard to do really, but...

> We might need this for
> handling interrupts during autoconf as soon as we start distributing
> interrupts over CPUs.

...cold might mean that interrupts are not ready yet.  So then we might
need another flag for shutdown?
Re: vfs_shutdown would like to do polled I/O at least on panic
> Date: Thu, 7 May 2015 20:58:53 +0200
> From: Mike Belopuhov
> 
> As I've pointed out before, on panic we can be running on any
> CPU and our disk controller's interrupts can interrupt on the
> other one.  Since we'll most likely be holding a kernel lock,
> dealing with unlocking it might get hairy very fast.  Instead
> what we could do to improve the chances of a clean shutdown on
> panic is to instruct our disk subsystem to do polled I/O that
> will be run on the same CPU as the panic.
> 
> Initially I wanted to move "cold = 1" earlier in boot(), but
> after talking to Miod, it started to look like a bad idea.
> 
> Thoughts?

I'm not necessarily against forcing polled I/O in this case, but it
sucks a bit that we have to add checks in fairly low-level code to
accomplish this.  And I'm not sure if disk I/O is the only thing that
we should worry about here.

I think tsleep(9) and msleep(9) need to release and re-acquire the
kernel lock in the "cold || panicstr" case.  We might need this for
handling interrupts during autoconf as soon as we start distributing
interrupts over CPUs.

> diff --git sys/dev/ata/ata_wdc.c sys/dev/ata/ata_wdc.c
> index 1f52488..aea9ec1 100644
> --- sys/dev/ata/ata_wdc.c
> +++ sys/dev/ata/ata_wdc.c
> @@ -199,20 +199,22 @@ wd_hibernate_io(dev_t dev, daddr_t blkno, vaddr_t addr, size_t size, int op, voi
>   */
>  int
>  wdc_ata_bio(struct ata_drive_datas *drvp, struct ata_bio *ata_bio)
>  {
>  	struct wdc_xfer *xfer;
>  	struct channel_softc *chp = drvp->chnl_softc;
>  
>  	xfer = wdc_get_xfer(WDC_NOSLEEP);
>  	if (xfer == NULL)
>  		return WDC_TRY_AGAIN;
> +	if (panicstr)
> +		ata_bio->flags |= ATA_POLL;
>  	if (ata_bio->flags & ATA_POLL)
>  		xfer->c_flags |= C_POLL;
>  	if (!(ata_bio->flags & ATA_POLL) &&
>  	    (drvp->drive_flags & (DRIVE_DMA | DRIVE_UDMA)) &&
>  	    (ata_bio->flags & ATA_SINGLE) == 0 &&
>  	    (ata_bio->bcount > 512 ||
>  	    (chp->wdc->quirks & WDC_QUIRK_NOSHORTDMA) == 0))
>  		xfer->c_flags |= C_DMA;
>  	xfer->drive = drvp->drive;
>  	xfer->cmd = ata_bio;
> diff --git sys/scsi/scsi_base.c sys/scsi/scsi_base.c
> index 9cf6b45..3afcc29 100644
> --- sys/scsi/scsi_base.c
> +++ sys/scsi/scsi_base.c
> @@ -1267,20 +1267,22 @@ scsi_report_luns(struct scsi_link *sc_link, int selectreport,
>  	return (error);
>  }
>  
>  void
>  scsi_xs_exec(struct scsi_xfer *xs)
>  {
>  	xs->error = XS_NOERROR;
>  	xs->resid = xs->datalen;
>  	xs->status = 0;
>  	CLR(xs->flags, ITSDONE);
> +	if (panicstr)
> +		SET(xs->flags, SCSI_AUTOCONF);
>  
>  #ifdef SCSIDEBUG
>  	if (xs->sc_link->flags & SDEV_DB1) {
>  		scsi_xs_show(xs);
>  		if (xs->datalen && (xs->flags & SCSI_DATA_OUT))
>  			scsi_show_mem(xs->data, min(64, xs->datalen));
>  	}
>  #endif
>  
>  	/* The adapter's scsi_cmd() is responsible for calling scsi_done(). */
Re: vfs_shutdown would like to do polled I/O at least on panic
By doing a complex VFS shutdown operation, the system's memory image
will be modified a lot after a panic was triggered.  I'd totally skip
vfs_shutdown() after a panic [1], then do our best to dump a kernel
core for analysis.

[1] OpenBSD's panic(9) sets RB_NOSYNC only when panicstr is already set
(== recursive panic()) at this moment.
vfs_shutdown would like to do polled I/O at least on panic
As I've pointed out before, on panic we can be running on any
CPU and our disk controller's interrupts can interrupt on the
other one.  Since we'll most likely be holding a kernel lock,
dealing with unlocking it might get hairy very fast.  Instead
what we could do to improve the chances of a clean shutdown on
panic is to instruct our disk subsystem to do polled I/O that
will be run on the same CPU as the panic.

Initially I wanted to move "cold = 1" earlier in boot(), but
after talking to Miod, it started to look like a bad idea.

Thoughts?

diff --git sys/dev/ata/ata_wdc.c sys/dev/ata/ata_wdc.c
index 1f52488..aea9ec1 100644
--- sys/dev/ata/ata_wdc.c
+++ sys/dev/ata/ata_wdc.c
@@ -199,20 +199,22 @@ wd_hibernate_io(dev_t dev, daddr_t blkno, vaddr_t addr, size_t size, int op, voi
  */
 int
 wdc_ata_bio(struct ata_drive_datas *drvp, struct ata_bio *ata_bio)
 {
 	struct wdc_xfer *xfer;
 	struct channel_softc *chp = drvp->chnl_softc;
 
 	xfer = wdc_get_xfer(WDC_NOSLEEP);
 	if (xfer == NULL)
 		return WDC_TRY_AGAIN;
+	if (panicstr)
+		ata_bio->flags |= ATA_POLL;
 	if (ata_bio->flags & ATA_POLL)
 		xfer->c_flags |= C_POLL;
 	if (!(ata_bio->flags & ATA_POLL) &&
 	    (drvp->drive_flags & (DRIVE_DMA | DRIVE_UDMA)) &&
 	    (ata_bio->flags & ATA_SINGLE) == 0 &&
 	    (ata_bio->bcount > 512 ||
 	    (chp->wdc->quirks & WDC_QUIRK_NOSHORTDMA) == 0))
 		xfer->c_flags |= C_DMA;
 	xfer->drive = drvp->drive;
 	xfer->cmd = ata_bio;
diff --git sys/scsi/scsi_base.c sys/scsi/scsi_base.c
index 9cf6b45..3afcc29 100644
--- sys/scsi/scsi_base.c
+++ sys/scsi/scsi_base.c
@@ -1267,20 +1267,22 @@ scsi_report_luns(struct scsi_link *sc_link, int selectreport,
 	return (error);
 }
 
 void
 scsi_xs_exec(struct scsi_xfer *xs)
 {
 	xs->error = XS_NOERROR;
 	xs->resid = xs->datalen;
 	xs->status = 0;
 	CLR(xs->flags, ITSDONE);
+	if (panicstr)
+		SET(xs->flags, SCSI_AUTOCONF);
 
 #ifdef SCSIDEBUG
 	if (xs->sc_link->flags & SDEV_DB1) {
 		scsi_xs_show(xs);
 		if (xs->datalen && (xs->flags & SCSI_DATA_OUT))
 			scsi_show_mem(xs->data, min(64, xs->datalen));
 	}
 #endif
 
 	/* The adapter's scsi_cmd() is responsible for calling scsi_done(). */
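[Editor's note: the decision the diff above adds to wdc_ata_bio() and scsi_xs_exec() boils down to one check: once panicstr is non-NULL, force the poll flag on every request so completion is spun for on the panicking CPU instead of waiting for an interrupt that may be routed elsewhere. A minimal userland sketch of that decision, with `IO_POLL` and `io_submit_flags()` invented for illustration (they are not driver API):]

```c
#include <assert.h>
#include <stddef.h>

/* Invented flag standing in for ATA_POLL / SCSI_AUTOCONF. */
#define IO_POLL	0x01

/* In the real kernel this is set by panic(); NULL means no panic yet. */
static const char *panicstr = NULL;

/*
 * Decide the submission flags for an I/O request.  After a panic we
 * may be running on a CPU that will never see the controller's
 * interrupt, so force synchronous polling for completion.
 */
static int
io_submit_flags(int flags)
{
	if (panicstr != NULL)
		flags |= IO_POLL;
	return flags;
}
```

Before a panic the caller's flags pass through unchanged; after panicstr is set, every request polls, whether or not the caller asked for it.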