John Baldwin writes:
| On Tuesday 10 October 2006 08:54, Bill Moran wrote:
| > In response to Doug Ambrisko <[EMAIL PROTECTED]>:
| > > Bruno Ducrot writes:
| > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
| > > | > In response to Bruno Ducrot <[EMAIL PROTECTED]>:
| > > | > > Hi,
| > > | > >
| > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
| > > | > > >
| > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
| > > | > > > shutdown screen.
| > > | > > >
| > > | > > > A shutdown -p does the same.
| > > | > >
| > > | > > What exactly are the last few lines?
| > > | >
| > > | > (manually copied)
| > > | >
| > > | > ...
| > > | > All buffers synced.
| > > | > Uptime: 1m16s
| > > | >
| > > |
| > > | Thanks. Then this happen after print_uptime().
| > > |
| > > | I believe one of the drivers register a shutdown_final (or
| > > | shutdown_post_sync) event that hang your system. I think (though I
| > > | may be wrong) mfi may be that one.
| > > |
| > > | It would help if you can add some printf in dev/mfi/mfi.c into the
| > > | mfi_shutdown() function in order to check if that assumption
| > > | is correct.
| > >
| > > Some what related to this we have a local hack:
| > >
| > > --- sys/kern/subr_bus.c.orig Tue Jun 27 15:49:39 2006
| > > +++ sys/kern/subr_bus.c Tue Jun 27 15:49:51 2006
| > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
| > >      device_t child;
| > >
| > >      TAILQ_FOREACH(child, &dev->children, link) {
| > > +        DELAY(1000);
| > >          device_shutdown(child);
| > >      }
| >
| > This patch seems to "fix" the problem. I'm going to replace it with
| > some printfs and see if I can determine which driver is actually
| > causing the problem (hopefully it's only one).
| >
| > Am I wrong in saying that the correct solution would be to identify the
| > driver that needs more time and implementing some sort of polling
| > mechanism to ensure the hardware is ready when the driver wants to
| > shut down?
|
| Well, first let's see which driver it is. :) You might be able to just
| remove the DELAY and add a printf and see which device is printed last.
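
On the polling question quoted above: here is a rough user-space sketch of
what a bounded poll could look like in place of the fixed DELAY(1000). It is
only an illustration, not code from any driver. poll_until_ready(), the
ready_fn/delay_fn callbacks, and the fake_dev helpers are all hypothetical
names made up for this example; in a kernel the delay callback would
presumably wrap DELAY(), and the ready callback would read whatever status
register the real hardware provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns true once the device reports it is idle. */
typedef bool (*ready_fn)(void *ctx);
/* Hypothetical callback: waits about `usec` microseconds. */
typedef void (*delay_fn)(uint32_t usec);

/*
 * Poll ready() every step_usec until it returns true or until
 * timeout_usec of delay has accumulated.
 * Returns 0 if the device became ready, -1 on timeout.
 */
static int
poll_until_ready(ready_fn ready, void *ctx, delay_fn delay,
    uint32_t step_usec, uint32_t timeout_usec)
{
	uint32_t waited = 0;

	assert(ready != NULL && delay != NULL);
	if (step_usec == 0)
		step_usec = 1;	/* avoid spinning forever with a zero step */

	for (;;) {
		if (ready(ctx))
			return (0);
		if (waited >= timeout_usec)
			return (-1);
		delay(step_usec);
		waited += step_usec;
	}
}

/* Simulated device (illustration only): ready after N failed polls. */
struct fake_dev {
	int polls_until_ready;
	int polls_seen;
};

static bool
fake_dev_ready(void *ctx)
{
	struct fake_dev *d = ctx;

	d->polls_seen++;
	return (d->polls_seen > d->polls_until_ready);
}

/* Stand-in delay that only counts microseconds instead of waiting. */
static uint32_t total_delay_usec;

static void
fake_delay(uint32_t usec)
{
	total_delay_usec += usec;
}
```

The point of the bound is that a slow device gets as much time as it needs
up to the timeout, a fast one costs nothing extra, and a dead one cannot
hang the shutdown forever. Whether any of the drivers involved here expose a
usable "ready" status is a separate question.
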

As for which driver it is, I think it was in different ones. One of our
configs has the base HW + a bge NIC; the other has the base HW + 2 x 2-port
em NICs. The more NICs, the better the chance of a problem. I've removed
the hack from our kernel and I'm going to run the reboot cycle. I don't
think a printf will work: I recall trying that, and it "fixed" the problem,
which is why I put the DELAY in :-(

It could be a generic problem on any system with a CPU fast enough to beat
the HW at shutting down. I'm not sure if his system is Dempsey or
Woodcrest. We use Woodcrest and they are much faster. Other machines
might be "slow" enough that it's not a problem! We haven't seen it on
our older platforms with the same kernel and similar HW configs.

Doug A.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"