> Hmm, looks like the bge driver is using
> software interrupts, and I think these could
> be running at priority level 4.
> 
> Seems that the bge hardware has some
> problems, and the driver tries to reset the
> bge network hardware in an attempt to 
> recover from the bge hardware problem.
> 
> bge_poll_firmware() could be busy waiting 
> for up to one second; I suspect this could
> explain the kernel cpu time usage.
> 
> Are there any error or warning messages
> logged to /var/adm/messages when the
> system starts consuming kernel cpu time?
> 
> 
> Maybe the hang can be avoided when the
> bge nic driver isn't used and the bge interface
> is unconfigured / unplumbed?  Or the bge
> nic driver isn't allowed to load, by using
> the kernel option "-B disable-bge=true" ?

I started at the end, with -B disable-bge=true. The network applet still shows
bge0, but it doesn't try to configure it. 'ifconfig bge0 unplumb' reports that
bge0 is not an interface, so the kernel option seems to have worked. lockstat,
though, still shows 98% of the time in i86_mwait, i.e. in a 'sane' state.

I checked /var/adm/messages, but it is very long and I don't know what to look
for. I searched for 'excess' and 'consum', but neither had any hits.
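One rough way to shrink the log is to keep only the lines that mention bge or
carry a warning-or-worse severity in their "[ID <n> <facility>.<level>]" tag,
the layout visible in the excerpt below. This is a minimal sketch, assuming a
Python 3 interpreter is available on the box; the file name scan_messages.py
and the set of "interesting" levels are my own hypothetical choices, not
anything Solaris defines.

```python
import re
import sys

# Solaris syslog entries typically look like:
#   Aug  8 22:05:47 host ip: [ID 856290 kern.notice] message text
# We pull the facility and level out of the "[ID ...]" tag.
TAG = re.compile(r"\[ID \d+ (?P<facility>\w+)\.(?P<level>\w+)\]")

# Assumption: info/notice entries are mostly noise; these levels are not.
INTERESTING = {"warning", "warn", "err", "error", "crit", "alert", "emerg"}


def scan(lines, keyword="bge"):
    """Yield lines that mention `keyword` or have a warning-or-worse level."""
    for line in lines:
        m = TAG.search(line)
        level = m.group("level") if m else None
        if keyword in line or level in INTERESTING:
            yield line.rstrip("\n")


# Only touch the real log when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for hit in scan(f):
            print(hit)
```

Usage would be something like "python3 scan_messages.py /var/adm/messages",
ideally right after the system starts burning kernel CPU time, so that the
newest entries are the relevant ones.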

Here is what looks strange to me, a layperson in kernel land:
Aug  8 22:05:34 OSolUwe mac: [ID 469746 kern.info] NOTICE: bge0 registered
Aug  8 22:05:34 OSolUwe pci_pci: [ID 370704 kern.info] PCI-device: pci103c,3...@e, bge0
Aug  8 22:05:34 OSolUwe genunix: [ID 936769 kern.info] bge0 is /p...@0,0/pci8086,2...@1e/pci103c,3...@e
Aug  8 22:05:46 OSolUwe genunix: [ID 408114 kern.info] /p...@0,0/pci8086,2...@1e/pci103c,3...@e (bge0) online
Aug  8 22:05:47 OSolUwe ip: [ID 856290 kern.notice] ip: joining multicasts failed (4) on bge0 - will use link layer broadcasts for multicast
Aug  8 22:05:50 OSolUwe in.ndpd[366]: [ID 169330 daemon.error] Interface bge0 has been removed from kernel. in.ndpd will no longer use it
Aug  8 22:05:54 OSolUwe genunix: [ID 408114 kern.info] /p...@0,0/pci8086,2...@1e/pci103c,3...@e (bge0) online

At least I can confirm that the system now keeps running normally, so the
kernel option has at least suppressed the symptoms.

What next?
-- 
This message posted from opensolaris.org
_______________________________________________
opensolaris-help mailing list
opensolaris-help@opensolaris.org
