Hi Hal,

I will answer for Yael, as she has already left the office.
The way to reproduce the "stuck" case is to run in bash:

  % while test $? = 0; do opensm -V -o; done

The symptom we see is that OpenSM sort of exits, but the process stays
active (not even defunct). There is no way to kill it. It seems like one
of the threads gets caught in the middle of an ioctl or something.
To be able to run OpenSM after this we need to reboot the machine.

We avoid it by not issuing umad_unregister and umad_close_port.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 07, 2005 4:21 PM
> To: [EMAIL PROTECTED]
> Cc: [email protected]; [EMAIL PROTECTED]
> Subject: Re: [PATCH] Opensm - exiting issues
>
> Hi Yael,
>
> On Mon, 2005-11-07 at 08:25, Yael Kalka wrote:
> > Hi Hal,
> >
> > There was a problem when running opensm with -o option, that caused
> > the opensm to always exit with segfault, due to object destruction
> > ordering. Also - there is the known issue of exiting opensm. We've
> > done some clearing to the exiting code. The following patch fixes most
> > of it.
>
> I applied this part of the patch with some cosmetic changes in
> osm_vendor_ibumad.c.
>
> > In the current code we saw that sometimes opensm gets "stuck" on exit,
> > and causes the machine to get stuck too - resulting in need for
> > rebooting. In the following patch fixes most of it.
> > We did run (in the patch) into rare cases where opensm exits with an
> > error, but at least it exits without stucking the machine...
>
> Is there a reliable way to recreate machine "stuck" ? What exactly do
> you mean by this ?
>
> All umad_unregister does is some validation, a table lookup, and issue
> the ioctl to unregister the MAD agent. Not explicitly unregistering the
> agent(s) does not cause any harm as when the fd is closed, this will
> occur as part of the cleanup.
>
> -- Hal
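
For anyone trying the reproduction loop from the top of this message, here is a slightly more verbose sketch of the same idea. The function name repro_loop and the run limit are illustrative additions, not part of OpenSM or of the original one-liner. Like the one-liner, it simply blocks if a run never returns, which is the "stuck" symptom described above.

```shell
#!/bin/bash
# Hypothetical helper: run a command repeatedly until it exits non-zero
# or until MAX_RUNS runs have completed.
# Usage: repro_loop MAX_RUNS COMMAND [ARGS...]
repro_loop() {
    local max_runs=$1
    shift
    local run=0 status=0
    while [ "$run" -lt "$max_runs" ]; do
        run=$((run + 1))
        "$@"            # if this never returns, the loop hangs here
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $run: exit status $status" >&2
            return "$status"
        fi
    done
    echo "completed $run runs without a non-zero exit" >&2
    return 0
}

# Intended invocation, with the flags used in the message above
# (the limit of 100 is an arbitrary choice):
# repro_loop 100 opensm -V -o
```

The run counter reported on failure shows how many clean exits preceded the bad one, which gives a rough idea of how often the problem occurs.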
