On Wed, 2006-07-12 at 13:58, Sean Hefty wrote:
> >> I was starting / stopping openSM on different systems soon before running
> >> the tests.
> >
> >Not sure I quite understand the sequencing.
>
> I was being somewhat random, just trying to stress things.
> How quickly will one SM take over for another after one dies?

With the default sminfo_polling_timeout of 10 seconds and the default
polling_retry_number of 4, the total handoff time should be around
40 seconds. I just did that experiment with 2 SMs and saw that as well.

> >Can you run with -V and send me the output ? I want to recreate this so
> >I understand what is going on.
>
> I'm having trouble re-creating the error at the moment, but I isolated my test
> systems from our larger cluster. I will need to reconnect to the cluster and
> see if I can cause the error again.

That's another difference. I've never run osmtest in a large subnet.

-- Hal

> - Sean
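
For reference, the 40-second figure is just the polling timeout multiplied by the retry count. A minimal sketch of that arithmetic follows, assuming each failed poll costs one full timeout and the standby SM takes over right after the last retry. The function and constant names are hypothetical and are not OpenSM identifiers; only the two default values come from the message.

```python
# Rough estimate of how long a standby SM waits before taking over.
# Assumption: every failed poll of the master costs one full timeout.
# Names below are illustrative only, not part of OpenSM.

DEFAULT_POLLING_TIMEOUT_S = 10  # sminfo_polling_timeout default, per the message
DEFAULT_RETRY_NUMBER = 4        # polling_retry_number default, per the message


def estimated_handoff_seconds(polling_timeout_s, retry_number):
    """Return timeout * retries, the approximate handoff delay in seconds."""
    return polling_timeout_s * retry_number


if __name__ == "__main__":
    # With the defaults above this prints 40.
    print(estimated_handoff_seconds(DEFAULT_POLLING_TIMEOUT_S, DEFAULT_RETRY_NUMBER))
```

Lowering either value should shorten the handoff proportionally, at the cost of more polling traffic or a higher chance of declaring a slow master dead.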