On Wed, 2006-07-12 at 13:58, Sean Hefty wrote:
> >> I was starting / stopping openSM on different systems soon before running
> >> the tests.
> >
> >Not sure I quite understand the sequencing.
> 
> I was being somewhat random, just trying to stress things.  

> How quickly will one SM take over for another after one dies?

With the default sminfo_polling_timeout of 10 seconds and the default
polling_retry_number of 4, the total handoff time should be around 40
seconds. I just ran that experiment with 2 SMs and saw about that delay.
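
For what it's worth, the arithmetic behind that estimate can be sketched as
below. This is only an illustration, not OpenSM code: the function name is
made up, the timeout is expressed in seconds to match the numbers above, and
it assumes the standby SM takes over once polling_retry_number consecutive
polls, each sminfo_polling_timeout apart, have gone unanswered.

```python
def estimated_handoff_seconds(sminfo_polling_timeout_s=10, polling_retry_number=4):
    """Rough upper estimate of the SM handoff time, in seconds.

    Hypothetical helper: assumes the standby SM gives up on the master
    after `polling_retry_number` missed polls spaced
    `sminfo_polling_timeout_s` seconds apart.
    """
    return sminfo_polling_timeout_s * polling_retry_number


# With the defaults quoted above: 10 s per poll * 4 retries
print(estimated_handoff_seconds())  # 40
```

Lowering either value shortens the handoff by the same proportion, e.g. a
5 second timeout with 4 retries gives roughly 20 seconds.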

> >Can you run with -V and send me the output ? I want to recreate this so
> >I understand what is going on.
> 
> I'm having trouble re-creating the error at the moment, but I isolated my test
> systems from our larger cluster.  I will need to reconnect to the cluster and
> see if I can cause the error again.

That's another difference. I've never run osmtest in a large subnet.

-- Hal

> - Sean

