Hi Sasha,

On Mon, Dec 28, 2009 at 4:22 AM, Sasha Khapyorsky <[email protected]> wrote:
> Hi Ira,
>
> On 09:35 Mon 21 Dec , Sasha Khapyorsky wrote:
>>
>> An errors are response timeouts. I guess that most of them are due
>> to switches' VL15 overflow (could be verified by VL15Dropped counter
>> evaluation). Will look at this deeply.
>
> I did a couple of modifications in the code (exact log is listed below).
> In particular there are default limitation for number of outstanding MADs
> on the wire and proper tracking for failed (timedout) MADs. I tested
> this where possible. Could you re-run this? Thanks.
>
> Sasha
>
[snip...]

> commit da6aa19840cb2d37e8cd3daa3874b87657a76ddc
> Author: Sasha Khapyorsky <[email protected]>
> Date: Fri Dec 25 16:24:13 2009 +0200
>
>     tests/subnet_discover: --maxsmps (-n) option
>
>     This implements the limitation of outstanding SMPs on a wire at any
>     one time. --maxsmps=0 means - no limit.
>
>     Signed-off-by: Sasha Khapyorsky <[email protected]>
>
> diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
> index 7f8a85c..42e7aee 100644
> --- a/tests/subnet_discover.c
> +++ b/tests/subnet_discover.c
> @@ -40,6 +40,7 @@ static struct node *node_array[32 * 1024];
>  static unsigned node_count = 0;
>  static unsigned trid_cnt = 0;
>  static unsigned outstanding = 0;
> +static unsigned max_outstanding = 8;

Any reason why this default is different from the one which OpenSM uses?
Seems to me it should be the same (or less).

-- Hal

>  static unsigned timeout = 100;
>  static unsigned retries = 3;
>  static unsigned verbose = 0;

[snip...]
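
For reference, the kind of limit the commit message describes (a cap on SMPs outstanding at any one time, with 0 meaning no limit) could be sketched roughly as below. This is only an illustration of the idea, not the code from the patch: `struct send_window` and the `window_*` helpers are hypothetical names, and the actual transmit and receive calls are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical send window; these names are not from subnet_discover.c. */
struct send_window {
    unsigned outstanding;      /* requests sent, not yet answered or timed out */
    unsigned max_outstanding;  /* 0 means no limit, as with --maxsmps=0 */
};

/* Returns true if another request may be put on the wire right now. */
static bool window_can_send(const struct send_window *w)
{
    return w->max_outstanding == 0 || w->outstanding < w->max_outstanding;
}

/* Call after a request has actually been sent. */
static void window_on_send(struct send_window *w)
{
    w->outstanding++;
}

/* Call when a response arrives or the request finally times out. */
static void window_on_complete(struct send_window *w)
{
    assert(w->outstanding > 0);
    w->outstanding--;
}

/* Sends up to `pending` requests, stopping at the limit.
 * Returns how many were sent. */
static unsigned window_drain(struct send_window *w, unsigned pending)
{
    unsigned sent = 0;

    while (sent < pending && window_can_send(w)) {
        window_on_send(w);  /* a real tool would transmit an SMP here */
        sent++;
    }
    return sent;
}
```

With a limit of 8, a burst of 20 pending requests would put 8 on the wire and hold the rest until `window_on_complete()` frees a slot. Counting a timed-out request as complete matters here, since otherwise failed MADs would permanently consume slots.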
