Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-05 Thread Ira Weiny
On Fri, 5 Feb 2010 07:27:05 -0500
Hal Rosenstock  wrote:

> >
> >
> > Note that 2 does not give much speed up, where 4 does.  Obviously this could
> > have to do with the fact there were 2 nodes which were bad (so if you had
> > 100's of nodes unresponsive a higher value might be worth using)
> 
> It depends on the number of unresponsive nodes being same or higher
> than number of outstanding/parallel SMPs. In a sense, the number of
> outstanding SMPs is a measure of how many unresponsive nodes one is
> willing to tolerate before slowing down/waiting for timeouts. In some
> environments, unresponsive nodes are a normal case.

Agreed, but where should we set the default?  I don't think 4 is a bad default,
and I don't think it makes the diags overly aggressive compared with OpenSM.
Sasha, I guess this is your call.

Just tell me where to set it and I will make the patch.  With the user option
it can always be changed on a run-by-run basis.

Ira

> 
> -- Hal
> 
> > but as a
> > default compromise I think 4 is good.
> >
> > Ira
> >
> >> > >
> >> > > Also, I think you are correct that we should increase OpenSM's default 
> >> > > from 4
> >> > > to 8.  For the same reason as above.  Some of our clusters have worked 
> >> > > better
> >> > > with 8 when we are having issues.  But right now we are still running 
> >> > > with 4.
> >> >
> >> > I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
> >> > I've seen a number of clusters with SMP dropping with the current
> >> > lower defaults.
> >>
> >> So OpenSM is seeing dropped packets?  With 4 SMP's on the wire?  I do see 
> >> some
> >> VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
> >> issue.  What kind of rate are you seeing?
> >>
> >> The other question is; do people regularly run the tools which are using
> >> libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
> >> are not then I would say this change would have less impact as they would 
> >> want
> >> the diags to have some priority for debugging.  The other option is to 
> >> change
> >> the patch to be a default of 2 and allow user to change it depending on 
> >> what
> >> they are trying to do.  If you think that is best I will change the patch.
> >>
> >> Ira
> >>
> >> >
> >> > -- Hal
> >> >
> >> > > Ira
> >> > >
> >> > >>
> >> > >> -- Hal
> >> > >>
> >> > >> >
> >> > >> > The first patch converts the algorithm and the second adds the 
> >> > >> > ibnd_set_max_smps_on_wire call.
> >> > >> >
> >> > >> > Let me know what you think.  Because the algorithm changed so much 
> >> > >> > testing this is a bit difficult because the order of the node 
> >> > >> > discovery is different.  However, I have done some extensive 
> >> > >> > diffing of the output of ibnetdiscover and things look good.
> >> > >> >
> >> > >> > Ira
> >> > >> >
> >> > >> > --
> >> > >> > Ira Weiny
> >> > >> > Math Programmer/Computer Scientist
> >> > >> > Lawrence Livermore National Lab
> >> > >> > 925-423-8008
> >> > >> > wei...@llnl.gov
> >> > >> > --
> >> > >> > To unsubscribe from this list: send the line "unsubscribe 
> >> > >> > linux-rdma" in
> >> > >> > the body of a message to majord...@vger.kernel.org
> >> > >> > More majordomo info at  
> >> > >> > http://vger.kernel.org/majordomo-info.html
> >> > >> >
> >> > >
> >> > >
> >> > > --
> >> > > Ira Weiny
> >> > > Math Programmer/Computer Scientist
> >> > > Lawrence Livermore National Lab
> >> > > 925-423-8008
> >> > > wei...@llnl.gov
> >> > >
> >> >
> >>
> >>
> >> --
> >> Ira Weiny
> >> Math Programmer/Computer Scientist
> >> Lawrence Livermore National Lab
> >> 925-423-8008
> >> wei...@llnl.gov
> >
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > wei...@llnl.gov
> >
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-05 Thread Hal Rosenstock
On Thu, Feb 4, 2010 at 9:18 PM, Ira Weiny  wrote:
> On Thu, 4 Feb 2010 16:13:25 -0800
> Ira Weiny  wrote:
>
>> On Thu, 4 Feb 2010 15:01:32 -0500
>> Hal Rosenstock  wrote:
>>
>> > On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny  wrote:
>> > > On Thu, 4 Feb 2010 09:19:39 -0500
>> > > Hal Rosenstock  wrote:
>> > >
>> > >> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny  wrote:
>> > >> > Sasha,
>> > >> >
>>
>> [snip]
>
> [snip]
>
>> > >>
>> > >> Is there a speedup with 4 rather than 2 ?
>> > >
>> > > There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason to 
>> > > want to
>> > > go to 4 is that if there are issues on the fabric, unresponsive nodes 
>> > > etc.; 4
>> > > will give us better parallelism to get around these issues.  I have not 
>> > > had
>> > > the chance to test this condition with the new algorithm but the original
>> > > ibnetdiscover would slow way down when there are nodes which have 
>> > > unresponsive
>> > > SMA's.  If there are only 2 outstanding this will not give us much speed 
>> > > up.
>> > > This was the main motivation I had for improving the library in this way.
>
> Ok, I found a fabric with just 2 nodes which were unresponsive...  A quick
> test shows...
>
> Original ibnetdiscover:
>
> 18:12:29 > time ./ibnetdiscover > foo
> ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
> 0,1,24,11,9)
> src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) 
> failed, skipping port
> ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
> 0,1,24,24,18,7,6)
> src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 
> 0,1,24,24,18,7,6) failed, skipping port
>
> real    0m9.073s
> user    0m0.137s
> sys     0m0.172s
>
> 18:12:43 > time ./ibnetdiscover > foo
> ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
> 0,1,24,11,9)
> src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) 
> failed, skipping port
> ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
> 0,1,24,24,18,7,6)
> src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 
> 0,1,24,24,18,7,6) failed, skipping port
>
> real    0m9.103s
> user    0m0.046s
> sys     0m0.046s
>
>
> *New* ibnetdiscover with different outstanding SMP's.
>
> 18:12:14 > time ./ibnetdiscover -o 2 > foo
> src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) 
> bad status 110; Connection timed out
> src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 
> 0x11:0) bad status 110; Connection timed out
>
> real    0m9.746s
> user    0m6.559s
> sys     0m3.156s
>
> 18:13:00 > time ./ibnetdiscover -o 4 > foo
> src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) 
> bad status 110; Connection timed out
> src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 
> 0x11:0) bad status 110; Connection timed out
>
> real    0m4.668s
> user    0m3.043s
> sys     0m1.601s
>
> 18:13:10 > time ./ibnetdiscover -o 8 > foo
> src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) 
> bad status 110; Connection timed out
> src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 
> 0x11:0) bad status 110; Connection timed out
>
> real    0m4.360s
> user    0m2.891s
> sys     0m1.451s
>
>
> Note that 2 does not give much speed up, where 4 does.  Obviously this could
> have to do with the fact there were 2 nodes which were bad (so if you had
> 100's of nodes unresponsive a higher value might be worth using)

It depends on the number of unresponsive nodes being same or higher
than number of outstanding/parallel SMPs. In a sense, the number of
outstanding SMPs is a measure of how many unresponsive nodes one is
willing to tolerate before slowing down/waiting for timeouts. In some
environments, unresponsive nodes are a normal case.

-- Hal

> but as a
> default compromise I think 4 is good.
>
> Ira
>
>> > >
>> > > Also, I think you are correct that we should increase OpenSM's default 
>> > > from 4
>> > > to 8.  For the same reason as above.  Some of our clusters have worked 
>> > > better
>> > > with 8 when we are having issues.  But right now we are still running 
>> > > with 4.
>> >
>> > I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
>> > I've seen a number of clusters with SMP dropping with the current
>> > lower defaults.
>>
>> So OpenSM is seeing dropped packets?  With 4 SMP's on the wire?  I do see 
>> some
>> VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
>> issue.  What kind of rate are you seeing?
>>
>> The other question is; do people regularly run the tools which are using
>> libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
>> are not then I would say this change would have less impact as they would 
>> want
>> the diags to have some priority for debugging.  The other option is to change
> the patch to be a default of 2 and allow user to change it depending on what
> they are trying to do.  If you think that is best I will change the patch.

Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-05 Thread Hal Rosenstock
On Thu, Feb 4, 2010 at 7:13 PM, Ira Weiny  wrote:
> On Thu, 4 Feb 2010 15:01:32 -0500
> Hal Rosenstock  wrote:
>
>> On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny  wrote:
>> > On Thu, 4 Feb 2010 09:19:39 -0500
>> > Hal Rosenstock  wrote:
>> >
>> >> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny  wrote:
>> >> > Sasha,
>> >> >
>
> [snip]
>
>> >> >
>> >> > real    0m2.249s
>> >> > user    0m1.244s
>> >> > sys     0m0.936s
>> >> >
>> >> > 14:40:59 > time ./ibnetdiscover -o 4 --node-name-map 
>> >> > /etc/opensm/ib-node-name-map -g > new
>> >> >
>> >> > real    0m2.170s
>> >> > user    0m1.160s
>> >> > sys     0m0.933s
>> >> >
>> >> > 14:41:10 > /usr/sbin/ibqueryerrors  -s 
>> >> > RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
>> >> > Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
>> >> > Errors for 0x66a00d90006fb "SW19"
>> >> >   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] 
>> >> > [RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
>> >> >       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
>> >> > 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
>> >> >
>> >> > Note that there were no additional VL15Dropped packets on the fabric.  
>> >> > I think 4 seems to be a good compromise.  I have not tested when there 
>> >> > are errors on the fabric.  (Right now things seem to be good!)
>> >>
>> >> Is this just with the SM doing light sweeping ?
>> >
>> > Yes.
>>
>> That's not a lot of SMP stress from the SM side. SMP consumers are SM,
>> diags, and the unsolicited traps.
>
> Agreed.  I hope to test this more next week.
>>
>> >
>> >>
>> >> Is there a speedup with 4 rather than 2 ?
>> >
>> > There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason to 
>> > want to
>> > go to 4 is that if there are issues on the fabric, unresponsive nodes 
>> > etc.; 4
>> > will give us better parallelism to get around these issues.  I have not had
>> > the chance to test this condition with the new algorithm but the original
>> > ibnetdiscover would slow way down when there are nodes which have 
>> > unresponsive
>> > SMA's.  If there are only 2 outstanding this will not give us much speed 
>> > up.
>> > This was the main motivation I had for improving the library in this way.
>> >
>> > Also, I think you are correct that we should increase OpenSM's default 
>> > from 4
>> > to 8.  For the same reason as above.  Some of our clusters have worked 
>> > better
>> > with 8 when we are having issues.  But right now we are still running with 
>> > 4.
>>
>> I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
>> I've seen a number of clusters with SMP dropping with the current
>> lower defaults.
>
> So OpenSM is seeing dropped packets?

OpenSM is seeing timeouts and there are VL15 drops in the subnet.

> With 4 SMP's on the wire?

Yes.

> I do see some
> VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
> issue.  What kind of rate are you seeing?

> The other question is; do people regularly run the tools which are using
> libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?

These tools are being used (at least ibnetdiscover and ibqueryerrors).

> We do.  If others
> are not then I would say this change would have less impact as they would want
> the diags to have some priority for debugging.  The other option is to change
> the patch to be a default of 2 and allow user to change it depending on what
> they are trying to do.  If you think that is best I will change the patch.

FWIW I think 2 is better until we have more exhaustive experience with
4. The other alternative would be to make it 4 and then see if people
start noticing (more) VL15 drops and possibly other issues.

-- Hal

> Ira
>
>>
>> -- Hal
>>
>> > Ira
>> >
>> >>
>> >> -- Hal
>> >>
>> >> >
>> >> > The first patch converts the algorithm and the second adds the 
>> >> > ibnd_set_max_smps_on_wire call.
>> >> >
>> >> > Let me know what you think.  Because the algorithm changed so much 
>> >> > testing this is a bit difficult because the order of the node discovery 
>> >> > is different.  However, I have done some extensive diffing of the 
>> >> > output of ibnetdiscover and things look good.
>> >> >
>> >> > Ira
>> >> >
>> >> > --
>> >> > Ira Weiny
>> >> > Math Programmer/Computer Scientist
>> >> > Lawrence Livermore National Lab
>> >> > 925-423-8008
>> >> > wei...@llnl.gov
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> >> > the body of a message to majord...@vger.kernel.org
>> >> > More majordomo info at  http://**vger.kernel.org/majordomo-info.html
>> >> >
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> >> the body of a message to majord...@vger.kernel.org
>> >> More majordomo info at  http://**vger.kernel.org/majordomo-info.html
>> >>
>> >
>> >
>> > --
>> > Ira Weiny
>> > Math Programmer/Computer Scientist
>> > Lawrence Livermore National Lab
>> > 925-423-8008
>>

Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-04 Thread Ira Weiny
On Thu, 4 Feb 2010 16:13:25 -0800
Ira Weiny  wrote:

> On Thu, 4 Feb 2010 15:01:32 -0500
> Hal Rosenstock  wrote:
> 
> > On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny  wrote:
> > > On Thu, 4 Feb 2010 09:19:39 -0500
> > > Hal Rosenstock  wrote:
> > >
> > >> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny  wrote:
> > >> > Sasha,
> > >> >
> 
> [snip]

[snip]

> > >>
> > >> Is there a speedup with 4 rather than 2 ?
> > >
> > > There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason to 
> > > want to
> > > go to 4 is that if there are issues on the fabric, unresponsive nodes 
> > > etc.; 4
> > > will give us better parallelism to get around these issues.  I have not 
> > > had
> > > the chance to test this condition with the new algorithm but the original
> > > ibnetdiscover would slow way down when there are nodes which have 
> > > unresponsive
> > > SMA's.  If there are only 2 outstanding this will not give us much speed 
> > > up.
> > > This was the main motivation I had for improving the library in this way.

OK, I found a fabric with just 2 unresponsive nodes.  A quick test shows:

Original ibnetdiscover:

18:12:29 > time ./ibnetdiscover > foo
ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
0,1,24,11,9)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) 
failed, skipping port
ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
0,1,24,24,18,7,6)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 
0,1,24,24,18,7,6) failed, skipping port

real    0m9.073s
user    0m0.137s
sys     0m0.172s

18:12:43 > time ./ibnetdiscover > foo
ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
0,1,24,11,9)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) 
failed, skipping port
ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 
0,1,24,24,18,7,6)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 
0,1,24,24,18,7,6) failed, skipping port

real    0m9.103s
user    0m0.046s
sys     0m0.046s


*New* ibnetdiscover with different numbers of outstanding SMPs:

18:12:14 > time ./ibnetdiscover -o 2 > foo
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad 
status 110; Connection timed out
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) 
bad status 110; Connection timed out

real    0m9.746s
user    0m6.559s
sys     0m3.156s

18:13:00 > time ./ibnetdiscover -o 4 > foo
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad 
status 110; Connection timed out
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) 
bad status 110; Connection timed out

real    0m4.668s
user    0m3.043s
sys     0m1.601s

18:13:10 > time ./ibnetdiscover -o 8 > foo
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad 
status 110; Connection timed out
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) 
bad status 110; Connection timed out

real    0m4.360s
user    0m2.891s
sys     0m1.451s


Note that 2 does not give much speed-up, whereas 4 does.  This could be
because only 2 nodes were bad (with hundreds of unresponsive nodes a higher
value might be worth using), but as a default compromise I think 4 is good.

Ira

> > >
> > > Also, I think you are correct that we should increase OpenSM's default 
> > > from 4
> > > to 8.  For the same reason as above.  Some of our clusters have worked 
> > > better
> > > with 8 when we are having issues.  But right now we are still running 
> > > with 4.
> > 
> > I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
> > I've seen a number of clusters with SMP dropping with the current
> > lower defaults.
> 
> So OpenSM is seeing dropped packets?  With 4 SMP's on the wire?  I do see some
> VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
> issue.  What kind of rate are you seeing?
> 
> The other question is; do people regularly run the tools which are using
> libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
> are not then I would say this change would have less impact as they would want
> the diags to have some priority for debugging.  The other option is to change
> the patch to be a default of 2 and allow user to change it depending on what
> they are trying to do.  If you think that is best I will change the patch.
> 
> Ira
> 
> > 
> > -- Hal
> > 
> > > Ira
> > >
> > >>
> > >> -- Hal
> > >>
> > >> >
> > >> > The first patch converts the algorithm and the second adds the 
> > >> > ibnd_set_max_smps_on_wire call.
> > >> >
> > >> > Let me know what you think.  Because the algorithm changed so much 
> > >> > testing this is a bit difficult because the order of the node 
> > >> > discovery is different.  However, I have done some extensive diffing 
> > >> > >> > of the output of ibnetdiscover and things look good.

Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-04 Thread Ira Weiny
On Thu, 4 Feb 2010 15:01:32 -0500
Hal Rosenstock  wrote:

> On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny  wrote:
> > On Thu, 4 Feb 2010 09:19:39 -0500
> > Hal Rosenstock  wrote:
> >
> >> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny  wrote:
> >> > Sasha,
> >> >

[snip]

> >> >
> >> > real    0m2.249s
> >> > user    0m1.244s
> >> > sys     0m0.936s
> >> >
> >> > 14:40:59 > time ./ibnetdiscover -o 4 --node-name-map 
> >> > /etc/opensm/ib-node-name-map -g > new
> >> >
> >> > real    0m2.170s
> >> > user    0m1.160s
> >> > sys     0m0.933s
> >> >
> >> > 14:41:10 > /usr/sbin/ibqueryerrors  -s 
> >> > RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
> >> > Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
> >> > Errors for 0x66a00d90006fb "SW19"
> >> >   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] 
> >> > [RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
> >> >       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
> >> > 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
> >> >
> >> > Note that there were no additional VL15Dropped packets on the fabric.  I 
> >> > think 4 seems to be a good compromise.  I have not tested when there are 
> >> > errors on the fabric.  (Right now things seem to be good!)
> >>
> >> Is this just with the SM doing light sweeping ?
> >
> > Yes.
> 
> That's not a lot of SMP stress from the SM side. SMP consumers are SM,
> diags, and the unsolicited traps.

Agreed.  I hope to test this more next week.

> 
> >
> >>
> >> Is there a speedup with 4 rather than 2 ?
> >
> > There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason to want 
> > to
> > go to 4 is that if there are issues on the fabric, unresponsive nodes etc.; 
> > 4
> > will give us better parallelism to get around these issues.  I have not had
> > the chance to test this condition with the new algorithm but the original
> > ibnetdiscover would slow way down when there are nodes which have 
> > unresponsive
> > SMA's.  If there are only 2 outstanding this will not give us much speed up.
> > This was the main motivation I had for improving the library in this way.
> >
> > Also, I think you are correct that we should increase OpenSM's default from 
> > 4
> > to 8.  For the same reason as above.  Some of our clusters have worked 
> > better
> > with 8 when we are having issues.  But right now we are still running with 
> > 4.
> 
> I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
> I've seen a number of clusters with SMP dropping with the current
> lower defaults.

So OpenSM is seeing dropped packets?  With 4 SMPs on the wire?  I do see some
VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
issue.  What kind of rate are you seeing?

The other question is: do people regularly run the tools which use
libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
do not, this change would have less impact, as they would want the diags to
have some priority for debugging.  The other option is to change the patch
to default to 2 and allow the user to change it depending on what they are
trying to do.  If you think that is best I will change the patch.

Ira

> 
> -- Hal
> 
> > Ira
> >
> >>
> >> -- Hal
> >>
> >> >
> >> > The first patch converts the algorithm and the second adds the 
> >> > ibnd_set_max_smps_on_wire call.
> >> >
> >> > Let me know what you think.  Because the algorithm changed so much 
> >> > testing this is a bit difficult because the order of the node discovery 
> >> > is different.  However, I have done some extensive diffing of the output 
> >> > of ibnetdiscover and things look good.
> >> >
> >> > Ira
> >> >
> >> > --
> >> > Ira Weiny
> >> > Math Programmer/Computer Scientist
> >> > Lawrence Livermore National Lab
> >> > 925-423-8008
> >> > wei...@llnl.gov
> >
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > wei...@llnl.gov
> >
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-04 Thread Hal Rosenstock
On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny  wrote:
> On Thu, 4 Feb 2010 09:19:39 -0500
> Hal Rosenstock  wrote:
>
>> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny  wrote:
>> > Sasha,
>> >
>> > Following up on our thread regarding having multiple outstanding SMP's in 
>> > libibnetdisc.
>> >
>> > These 2 patches implement that as well as add a function to set the max 
>> > outstanding the lib will use.
>> >
>> > I left the default here to be 4.  On a large cluster there seems to be 
>> > some variance with using 8 or 12.  Sometimes I get a speed up over 4 and 
>> > other times I don't see any.  I think it has to do with the traffic on the 
>> > fabric at any particular time.
>> >
>> > For example here are some runs I just did on Hyperion.
>> >
>> > 14:31:55 > /usr/sbin/ibqueryerrors  -s 
>> > RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
>> > Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
>> > Errors for 0x66a00d90006fb "SW19"
>> >   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 14562048] 
>> > [RcvData == 14563872] [XmtPkts == 202255] [RcvPkts == 202276]
>> >       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
>> > 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
>> >
>> > 14:32:02 > time ./ibnetdiscover -o 8 --node-name-map 
>> > /etc/opensm/ib-node-name-map -g > new
>> >
>> > real    0m2.210s
>> > user    0m1.251s
>> > sys     0m0.869s
>> >
>> > 14:40:36 > time ./ibnetdiscover -o 4 --node-name-map 
>> > /etc/opensm/ib-node-name-map -g > new
>> >
>> > real    0m3.385s
>> > user    0m1.888s
>> > sys     0m1.448s
>> >
>> > 14:40:46 > time ./ibnetdiscover -o 4 --node-name-map 
>> > /etc/opensm/ib-node-name-map -g > new
>> >
>> > real    0m2.211s
>> > user    0m1.165s
>> > sys     0m0.951s
>> >
>> > 14:40:51 > time ./ibnetdiscover -o 8 --node-name-map 
>> > /etc/opensm/ib-node-name-map -g > new
>> >
>> > real    0m2.249s
>> > user    0m1.244s
>> > sys     0m0.936s
>> >
>> > 14:40:59 > time ./ibnetdiscover -o 4 --node-name-map 
>> > /etc/opensm/ib-node-name-map -g > new
>> >
>> > real    0m2.170s
>> > user    0m1.160s
>> > sys     0m0.933s
>> >
>> > 14:41:10 > /usr/sbin/ibqueryerrors  -s 
>> > RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
>> > Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
>> > Errors for 0x66a00d90006fb "SW19"
>> >   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] 
>> > [RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
>> >       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
>> > 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
>> >
>> > Note that there were no additional VL15Dropped packets on the fabric.  I 
>> > think 4 seems to be a good compromise.  I have not tested when there are 
>> > errors on the fabric.  (Right now things seem to be good!)
>>
>> Is this just with the SM doing light sweeping ?
>
> Yes.

That's not a lot of SMP stress from the SM side. SMP consumers are SM,
diags, and the unsolicited traps.

>
>>
>> Is there a speedup with 4 rather than 2 ?
>
> There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason to want to
> go to 4 is that if there are issues on the fabric, unresponsive nodes etc.; 4
> will give us better parallelism to get around these issues.  I have not had
> the chance to test this condition with the new algorithm but the original
> ibnetdiscover would slow way down when there are nodes which have unresponsive
> SMA's.  If there are only 2 outstanding this will not give us much speed up.
> This was the main motivation I had for improving the library in this way.
>
> Also, I think you are correct that we should increase OpenSM's default from 4
> to 8.  For the same reason as above.  Some of our clusters have worked better
> with 8 when we are having issues.  But right now we are still running with 4.

I'm concerned about increasing ibnetdiscover's default to 4 rather than 2.
I've seen a number of clusters with SMP drops even at the current lower
defaults.

-- Hal

> Ira
>
>>
>> -- Hal
>>
>> >
>> > The first patch converts the algorithm and the second adds the 
>> > ibnd_set_max_smps_on_wire call.
>> >
>> > Let me know what you think.  Because the algorithm changed so much testing 
>> > this is a bit difficult because the order of the node discovery is 
>> > different.  However, I have done some extensive diffing of the output of 
>> > ibnetdiscover and things look good.
>> >
>> > Ira
>> >
>> > --
>> > Ira Weiny
>> > Math Programmer/Computer Scientist
>> > Lawrence Livermore National Lab
>> > 925-423-8008
>> > wei...@llnl.gov
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> > the body of a message to majord...@vger.kernel.org
>> > More majordomo info at  http://*vger.kernel.org/majordomo-info.html
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://*vger.kernel

Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-04 Thread Ira Weiny
On Thu, 4 Feb 2010 09:19:39 -0500
Hal Rosenstock  wrote:

> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny  wrote:
> > Sasha,
> >
> > Following up on our thread regarding having multiple outstanding SMP's in 
> > libibnetdisc.
> >
> > These 2 patches implement that as well as add a function to set the max 
> > outstanding the lib will use.
> >
> > I left the default here to be 4.  On a large cluster there seems to be some 
> > variance with using 8 or 12.  Sometimes I get a speed up over 4 and other 
> > times I don't see any.  I think it has to do with the traffic on the fabric 
> > at any particular time.
> >
> > For example here are some runs I just did on Hyperion.
> >
> > 14:31:55 > /usr/sbin/ibqueryerrors  -s 
> > RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
> > Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
> > Errors for 0x66a00d90006fb "SW19"
> >   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 14562048] 
> > [RcvData == 14563872] [XmtPkts == 202255] [RcvPkts == 202276]
> >       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
> > 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
> >
> > 14:32:02 > time ./ibnetdiscover -o 8 --node-name-map 
> > /etc/opensm/ib-node-name-map -g > new
> >
> > real    0m2.210s
> > user    0m1.251s
> > sys     0m0.869s
> >
> > 14:40:36 > time ./ibnetdiscover -o 4 --node-name-map 
> > /etc/opensm/ib-node-name-map -g > new
> >
> > real    0m3.385s
> > user    0m1.888s
> > sys     0m1.448s
> >
> > 14:40:46 > time ./ibnetdiscover -o 4 --node-name-map 
> > /etc/opensm/ib-node-name-map -g > new
> >
> > real    0m2.211s
> > user    0m1.165s
> > sys     0m0.951s
> >
> > 14:40:51 > time ./ibnetdiscover -o 8 --node-name-map 
> > /etc/opensm/ib-node-name-map -g > new
> >
> > real    0m2.249s
> > user    0m1.244s
> > sys     0m0.936s
> >
> > 14:40:59 > time ./ibnetdiscover -o 4 --node-name-map 
> > /etc/opensm/ib-node-name-map -g > new
> >
> > real    0m2.170s
> > user    0m1.160s
> > sys     0m0.933s
> >
> > 14:41:10 > /usr/sbin/ibqueryerrors  -s 
> > RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
> > Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
> > Errors for 0x66a00d90006fb "SW19"
> >   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] 
> > [RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
> >       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
> > 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
> >
> > Note that there were no additional VL15Dropped packets on the fabric.  I 
> > think 4 seems to be a good compromise.  I have not tested when there are 
> > errors on the fabric.  (Right now things seem to be good!)
> 
> Is this just with the SM doing light sweeping?

Yes.

> 
> Is there a speedup with 4 rather than 2?

There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason for wanting
4 is that when there are issues on the fabric (unresponsive nodes, etc.), 4
outstanding SMPs give us better parallelism to work around them.  I have not had
the chance to test that condition with the new algorithm, but the original
ibnetdiscover would slow way down when nodes had unresponsive SMAs, and with
only 2 outstanding SMPs the gain would be small.  This was my main motivation
for improving the library in this way.
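
To make that intuition concrete, here is a toy scheduling model (hypothetical, not libibnetdisc code): each unresponsive node holds one of the in-flight slots for a full timeout, so the window size bounds how many timeouts can overlap with useful queries.

```python
import heapq

def discovery_makespan(n_nodes, n_unresponsive, window,
                       query_t=1.0, timeout_t=100.0):
    """Greedy model: `window` SMPs may be on the wire at once; a responsive
    node answers in query_t, an unresponsive one holds its slot for timeout_t.
    Numbers are illustrative, not measured libibnetdisc behavior."""
    durations = [timeout_t] * n_unresponsive + \
                [query_t] * (n_nodes - n_unresponsive)
    # Min-heap of completion times, one entry per in-flight slot.
    slots = [0.0] * window
    heapq.heapify(slots)
    for d in durations:
        start = heapq.heappop(slots)      # earliest slot to free up
        heapq.heappush(slots, start + d)
    return max(slots)

if __name__ == "__main__":
    # 200 nodes, 2 dead SMAs: with a window of 4 both timeouts overlap
    # with useful queries instead of serializing the sweep.
    for w in (1, 2, 4, 8):
        print(w, discovery_makespan(200, 2, w))
```

With these made-up constants the sweep time collapses as soon as the window exceeds the number of dead SMAs, which matches the argument for 4 over 2.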

Also, I think you are correct that we should increase OpenSM's default from 4
to 8, for the same reason as above.  Some of our clusters have worked better
with 8 when we are having issues, but right now we are still running with 4.

Ira

> 
> -- Hal
> 
> >
> > The first patch converts the algorithm and the second adds the 
> > ibnd_set_max_smps_on_wire call.
> >
> > Let me know what you think.  Because the algorithm changed so much, testing 
> > this is a bit difficult: the order of node discovery is different.  However, 
> > I have done some extensive diffing of the ibnetdiscover output and things 
> > look good.
> >
> > Ira
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > wei...@llnl.gov
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-04 Thread Hal Rosenstock
On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny  wrote:
> Sasha,
>
> Following up on our thread regarding having multiple outstanding SMP's in 
> libibnetdisc.
>
> These 2 patches implement that as well as add a function to set the max 
> outstanding the lib will use.
>
> I left the default here to be 4.  On a large cluster there seems to be some 
> variance with using 8 or 12.  Sometimes I get a speed up over 4 and other 
> times I don't see any.  I think it has to do with the traffic on the fabric 
> at any particular time.
>
> For example here are some runs I just did on Hyperion.
>
> 14:31:55 > /usr/sbin/ibqueryerrors  -s 
> RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
> Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
> Errors for 0x66a00d90006fb "SW19"
>   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 14562048] 
> [RcvData == 14563872] [XmtPkts == 202255] [RcvPkts == 202276]
>       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
> 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
>
> 14:32:02 > time ./ibnetdiscover -o 8 --node-name-map 
> /etc/opensm/ib-node-name-map -g > new
>
> real    0m2.210s
> user    0m1.251s
> sys     0m0.869s
>
> 14:40:36 > time ./ibnetdiscover -o 4 --node-name-map 
> /etc/opensm/ib-node-name-map -g > new
>
> real    0m3.385s
> user    0m1.888s
> sys     0m1.448s
>
> 14:40:46 > time ./ibnetdiscover -o 4 --node-name-map 
> /etc/opensm/ib-node-name-map -g > new
>
> real    0m2.211s
> user    0m1.165s
> sys     0m0.951s
>
> 14:40:51 > time ./ibnetdiscover -o 8 --node-name-map 
> /etc/opensm/ib-node-name-map -g > new
>
> real    0m2.249s
> user    0m1.244s
> sys     0m0.936s
>
> 14:40:59 > time ./ibnetdiscover -o 4 --node-name-map 
> /etc/opensm/ib-node-name-map -g > new
>
> real    0m2.170s
> user    0m1.160s
> sys     0m0.933s
>
> 14:41:10 > /usr/sbin/ibqueryerrors  -s 
> RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
> Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
> Errors for 0x66a00d90006fb "SW19"
>   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] 
> [RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
>       Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
> 0x0002c9030001d736    864    1[  ] "hyperion1" ( )
>
> Note that there were no additional VL15Dropped packets on the fabric.  I 
> think 4 seems to be a good compromise.  I have not tested when there are 
> errors on the fabric.  (Right now things seem to be good!)

Is this just with the SM doing light sweeping?

Is there a speedup with 4 rather than 2?

-- Hal

>
> The first patch converts the algorithm and the second adds the 
> ibnd_set_max_smps_on_wire call.
>
> Let me know what you think.  Because the algorithm changed so much, testing 
> this is a bit difficult: the order of node discovery is different.  However, 
> I have done some extensive diffing of the ibnetdiscover output and things 
> look good.
>
> Ira
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
>


[PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-02 Thread Ira Weiny
Sasha,

Following up on our thread regarding having multiple outstanding SMP's in 
libibnetdisc.

These 2 patches implement that as well as add a function to set the max 
outstanding the lib will use.

I left the default here to be 4.  On a large cluster there seems to be some 
variance with using 8 or 12.  Sometimes I get a speed up over 4 and other times 
I don't see any.  I think it has to do with the traffic on the fabric at any 
particular time.

For example here are some runs I just did on Hyperion.

14:31:55 > /usr/sbin/ibqueryerrors  -s 
RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data 
Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
Errors for 0x66a00d90006fb "SW19"
   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 14562048] 
[RcvData == 14563872] [XmtPkts == 202255] [RcvPkts == 202276]
      Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
0x0002c9030001d736    864    1[  ] "hyperion1" ( )

14:32:02 > time ./ibnetdiscover -o 8 --node-name-map 
/etc/opensm/ib-node-name-map -g > new

real    0m2.210s
user    0m1.251s
sys     0m0.869s

14:40:36 > time ./ibnetdiscover -o 4 --node-name-map 
/etc/opensm/ib-node-name-map -g > new

real    0m3.385s
user    0m1.888s
sys     0m1.448s

14:40:46 > time ./ibnetdiscover -o 4 --node-name-map 
/etc/opensm/ib-node-name-map -g > new

real    0m2.211s
user    0m1.165s
sys     0m0.951s

14:40:51 > time ./ibnetdiscover -o 8 --node-name-map 
/etc/opensm/ib-node-name-map -g > new

real    0m2.249s
user    0m1.244s
sys     0m0.936s

14:40:59 > time ./ibnetdiscover -o 4 --node-name-map 
/etc/opensm/ib-node-name-map -g > new

real    0m2.170s
user    0m1.160s
sys     0m0.933s

14:41:10 > /usr/sbin/ibqueryerrors  -s 
RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data 
Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
Errors for 0x66a00d90006fb "SW19"
   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] 
[RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
      Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>  
0x0002c9030001d736    864    1[  ] "hyperion1" ( )

Note that there were no additional VL15Dropped packets on the fabric.  I think 
4 seems to be a good compromise.  I have not tested when there are errors on 
the fabric.  (Right now things seem to be good!)

The first patch converts the algorithm and the second adds the 
ibnd_set_max_smps_on_wire call.
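
For anyone skimming the archive, the shape of the new algorithm can be sketched as a breadth-first sweep that keeps a bounded batch of queries outstanding.  This is a simplified model under my own assumptions, not the patch itself: the real library issues SMPs asynchronously over directed routes, and ibnd_set_max_smps_on_wire is what sets the bound.

```python
from collections import deque

def discover(fabric, root, max_smps=4):
    """Breadth-first sweep of `fabric` (an adjacency dict standing in for
    port queries), issuing at most `max_smps` queries per round.  Returns
    the discovery order and the number of rounds taken."""
    seen = {root}
    queue = deque([root])
    order, rounds = [], 0
    while queue:
        # These queries would all be on the wire at the same time.
        batch = [queue.popleft() for _ in range(min(max_smps, len(queue)))]
        rounds += 1
        for node in batch:
            order.append(node)
            for peer in fabric.get(node, ()):
                if peer not in seen:      # each port is queried only once
                    seen.add(peer)
                    queue.append(peer)
    return order, rounds

if __name__ == "__main__":
    # Toy fabric: one switch with six attached nodes.
    fabric = {"SW1": ["n%d" % i for i in range(6)]}
    print(discover(fabric, "SW1", max_smps=4))
```

Note also why discovery order changes with the window size, which is why a plain diff of the output needed care: batches of 4 and batches of 2 walk the same fabric but emit nodes in a different sequence.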

Let me know what you think.  Because the algorithm changed so much, testing this 
is a bit difficult: the order of node discovery is different.  However, I have 
done some extensive diffing of the ibnetdiscover output and things look good.

Ira

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov