Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Fri, 5 Feb 2010 07:27:05 -0500
Hal Rosenstock wrote:

> > Note that 2 does not give much speed up, where 4 does. Obviously this could
> > have to do with the fact there were 2 nodes which were bad (so if you had
> > 100's of nodes unresponsive a higher value might be worth using)
>
> It depends on the number of unresponsive nodes being same or higher
> than number of outstanding/parallel SMPs. In a sense, the number of
> outstanding SMPs is a measure of how many unresponsive nodes one is
> willing to tolerate before slowing down/waiting for timeouts. In some
> environments, unresponsive nodes are a normal case.

Agreed but where should we set the default? I don't think 4 is a bad default.
I don't think it makes the diags overly aggressive, compared with OpenSM.

Sasha I guess this is your call. Just tell me where to set it and I will make
the patch. Basically with the user option it can always be changed on a run
by run basis.

Ira

> -- Hal
>
> > but as a
> > default compromise I think 4 is good.
> >
> > Ira

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Thu, Feb 4, 2010 at 9:18 PM, Ira Weiny wrote:
> Note that 2 does not give much speed up, where 4 does. Obviously this could
> have to do with the fact there were 2 nodes which were bad (so if you had
> 100's of nodes unresponsive a higher value might be worth using)

It depends on the number of unresponsive nodes being same or higher
than number of outstanding/parallel SMPs. In a sense, the number of
outstanding SMPs is a measure of how many unresponsive nodes one is
willing to tolerate before slowing down/waiting for timeouts. In some
environments, unresponsive nodes are a normal case.

-- Hal

> but as a
> default compromise I think 4 is good.
>
> Ira
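A toy model of the point about outstanding SMPs versus unresponsive nodes, written for illustration and not taken from libibnetdisc. It assumes that a query to a responsive node costs one fixed round trip, that a query to an unresponsive node costs one full timeout, and that a new query is issued as soon as one of the `window` slots frees up. The 2000 ms timeout, the 4 ms round trip and the 1000-query fabric are made-up numbers; retries and the fact that later queries depend on earlier replies are ignored.

```python
import heapq


def discovery_time_ms(durations_ms, window):
    """Wall time to finish all queries with at most `window` outstanding."""
    if window < 1:
        raise ValueError("window must be at least 1")
    free_at = [0] * window              # time at which each slot becomes free
    heapq.heapify(free_at)
    finished = 0
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest free slot takes the next query
        end = start + duration
        finished = max(finished, end)
        heapq.heappush(free_at, end)
    return finished


TIMEOUT_MS = 2000   # hypothetical cost of querying one unresponsive node
RTT_MS = 4          # hypothetical cost of one answered query

# Two unresponsive nodes are queried first, followed by 1000 normal queries.
queries = [TIMEOUT_MS, TIMEOUT_MS] + [RTT_MS] * 1000

for window in (1, 2, 4, 8):
    print(window, discovery_time_ms(queries, window))
```

With these numbers the model gives 8000, 4000, 2000 and 2000 ms for windows of 1, 2, 4 and 8. Once the window is larger than the number of stalled queries, the remaining slots keep the walk moving and the timeouts are hidden; when it is not, every slot waits out a timeout. The model is not meant to reproduce the timings measured in this thread.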
Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Thu, Feb 4, 2010 at 7:13 PM, Ira Weiny wrote:
> On Thu, 4 Feb 2010 15:01:32 -0500
> Hal Rosenstock wrote:
>
> [snip]
>
>> I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
>> I've seen a number of clusters with SMP dropping with the current
>> lower defaults.
>
> So OpenSM is seeing dropped packets?

OpenSM is seeing timeouts and there are VL15 drops in the subnet.

> With 4 SMP's on the wire?

Yes.

> I do see some
> VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
> issue. What kind of rate are you seeing?

> The other question is; do people regularly run the tools which are using
> libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?

These tools are being used (at least ibnetdiscover and ibqueryerrors).

> We do. If others
> are not then I would say this change would have less impact as they would want
> the diags to have some priority for debugging. The other option is to change
> the patch to be a default of 2 and allow user to change it depending on what
> they are trying to do. If you think that is best I will change the patch.

FWIW I think 2 is better until we have more exhaustive experience with 4.
The other alternative would be to make it 4 and then see if people start
noticing (more) VL15 drops and possibly other issues.

-- Hal
Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Thu, 4 Feb 2010 16:13:25 -0800
Ira Weiny wrote:

> On Thu, 4 Feb 2010 15:01:32 -0500
> Hal Rosenstock wrote:
>
> > On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny wrote:
> > > On Thu, 4 Feb 2010 09:19:39 -0500
> > > Hal Rosenstock wrote:
> > >
> > >> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny wrote:
> > >> > Sasha,

[snip]

> > >> Is there a speedup with 4 rather than 2 ?
> > >
> > > There is a bit of a speed up (~0.5 to 1.0 sec). But my main reason to want to
> > > go to 4 is that if there are issues on the fabric, unresponsive nodes etc.; 4
> > > will give us better parallelism to get around these issues. I have not had
> > > the chance to test this condition with the new algorithm but the original
> > > ibnetdiscover would slow way down when there are nodes which have unresponsive
> > > SMA's. If there are only 2 outstanding this will not give us much speed up.
> > > This was the main motivation I had for improving the library in this way.

Ok, I found a fabric with just 2 nodes which were unresponsive... A quick
test shows...

Original ibnetdiscover:

18:12:29 > time ./ibnetdiscover > foo
ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,11,9)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) failed, skipping port
ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,24,18,7,6)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,24,18,7,6) failed, skipping port

real 0m9.073s
user 0m0.137s
sys 0m0.172s

18:12:43 > time ./ibnetdiscover > foo
ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,11,9)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) failed, skipping port
ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,24,18,7,6)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,24,18,7,6) failed, skipping port

real 0m9.103s
user 0m0.046s
sys 0m0.046s

*New* ibnetdiscover with different outstanding SMP's.

18:12:14 > time ./ibnetdiscover -o 2 > foo
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad status 110; Connection timed out
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) bad status 110; Connection timed out

real 0m9.746s
user 0m6.559s
sys 0m3.156s

18:13:00 > time ./ibnetdiscover -o 4 > foo
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad status 110; Connection timed out
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) bad status 110; Connection timed out

real 0m4.668s
user 0m3.043s
sys 0m1.601s

18:13:10 > time ./ibnetdiscover -o 8 > foo
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad status 110; Connection timed out
src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) bad status 110; Connection timed out

real 0m4.360s
user 0m2.891s
sys 0m1.451s

Note that 2 does not give much speed up, where 4 does. Obviously this could
have to do with the fact there were 2 nodes which were bad (so if you had
100's of nodes unresponsive a higher value might be worth using) but as a
default compromise I think 4 is good.

Ira
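For reference, the ratios implied by the timings in this message can be worked out with a few lines of Python. The numbers are copied from the runs quoted in the message (first run of the original tool as the baseline); the dictionary layout and the helper names `speedup` and `cpu_share` are only illustrative.

```python
# Wall-clock and CPU times in seconds, copied from the runs in this message.
runs = {
    "original (first run)": {"real": 9.073, "user": 0.137, "sys": 0.172},
    "new -o 2": {"real": 9.746, "user": 6.559, "sys": 3.156},
    "new -o 4": {"real": 4.668, "user": 3.043, "sys": 1.601},
    "new -o 8": {"real": 4.360, "user": 2.891, "sys": 1.451},
}

baseline = runs["original (first run)"]["real"]


def speedup(name):
    """Wall-time speedup of a run relative to the original tool."""
    return baseline / runs[name]["real"]


def cpu_share(name):
    """Fraction of the wall time spent on the CPU (user + sys)."""
    run = runs[name]
    return (run["user"] + run["sys"]) / run["real"]


for name in runs:
    print(f"{name:22s} speedup {speedup(name):.2f}x  cpu/real {cpu_share(name):.2f}")
```

Read this way, -o 2 is slightly slower than the original (about 0.93x), while -o 4 and -o 8 come out at about 1.94x and 2.08x. User plus sys time is close to the wall time in all three new runs, against roughly 3% in the original run.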
Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Thu, 4 Feb 2010 15:01:32 -0500
Hal Rosenstock wrote:

> On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny wrote:
> > On Thu, 4 Feb 2010 09:19:39 -0500
> > Hal Rosenstock wrote:
> >
> >> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny wrote:
> >> > Sasha,

[snip]

> >> > Note that there were no additional VL15Dropped packets on the fabric. I
> >> > think 4 seems to be a good compromise. I have not tested when there are
> >> > errors on the fabric. (Right now things seem to be good!)
> >>
> >> Is this just with the SM doing light sweeping ?
> >
> > Yes.
>
> That's not a lot of SMP stress from the SM side. SMP consumers are SM,
> diags, and the unsolicited traps.

Agreed. I hope to test this more next week.

> >> Is there a speedup with 4 rather than 2 ?
> >
> > There is a bit of a speed up (~0.5 to 1.0 sec). But my main reason to want to
> > go to 4 is that if there are issues on the fabric, unresponsive nodes etc.; 4
> > will give us better parallelism to get around these issues. I have not had
> > the chance to test this condition with the new algorithm but the original
> > ibnetdiscover would slow way down when there are nodes which have unresponsive
> > SMA's. If there are only 2 outstanding this will not give us much speed up.
> > This was the main motivation I had for improving the library in this way.
> >
> > Also, I think you are correct that we should increase OpenSM's default from 4
> > to 8. For the same reason as above. Some of our clusters have worked better
> > with 8 when we are having issues. But right now we are still running with 4.
>
> I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
> I've seen a number of clusters with SMP dropping with the current
> lower defaults.

So OpenSM is seeing dropped packets? With 4 SMP's on the wire? I do see some
VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
issue. What kind of rate are you seeing?

The other question is; do people regularly run the tools which are using
libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)? We do. If others
are not then I would say this change would have less impact as they would want
the diags to have some priority for debugging. The other option is to change
the patch to be a default of 2 and allow user to change it depending on what
they are trying to do. If you think that is best I will change the patch.

Ira
Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny wrote:
> On Thu, 4 Feb 2010 09:19:39 -0500
> Hal Rosenstock wrote:
>
>> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny wrote:
>> > Note that there were no additional VL15Dropped packets on the fabric. I
>> > think 4 seems to be a good compromise. I have not tested when there are
>> > errors on the fabric. (Right now things seem to be good!)
>>
>> Is this just with the SM doing light sweeping ?
>
> Yes.

That's not a lot of SMP stress from the SM side. SMP consumers are SM,
diags, and the unsolicited traps.

>> Is there a speedup with 4 rather than 2 ?
>
> There is a bit of a speed up (~0.5 to 1.0 sec). But my main reason to want to
> go to 4 is that if there are issues on the fabric, unresponsive nodes etc.; 4
> will give us better parallelism to get around these issues. I have not had
> the chance to test this condition with the new algorithm but the original
> ibnetdiscover would slow way down when there are nodes which have unresponsive
> SMA's. If there are only 2 outstanding this will not give us much speed up.
> This was the main motivation I had for improving the library in this way.
>
> Also, I think you are correct that we should increase OpenSM's default from 4
> to 8. For the same reason as above. Some of our clusters have worked better
> with 8 when we are having issues. But right now we are still running with 4.

I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
I've seen a number of clusters with SMP dropping with the current
lower defaults.

-- Hal
Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Thu, 4 Feb 2010 09:19:39 -0500 Hal Rosenstock wrote:

> On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny wrote:
> > Note that there were no additional VL15Dropped packets on the fabric. I
> > think 4 seems to be a good compromise. I have not tested when there are
> > errors on the fabric. (Right now things seem to be good!)
>
> Is this just with the SM doing light sweeping?

Yes.

> Is there a speedup with 4 rather than 2?

There is a bit of a speedup (~0.5 to 1.0 sec). But my main reason for wanting
to go to 4 is that if there are issues on the fabric (unresponsive nodes,
etc.), 4 will give us better parallelism to get around them. I have not had
the chance to test this condition with the new algorithm, but the original
ibnetdiscover would slow way down when there were nodes with unresponsive
SMAs. With only 2 outstanding we would not get much of a speedup in that
case. This was my main motivation for improving the library in this way.

Also, I think you are correct that we should increase OpenSM's default from 4
to 8, for the same reason as above. Some of our clusters have worked better
with 8 when we are having issues, but right now we are still running with 4.

Ira

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
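The parallelism argument in this reply can be illustrated with a toy model. The sketch below is not libibnetdisc code. It only simulates a fixed window of outstanding requests, and the function name `sweep_time`, the 1 ms round trip, and the 1 s timeout are all assumptions chosen for illustration.

```python
import heapq


def sweep_time(latencies, max_outstanding):
    """Time to complete all requests, issued in order, with at most
    max_outstanding requests in flight at once (toy model only)."""
    in_flight = []  # min-heap of completion times of outstanding requests
    now = 0.0
    for latency in latencies:
        if len(in_flight) == max_outstanding:
            # Window is full: wait until the earliest outstanding request
            # completes (or times out) before issuing the next one.
            now = heapq.heappop(in_flight)
        heapq.heappush(in_flight, now + latency)
    # The sweep is done when the last outstanding request completes.
    return max(in_flight) if in_flight else 0.0


if __name__ == "__main__":
    RTT = 0.001      # assumed round trip for a responsive node, in seconds
    TIMEOUT = 1.0    # assumed timeout for an unresponsive node, in seconds
    # 1000 responsive nodes with 2 unresponsive ones queried back to back.
    latencies = [RTT] * 100 + [TIMEOUT] * 2 + [RTT] * 900
    for window in (2, 4, 8):
        print(f"window={window}: {sweep_time(latencies, window):.3f} s")
```

In this model a window of 2 stalls completely while both unresponsive nodes are outstanding, so the sweep takes about 1.5 s. A window of 4 keeps two slots working during the timeouts and finishes in about 1.03 s. Real discovery is a graph walk rather than a flat queue, so the numbers are only indicative.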
Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc
On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny wrote:
> Note that there were no additional VL15Dropped packets on the fabric. I
> think 4 seems to be a good compromise. I have not tested when there are
> errors on the fabric. (Right now things seem to be good!)

Is this just with the SM doing light sweeping?

Is there a speedup with 4 rather than 2?

-- Hal
[PATCH 0/2] Using multi-smps on the wire in libibnetdisc
Sasha,

Following up on our thread regarding having multiple outstanding SMPs in
libibnetdisc.

These 2 patches implement that, and also add a function to set the maximum
number of outstanding SMPs the library will use.

I left the default at 4. On a large cluster there seems to be some variance
when using 8 or 12: sometimes I get a speedup over 4 and other times I don't
see any. I think it has to do with the traffic on the fabric at any
particular time.

For example, here are some runs I just did on Hyperion.

14:31:55 > /usr/sbin/ibqueryerrors -s RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
Errors for 0x66a00d90006fb "SW19"
   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 14562048] [RcvData == 14563872] [XmtPkts == 202255] [RcvPkts == 202276]
   Link info: 139 9[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x0002c9030001d736 864 1[ ] "hyperion1" ( )

14:32:02 > time ./ibnetdiscover -o 8 --node-name-map /etc/opensm/ib-node-name-map -g > new

real    0m2.210s
user    0m1.251s
sys     0m0.869s

14:40:36 > time ./ibnetdiscover -o 4 --node-name-map /etc/opensm/ib-node-name-map -g > new

real    0m3.385s
user    0m1.888s
sys     0m1.448s

14:40:46 > time ./ibnetdiscover -o 4 --node-name-map /etc/opensm/ib-node-name-map -g > new

real    0m2.211s
user    0m1.165s
sys     0m0.951s

14:40:51 > time ./ibnetdiscover -o 8 --node-name-map /etc/opensm/ib-node-name-map -g > new

real    0m2.249s
user    0m1.244s
sys     0m0.936s

14:40:59 > time ./ibnetdiscover -o 4 --node-name-map /etc/opensm/ib-node-name-map -g > new

real    0m2.170s
user    0m1.160s
sys     0m0.933s

14:41:10 > /usr/sbin/ibqueryerrors -s RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
Errors for 0x66a00d90006fb "SW19"
   GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] [RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
   Link info: 139 9[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x0002c9030001d736 864 1[ ] "hyperion1" ( )

Note that there were no additional VL15Dropped packets on the fabric. I
think 4 seems to be a good compromise. I have not tested when there are
errors on the fabric. (Right now things seem to be good!)

The first patch converts the algorithm and the second adds the
ibnd_set_max_smps_on_wire call.

Let me know what you think. Because the algorithm changed so much, testing
this is a bit difficult: the order of node discovery is different. However,
I have done some extensive diffing of the output of ibnetdiscover and things
look good.

Ira

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
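Since the discovery order changes with the new algorithm, a plain diff of two ibnetdiscover outputs will flag reordering as differences. One possible way to compare them regardless of order is sketched below. This is not how the comparison above was actually done. It assumes the output consists of stanzas separated by blank lines, and the helper names `canonical_stanzas` and `same_topology` are hypothetical.

```python
import re


def canonical_stanzas(text):
    """Split topology text into blank-line-separated stanzas and return
    them in an order-independent canonical form."""
    stanzas = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.rstrip() for line in block.splitlines() if line.strip()]
        if lines:
            # Sort the lines inside a stanza so port order does not matter.
            stanzas.append(tuple(sorted(lines)))
    # Sort the stanzas so node discovery order does not matter.
    return sorted(stanzas)


def same_topology(old_text, new_text):
    """True if both outputs contain the same stanzas, in any order."""
    return canonical_stanzas(old_text) == canonical_stanzas(new_text)


if __name__ == "__main__":
    # Made-up stanzas for illustration; not real ibnetdiscover output.
    old = "Switch A\n[1] x\n[2] y\n\nCa B\n[1] z\n"
    new = "Ca B\n[1] z\n\nSwitch A\n[2] y\n[1] x\n"
    print(same_topology(old, new))
```

Any header lines that embed a timestamp or other run-specific data would need to be stripped first, otherwise two otherwise identical runs will still compare as different.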