After installing UCX 1.5.0 and OpenMPI 4.0.1 compiled for UCX and without
verbs (full details below), my NetPIPE benchmark is reporting message
failures for some message sizes above 300 KB. There are no failures when I
benchmark with a non-UCX (verbs) version of OpenMPI 4.0.1, and no failures
when I test the UCX version with --mca btl tcp,self. These failures show up
in testing on both QDR IB and 40 GbE networks.
NetPIPE always tests the first and last bytes of each message, but it can
also do a full integrity test using --integrity that checks every byte. That
test shows that no message is being received in the failing cases.
Details on the system and software installation are below followed by
several NetPIPE runs illustrating the errors. This includes a minimal
case of 3 ping-pong messages where the middle one shows failures. Let me
know if there's any more information you need, or any additional tests I
can run.
Dave Turner
CentOS 7 on Intel processors, QDR IB and 40 GbE tests
UCX 1.5.0 installed from the tarball according to the docs on the webpage
OpenMPI-4.0.1 configured for verbs with:
./configure F77=ifort FC=ifort
--prefix=/homes/daveturner/libs/openmpi-4.0.1-verbs
--enable-mpirun-prefix-by-default --enable-mpi-fortran=all --enable-mpi-cxx
--enable-ipv6 --with-verbs --with-slurm --disable-dlopen
OpenMPI-4.0.1 configured for UCX with:
./configure F77=ifort FC=ifort
--prefix=/homes/daveturner/libs/openmpi-4.0.1-ucx
--enable-mpirun-prefix-by-default --enable-mpi-fortran=all --enable-mpi-cxx
--enable-ipv6 --without-verbs --with-slurm --disable-dlopen
--with-ucx=/homes/daveturner/libs/ucx-1.5.0/install
NetPIPE compiled with:
/homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpicc -g -O3 -Wall -lrt -DMPI
./src/netpipe.c ./src/mpi.c -o NPmpi-4.0.1-ucx -I./src
(http://netpipe.cs.ksu.edu/ compiled with 'make mpi')
**************************************************************************************
Normal uni-directional point-to-point test shows errors (testing first and
last bytes) for messages over 300 KB.
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile
hf.elf NPmpi-4.0.1-ucx -o np.elf.mpi-4.0.1-ucx-ib --printhostnames
Saving output to np.elf.mpi-4.0.1-ucx-ib
Proc 0 is on host elf77
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs Clock accuracy ~ 38.000 nsecs
Start testing with 7 trials for each message size
1: 1 B 24999 times --> 3.766 Mbps in 2.124 usecs
2: 2 B 117702 times --> 8.386 Mbps in 1.908 usecs
3: 3 B 131032 times --> 12.633 Mbps in 1.900 usecs
4: 4 B 131592 times --> 16.715 Mbps in 1.914 usecs
5: 6 B 130589 times --> 25.077 Mbps in 1.914 usecs
6: 8 B 130608 times --> 33.402 Mbps in 1.916 usecs
7: 12 B 130477 times --> 50.047 Mbps in 1.918 usecs
8: 13 B 130329 times --> 54.872 Mbps in 1.895 usecs
9: 16 B 131904 times --> 67.187 Mbps in 1.905 usecs
10: 19 B 131225 times --> 79.255 Mbps in 1.918 usecs
11: 21 B 130353 times --> 87.118 Mbps in 1.928 usecs
12: 24 B 129640 times --> 99.831 Mbps in 1.923 usecs
13: 27 B 129988 times --> 111.760 Mbps in 1.933 usecs
14: 29 B 129351 times --> 121.048 Mbps in 1.917 usecs
15: 32 B 130439 times --> 132.620 Mbps in 1.930 usecs
16: 35 B 129511 times --> 144.272 Mbps in 1.941 usecs
17: 45 B 128814 times --> 182.881 Mbps in 1.968 usecs
18: 48 B 127000 times --> 194.231 Mbps in 1.977 usecs
19: 51 B 126452 times --> 206.193 Mbps in 1.979 usecs
20: 61 B 126343 times --> 236.168 Mbps in 2.066 usecs
21: 64 B 120987 times --> 244.690 Mbps in 2.092 usecs
22: 67 B 119477 times --> 256.660 Mbps in 2.088 usecs
23: 93 B 119710 times --> 242.428 Mbps in 3.069 usecs
24: 96 B 81460 times --> 250.503 Mbps in 3.066 usecs
25: 99 B 81543 times --> 258.376 Mbps in 3.065 usecs
26: 125 B 81558 times --> 321.127 Mbps in 3.114 usecs
27: 128 B 80281 times --> 328.788 Mbps in 3.114 usecs
28: 131 B 80270 times --> 336.387 Mbps in 3.115 usecs
29: 189 B 80244 times --> 474.304 Mbps in 3.188 usecs
30: 192 B 78423 times --> 482.258 Mbps in 3.185 usecs
31: 195 B 78492 times --> 489.635 Mbps in 3.186 usecs
32: 253 B 78467 times --> 623.891 Mbps in 3.244 usecs
33: 256 B 77061 times --> 631.098 Mbps in 3.245 usecs
34: 259 B 77038 times --> 637.905 Mbps in 3.248 usecs
35: 381 B 76967 times --> 906.297 Mbps in 3.363 usecs
36: 384 B 74335 times --> 913.387 Mbps in 3.363 usecs
37: 387 B 74331 times --> 921.348 Mbps in 3.360 usecs
38: 509 B 74398 times --> 1.166 Gbps in 3.493 usecs
39: 512 B 71575 times --> 1.176 Gbps in 3.484 usecs
40: 515 B 71755 times --> 1.183 Gbps in 3.483 usecs
41: 765 B 71780 times --> 1.614 Gbps in 3.793 usecs
42: 768 B 65912 times --> 1.623 Gbps in 3.787 usecs
43: 771 B 66023 times --> 1.630 Gbps in 3.785 usecs
44: 1.021 KB 66050 times --> 2.034 Gbps in 4.016 usecs
45: 1.024 KB 62257 times --> 2.043 Gbps in 4.010 usecs
46: 1.027 KB 62338 times --> 2.050 Gbps in 4.007 usecs
47: 1.533 KB 62387 times --> 2.699 Gbps in 4.545 usecs
48: 1.536 KB 55010 times --> 2.708 Gbps in 4.538 usecs
49: 1.539 KB 55084 times --> 2.708 Gbps in 4.547 usecs
50: 2.045 KB 54978 times --> 3.216 Gbps in 5.086 usecs
51: 2.048 KB 49150 times --> 3.222 Gbps in 5.085 usecs
52: 2.051 KB 49166 times --> 3.225 Gbps in 5.088 usecs
53: 3.069 KB 49139 times --> 4.488 Gbps in 5.471 usecs
54: 3.072 KB 45697 times --> 4.494 Gbps in 5.469 usecs
55: 3.075 KB 45713 times --> 4.496 Gbps in 5.472 usecs
56: 4.093 KB 45686 times --> 5.570 Gbps in 5.878 usecs
57: 4.096 KB 42528 times --> 5.568 Gbps in 5.885 usecs
58: 4.099 KB 42483 times --> 5.550 Gbps in 5.909 usecs
59: 6.141 KB 42310 times --> 7.342 Gbps in 6.692 usecs
60: 6.144 KB 37359 times --> 7.315 Gbps in 6.719 usecs
61: 6.147 KB 37206 times --> 7.343 Gbps in 6.697 usecs
62: 8.189 KB 37328 times --> 8.286 Gbps in 7.907 usecs
63: 8.192 KB 31619 times --> 8.265 Gbps in 7.930 usecs
64: 8.195 KB 31527 times --> 8.254 Gbps in 7.943 usecs
65: 12.285 KB 31476 times --> 11.154 Gbps in 8.811 usecs
66: 12.288 KB 28373 times --> 11.157 Gbps in 8.811 usecs
67: 12.291 KB 28372 times --> 11.116 Gbps in 8.846 usecs
68: 16.381 KB 28262 times --> 12.525 Gbps in 10.463 usecs
69: 16.384 KB 23893 times --> 12.501 Gbps in 10.485 usecs
70: 16.387 KB 23843 times --> 12.502 Gbps in 10.486 usecs
71: 24.573 KB 23841 times --> 15.127 Gbps in 12.995 usecs
72: 24.576 KB 19237 times --> 15.120 Gbps in 13.003 usecs
73: 24.579 KB 19226 times --> 15.115 Gbps in 13.009 usecs
74: 32.765 KB 19217 times --> 17.114 Gbps in 15.316 usecs
75: 32.768 KB 16323 times --> 17.133 Gbps in 15.301 usecs
76: 32.771 KB 16339 times --> 17.117 Gbps in 15.316 usecs
77: 49.149 KB 16322 times --> 19.686 Gbps in 19.973 usecs
78: 49.152 KB 12516 times --> 19.644 Gbps in 20.017 usecs
79: 49.155 KB 12489 times --> 19.635 Gbps in 20.027 usecs
80: 65.533 KB 12483 times --> 21.295 Gbps in 24.619 usecs
81: 65.536 KB 10154 times --> 21.277 Gbps in 24.641 usecs
82: 65.539 KB 10145 times --> 21.265 Gbps in 24.656 usecs
83: 98.301 KB 10139 times --> 23.107 Gbps in 34.034 usecs
84: 98.304 KB 7345 times --> 23.137 Gbps in 33.990 usecs
85: 98.307 KB 7355 times --> 23.089 Gbps in 34.063 usecs
86: 131.069 KB 7339 times --> 24.208 Gbps in 43.314 usecs
87: 131.072 KB 5771 times --> 24.218 Gbps in 43.297 usecs
88: 131.075 KB 5774 times --> 24.192 Gbps in 43.345 usecs
89: 196.605 KB 5767 times --> 25.365 Gbps in 62.008 usecs
90: 196.608 KB 4031 times --> 25.326 Gbps in 62.106 usecs
91: 196.611 KB 4025 times --> 25.365 Gbps in 62.011 usecs
92: 262.141 KB 4031 times --> 26.066 Gbps in 80.454 usecs
93: 262.144 KB 3107 times --> 27.495 Gbps in 76.275 usecs
94: 262.147 KB 3277 times --> 27.162 Gbps in 77.210 usecs
95: 393.213 KB 3237 times --> 28.291 Gbps in 111.192 usecs 1 failures
96: 393.216 KB 2248 times --> 28.529 Gbps in 110.265 usecs 31472 failures
97: 393.219 KB 2267 times --> 28.360 Gbps in 110.922 usecs 1 failures
98: 524.285 KB 2253 times --> 28.830 Gbps in 145.483 usecs 1 failures
99: 524.288 KB 1718 times --> 28.869 Gbps in 145.288 usecs 24052 failures
100: 524.291 KB 1720 times --> 29.043 Gbps in 144.417 usecs 1 failures
101: 786.429 KB 1731 times --> 29.451 Gbps in 213.626 usecs 1 failures
102: 786.432 KB 1170 times --> 29.383 Gbps in 214.122 usecs 16380 failures
103: 786.435 KB 1167 times --> 29.481 Gbps in 213.408 usecs 1 failures
104: 1.049 MB 1171 times --> 29.791 Gbps in 281.580 usecs 1 failures
105: 1.049 MB 887 times --> 29.801 Gbps in 281.485 usecs 12418 failures
106: 1.049 MB 888 times --> 29.694 Gbps in 282.505 usecs 1 failures
107: 1.573 MB 884 times --> 30.118 Gbps in 417.786 usecs 1 failures
108: 1.573 MB 598 times --> 30.140 Gbps in 417.489 usecs 8372 failures
109: 1.573 MB 598 times --> 30.032 Gbps in 418.985 usecs 1 failures
110: 2.097 MB 596 times --> 30.179 Gbps in 555.919 usecs 1 failures
111: 2.097 MB 449 times --> 30.161 Gbps in 556.255 usecs 6286 failures
112: 2.097 MB 449 times --> 30.199 Gbps in 555.549 usecs 1 failures
113: 3.146 MB 450 times --> 30.372 Gbps in 828.586 usecs 1 failures
114: 3.146 MB 301 times --> 30.302 Gbps in 830.498 usecs 4214 failures
115: 3.146 MB 301 times --> 30.413 Gbps in 827.462 usecs 1 failures
116: 4.194 MB 302 times --> 30.442 Gbps in 1.102 msecs 1 failures
117: 4.194 MB 226 times --> 30.443 Gbps in 1.102 msecs 3164 failures
118: 4.194 MB 226 times --> 30.342 Gbps in 1.106 msecs 1 failures
119: 6.291 MB 226 times --> 29.276 Gbps in 1.719 msecs
120: 6.291 MB 145 times --> 29.274 Gbps in 1.719 msecs 2030 failures
121: 6.291 MB 145 times --> 29.199 Gbps in 1.724 msecs
122: 8.389 MB 145 times --> 29.012 Gbps in 2.313 msecs 1 failures
123: 8.389 MB 108 times --> 29.046 Gbps in 2.310 msecs 1512 failures
124: 8.389 MB 108 times --> 29.010 Gbps in 2.313 msecs 1 failures
Completed with max bandwidth 30.299 Gbps 1.931 usecs latency
**************************************************************************************
Uni-directional point-to-point test with integrity check does just 1 trial
for each message size, but tests all bytes, not just the first and last
bytes.
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile
hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity
Proc 0 is on host elf77
Doing a message integrity check instead of measuring performance
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs Clock accuracy ~ 39.000 nsecs
Start testing with 1 trials for each message size
1: 1 B 24999 times --> 0 failures
2: 2 B 110029 times --> 0 failures
3: 3 B 111886 times --> 0 failures
4: 4 B 129467 times --> 0 failures
5: 6 B 129909 times --> 0 failures
6: 8 B 129157 times --> 0 failures
7: 12 B 128443 times --> 0 failures
8: 13 B 127507 times --> 0 failures
9: 16 B 126905 times --> 0 failures
10: 19 B 126650 times --> 0 failures
11: 21 B 125962 times --> 0 failures
12: 24 B 123901 times --> 0 failures
13: 27 B 124594 times --> 0 failures
14: 29 B 124139 times --> 0 failures
15: 32 B 123816 times --> 0 failures
16: 35 B 123149 times --> 0 failures
17: 45 B 122853 times --> 0 failures
18: 48 B 117985 times --> 0 failures
19: 51 B 117168 times --> 0 failures
20: 61 B 116449 times --> 0 failures
21: 64 B 110200 times --> 0 failures
22: 67 B 109647 times --> 0 failures
23: 93 B 108702 times --> 0 failures
24: 96 B 73950 times --> 0 failures
25: 99 B 74284 times --> 0 failures
26: 125 B 73998 times --> 0 failures
27: 128 B 71432 times --> 0 failures
28: 131 B 70523 times --> 0 failures
29: 189 B 71108 times --> 0 failures
30: 192 B 66161 times --> 0 failures
31: 195 B 66110 times --> 0 failures
32: 253 B 65814 times --> 0 failures
33: 256 B 62284 times --> 0 failures
34: 259 B 61922 times --> 0 failures
35: 381 B 61869 times --> 0 failures
36: 384 B 55731 times --> 0 failures
37: 387 B 55543 times --> 0 failures
38: 509 B 55352 times --> 0 failures
39: 512 B 50377 times --> 0 failures
40: 515 B 50237 times --> 0 failures
41: 765 B 50128 times --> 0 failures
42: 768 B 41593 times --> 0 failures
43: 771 B 41667 times --> 0 failures
44: 1.021 KB 41022 times --> 0 failures
45: 1.024 KB 35848 times --> 0 failures
46: 1.027 KB 35854 times --> 0 failures
47: 1.533 KB 35714 times --> 0 failures
48: 1.536 KB 27686 times --> 0 failures
49: 1.539 KB 27646 times --> 0 failures
50: 2.045 KB 27588 times --> 0 failures
51: 2.048 KB 22461 times --> 0 failures
52: 2.051 KB 22423 times --> 0 failures
53: 3.069 KB 22397 times --> 0 failures
54: 3.072 KB 17121 times --> 0 failures
55: 3.075 KB 17115 times --> 0 failures
56: 4.093 KB 17103 times --> 0 failures
57: 4.096 KB 13806 times --> 0 failures
58: 4.099 KB 13859 times --> 0 failures
59: 6.141 KB 13825 times --> 0 failures
60: 6.144 KB 10018 times --> 0 failures
61: 6.147 KB 10028 times --> 0 failures
62: 8.189 KB 10025 times --> 0 failures
63: 8.192 KB 7746 times --> 0 failures
64: 8.195 KB 7745 times --> 0 failures
65: 12.285 KB 7754 times --> 0 failures
66: 12.288 KB 5506 times --> 0 failures
67: 12.291 KB 5494 times --> 0 failures
68: 16.381 KB 5365 times --> 0 failures
69: 16.384 KB 4221 times --> 0 failures
70: 16.387 KB 4223 times --> 0 failures
71: 24.573 KB 4195 times --> 0 failures
72: 24.576 KB 3003 times --> 0 failures
73: 24.579 KB 3012 times --> 0 failures
74: 32.765 KB 3024 times --> 0 failures
75: 32.768 KB 2321 times --> 0 failures
76: 32.771 KB 2324 times --> 0 failures
77: 49.149 KB 2322 times --> 0 failures
78: 49.152 KB 1557 times --> 0 failures
79: 49.155 KB 1558 times --> 0 failures
80: 65.533 KB 1554 times --> 0 failures
81: 65.536 KB 1198 times --> 0 failures
82: 65.539 KB 1200 times --> 0 failures
83: 98.301 KB 1199 times --> 0 failures
84: 98.304 KB 808 times --> 0 failures
85: 98.307 KB 808 times --> 0 failures
86: 131.069 KB 808 times --> 0 failures
87: 131.072 KB 609 times --> 0 failures
88: 131.075 KB 609 times --> 0 failures
89: 196.605 KB 609 times --> 0 failures
90: 196.608 KB 410 times --> 0 failures
91: 196.611 KB 410 times --> 0 failures
92: 262.141 KB 410 times --> 0 failures
93: 262.144 KB 309 times --> 0 failures
94: 262.147 KB 284 times --> 0 failures
95: 393.213 KB 283 times --> 393212 failures
96: 393.216 KB 190 times --> 1180022 failures
97: 393.219 KB 206 times --> 393218 failures
98: 524.285 KB 189 times --> 524284 failures
99: 524.288 KB 143 times --> 1573144 failures
100: 524.291 KB 155 times --> 524290 failures
101: 786.429 KB 143 times --> 786428 failures
102: 786.432 KB 95 times --> 2359480 failures
103: 786.435 KB 103 times --> 786434 failures
104: 1.049 MB 95 times --> 1048572 failures
105: 1.049 MB 72 times --> 3145866 failures
106: 1.049 MB 77 times --> 1048578 failures
107: 1.573 MB 71 times --> 1572860 failures
108: 1.573 MB 48 times --> 4718682 failures
109: 1.573 MB 51 times --> 1572866 failures
110: 2.097 MB 48 times --> 2097148 failures
111: 2.097 MB 36 times --> 6291522 failures
112: 2.097 MB 38 times --> 2097154 failures
113: 3.146 MB 36 times --> 0 failures
114: 3.146 MB 24 times --> 9437226 failures
115: 3.146 MB 25 times --> 0 failures
116: 4.194 MB 24 times --> 4194300 failures
117: 4.194 MB 18 times --> 12582942 failures
118: 4.194 MB 18 times --> 4194306 failures
119: 6.291 MB 18 times --> 6291452 failures
120: 6.291 MB 12 times --> 18874386 failures
121: 6.291 MB 12 times --> 6291458 failures
122: 8.389 MB 12 times --> 8388604 failures
123: 8.389 MB 9 times --> 25165836 failures
124: 8.389 MB 9 times --> 8388610 failures
Completed with max bandwidth 2.596 Gbps 2.013 usecs latency
**************************************************************************************
Minimal uni-directional point-to-point test with just 3 messages being
passed round trip, then the same test with TCP only, which shows no failures
when UCX is not used.
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile
hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity --start 393216 --end
393216 --repeats 1
Proc 0 is on host elf77
Proc 1 is on host elf78
Doing a message integrity check instead of measuring performance
Using a constant number of 1 transmissions
NOTE: Be leary of timings that are close to the clock accuracy.
Clock resolution ~ 1.000 nsecs Clock accuracy ~ 39.000 nsecs
Start testing with 1 trials for each message size
1: 393.213 KB 1 times --> 0 failures
2: 393.216 KB 1 times --> 786430 failures
3: 393.219 KB 1 times --> 0 failures
Completed with max bandwidth 257.855 Mbps 6.496 msecs latency
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --mca btl
tcp,self --hostfile hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity
--start 393216 --end 393216 --repeats 1
Proc 0 is on host elf77
Doing a message integrity check instead of measuring performance
Using a constant number of 1 transmissions
NOTE: Be leary of timings that are close to the clock accuracy.
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs Clock accuracy ~ 33.000 nsecs
Start testing with 1 trials for each message size
1: 393.213 KB 1 times --> 0 failures
2: 393.216 KB 1 times --> 0 failures
3: 393.219 KB 1 times --> 0 failures
Completed with max bandwidth 232.044 Mbps 7.004 msecs latency
**************************************************************************************
Uni-directional point-to-point test with integrity check has no failures
when run with --repeats 1 --pert 0, which drops the perturbed sizes (so the
large messages are all multiples of 8 bytes). However, the full test with
more messages of each size still shows some failures.
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile
hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity --repeats 1 --pert 0
Proc 0 is on host elf77
Doing a message integrity check instead of measuring performance
Using a constant number of 1 transmissions
NOTE: Be leary of timings that are close to the clock accuracy.
Clock resolution ~ 1.000 nsecs Clock accuracy ~ 34.000 nsecs
Start testing with 1 trials for each message size
1: 1 B 1 times --> 0 failures
Proc 1 is on host elf78
2: 2 B 1 times --> 0 failures
3: 3 B 1 times --> 0 failures
4: 4 B 1 times --> 0 failures
5: 6 B 1 times --> 0 failures
6: 8 B 1 times --> 0 failures
7: 12 B 1 times --> 0 failures
8: 16 B 1 times --> 0 failures
9: 24 B 1 times --> 0 failures
10: 32 B 1 times --> 0 failures
11: 48 B 1 times --> 0 failures
12: 64 B 1 times --> 0 failures
13: 96 B 1 times --> 0 failures
14: 128 B 1 times --> 0 failures
15: 192 B 1 times --> 0 failures
16: 256 B 1 times --> 0 failures
17: 384 B 1 times --> 0 failures
18: 512 B 1 times --> 0 failures
19: 768 B 1 times --> 0 failures
20: 1.024 KB 1 times --> 0 failures
21: 1.536 KB 1 times --> 0 failures
22: 2.048 KB 1 times --> 0 failures
23: 3.072 KB 1 times --> 0 failures
24: 4.096 KB 1 times --> 0 failures
25: 6.144 KB 1 times --> 0 failures
26: 8.192 KB 1 times --> 0 failures
27: 12.288 KB 1 times --> 0 failures
28: 16.384 KB 1 times --> 0 failures
29: 24.576 KB 1 times --> 0 failures
30: 32.768 KB 1 times --> 0 failures
31: 49.152 KB 1 times --> 0 failures
32: 65.536 KB 1 times --> 0 failures
33: 98.304 KB 1 times --> 0 failures
34: 131.072 KB 1 times --> 0 failures
35: 196.608 KB 1 times --> 0 failures
36: 262.144 KB 1 times --> 0 failures
37: 393.216 KB 1 times --> 0 failures
38: 524.288 KB 1 times --> 0 failures
39: 786.432 KB 1 times --> 0 failures
40: 1.049 MB 1 times --> 0 failures
41: 1.573 MB 1 times --> 0 failures
42: 2.097 MB 1 times --> 0 failures
43: 3.146 MB 1 times --> 0 failures
44: 4.194 MB 1 times --> 0 failures
45: 6.291 MB 1 times --> 0 failures
46: 8.389 MB 1 times --> 0 failures
Completed with max bandwidth 1.108 Gbps 4.775 usecs latency
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile
hf.elf NPmpi-4.0.1-ucx --printhostnames --pert 0
Proc 0 is on host elf77
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs Clock accuracy ~ 33.000 nsecs
Start testing with 7 trials for each message size
1: 1 B 24999 times --> 3.792 Mbps in 2.110 usecs
2: 2 B 118504 times --> 8.337 Mbps in 1.919 usecs
3: 3 B 130269 times --> 12.513 Mbps in 1.918 usecs
4: 4 B 130341 times --> 16.568 Mbps in 1.931 usecs
5: 6 B 129437 times --> 24.877 Mbps in 1.929 usecs
6: 8 B 129569 times --> 33.003 Mbps in 1.939 usecs
7: 12 B 128919 times --> 49.425 Mbps in 1.942 usecs
8: 16 B 128711 times --> 65.956 Mbps in 1.941 usecs
9: 24 B 128820 times --> 98.671 Mbps in 1.946 usecs
10: 32 B 128477 times --> 130.950 Mbps in 1.955 usecs
11: 48 B 127880 times --> 191.863 Mbps in 2.001 usecs
12: 64 B 124910 times --> 242.916 Mbps in 2.108 usecs
13: 96 B 118611 times --> 249.356 Mbps in 3.080 usecs
14: 128 B 81170 times --> 326.676 Mbps in 3.135 usecs
15: 192 B 79754 times --> 479.527 Mbps in 3.203 usecs
16: 256 B 78047 times --> 627.600 Mbps in 3.263 usecs
17: 384 B 76611 times --> 904.073 Mbps in 3.398 usecs
18: 512 B 73573 times --> 1.170 Gbps in 3.502 usecs
19: 768 B 71385 times --> 1.615 Gbps in 3.805 usecs
20: 1.024 KB 65696 times --> 2.033 Gbps in 4.029 usecs
21: 1.536 KB 62047 times --> 2.695 Gbps in 4.560 usecs
22: 2.048 KB 54822 times --> 3.210 Gbps in 5.105 usecs
23: 3.072 KB 48974 times --> 4.485 Gbps in 5.480 usecs
24: 4.096 KB 45623 times --> 5.572 Gbps in 5.881 usecs
25: 6.144 KB 42511 times --> 7.320 Gbps in 6.715 usecs
26: 8.192 KB 37230 times --> 8.274 Gbps in 7.921 usecs
27: 12.288 KB 31561 times --> 11.203 Gbps in 8.774 usecs
28: 16.384 KB 28491 times --> 12.503 Gbps in 10.483 usecs
29: 24.576 KB 23847 times --> 15.161 Gbps in 12.968 usecs
30: 32.768 KB 19278 times --> 17.159 Gbps in 15.278 usecs
31: 49.152 KB 16363 times --> 19.749 Gbps in 19.911 usecs
32: 65.536 KB 12555 times --> 21.306 Gbps in 24.608 usecs
33: 98.304 KB 10159 times --> 23.167 Gbps in 33.946 usecs
34: 131.072 KB 7364 times --> 24.182 Gbps in 43.361 usecs
35: 196.608 KB 5765 times --> 25.415 Gbps in 61.887 usecs
36: 262.144 KB 4039 times --> 27.430 Gbps in 76.454 usecs
37: 393.216 KB 3269 times --> 28.316 Gbps in 111.095 usecs
38: 524.288 KB 2250 times --> 28.794 Gbps in 145.667 usecs 1 failures
39: 786.432 KB 1716 times --> 29.399 Gbps in 214.004 usecs 1 failures
40: 1.049 MB 1168 times --> 29.739 Gbps in 282.073 usecs 1 failures
41: 1.573 MB 886 times --> 30.059 Gbps in 418.613 usecs
42: 2.097 MB 597 times --> 30.099 Gbps in 557.407 usecs
43: 3.146 MB 448 times --> 30.408 Gbps in 827.607 usecs 1 failures
44: 4.194 MB 302 times --> 30.256 Gbps in 1.109 msecs 1 failures
45: 6.291 MB 225 times --> 29.272 Gbps in 1.719 msecs 1 failures
46: 8.389 MB 145 times --> 29.010 Gbps in 2.313 msecs 1 failures
Completed with max bandwidth 30.112 Gbps 1.953 usecs latency
--
Work: [email protected] (785) 532-7791
2219 Engineering Hall, Manhattan KS 66506
Home: [email protected]
cell: (785) 770-5929