The latest patch also causes a segfault... By the way, I found a typo in it: in the last line of the hunk quoted below, &ca_pml_ob1.use_all_rdma should be &mca_pml_ob1.use_all_rdma.
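For reference, this is how I assume the corrected registration call should read (same hunk as below, with only the struct name fixed):

    mca_pml_ob1.use_all_rdma = false;
    (void) mca_base_component_var_register (&mca_pml_ob1_component.pmlm_version, "use_all_rdma",
                                            "Use all available RDMA btls for the RDMA and RDMA pipeline protocols "
                                            "(default: false)", MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0,
                                            OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_GROUP,
                                            &mca_pml_ob1.use_all_rdma);  /* was: &ca_pml_ob1.use_all_rdma */

If I read the MCA variable naming convention correctly, the new knob would then be settable at run time with something like "mpirun --mca pml_ob1_use_all_rdma 1 ..." (that parameter name is just my guess from the framework/component prefix, not something I have verified).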
+    mca_pml_ob1.use_all_rdma = false;
+    (void) mca_base_component_var_register (&mca_pml_ob1_component.pmlm_version, "use_all_rdma",
+                                            "Use all available RDMA btls for the RDMA and RDMA pipeline protocols "
+                                            "(default: false)", MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0,
+                                            OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_GROUP, &ca_pml_ob1.use_all_rdma);
+

Here is the OSU_BW and gdb output:

# OSU MPI Bandwidth Test v3.1.1
# Size          Bandwidth (MB/s)
1               2.19
2               4.43
4               8.98
8               18.07
16              35.58
32              70.62
64              108.88
128             172.97
256             305.73
512             536.48
1024            957.57
2048            1587.21
4096            1638.81
8192            2165.14
16384           2482.43
32768           2866.33
65536           3655.33
131072          4208.40
262144          4596.12
524288          4769.27
1048576         4900.00
[manage:16596] *** Process received signal ***
[manage:16596] Signal: Segmentation fault (11)
[manage:16596] Signal code: Address not mapped (1)
[manage:16596] Failing at address: 0x8
...
Core was generated by `osu_bw'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000031d9008806 in ?? () from /lib64/libgcc_s.so.1
(gdb) where
#0  0x00000031d9008806 in ?? () from /lib64/libgcc_s.so.1
#1  0x00000031d9008934 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x00000037ab8e5ee8 in backtrace () from /lib64/libc.so.6
#3  0x00002b5060c14345 in opal_backtrace_print () at ./backtrace_execinfo.c:47
#4  0x00002b5060c11180 in show_stackframe () at ./stacktrace.c:331
#5  <signal handler called>
#6  mca_pml_ob1_recv_request_schedule_once () at ./pml_ob1_recvreq.c:983
#7  0x00002aaab461c71a in mca_pml_ob1_recv_request_progress_rndv () from /home/mishima/opt/mpi/openmpi-2.0.0-pgi16.5/lib/openmpi/mca_pml_ob1.so
#8  0x00002aaab46198e5 in mca_pml_ob1_recv_frag_match () at ./pml_ob1_recvfrag.c:715
#9  0x00002aaab4618e46 in mca_pml_ob1_recv_frag_callback_rndv () at ./pml_ob1_recvfrag.c:267
#10 0x00002aaab37958d3 in mca_btl_vader_poll_handle_frag () at ./btl_vader_component.c:589
#11 0x00002aaab3795b9a in mca_btl_vader_component_progress () at ./btl_vader_component.c:231
#12 0x00002b5060bd16fc in opal_progress () at runtime/opal_progress.c:224
#13 0x00002b50600e9aa5 in ompi_request_default_wait_all () at request/req_wait.c:77
#14 0x00002b50601310dd in PMPI_Waitall () at ./pwaitall.c:76
#15 0x0000000000401108 in main () at ./osu_bw.c:144

Tetsuya Mishima

2016/08/09 11:53:04, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> No problem. Thanks for reporting this. Not all platforms see a slowdown so we missed it before the release. Let me know if that latest patch works for you.
>
> -Nathan
>
> > On Aug 8, 2016, at 8:50 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> > I understood. Thanks.
> >
> > Tetsuya Mishima
> >
> > 2016/08/09 11:33:15, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> >> I will add a control to have the new behavior or using all available RDMA btls or just the eager ones for the RDMA protocol. The flags will remain as they are. And, yes, for 2.0.0 you can set the btl flags if you do not intend to use MPI RMA.
> >>
> >> New patch:
> >>
> >> https://github.com/hjelmn/ompi/commit/43267012e58d78e3fc713b98c6fb9f782de977c7.patch
> >>
> >> -Nathan
> >>
> >>> On Aug 8, 2016, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>
> >>> Then, my understanding is that you will restore the default value of
> >>> btl_openib_flags to previous one( = 310) and add a new MCA parameter to
> >>> control HCA inclusion for such a situation. The work arround so far for
> >>> openmpi-2.0.0 is setting those flags manually. Right?
> >>>
> >>> Tetsuya Mishima
> >>>
> >>> 2016/08/09 9:56:29, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> >>>> Hmm, not good. So we have a situation where it is sometimes better to include the HCA when it is the only rdma btl. Will have a new version up in a bit that adds an MCA parameter to control the behavior. The default will be the same as 1.10.x.
> >>>>
> >>>> -Nathan
> >>>>
> >>>>> On Aug 8, 2016, at 4:51 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>>>
> >>>>> Hi, unfortunately it doesn't work well. The previous one was much
> >>>>> better ...
> >>>>>
> >>>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -report-bindings osu_bw
> >>>>> [manage.cluster:25107] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> >>>>> [manage.cluster:25107] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> >>>>> # OSU MPI Bandwidth Test v3.1.1
> >>>>> # Size     Bandwidth (MB/s)
> >>>>> 1          2.22
> >>>>> 2          4.53
> >>>>> 4          9.11
> >>>>> 8          18.02
> >>>>> 16         35.44
> >>>>> 32         70.84
> >>>>> 64         113.71
> >>>>> 128        176.74
> >>>>> 256        311.07
> >>>>> 512        529.03
> >>>>> 1024       907.83
> >>>>> 2048       1597.66
> >>>>> 4096       330.14
> >>>>> 8192       516.49
> >>>>> 16384      780.31
> >>>>> 32768      1038.43
> >>>>> 65536      1186.36
> >>>>> 131072     1268.87
> >>>>> 262144     1222.24
> >>>>> 524288     1232.30
> >>>>> 1048576    1244.62
> >>>>> 2097152    1260.25
> >>>>> 4194304    1263.47
> >>>>>
> >>>>> Tetsuya
> >>>>>
> >>>>>
> >>>>> 2016/08/09 2:42:24, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> >>>>>> Ok, there was a problem with the selection logic when only one rdma capable btl is available. I changed the logic to always use the RDMA btl over pipelined send/recv. This works better for me on a Intel Omnipath system. Let me know if this works for you.
> >>>>>>
> >>>>>> https://github.com/hjelmn/ompi/commit/dddb865b5337213fd73d0e226b02e2f049cfab47.patch
> >>>>>>
> >>>>>> -Nathan
> >>>>>>
> >>>>>> On Aug 07, 2016, at 10:00 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>>>>
> >>>>>> Hi, here is the gdb output for additional information:
> >>>>>>
> >>>>>> (It might be inexact, because I built openmpi-2.0.0 without debug option)
> >>>>>>
> >>>>>> Core was generated by `osu_bw'.
> >>>>>> Program terminated with signal 11, Segmentation fault.
> >>>>>> #0 0x00000031d9008806 in ?? () from /lib64/libgcc_s.so.1
> >>>>>> (gdb) where
> >>>>>> #0 0x00000031d9008806 in ?? () from /lib64/libgcc_s.so.1
> >>>>>> #1 0x00000031d9008934 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> >>>>>> #2 0x00000037ab8e5ee8 in backtrace () from /lib64/libc.so.6
> >>>>>> #3 0x00002ad882bd4345 in opal_backtrace_print () at ./backtrace_execinfo.c:47
> >>>>>> #4 0x00002ad882bd1180 in show_stackframe () at ./stacktrace.c:331
> >>>>>> #5 <signal handler called>
> >>>>>> #6 mca_pml_ob1_recv_request_schedule_once () at ./pml_ob1_recvreq.c:983
> >>>>>> #7 0x00002aaab412f47a in mca_pml_ob1_recv_request_progress_rndv () from /home/mishima/opt/mpi/openmpi-2.0.0-pgi16.5/lib/openmpi/mca_pml_ob1.so
> >>>>>> #8 0x00002aaab412c645 in mca_pml_ob1_recv_frag_match () at ./pml_ob1_recvfrag.c:715
> >>>>>> #9 0x00002aaab412bba6 in mca_pml_ob1_recv_frag_callback_rndv () at ./pml_ob1_recvfrag.c:267
> >>>>>> #10 0x00002aaaaf2748d3 in mca_btl_vader_poll_handle_frag () at ./btl_vader_component.c:589
> >>>>>> #11 0x00002aaaaf274b9a in mca_btl_vader_component_progress () at ./btl_vader_component.c:231
> >>>>>> #12 0x00002ad882b916fc in opal_progress () at runtime/opal_progress.c:224
> >>>>>> #13 0x00002ad8820a9aa5 in ompi_request_default_wait_all () at request/req_wait.c:77
> >>>>>> #14 0x00002ad8820f10dd in PMPI_Waitall () at ./pwaitall.c:76
> >>>>>> #15 0x0000000000401108 in main () at ./osu_bw.c:144
> >>>>>>
> >>>>>> Tetsuya
> >>>>>>
> >>>>>>
> >>>>>> 2016/08/08 12:34:57, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> >>>>>> Hi, it caused segfault as below:
> >>>>>> [manage.cluster:25436] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> >>>>>> [manage.cluster:25436] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> >>>>>> # OSU MPI Bandwidth Test v3.1.1
> >>>>>> # Size     Bandwidth (MB/s)
> >>>>>> 1          2.23
> >>>>>> 2          4.51
> >>>>>> 4          8.99
> >>>>>> 8          17.83
> >>>>>> 16         35.18
> >>>>>> 32         69.66
> >>>>>> 64         109.84
> >>>>>> 128        179.65
> >>>>>> 256        303.52
> >>>>>> 512        532.81
> >>>>>> 1024       911.74
> >>>>>> 2048       1605.29
> >>>>>> 4096       1598.73
> >>>>>> 8192       2135.94
> >>>>>> 16384      2468.98
> >>>>>> 32768      2818.37
> >>>>>> 65536      3658.83
> >>>>>> 131072     4200.50
> >>>>>> 262144     4545.01
> >>>>>> 524288     4757.84
> >>>>>> 1048576    4831.75
> >>>>>> [manage:25442] *** Process received signal ***
> >>>>>> [manage:25442] Signal: Segmentation fault (11)
> >>>>>> [manage:25442] Signal code: Address not mapped (1)
> >>>>>> [manage:25442] Failing at address: 0x8
> >>>>>> --------------------------------------------------------------------------
> >>>>>> mpirun noticed that process rank 1 with PID 0 on node manage exited on signal 11 (Segmentation fault).
> >>>>>> --------------------------------------------------------------------------
> >>>>>>
> >>>>>> Tetsuya Mishima
> >>>>>>
> >>>>>> 2016/08/08 10:12:05, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> >>>>>>> This patch also modifies the put path.
> >>>>> Let me know if this works:>> diff --git > >>>>>> a/ompi/mca/pml/ob1/pml_ob1_rdma.cb/ompi/mca/pml/ob1/pml_ob1_rdma.c> > >>> index > >>>>> 888e126..a3ec6f8 100644> --- a/ompi/mca/pml/ob1/pml_ob1_rdma.c> +++ > >>>>> b/ompi/mca/pml/ob1/pml_ob1_rdma.c> @@ -42,6 +42,7 @@ > >>>>>> size_t mca_pml_ob1_rdma_btls(> mca_pml_ob1_com_btl_t* rdma_btls)> {> > >>> int > >>>>> num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint-> btl_rdma); > >>>>>>> + int num_eager_btls = mca_bml_base_btl_array_get_size > >>> (&bml_endpoint-> > >>>>> btl_eager);> double weight_total = 0;> int num_btls_used = 0;>> @@ > >>> -57,6 > >>>>> +58,21 @@ size_t mca_pml_ob1_rdma_btls(> > >>>>>> (bml_endpoint->btl_rdma_index + n) % num_btls);> > >>>>> mca_btl_base_registration_handle_t *reg_handle = NULL;> > >>>>> mca_btl_base_module_t *btl = bml_btl->btl;> + bool ignore = true;> +> > >>> + /* > >>>>> do not use rdma > >>>>>> btls that are not in the eager list. thisis > >>>>>> necessary to avoid using> + * btls that exist on the endpoint only > > to > >>>>> support RMA. */> + for (int i = 0 ; i < num_eager_btls ; ++i) {> + > >>>>> mca_bml_base_btl_t *eager_btl > >>>>>> =mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);> + > > if > >>>>> (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {> + ignore = > >>> false;> + > >>>>> break;> + }> + }> +> + if (ignore) {> + continue;> + > >>>>>> }>> if (btl->btl_register_mem) {> /* do not use the RDMA protocol > > with > >>>>> this btl if 1) leave pinned isdisabled,> @@ -99,18 +115,34 @@ size_t > >>>>> mca_pml_ob1_rdma_pipeline_btls( mca_bml_base_endpoint_t* > >>>>>> bml_endpoint,> size_t size,> mca_pml_ob1_com_btl_t* rdma_btls )> {> > > - > >>> int > >>>>> i, num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint-> > >>> btl_rdma);> + > >>>>> int num_btls = mca_bml_base_btl_array_get_size > >>>>>> (&bml_endpoint->btl_rdma);> + int num_eager_btls = > >>>>> mca_bml_base_btl_array_get_size(&bml_endpoint->btl_eager);> double > >>>>> weight_total = 0;> + int rdma_count = 0;>> - for(i = 0; i < num_btls > > && > >>> i < > >>>>> > >>>>>> mca_pml_ob1.max_rdma_per_request; i+ > >>>>>> +) {> - rdma_btls[i].bml_btl => - mca_bml_base_btl_array_get_next > >>>>> (&bml_endpoint->btl_rdma);> - rdma_btls[i].btl_reg = NULL;> + for (int > > i > >>> = > >>>>> 0; i < num_btls && i <mca_pml_ob1.max_rdma_per_request; > >>>>>> i++) {> + mca_bml_base_btl_t *bml_btl = > >>> mca_bml_base_btl_array_get_next > >>>>> (&bml_endpoint->btl_rdma);> + bool ignore = true;> +> + for (int i = > >>> 0 ; i > >>>>> < num_eager_btls ; ++i) {> + mca_bml_base_btl_t > >>>>>> *eager_btl =mca_bml_base_btl_array_get_index (&bml_endpoint-> > >>> btl_eager, > >>>>> i);> + if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {> + > >>> ignore = > >>>>> false;> + break;> + }> + }>> - weight_total += > >>>>>> rdma_btls[i].bml_btl->btl_weight;> + if (ignore) {> + continue;> > > + }> > >>> +> > >>>>> + rdma_btls[rdma_count].bml_btl = bml_btl;> + rdma_btls[rdma_count+ > >>>>> +].btl_reg = NULL;> +> + weight_total += > >>>>>> bml_btl->btl_weight;> }>> - mca_pml_ob1_calc_weighted_length > >>> (rdma_btls, > >>>>> i, size,weight_total); > >>>>>>> + mca_pml_ob1_calc_weighted_length (rdma_btls, rdma_count, > >>>>> size,weight_total);>> - return i;> + return rdma_count;> }>>>>> > On > >>> Aug 7, > >>>>> 2016, at 6:51 PM, Nathan Hjelm <hje...@me.com> wrote:> >> > > >>>>>> Looks like the put path probably needs a similar patch. 
Will > >>> sendanother > >>>>> patch soon.> >> >> On Aug 7, 2016, at 6:01 PM, > >>> tmish...@jcity.maeda.co.jp > >>>>> wrote:> >>> >> Hi,> >>> >> I applied the patch to > >>>>>> the file "pml_ob1_rdma.c" and ran osu_bwagain. > >>>>>>>>> Then, I still see the bad performance for larger size > > (>=2097152 ).> > >>>>>>>>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np > >>>>> 2-report-bindings > >>>>>>>>> osu_bw> >> [manage.cluster:27444] MCW rank 0 bound to socket 0 > > [core > >>>>> 0[hwt 0]],socket> >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so> > >>> > >>> cket > >>>>> 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket > >>>>>> 0[core 5[hwt0]]:> >> [B/B/B/B/B/B][./././././.]> >> > >>>>> [manage.cluster:27444] MCW rank 1 bound to socket 0[core 0[hwt > >>> 0]],socket> > >>>>>>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so> >> cket 0[core 3 [hwt > >>>>>> 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt0]]:> >> > >>>>> [B/B/B/B/B/B][./././././.]> >> # OSU MPI Bandwidth Test v3.1.1> >> # > >>> Size > >>>>> Bandwidth (MB/s)> >> 1 2.23> >> 2 4.52> >> 4 8.82> >> 8 17.83> >> > >>>>>> 16 35.31> >> 32 69.49> >> 64 109.46> >> 128 178.51> >> 256 307.68> > >>> > >>> 512 > >>>>> 532.64> >> 1024 909.34> >> 2048 1583.95> >> 4096 1554.74> >> 8192 > >>> 2120.31> > >>>>>>> 16384 2489.79> >> 32768 2853.66> >> 65536 > >>>>>> 3692.82> >> 131072 4236.67> >> 262144 4575.63> >> 524288 4778.47> >> > >>>>> 1048576 4839.34> >> 2097152 2231.46> >> 4194304 1505.48> >>> >> > >>> Regards,> > >>>>>>>>>> Tetsuya Mishima> >>> >> 2016/08/06 0:00:08、" > >>>>>> devel"さんは「Re: [OMPI devel] sm BTLperformace > >>>>>> of> >> the openmpi-2.0.0」で書きました> >>> Making ob1 ignore RDMA > >>> btls > >>>>> that are not in use for eager messagesmight> >> be sufficient. Please > >>> try > >>>>> the following patch and let me know if itworks> >> for you.> > >>>>>>>>>>>>> diff --git a/ompi/mca/pml/ob1/pml_ob1_rdma.c> >> > >>>>> b/ompi/mca/pml/ob1/pml_ob1_rdma.c> >>> index 888e126..0c99525 100644> > >>>>>> > >>>>> --- a/ompi/mca/pml/ob1/pml_ob1_rdma.c> >>> +++ > >>>>>> b/ompi/mca/pml/ob1/pml_ob1_rdma.c> >>> @@ -42,6 +42,7 @@ size_t > >>>>> mca_pml_ob1_rdma_btls(> >>> mca_pml_ob1_com_btl_t* rdma_btls)> >>> {> > >>>>>> > >>>>> int num_btls = > >>>>>> mca_bml_base_btl_array_get_size(&bml_endpoint->btl_rdma);> >>> + int > >>>>> num_eager_btls = mca_bml_base_btl_array_get_size> >> (&bml_endpoint-> > >>>>> btl_eager);> >>> double weight_total = 0;> >>> int > >>>>>> num_btls_used = 0;> >>>> >>> @@ -57,6 +58,21 @@ size_t > >>>>> mca_pml_ob1_rdma_btls(> >>> (bml_endpoint->btl_rdma_index + n) % > >>>>> num_btls);> >>> mca_btl_base_registration_handle_t *reg_handle = > > NULL;> > >>>>>> > >>>>> > >>>>>> mca_btl_base_module_t *btl = bml_btl->btl;> >>> + bool ignore = > > true;> > >>>>>>>> +> >>> + /* do not use rdma btls that are not in the eager > > list.this > >>>>>> is> >> necessary to avoid using> >>> + * btls that exist on the > >>> endpoint > >>>>> only to support RMA. 
*/> >>> + for (int i = 0 ; i < num_eager_btls ; > > + > >>> +i) > >>>>> {> >>> + mca_bml_base_btl_t *eager_btl => >> > >>>>>> mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);> >>> > > + > >>> if > >>>>> (eager_btl->btl_endpoint == bml_btl->btl_endpoint){ > >>>>>>>>>> + ignore = false;> >>> + break;> >>> + }> >>> + }> >>> +> >>> + > > if > >>>>> (ignore) {> >>> + continue;> >>> + }> >>>> >>> if (btl-> > >>> btl_register_mem) > >>>>> {> >>> /* do not use the RDMA protocol with this btl > >>>>>> if 1) leave pinned is> >> disabled,> >>>> >>>> >>>> >>> -Nathan> > >>>>> > >>>>>>>>>>>>> On Aug 5, 2016, at 8:44 AM, Nathan Hjelm <hje...@me.com> > >>> wrote:> > >>>>>>>>>>>>>> Nope. We are not going to change the flags > >>>>>> as this will disablethe > >>>>>> blt> >> for one-sided. Not sure what is going on here as the openib > >>>>> btlshould > >>>>>> be> >> 1) not used for pt2pt, and 2) polled infrequently.> >>> The > > btl > >>>>> debug log suggests both of these are the case. Not surewhat > >>>>>> is> >> going on yet.> >>>>> >>>> -Nathan> >>>>> >>>>> On Aug 5, > > 2016, > >>> at > >>>>> 8:16 AM, r...@open-mpi.org wrote:> >>>>>> >>>>> Perhaps those flags > > need > >>> to > >>>>> be the default?> >>>>>> >>>>>> >>>>>> On Aug 5, > >>>>>> 2016, at 7:14 AM, tmish...@jcity.maeda.co.jp wrote:> >>>>>>> >>>>>> > > Hi > >>>>> Christoph,> >>>>>>> >>>>>> I applied the commits - pull/#1250 as > > Nathan > >>>>> told me and added"-mca> >>>>>> btl_openib_flags 311" to > >>>>>> the mpirun command line option, then it> >> worked for> >>>>>> me. I > >>>>> don't know the reason, but it looks ATOMIC_FOP in the> >>>>>> > >>>>> btl_openib_flags degrades the sm/vader perfomance.> >>>>>>> >>>>>>>>>>>> Regards,> >>>>>> Tetsuya Mishima> >>>>>>> >>>>>>> >>>>>> 2016/08/05 > >>>>> 22:10:37、"devel"さんは「Re: [OMPI devel] sm BTL> >> performace of> > >>>>>>>>> > >>>>> the openmpi-2.0.0」で書きました> >>>>>>> Hello,> >>>>>>>> >>>>>>> We > >>>>>> see the same problem here on various machines with Open MPI> >> > >>> 2.0.0.> > >>>>>>>>>>>> To us it seems that enabling the openib btl triggers > >>>>> badperformance> >> for> >>>>>> the sm AND vader btls!> >>>>>>> > >>>>>> --mca btl_base_verbose 10 reports in both cases the correct useof> > >>> > >>> sm > >>>>> and> >>>>>> vader between MPI ranks - only performance differs?!> > >>>>>>>>>>> > >>>>>>>>>>>> One irritating thing I see in the log > >>>>>> output is the following:> >>>>>>> openib BTL: rdmacm CPC unavailable > >>> for > >>>>> use on mlx4_0:1; skipped> >>>>>>> [rank=1] openib: using port > > mlx4_0:1> > >>>>>>>>>>>> select: init of component openib returned > >>>>>> success> >>>>>>>> >>>>>>> Did not look into the "Skipped" code part > >>>>> yet, ...> >>>>>>>> >>>>>>> Results see below.> >>>>>>>> >>>>>>> Best > >>>>> regards> >>>>>>> Christoph Niethammer> >>>>>>>> >>>>>>> --> > >>>>>>>>>>>>>>>>>>>>> Christoph Niethammer> >>>>>>> High Performance > >>> Computing>>> Center Stuttgart (HLRS)> >>>>>>> Nobelstrasse 19> >>>>>>> > > 70569 > >>> Stuttgart> > >>>>>>>>>>>>>>>>>>>> Tel: ++49(0)711-685-87203> > >>>>>>>>>>>>> email: nietham...@hlrs.de> >>>>>>> > >>>>> http://www.hlrs.de/people/niethammer> >>>>>>>> >>>>>>>> >>>>>>>> > >>>>>>>>>> > >>>>> mpirun -np 2 --mca btl self,vader osu_bw> >>>>>>> # OSU MPI Bandwidth > >>> Test> > >>>>>>>>>>>> > >>>>>> # Size Bandwidth (MB/s)> >>>>>>> 1 4.83> >>>>>>> 2 10.30> >>>>>>> 4 > >>>>> 24.68> >>>>>>> 8 49.27> >>>>>>> 16 95.80> >>>>>>> 32 187.52> >>>>>>> > > 64 > >>>>> 270.82> >>>>>>> 128 405.00> >>>>>>> 256 659.26> >>>>>>> 512 > >>>>>> 
1165.14> >>>>>>> 1024 2372.83> >>>>>>> 2048 3592.85> >>>>>>> 4096 > >>>>> 4283.51> >>>>>>> 8192 5523.55> >>>>>>> 16384 7388.92> >>>>>>> 32768 > >>>>> 7024.37> >>>>>>> 65536 7353.79> >>>>>>> 131072 7465.96> >>>>>>> > >>>>>> 262144 8597.56> >>>>>>> 524288 9292.86> >>>>>>> 1048576 9168.01> > >>>>>>>>>> > >>>>> 2097152 9009.62> >>>>>>> 4194304 9013.02> >>>>>>>> >>>>>>> mpirun -np > > 2 > >>>>> --mca btl self,vader,openib osu_bw> >>>>>>> # OSU MPI > >>>>>> Bandwidth Test> >>>>>>> # Size Bandwidth (MB/s)> >>>>>>> 1 5.32> > >>>>>>>>>> > >>>>> 2 11.14> >>>>>>> 4 20.88> >>>>>>> 8 49.26> >>>>>>> 16 99.11> >>>>>>> > > 32 > >>>>> 197.42> >>>>>>> 64 301.08> >>>>>>> 128 413.64> >>>>>>> > >>>>>> 256 651.15> >>>>>>> 512 1161.12> >>>>>>> 1024 2460.99> >>>>>>> 2048 > >>>>> 3627.36> >>>>>>> 4096 2191.06> >>>>>>> 8192 3118.36> >>>>>>> 16384 > >>> 3428.45> > >>>>>>>>>>>> 32768 3676.96> >>>>>>> 65536 3709.65> >>>>>>> > >>>>>> 131072 3748.64> >>>>>>> 262144 3764.88> >>>>>>> 524288 3764.61> > >>>>>>>>>> > >>>>> 1048576 3772.45> >>>>>>> 2097152 3757.37> >>>>>>> 4194304 3746.45> > >>>>>>>>>>> > >>>>>>>>>>>> mpirun -np 2 --mca btl self,sm osu_bw>>>>>>>>>> # OSU MPI > >>> Bandwidth Test> >>>>>>> # Size Bandwidth (MB/s)> > >>>>>>>>>>>> 1 2.98> >>>>>>> 2 5.97> >>>>>>> 4 11.99> >>>>>>> 8 23.47> > >>>>>>>>>> > >>>>> 16 50.64> >>>>>>> 32 99.91> >>>>>>> 64 197.87> >>>>>>> 128 > >>>>>> 343.32> >>>>>>> 256 667.48> >>>>>>> 512 1200.86> >>>>>>> 1024 > > 2050.05> > >>>>>>>>>>>> 2048 3578.52> >>>>>>> 4096 3966.92> >>>>>>> 8192 5687.96> > >>>>>>>>>> > >>>>> 16384 7395.88> >>>>>>> 32768 7101.41> >>>>>>> 65536 > >>>>>> 7619.49> >>>>>>> 131072 7978.09> >>>>>>> 262144 8648.87> >>>>>>> > >>> 524288 > >>>>> 9129.18> >>>>>>> 1048576 10525.31> >>>>>>> 2097152 10511.63> >>>>>>> > >>>>> 4194304 10489.66> >>>>>>>> >>>>>>> mpirun -np 2 --mca btl > >>>>>> self,sm,openib osu_bw> >>>>>>> # OSU MPI Bandwidth Test> >>>>>>> # > >>> Size > >>>>> Bandwidth (MB/s)> >>>>>>> 1 2.02> >>>>>>> 2 3.00> >>>>>>> 4 9.99> > >>>>>>>>>> 8 > >>>>> 19.96> >>>>>>> 16 40.10> >>>>>>> 32 70.63> >>>>>>> > >>>>>> 64 144.08> >>>>>>> 128 282.21> >>>>>>> 256 543.55> >>>>>>> > > 5121032.61 > >>>>>>>>>>>>>> 1024 1871.09> >>>>>>> 2048 3294.07> >>>>>>> 4096 2336.48> > >>>>>>>>>>>> 8192 3142.22> >>>>>>> 16384 3419.93> >>>>>>> 32768 3647.30> > >>>>>>>>>> > >>>>> 65536 3725.40> >>>>>>> 131072 3749.43> >>>>>>> 262144 > >>>>>> 3765.31> >>>>>>> 524288 3771.06> >>>>>>> 1048576 3772.54> >>>>>>> > >>> 2097152 > >>>>> 3760.93> >>>>>>> 4194304 3745.37> >>>>>>>> >>>>>>> ----- Original > >>> Message > >>>>> -----> >>>>>>> From: tmish...@jcity.maeda.co.jp> > >>>>>>>>>>>>> To: "Open MPI Developers" <de...@open-mpi.org>> >>>>>>> Sent: > >>>>> Wednesday, July 27, 2016 6:04:48 AM> >>>>>>> Subject: Re: [OMPI > > devel] > >>> sm > >>>>> BTL performace of theopenmpi-2.0.0 > >>>>>>>>>>>>>>>>>>>>>> HiNathan,> >>>>>>>> >>>>>>> I applied those commits > >>>>> and ran again without any BTLspecified. 
> >>>>>>>>>>>>>>>>>>>>>> Then, although it says "mca: bml: Using vader btl > > for > >>>>> send to> >>>>>> [[18993,1],1]> >>>>>>> on node manage",> >>>>>>> the > >>> osu_bw > >>>>> still shows it's very slow as shown below:> > >>>>>>>>>>>>>>>>>>>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 > >>>>> -mca> >>>>>> btl_base_verbose> >>>>>>> 10 -bind-to core > >>> -report-bindings > >>>>> osu_bw> >>>>>>> [manage.cluster:17482] MCW rank 0 bound > >>>>>> to socket 0[core 0[hwt0]]:> >>>>>>> [B/././././.][./././././.]> > >>>>>>>>>> > >>>>> [manage.cluster:17482] MCW rank 1 bound to socket 0[core 1[hwt0]]:> > >>>>>>>>>> > >>>>> [./B/./././.][./././././.]> >>>>>>> > >>>>>> [manage.cluster:17487] mca: base: components_register:registering> > >>>>>>>>>>>> framework btl components> >>>>>>> [manage.cluster:17487] mca: > >>> base: > >>>>> components_register: foundloaded> >>>>>>> component > >>>>>> self> >>>>>>> [manage.cluster:17487] mca: base: > >>>>> components_register:component > >>>>>>>>> self> >>>>>>> register function successful> >>>>>>> > >>>>> [manage.cluster:17487] mca: base: components_register: foundloaded> > >>>>>>>>>> > >>>>> component vader> >>>>>>> [manage.cluster:17488] mca: base: > >>>>>> components_register:registering> >>>>>>> framework btl components> > >>>>>>>>>>>> [manage.cluster:17488] mca: base: components_register: > >>> foundloaded> > >>>>>>>>>>>> component self> >>>>>>> [manage.cluster:17487] > >>>>>> mca: base: components_register:component > >>>>>>>>> vader> >>>>>>> register function successful> >>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_register:component > >>>>>>>>> self> >>>>>>> register function successful> >>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_register: foundloaded> > >>>>>>>>>> > >>>>> component vader> >>>>>>> [manage.cluster:17487] mca: base: > >>>>>> components_register: foundloaded> >>>>>>> component tcp> >>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_register:component > >>>>>>>>> vader>>>>>>> register function successful> >>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_register: foundloaded> > >>>>>>>>>> > >>>>> component tcp> >>>>>>> [manage.cluster:17487] mca: base: > >>>>>> components_register:component > >>>>>> tcp> >>>>>>> register function successful> >>>>>>> > >>> [manage.cluster:17487] > >>>>> mca: base: components_register: foundloaded> >>>>>>> component sm> > >>>>>>>>>> > >>>>> [manage.cluster:17488] mca: base: > >>>>>> components_register:component > >>>>>> tcp> >>>>>>> register function successful> >>>>>>> > >>> [manage.cluster:17488] > >>>>> mca: base: components_register: foundloaded> >>>>>>> component sm> > >>>>>>>>>> > >>>>> [manage.cluster:17487] mca: base: > >>>>>> components_register:component > >>>>>> sm> >>>>>>> register function successful> >>>>>>> > >>> [manage.cluster:17488] > >>>>> mca: base: components_register:component > >>>>>> sm> >>>>>>> register function successful> >>>>>>> > >>> [manage.cluster:17488] > >>>>> mca: base: components_register: foundloaded> >>>>>>> component > > openib> > >>>>>>>>>>>> [manage.cluster:17487] mca: base: > >>>>>> components_register: foundloaded> >>>>>>> component openib> >>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_register:component > >>>>>>>>> openib> >>>>>>> register function successful> >>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_open: opening btl> >> > >>>>> components> >>>>>>> [manage.cluster:17488] mca: base: > > components_open: > >>>>>> found loaded> >> component> >>>>>>> self> >>>>>>> 
> >>> [manage.cluster:17488] > >>>>> mca: base: components_open: componentself > >>>>>>>>> open> >>>>>>> function successful> >>>>>>> [manage.cluster:17488] > >>>>> mca: base: components_open: found loaded> >> component> >>>>>>> > > vader> > >>>>>>>>>>>> [manage.cluster:17488] mca: base: > >>>>>> components_open: componentvader> >> open> >>>>>>> function > > successful> > >>>>>>>>>>>> [manage.cluster:17488] mca: base: components_open: found > > loaded> > >>>>> > >>>>> component> >>>>>>> tcp> >>>>>>> > >>>>>> [manage.cluster:17488] mca: base: components_open: componenttcp > >>>>>>>>> open> >>>>>>> function successful> >>>>>>> [manage.cluster:17488] > >>>>> mca: base: components_open: found loaded> >> component> >>>>>>> sm> > >>>>>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_open: > >>>>>> component smopen> >>>>>>> function successful> >>>>>>> > >>>>> [manage.cluster:17488] mca: base: components_open: found loaded> >> > >>>>> component> >>>>>>> openib> >>>>>>> [manage.cluster:17488] mca: base: > >>>>>> components_open: componentopenib> >> open> >>>>>>> function > >>> successful> > >>>>>>>>>>>> [manage.cluster:17488] select: initializing btl component > > self> > >>>>>>>>>>>> [manage.cluster:17488] select: init of > >>>>>> component self returned> >> success> >>>>>>> [manage.cluster:17488] > >>>>> select: initializing btl component vader> >>>>>>> > >>> [manage.cluster:17487] > >>>>> mca: base: components_register:component > >>>>>>>>> openib> >>>>>>> register function successful> >>>>>>> > >>>>> [manage.cluster:17487] mca: base: components_open: opening btl> >> > >>>>> components> >>>>>>> [manage.cluster:17487] mca: base: > > components_open: > >>>>>> found loaded> >> component> >>>>>>> self> >>>>>>> > >>> [manage.cluster:17487] > >>>>> mca: base: components_open: componentself > >>>>>>>>> open> >>>>>>> function successful> >>>>>>> [manage.cluster:17487] > >>>>> mca: base: components_open: found loaded> >> component> >>>>>>> > > vader> > >>>>>>>>>>>> [manage.cluster:17487] mca: base: > >>>>>> components_open: componentvader> >> open> >>>>>>> function > > successful> > >>>>>>>>>>>> [manage.cluster:17487] mca: base: components_open: found > > loaded> > >>>>> > >>>>> component> >>>>>>> tcp> >>>>>>> > >>>>>> [manage.cluster:17487] mca: base: components_open: componenttcp > >>>>>>>>> open> >>>>>>> function successful> >>>>>>> [manage.cluster:17487] > >>>>> mca: base: components_open: found loaded> >> component> >>>>>>> sm> > >>>>>>>>>> > >>>>> [manage.cluster:17487] mca: base: components_open: > >>>>>> component smopen> >>>>>>> function successful> >>>>>>> > >>>>> [manage.cluster:17487] mca: base: components_open: found loaded> >> > >>>>> component> >>>>>>> openib> >>>>>>> [manage.cluster:17488] select: > > init > >>> of > >>>>>> component vader returned> >> success> >>>>>>> [manage.cluster:17488] > >>>>> select: initializing btl component tcp> >>>>>>> > > [manage.cluster:17487] > >>> mca: > >>>>> base: components_open: componentopenib> >> open> > >>>>>>>>>>>>> function successful> >>>>>>> [manage.cluster:17487] select: > >>>>> initializing btl component self> >>>>>>> [manage.cluster:17487] > > select: > >>>>> init of component self returned> >> success> >>>>>>> > >>>>>> [manage.cluster:17487] select: initializing btl component vader> > >>>>>>>>>> > >>>>> [manage.cluster:17488] select: init of component tcp returned> >> > >>> success> > >>>>>>>>>>>> [manage.cluster:17488] select: initializing > >>>>>> btl component sm> >>>>>>> [manage.cluster:17488] select: init of > >>>>> component 
sm returnedsuccess> >>>>>>> [manage.cluster:17488] select: > >>>>> initializing btl componentopenib > >>>>>>>>>>>>>> [manage.cluster:17487] select: init of component vader > >>>>> returned> >> success> >>>>>>> [manage.cluster:17487] select: > >>> initializing > >>>>> btl component tcp> >>>>>>> [manage.cluster:17487] select: > >>>>>> init of component tcp returned> >> success> >>>>>>> > >>>>> [manage.cluster:17487] select: initializing btl component sm> >>>>>>> > >>>>> [manage.cluster:17488] Checking distance from this process to> >>>>>> > >>>>>> device=mthca0> >>>>>>> [manage.cluster:17488] hwloc_distances-> > >>> nbobjs=2> > >>>>>>>>>>>> [manage.cluster:17488] hwloc_distances->latency[0]=1.000000> > >>>>>>>>>>>> [manage.cluster:17488] > >>>>>> hwloc_distances->latency[1]=1.600000> >>>>>>> [manage.cluster:17488] > >>>>> hwloc_distances->latency[2]=1.600000> >>>>>>> [manage.cluster:17488] > >>>>> hwloc_distances->latency[3]=1.000000> >>>>>>> > >>>>>> [manage.cluster:17488] ibv_obj->type set to NULL> >>>>>>> > >>>>> [manage.cluster:17488] Process is bound: distance to device is> >> > >>>>> 0.000000> >>>>>>> [manage.cluster:17487] select: init of component sm> >>>>>> returnedsuccess> >>>>>>> [manage.cluster:17487] select: initializing > >>> btl > >>>>> componentopenib > >>>>>>>>>>>>>> [manage.cluster:17488] openib BTL: rdmacm CPC unavailable > >>>>> foruse > >>>>>> on> >>>>>>> mthca0:1; skipped> >>>>>>> [manage.cluster:17487] > > Checking > >>>>> distance from this process to> >>>>>> device=mthca0> >>>>>>> > >>>>> [manage.cluster:17487] hwloc_distances->nbobjs=2> >>>>>>> > >>>>>> [manage.cluster:17487] hwloc_distances->latency[0]=1.000000> >>>>>>> > >>>>> [manage.cluster:17487] hwloc_distances->latency[1]=1.600000> >>>>>>> > >>>>> [manage.cluster:17487] hwloc_distances->latency[2]=1.600000> > >>>>>>>>>>>>> [manage.cluster:17487] hwloc_distances->latency [3]=1.000000> > >>>>>>>>>>>> [manage.cluster:17487] ibv_obj->type set to NULL> >>>>>>> > >>>>> [manage.cluster:17487] Process is bound: distance to device is> > >>>>>>>> 0.000000> >>>>>>> [manage.cluster:17488] [rank=1] openib: using > > port > >>>>> mthca0:1> >>>>>>> [manage.cluster:17488] select: init of component > >>>>> openibreturned > >>>>>>>>> success> >>>>>>> [manage.cluster:17487] openib BTL: rdmacm CPC > >>>>> unavailable foruse > >>>>>> on> >>>>>>> mthca0:1; skipped> >>>>>>> [manage.cluster:17487] > > [rank=0] > >>>>> openib: using port mthca0:1>>>>> >> [manage.cluster:17487] select: > > init > >>> of > >>>>> component openib returnedsuccess> >>>>>>> > >>>>>> [manage.cluster:17488] mca: bml: Using self btl for send to> >> > >>>>> [[18993,1],1]> >>>>>>> on node manage> >>>>>>> [manage.cluster:17487] > >>> mca: > >>>>> bml: Using self btl for send to> >> [[18993,1],0]> >>>>>>> > >>>>>> on node manage> >>>>>>> [manage.cluster:17488] mca: bml: Using vader > >>> btl > >>>>> for send to> >>>>>> [[18993,1],0]> >>>>>>> on node manage> >>>>>>> > >>>>> [manage.cluster:17487] mca: bml: Using vader btl for send > >>>>>> to> >>>>>> [[18993,1],1]> >>>>>>> on node manage> >>>>>>> # OSU MPI > >>>>> Bandwidth Test v3.1.1> >>>>>>> # Size Bandwidth (MB/s)> >>>>>>> 1 > > 1.76> > >>>>>>>>>>>> 2 3.53> >>>>>>> 4 7.06> >>>>>>> 8 14.46> >>>>>>> 16 > >>>>>> 29.12> >>>>>>> 32 57.54> >>>>>>> 64 100.12> >>>>>>> 128 157.78> > >>>>>>>>>> > >>>>> 256 277.32> >>>>>>> 512 477.53> >>>>>>> 1024 894.81> >>>>>>> 2048 > >>> 1330.68> > >>>>>>>>>>>> 4096 278.58> >>>>>>> 8192 516.00> >>>>>>> > >>>>>> 16384 762.99> >>>>>>> 32768 1037.19> >>>>>>> 65536 1181.66> >>>>>>> > 
>>>>> 131072 1261.91> >>>>>>> 262144 1237.39> >>>>>>> 524288 1247.86> > >>>>>>>> > >>>>> 1048576 1252.04> >>>>>>> 2097152 1273.46> >>>>>>> 4194304 1281.21> > >>>>>>>>>> > >>>>> [manage.cluster:17488] mca: base: close: component self closed> > >>>>>>>> > >>>>> [manage.cluster:17488] mca: base: close: > >>>>>> unloading componentself > >>>>>>>>>>>>>> [manage.cluster:17487] mca: base: close: component s elf > >>> closed>>>>>>>>>>>>> [manage.cluster:17487] mca: base: close: unloading > > componentself > >>>>>>>>>>>>>> [manage.cluster:17488] mca: base: close: component vader > >>>>> closed> >>>>>>> [manage.cluster:17488] mca: base: close: unloading > >>>>> componentvader> >>>>>>> [manage.cluster:17487] mca: base: close: > >>>>>> component vader closed> >>>>>>> [manage.cluster:17487] mca: base: > >>> close: > >>>>> unloading componentvader> >>>>>>> [manage.cluster:17488] mca: base: > >>> close: > >>>>> component tcp closed> >>>>>>> > >>>>>> [manage.cluster:17488] mca: base: close: unloading componenttcp > >>>>>>>>>>>>>> [manage.cluster:17487] mca: base: close: component tcp > > closed> > >>>>>>>>>>>> [manage.cluster:17487] mca: base: close: unloading > > componenttcp > >>>>>>>>>>>>>> [manage.cluster:17488] mca: base: close: component sm > > closed> > >>>>>>>>>>>> [manage.cluster:17488] mca: base: close: unloading component > > sm> > >>>>>>>>>>>> [manage.cluster:17487] mca: base: close: > >>>>>> component sm closed> >>>>>>> [manage.cluster:17487] mca: base: > > close: > >>>>> unloading component sm> >>>>>>> [manage.cluster:17488] mca: base: > >>> close: > >>>>> component openibclosed > >>>>>>>>>>>>>> [manage.cluster:17488] mca: base: close: unloading > >>>>> componentopenib> >>>>>>> [manage.cluster:17487] mca: base: close: > >>> component > >>>>> openibclosed > >>>>>>>>>>>>>> [manage.cluster:17487] mca: base: close: unloading > >>>>> componentopenib> >>>>>>>> >>>>>>> Tetsuya Mishima> >>>>>>>> >>>>>>> > >>>>> 2016/07/27 9:20:28、"devel"さんは「Re: [OMPI devel] sm BTL> >> > >>> performace > >>>>>> of> >>>>>>> the openmpi-2.0.0」で書きました> >>>>>>>> sm is > > deprecated > >>> in > >>>>> 2.0.0 and will likely be removed in favorof > >>>>>>>>> vader> >>>>>> in> >>>>>>> 2.1.0.> >>>>>>>>> >>>>>>>> This issue > > is > >>>>> probably this known issue:> >>>>>>> > >>>>> https://github.com/open-mpi/ompi-release/pull/1250> >>>>>>>>> > >>>>>>>>> > >>>> > >> Please apply those>>>> commits and see if it fixes the issue foryou.> > >>>>>>>>>>>>>>>>>> > >>>>> -Nathan> >>>>>>>>> >>>>>>>>> On Jul 26, 2016, at 6:17 PM, > >>>>> tmish...@jcity.maeda.co.jpwrote: > > _______________________________________________ > > devel mailing list > > devel@lists.open-mpi.org > > https://rfd.newmexicoconsortium.org/mailman/listinfo/devel > > _______________________________________________ > devel mailing list > de...@lists.open-mpi.orghttps://rfd.newmexicoconsortium.org/mailman/listinfo/devel _______________________________________________ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel