Hi, it caused a segfault, as shown below:

[manage.cluster:25436] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
[manage.cluster:25436] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1                         2.23
2                         4.51
4                         8.99
8                        17.83
16                       35.18
32                       69.66
64                      109.84
128                     179.65
256                     303.52
512                     532.81
1024                    911.74
2048                   1605.29
4096                   1598.73
8192                   2135.94
16384                  2468.98
32768                  2818.37
65536                  3658.83
131072                 4200.50
262144                 4545.01
524288                 4757.84
1048576                4831.75
[manage:25442] *** Process received signal ***
[manage:25442] Signal: Segmentation fault (11)
[manage:25442] Signal code: Address not mapped (1)
[manage:25442] Failing at address: 0x8
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node manage exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
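
For what it's worth, my reading of the check your patch adds in both functions is the
helper-style sketch below. The name endpoint_has_eager_btl is only mine for illustration;
the calls and fields are taken straight from the diff, so please treat this as a
restatement for discussion, not as part of the patch:

static bool endpoint_has_eager_btl (mca_bml_base_endpoint_t *bml_endpoint,
                                    mca_bml_base_btl_t *bml_btl)
{
    /* accept an rdma btl only if its endpoint also appears in the eager list,
     * so btls that exist on the endpoint only to support RMA are skipped */
    int num_eager_btls = mca_bml_base_btl_array_get_size (&bml_endpoint->btl_eager);

    for (int i = 0 ; i < num_eager_btls ; ++i) {
        mca_bml_base_btl_t *eager_btl = mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);
        if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {
            return true;
        }
    }

    return false;
}

If that reading is right, mca_pml_ob1_rdma_btls and mca_pml_ob1_rdma_pipeline_btls would
simply skip any btl for which this returns false.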

Tetsuya Mishima


On 2016/08/08 10:12:05, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> This patch also modifies the put path. Let me know if this works:
>
> diff --git a/ompi/mca/pml/ob1/pml_ob1_rdma.c b/ompi/mca/pml/ob1/pml_ob1_rdma.c
> index 888e126..a3ec6f8 100644
> --- a/ompi/mca/pml/ob1/pml_ob1_rdma.c
> +++ b/ompi/mca/pml/ob1/pml_ob1_rdma.c
> @@ -42,6 +42,7 @@ size_t mca_pml_ob1_rdma_btls(
>                               mca_pml_ob1_com_btl_t* rdma_btls)
>  {
>      int num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_rdma);
> +    int num_eager_btls = mca_bml_base_btl_array_get_size (&bml_endpoint->btl_eager);
>      double weight_total = 0;
>      int num_btls_used = 0;
>
> @@ -57,6 +58,21 @@ size_t mca_pml_ob1_rdma_btls(
>                  (bml_endpoint->btl_rdma_index + n) % num_btls);
>          mca_btl_base_registration_handle_t *reg_handle = NULL;
>          mca_btl_base_module_t *btl = bml_btl->btl;
> +        bool ignore = true;
> +
> +        /* do not use rdma btls that are not in the eager list. this is necessary to avoid using
> +         * btls that exist on the endpoint only to support RMA. */
> +        for (int i = 0 ; i < num_eager_btls ; ++i) {
> +            mca_bml_base_btl_t *eager_btl = mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);
> +            if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {
> +                ignore = false;
> +                break;
> +            }
> +        }
> +
> +        if (ignore) {
> +            continue;
> +        }
>
>          if (btl->btl_register_mem) {
>              /* do not use the RDMA protocol with this btl if 1) leave pinned is disabled,
> @@ -99,18 +115,34 @@ size_t mca_pml_ob1_rdma_pipeline_btls( mca_bml_base_endpoint_t* bml_endpoint,
>                                         size_t size,
>                                         mca_pml_ob1_com_btl_t* rdma_btls )
>  {
> -    int i, num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_rdma);
> +    int num_btls = mca_bml_base_btl_array_get_size (&bml_endpoint->btl_rdma);
> +    int num_eager_btls = mca_bml_base_btl_array_get_size (&bml_endpoint->btl_eager);
>      double weight_total = 0;
> +    int rdma_count = 0;
>
> -    for(i = 0; i < num_btls && i < mca_pml_ob1.max_rdma_per_request; i++) {
> -        rdma_btls[i].bml_btl =
> -            mca_bml_base_btl_array_get_next(&bml_endpoint->btl_rdma);
> -        rdma_btls[i].btl_reg = NULL;
> +    for(int i = 0; i < num_btls && i < mca_pml_ob1.max_rdma_per_request; i++) {
> +        mca_bml_base_btl_t *bml_btl = mca_bml_base_btl_array_get_next (&bml_endpoint->btl_rdma);
> +        bool ignore = true;
> +
> +        for (int i = 0 ; i < num_eager_btls ; ++i) {
> +            mca_bml_base_btl_t *eager_btl = mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);
> +            if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {
> +                ignore = false;
> +                break;
> +            }
> +        }
>
> -        weight_total += rdma_btls[i].bml_btl->btl_weight;
> +        if (ignore) {
> +            continue;
> +        }
> +
> +        rdma_btls[rdma_count].bml_btl = bml_btl;
> +        rdma_btls[rdma_count++].btl_reg = NULL;
> +
> +        weight_total += bml_btl->btl_weight;
>      }
>
> -    mca_pml_ob1_calc_weighted_length(rdma_btls, i, size, weight_total);
> +    mca_pml_ob1_calc_weighted_length (rdma_btls, rdma_count, size, weight_total);
>
> -    return i;
> +    return rdma_count;
>  }
>
>
>
>
> > On Aug 7, 2016, at 6:51 PM, Nathan Hjelm <hje...@me.com> wrote:
> >
> > Looks like the put path probably needs a similar patch. Will send another patch soon.
> >
> >> On Aug 7, 2016, at 6:01 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >> Hi,
> >>
> >> I applied the patch to the file "pml_ob1_rdma.c" and ran osu_bw again.
> >> Then, I still see the bad performance for larger sizes (>= 2097152).
> >>
> >> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -report-bindings osu_bw
> >> [manage.cluster:27444] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> >> [manage.cluster:27444] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> >> # OSU MPI Bandwidth Test v3.1.1
> >> # Size        Bandwidth (MB/s)
> >> 1                         2.23
> >> 2                         4.52
> >> 4                         8.82
> >> 8                        17.83
> >> 16                       35.31
> >> 32                       69.49
> >> 64                      109.46
> >> 128                     178.51
> >> 256                     307.68
> >> 512                     532.64
> >> 1024                    909.34
> >> 2048                   1583.95
> >> 4096                   1554.74
> >> 8192                   2120.31
> >> 16384                  2489.79
> >> 32768                  2853.66
> >> 65536                  3692.82
> >> 131072                 4236.67
> >> 262144                 4575.63
> >> 524288                 4778.47
> >> 1048576                4839.34
> >> 2097152                2231.46
> >> 4194304                1505.48
> >>
> >> Regards,
> >>
> >> Tetsuya Mishima
> >>
> >> On 2016/08/06 0:00:08, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> >>> Making ob1 ignore RDMA btls that are not in use for eager messages might be sufficient. Please try the following patch and let me know if it works for you.
> >>>
> >>> diff --git a/ompi/mca/pml/ob1/pml_ob1_rdma.c b/ompi/mca/pml/ob1/pml_ob1_rdma.c
> >>> index 888e126..0c99525 100644
> >>> --- a/ompi/mca/pml/ob1/pml_ob1_rdma.c
> >>> +++ b/ompi/mca/pml/ob1/pml_ob1_rdma.c
> >>> @@ -42,6 +42,7 @@ size_t mca_pml_ob1_rdma_btls(
> >>>                               mca_pml_ob1_com_btl_t* rdma_btls)
> >>>  {
> >>>      int num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_rdma);
> >>> +    int num_eager_btls = mca_bml_base_btl_array_get_size (&bml_endpoint->btl_eager);
> >>>      double weight_total = 0;
> >>>      int num_btls_used = 0;
> >>>
> >>> @@ -57,6 +58,21 @@ size_t mca_pml_ob1_rdma_btls(
> >>>                  (bml_endpoint->btl_rdma_index + n) % num_btls);
> >>>          mca_btl_base_registration_handle_t *reg_handle = NULL;
> >>>          mca_btl_base_module_t *btl = bml_btl->btl;
> >>> +        bool ignore = true;
> >>> +
> >>> +        /* do not use rdma btls that are not in the eager list. this is necessary to avoid using
> >>> +         * btls that exist on the endpoint only to support RMA. */
> >>> +        for (int i = 0 ; i < num_eager_btls ; ++i) {
> >>> +            mca_bml_base_btl_t *eager_btl = mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);
> >>> +            if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {
> >>> +                ignore = false;
> >>> +                break;
> >>> +            }
> >>> +        }
> >>> +
> >>> +        if (ignore) {
> >>> +            continue;
> >>> +        }
> >>>
> >>>          if (btl->btl_register_mem) {
> >>>              /* do not use the RDMA protocol with this btl if 1) leave pinned is disabled,
> >>>
> >>>
> >>>
> >>> -Nathan
> >>>
> >>>
> >>>> On Aug 5, 2016, at 8:44 AM, Nathan Hjelm <hje...@me.com> wrote:
> >>>>
> >>>> Nope. We are not going to change the flags as this will disable the btl for one-sided. Not sure what is going on here as the openib btl should be 1) not used for pt2pt, and 2) polled infrequently.
> >>> The btl debug log suggests both of these are the case. Not sure what is going on yet.
> >>>>
> >>>> -Nathan
> >>>>
> >>>>> On Aug 5, 2016, at 8:16 AM, r...@open-mpi.org wrote:
> >>>>>
> >>>>> Perhaps those flags need to be the default?
> >>>>>
> >>>>>
> >>>>>> On Aug 5, 2016, at 7:14 AM, tmish...@jcity.maeda.co.jp wrote:
> >>>>>>
> >>>>>> Hi Christoph,
> >>>>>>
> >>>>>> I applied the commits - pull/#1250 as Nathan told me and added "-mca btl_openib_flags 311" to the mpirun command line option, then it worked for me. I don't know the reason, but it looks like ATOMIC_FOP in the btl_openib_flags degrades the sm/vader performance.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Tetsuya Mishima
> >>>>>>
> >>>>>>
> >>>>>> 2016/08/05 22:10:37、"devel"さんは「Re: [OMPI devel] sm BTL
> >> performace of
> >>>>>> the openmpi-2.0.0」で書きました
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> We see the same problem here on various machines with Open MPI 2.0.0.
> >>>>>>> To us it seems that enabling the openib btl triggers bad performance for the sm AND vader btls!
> >>>>>>> --mca btl_base_verbose 10 reports in both cases the correct use of sm and vader between MPI ranks - only performance differs?!
> >>>>>>>
> >>>>>>> One irritating thing I see in the log output is the following:
> >>>>>>> openib BTL: rdmacm CPC unavailable for use on mlx4_0:1; skipped
> >>>>>>> [rank=1] openib: using port mlx4_0:1
> >>>>>>> select: init of component openib returned success
> >>>>>>>
> >>>>>>> Did not look into the "Skipped" code part yet, ...
> >>>>>>>
> >>>>>>> Results see below.
> >>>>>>>
> >>>>>>> Best regards
> >>>>>>> Christoph Niethammer
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Christoph Niethammer
> >>>>>>> High Performance Computing Center Stuttgart (HLRS)
> >>>>>>> Nobelstrasse 19
> >>>>>>> 70569 Stuttgart
> >>>>>>>
> >>>>>>> Tel: ++49(0)711-685-87203
> >>>>>>> email: nietham...@hlrs.de
> >>>>>>> http://www.hlrs.de/people/niethammer
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> mpirun -np 2 --mca btl self,vader  osu_bw
> >>>>>>> # OSU MPI Bandwidth Test
> >>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>> 1                         4.83
> >>>>>>> 2                        10.30
> >>>>>>> 4                        24.68
> >>>>>>> 8                        49.27
> >>>>>>> 16                       95.80
> >>>>>>> 32                      187.52
> >>>>>>> 64                      270.82
> >>>>>>> 128                     405.00
> >>>>>>> 256                     659.26
> >>>>>>> 512                    1165.14
> >>>>>>> 1024                   2372.83
> >>>>>>> 2048                   3592.85
> >>>>>>> 4096                   4283.51
> >>>>>>> 8192                   5523.55
> >>>>>>> 16384                  7388.92
> >>>>>>> 32768                  7024.37
> >>>>>>> 65536                  7353.79
> >>>>>>> 131072                 7465.96
> >>>>>>> 262144                 8597.56
> >>>>>>> 524288                 9292.86
> >>>>>>> 1048576                9168.01
> >>>>>>> 2097152                9009.62
> >>>>>>> 4194304                9013.02
> >>>>>>>
> >>>>>>> mpirun -np 2 --mca btl self,vader,openib  osu_bw
> >>>>>>> # OSU MPI Bandwidth Test
> >>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>> 1                         5.32
> >>>>>>> 2                        11.14
> >>>>>>> 4                        20.88
> >>>>>>> 8                        49.26
> >>>>>>> 16                       99.11
> >>>>>>> 32                      197.42
> >>>>>>> 64                      301.08
> >>>>>>> 128                     413.64
> >>>>>>> 256                     651.15
> >>>>>>> 512                    1161.12
> >>>>>>> 1024                   2460.99
> >>>>>>> 2048                   3627.36
> >>>>>>> 4096                   2191.06
> >>>>>>> 8192                   3118.36
> >>>>>>> 16384                  3428.45
> >>>>>>> 32768                  3676.96
> >>>>>>> 65536                  3709.65
> >>>>>>> 131072                 3748.64
> >>>>>>> 262144                 3764.88
> >>>>>>> 524288                 3764.61
> >>>>>>> 1048576                3772.45
> >>>>>>> 2097152                3757.37
> >>>>>>> 4194304                3746.45
> >>>>>>>
> >>>>>>> mpirun -np 2 --mca btl self,sm  osu_bw
> >>>>>>> # OSU MPI Bandwidth Test
> >>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>> 1                         2.98
> >>>>>>> 2                         5.97
> >>>>>>> 4                        11.99
> >>>>>>> 8                        23.47
> >>>>>>> 16                       50.64
> >>>>>>> 32                       99.91
> >>>>>>> 64                      197.87
> >>>>>>> 128                     343.32
> >>>>>>> 256                     667.48
> >>>>>>> 512                    1200.86
> >>>>>>> 1024                   2050.05
> >>>>>>> 2048                   3578.52
> >>>>>>> 4096                   3966.92
> >>>>>>> 8192                   5687.96
> >>>>>>> 16384                  7395.88
> >>>>>>> 32768                  7101.41
> >>>>>>> 65536                  7619.49
> >>>>>>> 131072                 7978.09
> >>>>>>> 262144                 8648.87
> >>>>>>> 524288                 9129.18
> >>>>>>> 1048576               10525.31
> >>>>>>> 2097152               10511.63
> >>>>>>> 4194304               10489.66
> >>>>>>>
> >>>>>>> mpirun -np 2 --mca btl self,sm,openib  osu_bw
> >>>>>>> # OSU MPI Bandwidth Test
> >>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>> 1                         2.02
> >>>>>>> 2                         3.00
> >>>>>>> 4                         9.99
> >>>>>>> 8                        19.96
> >>>>>>> 16                       40.10
> >>>>>>> 32                       70.63
> >>>>>>> 64                      144.08
> >>>>>>> 128                     282.21
> >>>>>>> 256                     543.55
> >>>>>>> 512                    1032.61
> >>>>>>> 1024                   1871.09
> >>>>>>> 2048                   3294.07
> >>>>>>> 4096                   2336.48
> >>>>>>> 8192                   3142.22
> >>>>>>> 16384                  3419.93
> >>>>>>> 32768                  3647.30
> >>>>>>> 65536                  3725.40
> >>>>>>> 131072                 3749.43
> >>>>>>> 262144                 3765.31
> >>>>>>> 524288                 3771.06
> >>>>>>> 1048576                3772.54
> >>>>>>> 2097152                3760.93
> >>>>>>> 4194304                3745.37
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>> From: tmish...@jcity.maeda.co.jp
> >>>>>>> To: "Open MPI Developers" <de...@open-mpi.org>
> >>>>>>> Sent: Wednesday, July 27, 2016 6:04:48 AM
> >>>>>>> Subject: Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0
> >>>>>>>
> >>>>>>> Hi Nathan,
> >>>>>>>
> >>>>>>> I applied those commits and ran again without any BTL specified.
> >>>>>>>
> >>>>>>> Then, although it says "mca: bml: Using vader btl for send to [[18993,1],1] on node manage",
> >>>>>>> the osu_bw still shows it's very slow as shown below:
> >>>>>>>
> >>>>>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl_base_verbose 10 -bind-to core -report-bindings osu_bw
> >>>>>>> [manage.cluster:17482] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
> >>>>>>> [manage.cluster:17482] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
> >>>>>>> [manage.cluster:17487] mca: base: components_register:
registering
> >>>>>>> framework btl components
> >>>>>>> [manage.cluster:17487] mca: base: components_register: found
loaded
> >>>>>>> component self
> >>>>>>> [manage.cluster:17487] mca: base: components_register: component
> >> self
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17487] mca: base: components_register: found
loaded
> >>>>>>> component vader
> >>>>>>> [manage.cluster:17488] mca: base: components_register:
registering
> >>>>>>> framework btl components
> >>>>>>> [manage.cluster:17488] mca: base: components_register: found
loaded
> >>>>>>> component self
> >>>>>>> [manage.cluster:17487] mca: base: components_register: component
> >> vader
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_register: component
> >> self
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_register: found
loaded
> >>>>>>> component vader
> >>>>>>> [manage.cluster:17487] mca: base: components_register: found
loaded
> >>>>>>> component tcp
> >>>>>>> [manage.cluster:17488] mca: base: components_register: component vader register function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_register: found
loaded
> >>>>>>> component tcp
> >>>>>>> [manage.cluster:17487] mca: base: components_register: component
tcp
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17487] mca: base: components_register: found
loaded
> >>>>>>> component sm
> >>>>>>> [manage.cluster:17488] mca: base: components_register: component
tcp
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_register: found
loaded
> >>>>>>> component sm
> >>>>>>> [manage.cluster:17487] mca: base: components_register: component
sm
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_register: component
sm
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_register: found
loaded
> >>>>>>> component openib
> >>>>>>> [manage.cluster:17487] mca: base: components_register: found
loaded
> >>>>>>> component openib
> >>>>>>> [manage.cluster:17488] mca: base: components_register: component
> >> openib
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_open: opening btl
> >> components
> >>>>>>> [manage.cluster:17488] mca: base: components_open: found loaded
> >> component
> >>>>>>> self
> >>>>>>> [manage.cluster:17488] mca: base: components_open: component self
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_open: found loaded
> >> component
> >>>>>>> vader
> >>>>>>> [manage.cluster:17488] mca: base: components_open: component
vader
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_open: found loaded
> >> component
> >>>>>>> tcp
> >>>>>>> [manage.cluster:17488] mca: base: components_open: component tcp
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_open: found loaded
> >> component
> >>>>>>> sm
> >>>>>>> [manage.cluster:17488] mca: base: components_open: component sm
open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17488] mca: base: components_open: found loaded
> >> component
> >>>>>>> openib
> >>>>>>> [manage.cluster:17488] mca: base: components_open: component
openib
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17488] select: initializing btl component self
> >>>>>>> [manage.cluster:17488] select: init of component self returned
> >> success
> >>>>>>> [manage.cluster:17488] select: initializing btl component vader
> >>>>>>> [manage.cluster:17487] mca: base: components_register: component
> >> openib
> >>>>>>> register function successful
> >>>>>>> [manage.cluster:17487] mca: base: components_open: opening btl
> >> components
> >>>>>>> [manage.cluster:17487] mca: base: components_open: found loaded
> >> component
> >>>>>>> self
> >>>>>>> [manage.cluster:17487] mca: base: components_open: component self
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17487] mca: base: components_open: found loaded
> >> component
> >>>>>>> vader
> >>>>>>> [manage.cluster:17487] mca: base: components_open: component
vader
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17487] mca: base: components_open: found loaded
> >> component
> >>>>>>> tcp
> >>>>>>> [manage.cluster:17487] mca: base: components_open: component tcp
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17487] mca: base: components_open: found loaded
> >> component
> >>>>>>> sm
> >>>>>>> [manage.cluster:17487] mca: base: components_open: component sm
open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17487] mca: base: components_open: found loaded
> >> component
> >>>>>>> openib
> >>>>>>> [manage.cluster:17488] select: init of component vader returned
> >> success
> >>>>>>> [manage.cluster:17488] select: initializing btl component tcp
> >>>>>>> [manage.cluster:17487] mca: base: components_open: component
openib
> >> open
> >>>>>>> function successful
> >>>>>>> [manage.cluster:17487] select: initializing btl component self
> >>>>>>> [manage.cluster:17487] select: init of component self returned
> >> success
> >>>>>>> [manage.cluster:17487] select: initializing btl component vader
> >>>>>>> [manage.cluster:17488] select: init of component tcp returned
> >> success
> >>>>>>> [manage.cluster:17488] select: initializing btl component sm
> >>>>>>> [manage.cluster:17488] select: init of component sm returned
success
> >>>>>>> [manage.cluster:17488] select: initializing btl component openib
> >>>>>>> [manage.cluster:17487] select: init of component vader returned
> >> success
> >>>>>>> [manage.cluster:17487] select: initializing btl component tcp
> >>>>>>> [manage.cluster:17487] select: init of component tcp returned
> >> success
> >>>>>>> [manage.cluster:17487] select: initializing btl component sm
> >>>>>>> [manage.cluster:17488] Checking distance from this process to
> >>>>>> device=mthca0
> >>>>>>> [manage.cluster:17488] hwloc_distances->nbobjs=2
> >>>>>>> [manage.cluster:17488] hwloc_distances->latency[0]=1.000000
> >>>>>>> [manage.cluster:17488] hwloc_distances->latency[1]=1.600000
> >>>>>>> [manage.cluster:17488] hwloc_distances->latency[2]=1.600000
> >>>>>>> [manage.cluster:17488] hwloc_distances->latency[3]=1.000000
> >>>>>>> [manage.cluster:17488] ibv_obj->type set to NULL
> >>>>>>> [manage.cluster:17488] Process is bound: distance to device is
> >> 0.000000
> >>>>>>> [manage.cluster:17487] select: init of component sm returned
success
> >>>>>>> [manage.cluster:17487] select: initializing btl component openib
> >>>>>>> [manage.cluster:17488] openib BTL: rdmacm CPC unavailable for use
on
> >>>>>>> mthca0:1; skipped
> >>>>>>> [manage.cluster:17487] Checking distance from this process to
> >>>>>> device=mthca0
> >>>>>>> [manage.cluster:17487] hwloc_distances->nbobjs=2
> >>>>>>> [manage.cluster:17487] hwloc_distances->latency[0]=1.000000
> >>>>>>> [manage.cluster:17487] hwloc_distances->latency[1]=1.600000
> >>>>>>> [manage.cluster:17487] hwloc_distances->latency[2]=1.600000
> >>>>>>> [manage.cluster:17487] hwloc_distances->latency[3]=1.000000
> >>>>>>> [manage.cluster:17487] ibv_obj->type set to NULL
> >>>>>>> [manage.cluster:17487] Process is bound: distance to device is
> >> 0.000000
> >>>>>>> [manage.cluster:17488] [rank=1] openib: using port mthca0:1
> >>>>>>> [manage.cluster:17488] select: init of component openib returned
> >> success
> >>>>>>> [manage.cluster:17487] openib BTL: rdmacm CPC unavailable for use
on
> >>>>>>> mthca0:1; skipped
> >>>>>>> [manage.cluster:17487] [rank=0] openib: using port mthca0:1
> >>>>>>> [manage.cluster:17487] select: init of component openib returned success
> >>>>>>> [manage.cluster:17488] mca: bml: Using self btl for send to
> >> [[18993,1],1]
> >>>>>>> on node manage
> >>>>>>> [manage.cluster:17487] mca: bml: Using self btl for send to
> >> [[18993,1],0]
> >>>>>>> on node manage
> >>>>>>> [manage.cluster:17488] mca: bml: Using vader btl for send to
> >>>>>> [[18993,1],0]
> >>>>>>> on node manage
> >>>>>>> [manage.cluster:17487] mca: bml: Using vader btl for send to
> >>>>>> [[18993,1],1]
> >>>>>>> on node manage
> >>>>>>> # OSU MPI Bandwidth Test v3.1.1
> >>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>> 1                         1.76
> >>>>>>> 2                         3.53
> >>>>>>> 4                         7.06
> >>>>>>> 8                        14.46
> >>>>>>> 16                       29.12
> >>>>>>> 32                       57.54
> >>>>>>> 64                      100.12
> >>>>>>> 128                     157.78
> >>>>>>> 256                     277.32
> >>>>>>> 512                     477.53
> >>>>>>> 1024                    894.81
> >>>>>>> 2048                   1330.68
> >>>>>>> 4096                    278.58
> >>>>>>> 8192                    516.00
> >>>>>>> 16384                   762.99
> >>>>>>> 32768                  1037.19
> >>>>>>> 65536                  1181.66
> >>>>>>> 131072                 1261.91
> >>>>>>> 262144                 1237.39
> >>>>>>> 524288                 1247.86
> >>>>>>> 1048576                1252.04
> >>>>>>> 2097152                1273.46
> >>>>>>> 4194304                1281.21
> >>>>>>> [manage.cluster:17488] mca: base: close: component self closed
> >>>>>>> [manage.cluster:17488] mca: base: close: unloading component self
> >>>>>>> [manage.cluster:17487] mca: base: close: component self closed
> >>>>>>> [manage.cluster:17487] mca: base: close: unloading component self
> >>>>>>> [manage.cluster:17488] mca: base: close: component vader closed
> >>>>>>> [manage.cluster:17488] mca: base: close: unloading component
vader
> >>>>>>> [manage.cluster:17487] mca: base: close: component vader closed
> >>>>>>> [manage.cluster:17487] mca: base: close: unloading component
vader
> >>>>>>> [manage.cluster:17488] mca: base: close: component tcp closed
> >>>>>>> [manage.cluster:17488] mca: base: close: unloading component tcp
> >>>>>>> [manage.cluster:17487] mca: base: close: component tcp closed
> >>>>>>> [manage.cluster:17487] mca: base: close: unloading component tcp
> >>>>>>> [manage.cluster:17488] mca: base: close: component sm closed
> >>>>>>> [manage.cluster:17488] mca: base: close: unloading component sm
> >>>>>>> [manage.cluster:17487] mca: base: close: component sm closed
> >>>>>>> [manage.cluster:17487] mca: base: close: unloading component sm
> >>>>>>> [manage.cluster:17488] mca: base: close: component openib closed
> >>>>>>> [manage.cluster:17488] mca: base: close: unloading component
openib
> >>>>>>> [manage.cluster:17487] mca: base: close: component openib closed
> >>>>>>> [manage.cluster:17487] mca: base: close: unloading component
openib
> >>>>>>>
> >>>>>>> Tetsuya Mishima
> >>>>>>>
> >>>>>>> On 2016/07/27 9:20:28, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
> >>>>>>>> sm is deprecated in 2.0.0 and will likely be removed in favor of vader in 2.1.0.
> >>>>>>>>
> >>>>>>>> This is probably this known issue: https://github.com/open-mpi/ompi-release/pull/1250
> >>>>>>>>
> >>>>>>>> Please apply those commits and see if it fixes the issue for you.
> >>>>>>>>
> >>>>>>>> -Nathan
> >>>>>>>>
> >>>>>>>>> On Jul 26, 2016, at 6:17 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Gilles,
> >>>>>>>>>
> >>>>>>>>> Thanks. I ran again with --mca pml ob1 but I've got the same results as below:
> >>>>>>>>>
> >>>>>>>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -bind-to core -report-bindings osu_bw
> >>>>>>>>> [manage.cluster:18142] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
> >>>>>>>>> [manage.cluster:18142] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
> >>>>>>>>> # OSU MPI Bandwidth Test v3.1.1
> >>>>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>>>> 1                         1.48
> >>>>>>>>> 2                         3.07
> >>>>>>>>> 4                         6.26
> >>>>>>>>> 8                        12.53
> >>>>>>>>> 16                       24.33
> >>>>>>>>> 32                       49.03
> >>>>>>>>> 64                       83.46
> >>>>>>>>> 128                     132.60
> >>>>>>>>> 256                     234.96
> >>>>>>>>> 512                     420.86
> >>>>>>>>> 1024                    842.37
> >>>>>>>>> 2048                   1231.65
> >>>>>>>>> 4096                    264.67
> >>>>>>>>> 8192                    472.16
> >>>>>>>>> 16384                   740.42
> >>>>>>>>> 32768                  1030.39
> >>>>>>>>> 65536                  1191.16
> >>>>>>>>> 131072                 1269.45
> >>>>>>>>> 262144                 1238.33
> >>>>>>>>> 524288                 1247.97
> >>>>>>>>> 1048576                1257.96
> >>>>>>>>> 2097152                1274.74
> >>>>>>>>> 4194304                1280.94
> >>>>>>>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl self,sm -bind-to core -report-bindings osu_bw
> >>>>>>>>> [manage.cluster:18204] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
> >>>>>>>>> [manage.cluster:18204] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
> >>>>>>>>> # OSU MPI Bandwidth Test v3.1.1
> >>>>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>>>> 1                         0.52
> >>>>>>>>> 2                         1.05
> >>>>>>>>> 4                         2.08
> >>>>>>>>> 8                         4.18
> >>>>>>>>> 16                        8.21
> >>>>>>>>> 32                       16.65
> >>>>>>>>> 64                       32.60
> >>>>>>>>> 128                      66.70
> >>>>>>>>> 256                     132.45
> >>>>>>>>> 512                     269.27
> >>>>>>>>> 1024                    504.63
> >>>>>>>>> 2048                    819.76
> >>>>>>>>> 4096                    874.54
> >>>>>>>>> 8192                   1447.11
> >>>>>>>>> 16384                  2263.28
> >>>>>>>>> 32768                  3236.85
> >>>>>>>>> 65536                  3567.34
> >>>>>>>>> 131072                 3555.17
> >>>>>>>>> 262144                 3455.76
> >>>>>>>>> 524288                 3441.80
> >>>>>>>>> 1048576                3505.30
> >>>>>>>>> 2097152                3534.01
> >>>>>>>>> 4194304                3546.94
> >>>>>>>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl self,sm,openib -bind-to core -report-bindings osu_bw
> >>>>>>>>> [manage.cluster:18218] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
> >>>>>>>>> [manage.cluster:18218] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
> >>>>>>>>> # OSU MPI Bandwidth Test v3.1.1
> >>>>>>>>> # Size        Bandwidth (MB/s)
> >>>>>>>>> 1                         0.51
> >>>>>>>>> 2                         1.03
> >>>>>>>>> 4                         2.05
> >>>>>>>>> 8                         4.07
> >>>>>>>>> 16                        8.14
> >>>>>>>>> 32                       16.32
> >>>>>>>>> 64                       32.98
> >>>>>>>>> 128                      63.70
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
