Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-02 Thread Jeff Squyres (jsquyres)
Issue filed at https://github.com/open-mpi/ompi/issues/2044.

I asked Nathan and Sylvain to have a look.
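
As far as I can tell, the opal_atomic_* symbols in the build log quoted below
(opal_atomic_swap_64, opal_atomic_ll_64, opal_atomic_sc_64) come from
GCC-style inline assembly on POWER, so a compiler that rejects or drops that
asm style would simply never emit them, and the link fails exactly as
reported. A rough, hypothetical sketch (my illustration, not the actual
Open MPI source) of a 64-bit atomic swap built on the same ldarx/stdcx.
(load-linked / store-conditional) instructions:

/* Hypothetical sketch only, not the Open MPI implementation: a 64-bit
 * atomic swap for PPC64 using load-linked (ldarx) and store-conditional
 * (stdcx.), the same primitives that opal_atomic_ll_64/opal_atomic_sc_64
 * wrap.  If the compiler cannot digest this inline-asm style, no symbol
 * is generated and callers see "undefined reference". */
static inline long swap64_sketch(volatile long *addr, long newval)
{
    long old;
    __asm__ __volatile__(
        "1: ldarx   %0, 0, %2  \n\t"   /* load-linked the current value   */
        "   stdcx.  %3, 0, %2  \n\t"   /* store-conditionally the new one */
        "   bne-    1b         \n\t"   /* retry if the reservation broke  */
        : "=&r" (old), "+m" (*addr)
        : "r" (addr), "r" (newval)
        : "cc", "memory");
    return old;
}

Note that in the nm output below, atomic-asm.o provides 'T' definitions for
cmpset/add/sub and the memory barriers, while swap_64/ll_64/sc_64 appear
only as 'U' (undefined), which is consistent with the inline versions never
having been generated.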


> On Sep 1, 2016, at 9:20 PM, Paul Hargrove  wrote:
> 
> I failed to get PGI 16.x working at all (licence issue, I think).
> So, I can neither confirm nor refute Geoffroy's reported problems.
> 
> -Paul
> 
> On Thu, Sep 1, 2016 at 6:15 PM, Vallee, Geoffroy R.  wrote:
> Interesting, I am having the problem with both 16.5 and 16.7.
> 
> My 2 cents,
> 
> > On Sep 1, 2016, at 8:25 PM, Paul Hargrove  wrote:
> >
> > FWIW I have not seen problems when testing the 2.0.1rc2 w/ PGI versions 
> > 12.10, 13.9, 14.3 or 15.9.
> >
> > I am going to test 2.0.1rc3 ASAP and try to get PGI 16.4 coverage added in
> >
> > -Paul
> >
> > On Thu, Sep 1, 2016 at 12:48 PM, Jeff Squyres (jsquyres) 
> >  wrote:
> > Please send all the information requested on the build support page and 
> > open an issue on GitHub.  Thanks.
> >
> >
> > > On Sep 1, 2016, at 3:41 PM, Vallee, Geoffroy R.  wrote:
> > >
> > > This is indeed a little better, but it still creates a problem:
> > >
> > >  CCLD opal_wrapper
> > > ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function 
> > > `_opal_progress_unregister':
> > > /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:459:
> > >  undefined reference to `opal_atomic_swap_64'
> > > ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function 
> > > `_opal_progress_register':
> > > /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:398:
> > >  undefined reference to `opal_atomic_swap_64'
> > > make[2]: *** [opal_wrapper] Error 2
> > > make[2]: Leaving directory 
> > > `/autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/tools/wrappers'
> > > make[1]: *** [all-recursive] Error 1
> > > make[1]: Leaving directory 
> > > `/autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal'
> > > make: *** [all-recursive] Error 1
> > >
> > > $ nm libopen-pal.a  | grep atomic
> > > U opal_atomic_cmpset_64
> > > 0ab0 t opal_atomic_cmpset_ptr
> > > U opal_atomic_wmb
> > > 0950 t opal_lifo_push_atomic
> > > U opal_atomic_cmpset_acq_32
> > > 03d0 t opal_atomic_lock
> > > 0450 t opal_atomic_unlock
> > > U opal_atomic_wmb
> > > U opal_atomic_ll_64
> > > U opal_atomic_sc_64
> > > U opal_atomic_wmb
> > > 1010 t opal_lifo_pop_atomic
> > > U opal_atomic_cmpset_acq_32
> > > 04b0 t opal_atomic_init
> > > 04e0 t opal_atomic_lock
> > > U opal_atomic_mb
> > > 0560 t opal_atomic_unlock
> > > U opal_atomic_wmb
> > > U opal_atomic_add_32
> > > U opal_atomic_cmpset_acq_32
> > > 0820 t opal_atomic_init
> > > 0850 t opal_atomic_lock
> > > U opal_atomic_sub_32
> > > U opal_atomic_swap_64
> > > 08d0 t opal_atomic_unlock
> > > U opal_atomic_wmb
> > > 0130 t opal_atomic_init
> > > atomic-asm.o:
> > > 0138 T opal_atomic_add_32
> > > 0018 T opal_atomic_cmpset_32
> > > 00c4 T opal_atomic_cmpset_64
> > > 003c T opal_atomic_cmpset_acq_32
> > > 00e8 T opal_atomic_cmpset_acq_64
> > > 0070 T opal_atomic_cmpset_rel_32
> > > 0110 T opal_atomic_cmpset_rel_64
> > >  T opal_atomic_mb
> > > 0008 T opal_atomic_rmb
> > > 0150 T opal_atomic_sub_32
> > > 0010 T opal_atomic_wmb
> > > 2280 t mca_base_pvar_is_atomic
> > > U opal_atomic_ll_64
> > > U opal_atomic_sc_64
> > > U opal_atomic_wmb
> > > 0900 t opal_lifo_pop_atomic
> > >
> > >> On Sep 1, 2016, at 3:16 PM, Jeff Squyres (jsquyres)  
> > >> wrote:
> > >>
> > >> Can you try the latest v2.0.1 nightly snapshot tarball?
> > >>
> > >>
> > >>> On Sep 1, 2016, at 2:56 PM, Vallee, Geoffroy R.  
> > >>> wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> We get the following problem when compiling OpenMPI-2.0.0 with PGI (it 
> > >>> seems to be specific to 2.x; the problem did not appear with 1.10.x):
> > >>>
> > >>> CCLD opal_wrapper
> > >>> ../../../opal/.libs/libopen-pal.so: undefined reference to 
> > >>> `opal_atomic_sc_64'
> > >>> ../../../opal/.libs/libopen-pal.so: undefined reference to 
> > >>> `opal_atomic_ll_64'
> > >>> ../../../opal/.libs/libopen-pal.so: undefined reference to 
> > >>> `opal_atomic_swap_64'
> > >>> make[1]: *** [opal_wrapper] Error 2
> > >>>
> > >>> It is a little hard for me to pinpoint the exact problem, but I can see 
> > >>> the following:
> > >>>
> > >>> $ nm ./.libs/libopen-pal.so | grep atomic
> > >>> 00026320 t 0017.plt_call.opal_atomic_add_32
> > >>> 00026250 t 0017.plt_call.opal_atomic_cmpset_32
> > >>> 00026780 t 0017.plt_c

Re: [OMPI devel] 2.0.1rc3 posted

2016-09-02 Thread Jeff Squyres (jsquyres)
> On Sep 1, 2016, at 8:42 PM, Gilles Gouaillardet  wrote:
> 
> Paul,
> 
> 
> I guess this was a typo, and you should either read
> 
> - Fix a SPARC alignment issue
> 
> or
> 
> - Fix an alignment issue on alignment sensitive processors such as SPARC

I did not copy and paste those bullets from NEWS -- that was just shorthand for 
us to know what was done since rc2; sorry for the confusion.

I fixed the bullet this morning to be:

- Fix alignment issues on SPARC platforms.

Good enough.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 2.0.1rc3 posted

2016-09-02 Thread Paul Hargrove
I can confirm that 2.0.1rc2+patch *did* run correctly on Linux/SPARC.
I am running 2.0.1rc3 now, for completeness.

-Paul

On Fri, Sep 2, 2016 at 3:24 AM, Jeff Squyres (jsquyres) 
wrote:

> > On Sep 1, 2016, at 8:42 PM, Gilles Gouaillardet 
> wrote:
> >
> > Paul,
> >
> >
> > I guess this was a typo, and you should either read
> >
> > - Fix a SPARC alignment issue
> >
> > or
> >
> > - Fix an alignment issue on alignment sensitive processors such as SPARC
>
> I did not copy and paste those bullets from NEWS -- that was just
> shorthand for us to know what was done since rc2; sorry for the confusion.
>
> I fixed the bullet this morning to be:
>
> - Fix alignment issues on SPARC platforms.
>
> Good enough.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] 2.0.1rc3 posted

2016-09-02 Thread Paul Hargrove
All of my testing on 2.0.1rc3 is complete except for SPARC.
The alignment issue on SPARC *has* been tested via 2.0.1rc2 + patch (so
there is very low probability that 2.0.1rc3 would fail).

At this point I am aware of only two platform failures that we didn't
already know about:
+ OpenBSD-6.0 disallows the "patcher" call to mprotect() unless you make
the required animal sacrifice (or something close to it!)
+ Geoffroy's reported problems w/ atomics on PPC64 with PGI-16.x (yes, PGI
for PPC), which is issue open-mpi/ompi#2044

I mention these only for completeness, and don't advocate holding up a
2.0.1 release for either.

-Paul

On Thu, Sep 1, 2016 at 1:47 PM, Jeff Squyres (jsquyres) 
wrote:

> We're getting close.  Unless any showstoppers show up in the immediate
> future, we will likely be releasing this as 2.0.1:
>
> rc3 tarballs here:
>
> https://www.open-mpi.org/software/ompi/v2.0/
>
> Changes since rc2:
>
> - Fix COMM_SPAWN segv
> - Fix yalla bandwidth issue
> - Fix an OMPIO-related crash when using built-in datatypes
> - Fix a Solaris alignment issue
> - Fix a stdin problem
> - Fix a bunch of typos and make other updates in README
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
While investigating the ongoing issue with the OMPI messaging layer, I ran
into some trouble with process binding. I have read the documentation, but I
still find this puzzling.

Disclaimer: all experiments were done with current master (9c496f7)
compiled in optimized mode. The hardware: a single node 20 core
Xeon E5-2650 v3 (hwloc-ls is at the end of this email).

First and foremost, trying to bind to NUMA nodes was a sure way to get a
segfault:

$ mpirun -np 2 --mca btl vader,self --bind-to numa --report-bindings true
--
No objects of the specified type were found on at least one node:

  Type: NUMANode
  Node: arc00

The map cannot be done as specified.
--
[dancer:32162] *** Process received signal ***
[dancer:32162] Signal: Segmentation fault (11)
[dancer:32162] Signal code: Address not mapped (1)
[dancer:32162] Failing at address: 0x3c
[dancer:32162] [ 0] /lib64/libpthread.so.0[0x3126a0f7e0]
[dancer:32162] [ 1]
/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(+0x560e0)[0x7f9075bc60e0]
[dancer:32162] [ 2]
/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_grpcomm_API_xcast+0x84)[0x7f9075bc6f54]
[dancer:32162] [ 3]
/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_plm_base_orted_exit+0x1a8)[0x7f9075bd9308]
[dancer:32162] [ 4]
/home/bosilca/opt/trunk/fast/lib/openmpi/mca_plm_rsh.so(+0x384e)[0x7f907361284e]
[dancer:32162] [ 5]
/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_state_base_check_all_complete+0x324)[0x7f9075bedca4]
[dancer:32162] [ 6]
/home/bosilca/opt/trunk/fast/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7f90758eafec]
[dancer:32162] [ 7] mpirun[0x401251]
[dancer:32162] [ 8] mpirun[0x400e24]
[dancer:32162] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x312621ed1d]
[dancer:32162] [10] mpirun[0x400d49]
[dancer:32162] *** End of error message ***
Segmentation fault

As you can see in the hwloc output below, there are 2 NUMA nodes on the
machine and hwloc correctly identifies them, which makes the OMPI error
message confusing. In any case, we should not segfault but rather report a
more meaningful error message.

Binding to slot (I got this from the man page for 2.0) is apparently not
supported anymore. Reminder: We should update the manpage accordingly.

Trying to bind to core looks better; the application at least starts.
Unfortunately, the reported bindings (or at least my understanding of them)
are troubling. Assuming that the way we report the bindings is correct, why
is each of my processes assigned to 2 cores far apart?

$ mpirun -np 2 --mca btl vader,self --bind-to core --report-bindings true
[arc00:39350] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core
8[hwt 0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
[arc00:39350] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core
9[hwt 0]]: [../B./../../../../../../../B.][../../../../../../../../../..]

Maybe that is because I only used the binding option. Adding the mapping to
the mix (the --map-by option) seems hopeless; the binding remains unchanged
for 2 processes.

$ mpirun -np 2 --mca btl vader,self --bind-to core --report-bindings true
[arc00:40401] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core
8[hwt 0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
[arc00:40401] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core
9[hwt 0]]: [../B./../../../../../../../B.][../../../../../../../../../..]

At this point I really wondered what was going on. To clarify, I tried to
launch 3 processes on the node. Bummer! The reported binding shows that one
of my processes got assigned to cores on different sockets.

$ mpirun -np 3 --mca btl vader,self --bind-to core --report-bindings true
[arc00:40311] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core
8[hwt 0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
[arc00:40311] MCW rank 2 bound to socket 0[core 1[hwt 0]], socket 0[core
9[hwt 0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
[arc00:40311] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 1[core
12[hwt 0]]: [../../../../B./../../../../..][../../B./../../../../../../..]

Why is rank 1 on core 4 and rank 2 on core 1? Maybe specifying the mapping
will help. Will I get a more sensible binding (as suggested by our online
documentation and the man pages)?

$ mpirun -np 3 --mca btl vader,self --bind-to core --map-by core
--report-bindings true
[arc00:40254] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core
8[hwt 0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
[arc00:40254] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core
9[hwt 0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
[arc00:40254] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 1[core
10[hwt 0]]: [../../B./../../../../../../..][B./../../../../../../../../..]

There is a difference. T

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread r...@open-mpi.org
I’ll dig more later, but just checking offhand, I can’t replicate this on my 
box, so it may be something in hwloc for that box (or maybe you have some MCA 
params set somewhere?):

$ mpirun -n 2 --bind-to core --report-bindings hostname
[rhc001:83938] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: 
[BB/../../../../../../../../../../..][../../../../../../../../../../../..]
[rhc001:83938] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../../../../../..][../../../../../../../../../../../..]

$ mpirun -n 2 --bind-to numa --report-bindings hostname
[rhc001:83927] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 
0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], 
socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 
0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: 
[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
[rhc001:83927] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 
0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], 
socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 
0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: 
[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]


$ mpirun -n 2 --bind-to socket --report-bindings hostname
[rhc001:83965] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 
0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], 
socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 
0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: 
[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
[rhc001:83965] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 
0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], 
socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 
0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: 
[BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]


I have seen the segfault when something fails early in the setup procedure - 
planned to fix that this weekend.


> On Sep 2, 2016, at 9:09 PM, George Bosilca  wrote:
> 
> While investigating the ongoing issue with OMPI messaging layer, I run into 
> some troubles with process binding. I read the documentation, but I still 
> find this puzzling.
> 
> Disclaimer: all experiments were done with current master (9c496f7) compiled 
> in optimized mode. The hardware: a single node 20 core Xeon E5-2650 v3 
> (hwloc-ls is at the end of this email).
> 
> First and foremost, trying to bind to NUMA nodes was a sure way to get a 
> segfault:
> 
> $ mpirun -np 2 --mca btl vader,self --bind-to numa --report-bindings true
> --
> No objects of the specified type were found on at least one node:
> 
>   Type: NUMANode
>   Node: arc00
> 
> The map cannot be done as specified.
> --
> [dancer:32162] *** Process received signal ***
> [dancer:32162] Signal: Segmentation fault (11)
> [dancer:32162] Signal code: Address not mapped (1)
> [dancer:32162] Failing at address: 0x3c
> [dancer:32162] [ 0] /lib64/libpthread.so.0[0x3126a0f7e0]
> [dancer:32162] [ 1] 
> /home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(+0x560e0)[0x7f9075bc60e0]
> [dancer:32162] [ 2] 
> /home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_grpcomm_API_xcast+0x84)[0x7f9075bc6f54]
> [dancer:32162] [ 3] 
> /home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_plm_base_orted_exit+0x1a8)[0x7f9075bd9308]
> [dancer:32162] [ 4] 
> /home/bosilca/opt/trunk/fast/lib/openmpi/mca_plm_rsh.so(+0x384e)[0x7f907361284e]
> [dancer:32162] [ 5] 
> /home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_state_base_check_all_complete+0x324)[0x7f9075bedca4]
> [dancer:32162] [ 6] 
> /home/bosilca/opt/trunk/fast/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7f90758eafec]
> [dancer:32162] [ 7] mpirun[0x401251]
> [dancer:32162] [ 8] mpirun[0x400e24]
> [dancer:32162] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x312621ed1d]
> [dancer:32162] [10] mpirun[0x400d49]
> [dancer:32162] *** End of error message ***
> Segmentation fault
> 
> As you can see on the hwloc output below, there are 2 NUMA nodes on the node 
> and HWLOC correctly identifies them, making OMPI error message confusing. 
> Anyway, we should not segfault but report a more meaning error message.
> 
> Binding to slot (I got this from the man page for 2.0) is apparently not 
> supported anymore. Reminder: We should u

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread Gilles Gouaillardet
George,

I cannot help much with this, I am afraid.

My best bet would be to rebuild Open MPI with --enable-debug and a recent 
external hwloc (IIRC hwloc v2 cannot be used in Open MPI yet).

You might also want to try
mpirun --tag-output --bind-to xxx --report-bindings grep Cpus_allowed_list 
/proc/self/status

That way you can confirm that both Open MPI and /proc/self/status report the same thing.
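
If it is more convenient to check from inside the application, here is a
minimal sketch along the same lines (my own example; it assumes Linux/glibc,
where sched_getaffinity reports the same mask as Cpus_allowed_list):

#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Query the affinity mask the OS actually applied to this rank. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        printf("rank %d allowed OS cpus:", rank);
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (CPU_ISSET(cpu, &mask)) {
                printf(" %d", cpu);
            }
        }
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}

Running it with the same mpirun options should print OS cpu numbers that
match the grep output.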

Hope this helps a bit ...

Gilles

George Bosilca  wrote:
>While investigating the ongoing issue with OMPI messaging layer, I run into 
>some troubles with process binding. I read the documentation, but I still find 
>this puzzling.
>
>
>Disclaimer: all experiments were done with current master (9c496f7) compiled 
>in optimized mode. The hardware: a single node 20 core Xeon E5-2650 v3 
>(hwloc-ls is at the end of this email).
>
>
>First and foremost, trying to bind to NUMA nodes was a sure way to get a 
>segfault:
>
>
>$ mpirun -np 2 --mca btl vader,self --bind-to numa --report-bindings true
>
>--
>
>No objects of the specified type were found on at least one node:
>
>
>  Type: NUMANode
>
>  Node: arc00
>
>
>The map cannot be done as specified.
>
>--
>
>[dancer:32162] *** Process received signal ***
>
>[dancer:32162] Signal: Segmentation fault (11)
>
>[dancer:32162] Signal code: Address not mapped (1)
>
>[dancer:32162] Failing at address: 0x3c
>
>[dancer:32162] [ 0] /lib64/libpthread.so.0[0x3126a0f7e0]
>
>[dancer:32162] [ 1] 
>/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(+0x560e0)[0x7f9075bc60e0]
>
>[dancer:32162] [ 2] 
>/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_grpcomm_API_xcast+0x84)[0x7f9075bc6f54]
>
>[dancer:32162] [ 3] 
>/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_plm_base_orted_exit+0x1a8)[0x7f9075bd9308]
>
>[dancer:32162] [ 4] 
>/home/bosilca/opt/trunk/fast/lib/openmpi/mca_plm_rsh.so(+0x384e)[0x7f907361284e]
>
>[dancer:32162] [ 5] 
>/home/bosilca/opt/trunk/fast/lib/libopen-rte.so.0(orte_state_base_check_all_complete+0x324)[0x7f9075bedca4]
>
>[dancer:32162] [ 6] 
>/home/bosilca/opt/trunk/fast/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7f90758eafec]
>
>[dancer:32162] [ 7] mpirun[0x401251]
>
>[dancer:32162] [ 8] mpirun[0x400e24]
>
>[dancer:32162] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x312621ed1d]
>
>[dancer:32162] [10] mpirun[0x400d49]
>
>[dancer:32162] *** End of error message ***
>
>Segmentation fault
>
>
>As you can see on the hwloc output below, there are 2 NUMA nodes on the node 
>and HWLOC correctly identifies them, making OMPI error message confusing. 
>Anyway, we should not segfault but report a more meaning error message.
>
>
>Binding to slot (I got this from the man page for 2.0) is apparently not 
>supported anymore. Reminder: We should update the manpage accordingly.
>
>
>Trying to bind to core looks better, the application at least starts. 
>Unfortunately the reported bindings (or at least my understanding of these 
>bindings) are troubling. Assuming that the way we report the bindings is 
>correct, why are my processes assigned to 2 cores far apart each ?
>
>
>$ mpirun -np 2 --mca btl vader,self --bind-to core --report-bindings true
>
>[arc00:39350] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 8[hwt 
>0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
>
>[arc00:39350] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core 9[hwt 
>0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
>
>
>Maybe because I only used the binding option. Adding the mapping to the mix 
>(--map-by option) seem hopeless, the binding remains unchanged for 2 processes.
>
>
>$ mpirun -np 2 --mca btl vader,self --bind-to core --report-bindings true
>
>[arc00:40401] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 8[hwt 
>0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
>
>[arc00:40401] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core 9[hwt 
>0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
>
>
>At this point I really wondered what is going on. To clarify I tried to launch 
>3 processes on the node. Bummer ! the reported binding shows that one of my 
>processes got assigned to cores on different sockets.
>
>
>$ mpirun -np 3 --mca btl vader,self --bind-to core --report-bindings true
>
>[arc00:40311] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 8[hwt 
>0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
>
>[arc00:40311] MCW rank 2 bound to socket 0[core 1[hwt 0]], socket 0[core 9[hwt 
>0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
>
>[arc00:40311] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 1[core 
>12[hwt 0]]: [../../../../B./../../../../..][../../B./../../../../../../..]
>
>
>Why is rank 1 on core 4 and rank 2 on core 1 ? Maybe specifying the mapping 
>will help. Will I get 

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
On Sat, Sep 3, 2016 at 12:18 AM, r...@open-mpi.org  wrote:

> I’ll dig more later, but just checking offhand, I can’t replicate this on
> my box, so it may be something in hwloc for that box (or maybe you have
> some MCA params set somewhere?):
>

Yes, I have 2 MCA parameters set (orte_default_hostfile and
state_novm_select), but I don't think they are expected to affect the
bindings. Or are they?

  George.



> $ mpirun -n 2 --bind-to core --report-bindings hostname
> [rhc001:83938] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
> [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
> [rhc001:83938] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
> [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
>
> $ mpirun -n 2 --bind-to numa --report-bindings hostname
> [rhc001:83927] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket
> 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
> socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
> 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
> 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]:
> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
> [rhc001:83927] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket
> 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
> socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
> 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
> 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]:
> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
>
>
> $ mpirun -n 2 --bind-to socket --report-bindings hostname
> [rhc001:83965] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket
> 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
> socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
> 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
> 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]:
> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
> [rhc001:83965] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket
> 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]],
> socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt
> 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core
> 9[hwt 0-1]], socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]:
> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
>
>
> I have seen the segfault when something fails early in the setup procedure
> - planned to fix that this weekend.
>
>
> On Sep 2, 2016, at 9:09 PM, George Bosilca  wrote:
>
> While investigating the ongoing issue with OMPI messaging layer, I run
> into some troubles with process binding. I read the documentation, but I
> still find this puzzling.
>
> Disclaimer: all experiments were done with current master (9c496f7)
> compiled in optimized mode. The hardware: a single node 20 core
> Xeon E5-2650 v3 (hwloc-ls is at the end of this email).
>
> First and foremost, trying to bind to NUMA nodes was a sure way to get a
> segfault:
>
> $ mpirun -np 2 --mca btl vader,self --bind-to numa --report-bindings true
> --
> No objects of the specified type were found on at least one node:
>
>   Type: NUMANode
>   Node: arc00
>
> The map cannot be done as specified.
> --
> [dancer:32162] *** Process received signal ***
> [dancer:32162] Signal: Segmentation fault (11)
> [dancer:32162] Signal code: Address not mapped (1)
> [dancer:32162] Failing at address: 0x3c
> [dancer:32162] [ 0] /lib64/libpthread.so.0[0x3126a0f7e0]
> [dancer:32162] [ 1] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(+0x560e0)[0x7f9075bc60e0]
> [dancer:32162] [ 2] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(orte_grpcomm_API_xcast+0x84)[0x7f9075bc6f54]
> [dancer:32162] [ 3] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(orte_plm_base_orted_exit+0x1a8)[0x7f9075bd9308]
> [dancer:32162] [ 4] /home/bosilca/opt/trunk/fast/
> lib/openmpi/mca_plm_rsh.so(+0x384e)[0x7f907361284e]
> [dancer:32162] [ 5] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(orte_state_base_check_all_complete+
> 0x324)[0x7f9075bedca4]
> [dancer:32162] [ 6] /home/bosilca/opt/trunk/fast/
> lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+
> 0x53c)[0x7f90758eafec]
> [dancer:32162] [ 7] mpirun[0x401251]
> [dancer:32162] [ 8] mpirun[0x400e24]
> [dancer:32162] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x312621ed1d]
> [dancer:32162] [10] mpirun[0x400d49]
> [dancer:32162] *** End of error message ***
> Segmentation fault
>
> As you can see on the hwloc output below, there are 2 NUMA nodes on the

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
Thanks Gilles, that's a very useful trick. The bindings reported by ORTE
are in sync with the ones reported by the OS.

$ mpirun -np 2 --tag-output --bind-to core --report-bindings grep
Cpus_allowed_list /proc/self/status
[1,0]:[arc00:90813] MCW rank 0 bound to socket 0[core 0[hwt 0]],
socket 0[core 4[hwt 0]]:
[B./../../../B./../../../../..][../../../../../../../../../..]
[1,1]:[arc00:90813] MCW rank 1 bound to socket 1[core 10[hwt 0]],
socket 1[core 14[hwt 0]]:
[../../../../../../../../../..][B./../../../B./../../../../..]
[1,0]:Cpus_allowed_list:0,8
[1,1]:Cpus_allowed_list:1,9

George.



On Sat, Sep 3, 2016 at 12:27 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> George,
>
> I cannot help much with this i am afraid
>
> My best bet would be to rebuild OpenMPI with --enable-debug and an
> external recent hwloc (iirc hwloc v2 cannot be used in Open MPI yet)
>
> You might also want to try
> mpirun --tag-output --bind-to xxx --report-bindings grep Cpus_allowed_list
> /proc/self/status
>
> So you can confirm both openmpi and /proc/self/status report the same thing
>
> Hope this helps a bit ...
>
> Gilles
>
>
> George Bosilca  wrote:
> While investigating the ongoing issue with OMPI messaging layer, I run
> into some troubles with process binding. I read the documentation, but I
> still find this puzzling.
>
> Disclaimer: all experiments were done with current master (9c496f7)
> compiled in optimized mode. The hardware: a single node 20 core
> Xeon E5-2650 v3 (hwloc-ls is at the end of this email).
>
> First and foremost, trying to bind to NUMA nodes was a sure way to get a
> segfault:
>
> $ mpirun -np 2 --mca btl vader,self --bind-to numa --report-bindings true
> --
> No objects of the specified type were found on at least one node:
>
>   Type: NUMANode
>   Node: arc00
>
> The map cannot be done as specified.
> --
> [dancer:32162] *** Process received signal ***
> [dancer:32162] Signal: Segmentation fault (11)
> [dancer:32162] Signal code: Address not mapped (1)
> [dancer:32162] Failing at address: 0x3c
> [dancer:32162] [ 0] /lib64/libpthread.so.0[0x3126a0f7e0]
> [dancer:32162] [ 1] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(+0x560e0)[0x7f9075bc60e0]
> [dancer:32162] [ 2] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(orte_grpcomm_API_xcast+0x84)[0x7f9075bc6f54]
> [dancer:32162] [ 3] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(orte_plm_base_orted_exit+0x1a8)[0x7f9075bd9308]
> [dancer:32162] [ 4] /home/bosilca/opt/trunk/fast/
> lib/openmpi/mca_plm_rsh.so(+0x384e)[0x7f907361284e]
> [dancer:32162] [ 5] /home/bosilca/opt/trunk/fast/
> lib/libopen-rte.so.0(orte_state_base_check_all_complete+
> 0x324)[0x7f9075bedca4]
> [dancer:32162] [ 6] /home/bosilca/opt/trunk/fast/
> lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+
> 0x53c)[0x7f90758eafec]
> [dancer:32162] [ 7] mpirun[0x401251]
> [dancer:32162] [ 8] mpirun[0x400e24]
> [dancer:32162] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x312621ed1d]
> [dancer:32162] [10] mpirun[0x400d49]
> [dancer:32162] *** End of error message ***
> Segmentation fault
>
> As you can see on the hwloc output below, there are 2 NUMA nodes on the
> node and HWLOC correctly identifies them, making OMPI error message
> confusing. Anyway, we should not segfault but report a more meaning error
> message.
>
> Binding to slot (I got this from the man page for 2.0) is apparently not
> supported anymore. Reminder: We should update the manpage accordingly.
>
> Trying to bind to core looks better, the application at least starts.
> Unfortunately the reported bindings (or at least my understanding of these
> bindings) are troubling. Assuming that the way we report the bindings is
> correct, why are my processes assigned to 2 cores far apart each ?
>
> $ mpirun -np 2 --mca btl vader,self --bind-to core --report-bindings true
> [arc00:39350] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core
> 8[hwt 0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
> [arc00:39350] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core
> 9[hwt 0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
>
> Maybe because I only used the binding option. Adding the mapping to the
> mix (--map-by option) seem hopeless, the binding remains unchanged for 2
> processes.
>
> $ mpirun -np 2 --mca btl vader,self --bind-to core --report-bindings true
> [arc00:40401] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core
> 8[hwt 0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
> [arc00:40401] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core
> 9[hwt 0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
>
> At this point I really wondered what is going on. To clarify I tried to
> launch 3 processes on the node. Bummer ! the reported binding sh

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread Gilles Gouaillardet
George,

Did you mean to write *not* in sync instead?

Note that the ORTE output is different from the one you posted earlier
(though the btls were different).

As far as I understand, Cpus_allowed_list should really be
0,20
and
10,30

and in order to match the ORTE output, they should be
0,4
and
10,14

Cheers,

Gilles


On Saturday, September 3, 2016, George Bosilca wrote:

> Thanks Gilles, that's a very useful trick. The bindings reported by ORTE
> are in sync with the one reported by the OS.
>
> $ mpirun -np 2 --tag-output --bind-to core --report-bindings grep
> Cpus_allowed_list /proc/self/status
> [1,0]:[arc00:90813] MCW rank 0 bound to socket 0[core 0[hwt 0]],
> socket 0[core 4[hwt 0]]: [B./../../../B./../../../../..
> ][../../../../../../../../../..]
> [1,1]:[arc00:90813] MCW rank 1 bound to socket 1[core 10[hwt 0]],
> socket 1[core 14[hwt 0]]: [../../../../../../../../../..
> ][B./../../../B./../../../../..]
> [1,0]:Cpus_allowed_list:0,8
> [1,1]:Cpus_allowed_list:1,9
>
> George.
>
>
>
> On Sat, Sep 3, 2016 at 12:27 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> George,
>>
>> I cannot help much with this i am afraid
>>
>> My best bet would be to rebuild OpenMPI with --enable-debug and an
>> external recent hwloc (iirc hwloc v2 cannot be used in Open MPI yet)
>>
>> You might also want to try
>> mpirun --tag-output --bind-to xxx --report-bindings grep
>> Cpus_allowed_list /proc/self/status
>>
>> So you can confirm both openmpi and /proc/self/status report the same
>> thing
>>
>> Hope this helps a bit ...
>>
>> Gilles
>>
>>
>> George Bosilca  wrote:
>> While investigating the ongoing issue with OMPI messaging layer, I run
>> into some troubles with process binding. I read the documentation, but I
>> still find this puzzling.
>>
>> Disclaimer: all experiments were done with current master (9c496f7)
>> compiled in optimized mode. The hardware: a single node 20 core
>> Xeon E5-2650 v3 (hwloc-ls is at the end of this email).
>>
>> First and foremost, trying to bind to NUMA nodes was a sure way to get a
>> segfault:
>>
>> $ mpirun -np 2 --mca btl vader,self --bind-to numa --report-bindings true
>> 
>> --
>> No objects of the specified type were found on at least one node:
>>
>>   Type: NUMANode
>>   Node: arc00
>>
>> The map cannot be done as specified.
>> 
>> --
>> [dancer:32162] *** Process received signal ***
>> [dancer:32162] Signal: Segmentation fault (11)
>> [dancer:32162] Signal code: Address not mapped (1)
>> [dancer:32162] Failing at address: 0x3c
>> [dancer:32162] [ 0] /lib64/libpthread.so.0[0x3126a0f7e0]
>> [dancer:32162] [ 1] /home/bosilca/opt/trunk/fast/l
>> ib/libopen-rte.so.0(+0x560e0)[0x7f9075bc60e0]
>> [dancer:32162] [ 2] /home/bosilca/opt/trunk/fast/l
>> ib/libopen-rte.so.0(orte_grpcomm_API_xcast+0x84)[0x7f9075bc6f54]
>> [dancer:32162] [ 3] /home/bosilca/opt/trunk/fast/l
>> ib/libopen-rte.so.0(orte_plm_base_orted_exit+0x1a8)[0x7f9075bd9308]
>> [dancer:32162] [ 4] /home/bosilca/opt/trunk/fast/l
>> ib/openmpi/mca_plm_rsh.so(+0x384e)[0x7f907361284e]
>> [dancer:32162] [ 5] /home/bosilca/opt/trunk/fast/l
>> ib/libopen-rte.so.0(orte_state_base_check_all_complete+0x324
>> )[0x7f9075bedca4]
>> [dancer:32162] [ 6] /home/bosilca/opt/trunk/fast/l
>> ib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)
>> [0x7f90758eafec]
>> [dancer:32162] [ 7] mpirun[0x401251]
>> [dancer:32162] [ 8] mpirun[0x400e24]
>> [dancer:32162] [ 9] /lib64/libc.so.6(__libc_start_
>> main+0xfd)[0x312621ed1d]
>> [dancer:32162] [10] mpirun[0x400d49]
>> [dancer:32162] *** End of error message ***
>> Segmentation fault
>>
>> As you can see on the hwloc output below, there are 2 NUMA nodes on the
>> node and HWLOC correctly identifies them, making OMPI error message
>> confusing. Anyway, we should not segfault but report a more meaning error
>> message.
>>
>> Binding to slot (I got this from the man page for 2.0) is apparently not
>> supported anymore. Reminder: We should update the manpage accordingly.
>>
>> Trying to bind to core looks better, the application at least starts.
>> Unfortunately the reported bindings (or at least my understanding of these
>> bindings) are troubling. Assuming that the way we report the bindings is
>> correct, why are my processes assigned to 2 cores far apart each ?
>>
>> $ mpirun -np 2 --mca btl vader,self --bind-to core --report-bindings true
>> [arc00:39350] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core
>> 8[hwt 0]]: [B./../../../../../../../B./..][../../../../../../../../../..]
>> [arc00:39350] MCW rank 1 bound to socket 0[core 1[hwt 0]], socket 0[core
>> 9[hwt 0]]: [../B./../../../../../../../B.][../../../../../../../../../..]
>>
>> Maybe because I only used the binding option. Adding the mapping to the
>> mix (--map-by option) seem hopeless, the binding remains unchanged for 2
>> processes.
>>
>> $ mp