Re: [OMPI users] User-built OpenMPI 3.0.1 segfaults when storing into an atomic 128-bit variable

2018-05-04 Thread Jeff Hammond
On Thu, May 3, 2018 at 11:23 PM, Nathan Hjelm  wrote:

> Not saying we won't change the behavior. Just saying the user can't expect
> a particular alignment as there is no guarantee in the standard.  In Open
> MPI we just don't bother to align the pointer right now, so it naturally
> aligns as 64-bit. It isn't about wasting memory.
>

We should add an info key for alignment.  It's pretty silly we don't have
one already, given how windows are allocated.
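A sketch of what that might look like from the user side; the key name below is purely hypothetical and is not honored by any implementation that I know of:

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // Hypothetical info key: ask the implementation for 16-byte alignment.
    // The name is made up for illustration; nothing honors it today.
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "mpi_minimum_memory_alignment", "16");

    void *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate(256, 1, info, MPI_COMM_WORLD, &base, &win);
    std::printf("window base %% 16 = %zu\n", (std::size_t)base % 16);

    MPI_Win_free(&win);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}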

At the very least, MPI-3 implementations should allocate all windows to
128b on x86_64 in order to allow MPI_Accumulate(MPI_COMPLEX_DOUBLE) to use
CMPXCHG16.

It's pretty lame for MPI_Alloc_mem to be worse than malloc.  We should fix
this in the standard.  MPI should not be breaking the ABI behavior when
substituted for the system allocator.


> Also remember that by default the shared memory regions are contiguous
> across local ranks so each rank will get a buffer alignment dictated by the
> sizes of the allocations specified by the prior ranks in addition to the
> zero rank buffer alignment.
>

If a user is allocating std::atomic<__int128>, every element will be
128b-aligned if the base is.  Noncontiguous is actually worse in that the
implementation could allocate the segment for each process with only 64b
alignment.
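A quick way to see the first point (a minimal sketch; assumes GCC or Clang on x86_64, where __int128 exists):

#include <atomic>
#include <cstdint>
#include <cstdio>

int main() {
    // On x86_64 with GCC/Clang, std::atomic<__int128> is 16 bytes wide and
    // 16-byte aligned, so a 16-byte-aligned base keeps every element aligned.
    static_assert(alignof(std::atomic<__int128>) == 16, "expected 16-byte alignment");
    static_assert(sizeof(std::atomic<__int128>) % 16 == 0, "size is a multiple of 16");

    alignas(16) static std::atomic<__int128> arr[4];
    for (auto &a : arr)
        std::printf("element address %% 16 = %zu\n",
                    (std::size_t)(reinterpret_cast<std::uintptr_t>(&a) % 16));
    return 0;
}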

Jeff


> -Nathan
>
> On May 3, 2018, at 9:43 PM, Jeff Hammond  wrote:
>
> Given that this seems to break user experience on a relatively frequent
> basis, I’d like to know the compelling reason why MPI implementers aren’t
> willing to do something utterly trivial to fix it.
>
> And don’t tell me that 16B alignment wastes memory versus 8B alignment.
> Open-MPI “wastes” 4B relative to MPICH for every handle on I32LP64 systems.
> The internal state associated with MPI allocations - particularly windows -
> is bigger than 8B.  I recall ptmalloc uses something like 32B per heap
> allocation.
>
> Jeff
>
> On Thu, May 3, 2018 at 5:20 PM Nathan Hjelm  wrote:
>
>> That is probably it. When there are 4 ranks there are 4 int64’s just
>> before the user data (for PSCW). With 1 rank we don’t even bother, it’s just
>> malloc (16-byte aligned). With any other odd number of ranks the user data
>> is after an odd number of int64’s and is 8-byte aligned. There is no
>> requirement in MPI to provide 16-byte alignment (which is required for
>> _Atomic __int128 because of the alignment requirement of cmpxchg16b) so you
>> have to align it yourself.
>>
>> -Nathan
>>
>> > On May 3, 2018, at 2:16 PM, Joseph Schuchart  wrote:
>> >
>> > Martin,
>> >
>> > You say that you allocate shared memory, do you mean shared memory
> windows? If so, this could be the reason:
> https://github.com/open-mpi/ompi/issues/4952
>> >
>> > The alignment of memory allocated for MPI windows is not suitable for
>> 128bit values in Open MPI (only 8-byte alignment is guaranteed atm). I have
>> seen the alignment change depending on the number of processes. Could you
>> check the alignment of the memory you are trying to access?
>> >
>> > Cheers,
>> > Joseph
>> >
>> > On 05/03/2018 08:48 PM, Martin Böhm wrote:
>> >> Dear all,
>> >> I have a problem with a segfault on a user-built OpenMPI 3.0.1 running
>> on Ubuntu 16.04
>> >> in local mode (only processes on a single computer).
>> >> The problem manifests itself as a segfault when allocating shared
>> memory for
>> >> (at least) one 128-bit atomic variable (say std::atomic<__int128>) and
>> then
>> >> storing into this variable. This error only occurs when there are at
>> least two
>> >> processes, even though only the process of shared rank 0 is the one
>> doing both
>> >> the allocation and the writing. The compile flags -march=core2 or
>> -march=native
>> >> or -latomic were tried, none of which helped.
>> >> An example of the code that triggers it on my computers is this:
>> https://github.com/bohm/binstretch/blob/parallel-classic/algorithm/bug.cpp
>> >> The code works fine with mpirun -np 1 and segfaults with mpirun -np 2,
>> 3 and 4;
>> >> if line 41 is commented out (the 128-bit atomic write), everything
>> works fine
>> >> with -np 2 or more.
>> >> As for Ubuntu's stock package containing OpenMPI 1.10.2, the code
>> segfaults with
>> >> "-np 2" and "-np 3" but not "-np 1" or "-np 4".
>> >> Thank you for any assistance concerning this problem. I would suspect
>> my own
>> >> code to be the most likely culprit, since it triggers on both the
>> stock package
>> >> and custom-built OpenMPI.
>> >> I attach the config.log.bz2 and ompi_info.log. Below I list some runs
>> of the program
>> >> and what errors are produced.
>> >> Thank you for any assistance. I have tried googling and searching the
>> mailing list
>> >> for this problem; if I missed something, I apologize.
>> >> Martin Böhm
>> >> - Ubuntu 16.04 stock mpirun and mpic++ -
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 1
>> ../tests/bug
>> >> ex1 success.
>> >> ex2 

Re: [OMPI users] User-built OpenMPI 3.0.1 segfaults when storing into an atomic 128-bit variable

2018-05-04 Thread Nathan Hjelm
Not saying we won't change the behavior. Just saying the user can't expect a
particular alignment as there is no guarantee in the standard.  In Open MPI we
just don't bother to align the pointer right now, so it naturally aligns as
64-bit. It isn't about wasting memory.

Also remember that by default the shared memory regions are contiguous across 
local ranks so each rank will get a buffer alignment dictated by the sizes of 
the allocations specified by the prior ranks in addition to the zero rank 
buffer alignment.
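One user-side workaround, given that layout, is to round each rank's contribution up to a multiple of 16 and align the pointer before using it. A minimal sketch (error checking omitted; 16 bytes is what cmpxchg16b needs on x86_64):

#include <mpi.h>
#include <cstdint>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm shm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &shm);

    // Round the size we want up to a multiple of 16 so the ranks after us stay
    // aligned, and add 16 spare bytes so we can always align our own pointer.
    MPI_Aint want = 128;
    MPI_Aint size = ((want + 15) / 16) * 16 + 16;

    void *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(size, 1, MPI_INFO_NULL, shm, &base, &win);

    // Use an aligned pointer regardless of what the window handed back.
    void *aligned = (void *)(((std::uintptr_t)base + 15) & ~(std::uintptr_t)15);
    std::printf("base %% 16 = %zu, aligned %% 16 = %zu\n",
                (std::size_t)base % 16, (std::size_t)aligned % 16);

    MPI_Win_free(&win);
    MPI_Comm_free(&shm);
    MPI_Finalize();
    return 0;
}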

-Nathan

> On May 3, 2018, at 9:43 PM, Jeff Hammond  wrote:
> 
> Given that this seems to break user experience on a relatively frequent 
> basis, I’d like to know the compelling reason why MPI implementers aren’t 
> willing to do something utterly trivial to fix it.
> 
> And don’t tell me that 16B alignment wastes memory versus 8B alignment. 
> Open-MPI “wastes” 4B relative to MPICH for every handle on I32LP64 systems. 
> The internal state associated with MPI allocations - particularly windows - 
> is bigger than 8B.  I recall ptmalloc uses something like 32B per heap 
> allocation.
> 
> Jeff
> 
>> On Thu, May 3, 2018 at 5:20 PM Nathan Hjelm  wrote:
>> That is probably it. When there are 4 ranks there are 4 int64’s just before 
>> the user data (for PSCW). With 1 rank we don’t even bother, it’s just malloc
>> (16-byte aligned). With any other odd number of ranks the user data is after 
>> an odd number of int64’s and is 8-byte aligned. There is no requirement in 
>> MPI to provide 16-byte alignment (which is required for _Atomic __int128 
>> because of the alignment requirement of cmpxchg16b) so you have to align it 
>> yourself.
>> 
>> -Nathan
>> 
>> > On May 3, 2018, at 2:16 PM, Joseph Schuchart  wrote:
>> > 
>> > Martin,
>> > 
>> > You say that you allocate shared memory, do you mean shared memory 
>> > windows? If so, this could be the reason: 
>> > https://github.com/open-mpi/ompi/issues/4952
>> > 
>> > The alignment of memory allocated for MPI windows is not suitable for 
>> > 128bit values in Open MPI (only 8-byte alignment is guaranteed atm). I 
>> > have seen the alignment change depending on the number of processes. Could 
>> > you check the alignment of the memory you are trying to access?
>> > 
>> > Cheers,
>> > Joseph
>> > 
>> > On 05/03/2018 08:48 PM, Martin Böhm wrote:
>> >> Dear all,
>> >> I have a problem with a segfault on a user-built OpenMPI 3.0.1 running on 
>> >> Ubuntu 16.04
>> >> in local mode (only processes on a single computer).
>> >> The problem manifests itself as a segfault when allocating shared memory 
>> >> for
>> >> (at least) one 128-bit atomic variable (say std::atomic<__int128>) and 
>> >> then
>> >> storing into this variable. This error only occurs when there are at 
>> >> least two
>> >> processes, even though only the process of shared rank 0 is the one doing 
>> >> both
>> >> the allocation and the writing. The compile flags -march=core2 or 
>> >> -march=native
>> >> or -latomic were tried, none of which helped.
>> >> An example of the code that triggers it on my computers is this:
>> >> https://github.com/bohm/binstretch/blob/parallel-classic/algorithm/bug.cpp
>> >> The code works fine with mpirun -np 1 and segfaults with mpirun -np 2, 3 
>> >> and 4;
>> >> if line 41 is commented out (the 128-bit atomic write), everything works 
>> >> fine
>> >> with -np 2 or more.
>> >> As for Ubuntu's stock package containing OpenMPI 1.10.2, the code 
>> >> segfaults with
>> >> "-np 2" and "-np 3" but not "-np 1" or "-np 4".
>> >> Thank you for any assistance concerning this problem. I would suspect my 
>> >> own
>> >> code to be the most likely culprit, since it triggers on both the stock 
>> >> package
>> >> and custom-built OpenMPI.
>> >> I attach the config.log.bz2 and ompi_info.log. Below I list some runs of 
>> >> the program
>> >> and what errors are produced.
>> >> Thank you for any assistance. I have tried googling and searching the 
>> >> mailing list
>> >> for this problem; if I missed something, I apologize.
>> >> Martin Böhm
>> >> - Ubuntu 16.04 stock mpirun and mpic++ -
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 1 
>> >> ../tests/bug
>> >> ex1 success.
>> >> ex2 success.
>> >> Inserted into ex1.
>> >> Inserted into ex2.
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 4 
>> >> ../tests/bug
>> >> Thread 2: ex1 success.
>> >> Thread 3: ex1 success.
>> >> ex1 success.
>> >> Thread 1: ex1 success.
>> >> ex2 success.
>> >> Thread 2: ex2 success.
>> >> Thread 1: ex2 success.
>> >> Thread 3: ex2 success.
>> >> Inserted into ex1.
>> >> Inserted into ex2.
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 2 
>> >> ../tests/bug
>> >> Thread 1: ex1 success.
>> >> ex1 success.
>> >> Thread 1: ex2 success.
>> >> ex2 success.
>> >> Inserted into ex1.
>> >> [kamenice:13662] *** Process received signal ***
>> >> [kamenice:13662] 

Re: [OMPI users] User-built OpenMPI 3.0.1 segfaults when storing into an atomic 128-bit variable

2018-05-03 Thread Jeff Hammond
Given that this seems to break user experience on a relatively frequent
basis, I’d like to know the compelling reason why MPI implementers aren’t
willing to do something utterly trivial to fix it.

And don’t tell me that 16B alignment wastes memory versus 8B alignment.
Open-MPI “wastes” 4B relative to MPICH for every handle on I32LP64 systems.
The internal state associated with MPI allocations - particularly windows -
is bigger than 8B.  I recall ptmalloc uses something like 32B per heap
allocation.

Jeff

On Thu, May 3, 2018 at 5:20 PM Nathan Hjelm  wrote:

> That is probably it. When there are 4 ranks there are 4 int64’s just
> before the user data (for PSCW). With 1 rank we don’t even bother, it’s just
> malloc (16-byte aligned). With any other odd number of ranks the user data
> is after an odd number of int64’s and is 8-byte aligned. There is no
> requirement in MPI to provide 16-byte alignment (which is required for
> _Atomic __int128 because of the alignment requirement of cmpxchg16b) so you
> have to align it yourself.
>
> -Nathan
>
> > On May 3, 2018, at 2:16 PM, Joseph Schuchart  wrote:
> >
> > Martin,
> >
> > You say that you allocate shared memory, do you mean shared memory
> windows? If so, this could be the reason:
> https://github.com/open-mpi/ompi/issues/4952
> >
> > The alignment of memory allocated for MPI windows is not suitable for
> 128bit values in Open MPI (only 8-byte alignment is guaranteed atm). I have
> seen the alignment change depending on the number of processes. Could you
> check the alignment of the memory you are trying to access?
> >
> > Cheers,
> > Joseph
> >
> > On 05/03/2018 08:48 PM, Martin Böhm wrote:
> >> Dear all,
> >> I have a problem with a segfault on a user-built OpenMPI 3.0.1 running
> on Ubuntu 16.04
> >> in local mode (only processes on a single computer).
> >> The problem manifests itself as a segfault when allocating shared
> memory for
> >> (at least) one 128-bit atomic variable (say std::atomic<__int128>) and
> then
> >> storing into this variable. This error only occurs when there are at
> least two
> >> processes, even though only the process of shared rank 0 is the one
> doing both
> >> the allocation and the writing. The compile flags -march=core2 or
> -march=native
> >> or -latomic were tried, none of which helped.
> >> An example of the code that triggers it on my computers is this:
> >>
> https://github.com/bohm/binstretch/blob/parallel-classic/algorithm/bug.cpp
> >> The code works fine with mpirun -np 1 and segfaults with mpirun -np 2,
> 3 and 4;
> >> if line 41 is commented out (the 128-bit atomic write), everything
> works fine
> >> with -np 2 or more.
> >> As for Ubuntu's stock package containing OpenMPI 1.10.2, the code
> segfaults with
> >> "-np 2" and "-np 3" but not "-np 1" or "-np 4".
> >> Thank you for any assistance concerning this problem. I would suspect
> my own
> >> code to be the most likely culprit, since it triggers on both the stock
> package
> >> and custom-built OpenMPI.
> >> I attach the config.log.bz2 and ompi_info.log. Below I list some runs
> of the program
> >> and what errors are produced.
> >> Thank you for any assistance. I have tried googling and searching the
> mailing list
> >> for this problem; if I missed something, I apologize.
> >> Martin Böhm
> >> - Ubuntu 16.04 stock mpirun and mpic++ -
> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 1
> ../tests/bug
> >> ex1 success.
> >> ex2 success.
> >> Inserted into ex1.
> >> Inserted into ex2.
> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 4
> ../tests/bug
> >> Thread 2: ex1 success.
> >> Thread 3: ex1 success.
> >> ex1 success.
> >> Thread 1: ex1 success.
> >> ex2 success.
> >> Thread 2: ex2 success.
> >> Thread 1: ex2 success.
> >> Thread 3: ex2 success.
> >> Inserted into ex1.
> >> Inserted into ex2.
> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 2
> ../tests/bug
> >> Thread 1: ex1 success.
> >> ex1 success.
> >> Thread 1: ex2 success.
> >> ex2 success.
> >> Inserted into ex1.
> >> [kamenice:13662] *** Process received signal ***
> >> [kamenice:13662] Signal: Segmentation fault (11)
> >> [kamenice:13662] Signal code:  (128)
> >> [kamenice:13662] Failing at address: (nil)
> >> [kamenice:13662] [ 0]
> /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f31773844b0]
> >> [kamenice:13662] [ 1] ../tests/bug[0x40d8ac]
> >> [kamenice:13662] [ 2] ../tests/bug[0x408997]
> >> [kamenice:13662] [ 3] ../tests/bug[0x408bf0]
> >> [kamenice:13662] [ 4]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f317736f830]
> >> [kamenice:13662] [ 5] ../tests/bug[0x4086e9]
> >> [kamenice:13662] *** End of error message ***
> >> - Ubuntu 16.04 custom-compiled OpenMPI 3.0.1, installed to
> /usr/local (the stock packages were uninstalled) -
> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 1
> ../tests/bug
> >> ex1 success.
> >> ex2 success.
> >> Inserted into 

Re: [OMPI users] User-built OpenMPI 3.0.1 segfaults when storing into an atomic 128-bit variable

2018-05-03 Thread Nathan Hjelm
That is probably it. When there are 4 ranks there are 4 int64’s just before the
user data (for PSCW). With 1 rank we don’t even bother, it’s just malloc
(16-byte aligned). With any other odd number of ranks the user data is after an
odd number of int64’s and is 8-byte aligned. There is no requirement in MPI to
provide 16-byte alignment (which is required for _Atomic __int128 because of
the alignment requirement of cmpxchg16b), so you have to align it yourself.
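In practice that means over-allocating by the alignment and aligning the pointer yourself before putting the atomic there. A minimal sketch of that pattern (not the original poster's exact code; error checking omitted):

#include <mpi.h>
#include <atomic>
#include <memory>
#include <new>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm shm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &shm);

    int rank;
    MPI_Comm_rank(shm, &rank);

    // Over-allocate by the alignment so std::align can always succeed.
    constexpr std::size_t need  = sizeof(std::atomic<__int128>);
    constexpr std::size_t align = alignof(std::atomic<__int128>);   // 16 on x86_64
    MPI_Aint size = (rank == 0) ? (MPI_Aint)(need + align) : 0;

    void *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(size, 1, MPI_INFO_NULL, shm, &base, &win);

    if (rank == 0) {
        std::size_t space = need + align;
        void *p = base;
        p = std::align(align, need, p, space);        // bump p up to a 16-byte boundary
        auto *a = new (p) std::atomic<__int128>();    // place the atomic at the aligned spot
        a->store(42);                                 // this is the store that used to fault
    }

    MPI_Barrier(shm);
    MPI_Win_free(&win);
    MPI_Comm_free(&shm);
    MPI_Finalize();
    return 0;
}

Built with mpic++; as noted in the original report, -latomic may still be needed for the 16-byte atomic operations.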

-Nathan

> On May 3, 2018, at 2:16 PM, Joseph Schuchart  wrote:
> 
> Martin,
> 
> You say that you allocate shared memory, do you mean shared memory windows? 
> If so, this could be the reason: https://github.com/open-mpi/ompi/issues/4952
> 
> The alignment of memory allocated for MPI windows is not suitable for 128bit 
> values in Open MPI (only 8-byte alignment is guaranteed atm). I have seen the 
> alignment change depending on the number of processes. Could you check the 
> alignment of the memory you are trying to access?
> 
> Cheers,
> Joseph
> 
> On 05/03/2018 08:48 PM, Martin Böhm wrote:
>> Dear all,
>> I have a problem with a segfault on a user-built OpenMPI 3.0.1 running on 
>> Ubuntu 16.04
>> in local mode (only processes on a single computer).
>> The problem manifests itself as a segfault when allocating shared memory for
>> (at least) one 128-bit atomic variable (say std::atomic<__int128>) and then
>> storing into this variable. This error only occurs when there are at least 
>> two
>> processes, even though only the process of shared rank 0 is the one doing 
>> both
>> the allocation and the writing. The compile flags -march=core2 or 
>> -march=native
>> or -latomic were tried, none of which helped.
>> An example of the code that triggers it on my computers is this:
>> https://github.com/bohm/binstretch/blob/parallel-classic/algorithm/bug.cpp
>> The code works fine with mpirun -np 1 and segfaults with mpirun -np 2, 3 and 
>> 4;
>> if line 41 is commented out (the 128-bit atomic write), everything works fine
>> with -np 2 or more.
>> As for Ubuntu's stock package containing OpenMPI 1.10.2, the code segfaults 
>> with
>> "-np 2" and "-np 3" but not "-np 1" or "-np 4".
>> Thank you for any assistance concerning this problem. I would suspect my own
>> code to be the most likely culprit, since it triggers on both the stock 
>> package
>> and custom-built OpenMPI.
>> I attach the config.log.bz2 and ompi_info.log. Below I list some runs of the 
>> program
>> and what errors are produced.
>> Thank you for any assistance. I have tried googling and searching the 
>> mailing list
>> for this problem; if I missed something, I apologize.
>> Martin Böhm
>> - Ubuntu 16.04 stock mpirun and mpic++ -
>> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 1 ../tests/bug
>> ex1 success.
>> ex2 success.
>> Inserted into ex1.
>> Inserted into ex2.
>> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 4 ../tests/bug
>> Thread 2: ex1 success.
>> Thread 3: ex1 success.
>> ex1 success.
>> Thread 1: ex1 success.
>> ex2 success.
>> Thread 2: ex2 success.
>> Thread 1: ex2 success.
>> Thread 3: ex2 success.
>> Inserted into ex1.
>> Inserted into ex2.
>> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 2 ../tests/bug
>> Thread 1: ex1 success.
>> ex1 success.
>> Thread 1: ex2 success.
>> ex2 success.
>> Inserted into ex1.
>> [kamenice:13662] *** Process received signal ***
>> [kamenice:13662] Signal: Segmentation fault (11)
>> [kamenice:13662] Signal code:  (128)
>> [kamenice:13662] Failing at address: (nil)
>> [kamenice:13662] [ 0] 
>> /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f31773844b0]
>> [kamenice:13662] [ 1] ../tests/bug[0x40d8ac]
>> [kamenice:13662] [ 2] ../tests/bug[0x408997]
>> [kamenice:13662] [ 3] ../tests/bug[0x408bf0]
>> [kamenice:13662] [ 4] 
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f317736f830]
>> [kamenice:13662] [ 5] ../tests/bug[0x4086e9]
>> [kamenice:13662] *** End of error message ***
>> - Ubuntu 16.04 custom-compiled OpenMPI 3.0.1, installed to /usr/local 
>> (the stock packages were uninstalled) -
>> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 1 
>> ../tests/bug
>> ex1 success.
>> ex2 success.
>> Inserted into ex1.
>> Inserted into ex2.
>> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 2 
>> ../tests/bug
>> ex1 success.
>> Thread 1: ex1 success.
>> Inserted into ex1.
>> [kamenice:22794] *** Process received signal ***
>> ex2 success.
>> Thread 1: ex2 success.
>> [kamenice:22794] Signal: Segmentation fault (11)
>> [kamenice:22794] Signal code:  (128)
>> [kamenice:22794] Failing at address: (nil)
>> [kamenice:22794] [ 0] 
>> /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7ff8bad084b0]
>> [kamenice:22794] [ 1] ../tests/bug[0x401010]
>> [kamenice:22794] [ 2] ../tests/bug[0x400d27]
>> [kamenice:22794] [ 3] ../tests/bug[0x400f80]
>> [kamenice:22794] [ 4] 
>> 

Re: [OMPI users] User-built OpenMPI 3.0.1 segfaults when storing into an atomic 128-bit variable

2018-05-03 Thread Joseph Schuchart

Martin,

You say that you allocate shared memory, do you mean shared memory 
windows? If so, this could be the reason: 
https://github.com/open-mpi/ompi/issues/4952


The alignment of memory allocated for MPI windows is not suitable for
128-bit values in Open MPI (only 8-byte alignment is guaranteed atm). I
have seen the alignment change depending on the number of processes.
Could you check the alignment of the memory you are trying to access?
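Checking is a one-liner once the pointer comes back from the window allocation; a minimal sketch (a remainder of 0 means 16-byte aligned, 8 means only 8-byte aligned):

#include <mpi.h>
#include <cstdint>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm shm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &shm);

    void *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(64, 1, MPI_INFO_NULL, shm, &base, &win);

    int rank;
    MPI_Comm_rank(shm, &rank);
    // 0 here is what a 128-bit atomic needs; 8 reproduces the issue above.
    std::printf("rank %d: base %% 16 = %zu\n", rank,
                (std::size_t)(reinterpret_cast<std::uintptr_t>(base) % 16));

    MPI_Win_free(&win);
    MPI_Comm_free(&shm);
    MPI_Finalize();
    return 0;
}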


Cheers,
Joseph

On 05/03/2018 08:48 PM, Martin Böhm wrote:

Dear all,
I have a problem with a segfault on a user-built OpenMPI 3.0.1 running on 
Ubuntu 16.04
in local mode (only processes on a single computer).

The problem manifests itself as a segfault when allocating shared memory for
(at least) one 128-bit atomic variable (say std::atomic<__int128>) and then
storing into this variable. This error only occurs when there are at least two
processes, even though only the process of shared rank 0 is the one doing both
the allocation and the writing. I tried the compile flags -march=core2 and
-march=native, as well as linking with -latomic; none of them helped.

An example of the code that triggers it on my computers is this:
https://github.com/bohm/binstretch/blob/parallel-classic/algorithm/bug.cpp
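The pattern in that file boils down to roughly the following (a condensed sketch, not the exact code):

#include <mpi.h>
#include <atomic>
#include <new>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm shm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &shm);

    int rank;
    MPI_Comm_rank(shm, &rank);

    // Rank 0 allocates room for one 128-bit atomic in a shared window.
    void *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(rank == 0 ? sizeof(std::atomic<__int128>) : 0,
                            1, MPI_INFO_NULL, shm, &base, &win);

    if (rank == 0) {
        auto *a = new (base) std::atomic<__int128>();
        a->store(1);   // faults if 'base' is only 8-byte aligned (cmpxchg16b
                       // needs a 16-byte-aligned operand)
    }

    MPI_Barrier(shm);
    MPI_Win_free(&win);
    MPI_Comm_free(&shm);
    MPI_Finalize();
    return 0;
}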

The code works fine with mpirun -np 1 and segfaults with mpirun -np 2, 3 and 4;
if line 41 is commented out (the 128-bit atomic write), everything works fine
with -np 2 or more.

As for Ubuntu's stock package containing OpenMPI 1.10.2, the code segfaults with
"-np 2" and "-np 3" but not "-np 1" or "-np 4".

Thank you for any assistance concerning this problem. I would suspect my own
code to be the most likely culprit, since it triggers on both the stock package
and custom-built OpenMPI.

I attach the config.log.bz2 and ompi_info.log. Below I list some runs of the 
program
and what errors are produced.

Thank you for any assistance. I have tried googling and searching the mailing 
list
for this problem; if I missed something, I apologize.

Martin Böhm

- Ubuntu 16.04 stock mpirun and mpic++ -
bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 1 ../tests/bug
ex1 success.
ex2 success.
Inserted into ex1.
Inserted into ex2.
bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 4 ../tests/bug
Thread 2: ex1 success.
Thread 3: ex1 success.
ex1 success.
Thread 1: ex1 success.
ex2 success.
Thread 2: ex2 success.
Thread 1: ex2 success.
Thread 3: ex2 success.
Inserted into ex1.
Inserted into ex2.
bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 2 ../tests/bug
Thread 1: ex1 success.
ex1 success.
Thread 1: ex2 success.
ex2 success.
Inserted into ex1.
[kamenice:13662] *** Process received signal ***
[kamenice:13662] Signal: Segmentation fault (11)
[kamenice:13662] Signal code:  (128)
[kamenice:13662] Failing at address: (nil)
[kamenice:13662] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f31773844b0]
[kamenice:13662] [ 1] ../tests/bug[0x40d8ac]
[kamenice:13662] [ 2] ../tests/bug[0x408997]
[kamenice:13662] [ 3] ../tests/bug[0x408bf0]
[kamenice:13662] [ 4] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f317736f830]
[kamenice:13662] [ 5] ../tests/bug[0x4086e9]
[kamenice:13662] *** End of error message ***

- Ubuntu 16.04 custom-compiled OpenMPI 3.0.1, installed to /usr/local (the 
stock packages were uninstalled) -
bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 1 
../tests/bug
ex1 success.
ex2 success.
Inserted into ex1.
Inserted into ex2.
bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 2 
../tests/bug
ex1 success.
Thread 1: ex1 success.
Inserted into ex1.
[kamenice:22794] *** Process received signal ***
ex2 success.
Thread 1: ex2 success.
[kamenice:22794] Signal: Segmentation fault (11)
[kamenice:22794] Signal code:  (128)
[kamenice:22794] Failing at address: (nil)
[kamenice:22794] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7ff8bad084b0]
[kamenice:22794] [ 1] ../tests/bug[0x401010]
[kamenice:22794] [ 2] ../tests/bug[0x400d27]
[kamenice:22794] [ 3] ../tests/bug[0x400f80]
[kamenice:22794] [ 4] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff8bacf3830]
[kamenice:22794] [ 5] ../tests/bug[0x400a79]
[kamenice:22794] *** End of error message ***
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
---
--
mpirun noticed that process rank 0 with PID 0 on node kamenice exited on signal 
11 (Segmentation fault).
--
bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 4 
--oversubscribe ../tests/bug
ex1 success.
Thread 1: ex1 success.
Thread 2: ex1 success.
Thread 3: ex1 success.
ex2 success.
Thread 1: ex2 success.
Thread 2: ex2 success.
Thread 3: ex2