Re: [OMPI devel] [2.1.0rc2] ring_c SEGV on OpenBSD/i386

2017-03-07 Thread Paul Hargrove
Both 2.1.0rc2 and 2.0.2 appear to crash in about 1 run out of every 5.
Because the failure is intermittent, I did not notice it in 2.0.x.

-Paul
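
One rough way to put a number on an intermittent failure like this is to
rerun the same command many times and count the non-zero exits. The sketch
below is an illustration only, not something from this thread:
`failure_rate` is a hypothetical helper, and the run count and timeout are
arbitrary choices.

```python
import subprocess


def failure_rate(cmd, runs=20, timeout=60):
    """Run cmd `runs` times and return (failures, runs)."""
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False  # treat a hang as a failure as well
        failures += not ok
    return failures, runs


# Example, using the command from the report (adjust the mpirun path):
#   failed, total = failure_rate(
#       ["mpirun", "-mca", "btl", "sm,self", "-np", "2", "examples/ring_c"])
#   print(f"{failed}/{total} runs failed")
```

With a failure rate near 1 in 5, a few dozen runs are usually enough to tell
an intermittent crash from a clean build.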

On Mon, Mar 6, 2017 at 7:58 PM, Paul Hargrove  wrote:

> I am traveling all this week and so don't know when I can take a look, but
> will try.
> -Paul
>
> On Mon, Mar 6, 2017 at 7:40 PM, r...@open-mpi.org  wrote:
>
>> I’m not sure what could be going on here. I take it you were able to run
>> this example for the 2.0 series under this environment, yes? This code
>> hasn’t changed since that release, so I’m not sure why it would be failing
>> to resolve symbols now.
>>
>>
>> On Mar 6, 2017, at 2:22 PM, Paul Hargrove  wrote:
>>
>> RC2 tarball for 2.1.0, configured with only --prefix=...
>> and --enable-mca-no-build=patcher.
>> I don't have time to dig right now:
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> [openbsd-i386:95593] *** Process received signal ***
>> 
>> --
>> mpirun noticed that process rank 1 with PID 0 on node openbsd-i386 exited
>> on signal 11 (Segmentation fault).
>> 
>> --
>>
>> $ gdb examples/ring_c ring_c.core
>> [...]
>> (gdb) where
>> #0  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7d49a000, name=0xc7d96ab "strsignal", hash=Variable "hash" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:540
>> #1  0x0ff27f8d in _dl_find_symbol (name=0xc7d96ab "strsignal", this=0x830f1584, flags=Variable "flags" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:669
>> #2  0x0ff2a75f in _dl_bind (object=0x7d49a600, index=3704) at /usr/src/libexec/ld.so/i386/rtld_machine.c:387
>> #3  0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ldasm.S:155
>> #4  0x7d49a600 in ?? ()
>> #5  0x0e78 in ?? ()
>> #6  0x0d560033 in __fgetwc_unlock (fp=0x1) at /usr/src/lib/libc/stdio/fgetwc.c:65
>> #7  
>> #8  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7dd41c00, name=0xd48042f "recv", hash=Variable "hash" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:540
>> #9  0x0ff27f8d in _dl_find_symbol (name=0xd48042f "recv", this=0x830f1c34, flags=Variable "flags" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:669
>> #10 0x0ff2a75f in _dl_bind (object=0x82980e00, index=32) at /usr/src/libexec/ld.so/i386/rtld_machine.c:387
>> #11 0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ldasm.S:155
>> #12 0x82980e00 in ?? ()
>> #13 0x0020 in ?? ()
>> #14 0x0c820033 in opal_getcwd ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/libopen-pal.so.30.0
>> #15 0x0d4856e2 in mca_oob_usock_peer_recv_connect_ack ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/openmpi/mca_oob_usock.so
>> #16 0x0d48789e in mca_oob_usock_recv_handler ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/openmpi/mca_oob_usock.so
>> #17 0x0c82f11a in opal_libevent2022_event_base_loop (base=0x805b9000, flags=1)
>>     at /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/openmpi-2.1.0rc2/opal/mca/event/libevent2022/libevent/event.c:1321
>> #18 0x0c7f16b4 in progress_engine ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/libopen-pal.so.30.0
>> #19 0x0b3cc852 in _rthread_start (v=0x7dd42428) at /usr/src/lib/librthread/rthread.c:115
>> #20 0x0d5c4f82 in __tfork_thread () at /usr/src/lib/libc/arch/i386/sys/tfork_thread.S:95
>>
>> -Paul
>>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Paul Hargrove
The following is fairly annoying (though I understand the problem is real):

$ [full-path-to]/mpirun -mca btl sm,self -np 2 examples/ring_c
PMIx has detected a temporary directory name that results
in a path that is too long for the Unix domain socket:

Temp dir: /var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/openmpi-sessions-502@anlextwls026-173_0/53422

Try setting your TMPDIR environmental variable to point to
something shorter in length

Of course this comes from the fact that something outside my control has
set TMPDIR to a session-specific directory (same value as $XDG_RUNTIME_DIR)
   TMPDIR=/var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/

I am just reporting this for three minor reasons:
1) Just in case nobody was aware of this problem
2) To request that an FAQ entry related to this be added
3) Yes, the message is clear, but it could be improved by indicating the
allowable length of $TMPDIR

-Paul
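
For context on the allowable length: the path of a Unix domain socket has to
fit in the fixed-size `sun_path` array of `struct sockaddr_un`, which is 104
bytes on macOS and the BSDs and 108 bytes on Linux. The sketch below shows
the arithmetic under that assumption. It is not the check PMIx performs; the
helper names are made up for illustration, and it conservatively reserves one
byte for the terminating NUL.

```python
import os
import sys

# Size of sockaddr_un.sun_path in bytes: 108 on Linux, 104 on macOS and the
# BSDs. Other platforms are assumed (for this sketch) to use the smaller value.
SUN_PATH_SIZE = 108 if sys.platform.startswith("linux") else 104


def socket_path_fits(directory, filename, sun_path_size=SUN_PATH_SIZE):
    """Return True if directory/filename plus a NUL byte fits in sun_path."""
    path = os.path.join(directory, filename)
    return len(os.fsencode(path)) + 1 <= sun_path_size


def max_dir_len(filename, sun_path_size=SUN_PATH_SIZE):
    """Longest directory length in bytes (no trailing slash) that still fits."""
    # Reserve one byte for the "/" separator and one for the terminating NUL.
    return sun_path_size - len(os.fsencode(filename)) - 2
```

With a hypothetical 12-byte socket file name such as `example.sock`,
`max_dir_len("example.sock", 104)` leaves 90 bytes for the whole session
directory on macOS, which a long per-user TMPDIR uses up quickly.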



Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Howard Pritchard
Hi Paul

Entry 8 under the OS X FAQ describes this problem.

Adding the maximum allowable length is a good idea.

Howard


Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Paul Hargrove
I initially did a Google search on the error text plus "Open MPI FAQ".
Because the error message issued by 2.1.x no longer matches the text in the
FAQ entry, my search did not find the entry.

Not only is the FAQ entry text (error message) specific to 2.0.x, but so is
the entry's title "I am using Open MPI 2.0.x and getting an error at
application startup. How do I work around this?"

So, I still think a new FAQ entry is needed OR the existing one should be
generalized.

-Paul



Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Jeff Squyres (jsquyres)
Good point.  I just updated the FAQ item to include the v2.1.x text.

Thanks!


-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Paul Hargrove
Sorry to be so pedantic, but I try to put myself in the position of the
clueless user (which is actually not that hard early in the morning w/o
sufficient coffee).

-Paul


Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Jeff Squyres (jsquyres)
On Mar 7, 2017, at 11:07 AM, Paul Hargrove  wrote:
> 
> Sorry to be so pedantic, but I try to put myself in the position of the 
> clueless user (which is actually not that hard early in the morning w/o 
> sufficient coffee).

This is a Very Good Thing.  :-)
