Re: [OMPI devel] [2.1.0rc2] ring_c SEGV on OpenBSD/i386
Both 2.1.0rc2 and 2.0.2 appear to crash about 1 run in every 5.
This probabilistic nature is why I did not notice it in 2.0.x.

-Paul

On Mon, Mar 6, 2017 at 7:58 PM, Paul Hargrove wrote:

> I am traveling all this week and so don't know when I can take a look, but
> will try.
> -Paul
>
> On Mon, Mar 6, 2017 at 7:40 PM, r...@open-mpi.org wrote:
>
>> I’m not sure what could be going on here. I take it you were able to run
>> this example for the 2.0 series under this environment, yes? This code
>> hasn’t changed since that release, so I’m not sure why it would be failing
>> to resolve symbols now.
>>
>>
>> On Mar 6, 2017, at 2:22 PM, Paul Hargrove wrote:
>>
>> RC2 tarball for 2.1.0 configured with only --prefix=...
>> and --enable-mca-no-build=patcher
>> I don't have time to dig right now:
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> [openbsd-i386:95593] *** Process received signal ***
>> --
>> mpirun noticed that process rank 1 with PID 0 on node openbsd-i386 exited
>> on signal 11 (Segmentation fault).
>> --
>>
>> $ gdb examples/ring_c ring_c.core
>> [...]
>> (gdb) where
>> #0  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7d49a000, name=0xc7d96ab "strsignal", hash=Variable "hash" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:540
>> #1  0x0ff27f8d in _dl_find_symbol (name=0xc7d96ab "strsignal", this=0x830f1584, flags=Variable "flags" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:669
>> #2  0x0ff2a75f in _dl_bind (object=0x7d49a600, index=3704) at /usr/src/libexec/ld.so/i386/rtld_machine.c:387
>> #3  0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ldasm.S:155
>> #4  0x7d49a600 in ?? ()
>> #5  0x0e78 in ?? ()
>> #6  0x0d560033 in __fgetwc_unlock (fp=0x1) at /usr/src/lib/libc/stdio/fgetwc.c:65
>> #7
>> #8  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7dd41c00, name=0xd48042f "recv", hash=Variable "hash" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:540
>> #9  0x0ff27f8d in _dl_find_symbol (name=0xd48042f "recv", this=0x830f1c34, flags=Variable "flags" is not available.
>> )
>>     at /usr/src/libexec/ld.so/resolve.c:669
>> #10 0x0ff2a75f in _dl_bind (object=0x82980e00, index=32) at /usr/src/libexec/ld.so/i386/rtld_machine.c:387
>> #11 0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ldasm.S:155
>> #12 0x82980e00 in ?? ()
>> #13 0x0020 in ?? ()
>> #14 0x0c820033 in opal_getcwd ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/libopen-pal.so.30.0
>> #15 0x0d4856e2 in mca_oob_usock_peer_recv_connect_ack ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/openmpi/mca_oob_usock.so
>> #16 0x0d48789e in mca_oob_usock_recv_handler ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/openmpi/mca_oob_usock.so
>> #17 0x0c82f11a in opal_libevent2022_event_base_loop (base=0x805b9000, flags=1)
>>     at /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/openmpi-2.1.0rc2/opal/mca/event/libevent2022/libevent/event.c:1321
>> #18 0x0c7f16b4 in progress_engine ()
>>     from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/libopen-pal.so.30.0
>> #19 0x0b3cc852 in _rthread_start (v=0x7dd42428) at /usr/src/lib/librthread/rthread.c:115
>> #20 0x0d5c4f82 in __tfork_thread () at /usr/src/lib/libc/arch/i386/sys/tfork_thread.S:95
>>
>> -Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
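A failure rate of roughly one run in five is easy to miss with a single test run. As a minimal sketch (not something posted in the thread), the loop below reruns a command a fixed number of times and counts the non-zero exits. The name count_failures is hypothetical, and the sketch assumes that mpirun reports a crashed rank through a non-zero exit status.

```sh
#!/bin/sh
# count_failures: hypothetical helper, not part of Open MPI.
# Usage: count_failures RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times and prints how many runs exited non-zero.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, `count_failures 50 mpirun -mca btl sm,self -np 2 examples/ring_c` would print the number of failed runs out of 50, assuming the same build and working directory as in the report above.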
[OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra
The following is fairly annoying (though I understand the problem is real):

$ [full-path-to]/mpirun -mca btl sm,self -np 2 examples/ring_c
PMIx has detected a temporary directory name that results
in a path that is too long for the Unix domain socket:

    Temp dir: /var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/openmpi-sessions-502@anlextwls026-173_0/53422

Try setting your TMPDIR environmental variable to point to
something shorter in length

Of course this comes from the fact that something outside my control has
set TMPDIR to a session-specific directory (same value as $XDG_RUNTIME_DIR)
    TMPDIR=/var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/

I am just reporting this for three minor reasons
1) Just in case nobody was aware of this problem
2) To request that an FAQ entry related to this be added
3) Yes, the message is clear, but it could be improved by indicating the
   allowable length of $TMPDIR

-Paul
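The workaround that the error message suggests can be applied to a single invocation without touching the login environment. A minimal sketch, assuming that /tmp (or any other short path) is an acceptable place for the session directory; run_with_tmpdir is a hypothetical helper name, not an Open MPI command:

```sh
#!/bin/sh
# run_with_tmpdir: hypothetical helper, not part of Open MPI.
# Usage: run_with_tmpdir DIR COMMAND [ARGS...]
# Runs COMMAND with TMPDIR set to DIR for that one invocation only.
run_with_tmpdir() {
    dir=$1
    shift
    # Refuse to continue if the replacement directory is unusable.
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "run_with_tmpdir: $dir is not a writable directory" >&2
        return 1
    fi
    # env(1) sets the variable only for the child process.
    env TMPDIR="$dir" "$@"
}
```

With this in place, `run_with_tmpdir /tmp mpirun -mca btl sm,self -np 2 examples/ring_c` would launch the same test with a much shorter session directory path, while the session-specific TMPDIR stays in effect for everything else.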
Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra
Hi Paul

There is an entry 8 under OS-X FAQ which describes this problem.

Adding max allowable len is a good idea.

Howard

Paul Hargrove wrote on Tue, Mar 7, 2017 at 08:04:

> The following is fairly annoying (though I understand the problem is real):
>
> $ [full-path-to]/mpirun -mca btl sm,self -np 2 examples/ring_c
> PMIx has detected a temporary directory name that results
> in a path that is too long for the Unix domain socket:
>
>     Temp dir: /var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/openmpi-sessions-502@anlextwls026-173_0/53422
>
> Try setting your TMPDIR environmental variable to point to
> something shorter in length
>
> Of course this comes from the fact that something outside my control has
> set TMPDIR to a session-specific directory (same value as $XDG_RUNTIME_DIR)
>     TMPDIR=/var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/
>
> I am just reporting this for three minor reasons
> 1) Just in case nobody was aware of this problem
> 2) To request that an FAQ entry related to this be added
> 3) Yes, the message is clear, but it could be improved by indicating the
>    allowable length of $TMPDIR
>
> -Paul
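To illustrate what reporting the allowable length could look like: the sun_path field of struct sockaddr_un is 104 bytes on macOS and the BSDs and 108 bytes on Linux, including the terminating NUL. The sketch below is not PMIx's actual check, and both helper names are hypothetical. The file name that PMIx appends to the session directory is not shown in the thread, so the sketch takes the full candidate socket path as its argument.

```sh
#!/bin/sh
# sun_path_budget: hypothetical helper.
# Usage: sun_path_budget PATH [LIMIT]
# Prints how many characters remain under LIMIT (negative if PATH is too
# long). LIMIT defaults to 103, i.e. the 104-byte sun_path on macOS and
# the BSDs minus the terminating NUL; pass 107 for Linux.
# Note: ${#path} counts characters, which equals bytes only for ASCII paths.
sun_path_budget() {
    path=$1
    limit=${2:-103}
    echo $((limit - ${#path}))
}

# check_socket_path: hypothetical helper that turns the budget into a
# message stating the limit, and fails if the path does not fit.
check_socket_path() {
    max=${2:-103}
    remaining=$(sun_path_budget "$1" "$max")
    if [ "$remaining" -lt 0 ]; then
        echo "socket path is $((0 - remaining)) over the ${max}-character limit: $1" >&2
        return 1
    fi
    echo "ok: $remaining characters to spare (limit ${max})"
}
```

Stating the limit alongside the offending path, as check_socket_path does, is the kind of message change the thread asks for: the user can see at once how much shorter TMPDIR needs to be.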
Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra
I initially did a Google search on the error text and "Open MPI FAQ"
Since the error message issued by 2.1.x no longer matches the text in the FAQ
entry, my search did not find the entry.

Not only is the FAQ entry text (error message) specific to 2.0.x, but so is
the entry's title "I am using Open MPI 2.0.x and getting an error at
application startup. How do I work around this?"

So, I still think a new FAQ entry is needed OR the existing one should be
generalized.

-Paul

On Tue, Mar 7, 2017 at 9:15 AM, Howard Pritchard wrote:

> Hi Paul
>
> There is an entry 8 under OS-X FAQ which describes this problem.
>
> Adding max allowable len is a good idea.
>
> Howard
Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra
Good point. I just updated the FAQ item to include the v2.1.x text.

Thanks!

> On Mar 7, 2017, at 10:52 AM, Paul Hargrove wrote:
>
> I initially did a Google search on the error text and "Open MPI FAQ"
> Since the error message issued by 2.1.x no longer matches the text in the FAQ
> entry, my search did not find the entry.
>
> Not only is the FAQ entry text (error message) specific to 2.0.x, but so is
> the entry's title "I am using Open MPI 2.0.x and getting an error at
> application startup. How do I work around this?"
>
> So, I still think a new FAQ entry is needed OR the existing one should be
> generalized.
>
> -Paul

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra
Sorry to be so pedantic, but I try to put myself in the position of the
clueless user (which is actually not that hard early in the morning w/o
sufficient coffee).

-Paul

On Tue, Mar 7, 2017 at 10:00 AM, Jeff Squyres (jsquyres) wrote:

> Good point. I just updated the FAQ item to include the v2.1.x text.
>
> Thanks!
Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra
On Mar 7, 2017, at 11:07 AM, Paul Hargrove wrote:
>
> Sorry to be so pedantic, but I try to put myself in the position of the
> clueless user (which is actually not that hard early in the morning w/o
> sufficient coffee).

This is a Very Good Thing. :-)

--
Jeff Squyres
jsquy...@cisco.com