Re: [OMPI devel] mpirun hangs
Aha! This is a problem that continues to bite us - it relates to the pty
problem in Mac OSX. Been a ton of chatter about this, but Mac doesn't seem
inclined to fix it.

Try configuring --disable-pty-support and see if that helps.

FWIW, you will find a platform file for Mac OSX in the trunk - I always build
with it, and have spent considerable time fine-tuning it. You configure with:

./configure --prefix=whatever --with-platform=contrib/platform/lanl/macosx-dynamic

In that directory, you will also find platform files for static builds under
both Tiger and Leopard (slight differences).

ralph

On 5/27/08 8:01 PM, "Greg Watson" wrote:

> Ralph,
>
> I tried rolling back to 18513 but no luck. Steps:
>
> $ ./autogen.sh
> $ ./configure --prefix=/usr/local/openmpi-1.3-devel
> $ make
> $ make install
> $ mpicc -g -o xxx xxx.c
> $ mpirun -np 2 ./xxx
> $ ps x
> 44832 s001 R+ 0:50.00 mpirun -np 2 ./xxx
> 44833 s001 S+ 0:00.03 ./xxx
> $ gdb /usr/local/openmpi-1.3-devel/bin/mpirun
> ...
> (gdb) attach 44832
> Attaching to program: `/usr/local/openmpi-1.3-devel/bin/mpirun',
> process 44832.
> Reading symbols for shared libraries
> +.. done
> 0x9371b3dd in ioctl ()
> (gdb) where
> #0 0x9371b3dd in ioctl ()
> #1 0x93754812 in grantpt ()
> #2 0x9375470b in openpty ()
> #3 0x001446d9 in opal_openpty ()
> #4 0x000bf3bf in orte_iof_base_setup_prefork ()
> #5 0x003da62f in odls_default_fork_local_proc (context=0x216a60,
>     child=0x216dd0, environ_copy=0x217930) at odls_default_module.c:191
> #6 0x000c3e76 in orte_odls_base_default_launch_local ()
> #7 0x003daace in orte_odls_default_launch_local_procs (data=0x216780)
>     at odls_default_module.c:360
> #8 0x000ad2f6 in process_commands (sender=0x216768, buffer=0x216780,
>     tag=1) at orted/orted_comm.c:441
> #9 0x000acd52 in orte_daemon_cmd_processor (fd=-1, opal_event=1,
>     data=0x216750) at orted/orted_comm.c:346
> #10 0x0012bd21 in event_process_active () at opal_object.h:498
> #11 0x0012c3c5 in opal_event_base_loop () at opal_object.h:498
> #12 0x0012bf8c in opal_event_loop () at opal_object.h:498
> #13 0x0011b334 in opal_progress () at runtime/opal_progress.c:169
> #14 0x000cd9b4 in orte_plm_base_report_launched () at opal_object.h:498
> #15 0x000cc2b7 in orte_plm_base_launch_apps () at opal_object.h:498
> #16 0x0003d626 in orte_plm_rsh_launch (jdata=0x200ae0) at
>     plm_rsh_module.c:1126
> #17 0x2604 in orterun (argc=4, argv=0xb880) at orterun.c:549
> #18 0x1bd6 in main (argc=4, argv=0xb880) at main.c:13
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
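The two suggestions in Ralph's reply amount to two alternative configure lines. A minimal sketch, assuming it is sourced from the top of a trunk checkout after ./autogen.sh has been run; the function names are made up for illustration, and the default prefix is simply the one from Greg's steps:

```shell
# Install prefix: taken from Greg's steps, override by setting PREFIX first.
PREFIX="${PREFIX:-/usr/local/openmpi-1.3-devel}"

# Option 1 (hypothetical helper name): build without pty support.
configure_no_pty() {
    ./configure --prefix="$PREFIX" --disable-pty-support
}

# Option 2 (hypothetical helper name): use the Mac OSX platform file
# that Ralph mentions is in the trunk.
configure_with_platform() {
    ./configure --prefix="$PREFIX" \
        --with-platform=contrib/platform/lanl/macosx-dynamic
}

# Typical use, one option at a time, followed by the failing test case:
#   configure_no_pty && make && make install
#   mpirun -np 2 ./xxx
```

Trying the options one at a time keeps it clear which change, if either, makes the 2-process job stop hanging.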
Re: [OMPI devel] mpirun hangs
BTW, this is Leopard.

Greg

On May 27, 2008, at 9:11 PM, Ralph Castain wrote:

> Yo Greg
>
> I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can you
> tell me how you configured, and the precise command you executed?
>
> Thanks
> Ralph
Re: [OMPI devel] mpirun hangs
Ralph,

I tried rolling back to 18513 but no luck. Steps:

$ ./autogen.sh
$ ./configure --prefix=/usr/local/openmpi-1.3-devel
$ make
$ make install
$ mpicc -g -o xxx xxx.c
$ mpirun -np 2 ./xxx
$ ps x
44832 s001 R+ 0:50.00 mpirun -np 2 ./xxx
44833 s001 S+ 0:00.03 ./xxx
$ gdb /usr/local/openmpi-1.3-devel/bin/mpirun
...
(gdb) attach 44832
Attaching to program: `/usr/local/openmpi-1.3-devel/bin/mpirun',
process 44832.
Reading symbols for shared libraries
+.. done
0x9371b3dd in ioctl ()
(gdb) where
#0 0x9371b3dd in ioctl ()
#1 0x93754812 in grantpt ()
#2 0x9375470b in openpty ()
#3 0x001446d9 in opal_openpty ()
#4 0x000bf3bf in orte_iof_base_setup_prefork ()
#5 0x003da62f in odls_default_fork_local_proc (context=0x216a60,
    child=0x216dd0, environ_copy=0x217930) at odls_default_module.c:191
#6 0x000c3e76 in orte_odls_base_default_launch_local ()
#7 0x003daace in orte_odls_default_launch_local_procs (data=0x216780)
    at odls_default_module.c:360
#8 0x000ad2f6 in process_commands (sender=0x216768, buffer=0x216780,
    tag=1) at orted/orted_comm.c:441
#9 0x000acd52 in orte_daemon_cmd_processor (fd=-1, opal_event=1,
    data=0x216750) at orted/orted_comm.c:346
#10 0x0012bd21 in event_process_active () at opal_object.h:498
#11 0x0012c3c5 in opal_event_base_loop () at opal_object.h:498
#12 0x0012bf8c in opal_event_loop () at opal_object.h:498
#13 0x0011b334 in opal_progress () at runtime/opal_progress.c:169
#14 0x000cd9b4 in orte_plm_base_report_launched () at opal_object.h:498
#15 0x000cc2b7 in orte_plm_base_launch_apps () at opal_object.h:498
#16 0x0003d626 in orte_plm_rsh_launch (jdata=0x200ae0) at
    plm_rsh_module.c:1126
#17 0x2604 in orterun (argc=4, argv=0xb880) at orterun.c:549
#18 0x1bd6 in main (argc=4, argv=0xb880) at main.c:13

On May 27, 2008, at 9:11 PM, Ralph Castain wrote:

> Yo Greg
>
> I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can you
> tell me how you configured, and the precise command you executed?
>
> Thanks
> Ralph
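Greg's attach-and-backtrace steps can also be done in one non-interactive command. A rough sketch, assuming a GNU gdb that accepts -p, -batch and -ex (the gdb shipped with a 2008-era Mac OS X may behave differently); dump_stack is a hypothetical helper name, not part of Open MPI:

```shell
# Print the call stack of a running process and detach again.
# Assumption: GNU gdb command-line options (-p, -batch, -ex).
dump_stack() {
    pid="$1"
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 1
    fi
    # -batch exits after running the -ex commands, so the target is
    # detached and left running afterwards.
    gdb -p "$pid" -batch -ex "where"
}

# Example: locate the hung mpirun as Greg did with "ps x", then dump it.
#   ps x | grep '[m]pirun'
#   dump_stack 44832
```

A stack ending in ioctl() under grantpt() and openpty(), as in Greg's trace, would point at pty allocation and not at the MPI application itself.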
Re: [OMPI devel] mpirun hangs
Yo Greg

I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can you tell
me how you configured, and the precise command you executed?

Thanks
Ralph

On 5/27/08 5:15 PM, "Ralph Castain" wrote:

> Hmmm...well, it was working about 3 hours ago! I'll try to take a look
> tonight, but it may be tomorrow.
>
> Try rolling it back just a little to r18513 - that's the last rev I tested
> on my Mac.
>
> On 5/27/08 5:00 PM, "Greg Watson" wrote:
>
>> Something seems to be broken in the trunk for MacOS X. I can run a 1
>> process job, but a >1 process job hangs. It was working a few days ago.
>>
>> Greg
Re: [OMPI devel] Open MPI session directory location
Oops, sorry. We were having problems with the memory allocator when
ompi_info called orte_init(). I think it might be best to call the ORTE MCA
registration function directly...

On May 27, 2008, at 10:40 AM, Ralph H Castain wrote:

> I see the problem (I think). A recent change was made to ompi_info so it no
> longer calls orte_init. As a result, none of the ORTE-level params (i.e.,
> those params registered outside of ORTE frameworks) are being reported.
>
> I'll chat with Jeff and see how we resolve the problem.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Memory hooks stuff
Ok -- I added Galen and Sharon.

If you want to attend and haven't told me, please let me know -- I have
reserved exactly as many phone lines as the number of people who have said
they will attend (8, so far).

On May 27, 2008, at 1:50 PM, Sharon Melamed wrote:

>> Who would be interested in discussing this stuff? (me, Brian, ?
>> someone from Sun?, ...?)
>
> Me.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Memory hooks stuff
> Who would be interested in discussing this stuff? (me, Brian, ?
> someone from Sun?, ...?)

Me.
Re: [OMPI devel] Memory hooks stuff
I will be there as well.

- Galen

On May 27, 2008, at 10:14 AM, Jeff Squyres wrote:

> To accommodate timezones spanning from US Mountain to Israel, let's have
> the teleconference tomorrow, Wednesday 28 May 2008:
>
> - 10:30am US Eastern time
> - 8:30am US Mountain time
> - 5:30pm Israel time
>
> I'll send around callin information to the following people (let me know
> if anyone else wants to attend):
>
> - Terry D
> - Gleb N
> - Patrick G
> - Pasha S
> - Brian B
> - Jeff S
Re: [OMPI devel] Memory hooks stuff
Please post minutes of this meeting to OMPI-devel. While I probably have
nothing to contribute to the discussion, I and others are interested in
hearing the outcomes/conclusions.

-Paul

Jeff Squyres wrote:

> To accommodate timezones spanning from US Mountain to Israel, let's have
> the teleconference tomorrow, Wednesday 28 May 2008:
>
> - 10:30am US Eastern time
> - 8:30am US Mountain time
> - 5:30pm Israel time
>
> I'll send around callin information to the following people (let me know
> if anyone else wants to attend):
>
> - Terry D
> - Gleb N
> - Patrick G
> - Pasha S
> - Brian B
> - Jeff S

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
Re: [OMPI devel] Does Open MPI class exist?
I am unaware of any formal classes that are offered on a regular basis; we
periodically do tutorials at various conferences, though (George just did
one at the Linux Clusters Institute conference a few weeks ago).

Your post finally motivated me to take the last few steps and publish a
bunch of instructional Open MPI videos on the web site. See this post for
some details:

http://www.open-mpi.org/community/lists/users/2008/05/5737.php

Hopefully, those will be helpful to you.

On May 21, 2008, at 6:36 PM, Jennis Pruett wrote:

> I would dearly like a week-long class on Open MPI - what it is, does, how
> to build, parameter tweaking, etc.
>
> Does anyone know if such a class exists *anywhere* ?

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Open MPI session directory location
I see the problem (I think). A recent change was made to ompi_info so it no
longer calls orte_init. As a result, none of the ORTE-level params (i.e.,
those params registered outside of ORTE frameworks) are being reported.

I'll chat with Jeff and see how we resolve the problem.

On 5/27/08 8:32 AM, "Ralph H Castain" wrote:

> It "should" be visible now... not sure why it isn't. It conforms to the
> naming rules and -used- to be reported by ompi_info...
Re: [OMPI devel] Open MPI session directory location
On Tue, May 27, 2008 at 08:27:49AM -0600, Ralph H Castain wrote:
> -mca orte_tmpdir_base foo

Thanks! It works. But this parameter is not reported by ompi_info :(

> On 5/27/08 8:24 AM, "Gleb Natapov" wrote:
>
> > Hi,
> >
> > Is there a way to change where Open MPI creates the session directory?
> > I can't find an MCA parameter that specifies this.
> >
> > --
> > Gleb.

--
Gleb.
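For reference, the parameter from Ralph's reply can be given either on the mpirun command line or through the environment. A minimal sketch, assuming the general Open MPI convention that an MCA parameter can be set via an OMPI_MCA_<name> environment variable; the directory name and the ./xxx test program are placeholders, and the ompi_info options shown are those of the 1.x series:

```shell
# Assumed scratch location; any writable directory should do.
SESSION_BASE="${TMPDIR:-/tmp}/ompi-sessions"
mkdir -p "$SESSION_BASE"

# Form 1: on the command line, as in Ralph's reply.
#   mpirun -mca orte_tmpdir_base "$SESSION_BASE" -np 2 ./xxx

# Form 2: via the environment, picked up by later mpirun invocations.
export OMPI_MCA_orte_tmpdir_base="$SESSION_BASE"

# Check whether ompi_info lists the parameter. In this thread it did not,
# even though setting the parameter worked.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param all all | grep orte_tmpdir_base \
        || echo "orte_tmpdir_base not listed by ompi_info"
fi
```

The environment form is convenient when mpirun is launched by a wrapper script whose command line cannot easily be changed.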
Re: [OMPI devel] Open MPI session directory location
It "should" be visible now... not sure why it isn't. It conforms to the
naming rules and -used- to be reported by ompi_info...

On 5/27/08 8:31 AM, "Shipman, Galen M." wrote:

> Make that "ompi_info".
>
> We need to make that visible via orte_info.
> I thought this was done at some point, perhaps it got overwritten?
>
> Thanks,
>
> Galen
Re: [OMPI devel] Open MPI session directory location
Make that "ompi_info".

We need to make that visible via orte_info.
I thought this was done at some point, perhaps it got overwritten?

Thanks,

Galen

On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:

> -mca orte_tmpdir_base foo
Re: [OMPI devel] Open MPI session directory location
We need to make that visible via orte_info.
I thought this was done at some point, perhaps it got overwritten?

Thanks,

Galen

On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:

> -mca orte_tmpdir_base foo
[OMPI devel] Open MPI session directory location
Hi,

Is there a way to change where Open MPI creates the session directory? I
can't find an MCA parameter that specifies this.

--
Gleb.
Re: [OMPI devel] Memory hooks stuff
To accommodate timezones spanning from US Mountain to Israel, let's have the
teleconference tomorrow, Wednesday 28 May 2008:

- 10:30am US Eastern time
- 8:30am US Mountain time
- 5:30pm Israel time

I'll send around callin information to the following people (let me know if
anyone else wants to attend):

- Terry D
- Gleb N
- Patrick G
- Pasha S
- Brian B
- Jeff S

On May 23, 2008, at 7:19 AM, Jeff Squyres wrote:

> Brian and I were chatting the other day about random OMPI stuff and the
> topic of the memory hooks came up again. Brian was wondering if we should
> [finally] revisit this topic -- there's a few things that could be done to
> make life "better". Two things jump to mind:
>
> - using mallopt on Linux
> - doing *something* on Solaris
>
> It would probably be worthwhile to have a teleconf about this in the near
> future for anyone who is interested. I propose any time before 4pm US
> Eastern on Wednesday, 28 May, 2008.
>
> Who would be interested in discussing this stuff? (me, Brian, ? someone
> from Sun?, ...?)
>
> --
> Jeff Squyres
> Cisco Systems

--
Jeff Squyres
Cisco Systems