Re: [OMPI devel] mpirun hangs

2008-05-27 Thread Ralph Castain
Aha! This is a problem that continues to bite us - it relates to the pty
problem in Mac OSX. Been a ton of chatter about this, but Mac doesn't seem
inclined to fix it.

Try configuring --disable-pty-support and see if that helps. FWIW, you will
find a platform file for Mac OSX in the trunk - I always build with it, and
have spent considerable time fine-tuning it. You configure with:

./configure --prefix=whatever \
    --with-platform=contrib/platform/lanl/macosx-dynamic

In that directory, you will also find platform files for static builds under
both Tiger and Leopard (slight differences).
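
For reference, the two configure variants mentioned above can be sketched as follows. This is only an illustration: the prefix is borrowed from Greg's report, the variable names are made up for the example, and it assumes you are sitting at the top of an Open MPI source tree.

```shell
# Sketch only: run from the top of an Open MPI source tree.
PREFIX=/usr/local/openmpi-1.3-devel   # prefix taken from Greg's report

# Variant 1: plain configure with pty support switched off.
CONFIGURE_NO_PTY="./configure --prefix=$PREFIX --disable-pty-support"

# Variant 2: the LANL Mac OS X platform file from the trunk.
CONFIGURE_PLATFORM="./configure --prefix=$PREFIX --with-platform=contrib/platform/lanl/macosx-dynamic"

if [ -x ./configure ]; then
    # Word splitting of the unquoted variable is intentional here.
    $CONFIGURE_NO_PTY
else
    echo "no ./configure in this directory; would run:"
    echo "  $CONFIGURE_NO_PTY"
    echo "or:"
    echo "  $CONFIGURE_PLATFORM"
fi
```

Follow either variant with the usual make and make install.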

ralph


On 5/27/08 8:01 PM, "Greg Watson"  wrote:

> Ralph,
> 
> I tried rolling back to 18513 but no luck. Steps:
> 
> $ ./autogen.sh
> $ ./configure --prefix=/usr/local/openmpi-1.3-devel
> $ make
> $ make install
> $ mpicc -g -o xxx xxx.c
> $ mpirun -np 2 ./xxx
> $ ps x
> 44832 s001  R+ 0:50.00 mpirun -np 2 ./xxx
> 44833 s001  S+ 0:00.03 ./xxx
> $ gdb /usr/local/openmpi-1.3-devel/bin/mpirun
> ...
> (gdb) attach 44832
> Attaching to program: `/usr/local/openmpi-1.3-devel/bin/mpirun',
> process 44832.
> Reading symbols for shared libraries 
> +.. done
> 0x9371b3dd in ioctl ()
> (gdb) where
> #0  0x9371b3dd in ioctl ()
> #1  0x93754812 in grantpt ()
> #2  0x9375470b in openpty ()
> #3  0x001446d9 in opal_openpty ()
> #4  0x000bf3bf in orte_iof_base_setup_prefork ()
> #5  0x003da62f in odls_default_fork_local_proc (context=0x216a60,
> child=0x216dd0, environ_copy=0x217930) at odls_default_module.c:191
> #6  0x000c3e76 in orte_odls_base_default_launch_local ()
> #7  0x003daace in orte_odls_default_launch_local_procs (data=0x216780)
> at odls_default_module.c:360
> #8  0x000ad2f6 in process_commands (sender=0x216768, buffer=0x216780,
> tag=1) at orted/orted_comm.c:441
> #9  0x000acd52 in orte_daemon_cmd_processor (fd=-1, opal_event=1,
> data=0x216750) at orted/orted_comm.c:346
> #10 0x0012bd21 in event_process_active () at opal_object.h:498
> #11 0x0012c3c5 in opal_event_base_loop () at opal_object.h:498
> #12 0x0012bf8c in opal_event_loop () at opal_object.h:498
> #13 0x0011b334 in opal_progress () at runtime/opal_progress.c:169
> #14 0x000cd9b4 in orte_plm_base_report_launched () at opal_object.h:498
> #15 0x000cc2b7 in orte_plm_base_launch_apps () at opal_object.h:498
> #16 0x0003d626 in orte_plm_rsh_launch (jdata=0x200ae0) at
> plm_rsh_module.c:1126
> #17 0x2604 in orterun (argc=4, argv=0xb880) at orterun.c:549
> #18 0x1bd6 in main (argc=4, argv=0xb880) at main.c:13
> 
> On May 27, 2008, at 9:11 PM, Ralph Castain wrote:
> 
>> Yo Greg
>> 
>> I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can
>> you tell
>> me how you configured, and the precise command you executed?
>> 
>> Thanks
>> Ralph
>> 
>> 
>> 
>> On 5/27/08 5:15 PM, "Ralph Castain"  wrote:
>> 
>>> Hmmm...well, it was working about 3 hours ago! I'll try to take a
>>> look
>>> tonight, but it may be tomorrow.
>>> 
>>> Try rolling it back just a little to r18513 - that's the last rev I
>>> tested
>>> on my Mac.
>>> 
>>> 
>>> On 5/27/08 5:00 PM, "Greg Watson"  wrote:
>>> 
>>>> Something seems to be broken in the trunk for MacOS X. I can run a 1
>>>> process job, but a >1 process job hangs. It was working a few days
>>>> ago.
>>>> 
>>>> Greg
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] mpirun hangs

2008-05-27 Thread Greg Watson

BTW, this is Leopard.

Greg

On May 27, 2008, at 9:11 PM, Ralph Castain wrote:


Yo Greg

I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can you tell
me how you configured, and the precise command you executed?

Thanks
Ralph



On 5/27/08 5:15 PM, "Ralph Castain"  wrote:

Hmmm...well, it was working about 3 hours ago! I'll try to take a look
tonight, but it may be tomorrow.

Try rolling it back just a little to r18513 - that's the last rev I tested
on my Mac.


On 5/27/08 5:00 PM, "Greg Watson"  wrote:


Something seems to be broken in the trunk for MacOS X. I can run a 1
process job, but a >1 process job hangs. It was working a few days ago.


Greg





Re: [OMPI devel] mpirun hangs

2008-05-27 Thread Greg Watson

Ralph,

I tried rolling back to 18513 but no luck. Steps:

$ ./autogen.sh
$ ./configure --prefix=/usr/local/openmpi-1.3-devel
$ make
$ make install
$ mpicc -g -o xxx xxx.c
$ mpirun -np 2 ./xxx
$ ps x
44832 s001  R+ 0:50.00 mpirun -np 2 ./xxx
44833 s001  S+ 0:00.03 ./xxx
$ gdb /usr/local/openmpi-1.3-devel/bin/mpirun
...
(gdb) attach 44832
Attaching to program: `/usr/local/openmpi-1.3-devel/bin/mpirun', process 44832.
Reading symbols for shared libraries +.. done
0x9371b3dd in ioctl ()
(gdb) where
#0  0x9371b3dd in ioctl ()
#1  0x93754812 in grantpt ()
#2  0x9375470b in openpty ()
#3  0x001446d9 in opal_openpty ()
#4  0x000bf3bf in orte_iof_base_setup_prefork ()
#5  0x003da62f in odls_default_fork_local_proc (context=0x216a60, child=0x216dd0, environ_copy=0x217930) at odls_default_module.c:191
#6  0x000c3e76 in orte_odls_base_default_launch_local ()
#7  0x003daace in orte_odls_default_launch_local_procs (data=0x216780) at odls_default_module.c:360
#8  0x000ad2f6 in process_commands (sender=0x216768, buffer=0x216780, tag=1) at orted/orted_comm.c:441
#9  0x000acd52 in orte_daemon_cmd_processor (fd=-1, opal_event=1, data=0x216750) at orted/orted_comm.c:346
#10 0x0012bd21 in event_process_active () at opal_object.h:498
#11 0x0012c3c5 in opal_event_base_loop () at opal_object.h:498
#12 0x0012bf8c in opal_event_loop () at opal_object.h:498
#13 0x0011b334 in opal_progress () at runtime/opal_progress.c:169
#14 0x000cd9b4 in orte_plm_base_report_launched () at opal_object.h:498
#15 0x000cc2b7 in orte_plm_base_launch_apps () at opal_object.h:498
#16 0x0003d626 in orte_plm_rsh_launch (jdata=0x200ae0) at plm_rsh_module.c:1126
#17 0x2604 in orterun (argc=4, argv=0xb880) at orterun.c:549
#18 0x1bd6 in main (argc=4, argv=0xb880) at main.c:13

On May 27, 2008, at 9:11 PM, Ralph Castain wrote:


Yo Greg

I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can you tell
me how you configured, and the precise command you executed?

Thanks
Ralph



On 5/27/08 5:15 PM, "Ralph Castain"  wrote:

Hmmm...well, it was working about 3 hours ago! I'll try to take a look
tonight, but it may be tomorrow.

Try rolling it back just a little to r18513 - that's the last rev I tested
on my Mac.


On 5/27/08 5:00 PM, "Greg Watson"  wrote:


Something seems to be broken in the trunk for MacOS X. I can run a 1
process job, but a >1 process job hangs. It was working a few days ago.


Greg





Re: [OMPI devel] mpirun hangs

2008-05-27 Thread Ralph Castain
Yo Greg

I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can you tell
me how you configured, and the precise command you executed?

Thanks
Ralph



On 5/27/08 5:15 PM, "Ralph Castain"  wrote:

> Hmmm...well, it was working about 3 hours ago! I'll try to take a look
> tonight, but it may be tomorrow.
> 
> Try rolling it back just a little to r18513 - that's the last rev I tested
> on my Mac.
> 
> 
> On 5/27/08 5:00 PM, "Greg Watson"  wrote:
> 
>> Something seems to be broken in the trunk for MacOS X. I can run a 1
>> process job, but a >1 process job hangs. It was working a few days ago.
>> 
>> Greg 




Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Jeff Squyres

Oops, sorry.

We were having problems with the memory allocator when ompi_info  
called orte_init().  I think it might be best to call the ORTE MCA  
registration function directly...



On May 27, 2008, at 10:40 AM, Ralph H Castain wrote:

I see the problem (I think). A recent change was made to ompi_info so it no
longer calls orte_init. As a result, none of the ORTE-level params (i.e.,
those params registered outside of ORTE frameworks) are being reported.


I'll chat with Jeff and see how we resolve the problem.


On 5/27/08 8:32 AM, "Ralph H Castain"  wrote:

It "should" be visible now... not sure why it isn't. It conforms to the
naming rules and -used- to be reported by ompi_info...



On 5/27/08 8:31 AM, "Shipman, Galen M."  wrote:


Make that "ompi_info".

We need to make that visible via orte_info.
I thought this was done at some point, perhaps it got overwritten?

Thanks,

Galen

On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:


-mca orte_tmpdir_base foo



On 5/27/08 8:24 AM, "Gleb Natapov"  wrote:


Hi,

  Is there a way to change where Open MPI creates session directory. I
can't find mca parameter that specifies this.

--
Gleb.



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Memory hooks stuff

2008-05-27 Thread Jeff Squyres

Ok -- I added Galen and Sharon.

If you want to attend and haven't told me, please let me know -- I
have only reserved exactly as many phone lines as the number of people
who have said they will attend (8, so far).



On May 27, 2008, at 1:50 PM, Sharon Melamed wrote:


Who would be interested in discussing this stuff?  (me, Brian, ?
someone from Sun?, ...?)


Me.



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Memory hooks stuff

2008-05-27 Thread Sharon Melamed
> Who would be interested in discussing this stuff?  (me, Brian, ?
> someone from Sun?, ...?)
>
Me.


Re: [OMPI devel] Memory hooks stuff

2008-05-27 Thread Shipman, Galen M.

I will be there as well.

- Galen

On May 27, 2008, at 10:14 AM, Jeff Squyres wrote:


To accommodate timezones spanning from US Mountain to Israel, let's
have the teleconference tomorrow, Wednesday 28 May 2008:

- 10:30am US Eastern time
- 8:30am US Mountain time
- 5:30pm Israel time

I'll send around call-in information to the following people (let me
know if anyone else wants to attend):

- Terry D
- Gleb N
- Patrick G
- Pasha S
- Brian B
- Jeff S


On May 23, 2008, at 7:19 AM, Jeff Squyres wrote:


Brian and I were chatting the other day about random OMPI stuff and
the topic of the memory hooks came up again.  Brian was wondering if
we should [finally] revisit this topic -- there's a few things that
could be done to make life "better".  Two things jump to mind:

- using mallopt on Linux
- doing *something* on Solaris

It would probably be worthwhile to have a teleconf about this in the
near future for anyone who is interested.  I propose any time before
4pm US Eastern on Wednesday, 28 May, 2008.

Who would be interested in discussing this stuff?  (me, Brian, ?
someone from Sun?, ...?)

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems





Re: [OMPI devel] Memory hooks stuff

2008-05-27 Thread Paul H. Hargrove

Please post minutes of this meeting to OMPI-devel.
While I probably have nothing to contribute to the discussion, I and
others are interested in hearing the outcomes/conclusions.


-Paul

Jeff Squyres wrote:
To accommodate timezones spanning from US Mountain to Israel, let's  
have the teleconference tomorrow, Wednesday 28 May 2008:


- 10:30am US Eastern time
- 8:30am US Mountain time
- 5:30pm Israel time

I'll send around call-in information to the following people (let me
know if anyone else wants to attend):


- Terry D
- Gleb N
- Patrick G
- Pasha S
- Brian B
- Jeff S


On May 23, 2008, at 7:19 AM, Jeff Squyres wrote:

  

Brian and I were chatting the other day about random OMPI stuff and
the topic of the memory hooks came up again.  Brian was wondering if
we should [finally] revisit this topic -- there's a few things that
could be done to make life "better".  Two things jump to mind:

- using mallopt on Linux
- doing *something* on Solaris

It would probably be worthwhile to have a teleconf about this in the
near future for anyone who is interested.  I propose any time before
4pm US Eastern on Wednesday, 28 May, 2008.

Who would be interested in discussing this stuff?  (me, Brian, ?
someone from Sun?, ...?)

--
Jeff Squyres
Cisco Systems





  



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] Does Open MPI class exist?

2008-05-27 Thread Jeff Squyres
I am unaware of any formal classes that are offered on a regular  
basis; we periodically do tutorials at various conferences, though  
(George just did one at the Linux Clusters Institute conference a few  
weeks ago).


Your post finally motivated me to take the last few steps and publish  
a bunch of instructional Open MPI videos on the web site.  See this  
post for some details:


http://www.open-mpi.org/community/lists/users/2008/05/5737.php

Hopefully, those will be helpful to you.


On May 21, 2008, at 6:36 PM, Jennis Pruett wrote:



I would dearly like a week-long class on Open MPI -
what it is, does, how to build, parameter tweaking, etc.

Does anyone know if such a class exists *anywhere* ?




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Ralph H Castain
I see the problem (I think). A recent change was made to ompi_info so it no
longer calls orte_init. As a result, none of the ORTE-level params (i.e.,
those params registered outside of ORTE frameworks) are being reported.

I'll chat with Jeff and see how we resolve the problem.


On 5/27/08 8:32 AM, "Ralph H Castain"  wrote:

> It "should" be visible now... not sure why it isn't. It conforms to the
> naming rules and -used- to be reported by ompi_info...
> 
> 
> 
> On 5/27/08 8:31 AM, "Shipman, Galen M."  wrote:
> 
>> Make that "ompi_info".
>> 
>> We need to make that visible via orte_info.
>> I thought this was done at some point, perhaps it got overwritten?
>> 
>> Thanks,
>> 
>> Galen
>> 
>> On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:
>> 
>>> -mca orte_tmpdir_base foo
>>> 
>>> 
>>> 
>>> On 5/27/08 8:24 AM, "Gleb Natapov"  wrote:
>>> 
>>>> Hi,
>>>> 
>>>>   Is there a way to change where Open MPI creates session directory. I
>>>> can't find mca parameter that specifies this.
>>>> 
>>>> --
>>>> Gleb.




Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Gleb Natapov
On Tue, May 27, 2008 at 08:27:49AM -0600, Ralph H Castain wrote:
> -mca orte_tmpdir_base foo
Thanks! It works. But this parameter is not reported by ompi_info :(
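
For anyone following along, here is a rough sketch of the two usual ways to pass this parameter, plus a check of what ompi_info reports. SESSION_BASE is a made-up path for illustration, and hostname simply stands in for a real MPI program; the environment-variable form relies on the general OMPI_MCA_<name> convention for MCA parameters.

```shell
# Sketch only: SESSION_BASE is a hypothetical path, not one from this thread.
SESSION_BASE="${TMPDIR:-/tmp}/ompi-sessions"
mkdir -p "$SESSION_BASE"

# An MCA parameter given as "-mca NAME VALUE" on the command line can also be
# supplied through the environment as OMPI_MCA_NAME=VALUE.
export OMPI_MCA_orte_tmpdir_base="$SESSION_BASE"

if command -v mpirun >/dev/null 2>&1; then
    # Command-line form, as suggested by Ralph.
    mpirun -mca orte_tmpdir_base "$SESSION_BASE" -np 2 hostname
    # See whether ompi_info lists the parameter at all.
    ompi_info --param all all | grep orte_tmpdir_base \
        || echo "orte_tmpdir_base not reported by ompi_info"
else
    echo "mpirun not found; nothing to run"
fi
```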

> 
> 
> 
> On 5/27/08 8:24 AM, "Gleb Natapov"  wrote:
> 
> > Hi,
> > 
> >   Is there a way to change where Open MPI creates session directory. I
> > can't find mca parameter that specifies this.
> > 
> > --
> > Gleb.

--
Gleb.


Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Ralph H Castain
It "should" be visible now... not sure why it isn't. It conforms to the
naming rules and -used- to be reported by ompi_info...



On 5/27/08 8:31 AM, "Shipman, Galen M."  wrote:

> Make that "ompi_info".
> 
> We need to make that visible via orte_info.
> I thought this was done at some point, perhaps it got overwritten?
> 
> Thanks,
> 
> Galen
> 
> On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:
> 
>> -mca orte_tmpdir_base foo
>> 
>> 
>> 
>> On 5/27/08 8:24 AM, "Gleb Natapov"  wrote:
>> 
>>> Hi,
>>> 
>>>   Is there a way to change where Open MPI creates session
>>> directory. I
>>> can't find mca parameter that specifies this.
>>> 
>>> --
>>> Gleb.




Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Shipman, Galen M.

Make that "ompi_info".

We need to make that visible via orte_info.
I thought this was done at some point, perhaps it got overwritten?

Thanks,

Galen

On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:


-mca orte_tmpdir_base foo



On 5/27/08 8:24 AM, "Gleb Natapov"  wrote:


Hi,

  Is there a way to change where Open MPI creates session directory. I
can't find mca parameter that specifies this.

--
Gleb.




Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Shipman, Galen M.

We need to make that visible via orte_info.
I thought this was done at some point, perhaps it got overwritten?

Thanks,

Galen

On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:


-mca orte_tmpdir_base foo



On 5/27/08 8:24 AM, "Gleb Natapov"  wrote:


Hi,

  Is there a way to change where Open MPI creates session directory. I
can't find mca parameter that specifies this.

--
Gleb.




[OMPI devel] Open MPI session directory location

2008-05-27 Thread Gleb Natapov
Hi,

  Is there a way to change where Open MPI creates the session directory? I
can't find an MCA parameter that specifies this.

--
Gleb.


Re: [OMPI devel] Memory hooks stuff

2008-05-27 Thread Jeff Squyres
To accommodate timezones spanning from US Mountain to Israel, let's  
have the teleconference tomorrow, Wednesday 28 May 2008:


- 10:30am US Eastern time
- 8:30am US Mountain time
- 5:30pm Israel time

I'll send around call-in information to the following people (let me
know if anyone else wants to attend):


- Terry D
- Gleb N
- Patrick G
- Pasha S
- Brian B
- Jeff S


On May 23, 2008, at 7:19 AM, Jeff Squyres wrote:


Brian and I were chatting the other day about random OMPI stuff and
the topic of the memory hooks came up again.  Brian was wondering if
we should [finally] revisit this topic -- there's a few things that
could be done to make life "better".  Two things jump to mind:

- using mallopt on Linux
- doing *something* on Solaris

It would probably be worthwhile to have a teleconf about this in the
near future for anyone who is interested.  I propose any time before
4pm US Eastern on Wednesday, 28 May, 2008.

Who would be interested in discussing this stuff?  (me, Brian, ?
someone from Sun?, ...?)

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems