Re: [OMPI users] [Open MPI] #3493: Handle the case where rankfile provides the allocation

2013-02-11 Thread Siegmar Gross
Hi

> #3493: Handle the case where rankfile provides the allocation
> ---+-
> Reporter:  rhc |   Owner:  ompi-gk1.6
> Type:  changeset move request  |  Status:  closed
> Priority:  critical|   Milestone:  Open MPI 1.6.4
>  Version:  trunk   |  Resolution:  fixed
> Keywords:  |
> ---+-
> Changes (by jsquyres):
> 
>  * status:  assigned => closed
>  * resolution:   => fixed

Excellent! The problem is solved! Thank you very much to everybody.
It even works in a mixed Linux/Solaris environment.

tyr rankfiles 106 mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname
[linpc1:29841] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)
linpc1
sunpc1
[sunpc1:10829] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
sunpc1
[sunpc1:10829] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[sunpc1:10829] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
sunpc1

tyr rankfiles 107 ompi_info | grep "MPI:"
Open MPI: 1.6.4rc4r28039


tyr rankfiles 108  cat rf_ex_sunpc_linpc
# mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname
rank 0=linpc1 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1


Thank you very much once more

Siegmar



Re: [OMPI users] how to find the binding of each rank on the local machine

2013-02-11 Thread Kranthi Kumar
Sir,

I was following your discussion.

Brice Sir's explanation of what I want is correct.

Your last reply asked me to look at the ompi_proc_t for the process, in
the proc_flags field, if I understood correctly. You said that the definition of the
values would be in opal/mca/hwloc/hwloc.h. I checked this file in Open MPI
1.6 but couldn't find it. Is it available in later versions of Open MPI?


Thank You

The locality of every process is stored on the ompi_proc_t for that process
in the proc_flags field. You can find the definition of the values in
opal/mca/hwloc/hwloc.h.
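
(As an illustration -- not from the original mail -- here is a hedged C sketch of
what that lookup might look like from inside the library. The helper
ompi_comm_peer_lookup(), the proc_flags field, and the OPAL_PROC_ON_LOCAL_NODE
macro are assumptions based on the trunk-era source tree; verify the exact names
in ompi/proc/proc.h and opal/mca/hwloc/hwloc.h for your version.)

/* Hedged sketch: check whether another rank of a communicator lives on the
 * same node, using Open MPI internals (trunk-era names assumed). */
#include "ompi/communicator/communicator.h"
#include "ompi/proc/proc.h"
#include "opal/mca/hwloc/hwloc.h"

static int peer_is_on_my_node(ompi_communicator_t *comm, int rank)
{
    /* Assumed internal helper that returns the ompi_proc_t for a peer. */
    ompi_proc_t *proc = ompi_comm_peer_lookup(comm, rank);
    if (NULL == proc) {
        return 0;
    }
    /* proc_flags is a bitmask of OPAL_PROC_ON_* locality bits. */
    return OPAL_PROC_ON_LOCAL_NODE(proc->proc_flags) ? 1 : 0;
}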
On Sun, Feb 10, 2013 at 10:16 AM, Kranthi Kumar wrote:

> Hello Sir,
>
> I need a way to find out where each rank runs from inside the
> implementation?
> How do I  know the binding of each rank in an MPI application?
>
> Thank You
> --
> Kranthi




-- 
Kranthi


Re: [OMPI users] how to find the binding of each rank on the local machine

2013-02-11 Thread Jeff Squyres (jsquyres)
Remember that OMPI 1.6.x is our stable series; we're no longer adding new 
features to it -- only bug fixes.

What Ralph described is available on the OMPI SVN trunk HEAD (i.e., what will 
eventually become the v1.9 series).  It may also be available in the upcoming 
v1.7 series; I'm not sure if we pulled it over there or not.
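
For reference, here is a minimal sketch of how the OMPI_Affinity_str extension
Ralph describes below is typically called. The header name, the
OMPI_AFFINITY_STRING_MAX constant, and the three-string signature are taken from
the v1.6-era man page as I recall it -- treat them as assumptions and check the
man page linked below against your installation.

/* Hedged sketch: query Open MPI's affinity extension (requires a build
 * configured with --enable-mpi-ext). */
#include <stdio.h>
#include <mpi.h>
#include <mpi-ext.h>

int main(int argc, char **argv)
{
    char ompi_bound[OMPI_AFFINITY_STRING_MAX];
    char current_binding[OMPI_AFFINITY_STRING_MAX];
    char exists[OMPI_AFFINITY_STRING_MAX];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* What OMPI bound this process to, what the current binding actually is,
     * and what processors exist on this node. */
    OMPI_Affinity_str(ompi_bound, current_binding, exists);
    printf("rank %d: bound=%s current=%s exists=%s\n",
           rank, ompi_bound, current_binding, exists);

    MPI_Finalize();
    return 0;
}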


On Feb 10, 2013, at 3:16 PM, Ralph Castain  wrote:

> There is no MPI standard call to get the binding. He could try to use the MPI 
> extensions, depending on which version of OMPI he's using. It is in v1.6 and 
> above.
> 
> See "man OMPI_Affinity_str" for details (assuming you included the OMPI man 
> pages in your MANPATH), or look at it online at
> 
> http://www.open-mpi.org/doc/v1.6/man3/OMPI_Affinity_str.3.php
> 
> Remember, you have to configure with --enable-mpi-ext in order to enable the 
> extensions.
> 
> 
> On Feb 10, 2013, at 12:08 AM, Brice Goglin  wrote:
> 
>> I've been talking with Kranthi offline, he wants to use locality info
>> inside OMPI. He needs the binding info from *inside* MPI. From 10
>> thousands feet, it looks like communicator->rank[X]->locality_info as a
>> hwloc object or as a hwloc bitmap.
>> 
>> Brice
>> 
>> 
>> 
>> Le 10/02/2013 06:07, Ralph Castain a écrit :
>>> Add --report-bindings to the mpirun cmd line
>>> 
>>> Remember, we do not bind processes by default, so you will need to include 
>>> something about the binding to use (by core, by socket, etc.) on the cmd 
>>> line
>>> 
>>> See "mpirun -h" for the options
>>> 
>>> On Feb 9, 2013, at 8:46 PM, Kranthi Kumar  wrote:
>>> 
 Hello Sir,
 
 I need a way to find out where each rank runs from inside the 
 implementation? 
 How do I  know the binding of each rank in an MPI application? 
 
 Thank You
 -- 
 Kranthi ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] newbie: Submitting Open MPI jobs to SGE ( `qsh, -pe orte 4` fails)

2013-02-11 Thread Pierre Lindenbaum



This is a good sign, as it tries to use `qrsh -inherit ...` already. Can you 
confirm the following settings:

$ qconf -sp orte
...
control_slaves TRUE

$ qconf -sq all.q
...
shell_start_mode  unix_behavior

-- Reuti


   qconf -sp orte

   pe_name            orte
   slots              448
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    /bin/true
   stop_proc_args     /bin/true
   allocation_rule    $round_robin
   control_slaves     FALSE
   job_is_first_task  TRUE
   urgency_slots      min
   accounting_summary FALSE


and

 qconf -sq all.q | grep start_
   shell_start_mode  posix_compliant



I've edited the parallel environment configuration using `qconf -mp orte`, 
changing `control_slaves` to TRUE



   # qconf -sp orte
   pe_name            orte
   slots              448
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    /bin/true
   stop_proc_args     /bin/true
   allocation_rule    $round_robin
   control_slaves     TRUE
   job_is_first_task  TRUE
   urgency_slots      min
   accounting_summary FALSE

and I've changed `shell_start_mode posix_compliant` to 
`unix_behavior` using `qconf -mconf`. (However, shell_start_mode is 
still listed as posix_compliant.)


Now, qsh -pe orte 4 works

   qsh -pe orte 4
   Your job 84581 ("INTERACTIVE") has been submitted
   waiting for interactive job to be scheduled ...
   Your interactive job 84581 has been successfully scheduled.


(Should I run that command before running any new mpirun command?)

when invoking:

 qsub -cwd -pe orte 7 with-a-shell.sh
or
 qrsh -cwd -pe orte 100 /commun/data/packages/openmpi/bin/mpirun 
/path/to/a.out  arg1 arg2 arg3 


that works too ! Thank you ! :-)


   queuename                    qtype resv/used/tot. load_avg arch       states
   -----------------------------------------------------------------------------
   all.q@node01                 BIP   0/15/64        2.76     lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
   -----------------------------------------------------------------------------
   all.q@node02                 BIP   0/14/64        3.89     lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   -----------------------------------------------------------------------------
   all.q@node03                 BIP   0/14/64        3.23     lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   -----------------------------------------------------------------------------
   all.q@node04                 BIP   0/14/64        3.68     lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   -----------------------------------------------------------------------------
   all.q@node05                 BIP   0/15/64        2.91     lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
   -----------------------------------------------------------------------------
   all.q@node06                 BIP   0/14/64        3.91     lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   -----------------------------------------------------------------------------
   all.q@node07                 BIP   0/14/64        3.79     lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14



OK, my first openmpi program works. But as far as I can see: it is 
faster when invoked on the master node (~3.22min) than when invoked by 
means of SGE (~7H45):



   time /commun/data/packages/openmpi/bin/mpirun -np 7 /path/to/a.out 
   arg1 arg2 arg3 

   670.985u 64.929s 3:32.36 346.5% 0+0k 16322112+6560io 32pf+0w

   time qrsh -cwd -pe orte 7 /commun/data/packages/openmpi/bin/mpirun
   /path/to/a.out  arg1 arg2 arg3 
   0.023u 0.036s 7:45.05 0.0% 0+0k 1496+0io 1pf+0w



I'm going to investigate this... :-)

Thank you again

Pierre




[OMPI users] mpirun completes for one user, not for another

2013-02-11 Thread Daniel Fetchinson
Hi folks,

I have a really strange problem: a super simple MPI test program (see
below) runs successfully for all users when executed on 4 processes in
1 node, but hangs for user A and runs successfully for user B when
executed on 8 processes in 2 nodes. The executable used is the same
and the appfile used is also the same for user A and user B. Both
users launch it by

mpirun --app appfile

where the content of 'appfile' is

-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test

for the single node run with 4 processes and is replaced by

-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node1 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test
-np 1 -host node2 -wdir /tmp/test ./test

for the 2-node run with 8 processes. Just to recap, the single node
run works for both user A and user B, but the 2-node run only works
for user B and it hangs for user A. It does respond to Ctrl-C though.
Both users use bash, have set up passwordless ssh, are able to ssh
from node1 to node2 and back, have the same PATH and use the same
'mpirun' executable.

At this point I've run out of ideas what to check and debug because
the setups look really identical. The test program is simply

#include <stdio.h>
#include <mpi.h>

int main( int argc, char **argv )
{
   int node;

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &node );

   printf( "First Hello World from Node %d\n", node );
   MPI_Barrier( MPI_COMM_WORLD );
   printf( "Second Hello World from Node %d\n",node );

   MPI_Finalize(  );

   return 0;
}


I also asked both users to compile the test program separately, and
the resulting executable 'test' is the same for both indicating again
that identical gcc, mpicc, etc., are used. GCC is 4.5.1, Open MPI is
1.5, and the interconnect is InfiniBand.

I've really run out of ideas what else to compare between user A and B.

Thanks for any hints,
Daniel





-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown



-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown


Re: [OMPI users] [Open MPI] #3493: Handle the case where rankfile provides the allocation

2013-02-11 Thread Jeff Squyres (jsquyres)
Sweet!

Sent from my phone. No type good. 

On Feb 11, 2013, at 1:39 AM, "Siegmar Gross" 
 wrote:

> Hi
> 
>> #3493: Handle the case where rankfile provides the allocation
>> ---+-
>> Reporter:  rhc |   Owner:  ompi-gk1.6
>>Type:  changeset move request  |  Status:  closed
>> Priority:  critical|   Milestone:  Open MPI 1.6.4
>> Version:  trunk   |  Resolution:  fixed
>> Keywords:  |
>> ---+-
>> Changes (by jsquyres):
>> 
>> * status:  assigned => closed
>> * resolution:   => fixed
> 
> Excellent! The problem is solved! Thank you very much to everybody.
> It even works in a mixed Linux/Solaris environment.
> 
> tyr rankfiles 106 mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname
> [linpc1:29841] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
>  [B B][B B] (slot list 0:0-1,1:0-1)
> linpc1
> sunpc1
> [sunpc1:10829] MCW rank 1 bound to socket 0[core 0-1]:
>  [B B][. .] (slot list 0:0-1)
> sunpc1
> [sunpc1:10829] MCW rank 2 bound to socket 1[core 0]:
>  [. .][B .] (slot list 1:0)
> [sunpc1:10829] MCW rank 3 bound to socket 1[core 1]:
>  [. .][. B] (slot list 1:1)
> sunpc1
> 
> tyr rankfiles 107 ompi_info | grep "MPI:"
>Open MPI: 1.6.4rc4r28039
> 
> 
> tyr rankfiles 108  cat rf_ex_sunpc_linpc
> # mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname
> rank 0=linpc1 slot=0:0-1,1:0-1
> rank 1=sunpc1 slot=0:0-1
> rank 2=sunpc1 slot=1:0
> rank 3=sunpc1 slot=1:1
> 
> 
> Thank you very much once more
> 
> Siegmar
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Fwd: an error when running MPI on 2 machines

2013-02-11 Thread Jeff Squyres (jsquyres)
Can you provide any more detail?  

Your report looks weird - you said it's a simple C++ hello world, but the 
executable you show is "pi", which is typically a simple C example program. 

Are you using the same version of Open MPI on all nodes?  Are you able to run N-way 
jobs on single nodes?

Sent from my phone. No type good. 

On Feb 9, 2013, at 2:03 PM, "Paul Gribelyuk"  wrote:

>> Hello,
>> I am getting the following stacktrace when running a simple hello world MPI 
>> C++ program on 2 machines:
>> 
>> 
>> mini:mpi_cw paul$ mpirun --prefix /usr/local/Cellar/open-mpi/1.6.3 
>> --hostfile hosts_home -np 2 ./pi 100
>> rank and name: 0 aka mini.local
>> [home-mini:12175] *** Process received signal ***
>> [home-mini:12175] Signal: Segmentation fault: 11 (11)
>> [home-mini:12175] Signal code: Address not mapped (1)
>> [home-mini:12175] Failing at address: 0x1042e
>> [home-mini:12175] [ 0] 2   libsystem_c.dylib   
>> 0x7fff94050cfa _sigtramp + 26
>> [home-mini:12175] [ 1] 3   mca_btl_tcp.so  
>> 0x00010397092c best_addr + 2620
>> [home-mini:12175] [ 2] 4   pi  
>> 0x000103649d24 start + 52
>> [home-mini:12175] [ 3] 5   ??? 
>> 0x0002 0x0 + 2
>> [home-mini:12175] *** End of error message ***
>> rank: 0 sum: 1.85459
>> --
>> mpirun noticed that process rank 1 with PID 12175 on node home-mini.local 
>> exited on signal 11 (Segmentation fault: 11).
>> --
>> 
>> 
>> 
>> I get a similar result even when I don't use --prefix since the .bashrc file 
>> on the remote machine is correctly pointing to PATH and LD_LIBRARY_PATH
>> 
>> Any help with this seg fault is greatly appreciated.  Thanks.
>> 
>> -Paul
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] running mpi job..

2013-02-11 Thread Jeff Squyres (jsquyres)
Can you provide all the information in http://www.open-mpi.org/community/help/ ?

Sent from my phone. No type good.

On Feb 10, 2013, at 12:14 PM, "satya k" <satya5...@gmail.com> wrote:


Hi everyone out there,

   I am a newbie to HPC.

We have a couple of HPC clusters where I work,

so I started to recreate one in VMware Workstation. I failed to run an MPI 
job many times, even though I followed the default configs.

I finally succeeded in installing, but when running an MPI sample 
job it was throwing error 127.

When I ran it with --prefix and the path to the Open MPI binaries, however, it worked.

I exported the PATH the right way, but it is not being picked up.

Is there any way to work around and solve the issue?



--
regards,
Albatross.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
That's very odd; I can't think of why that would happen offhand. I build and run 
all the time on ML with no problems. Can you delete that plugin and run OK?

Sent from my phone. No type good. 

On Feb 10, 2013, at 10:22 PM, "Mark Bolstad"  wrote:

> I having some difficulties with building/running 1.6.3 on Mountain Lion 
> (10.8.2). I build with no errors and install into a prefix directory. I get 
> the following errors:
> ...
> [bolstadm-lm3:38486] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_sysv:
>  lt_dlerror() returned NULL! (ignored)
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
> [bolstadm-lm3:38486] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file 
> runtime/orte_init.c at line 79
> [bolstadm-lm3:38486] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file 
> orterun.c at line 694
> 
> I've fiddled with LD_LIBRARY_PATH, DYLD_LIBRARY_PATH, OPAL_PREFIX, in 
> combination and separately, and none of these seem to have much effect. 
> 
> So, I decided to try a straight build. The only option was --disable-mpi-f77. 
> It installed into /usr/local. There is no other mpi version installed on the 
> system, and I still get the same errors.
> 
> However, I did install the version from MacPorts (also 1.6.3), and it works 
> correctly. 
> 
> I would appreciate if anyone had some insight into building on OS X 10.8.
> 
> Mark
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] newbie: Submitting Open MPI jobs to SGE ( `qsh, -pe orte 4` fails)

2013-02-11 Thread Reuti
Am 11.02.2013 um 12:26 schrieb Pierre Lindenbaum:

> 
> and I've changed `shell_start_mode  posix_compliant`  to `unix_behavior ` 
> using  `qconf -mconf`. (However, shell_start_mode is  still listed as 
> posix_compliant )

AFAIK this is deprecated on the configuration level, as it moved to the queue 
definition `qconf -mq all.q`.


> Now, qsh -pe orte 4 works
> 
>   qsh -pe orte 4

A plain `qsh` is working for you? This is an old startup method; due to the 
insecure X11 startup, it shouldn't be used any longer IMO.


>   Your job 84581 ("INTERACTIVE") has been submitted
>   waiting for interactive job to be scheduled ...
>   Your interactive job 84581 has been successfully scheduled.
> 
> 
> (should I run that command before running any a new mpirun command ?)
> 
> when invoking:
> 
> qsub -cwd -pe orte 7 with-a-shell.sh
> or
> qrsh -cwd -pe orte 100 /commun/data/packages/openmpi/bin/mpirun 
> /path/to/a.out  arg1 arg2 arg3 
> 
> that works too ! Thank you ! :-)
> 
> 
>   queuename                    qtype resv/used/tot. load_avg arch       states
>   -----------------------------------------------------------------------------
>   all.q@node01                 BIP   0/15/64        2.76     lx24-amd64
>      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
>   -----------------------------------------------------------------------------
>   all.q@node02                 BIP   0/14/64        3.89     lx24-amd64
>      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
>   -----------------------------------------------------------------------------
>   all.q@node03                 BIP   0/14/64        3.23     lx24-amd64
>      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
>   -----------------------------------------------------------------------------
>   all.q@node04                 BIP   0/14/64        3.68     lx24-amd64
>      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
>   -----------------------------------------------------------------------------
>   all.q@node05                 BIP   0/15/64        2.91     lx24-amd64
>      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
>   -----------------------------------------------------------------------------
>   all.q@node06                 BIP   0/14/64        3.91     lx24-amd64
>      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
>   -----------------------------------------------------------------------------
>   all.q@node07                 BIP   0/14/64        3.79     lx24-amd64
>      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
> 
> 
> 
> OK, my first openmpi program works. But as far as I can see: it is faster 
> when invoked on the master node (~3.22min) than when invoked by means of SGE 
> (~7H45):

It's 7:45 to 3:32 - both in minutes:seconds, right?

Are all machines the same regarding speed and core count? BTW: running 
interactively in SGE might not set the environment variables if you use `qrsh` 
without a command or `qlogin`, and some default hostfile will be used instead 
(unless you provide one). Below, with the supplied command, it should be fine.

-- Reuti


>   time /commun/data/packages/openmpi/bin/mpirun -np 7 /path/to/a.out arg1 
> arg2 arg3 
>   670.985u 64.929s 3:32.36 346.5% 0+0k 16322112+6560io 32pf+0w
> 
>   time qrsh -cwd -pe orte 7 /commun/data/packages/openmpi/bin/mpirun
>   /path/to/a.out  arg1 arg2 arg3 
>   0.023u 0.036s 7:45.05 0.0% 0+0k 1496+0io 1pf+0w
> 
> 
> 
> I'm going to investigate this... :-)
> 
> Thank you again
> 
> Pierre
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] MPI_FILE_READ: wrong file-size does not raise an exception

2013-02-11 Thread Stefan Mauerberger
Hi Everyone!

Playing around with MPI_FILE_READ() puzzles me a little. To catch all
errors I set the error-handler - the one which is related to file I/O -
to MPI_ERRORS_ARE_FATAL. 
However, when reading from a file which does not have the necessary size,
MPI_FILE_READ(...) returns 'MPI_SUCCESS: no errors'. The values read
are just a mess. 
Does anyone have an idea how to catch such an error? 

Cheers, 
Stefan 

Btw.: Attached, there is a minimal example in Fortran. 



PROGRAM main 
USE mpi 
IMPLICIT NONE
INTEGER :: i, field(10), mpi_file_handle, mpi_err, mpi_status(MPI_STATUS_SIZE), mpi_resultlen
CHARACTER(MPI_MAX_ERROR_STRING) :: mpi_iomsg

CALL MPI_INIT( mpi_err )
CALL MPI_FILE_SET_ERRHANDLER( MPI_FILE_NULL, &
  MPI_ERRORS_ARE_FATAL, &
  mpi_err )

CALL EXECUTE_COMMAND_LINE( 'touch test.dat' )

CALL MPI_FILE_OPEN( MPI_COMM_WORLD, &
'test.dat', &
MPI_MODE_RDONLY, &
MPI_INFO_NULL, &
mpi_file_handle, &
mpi_err )

CALL MPI_FILE_READ( mpi_file_handle, &
field(:), &
SIZE(field), &
MPI_REAL, &
mpi_status, &
mpi_err ) 

CALL MPI_ERROR_STRING( mpi_err, mpi_iomsg, mpi_resultlen, i )
WRITE(*,*) mpi_iomsg(1:mpi_resultlen)

CALL MPI_FILE_CLOSE( mpi_file_handle, mpi_err )

CALL MPI_FINALIZE( mpi_err )

END PROGRAM main
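
[Not part of the original mail: a hedged sketch, in C for brevity, of one way to
detect the short read. Per the MPI standard, reading past end-of-file is not an
error -- MPI_SUCCESS is expected -- but the status object records how many
elements were actually delivered, which MPI_Get_count can report.]

/* Hedged sketch: detect a short MPI_File_read via MPI_Get_count, since the
 * return code alone cannot catch an undersized file. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    int field[10], got;
    const int want = 10;

    MPI_Init(&argc, &argv);
    MPI_File_open(MPI_COMM_WORLD, "test.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    MPI_File_read(fh, field, want, MPI_INT, &status);
    MPI_Get_count(&status, MPI_INT, &got);   /* elements actually read */
    if (got < want) {
        fprintf(stderr, "short read: got %d of %d elements\n", got, want);
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}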


Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Mark Bolstad
It's not just one plugin, it was about 6 of them. I just deleted the error
message from the others as I believed that opal_init was the problem.

However, I have done a full build multiple times and have blown away all
the plugins and other remnants of the build and install and get the same
results every time.

Here's the output from running ompi_info (the result with or without
OPAL_PREFIX is the same; LD_LIBRARY_PATH is set; PATH points to both the bin and
lib directories):

[bolstadm-lm3:~/papillon/build/src] bolstadm% ompi_info
 Package: Open MPI bolstadm@bolstadm-lm3 Distribution
Open MPI: 1.6.3
   Open MPI SVN revision: r27472
   Open MPI release date: Oct 24, 2012
Open RTE: 1.6.3
   Open RTE SVN revision: r27472
   Open RTE release date: Oct 24, 2012
OPAL: 1.6.3
   OPAL SVN revision: r27472
   OPAL release date: Oct 24, 2012
 MPI API: 2.1
Ident string: 1.6.3
  Prefix:
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
 Configured architecture: x86_64-apple-darwin12.2.1
  Configure host: bolstadm-lm3
   Configured by: bolstadm
   Configured on: Sun Feb 10 19:09:36 EST 2013
  Configure host: bolstadm-lm3
Built by: bolstadm
Built on: Sun Feb 10 19:16:52 EST 2013
  Built host: bolstadm-lm3
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: no
  Fortran90 bindings: no
 Fortran90 bindings size: na
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
  C compiler family name: GNU
  C compiler version: 4.2.1
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
  Fortran90 compiler: none
  Fortran90 compiler abs: none
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: no
 Fortran90 profiling: no
  C++ exceptions: no
  Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
   Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
 MPI I/O support: yes
   MPI_WTIME support: gettimeofday
 Symbol vis. support: yes
   Host topology support: yes
  MPI extensions: affinity example
   FT Checkpoint support: no (checkpoint thread: no)
 VampirTrace support: yes
  MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
 MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
   MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_paffinity_hwloc:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_carto_auto_detect:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_carto_file:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_mmap:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_posix:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_sysv:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_maffinity_first_use:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_maffinity_hwloc:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_sysinfo_darwin:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_errmgr_default:
lt_dlerror() returned NULL! (ignored)
[bolstadm-lm3:86426] mca: base: component_find: unable to open
/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_gr

Re: [OMPI users] mpirun completes for one user, not for another

2013-02-11 Thread Jeff Squyres (jsquyres)
Make sure that the PATH really is identical between users -- especially for 
non-interactive logins.  E.g.:

env

vs. 

ssh othernode env

Also check the LD_LIBRARY_PATH.


On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson  
wrote:

> Hi folks,
> 
> I have a really strange problem: a super simple MPI test program (see
> below) runs successfully for all users when executed on 4 processes in
> 1 node, but hangs for user A and runs successfully for user B when
> executed on 8 processes in 2 nodes. The executable used is the same
> and the appfile used is also the same for user A and user B. Both
> users launch it by
> 
> mpirun --app appfile
> 
> where the content of 'appfile' is
> 
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> 
> for the single node run with 4 processes and is replaced by
> 
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node1 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> -np 1 -host node2 -wdir /tmp/test ./test
> 
> for the 2-node run with 8 processes. Just to recap, the single node
> run works for both user A and user B, but the 2-node run only works
> for user B and it hangs for user A. It does respond to Ctrl-C though.
> Both users use bash, have set up passwordless ssh, are able to ssh
> from node1 to node2 and back, have the same PATH and use the same
> 'mpirun' executable.
> 
> At this point I've run out of ideas what to check and debug because
> the setups look really identical. The test program is simply
> 
> #include <stdio.h>
> #include <mpi.h>
> 
> int main( int argc, char **argv )
> {
> int node;
> 
> MPI_Init( &argc, &argv );
> MPI_Comm_rank( MPI_COMM_WORLD, &node );
> 
> printf( "First Hello World from Node %d\n", node );
> MPI_Barrier( MPI_COMM_WORLD );
> printf( "Second Hello World from Node %d\n",node );
> 
> MPI_Finalize(  );
> 
> return 0;
> }
> 
> 
> I also asked both users to compile the test program separately, and
> the resulting executable 'test' is the same for both indicating again
> that identical gcc, mpicc, etc, is used. Gcc is 4.5.1 and openmpi is
> 1.5. and the interconnect is infiniband.
> 
> I've really run out of ideas what else to compare between user A and B.
> 
> Thanks for any hints,
> Daniel
> 
> 
> 
> 
> 
> -- 
> Psss, psss, put it down! - http://www.cafepress.com/putitdown
> 
> 
> 
> -- 
> Psss, psss, put it down! - http://www.cafepress.com/putitdown
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/





Re: [OMPI users] error when running mpirun

2013-02-11 Thread albatr...@gmail.com
Hi Ralph ,

 Thanks for the reply. In fact, it worked after installing the
libnuma RPM.

Thanks a ton..

cheers
Satya




On Mon, Feb 11, 2013 at 1:40 AM, Ralph Castain  wrote:

> The error message indicates that libnuma was not installed on at least one
> node. That's a system library, not an OMPI one, so you'll need to get it
> installed by someone with root privileges.
>
> On Feb 10, 2013, at 12:04 PM, satya k  wrote:
>
> > Hi everyone,
> >
> > I m getting the below error when executing mpirun with hostfile option
> >
> > $mpirun -np 4 -hostfile nodes ./hello
> >
> > orted: error while loading shared libraries: libnuma.so.1: cannot open
> shared object file: No such file or directory
> >
> --
> > A daemon (pid 11798) died unexpectedly with status 127 while attempting
> > to launch so we are aborting.
> >
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> >
> --
> >
> --
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> >
> --
> >
> > Also checked with the echo $LD_LIBRARY_PATH command on the nodes, Its
> giving output as /apps/mpi/lib where lib files exists.
> >
> > Any suggestions... I could not find anything as I am a newbie..
> >
> > ---
> > Albatross
> >
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
regards,
Satya.K


Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Beatty, Daniel D CIV NAVAIR, 474300D
Greetings Fellow MPI users,
I may need to get involved here on this issue also.  I will need to do a
similar build for Mountain Lion and regular Lion.  I am still a little bit
in the design phase at this time, so I am paying close attention to this thread.

There are two issues that have concerned me.  One is universal capabilities,
namely ensuring that the library gives the same results for binaries in
any of their universal compiled forms.  The other is the linking of the MPI
host files and marshaling capabilities.  I am hoping to address these issues
in design before I get to implementation.  Naturally, there is a matter of
tinkering that goes back and forth, so I will need some help here.  Is
there a standard for MPI currently in existence that enables such a thing?
If there is a standard, is there a way to accredit such a standard for OS X
if it gets built for universal capabilities?  What is the standards body
for MPI to facilitate this?  If such a thing is built, how do we contribute
back in a standards-consistent way?



V/R,

Daniel Beatty, Ph.D.
Computer Scientist, Detonation Sciences Branch
Code 474300D
1 Administration Circle M/S 1109
China Lake, CA 93555
daniel.bea...@navy.mil
(LandLine) (760)939-7097
(iPhone) (806)438-6620



On 2/11/13 8:09 AM, "Mark Bolstad"  wrote:

> It's not just one plugin, it was about 6 of them. I just deleted the error
> message from the others as I believed that opal_init was the problem.
> 
> However, I have done a full build multiple times and have blown away all the
> plugins and other remnants of the build and install and get the same results
> every time.
> 
> Here's the output from running ompi_info (same result with or w/o OPAL_PREFIX
> are the same; LD_LIBRARY_PATH set; path points to both bin and lib directory
> ):
> 
> [bolstadm-lm3:~/papillon/build/src] bolstadm% ompi_info                      
>                    Package: Open MPI bolstadm@bolstadm-lm3 Distribution
>                 Open MPI: 1.6.3
>    Open MPI SVN revision: r27472
>    Open MPI release date: Oct 24, 2012
>                 Open RTE: 1.6.3
>    Open RTE SVN revision: r27472
>    Open RTE release date: Oct 24, 2012
>                     OPAL: 1.6.3
>        OPAL SVN revision: r27472
>        OPAL release date: Oct 24, 2012
>                  MPI API: 2.1
>             Ident string: 1.6.3
>                   Prefix:
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
>  Configured architecture: x86_64-apple-darwin12.2.1
>           Configure host: bolstadm-lm3
>            Configured by: bolstadm
>            Configured on: Sun Feb 10 19:09:36 EST 2013
>           Configure host: bolstadm-lm3
>                 Built by: bolstadm
>                 Built on: Sun Feb 10 19:16:52 EST 2013
>               Built host: bolstadm-lm3
>               C bindings: yes
>             C++ bindings: yes
>       Fortran77 bindings: no
>       Fortran90 bindings: no
>  Fortran90 bindings size: na
>               C compiler: gcc
>      C compiler absolute: /usr/bin/gcc
>   C compiler family name: GNU
>       C compiler version: 4.2.1
>             C++ compiler: g++
>    C++ compiler absolute: /usr/bin/g++
>       Fortran77 compiler: gfortran
>   Fortran77 compiler abs: /usr/bin/gfortran
>       Fortran90 compiler: none
>   Fortran90 compiler abs: none
>              C profiling: yes
>            C++ profiling: yes
>      Fortran77 profiling: no
>      Fortran90 profiling: no
>           C++ exceptions: no
>           Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>            Sparse Groups: no
>   Internal debug support: no
>   MPI interface warnings: no
>      MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>          libltdl support: yes
>    Heterogeneous support: no
>  mpirun default --prefix: no
>          MPI I/O support: yes
>        MPI_WTIME support: gettimeofday
>      Symbol vis. support: yes
>    Host topology support: yes
>           MPI extensions: affinity example
>    FT Checkpoint support: no (checkpoint thread: no)
>      VampirTrace support: yes
>   MPI_MAX_PROCESSOR_NAME: 256
>     MPI_MAX_ERROR_STRING: 256
>      MPI_MAX_OBJECT_NAME: 64
>         MPI_MAX_INFO_KEY: 36
>         MPI_MAX_INFO_VAL: 256
>        MPI_MAX_PORT_NAME: 1024
>   MPI_MAX_DATAREP_STRING: 128
> [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi
> /mca_paffinity_hwloc: lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi
> /mca_carto_auto_detect: lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi
> /mca_carto_file: lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: componen

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
Ah -- your plugins are all .a files.

How did you configure/build Open MPI?


On Feb 11, 2013, at 11:09 AM, Mark Bolstad  wrote:

> It's not just one plugin, it was about 6 of them. I just deleted the error 
> message from the others as I believed that opal_init was the problem.
> 
> However, I have done a full build multiple times and have blown away all the 
> plugins and other remnants of the build and install and get the same results 
> every time.
> 
> Here's the output from running ompi_info (same result with or w/o OPAL_PREFIX 
> are the same; LD_LIBRARY_PATH set; path points to both bin and lib directory 
> ):
> 
> [bolstadm-lm3:~/papillon/build/src] bolstadm% ompi_info   
>Package: Open MPI bolstadm@bolstadm-lm3 Distribution
> Open MPI: 1.6.3
>Open MPI SVN revision: r27472
>Open MPI release date: Oct 24, 2012
> Open RTE: 1.6.3
>Open RTE SVN revision: r27472
>Open RTE release date: Oct 24, 2012
> OPAL: 1.6.3
>OPAL SVN revision: r27472
>OPAL release date: Oct 24, 2012
>  MPI API: 2.1
> Ident string: 1.6.3
>   Prefix: 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
>  Configured architecture: x86_64-apple-darwin12.2.1
>   Configure host: bolstadm-lm3
>Configured by: bolstadm
>Configured on: Sun Feb 10 19:09:36 EST 2013
>   Configure host: bolstadm-lm3
> Built by: bolstadm
> Built on: Sun Feb 10 19:16:52 EST 2013
>   Built host: bolstadm-lm3
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: no
>   Fortran90 bindings: no
>  Fortran90 bindings size: na
>   C compiler: gcc
>  C compiler absolute: /usr/bin/gcc
>   C compiler family name: GNU
>   C compiler version: 4.2.1
> C++ compiler: g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: gfortran
>   Fortran77 compiler abs: /usr/bin/gfortran
>   Fortran90 compiler: none
>   Fortran90 compiler abs: none
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: no
>  Fortran90 profiling: no
>   C++ exceptions: no
>   Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>Sparse Groups: no
>   Internal debug support: no
>   MPI interface warnings: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>Heterogeneous support: no
>  mpirun default --prefix: no
>  MPI I/O support: yes
>MPI_WTIME support: gettimeofday
>  Symbol vis. support: yes
>Host topology support: yes
>   MPI extensions: affinity example
>FT Checkpoint support: no (checkpoint thread: no)
>  VampirTrace support: yes
>   MPI_MAX_PROCESSOR_NAME: 256
> MPI_MAX_ERROR_STRING: 256
>  MPI_MAX_OBJECT_NAME: 64
> MPI_MAX_INFO_KEY: 36
> MPI_MAX_INFO_VAL: 256
>MPI_MAX_PORT_NAME: 1024
>   MPI_MAX_DATAREP_STRING: 128
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_paffinity_hwloc:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_carto_auto_detect:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_carto_file:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_mmap:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_posix:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_sysv:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_maffinity_first_use:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_maffinity_hwloc:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_sysinfo_darwin:
>  l

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 1:11 PM, "Beatty, Daniel D CIV NAVAIR, 474300D" 
 wrote:

> There are two issues that have concerned me.  One is universal capabilities, 
> namely ensuring that the library allows the same results for binaries in both 
> any of their universal compiled forms.   

Not sure what you mean here -- are you referring to Intel+PPC "universal" OS X 
binaries?

> Also, the linking of the MPI host files and marshaling capabilities.  I am 
> hoping to address these issues in design before I get to implementation.  
> Naturally, there is a matter of tinkering that goes back and forth.  So I 
> will need some help here.  Is there a standard for MPI currently in existence 
> that enables such a thing?

Er... I'm also not sure what you mean here, either.  :-(

What exactly do you mean by "linking of MPI host files"?  Host files are just 
text files; they're not involved with the (run-time or compile-time) linker at 
all.
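
(For context, a hedged example of what such a hostfile typically contains --
hostnames and slot counts here are placeholders:

node1 slots=4
node2 slots=4

That's all it is: plain text that mpirun parses at launch time.)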

What marshaling capabilities are you referring to, MPI datatypes?

> If there is a standard, is there a way to accredit such a standard for OSX if 
> it gets built for universal capabilities?   What is the standards body for 
> MPI to facilitate this?   If such a thing is built, how do we contribute back 
> in a standards consistent way?
> 
> 
> 
> V/R,
> 
> Daniel Beatty, Ph.D.
> Computer Scientist, Detonation Sciences Branch
> Code 474300D
> 1 Administration Circle M/S 1109
> China Lake, CA 93555
> daniel.bea...@navy.mil
> (LandLine) (760)939-7097 
> (iPhone) (806)438-6620
> 
> 
> 
> On 2/11/13 8:09 AM, "Mark Bolstad"  wrote:
> 
> It's not just one plugin, it was about 6 of them. I just deleted the error 
> message from the others as I believed that opal_init was the problem.
> 
> However, I have done a full build multiple times and have blown away all the 
> plugins and other remnants of the build and install and get the same results 
> every time.
> 
> Here's the output from running ompi_info (same result with or w/o OPAL_PREFIX 
> are the same; LD_LIBRARY_PATH set; path points to both bin and lib directory 
> ):
> 
> [bolstadm-lm3:~/papillon/build/src] bolstadm% ompi_info   
>Package: Open MPI bolstadm@bolstadm-lm3 Distribution
> Open MPI: 1.6.3
>Open MPI SVN revision: r27472
>Open MPI release date: Oct 24, 2012
> Open RTE: 1.6.3
>Open RTE SVN revision: r27472
>Open RTE release date: Oct 24, 2012
> OPAL: 1.6.3
>OPAL SVN revision: r27472
>OPAL release date: Oct 24, 2012
>  MPI API: 2.1
> Ident string: 1.6.3
>   Prefix: 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
>  Configured architecture: x86_64-apple-darwin12.2.1
>   Configure host: bolstadm-lm3
>Configured by: bolstadm
>Configured on: Sun Feb 10 19:09:36 EST 2013
>   Configure host: bolstadm-lm3
> Built by: bolstadm
> Built on: Sun Feb 10 19:16:52 EST 2013
>   Built host: bolstadm-lm3
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: no
>   Fortran90 bindings: no
>  Fortran90 bindings size: na
>   C compiler: gcc
>  C compiler absolute: /usr/bin/gcc
>   C compiler family name: GNU
>   C compiler version: 4.2.1
> C++ compiler: g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: gfortran
>   Fortran77 compiler abs: /usr/bin/gfortran
>   Fortran90 compiler: none
>   Fortran90 compiler abs: none
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: no
>  Fortran90 profiling: no
>   C++ exceptions: no
>   Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>Sparse Groups: no
>   Internal debug support: no
>   MPI interface warnings: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>Heterogeneous support: no
>  mpirun default --prefix: no
>  MPI I/O support: yes
>MPI_WTIME support: gettimeofday
>  Symbol vis. support: yes
>Host topology support: yes
>   MPI extensions: affinity example
>FT Checkpoint support: no (checkpoint thread: no)
>  VampirTrace support: yes
>   MPI_MAX_PROCESSOR_NAME: 256
> MPI_MAX_ERROR_STRING: 256
>  MPI_MAX_OBJECT_NAME: 64
> MPI_MAX_INFO_KEY: 36
> MPI_MAX_INFO_VAL: 256
>MPI_MAX_PORT_NAME: 1024
>   MPI_MAX_DATAREP_STRING: 128
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_paffinity_hwloc:
>  lt_dlerror() returned NULL! (ignored)
> [bolstadm-lm3:86426] mca: base: component_find: unable to open 
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_carto_auto_detec

Re: [OMPI users] Simple MPI hello world hangs over IB

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 4, 2013, at 10:55 AM, Bharath Ramesh  wrote:

> I am trying to debug an issue which is really weird. I have
> simple MPI hello world application (attached) that hangs when I
> try to run on our cluster using 256 nodes with 16 cores on each
> node. The cluster uses QDR IB.
> 
> I am able to run the test over ethernet by excluding openib from
> the btl. However, what is weird is that for the same set of nodes
> xhpl completes without any error using 256 nodes and 16 cores. I
> have tried running the Pallas MPI Benchmark and it also behaves
> similarly to hello world and ends up hanging when I run it using
> 256 nodes.

Sorry for the delay; I was on travel all last week and fell behind.

I'm not sure I can parse your scenario description.  Are you saying:

- hello world over IB hangs at 256*16 procs
- hello world over TCP works at 256*16 procs
- xhpl over TCP works at 256*16 procs
- IMB over ?TCP|IB? hangs at 256*16 procs

> When I attach gdb to the MPI processes and look at the backtrace
> I see that close ~1000 of the MPI processes are stuck in MPI_Send
> while the others are waiting in MPI_Finalize. I have checked to
> make sure that the ulimit setting for locked memory is unlimited.
> The number of open files per process is 131072. The default MPI
> stack provided is openmpi-1.6.1 on the system. I compiled
> openmpi-1.6.3 in my home directory and the behavior remains to be
> the same.
> 
> I would appreciate any help in debugging this issue.

Can you try the 1.6.4rc?  http://www.open-mpi.org/software/ompi/v1.6/

> -- 
> Bharath
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] mmap and MPI_File_Read

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 2, 2013, at 3:52 AM, Andreas Bok Andersen  wrote:

> I am using Open-MPI in a parallelization of matrix multiplication for large 
> matrices. 
> My question is: 
> -  Is MPI_File_read using mmapping under the hood when reading a binary file. 

Sorry for the delay in replying; my INBOX is a disaster.

It depends on what driver you compiled; I'm guessing it's the standard NFS file 
I/O driver.  In this case, OMPI is just using open(), which may use mmap() 
under the covers.

> - Or is the better/most efficient solution to read the input files using the 
> native mmap in C++

You'll have to play with this yourself to see which works best in your 
environment.  Unfortunately, in at least this case, there's no "method X always 
works better than method Y" kind of advice available -- there's far too much 
variation in individual execution environments, connection to storage, and 
application access patterns.

A third variation to try might be to read in the file in a single MPI process 
and MPI_Scatter (or Broadcast?) the data out to all other processes.
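
As an illustration of that third variation (a hedged sketch, not from the
original mail; the filename and use of doubles are placeholders): rank 0 reads
the whole binary file with ordinary stdio and MPI_Bcast distributes the buffer
to every other rank.

/* Hedged sketch: read on rank 0, broadcast to everyone else.
 * Error handling is kept minimal for brevity. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    long n = 0;
    double *data = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (0 == rank) {
        FILE *f = fopen("matrix.bin", "rb");     /* placeholder filename */
        fseek(f, 0, SEEK_END);
        n = ftell(f) / (long) sizeof(double);    /* number of doubles */
        rewind(f);
        data = malloc(n * sizeof(double));
        fread(data, sizeof(double), (size_t) n, f);
        fclose(f);
    }

    /* Tell everyone the element count, then ship the data. */
    MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    if (0 != rank) {
        data = malloc(n * sizeof(double));
    }
    MPI_Bcast(data, (int) n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(data);
    MPI_Finalize();
    return 0;
}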

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Mark Bolstad
That's what I noticed, no .so's (actually, I noticed that the dlname in the
.la file is empty. thank you, dtruss)

I've built it two different ways:
--disable-mpi-f77

and
 --prefix=/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
--disable-mpi-f77 --with-openib=no --enable-shared --disable-static

Both give me the same errors and no .so's.

I noticed that I point to the MacPorts libtool (/opt/local/bin/libtool), so I
changed the path to find /usr/bin first, to no avail. I changed the compiler
from gcc to clang and that didn't work either.

Where do the shared objects get created in the build cycle?

Mark

On Mon, Feb 11, 2013 at 1:35 PM, Jeff Squyres (jsquyres)  wrote:

> Ah -- your plugins are all .a files.
>
> How did you configure/build Open MPI?
>
>
> On Feb 11, 2013, at 11:09 AM, Mark Bolstad 
> wrote:
>
> > It's not just one plugin, it was about 6 of them. I just deleted the
> error message from the others as I believed that opal_init was the problem.
> >
> > However, I have done a full build multiple times and have blown away all
> the plugins and other remnants of the build and install and get the same
> results every time.
> >
> > Here's the output from running ompi_info (same result with or w/o
> OPAL_PREFIX are the same; LD_LIBRARY_PATH set; path points to both bin and
> lib directory ):
> >
> > [bolstadm-lm3:~/papillon/build/src] bolstadm% ompi_info
>  Package: Open MPI bolstadm@bolstadm-lm3Distribution
> > Open MPI: 1.6.3
> >Open MPI SVN revision: r27472
> >Open MPI release date: Oct 24, 2012
> > Open RTE: 1.6.3
> >Open RTE SVN revision: r27472
> >Open RTE release date: Oct 24, 2012
> > OPAL: 1.6.3
> >OPAL SVN revision: r27472
> >OPAL release date: Oct 24, 2012
> >  MPI API: 2.1
> > Ident string: 1.6.3
> >   Prefix:
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
> >  Configured architecture: x86_64-apple-darwin12.2.1
> >   Configure host: bolstadm-lm3
> >Configured by: bolstadm
> >Configured on: Sun Feb 10 19:09:36 EST 2013
> >   Configure host: bolstadm-lm3
> > Built by: bolstadm
> > Built on: Sun Feb 10 19:16:52 EST 2013
> >   Built host: bolstadm-lm3
> >   C bindings: yes
> > C++ bindings: yes
> >   Fortran77 bindings: no
> >   Fortran90 bindings: no
> >  Fortran90 bindings size: na
> >   C compiler: gcc
> >  C compiler absolute: /usr/bin/gcc
> >   C compiler family name: GNU
> >   C compiler version: 4.2.1
> > C++ compiler: g++
> >C++ compiler absolute: /usr/bin/g++
> >   Fortran77 compiler: gfortran
> >   Fortran77 compiler abs: /usr/bin/gfortran
> >   Fortran90 compiler: none
> >   Fortran90 compiler abs: none
> >  C profiling: yes
> >C++ profiling: yes
> >  Fortran77 profiling: no
> >  Fortran90 profiling: no
> >   C++ exceptions: no
> >   Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
> >Sparse Groups: no
> >   Internal debug support: no
> >   MPI interface warnings: no
> >  MPI parameter check: runtime
> > Memory profiling support: no
> > Memory debugging support: no
> >  libltdl support: yes
> >Heterogeneous support: no
> >  mpirun default --prefix: no
> >  MPI I/O support: yes
> >MPI_WTIME support: gettimeofday
> >  Symbol vis. support: yes
> >Host topology support: yes
> >   MPI extensions: affinity example
> >FT Checkpoint support: no (checkpoint thread: no)
> >  VampirTrace support: yes
> >   MPI_MAX_PROCESSOR_NAME: 256
> > MPI_MAX_ERROR_STRING: 256
> >  MPI_MAX_OBJECT_NAME: 64
> > MPI_MAX_INFO_KEY: 36
> > MPI_MAX_INFO_VAL: 256
> >MPI_MAX_PORT_NAME: 1024
> >   MPI_MAX_DATAREP_STRING: 128
> > [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_paffinity_hwloc:
> lt_dlerror() returned NULL! (ignored)
> > [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_carto_auto_detect:
> lt_dlerror() returned NULL! (ignored)
> > [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_carto_file:
> lt_dlerror() returned NULL! (ignored)
> > [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_mmap:
> lt_dlerror() returned NULL! (ignored)
> > [bolstadm-lm3:86426] mca: base: component_find: unable to open
> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3/lib/openmpi/mca_shmem_posix:
> lt_dlerror() 

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 2:46 PM, Mark Bolstad  wrote:

> That's what I noticed, no .so's (actually, I noticed that the dlname in the 
> .la file is empty. thank you, dtruss)

Please send all the information listed here:

http://www.open-mpi.org/community/help/

> I've built it two different ways:
> --disable-mpi-f77
> 
> and
>  --prefix=/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3 
> --disable-mpi-f77 --with-openib=no --enable-shared --disable-static
> 
> Both give me the same errors and no .so's.

That's weird -- it should make .so's in both cases.

> I noticed that I point to the maports libtool (/opt/local/bin/libtool) so I 
> changed the path to find /usr/bin first to no avail. I changed the compiler 
> from gcc to clang and that didn't work either.

configure/make should be using the "libtool" that is internal to the expanded 
tarball tree, so whichever libtool your PATH points to shouldn't matter.

> Where do the shared objects get created in the build cycle?

All throughout the build, actually.  Generally, they're created in the 
*/mca/*/* directories in the source tree.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Beatty, Daniel D CIV NAVAIR, 474300D
Hi Jeff,
The Intel+PPC is one issue.  However, even on Intel, there tends to be a
distinction between Intel environments going from Xeon to Core iX
environments. While Objective-C/C/C++ handle this well, the Fortran
compilers have given me a different story over the years.  It tends to be
the case that not all 64-bit compilations are the same, especially with
the Fortran compilers.  The LLVM compiler for Objective-C with C-based MPI
constructs tends to be more consistent.  However, if I have to reference
legacy Fortran code, then I could be in trouble.  I am hoping in my next
development cycle to build an architecture that insulates against this
effect, regardless of language.

The second issue, regarding host files, has to do with a defunct capability,
for example Xgrid.  It used to be that various Xgrid sites were reasonable
sources of information with regards to MPI and one could get the recipes to
equip either a cluster or grid environment.  Examples included TenGrid
(http:/tengrid.com) , MacResearch ( http://www.macresearch.org), etc.  I
hope that the new amount of data being collected at the open MPI site will
facilitate both the mobile and data center variety of MPI alike.  The beauty
of mobility is the flexibility that the Mac, Windows, and potentially iOS and
Android platforms bring to the concept of MPI.  The cost is keeping track of
when such devices come and go from such grids.

I may have misarticulated my question with regard to the marshaling
capabilities of MPI.  It may have improved since I last used MPI.  I know
that there are standards bodies for MPI itself.  Therefore, I will need to
check to see what changes have occurred.


Daniel Beatty, Ph.D.
Computer Scientist, Detonation Sciences Branch
Code 474300D
1 Administration Circle M/S 1109
China Lake, CA 93555
daniel.bea...@navy.mil
(LandLine) (760)939-7097
(iPhone) (806)438-6620




On 2/11/13 10:37 AM, "Jeff Squyres (jsquyres)"  wrote:

> On Feb 11, 2013, at 1:11 PM, "Beatty, Daniel D CIV NAVAIR, 474300D"
>  wrote:
> 
>> There are two issues that have concerned me.  One is universal capabilities,
>> namely ensuring that the library allows the same results for binaries in both
>> any of their universal compiled forms.
> 
> Not sure what you mean here -- are you referring to Intel+PPC "universal" OS X
> binaries?
> 
>> Also, the linking of the MPI host files and marshaling capabilities.  I am
>> hoping to address these issues in design before I get to implementation.
>> Naturally, there is a matter of tinkering that goes back and forth.  So I
>> will need some help here.  Is there a standard for MPI currently in existence
>> that enables such a thing?
> 
> Er... I'm also not sure what you mean here, either.  :-(
> 
> What exactly do you mean by "linking of MPI host files"?  Host files are just
> text files; they're not involved with the (run-time or compile-time) linker at
> all.
> 
> What marshaling capabilities are you referring to, MPI datatypes?
> 
>> If there is a standard, is there a way to accredit such a standard for OSX if
>> it gets built for universal capabilities?   What is the standards body for
>> MPI to facilitate this?   If such a thing is built, how do we contribute back
>> in a standards consistent way?
>> 
>> 
>> 
>> V/R,
>> 
>> Daniel Beatty, Ph.D.
>> Computer Scientist, Detonation Sciences Branch
>> Code 474300D
>> 1 Administration Circle M/S 1109
>> China Lake, CA 93555
>> daniel.bea...@navy.mil
>> (LandLine) (760)939-7097
>> (iPhone) (806)438-6620
>> 
>> 
>> 
>> On 2/11/13 8:09 AM, "Mark Bolstad"  wrote:
>> 
>> It's not just one plugin, it was about 6 of them. I just deleted the error
>> message from the others as I believed that opal_init was the problem.
>> 
>> However, I have done a full build multiple times and have blown away all the
>> plugins and other remnants of the build and install and get the same results
>> every time.
>> 
>> Here's the output from running ompi_info (results are the same with or
>> without OPAL_PREFIX; LD_LIBRARY_PATH is set; PATH points to both the bin
>> and lib directories):
>> 
>> [bolstadm-lm3:~/papillon/build/src] bolstadm% ompi_info
>> Package: Open MPI bolstadm@bolstadm-lm3 Distribution
>> Open MPI: 1.6.3
>>Open MPI SVN revision: r27472
>>Open MPI release date: Oct 24, 2012
>> Open RTE: 1.6.3
>>Open RTE SVN revision: r27472
>>Open RTE release date: Oct 24, 2012
>> OPAL: 1.6.3
>>OPAL SVN revision: r27472
>>OPAL release date: Oct 24, 2012
>>  MPI API: 2.1
>> Ident string: 1.6.3
>>   Prefix:
>> /Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
>>  Configured architecture: x86_64-apple-darwin12.2.1
>>   Configure host: bolstadm-lm3
>>Configured by: bolstadm
>>Configured on: Sun Feb 10 19:09:36 EST 2013
>>   Configure host: bolstadm-lm3
>> Built by: bolstadm
>> Built on: Sun Feb 1

Re: [OMPI users] mpirun completes for one user, not for another

2013-02-11 Thread Daniel Fetchinson
Thanks a lot, this was exactly the problem:

> Make sure that the PATH really is identical between users -- especially for
> non-iteractive logins.  E.g.:
>
> env

Here PATH was correct.

> vs.
>
> ssh othernode env

Here PATH was not correct. The PATH was set in .bash_profile, and
apparently .bash_profile is not sourced for non-interactive logins;
only .bashrc is. Once the PATH was set in .bashrc instead, everything
worked and the problem went away.
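
For anyone hitting the same problem, a minimal sketch of how to check this
from inside MPI itself (the file name env_check.c and the program are only
illustrative, not part of the original test program): each rank prints the
PATH and LD_LIBRARY_PATH it actually inherited, which makes
.bash_profile-vs-.bashrc differences on remote nodes visible immediately.

/* env_check.c -- illustrative sketch only, not from the original report. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char **argv )
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    const char *path, *ld;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Get_processor_name( host, &len );

    /* environment as seen by this (possibly non-interactive) remote process */
    path = getenv( "PATH" );
    ld   = getenv( "LD_LIBRARY_PATH" );

    printf( "rank %d on %s\n  PATH=%s\n  LD_LIBRARY_PATH=%s\n",
            rank, host,
            path ? path : "(unset)",
            ld   ? ld   : "(unset)" );

    MPI_Finalize(  );

    return 0;
}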

Thanks again,
Daniel


> Also check the LD_LIBRARY_PATH.
>
>
> On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson 
> wrote:
>
>> Hi folks,
>>
>> I have a really strange problem: a super simple MPI test program (see
>> below) runs successfully for all users when executed on 4 processes in
>> 1 node, but hangs for user A and runs successfully for user B when
>> executed on 8 processes in 2 nodes. The executable used is the same
>> and the appfile used is also the same for user A and user B. Both
>> users launch it by
>>
>> mpirun --app appfile
>>
>> where the content of 'appfile' is
>>
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>>
>> for the single node run with 4 processes and is replaced by
>>
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>>
>> for the 2-node run with 8 processes. Just to recap, the single node
>> run works for both user A and user B, but the 2-node run only works
>> for user B and it hangs for user A. It does respond to Ctrl-C though.
>> Both users use bash, have set up passwordless ssh, are able to ssh
>> from node1 to node2 and back, have the same PATH and use the same
>> 'mpirun' executable.
>>
>> At this point I've run out of ideas what to check and debug because
>> the setups look really identical. The test program is simply
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main( int argc, char **argv )
>> {
>> int node;
>>
>> MPI_Init( &argc, &argv );
>> MPI_Comm_rank( MPI_COMM_WORLD, &node );
>>
>> printf( "First Hello World from Node %d\n", node );
>> MPI_Barrier( MPI_COMM_WORLD );
>> printf( "Second Hello World from Node %d\n",node );
>>
>> MPI_Finalize(  );
>>
>> return 0;
>> }
>>
>>
>> I also asked both users to compile the test program separately, and
>> the resulting executable 'test' is the same for both, indicating again
>> that identical gcc, mpicc, etc. are used. Gcc is 4.5.1, openmpi is
>> 1.5, and the interconnect is InfiniBand.
>>
>> I've really run out of ideas what else to compare between user A and B.
>>
>> Thanks for any hints,
>> Daniel
>>
>>
>>
>>
>>
>> --
>> Psss, psss, put it down! - http://www.cafepress.com/putitdown
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown


Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Mark Bolstad
I packed the compile info as requested, but the message is too big. Changing
the compression didn't help. I can split it, or do you just want to approve
it out of the hold queue?

Mark

On Mon, Feb 11, 2013 at 3:03 PM, Jeff Squyres (jsquyres)  wrote:

> On Feb 11, 2013, at 2:46 PM, Mark Bolstad 
> wrote:
>
> > That's what I noticed, no .so's (actually, I noticed that the dlname in
> the .la file is empty. thank you, dtruss)
>
> Please send all the information listed here:
>
> http://www.open-mpi.org/community/help/
>
> > I've built it two different ways:
> > --disable-mpi-f77
> >
> > and
> >
>  --prefix=/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3
> --disable-mpi-f77 --with-openib=no --enable-shared --disable-static
> >
> > Both give me the same errors and no .so's.
>
> That's weird -- it should make .so's in both cases.
>
> > I noticed that my PATH points to the MacPorts libtool (/opt/local/bin/libtool),
> so I changed the path to find /usr/bin first, to no avail. I changed the
> compiler from gcc to clang and that didn't work either.
>
> configure/make should be using the "libtool" that is internal to the
> expanded tarball tree, so whichever libtool your PATH points to shouldn't
> matter.
>
> > Where do the shared objects get created in the build cycle?
>
> All throughout the build, actually.  Generally, they're created in the
> */mca/*/* directories in the source tree.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Fwd: an error when running MPI on 2 machines

2013-02-11 Thread Paul Gribelyuk
Hi Jeff,
Thank you for your email.  The program makes an MPI_Reduce call as the only form 
of explicit communication between machines… I said it was simple because it's 
effectively a very trivial distributed computation for me to learn MPI.  I am 
using the same version, by doing "brew install openmpi" on each of the 
machines.  They're both running the latest update of OS X 10.7, but their PATHs and 
LD_LIBRARY_PATHs might be slightly different.  I am able to run n-way jobs on a 
single machine.

UPDATE: I wish I could reproduce the error, because now it's gone and I can run 
the same program from each machine in the hostfile.  I would still be very 
interested to know what kind of MPI situations are likely to cause these kinds 
of seg faults….

-Paul

On Feb 11, 2013, at 8:27 AM, Jeff Squyres (jsquyres) wrote:

> Can you provide any more detail?  
> 
> Your report looks weird - you said it's a simple C++ hello world, but the 
> executable you show is "pi", which is typically a simple C example program. 
> 
> Are you using the same version of Open MPI on all nodes?  Are you able to run 
> n-way jobs on single nodes?
> 
> Sent from my phone. No type good. 
> 
> On Feb 9, 2013, at 2:03 PM, "Paul Gribelyuk"  wrote:
> 
>>> Hello,
>>> I am getting the following stacktrace when running a simple hello world MPI 
>>> C++ program on 2 machines:
>>> 
>>> 
>>> mini:mpi_cw paul$ mpirun --prefix /usr/local/Cellar/open-mpi/1.6.3 
>>> --hostfile hosts_home -np 2 ./pi 100
>>> rank and name: 0 aka mini.local
>>> [home-mini:12175] *** Process received signal ***
>>> [home-mini:12175] Signal: Segmentation fault: 11 (11)
>>> [home-mini:12175] Signal code: Address not mapped (1)
>>> [home-mini:12175] Failing at address: 0x1042e
>>> [home-mini:12175] [ 0] 2   libsystem_c.dylib   
>>> 0x7fff94050cfa _sigtramp + 26
>>> [home-mini:12175] [ 1] 3   mca_btl_tcp.so  
>>> 0x00010397092c best_addr + 2620
>>> [home-mini:12175] [ 2] 4   pi  
>>> 0x000103649d24 start + 52
>>> [home-mini:12175] [ 3] 5   ??? 
>>> 0x0002 0x0 + 2
>>> [home-mini:12175] *** End of error message ***
>>> rank: 0 sum: 1.85459
>>> --
>>> mpirun noticed that process rank 1 with PID 12175 on node home-mini.local 
>>> exited on signal 11 (Segmentation fault: 11).
>>> --
>>> 
>>> 
>>> 
>>> I get a similar result even when I don't use --prefix since the .bashrc 
>>> file on the remote machine is correctly pointing to PATH and LD_LIBRARY_PATH
>>> 
>>> Any help with this seg fault is greatly appreciated.  Thanks.
>>> 
>>> -Paul
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
I got your tarball (no need to re-send it).  

I'm a little confused by your output from make, though.

Did you run autogen?  If so, there's no need to do that -- try expanding a 
fresh tarball and just running ./configure and make.


On Feb 11, 2013, at 10:03 PM, Mark Bolstad  wrote:

> I packed the compile info as requested, but the message is too big. Changing 
> the compression didn't help. I can split it, or do you just want to approve 
> it out of the hold queue?
> 
> Mark
> 
> On Mon, Feb 11, 2013 at 3:03 PM, Jeff Squyres (jsquyres)  
> wrote:
> On Feb 11, 2013, at 2:46 PM, Mark Bolstad  wrote:
> 
> > That's what I noticed, no .so's (actually, I noticed that the dlname in the 
> > .la file is empty. thank you, dtruss)
> 
> Please send all the information listed here:
> 
> http://www.open-mpi.org/community/help/
> 
> > I've built it two different ways:
> > --disable-mpi-f77
> >
> > and
> >  
> > --prefix=/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3 
> > --disable-mpi-f77 --with-openib=no --enable-shared --disable-static
> >
> > Both give me the same errors and no .so's.
> 
> That's weird -- it should make .so's in both cases.
> 
> > I noticed that my PATH points to the MacPorts libtool (/opt/local/bin/libtool), so I 
> > changed the path to find /usr/bin first, to no avail. I changed the compiler 
> > from gcc to clang and that didn't work either.
> 
> configure/make should be using the "libtool" that is internal to the expanded 
> tarball tree, so whichever libtool your PATH points to shouldn't matter.
> 
> > Where do the shared objects get created in the build cycle?
> 
> All throughout the build, actually.  Generally, they're created in the 
> */mca/*/* directories in the source tree.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 3:41 PM, "Beatty, Daniel D CIV NAVAIR, 474300D" 
 wrote:

> The Intel+PPC is one issue.  However, even on Intel, there tends to be a
> distinction between Intel environments going from Xeon to Core iX
> environments. While Objective-C/C/C++ handle this well, the Fortran
> compilers have given me a different story over the years.  It tends to be
> the case that not all 64-bit compilations are the same,
> especially with the Fortran compilers.  

I don't know if there are any Fortran compilers that create OS X universal 
binaries.

> The second issue, regarding host files, has to do with a defunct capability,
> for example Xgrid.  It used to be that various Xgrid sites were reasonable
> sources of information with regards to MPI and one could get the recipes to
> equip either a cluster or grid environment.  Examples included TenGrid
> (http:/tengrid.com) , MacResearch ( http://www.macresearch.org), etc.  

I'm still not sure what you're asking here -- Xgrid is long dead.

> I hope that the new amount of data being collected at the open MPI site will
> facilitate both the mobile and data center variety of MPI alike.  

What data collection are you referring to?

> The beauty
> of mobility is the flexibility that the Mac, Windows, and potentially iOS and
> Android platforms bring to the concept of MPI.  The cost is keeping track of
> when such devices come and go from such grids.

People have experimented with trying to do parallel computing on mobile 
devices, but power has always been a problem.

> I may have misarticulated my question with regard to the marshaling
> capabilities of MPI.  It may have improved since I last used MPI.  I know
> that there are standards bodies for MPI itself.  Therefore, I will need to
> check to see what changes have occurred.


MPI has datatypes, but they're not really the same thing as traditional 
marshaling/dynamic serializing.
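
To make the distinction concrete, here is a minimal illustrative sketch
(not from this thread; the struct name particle_t is made up) of describing
a C struct's layout with MPI_Type_create_struct.  The derived datatype is a
description of where the bytes live, not a serialized copy of them:

/* Illustrative sketch: an MPI derived datatype describing a C struct. */
#include <mpi.h>
#include <stddef.h>

typedef struct {
    int    id;
    double value[3];
} particle_t;

static MPI_Datatype make_particle_type( void )
{
    MPI_Datatype ptype;
    int          blocklens[2] = { 1, 3 };
    MPI_Aint     displs[2]    = { offsetof( particle_t, id ),
                                  offsetof( particle_t, value ) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };

    /* ptype can then be used directly in MPI_Send / MPI_Recv / MPI_Bcast */
    MPI_Type_create_struct( 2, blocklens, displs, types, &ptype );
    MPI_Type_commit( &ptype );
    return ptype;
}

Nothing is packed or copied up front; MPI reads and writes that memory
layout directly when the type is used in a communication call.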

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] an error when running MPI on 2 machines

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 10:17 PM, Paul Gribelyuk  wrote:

> UPDATE: I wish I could reproduce the error, because now it's gone and I can 
> run the same program from each machine in the hostfile.  

Good!

> I would still be very interested to know what kind of MPI situations are 
> likely to cause these kinds of seg faults….

Usually, this is due to application errors or mismatches of libraries/versions.
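
For the "application error" case with a collective such as MPI_Reduce, the
usual suspects are an undersized receive buffer at the root, a count or
datatype mismatch between ranks, or aliasing the send and receive buffers
at the root without MPI_IN_PLACE.  A minimal correct call looks roughly
like the illustrative sketch below (not Paul's actual code):

/* Illustrative sketch of a correct MPI_Reduce call, with comments on the
 * usual segfault sources.  Not taken from the program discussed above. */
#include <mpi.h>
#include <stdio.h>

int main( int argc, char **argv )
{
    int    rank;
    double local_sum  = 0.0;
    double global_sum = 0.0;   /* recvbuf is only significant at the root,
                                  but there it must hold 'count' elements */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    local_sum = rank + 0.5;    /* stand-in for the real computation */

    /* count and datatype must agree across ranks; passing a count larger
       than the buffers actually hold is a classic "address not mapped" */
    MPI_Reduce( &local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
                MPI_COMM_WORLD );

    if ( rank == 0 )
        printf( "sum = %f\n", global_sum );

    MPI_Finalize(  );

    return 0;
}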

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/