Re: [OMPI users] WRF run on multiple Nodes

2011-04-05 Thread Ralph Castain
Did you request an allocation from PCM? If not, then PCM will block you from 
arbitrarily launching jobs on non-allocated nodes. Print out your environment 
and look for any envars from PCM and/or LSF (e.g., LSB_JOBID).

I don't know what you mean about "no OMPI application is yet integrated with 
LSF" - an application would never be integrated with LSF. However, OMPI will 
configure itself to use LSF as its launcher if it detects the presence of LSF 
on the system. When that happens, you no longer need to supply a machinefile, as 
OMPI will automatically pick up the list of allocated nodes.
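
One way to follow the advice above is to filter the environment for scheduler variables. This is only a sketch: LSB_JOBID is the variable named in this message, while the helper name show_lsf_env and the assumption that LSF variables all start with LSB_ or LSF_ are illustrative and should be checked against the local LSF documentation.

```shell
# show_lsf_env: print environment variables that look like they come from LSF.
# Assumption: LSF exports names beginning with LSB_ or LSF_ (e.g. LSB_JOBID).
show_lsf_env() {
    env | grep -E '^(LSB|LSF)_' || echo "no LSB_/LSF_ variables found"
}

show_lsf_env
```

If nothing is printed except the fallback message, the shell is probably not running inside an LSF allocation.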


On Apr 4, 2011, at 9:31 PM, Ahsan Ali wrote:

> Dear John Hearns,
> 
>  The cluster is installed using Platform Cluster Manager (PCM). LSF is 
> installed but no OpenMPI application is yet integrated with LSF.
>  WRF help gave me the following instructions.
> 
> mpirun -v -machinefile ~/mach.conf -np 2 wrf.exe
> 
> Please talk to your computer manager about how to setup mach.conf and allow 
> communications between nodes.
> 
> Ahsan,
>  you have a Dell cluster. Can we ask which company installed the
> cluster, and who manages the cluster?
> The company who installed it should have given you some documentation
> on how to run MPI jobs.
> 
> Also can we ask if there is a batch scheduler or workload management
> software on this cluster?
> I ask because if there is PBS, Gridengine, LSF etc. installed there
> will be an 'integration' with OpenMPI
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] orte-odls-default:execv-error

2011-04-05 Thread SLIM H.A.
After an upgrade of our system I receive the following error message
(openmpi 1.4.2 with gridengine):

>quote

--
Sorry!  You were supposed to get help about:
orte-odls-default:execv-error
But I couldn't open the help file:
...path/1.4.2/share/openmpi/help-odls-default.txt: Cannot send after transport endpoint shutdown.  Sorry!
>end quote

and this is the section in the text file
...path/1.4.2/share/openmpi/help-odls-default.txt that refers to
orte-odls-default:execv-error


>quote
[orte-odls-default:execv-error]
Could not execute the executable "%s": %s

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed."
>end quote

Does the execv-error mean that the file
...path/1.4.2/share/openmpi/help-odls-default.txt was not accessible or
is there a different reason?

The error message continues with

>quote

--
[cn004:00591] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_grpcomm_basic: file not found (ignored)
[cn004:00586] mca: base: component_find: unable to open 
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)
[cn004:00585] mca: base: component_find: unable to open 
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)

--
Sorry!  You were supposed to get help about:
find-available:none-found
But I couldn't open the help file:
...path/1.4.2/share/openmpi/help-mca-base.txt: Cannot send after transport endpoint shutdown.  Sorry!

--
[cn004:00586] PML ob1 cannot be selected
>end quote

but there are .so and .la libraries in the directory
...path/1.4.2/lib/openmpi
Are those the ones not found?

Thanks

Henk



Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Terry Dontje

On 04/05/2011 05:11 AM, SLIM H.A. wrote:

> After an upgrade of our system I receive the following error message
> (openmpi 1.4.2 with gridengine):
>
> quote
> --
> Sorry!  You were supposed to get help about:
>  orte-odls-default:execv-error
> But I couldn't open the help file:
>  ...path/1.4.2/share/openmpi/help-odls-default.txt: Cannot send after transport endpoint shutdown.  Sorry!
> end quote
>
> and this is the section in the text file
> ...path/1.4.2/share/openmpi/help-odls-default.txt that refers to
> orte-odls-default:execv-error
>
> quote
> [orte-odls-default:execv-error]
> Could not execute the executable "%s": %s
>
> This could mean that your PATH or executable name is wrong, or that you do not
> have the necessary permissions.  Please ensure that the executable is able to be
> found and executed."
> end quote
>
> Does the execv-error mean that the file
> ...path/1.4.2/share/openmpi/help-odls-default.txt was not accessible or
> is there a different reason?

No, it thinks it cannot find some executable that was requested to run.
Do you have the exact mpirun command line that was being run?
Can you first try to run without gridengine?

> The error message continues with
>
> quote
> --
> [cn004:00591] mca: base: component_find: unable to open
> ...path/1.4.2/lib/openmpi/mca_grpcomm_basic: file not found (ignored)
> [cn004:00586] mca: base: component_find: unable to open
> ...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)
> [cn004:00585] mca: base: component_find: unable to open
> ...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)
> --
> Sorry!  You were supposed to get help about:
>  find-available:none-found
> But I couldn't open the help file:
>  ...path/1.4.2/share/openmpi/help-mca-base.txt: Cannot send after transport endpoint shutdown.  Sorry!
> --
> [cn004:00586] PML ob1 cannot be selected
> end quote
>
> but there are .so and .la libraries in the directory
> ...path/1.4.2/lib/openmpi
> Are those the ones not found?

I've seen this when either OPAL_PREFIX or LD_LIBRARY_PATH is not set 
up correctly.

> Thanks
>
> Henk




--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Reuti
On 05.04.2011 at 11:11, SLIM H.A. wrote:

> After an upgrade of our system I receive the following error message
> (openmpi 1.4.2 with gridengine):

Did you move openmpi 1.4.2 to a new (i.e. different) location?

-- Reuti





[OMPI users] problems with the -xterm option

2011-04-05 Thread jody
Hi

On my workstation and the cluster I set up OpenMPI (v 1.4.2) so that
it works in "text-mode":
  $ mpirun -np 4  -x DISPLAY -host squid_0   printenv | grep WORLD_RANK
  OMPI_COMM_WORLD_RANK=0
  OMPI_COMM_WORLD_RANK=1
  OMPI_COMM_WORLD_RANK=2
  OMPI_COMM_WORLD_RANK=3

but when I use the -xterm option to mpirun, it doesn't work:

 $ mpirun -np 4  -x DISPLAY -host squid_0 -xterm 1,2  printenv | grep WORLD_RANK
  Warning: untrusted X11 forwarding setup failed: xauth key data not generated
  Warning: No xauth data; using fake authentication data for X11 forwarding.
  OMPI_COMM_WORLD_RANK=0
  [squid_0:05266] [[55607,0],1]->[[55607,0],0] mca_oob_tcp_msg_send_handler: writev failed: Bad file descriptor (9) [sd = 8]
  [squid_0:05266] [[55607,0],1] routed:binomial: Connection to lifeline [[55607,0],0] lost
  /usr/bin/xterm Xt error: Can't open display: chefli.uzh.ch:0.0
  /usr/bin/xterm Xt error: Can't open display: chefli.uzh.ch:0.0

(strange: somebody wrote his message to the console)

No matter whether I set the DISPLAY variable to the full hostname of
the workstation,
to the IP address of the workstation or simply to ":0.0", it doesn't work.

But I do have xauth data (as far as I know):
On the remote (squid_0):
  jody@squid_0 ~ $ xauth list
  chefli/unix:10  MIT-MAGIC-COOKIE-1  5293e179bc7b2036d87cbcdf14891d0c
  chefli/unix:0  MIT-MAGIC-COOKIE-1  146c7f438fab79deb8a8a7df242b6f4b
  chefli.uzh.ch:0  MIT-MAGIC-COOKIE-1  146c7f438fab79deb8a8a7df242b6f4b

on the workstation:
  $ xauth list
  chefli/unix:10  MIT-MAGIC-COOKIE-1  5293e179bc7b2036d87cbcdf14891d0c
  chefli/unix:0  MIT-MAGIC-COOKIE-1  146c7f438fab79deb8a8a7df242b6f4b
  localhost.localdomain/unix:0  MIT-MAGIC-COOKIE-1  146c7f438fab79deb8a8a7df242b6f4b
  chefli.uzh.ch/unix:0  MIT-MAGIC-COOKIE-1  146c7f438fab79deb8a8a7df242b6f4b

In sshd_config on the workstation I have 'X11Forwarding yes'.
I have also done
   xhost + squid_0
on the workstation.


How can I get the -xterm option running?

Thank You
  Jody


Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread SLIM H.A.
 

Hi Terry

I think the problem may have been caused now by our lustre file system
being sick, so I'll wait until that is fixed.

It worked outside gridengine but I think I did not include --mca btl
self,sm,ib or the corresponding environment variables with gridengine,
although it usually finds the fastest interconnect.

> I've seen this when either OPAL_PREFIX or LD_LIBRARY_PATH not being set
> up correctly.

LD_LIBRARY_PATH is set correctly but where is OPAL_PREFIX set?

Thanks

Henk

 

 

 

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Terry Dontje
Sent: 05 April 2011 11:21
To: us...@open-mpi.org
Subject: Re: [OMPI users] orte-odls-default:execv-error

 


 

 



Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Michael Di Domenico
There are no messages being spit out, but I'm not sure I have all the
correct debug options turned on.  I turned on -debug-devel, -debug-daemons and
mca_verbose, but it appears that the process just hangs.

If it's memory exhaustion, it's not from the core memory: these nodes
have 48GB of memory. It could be a buffer somewhere, but I'm not sure
where.

On Mon, Apr 4, 2011 at 10:17 PM, David Zhang  wrote:
> Any error messages?  Maybe the nodes ran out of memory?  I know MPI
> implement some kind of buffering under the hood, so even though you're
> sending array's over 2^26 in size, it may require more than that for MPI to
> actually send it.
>
> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico 
> wrote:
>>
>> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
>> messages over 2^26 in size?
>>
>> For a reason i have not determined just yet machines on my cluster
>> (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) is failing to send
>> array's over 2^26 in size via the AllToAll collective. (user code)
>>
>> Further testing seems to indicate that an MPI message over 2^26 fails
>> (tested with IMB-MPI)
>>
>> Running the same test on a different older IB connected cluster seems
>> to work, which would seem to indicate a problem with the infiniband
>> drivers of some sort rather then openmpi (but i'm not sure).
>>
>> Any thoughts, directions, or tests?
>
>
>
> --
> David Zhang
> University of California, San Diego
>



Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread SLIM H.A.
Hi Reuti

1.4.2 is still in the same location and I also built 1.4.3 anew. It
appeared that Lustre and IB were not playing along, and it is working
now.

Thanks

henk



> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Reuti
> Sent: 05 April 2011 11:23
> To: Open MPI Users
> Subject: Re: [OMPI users] orte-odls-default:execv-error
> 
> On 05.04.2011 at 11:11, SLIM H.A. wrote:
> 
> > After an upgrade of our system I receive the following error message
> > (openmpi 1.4.2 with gridengine):
> 
> Did you move openmpi 1.4.2 to a new (i.e. different) location?
> 
> -- Reuti
> 
> 



Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Terry Dontje

On 04/05/2011 07:37 AM, SLIM H.A. wrote:

> Hi Terry
>
> I think the problem may have been caused now by our lustre file system 
> being sick, so I'll wait until that is fixed.
>
> It worked outside gridengine but I think I did not include --mca btl 
> self,sm,ib or the corresponding environment variables with gridengine, 
> although it usually finds the fastest interconnect.
>
> > I've seen this when either OPAL_PREFIX or LD_LIBRARY_PATH not being 
> > set up correctly.
>
> LD_LIBRARY_PATH is set correctly but where is OPAL_PREFIX set?

OPAL_PREFIX should be set to the base directory where OMPI is 
installed.  In theory it should not need to be set if configure's prefix 
option is the same place you installed OMPI.  I think it is only needed when 
you've moved the OMPI installation bits somewhere that doesn't 
correspond to the configure prefix option.


Of course the same is similarly true of LD_LIBRARY_PATH: you 
really shouldn't need to set it in your scripts/shell if you've 
compiled the programs such that the rpath is correctly passed to the linker.
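
For the relocated-install case described above, the environment setup might look roughly like the following. This is a sketch under assumptions: the path /opt/openmpi-1.4.2 and the variable OMPI_NEW_HOME are hypothetical stand-ins for wherever the tree was actually moved; only OPAL_PREFIX, PATH and LD_LIBRARY_PATH come from the discussion.

```shell
# Assumption: the Open MPI tree was configured for one prefix and later
# copied to /opt/openmpi-1.4.2 (a hypothetical path; substitute your own).
OMPI_NEW_HOME="${OMPI_NEW_HOME:-/opt/openmpi-1.4.2}"

# Tell Open MPI where its installation tree now lives.
export OPAL_PREFIX="$OMPI_NEW_HOME"

# Make the relocated binaries and libraries visible as well.
export PATH="$OPAL_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OPAL_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "OPAL_PREFIX=$OPAL_PREFIX"
```

When jobs are started through a batch system, these settings would also need to reach the compute nodes, for example via the job script.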


--td







Re: [OMPI users] MPI-2 I/O functions (Open MPI 1.5.x on Windows)

2011-04-05 Thread Satoi Ogawa
Hello, Rob, and Open MPI users,

Thank you for your advice.
I understand that the current Open MPI 1.5.3 Win-32bit binary distributed
from this site
doesn't support MPI-IO on NTFS.

I tried to check this problem using other code.
The code is the following:
http://www.mcs.anl.gov/research/projects/mpi/usingmpi2/examples/moreio/subarray_c.htm

First I inserted the following lines after the "MPI_Init( &argc, &argv );" line.
m = 20;
n = 30;
Of course, I changed the name of the output file from "/pfs/datafile" to "datafile".
It works correctly on Open MPI 1.4.1 on Linux Ubuntu 10.04 LTS. I
found "datafile" after the run.
However, it doesn't work correctly with the Open MPI 1.5.3 32bit binary on
Windows 7 32bit SP1.
I can't find "datafile".

Sincerely yours,
Satoi



2011/4/4 Rob Latham :
> On Sat, Apr 02, 2011 at 09:07:55PM +0900, Satoi Ogawa wrote:
>> Dear Developers and Users,
>>
>> Thank you for your development of Open MPI.
>>
>> I want to use Open MPI 1.5.3 on Windows 7 32bit, one PC.
>> But there is something wrong with the part using MPI-2 I/O functions
>> in my program.
>> It correctly worked on Open MPI on Linux.
>> I would very much appreciate any information you could send me.
>> I can't find it in Open MPI User's Mailing List Archives.
>
> you probably need to configure OpenMPI so that ROMIO (the MPI-IO
> library) is built with "NTFS" support.
>
> ==rob


Satoi Ogawa 


[OMPI users] deny permission

2011-04-05 Thread mohd naseem
Sir,
  I built a Beowulf cluster, but when I try to run the examples that are
given in mpich2, I get an error, i.e. permission denied on the other node.

Please help me.


[OMPI users] Not pointing to correct libraries

2011-04-05 Thread Warnett, Jason
Hello

I am running on Linux, latest version of mpi built but I've run into a few 
issues with a program which I am trying to run. It is a widely used open source 
application called LIGGGHTS so I know the code works and should compile, so I 
obviously have a setting wrong with MPI. I saw a similar problem in a previous 
post (2007), but couldn't see how to resolve it as I am quite new to the 
terminal environment in Unix (always been Windows... until now).

So the issue I am getting is the following error...

[Jay@Jay chute_wear]$ mpirun -np 1 lmp_fedora < in.chute_wear
lmp_fedora: error while loading shared libraries: libmpi_cxx.so.0: cannot open 
shared object file: No such file or directory

So I checked where stuff was pointing using the ldd command as in that post and 
found the following:
linux-gate.so.1 =>  (0x00d1)
libmpi_cxx.so.0 => not found
libmpi.so.0 => not found
libopen-rte.so.0 => not found
libopen-pal.so.0 => not found
libdl.so.2 => /lib/libdl.so.2 (0x00cbe000)
libnsl.so.1 => /lib/libnsl.so.1 (0x007e6000)
libutil.so.1 => /lib/libutil.so.1 (0x009fa000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x04a02000)
libm.so.6 => /lib/libm.so.6 (0x008a4000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0011)
libpthread.so.0 => /lib/libpthread.so.0 (0x0055)
libc.so.6 => /lib/libc.so.6 (0x003b3000)
/lib/ld-linux.so.2 (0x00bfa000)

so it is the Open MPI files it isn't linking to. How can I sort this? I 
shouldn't need to edit the code of the LIGGGHTS executable I've compiled, as I 
know other people are using the same thing, so I guess it is to do with the way 
I installed Open MPI. I did a system search and couldn't find a file called 
libmpi* anywhere... so my guess is that I've installed it incorrectly. I have 
tried several ways, but could you tell me how to fix it / install correctly? 
(embarrassing if it is to do with a correct install...)

Thanks

Jay


Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Terry Dontje
It was asked during the community concall whether the below may be 
related to ticket #2722 https://svn.open-mpi.org/trac/ompi/ticket/2722?


--td

On 04/04/2011 10:17 PM, David Zhang wrote:
> Any error messages?  Maybe the nodes ran out of memory?  I know MPI 
> implement some kind of buffering under the hood, so even though you're 
> sending array's over 2^26 in size, it may require more than that for 
> MPI to actually send it.
>
> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico 
> <mdidomeni...@gmail.com> wrote:
>
>> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
>> messages over 2^26 in size?
>>
>> For a reason i have not determined just yet machines on my cluster
>> (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) is failing to send
>> array's over 2^26 in size via the AllToAll collective. (user code)
>>
>> Further testing seems to indicate that an MPI message over 2^26 fails
>> (tested with IMB-MPI)
>>
>> Running the same test on a different older IB connected cluster seems
>> to work, which would seem to indicate a problem with the infiniband
>> drivers of some sort rather then openmpi (but i'm not sure).
>>
>> Any thoughts, directions, or tests?





Re: [OMPI users] Not pointing to correct libraries

2011-04-05 Thread Terry Dontje
I am not sure Fedora comes with Open MPI installed on it by default (at 
least my FC13 did not).  You may want to try installing 
Open MPI from yum or some other package manager.  Or you can download 
the source tarball from http://www.open-mpi.org/software/ompi/v1.4/, 
build and install it yourself.
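
For the source route, the usual configure/make sequence can be wrapped as below. This is a sketch, not a tested recipe for this machine: the function name build_openmpi, the example tarball name and the install prefix are all hypothetical, and the unpacked directory is assumed to have the same name as the tarball.

```shell
# build_openmpi TARBALL PREFIX: unpack, configure, build and install Open MPI.
# Assumptions: TARBALL is a gzip-compressed source tarball already downloaded
# from the page above, and PREFIX is a directory you can write to.
# The body runs in a subshell so the caller's working directory is unchanged.
build_openmpi() (
    tarball="$1"
    prefix="$2"
    tar xzf "$tarball" || exit 1
    # The unpacked directory is assumed to match the tarball name.
    cd "$(basename "$tarball" .tar.gz)" || exit 1
    ./configure --prefix="$prefix" && make all install
)

# Example call (hypothetical file name and prefix):
#   build_openmpi openmpi-1.4.3.tar.gz "$HOME/openmpi-1.4.3"
```

Installing under a prefix in the home directory avoids needing root, at the cost of having to add that prefix's bin and lib directories to PATH and LD_LIBRARY_PATH afterwards.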


--td






Re: [OMPI users] Not pointing to correct libraries

2011-04-05 Thread Gus Correa


Hi Jay

You must set the LD_LIBRARY_PATH (in your .bashrc/.cshrc file)
to include the OpenMPI 'lib' directory. Something like

(for csh)
setenv LD_LIBRARY_PATH /path/to/openmpi/lib:$LD_LIBRARY_PATH

(for bash)
export LD_LIBRARY_PATH=/path/to/openmpi/lib:$LD_LIBRARY_PATH

Likewise, PATH should include the OpenMPI 'bin' directory.
See the OpenMPI FAQ for details:
http://www.open-mpi.org/faq/
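
To confirm that the setting took effect, the ldd output can be filtered for libraries that are still unresolved. A small sketch follows; the helper name unresolved_libs is made up for illustration, and /path/to/openmpi remains a placeholder as above.

```shell
# unresolved_libs: read `ldd` output on stdin and print the names of the
# shared libraries that the dynamic loader could not locate.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use, with the Open MPI lib directory added first
# (/path/to/openmpi is a placeholder, as in the message above):
#   export LD_LIBRARY_PATH=/path/to/openmpi/lib:$LD_LIBRARY_PATH
#   ldd ./lmp_fedora | unresolved_libs
```

An empty result means every library the executable needs was found; any name still printed points at a directory missing from LD_LIBRARY_PATH or at a library that is not installed at all.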

Also, before trying more complex codes, I suggest that you test
your OpenMPI functionality with the simple example programs
that come in the OpenMPI 'examples' directory: hello_c.c, ring_c.c,
and connectivity_c.c.
It will save you many headaches.
Ex:
mpicc -o hello_c hello_c.c
then
mpirun -np 4 hello_c

How did you compile LIGGGHTS?
Did you use the OpenMPI compiler wrappers?

You say 'latest version of mpi built'.
Did you build OpenMPI from source?
Got it via yum perhaps?
Did you use the mpi compiler wrappers (mpicc,mpif90,etc)
*from OpenMPI* to build the application (LIGGGHTS)?

Is this a single machine or a cluster?

Are you sure LIGGGHTS runs in parallel mode with a single process?
You said 'mpirun -np 1 lmp_fedora < in.chute_wear'.
Wouldn't it need at least two processes, perhaps?
'mpirun -np 2 lmp_fedora < in.chute_wear'

I hope this helps,
Gus Correa


Re: [OMPI users] deny permission

2011-04-05 Thread David Zhang
Have you set up ssh keys for all nodes to access each other?
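
A quick check of whether key-based logins already work can be scripted as follows. This is a sketch under assumptions: the node names are hypothetical, the helper check_passwordless_ssh and the SSH_CMD override are illustrative, and keys would typically be created and distributed beforehand with ssh-keygen and ssh-copy-id.

```shell
# check_passwordless_ssh HOST...: report which hosts accept non-interactive
# (key-based) ssh logins. BatchMode=yes makes ssh fail instead of prompting
# for a password. SSH_CMD can be overridden to dry-run the loop.
check_passwordless_ssh() {
    for host in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example call (hypothetical node names):
#   check_passwordless_ssh node01 node02
```

Any host reported as FAILED would still prompt for a password, or be unreachable, when a launcher tries to start processes on it.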

On Tue, Apr 5, 2011 at 7:42 AM, mohd naseem  wrote:

> Sir,
>   i made the bewoulf cluster but when i try to run examples that is
> given in the mpich2.i get error i.e permission denied on other node.
>
> please help me
>
>



-- 
David Zhang
University of California, San Diego