Re: [OMPI users] try to understand heat equation 2D mpi version

2010-10-22 Thread Eugene Loh

christophe petit wrote:

I'm studying the parallelized version of a 2D heat equation solver in
order to understand Cartesian topology and the famous "MPI_CART_SHIFT".

Here's my problem with this part of the code:


---
call MPI_INIT(infompi)
  comm = MPI_COMM_WORLD
  call MPI_COMM_SIZE(comm,nproc,infompi)
  call MPI_COMM_RANK(comm,me,infompi)
!

..


! Create 2D cartesian grid
  periods(:) = .false.

  ndims = 2
  dims(1)=x_domains
  dims(2)=y_domains
  CALL MPI_CART_CREATE(MPI_COMM_WORLD, ndims, dims, periods, &
       reorganisation, comm2d, infompi)
!
! Identify neighbors
!
  NeighBor(:) = MPI_PROC_NULL
! Left/West and right/East neighbors
  CALL MPI_CART_SHIFT(comm2d,0,1,NeighBor(W),NeighBor(E),infompi)
 
  print *,'rank=', me

  print *, 'here first mpi_cart_shift : neighbor(w)=',NeighBor(W)
  print *, 'here first mpi_cart_shift : neighbor(e)=',NeighBor(E)

...

---

with x_domains=y_domains=2

and when I execute with "mpirun -np 4 ./explicitPar" I get:

 rank=   0
 here first mpi_cart_shift : neighbor(w)=  -1
 here first mpi_cart_shift : neighbor(e)=   2
 rank=   3
 here first mpi_cart_shift : neighbor(w)=   1
 here first mpi_cart_shift : neighbor(e)=  -1
 rank=   2
 here first mpi_cart_shift : neighbor(w)=   0
 here first mpi_cart_shift : neighbor(e)=  -1
 rank=   1
 here first mpi_cart_shift : neighbor(w)=  -1
 here first mpi_cart_shift : neighbor(e)=   3

I read that if the rank falls outside the topology and there is no
periodicity, the returned rank should be MPI_UNDEFINED, which is set to
-32766 in "mpif.h". So why did I get the value "-1"?

On my MacBook Pro, I get the value "-2".


It seems to me the man page says MPI_PROC_NULL may be returned, and in 
OMPI that looks like -2.  Can you try the following:


% cat x.f90
 include "mpif.h"

 integer comm, dims(2)
 logical periods(2)

 call MPI_INIT(ier)
 ndims = 2
 dims(1)=2;  periods(1) = .false.
 dims(2)=2;  periods(2) = .false.
 CALL MPI_CART_CREATE(MPI_COMM_WORLD, ndims, dims, periods, .false., comm, ier)

 CALL MPI_CART_SHIFT(comm, 0, 1, iwest, ieast, ier)
 write(6,*) iwest, ieast, MPI_PROC_NULL
 call MPI_Finalize(ier)
end
% mpif90 x.f90
% mpirun -n 4 ./a.out
1 -2 -2
-2 2 -2
-2 3 -2
0 -2 -2



[OMPI users] Some problems

2010-10-22 Thread 邵思睿
Hello, I'm using OpenMPI with VTK (the Visualization Toolkit) on Windows
Vista, and here are some problems that occurred during installation.

OpenMPI 1.5: error during CMake, no matter whether MinGW32 or VS2005 is used
as the compiler.

OpenMPI 1.4.3:

1. Building with VS2005 is OK, but when I used MinGW v3.81 (I had chosen
MinGW in CMake and then used mingw32-make to build), it reported an error at
the very beginning (0%) of the make progress.

2. When I tried to build VTK with Open MPI, it reported 'undefined reference
to MPI::Comm::Comm()', 'undefined reference to MPI::Win::Free()' and
'undefined reference to MPI::Datatype::Free()'.

So could I get some help? Thanks!


  

[OMPI users] dynamic spawn process on remote node

2010-10-22 Thread Vasiliy G Tolstov
Hello. Maybe this question has already been answered, but I can't see it in
the list archive.

I'm running about 60 Xen nodes with about 7-20 virtual machines on each. I
want to gather disk, CPU, memory, and network utilization from the virtual
machines and get it into a database for later processing.

As I see it, my architecture looks like this: one or two master servers run
an MPI process with rank 0 that can insert data into the database. These
master servers spawn an MPI process on each Xen node that gathers statistics
from the virtual machines on that node and sends them to the masters (maybe
with a multicast request). On each virtual machine I have an (MPI) process
that can get data and send it to the MPI process on its Xen node. Virtual
machines have the ability to migrate to other Xen nodes.
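
The spawning step I have in mind would look roughly like this minimal C
sketch (nothing I have running yet; the node name "xen-node-01" and the
worker binary "./gatherer" are only placeholders):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm workers;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Ask the runtime to place one gatherer process on a specific
       Xen node (placeholder host name). */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "xen-node-01");

    /* "workers" becomes an intercommunicator over which the master
       can receive the statistics from the spawned process. */
    MPI_Comm_spawn("./gatherer", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}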


Please, can you help me with the architecture of this system (are my
thoughts right)?
And one more question: what is the best way to attach an MPI process to an
already running group (for example, after a virtual machine is rebooted, or
maybe a Xen node is rebooted)?

Thank you for any answers...

-- 
Vasiliy G Tolstov 
Selfip.Ru



Re: [OMPI users] Some problems

2010-10-22 Thread Jeff Squyres
Just to let you know -- our main Windows developer is out for a little bit.  
He'll reply when he returns, but until he does, there's really no one else who 
can answer your question.  Sorry!  :-\



On Oct 22, 2010, at 4:01 AM, 邵思睿 wrote:

> Hello, I'm using OpenMPI with VTK (the Visualization Toolkit) on Windows
> Vista, and here are some problems that occurred during installation.
> 
> OpenMPI 1.5: error during CMake, no matter whether MinGW32 or VS2005 is
> used as the compiler.
> 
> OpenMPI 1.4.3:
> 
> 1. Building with VS2005 is OK, but when I used MinGW v3.81 (I had chosen
> MinGW in CMake and then used mingw32-make to build), it reported an error
> at the very beginning (0%) of the make progress.
> 
> 2. When I tried to build VTK with Open MPI, it reported 'undefined
> reference to MPI::Comm::Comm()', 'undefined reference to MPI::Win::Free()'
> and 'undefined reference to MPI::Datatype::Free()'.
> 
> So could I get some help? Thanks!
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] dynamic spawn process on remote node

2010-10-22 Thread Reuti
Hi,

On 22.10.2010, at 10:58, Vasiliy G Tolstov wrote:

> Hello. Maybe this question has already been answered, but I can't see it in
> the list archive.
> 
> I'm running about 60 Xen nodes with about 7-20 virtual machines on each. I
> want to gather disk, CPU, memory, and network utilization from the virtual
> machines and get it into a database for later processing.
> 
> As I see it, my architecture looks like this: one or two master servers run
> an MPI process with rank 0 that can insert data into the database. These
> master servers spawn an MPI process on each Xen node that gathers
> statistics from the virtual machines on that node and sends them to the
> masters (maybe with a multicast request). On each virtual machine I have an
> (MPI) process that can get data and send it to the MPI process on its Xen
> node. Virtual machines have the ability to migrate to other Xen nodes.

Do you just want to monitor the physical and virtual machines with an
application running under MPI? It sounds like it could be done with Ganglia
or Nagios then.

-- Reuti


> Please, Can You help me with architecture of this system (is my thoughts
> right) ?
> And one more qeustion - that is the best way, to attach mpi process to
> already running group? (for example, virtual machine is rebooted, or may
> be Xen node rebooted)
> 
> Than You for any answers...
> 
> -- 
> Vasiliy G Tolstov 
> Selfip.Ru
> 




Re: [OMPI users] dynamic spawn process on remote node

2010-10-22 Thread Vasiliy G Tolstov
On Fri, 2010-10-22 at 14:07 +0200, Reuti wrote:
> Hi,
> 
> On 22.10.2010, at 10:58, Vasiliy G Tolstov wrote:
> 
> > Hello. Maybe this question has already been answered, but I can't see it
> > in the list archive.
> > 
> > I'm running about 60 Xen nodes with about 7-20 virtual machines on each.
> > I want to gather disk, CPU, memory, and network utilization from the
> > virtual machines and get it into a database for later processing.
> > 
> > As I see it, my architecture looks like this: one or two master servers
> > run an MPI process with rank 0 that can insert data into the database.
> > These master servers spawn an MPI process on each Xen node that gathers
> > statistics from the virtual machines on that node and sends them to the
> > masters (maybe with a multicast request). On each virtual machine I have
> > an (MPI) process that can get data and send it to the MPI process on its
> > Xen node. Virtual machines have the ability to migrate to other Xen nodes.
> 
> Do you just want to monitor the physical and virtual machines with an
> application running under MPI? It sounds like it could be done with Ganglia
> or Nagios then.

No... I want to get real-time data to decide which virtual machine I need to
migrate to another Xen node, because it needs more resources.


-- 
Vasiliy G Tolstov 
Selfip.Ru



Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-22 Thread Scott Atchley
On Oct 20, 2010, at 9:43 PM, Raymond Muno wrote:

> On 10/20/2010 8:30 PM, Scott Atchley wrote:
>> Are you building OMPI with support for both MX and IB? If not and you only 
>> want MX support, try configuring OMPI using --disable-memory-manager (check 
>> configure for the exact option).
>> 
>> We have fixed this bug in the most recent 1.4.x and 1.5.x releases.
>> 
>> Scott
> 
> Hmmm, not sure which configure option you want me to try.
> 
> $ ./configure --help | grep memory
>  --enable-mem-debug  enable memory debugging (debugging only) (default:
>  --enable-mem-profileenable memory profiling (debugging only) (default:
>  --enable-memchecker Enable memory and buffer checks. Note that disabling
>  --with-memory-manager=TYPE
>  Use TYPE for intercepting memory management calls to
>  control memory pinning.

Use --without-memory-manager

Scott


Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-22 Thread Scott Atchley
Ray,

Looking back at your original message, you say that it works if you use the
Myricom-supplied mpirun from the Myrinet roll. I wonder if this is a mismatch
between libraries on the compute nodes.

What do you get if you use your OMPI's mpirun with:

$ mpirun -n 1 -H <node> ldd $PWD/<your_mpi_app>

I am wondering if ldd finds the libraries from your compile or from the Myrinet roll.

Scott

On Oct 21, 2010, at 10:39 AM, Raymond Muno wrote:

> On 10/20/2010 8:30 PM, Scott Atchley wrote:
>> We have fixed this bug in the most recent 1.4.x and 1.5.x releases.
>> 
>> Scott
> OK, a few more tests.  I was using PGI 10.4 as the compiler.
> 
> I have now tried OpenMPI 1.4.3 with PGI 10.8 and Intel 11.1.  I get the same 
> results in each case, mpirun seg faults. (I really did not expect that to 
> change anything).
> 
> I tried OpenMPI 1.5.  Under PGI, I could not get it to compile.   With Intel 
> 11.1, it compiles. When I try to run a simple test, mpirun just seems to hang 
> and I never see anything start on the nodes.  I would rather stick with 1.4.x 
> for now since that is what we are running on our other production cluster.  I 
> will leave this for a later day.
> 
> I grabbed the 1.4.3 version from this page.
> 
> http://www.open-mpi.org/software/ompi/v1.4/
> 
> When you say this bug is fixed in recent  1.4.x releases,  should I try one 
> from here?
> 
> http://www.open-mpi.org/nightly/v1.4/
> 
> For grins, I compiled the OpenMPI 1.4.1 tree.  This is what Myricom supplied 
> with the MX roll. Same result.  I can still run with their compiled version 
> of mpirun, even when I compile with the other build trees and compilers.  I 
> just do not know what options they compiled with.
> 
> Any insight would be appreciated.
> 
> -Ray Muno
> University of Minnesota




[OMPI users] Fix the use of hostfiles when a username is supplied in v1.5 ?

2010-10-22 Thread Olivier Riff
Hello,

There was a bug in the use of hostfiles when a username is supplied, which
was fixed in OpenMPI v1.4.2.
I have just installed v1.5 and the bug seems to have come back: only the
first username provided in the machinefile is taken into account.

See mails below for the history.

My configuration:
OpenMPI 1.5
Linux Mandriva 2008 x86_64 and Linux RHE x86_64
machinefile example:
or985966@is209898 slots=1
realtime@is206022 slots=8
realtime@is206025 slots=8

Best regards,

Olivier

-- Forwarded message --
From: Ralph Castain 
Date: 2010/3/11
Subject: Re: [OMPI users] OpenMPI: How to specify login name in machinefile
passed to mpirun
To: Open MPI Users 


Yeah, it was a bug in the parser - fix scheduled for 1.4.2 release.

Thanks!
Ralph

On Mar 11, 2010, at 4:32 AM, Olivier Riff wrote:

Hello Ralph,

Thanks for your quick reply.
Sorry, I did not mention the version: it is v1.4 (which indeed is not the
very latest one).
I would appreciate it if you could run a short test.

Thanks and Regards,

Olivier

2010/3/10 Ralph Castain 

> Probably a bug - I don't recall if/when anyone actually tested that code
> path. I'll have a look...probably in the hostfile parser.
>
> What version are you using?
>
> On Mar 10, 2010, at 8:26 AM, Olivier Riff wrote:
>
> Oops, sorry, I made the test too fast: it still does not work properly
> with several logins:
>
> I start on user1's machine:
> mpirun -np 2 --machinefile machinefile.txt MyProgram
>
> with machinefile:
> user1@machine1 slots=1
> user2@machine2 slots=1
>
> and I got:
> a user1@machine2 password prompt?! (there is no user1 account on
> machine2...)
>
> My problem is still open... why is there a connection attempt to machine2
> with user1?
> Does somebody have an explanation?
>
> Thanks,
>
> Olivier
>
>
> 2010/3/10 Olivier Riff 
>
>> OK, it works now, thanks. I forgot to add the slots information in the
>> machinefile.
>>
>> Cheers,
>>
>> Olivier
>>
>>
>>
>> 2010/3/10 Ralph Castain 
>>
>> It is the exact same syntax inside of the machinefile:
>>>
>>> user1@machine1 slots=4
>>> user2@machine2 slots=3
>>> 
>>>
>>>
>>> On Mar 10, 2010, at 5:41 AM, Olivier Riff wrote:
>>>
>>> > Hello,
>>> >
>>> > I am using openmpi on several machines which have different user
>>> accounts and I cannot find a way to specify the login for each machine in
>>> the machinefile passed to mpirun.
>>> > The only solution I found is to use the -host argument of mpirun, such
>>> as:
>>> > mpirun -np 2 --host user1@machine1,user2@machine2 MyProgram
>>> > which is very inconvenient with a lot of machines.
>>> >
>>> > Is there a way to do the same using a machinefile text file? E.g.:
>>> > mpirun -np 2 -machinefile machinefile.txt MyProgram
>>> >
>>> > I cannot find the appropriate syntax for specifying a user in
>>> machinefile.txt...
>>> >
>>> > Thanks in advance,
>>> >
>>> > Olivier
>>> >


Re: [OMPI users] dynamic spawn process on remote node

2010-10-22 Thread Ralph Castain
MPI won't do this - if a node dies, the entire MPI job is terminated.

Take a look at OpenRCM, a subproject of Open MPI:

http://www.open-mpi.org/projects/orcm/

This is designed to do what you describe as we have a similar (open source) 
project underway at Cisco. If I were writing your system, I would:

(a) add my sensors to the orte/mca/sensor framework. You'll find that we 
already monitor memory usage, for example. Use the orte/mca/db framework to 
store your data in a database. Several different databases are already 
supported, though it is easy to add another if you want (e.g., sqlite support).

(b) add my desired error response to the src/orte/mca/errmgr/orcm module. The 
ability to migrate processes is already implemented, but you may need to do 
something additional to migrate a VM. If you prefer, you can create your own 
module in that area and use one of the other components as an example.

Then let orcm start its daemons across your nodes. Orcm daemons will do the 
monitoring and reporting for you, and will start and monitor the virtual 
machines. If you set the max local restarts to 0, and max global restarts to 
some number, the system will automatically migrate any failures to other nodes.

See the June 2010 presentation under "Publications" on the web page above for 
an overview of how it all works. If you decide to go this route, I'll be happy 
to provide advice and further explanation. And of course, you are welcome to 
participate in ORCM if you choose.

Ralph

On Oct 22, 2010, at 6:09 AM, Vasiliy G Tolstov wrote:

> On Fri, 2010-10-22 at 14:07 +0200, Reuti wrote:
>> Hi,
>> 
>> On 22.10.2010, at 10:58, Vasiliy G Tolstov wrote:
>> 
>>> Hello. Maybe this question has already been answered, but I can't see it
>>> in the list archive.
>>> 
>>> I'm running about 60 Xen nodes with about 7-20 virtual machines on each.
>>> I want to gather disk, CPU, memory, and network utilization from the
>>> virtual machines and get it into a database for later processing.
>>> 
>>> As I see it, my architecture looks like this: one or two master servers
>>> run an MPI process with rank 0 that can insert data into the database.
>>> These master servers spawn an MPI process on each Xen node that gathers
>>> statistics from the virtual machines on that node and sends them to the
>>> masters (maybe with a multicast request). On each virtual machine I have
>>> an (MPI) process that can get data and send it to the MPI process on its
>>> Xen node. Virtual machines have the ability to migrate to other Xen nodes.
>> 
>> Do you just want to monitor the physical and virtual machines with an
>> application running under MPI? It sounds like it could be done with
>> Ganglia or Nagios then.
> 
> No... I want to get real-time data to decide which virtual machine I need
> to migrate to another Xen node, because it needs more resources.
> 
> 
> -- 
> Vasiliy G Tolstov 
> Selfip.Ru
> 



Re: [OMPI users] Fix the use of hostfiles when a username is supplied in v1.5 ?

2010-10-22 Thread Ralph Castain
Well that stinks. I'll take care of it - sorry about that. Guess a patch didn't 
come across at some point.


On Oct 22, 2010, at 6:55 AM, Olivier Riff wrote:

> Hello,
> 
> There was a bug in the use of hostfiles when a username is supplied, which
> was fixed in OpenMPI v1.4.2.
> I have just installed v1.5 and the bug seems to have come back: only the
> first username provided in the machinefile is taken into account.
> 
> See mails below for the history.
> 
> My configuration:
> OpenMPI 1.5
> Linux Mandriva 2008 x86_64 and Linux RHE x86_64
> machinefile example:
> or985966@is209898 slots=1
> realtime@is206022 slots=8
> realtime@is206025 slots=8
> 
> Best regards,
> 
> Olivier
> 
> -- Forwarded message --
> From: Ralph Castain 
> Date: 2010/3/11
> Subject: Re: [OMPI users] OpenMPI: How to specify login name in machinefile 
> passed to mpirun
> To: Open MPI Users 
> 
> 
> Yeah, it was a bug in the parser - fix scheduled for 1.4.2 release.
> 
> Thanks!
> Ralph
> 
> On Mar 11, 2010, at 4:32 AM, Olivier Riff wrote:
> 
>> Hello Ralph,
>> 
>> Thanks for your quick reply.
>> Sorry, I did not mention the version: it is v1.4 (which indeed is not the
>> very latest one).
>> I would appreciate it if you could run a short test.
>> 
>> Thanks and Regards,
>> 
>> Olivier
>> 
>> 2010/3/10 Ralph Castain 
>> Probably a bug - I don't recall if/when anyone actually tested that code 
>> path. I'll have a look...probably in the hostfile parser.
>> 
>> What version are you using?
>> 
>> On Mar 10, 2010, at 8:26 AM, Olivier Riff wrote:
>> 
>>> Oops, sorry, I made the test too fast: it still does not work properly
>>> with several logins:
>>> 
>>> I start on user1's machine: 
>>> mpirun -np 2 --machinefile machinefile.txt MyProgram
>>> 
>>> with machinefile:
>>> user1@machine1 slots=1
>>> user2@machine2 slots=1
>>> 
>>> and I got:
>>> a user1@machine2 password prompt?! (there is no user1 account on machine2...)
>>> 
>>> My problem is still open... why is there a connection attempt to machine2
>>> with user1?
>>> Does somebody have an explanation?
>>> 
>>> Thanks,
>>> 
>>> Olivier
>>> 
>>> 
>>> 2010/3/10 Olivier Riff 
>>> OK, it works now, thanks. I forgot to add the slots information in the
>>> machinefile.
>>> 
>>> Cheers,
>>> 
>>> Olivier
>>> 
>>> 
>>> 
>>> 2010/3/10 Ralph Castain 
>>> 
>>> It is the exact same syntax inside of the machinefile:
>>> 
>>> user1@machine1 slots=4
>>> user2@machine2 slots=3
>>> 
>>> 
>>> 
>>> On Mar 10, 2010, at 5:41 AM, Olivier Riff wrote:
>>> 
>>> > Hello,
>>> >
>>> > I am using openmpi on several machines which have different user accounts 
>>> > and I cannot find a way to specify the login for each machine in the 
>>> > machinefile passed to mpirun.
>>> > The only solution I found is to use the -host argument of mpirun, such as:
>>> > mpirun -np 2 --host user1@machine1,user2@machine2 MyProgram
>>> > which is very inconvenient with a lot of machines.
>>> >
>>> > Is there a way to do the same using a machinefile text file? E.g.:
>>> > mpirun -np 2 -machinefile machinefile.txt MyProgram
>>> >
>>> > I cannot find the appropriate syntax for specifying a user in 
>>> > machinefile.txt...
>>> >
>>> > Thanks in advance,
>>> >
>>> > Olivier
>>> >



Re: [OMPI users] dynamic spawn process on remote node

2010-10-22 Thread Vasiliy G Tolstov
On Fri, 2010-10-22 at 07:36 -0600, Ralph Castain wrote:
> MPI won't do this - if a node dies, the entire MPI job is terminated.
> 
> 
> Take a look at OpenRCM, a subproject of Open MPI:
> 
> 
> http://www.open-mpi.org/projects/orcm/
> 
> 
> This is designed to do what you describe as we have a similar (open
> source) project underway at Cisco. If I were writing your system, I
> would:
> 
> 
> (a) add my sensors to the orte/mca/sensor framework. You'll find that
> we already monitor memory usage, for example. Use the orte/mca/db
> framework to store your data in a database. Several different
> databases are already supported, though it is easy to add another if
> you want (e.g., sqlite support).
> 
> 
> (b) add my desired error response to the src/orte/mca/errmgr/orcm
> module. The ability to migrate processes is already implemented, but
> you may need to do something additional to migrate a VM. If you
> prefer, you can create your own module in that area and use one of the
> other components as an example.
> 
> 
> Then let orcm start its daemons across your nodes. Orcm daemons will
> do the monitoring and reporting for you, and will start and monitor
> the virtual machines. If you set the max local restarts to 0, and max
> global restarts to some number, the system will automatically migrate
> any failures to other nodes.
> 
> 
> See the June 2010 presentation under "Publications" on the web page
> above for an overview of how it all works. If you decide to go this
> route, I'll be happy to provide advice and further explanation. And of
> course, you are welcome to participate in ORCM if you choose.
> 


Thank you very much. I think this will be very useful for me. Can you
provide me a link to the presentation? (I can't see it under
http://www.open-mpi.org/papers/.)

And can you send me a very simple example of how I can use ORCM? (Maybe I
can get some useful information by reading
http://svn.open-mpi.org/svn/orcm/trunk/test...)

Does ORCM have man pages for its functions, like Open MPI?

-- 
Vasiliy G Tolstov 
Selfip.Ru



Re: [OMPI users] dynamic spawn process on remote node

2010-10-22 Thread Reuti
On 22.10.2010, at 14:09, Vasiliy G Tolstov wrote:

> On Fri, 2010-10-22 at 14:07 +0200, Reuti wrote:
>> Hi,
>> 
>> On 22.10.2010, at 10:58, Vasiliy G Tolstov wrote:
>> 
>>> Hello. Maybe this question has already been answered, but I can't see it
>>> in the list archive.
>>> 
>>> I'm running about 60 Xen nodes with about 7-20 virtual machines on each.
>>> I want to gather disk, CPU, memory, and network utilization from the
>>> virtual machines and get it into a database for later processing.
>>> 
>>> As I see it, my architecture looks like this: one or two master servers
>>> run an MPI process with rank 0 that can insert data into the database.
>>> These master servers spawn an MPI process on each Xen node that gathers
>>> statistics from the virtual machines on that node and sends them to the
>>> masters (maybe with a multicast request). On each virtual machine I have
>>> an (MPI) process that can get data and send it to the MPI process on its
>>> Xen node. Virtual machines have the ability to migrate to other Xen nodes.
>> 
>> Do you just want to monitor the physical and virtual machines with an
>> application running under MPI? It sounds like it could be done with
>> Ganglia or Nagios then.
> 
> No... I want to get real-time data to decide which virtual machine I need
> to migrate to another Xen node, because it needs more resources.

This is indeed an interesting field, as it has also come up a couple of
times on the SGE Gridengine mailing list: how to handle jobs with varying
resource requests over their lifetime, and how they should signal to the
queuing system (or specify it already in the `qsub` command) that they now
have to move to a bigger node (or could be moved to a smaller node with
fewer resources).

-- Reuti


Re: [OMPI users] dynamic spawn process on remote node

2010-10-22 Thread Vasiliy G Tolstov
On Fri, 2010-10-22 at 16:04 +0200, Reuti wrote:
> On 22.10.2010, at 14:09, Vasiliy G Tolstov wrote:
> 
> > On Fri, 2010-10-22 at 14:07 +0200, Reuti wrote:
> >> Hi,
> >> 
> >> On 22.10.2010, at 10:58, Vasiliy G Tolstov wrote:
> >> 
> >>> Hello. Maybe this question has already been answered, but I can't see
> >>> it in the list archive.
> >>> 
> >>> I'm running about 60 Xen nodes with about 7-20 virtual machines on
> >>> each. I want to gather disk, CPU, memory, and network utilization from
> >>> the virtual machines and get it into a database for later processing.
> >>> 
> >>> As I see it, my architecture looks like this: one or two master servers
> >>> run an MPI process with rank 0 that can insert data into the database.
> >>> These master servers spawn an MPI process on each Xen node that gathers
> >>> statistics from the virtual machines on that node and sends them to the
> >>> masters (maybe with a multicast request). On each virtual machine I
> >>> have an (MPI) process that can get data and send it to the MPI process
> >>> on its Xen node. Virtual machines have the ability to migrate to other
> >>> Xen nodes.
> >> 
> >> Do you just want to monitor the physical and virtual machines with an
> >> application running under MPI? It sounds like it could be done with
> >> Ganglia or Nagios then.
> > 
> > No... I want to get real-time data to decide which virtual machine I
> > need to migrate to another Xen node, because it needs more resources.
> 
> This is indeed an interesting field, as it has also come up a couple of
> times on the SGE Gridengine mailing list: how to handle jobs with varying
> resource requests over their lifetime, and how they should signal to the
> queuing system (or specify it already in the `qsub` command) that they now
> have to move to a bigger node (or could be moved to a smaller node with
> fewer resources).
> 
> -- Reuti

Very interesting. Thank you for the suggestion.
-- 
Vasiliy G Tolstov 
Selfip.Ru



[OMPI users] cannot build Open MPI-1.5 on Solaris x86 with Sun C 5.9

2010-10-22 Thread Siegmar Gross
Hi,

I tried to build Open MPI 1.5 on SunOS x86_64 with the Oracle/Sun
Studio C compiler and gcc-4.2.0 in 32- and 64-bit mode. I couldn't
build the package with Oracle/Sun C 5.9 in 32-bit mode with thread
support.

sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 110 tail -15
  log.make.SunOS.x86_64.32_cc 
make[3]: Leaving directory `/.../ompi/include'
make[2]: Leaving directory `/.../ompi/include'
Making all in datatype
make[2]: Entering directory `/.../ompi/datatype'
  CC ompi_datatype_args.lo
"ompi_datatype_args.c", [ompi_datatype_set_args]:ube: error:
  Unsupported constraint 'y' in GASM Inlining
"ompi_datatype_args.c", [ompi_datatype_set_args]:ube: error:
  Unsupported constraint 'y' in GASM Inlining
"ompi_datatype_args.c", [ompi_datatype_set_args]:ube: error:
  Unsupported constraint 'y' in GASM Inlining
"ompi_datatype_args.c", [ompi_datatype_set_args]:ube: error:
  Unsupported constraint 'y' in GASM Inlining
cc: ube failed for ../../../openmpi-1.5/ompi/datatype/ompi_datatype_args.c
make[2]: *** [ompi_datatype_args.lo] Error 1
make[2]: Leaving directory `/.../ompi/datatype'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `.../ompi'
make: *** [all-recursive] Error 1
sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 111 


Does anybody know how to solve the problem? Thank you very much in
advance for any suggestions. I could build the package without
thread support. "make check" reports the following warnings, and
when I run any program it blocks without any output.

sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 27 grep -i warning:
  log.make-check.SunOS.x86_64.32_cc
"../../../openmpi-1.5/test/asm/atomic_barrier_noinline.c",
  line 43: warning: static function called but not defined:
  opal_atomic_sub_32()
"../../../openmpi-1.5/test/asm/atomic_barrier_noinline.c",
  line 43: warning: static function called but not defined:
  opal_atomic_add_32()
"../../../openmpi-1.5/test/asm/atomic_spinlock_noinline.c",
  line 135: warning: static function called but not defined:
  opal_atomic_sub_32()
"../../../openmpi-1.5/test/asm/atomic_spinlock_noinline.c",
  line 135: warning: static function called but not defined:
  opal_atomic_add_32()
"../../../openmpi-1.5/test/asm/atomic_math_noinline.c",
  line 147: warning: static function called but not defined:
  opal_atomic_sub_32()
"../../../openmpi-1.5/test/asm/atomic_math_noinline.c",
  line 147: warning: static function called but not defined:
  opal_atomic_add_32()
"../../../openmpi-1.5/test/asm/atomic_cmpset_noinline.c",
  line 291: warning: static function called but not defined:
  opal_atomic_sub_32()
"../../../openmpi-1.5/test/asm/atomic_cmpset_noinline.c",
  line 291: warning: static function called but not defined:
  opal_atomic_add_32()


Some tests didn't pass.

sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 30 grep FAIL
  log.make-check.SunOS.x86_64.32_cc 
FAIL: atomic_cmpset
sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 31 grep SKIP
  log.make-check.SunOS.x86_64.32_cc
SKIP: atomic_spinlock
SKIP: atomic_spinlock_noinline
sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 32 grep PASS
  log.make-check.SunOS.x86_64.32_cc
PASS: predefined_gap_test
PASS: dlopen_test
PASS: atomic_barrier
PASS: atomic_barrier_noinline
PASS: atomic_math
PASS: atomic_math_noinline
PASS: atomic_cmpset_noinline


One test results in a segmentation fault.

sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 34 tail -40
  log.make-check.SunOS.x86_64.32_cc
- 8 threads: Skipped
PASS: atomic_math
--> Testing atomic_math_noinline
- 1 threads: Passed
- 2 threads: Skipped
- 4 threads: Skipped
- 5 threads: Skipped
- 8 threads: Skipped
PASS: atomic_math_noinline
--> Testing atomic_cmpset
../../../openmpi-1.5/test/asm/run_tests: line 8: 14573 Segmentation Fault
$* $threads
- 1 threads: Failed
../../../openmpi-1.5/test/asm/run_tests: line 8: 14574 Segmentation Fault
$* $threads
- 2 threads: Failed
../../../openmpi-1.5/test/asm/run_tests: line 8: 14575 Segmentation Fault
$* $threads
- 4 threads: Failed
../../../openmpi-1.5/test/asm/run_tests: line 8: 14576 Segmentation Fault
$* $threads
- 5 threads: Failed
../../../openmpi-1.5/test/asm/run_tests: line 8: 14577 Segmentation Fault
$* $threads
- 8 threads: Failed
FAIL: atomic_cmpset
--> Testing atomic_cmpset_noinline
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_cmpset_noinline

1 of 6 tests failed
(2 tests were not run)
Please report to http://www.open-mpi.org/community/help/

make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `/.../test/asm'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/.../test/asm'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/.../test'
make: *** [check-recursive] Error 1
sunpc4 openmpi-1.5-SunOS.x86_64.32_cc 35 




I add a short summary about my successes and failures. "ok" m

[OMPI users] OPEN MPI data transfer error

2010-10-22 Thread Jack Bryan

Hi,

I am using Open MPI to transfer data between nodes.
But the received data is not what the data sender sends out.
I have tried the C and C++ bindings.

data sender:

  double* sendArray = new double[sendResultVec.size()];

  for (int ii = 0; ii < sendResultVec.size(); ii++)
  {
      sendArray[ii] = sendResultVec[ii];
  }

  MPI::COMM_WORLD.Send(sendArray, sendResultVec.size(), MPI_DOUBLE, 0,
                       myworkerUpStreamTaskTag);

data receiver:

  double* recvArray = new double[objSize];

  mToMasterT1Req = MPI::COMM_WORLD.Irecv(recvArray, objSize, MPI_DOUBLE,
                                         destRank, myUpStreamTaskTag);

The sendResultVec.size() = objSize.

What is the possible reason?

Any help is appreciated.

thanks
jack
Oct. 22 2010

Re: [OMPI users] OPEN MPI data transfer error

2010-10-22 Thread Jeff Squyres
It doesn't look like you have completed the request that came back from Irecv.  
You need to TEST or WAIT on requests before they are actually completed (e.g., 
in the case of a receive, the data won't be guaranteed to be in the target 
buffer until TEST/WAIT indicates that the request has completed).
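
In C, the pattern looks something like this (a minimal sketch with made-up
sizes and tags; run with at least 2 processes):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, n = 4;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double sendbuf[4] = { 1, 2, 3, 4 };
        MPI_Send(sendbuf, n, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double *recvbuf = malloc(n * sizeof(double));
        MPI_Request req;

        MPI_Irecv(recvbuf, n, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &req);
        /* recvbuf may still be garbage here: the request has only
           been started, not completed. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* Only after MPI_Wait (or a successful MPI_Test) is the data
           guaranteed to be in recvbuf. */
        printf("got %g %g %g %g\n",
               recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}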



On Oct 22, 2010, at 3:19 PM, Jack Bryan wrote:

> Hi, 
> 
> I am using open MPI to transfer data between nodes. 
> 
> But the received data is not what the data sender sends out . 
> 
> I have tried C and C++ binding . 
> 
> data sender: 
>   double* sendArray = new double[sendResultVec.size()];
> 
>   for (int ii =0 ; ii < sendResultVec.size() ; ii++)
>   {
>   sendArray[ii] = sendResultVec[ii];
>   }
> 
>   MPI::COMM_WORLD.Send(sendArray, sendResultVec.size(), MPI_DOUBLE, 0, 
> myworkerUpStreamTaskTag);  
>   
> data receiver: 
>   double* recvArray = new double[objSize];
> 
>   mToMasterT1Req = MPI::COMM_WORLD.Irecv(recvArray, objSize, MPI_DOUBLE, 
> destRank, myUpStreamTaskTag);
> 
> 
> The sendResultVec.size() = objSize. 
> 
> 
> What is the possible reason ? 
> 
> 
> Any help is appreciated. 
> 
> thanks
> 
> jack
> 
> Oct. 22 2010


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] AUTO: Richard Treumann/Poughkeepsie/IBM is out of the office until 01/02/2001. (returning 11/01/2010)

2010-10-22 Thread Richard Treumann

I am out of the office until 11/01/2010.

I will be out of the office on vacation the last week of Oct. Back Nov 1.
I will not see any email.


Note: This is an automated response to your message  "[OMPI users] OPEN MPI
data transfer error" sent on 10/22/10 15:19:05.

This is the only notification you will receive while this person is away.



Re: [OMPI users] OPEN MPI data transfer error

2010-10-22 Thread Jack Bryan

Hi, 
I have used mpi_waitall() to do it. 
The data can be received, but the contents are wrong.
Any help is appreciated. 
thanks

> From: jsquy...@cisco.com
> Date: Fri, 22 Oct 2010 15:35:11 -0400
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OPEN MPI data transfer error
> 
> It doesn't look like you have completed the request that came back from 
> Irecv.  You need to TEST or WAIT on requests before they are actually 
> completed (e.g., in the case of a receive, the data won't be guaranteed to be 
> in the target buffer until TEST/WAIT indicates that the request has 
> completed).
> 
> 
> 
> On Oct 22, 2010, at 3:19 PM, Jack Bryan wrote:
> 
> > Hi, 
> > 
> > I am using open MPI to transfer data between nodes. 
> > 
> > But the received data is not what the data sender sends out . 
> > 
> > I have tried C and C++ binding . 
> > 
> > data sender: 
> > double* sendArray = new double[sendResultVec.size()];
> > 
> > for (int ii =0 ; ii < sendResultVec.size() ; ii++)
> > {
> > sendArray[ii] = sendResultVec[ii];
> > }
> > 
> > MPI::COMM_WORLD.Send(sendArray, sendResultVec.size(), MPI_DOUBLE, 0, 
> > myworkerUpStreamTaskTag);  
> > 
> > data receiver: 
> > double* recvArray = new double[objSize];
> > 
> > mToMasterT1Req = MPI::COMM_WORLD.Irecv(recvArray, objSize, MPI_DOUBLE, 
> > destRank, myUpStreamTaskTag);
> > 
> > 
> > The sendResultVec.size() = objSize. 
> > 
> > 
> > What is the possible reason ? 
> > 
> > 
> > Any help is appreciated. 
> > 
> > thanks
> > 
> > jack
> > 
> > Oct. 22 2010
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 

Re: [OMPI users] OPEN MPI data transfer error

2010-10-22 Thread Jeff Squyres
On Oct 22, 2010, at 5:36 PM, Jack Bryan wrote:

> I have used mpi_waitall() to do it. 
> 
> The data can be received but the contents are wrong.

Can you send a more accurate code snippet, and/or the code that you are using to 
check whether the data is right/wrong?  

I ask because I'm a little suspect of what you sent already (e.g., you didn't 
include the waitall, which is kinda important :-) ).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] OPEN MPI data transfer error

2010-10-22 Thread David Zhang
Did you use the waitall on the sender side or the receiver side? I noticed
you didn't show the request variable on the receiver side that is needed for
the waitall.

On Fri, Oct 22, 2010 at 2:48 PM, Jeff Squyres  wrote:

> On Oct 22, 2010, at 5:36 PM, Jack Bryan wrote:
>
> > I have used mpi_waitall() to do it.
> >
> > The data can be received but the contents are wrong.
>
> Can you send a more accurate code snippet, and/or the code that you are
> using to check whether the data is right/wrong?
>
> I ask because I'm a little suspect of what you sent already (e.g., you
> didn't include the waitall, which is kinda important :-) ).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>



-- 
David Zhang
University of California, San Diego


[OMPI users] Running simple MPI program

2010-10-22 Thread Brandon Fulcher
Hi, I am completely new to MPI and am having trouble running a job between
two CPUs.

The same thing happens no matter what MPI job I try to run, but here is a
simple 'hello world' style program I am trying to run.

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int *buf, i, rank, nints, len;
  char hostname[256];

  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  gethostname(hostname,255);
  printf("Hello world!  I am process number: %d on host %s\n", rank,
hostname);
  MPI_Finalize();
  return 0;
}


On either CPU, I can successfully compile and run, but when trying to run
the program using two CPUs it fails with this output:

--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--


With no additional information or errors, what can I do to find out what is
wrong?



I have read the FAQ and followed the instructions.  I can ssh into the slave
without entering a password, and I have the libraries installed on both
machines.

The only pertinent thing I could find is this FAQ entry:
http://www.open-mpi.org/faq/?category=running#missing-prereqs
but I do not know if it applies, since I installed Open MPI from the Ubuntu
repositories and assume the libraries are correctly set up.