[OMPI users] Mi Yan is out of the office.

2008-12-21 Thread Mi Yan

I will be out of the office starting  12/21/2008 and will not return until
01/02/2009.


Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)

2008-11-01 Thread Mi Yan

So your tests show:
1.  "Shared library in FORTRAN   +   MPI executable in FORTRAN" works.
2. "Shared library in C++   + MPI executable in FORTRAN " does not work.

It seems to me that the symbols in  C library are not really recognized by
FORTRAN executable as you thought.What compilers  did yo use to built
OpenMPI?

 Different compilers have different conventions for handling symbols.  E.g.,
if there is a variable "var_foo" in your FORTRAN code, some FORTRAN compilers
will save "var_foo_" in the object file by default; if you want to access
"var_foo" from C code, you actually need to refer to "var_foo_" in the C code.
If you define "var_foo" in a FORTRAN module, some FORTRAN compilers may also
add the module name to the mangled symbol.
So I suggest checking the symbols in the object files generated by your
FORTRAN and C++ compilers to see the difference.
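
For example, here is a minimal C sketch of referring to a Fortran common
block, assuming a compiler that appends a single trailing underscore (the
default for gfortran and most Linux Fortran compilers); the common block
name "mydata" and its members are made up for illustration:

/* Fortran side, for reference:
 *       real*8  a, b
 *       integer n
 *       common /mydata/ a, b, n
 */
#include <stdio.h>

/* The common block /mydata/ typically appears in the object file as the
 * symbol "mydata_"; the C struct must match the Fortran layout exactly. */
extern struct {
    double a;
    double b;
    int    n;
} mydata_;

void print_mydata(void)
{
    printf("a=%g b=%g n=%d\n", mydata_.a, mydata_.b, mydata_.n);
}

Running "nm" on the object files produced by both compilers will show
whether the names actually line up.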

Mi


   
 "Rajesh Ramaya"   
To
 Sent by:  "'Open MPI Users'"  
 users-bounces@ope , "'Jeff
 n-mpi.org Squyres'"   
cc
   
 10/31/2008 03:07  Subject
 PMRe: [OMPI users] MPI + Mixed
   language coding(Fortran90 + C++)
   
 Please respond to 
  Open MPI Users   
 
   
   




Hello Jeff Squyres,
   Thank you very much for the immediate reply. I am able to successfully
access the data from the common block, but the values are zero. In my
algorithm I even update a common block, but the update made by the shared
library is not taken into account by the executable. Can you please be very
specific about how to make the parallel algorithm aware of the data?
Actually I am not writing any MPI code myself; it's the executable (third-
party software) that does that part. All that I am doing is compiling my
code with the MPI C compiler and adding it to the LD_LIBRARY_PATH.
In fact I did a simple test by creating a shared library from FORTRAN code,
and the update made to the common block is taken into account by the
executable. Is there any flag or pragma that needs to be activated for mixed
language MPI?
Thank you once again for the reply.

Rajesh

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: vendredi 31 octobre 2008 18:53
To: Open MPI Users
Subject: Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)

On Oct 31, 2008, at 11:57 AM, Rajesh Ramaya wrote:

> I am completely new to MPI. I have a basic question concerning
> MPI and mixed language coding. I hope any of you could help me out.
> Is it possible to access FORTRAN common blocks in C++ in an MPI-
> compiled code? It works without MPI, but as soon as I switch to MPI the
> access of the common block does not work anymore.
> I have a Linux MPI executable which loads a shared library at
> runtime and resolves all undefined symbols, etc. The shared library
> is written in C++ and the MPI executable is written in FORTRAN. Some
> of the input that the shared library is looking for is in the Fortran
> common blocks. As I access those common blocks during runtime, the
> values are not initialized. I would like to know if what I am
> doing is possible? I hope that my problem is clear.


Generally, MPI should not get in the way of sharing common blocks
between Fortran and C/C++.  Indeed, in Open MPI itself, we share a few
common blocks between Fortran and the main C Open MPI implementation.

What is the exact symptom that you are seeing?  Is the application
failing to resolve symbols at run-time, possibly indicating that
something hasn't instantiated a common block?  Or are you able to
successfully access the data from the common block, but it doesn't
have the values you expect (e.g., perhaps you're seeing all zeros)?

If the former, you might want to check your build procedure.  You
*should* be able to simply replace your C++ / F90 compilers with
mpicxx and mpif90, respectively, and be able to build an MPI version
of your app.  If the latter, you might need to make your parallel
algorithm aware of what data is available in which MPI process --
perhaps not all the data is filled in on each MPI process...?
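
For example, here is a minimal sketch (the common block /inputs/ and its
members are hypothetical, purely for illustration) of filling the data in on
every process by broadcasting it from the rank that read it:

#include <mpi.h>

/* Hypothetical Fortran common block /inputs/, assumed to be mangled to
 * "inputs_" with the usual trailing-underscore convention. */
extern struct {
    double pressure;
    double temperature;
} inputs_;

/* Call after MPI_Init(); only rank 0 is assumed to have the input data,
 * so broadcast the whole block to every other rank as raw bytes. */
void share_inputs(void)
{
    MPI_Bcast(&inputs_, sizeof(inputs_), MPI_BYTE, 0, MPI_COMM_WORLD);
}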

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] problem running Open MPI on Cells

2008-10-31 Thread Mi Yan

Where did you put the environment variables related to the MCF licence file
and the MCF shared libraries?
What is your default shell?

Did your test indicate the following?
Suppose you have 4 nodes:
on node 1, "mpirun -np 4 --host node1,node2,node3,node4 hostname" works,
but "mpirun -np 4 --host node1,node2,node3,node4 foocbe" does not work,
where foocbe is an executable generated with MCF.

 Is it possible that the MCF license is limited to a few concurrent uses?
E.g., if the license is limited to 4 concurrent uses, the MPI application
will fail on 8 nodes.

Regards,
Mi


   
From: Hahn Kim
To: Open MPI Users
Date: 10/31/2008 03:38 PM
Subject: [OMPI users] problem running Open MPI on Cells




Hello,

I'm having problems using Open MPI on a cluster of Mercury Computer's
Cell Accelerator Boards (CABs).

We have an MPI application that is running on multiple CABs.  The
application uses Mercury's MultiCore Framework (MCF) to use the Cell's
SPEs.  Here's the basic problem.  I can log into each CAB and run the
application in serial directly from the command line (i.e. without
using mpirun) without a problem.  I can also launch a serial job onto
each CAB from another machine using mpirun without a problem.

The problem occurs when I try to launch onto multiple CABs using
mpirun.  MCF requires a license file.  After the application
initializes MPI, it tries to initialize MCF on each node.  The
initialization routine loads the MCF license file and checks for valid
license keys.  If the keys are valid, then it continues to initialize
MCF.  If not, it throws an error.

When I run on multiple CABs, most of the time several of the CABs
throw an error saying MCF cannot find a valid license key.  The
strange thing is that this behavior doesn't appear when I launch serial
jobs using MCF, only when launching onto multiple CABs.  Additionally, the errors are
inconsistent.  Not all the CABs throw an error, sometimes a few of
them error out, sometimes all of them, sometimes none.

I've talked with the Mercury folks and they're just as stumped as I
am.  The only thing we can think of is that OpenMPI is somehow
modifying the environment and is interfering with MCF, but we can't
think of any reason why.

Any ideas out there?  Thanks.

Hahn

--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Working with a CellBlade cluster

2008-10-31 Thread Mi Yan

Gilbert,

  I do not know of MCA parameters that can monitor the message
passing.  I have tried a few MCA verbose parameters and did not find
any of them helpful.

 One way to check whether a message goes via IB or SM may be to check the
counters in /sys/class/infiniband.

Regards,
Mi


   
From: Gilbert Grosdidier
To: Open MPI Users
Date: 10/29/2008 12:36 PM
Subject: Re: [OMPI users] Working with a CellBlade cluster




Thank you very much Mi and Lenny for your detailed replies.

 I believe I can summarize the information to allow for
'Working with a QS22 CellBlade cluster' like this:
- Yes, messages are efficiently handled with "-mca btl openib,sm,self"
- Better to go to the OMPI 1.3 version ASAP
- It is currently more efficient/easy to use numactl to control
processor affinity on a QS22.

 So far so good.

 One question remains: how could I monitor in detail the message passing
through IB (on one side) and through SM (on the other side) through the use
of MCA parameters, please? Additional info about the verbosity level
of this monitoring would be highly appreciated ... A lengthy trip
through the list of such parameters provided by ompi_info did not
enlighten me (there are so many xxx_sm_yyy type params that I don't know
which could be the right one ;-)

 Thanks in advance for your hints,  Best Regards, Gilbert.


On Thu, 23 Oct 2008, Mi Yan wrote:

>
> 1.  MCA BTL parameters
> With "-mca btl openib,self", both message between two Cell processors on
> one QS22 and   messages between two QS22s go through IB.
>
> With "-mca btl openib,sm,slef",  message on one QS22 go through shared
> memory,  message between QS22 go through IB,
>
> Depending on the message size and other MCA parameters,  it does not
> guarantee message passing on shared memory is faster than on IB.   E.g.
> the bandwidth for 64KB message is 959MB/s on shared-memory and is 694MB/s
> on IB;  the bandwidth for 4MB message is 539 MB/s and 1092 MB/s on  IB.
> The bandwidth of 4MB message on shared memory may be higher if you tune
> some MCA parameter.
>
> 2.  mpi_paffinity_alone
>   "mpi_paffinity_alone =1"  is not a good choice for QS22.  There are two
> sockets with two physical  Cell/B.E. on one QS22.  Each Cell/B.E. has two
> SMT threads.   So there are four logical CPUs on one QS22.  CBE Linux
> kernel maps logical cpu 0 and 1 to socket1 and maps logical cpu 1 and 2
to
> socket 2.If mpi_paffinity_alone is set to 1,   the two MPI instances
> will be assigned to logical cpu 0 and cpu 1 on socket 1.  I believe this
is
> not what you want.
>
> A temporaily solution to  force the affinity on  QS22 is to use
> "numactl",   E.g.  assuming the hostname is "qs22" and the executable is
> "foo".  the following command can be used
> mpirun -np 1 -H qs22 numactl -c0 -m0  foo :   -np 1 -H
qs22
> numactl -c1 -m1 foo
>
>In the long run,  I wish CBE kernel export  CPU topology  in /sys  and
> use  PLPA to force the processor affinity.
>
> Best Regards,
> Mi
>
>
>
>
>  "Lenny
>  Verkhovsky"
>@gmail.com>   "Open MPI Users"
>  Sent by:  
>  users-bounces@ope
cc
>  n-mpi.org
>
Subject
>Re: [OMPI users] Working with a
>  10/23/2008 05:48  CellBlade cluster
>  AM
>
>
>  Please respond to
>   Open MPI Users
>   rg>
>
>
>
>
>
>
> Hi,
>
>
> If I understand you correctly the most 

Re: [OMPI users] Working with a CellBlade cluster

2008-10-29 Thread Mi Yan

 "dmesg |grep Node"  on Cell will show :
Node 0: CPUS 0-1
Node 1:  CPUS 2-3
.
Linux on Cell/BE  puts the  CPU-node mapping in /sys/devices/system/node
instead of /sys/devices/system/cpu.
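
A small C sketch that prints the per-node CPU lists by reading those sysfs
entries (assuming the kernel exports a "cpulist" file under each node
directory; older kernels may only provide the hex "cpumap" instead):

#include <stdio.h>

int main(void)
{
    char path[64], buf[256];
    int node;

    for (node = 0; ; node++) {
        FILE *f;
        snprintf(path, sizeof(path),
                 "/sys/devices/system/node/node%d/cpulist", node);
        f = fopen(path, "r");
        if (!f)
            break;                       /* no more nodes */
        if (fgets(buf, sizeof(buf), f))
            printf("Node %d: CPUs %s", node, buf);
        fclose(f);
    }
    return 0;
}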

Regards,
Mi


   
 "Lenny
 Verkhovsky"   
"Open MPI Users"
 Sent by:  
 users-bounces@ope  cc
 n-mpi.org 
   Subject
   Re: [OMPI users] Working with a 
 10/27/2008 04:58  CellBlade cluster   
 PM
   
   
 Please respond to 
  Open MPI Users   
 
   
   




Can you update me with the mapping, or the way to get it from the OS on the
Cell?


thanks



On Thu, Oct 23, 2008 at 8:08 PM, Mi Yan wrote:
  Lenny,

  Thanks.
  I asked the Cell/BE Linux kernel developer to get the CPU mapping :) The
  mapping is fixed in the current kernel.

  Mi
  "Lenny Verkhovsky" 




   
 "Lenny
 Verkhovsky"
 Sent by:   To
 users-bounces@op  
 en-mpi.org"Open MPI Users" <
   us...@open-mpi.org>
   
 10/23/2008 01:52   cc
 PM
   
   Subject
 Please respond to 
   Open MPI UsersRe: [OMPI users]
   Working with a  
   CellBlade cluster
   
   
   
   
   
   



  According to
  https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3 very soon,
  but you can download the trunk version from http://www.open-mpi.org/svn/ and
  check if it works for you.

  How can you check the CPU mapping used by the OS? My cat /proc/cpuinfo
  shows very little info:
  # cat /proc/cpuinfo
  processor : 0
  cpu : Cell Broadband Engine, altivec supported
  clock : 3200.00MHz
  revision : 48.0 (pvr 0070 3000)
  processor : 1
  cpu : Cell Broadband Engine, altivec supported
  clock : 3200.00MHz
  revision : 48.0 (pvr 0070 3000)
  processor : 2
  cpu : Cell Broadband Engine, altivec supported
  clock : 3200.00MHz
  revision : 48.0 (pvr 0070 3000)
  processor : 3
  cpu : Cell Broadband Engine, altivec supported
  clock : 3200.00MHz
  revision : 48.0 (pvr 0070 3000)
  timebase : 2666
  platform : Cell
  machine : CHRP IBM,0793-1RZ



  On Thu, Oct 23, 2008 at 3:00 PM, Mi Yan wrote:
Hi, Lenny,

So rank file mapping will be supported in OpenMPI 1.3? I'm using
OpenMPI 1.2.6 and did not find the parameter "rmaps_rank_file_".
Do you have an idea when OpenMPI 1.3 will be available? OpenMPI 1.3
has quite a few features I'm looking for.

Thanks,

Mi
"Lenny Verkhovsky" 

   
   "Lenny Verkhovsky"  
   Sent by: users-boun...@open-mpi.org 

Re: [OMPI users] Working with a CellBlade cluster

2008-10-23 Thread Mi Yan

Lenny,

 Thanks.
 I asked the Cell/BE Linux kernel developer to get the CPU mapping :)
The mapping is fixed in the current kernel.

 Mi


   
 "Lenny
 Verkhovsky"   
"Open MPI Users"
 Sent by:  
 users-bounces@ope  cc
 n-mpi.org 
   Subject
   Re: [OMPI users] Working with a 
 10/23/2008 01:52  CellBlade cluster   
 PM
   
   
 Please respond to 
  Open MPI Users   
 
   
   




According to https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3,
very soon,
but you can download the trunk version from http://www.open-mpi.org/svn/ and
check if it works for you.

How can you check the CPU mapping used by the OS? My cat /proc/cpuinfo shows
very little info:
# cat /proc/cpuinfo
processor   : 0
cpu : Cell Broadband Engine, altivec supported
clock   : 3200.00MHz
revision: 48.0 (pvr 0070 3000)
processor   : 1
cpu : Cell Broadband Engine, altivec supported
clock   : 3200.00MHz
revision: 48.0 (pvr 0070 3000)
processor   : 2
cpu : Cell Broadband Engine, altivec supported
clock   : 3200.00MHz
revision: 48.0 (pvr 0070 3000)
processor   : 3
cpu : Cell Broadband Engine, altivec supported
clock   : 3200.00MHz
revision: 48.0 (pvr 0070 3000)
timebase: 2666
platform: Cell
machine : CHRP IBM,0793-1RZ



On Thu, Oct 23, 2008 at 3:00 PM, Mi Yan wrote:
  Hi, Lenny,

  So rank file mapping will be supported in OpenMPI 1.3? I'm using OpenMPI
  1.2.6 and did not find the parameter "rmaps_rank_file_".
  Do you have an idea when OpenMPI 1.3 will be available? OpenMPI 1.3 has
  quite a few features I'm looking for.

  Thanks,

  Mi




   
  "Lenny   
  Verkhovsky" .
  Sent by: 
  users-bounces@o  
  pen-mpi.org   To
   
"Open MPI Users" <
  10/23/2008us...@open-mpi.org
  05:48 AM  >  
   
cc
 Please respond to 
   Open MPI Users
   
   
   Subject
   
Re: [OMPI users]
Working with a 
CellBlade cluster
   
   
   
   
   
   



  Hi,


  If I understand you correctly, the most suitable way to do it is by the
  paffinity that we have in Open MPI 1.3 and the trunk.
  However, usually the OS is distributing processes evenly be

Re: [OMPI users] Working with a CellBlade cluster

2008-10-23 Thread Mi Yan

Hi, Lenny,

So rank file mapping will be supported in OpenMPI 1.3? I'm using
OpenMPI 1.2.6 and did not find the parameter "rmaps_rank_file_".
   Do you have an idea when OpenMPI 1.3 will be available? OpenMPI 1.3
has quite a few features I'm looking for.

Thanks,
Mi


   
 "Lenny
 Verkhovsky"   
"Open MPI Users"
 Sent by:  
 users-bounces@ope  cc
 n-mpi.org 
   Subject
   Re: [OMPI users] Working with a 
 10/23/2008 05:48  CellBlade cluster   
 AM
   
   
 Please respond to 
  Open MPI Users   
 
   
   




Hi,


If I understand you correctly, the most suitable way to do it is by the
paffinity that we have in Open MPI 1.3 and the trunk.
However, usually the OS distributes processes evenly between sockets by
itself.

There is still no formal FAQ due to multiple reasons, but you can read how to
use it in the attached draft (there were a few name changes of the
params, so check with ompi_info).

Shared memory is used between processes that share the same machine, and
openib is used between different machines (hostnames); no special MCA params
are needed.

Best Regards,
Lenny







On Sun, Oct 19, 2008 at 10:32 AM, Gilbert Grosdidier 
wrote:
   Working with a CellBlade cluster (QS22), the requirement is to have one
  instance of the executable running on each socket of the blade (there are
  2 sockets). The application is of the 'domain decomposition' type, and each
  instance is required to often send/receive data with both the remote blades
  and the neighbor socket.

   Question is: which specification must be used for the mca btl component
  to force 1) shmem type messages when communicating with this neighbor
  socket, while 2) using openib to communicate with the remote blades?
  Is '-mca btl sm,openib,self' suitable for this?

   Also, which debug flags could be used to crosscheck that the messages are
  _actually_ going through the right channel, please?

   We are currently using OpenMPI 1.2.5 shipped with RHEL5.2 (ppc64).
  Which version do you think is currently the most optimised for these
  processors and problem type? Should we go towards OpenMPI 1.2.8 instead?
  Or even try some OpenMPI 1.3 nightly build?

   Thanks in advance for your help,  Gilbert.

  ___
  users mailing list
  us...@open-mpi.org
  http://www.open-mpi.org/mailman/listinfo.cgi/users
(See attached file: RANKS_FAQ.doc)
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

RANKS_FAQ.doc
Description: MS-Word document


Re: [OMPI users] Working with a CellBlade cluster

2008-10-23 Thread Mi Yan

1.  MCA BTL parameters
With "-mca btl openib,self", both messages between two Cell processors on
one QS22 and messages between two QS22s go through IB.

With "-mca btl openib,sm,self", messages on one QS22 go through shared
memory and messages between QS22s go through IB.

Depending on the message size and other MCA parameters, it is not guaranteed
that message passing over shared memory is faster than over IB.  E.g. the
bandwidth for a 64KB message is 959MB/s on shared memory and 694MB/s on IB;
the bandwidth for a 4MB message is 539MB/s on shared memory and 1092MB/s on
IB.  The bandwidth of a 4MB message on shared memory may be higher if you
tune some MCA parameters.

2.  mpi_paffinity_alone
  "mpi_paffinity_alone = 1" is not a good choice for QS22.  There are two
sockets with two physical Cell/B.E. processors on one QS22.  Each Cell/B.E.
has two SMT threads.  So there are four logical CPUs on one QS22.  The CBE
Linux kernel maps logical cpu 0 and 1 to socket 1 and logical cpu 2 and 3 to
socket 2.  If mpi_paffinity_alone is set to 1, the two MPI instances will be
assigned to logical cpu 0 and cpu 1 on socket 1.  I believe this is not what
you want.

A temporary solution to force the affinity on QS22 is to use "numactl".
E.g., assuming the hostname is "qs22" and the executable is "foo", the
following command can be used:
mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo

   In the long run, I wish the CBE kernel exported the CPU topology in /sys
so that PLPA could be used to force the processor affinity.
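
Until then, a process can also pin itself; here is a minimal C sketch using
the Linux sched_setaffinity() call (the CPU number is just an illustrative
value and would normally be derived from the rank and the mapping above):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling process to one logical CPU.  E.g. cpu = 2 would be the
 * first SMT thread of the second Cell/B.E. on a QS22 under the mapping
 * described above. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

Note this only controls CPU affinity; "numactl -m" additionally binds the
memory allocation to the chosen node.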

Best Regards,
Mi



   
 "Lenny
 Verkhovsky"   
"Open MPI Users"
 Sent by:  
 users-bounces@ope  cc
 n-mpi.org 
   Subject
   Re: [OMPI users] Working with a 
 10/23/2008 05:48  CellBlade cluster   
 AM
   
   
 Please respond to 
  Open MPI Users   
 
   
   




Hi,


If I understand you correctly, the most suitable way to do it is by the
paffinity that we have in Open MPI 1.3 and the trunk.
However, usually the OS distributes processes evenly between sockets by
itself.

There is still no formal FAQ due to multiple reasons, but you can read how to
use it in the attached draft (there were a few name changes of the
params, so check with ompi_info).

Shared memory is used between processes that share the same machine, and
openib is used between different machines (hostnames); no special MCA params
are needed.

Best Regards,
Lenny







On Sun, Oct 19, 2008 at 10:32 AM, Gilbert Grosdidier 
wrote:
   Working with a CellBlade cluster (QS22), the requirement is to have one
  instance of the executable running on each socket of the blade (there are
  2 sockets). The application is of the 'domain decomposition' type, and each
  instance is required to often send/receive data with both the remote blades
  and the neighbor socket.

   Question is: which specification must be used for the mca btl component
  to force 1) shmem type messages when communicating with this neighbor
  socket, while 2) using openib to communicate with the remote blades?
  Is '-mca btl sm,openib,self' suitable for this?

   Also, which debug flags could be used to crosscheck that the messages are
  _actually_ going through the right channel, please?

   We are currently using OpenMPI 1.2.5 shipped with RHEL5.2 (ppc64).
  Which version do you think is currently the most optimised for these
  processors and problem type? Should we go towards OpenMPI 1.2.8 instead?
  Or even try some OpenMPI 1.3 nightly build?

   Thanks in advance for your help,  Gilbert.

  ___
  users mailing list
  us...@open-mpi.org
  http://www.open-mpi.org/mailman/listinfo.cgi/users
(See attached file: RANKS_FAQ.doc)
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

RANKS_FAQ.doc
Description: MS-Word document


Re: [OMPI users] RDMA over IB between heterogenous processors with different endianness

2008-08-25 Thread Mi Yan

Brian,

  I'm using OpenMPI 1.2.6 (r17946).  Could you please check which
version works?  Thanks a lot,
Mi


   
 "Brian W. 
 Barrett"  
 Open MPI Users 
 Sent by:   cc
 users-bounces@ope Greg
 n-mpi.org Rodgers/Poughkeepsie/IBM@IBMUS, 
   Brad Benton/Austin/IBM@IBMUS
   Subject
 08/25/2008 01:44  Re: [OMPI users] RDMA over IB   
 PMbetween heterogenous processors 
   with different endianness   
   
 Please respond to 
  Open MPI Users   
 
   
   




On Mon, 25 Aug 2008, Mi Yan wrote:

> Does OpenMPI always use SEND/RECV protocol between heterogeneous
> processors with different endianness?
>
> I tried setting btl_openib_flags to 2, 4 and 6 respectively to allow RDMA,
> but the bandwidth between the two heterogeneous nodes is slow, the same as
> the bandwidth when btl_openib_flags is 1. It seems to me SEND/RECV is
> always used no matter what btl_openib_flags is. Can I force OpenMPI to use
> RDMA between x86 and PPC? I only transfer MPI_BYTE, so we do not need the
> support for endianness.

Which version of Open MPI are you using?  In recent versions (I don't
remember exactly when the change occurred, unfortunately), the decision
between send/recv and rdma was moved from being solely based on the
architecture of the remote process to being based on the architecture and
datatype.  It's possible this has been broken again, but there definitely
was some window (possibly only on the development trunk) when that worked
correctly.

Brian
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] RDMA over IB between heterogenous processors with different endianness

2008-08-25 Thread Mi Yan


 Does OpenMPI always use the SEND/RECV protocol between heterogeneous
processors with different endianness?

I tried setting btl_openib_flags to 2, 4 and 6 respectively to allow RDMA,
but the bandwidth between the two heterogeneous nodes is slow, the same as
the bandwidth when btl_openib_flags is 1.  It seems to me SEND/RECV is
always used no matter what btl_openib_flags is.  Can I force OpenMPI to use
RDMA between x86 and PPC?  I only transfer MPI_BYTE, so we do not need
the support for endianness.
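
For reference, the transfers are just raw byte buffers, along the lines of
this minimal sketch (buffer size and tag are arbitrary):

#include <mpi.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* 4 MB, arbitrary */

/* Ship a raw buffer from rank 0 to rank 1 as MPI_BYTE; the payload is
 * treated as opaque bytes, so no datatype/endianness conversion is wanted. */
void ship_bytes(char *buf, int rank)
{
    if (rank == 0)
        MPI_Send(buf, BUF_SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, BUF_SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
}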

thanks,
Mi  Yan

Re: [OMPI users] problem when mpi_paffinity_alone is set to 1

2008-08-22 Thread Mi Yan

Ralph,

  How does OpenMPI pick up the mapping between physical vs. logical
processors?  Does OMPI look into "/sys/devices/system/node/node" for
the cpu topology?


Thanks,
Mi Yan


   
From: Ralph Castain
To: Open MPI Users
Date: 08/22/2008 09:16 AM
Subject: Re: [OMPI users] problem when mpi_paffinity_alone is set to 1




Okay, I'll look into it. I suspect the problem is due to the
redefinition of the paffinity API to clarify physical vs logical
processors - more than likely, the maffinity interface suffers from
the same problem we had to correct over there.

We'll report back later with an estimate of how quickly this can be
fixed.

Thanks
Ralph

On Aug 22, 2008, at 7:03 AM, Camille Coti wrote:

>
> Ralph,
>
> I compiled a clean checkout from the trunk (r19392), the problem is
> still the same.
>
> Camille
>
>
> Ralph Castain wrote:
>> Hi Camille
>> What OMPI version are you using? We just changed the paffinity
>> module last night, but did nothing to maffinity. However, it is
>> possible that the maffinity framework makes some calls into
>> paffinity that need to adjust.
>> So version number would help a great deal in this case.
>> Thanks
>> Ralph
>> On Aug 22, 2008, at 5:23 AM, Camille Coti wrote:
>>> Hello,
>>>
>>> I am trying to run applications on a shared-memory machine. For
>>> the moment I am just trying to run tests on point-to-point
>>> communications (a  trivial token ring) and collective operations
>>> (from the SkaMPI tests suite).
>>>
>>> It runs smoothly if mpi_paffinity_alone is set to 0. For a number
>>> of processes which is larger than about 10, global communications
>>> just don't seem possible. Point-to-point communications seem to be
>>> OK.
>>>
>>> But when I specify  --mca mpi_paffinity_alone 1 in my command
>>> line, I get the following error:
>>>
>>> mbind: Invalid argument
>>>
>>> I looked into the code of maffinity/libnuma, and found out the
>>> error comes from
>>>
>>>   numa_setlocal_memory(segments[i].mbs_start_addr,
>>>segments[i].mbs_len);
>>>
>>> in maffinity_libnuma_module.c.
>>>
>>> The machine I am using is a Linux box running a 2.6.5-7 kernel.
>>>
>>> Has anyone experienced a similar problem?
>>>
>>> Camille
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] run OpenMPI job on heterogeneous processor

2008-08-20 Thread Mi Yan

Ralph,

 Thanks!
  I checked output of "ompi_info" and found that OpenMPI on PowerPC is
not built with heterogeneous support.  We will rebuild  OpenMPI and then
try the command you suggested.

Best Regards,
Mi


   
From: Ralph Castain
To: Open MPI Users
Date: 08/20/2008 12:53 PM
Subject: Re: [OMPI users] run OpenMPI job on heterogeneous processor




First, I trust that you built Open MPI to support heterogeneous
operations? I'm not sure what version you are using, but it may well
have done it by default.

Second, there is an error on your cmd line that is causing the
problem. It should read:

mpirun -np 1 -host b1 foo_x86 : -np 1 -host b2 foo_ppc

The way you wrote it, foo_x86 will run anywhere it wants (which would
default to whatever node you were on when you executed this), while
foo_ppc will run on both hosts b1 and b2 (which means the first rank
will always go on b1).

Hope that helps
Ralph


On Aug 20, 2008, at 10:02 AM, Mi Yan wrote:

> I have one MPI job consisting of two parts. One is "foo_x86", the
> other is "foo_ppc", and there is MPI communication between "foo_x86"
> and "foo_ppc".
> "foo_x86" is built on X86 box "b1", "foo_pcc" is built on PPC box
> "b2". Anyone can tell me how to start this MPI job?
>
> I tried "mpirun -np 1 foo_x86 : -np 1 foo_ppc -H b1,b2"
>
> I tried the above command on "b1", the X86 box, and I got "foo_ppc:
> Exec Format error"
> I tired on "b2", the PPC box, and I got "foo_x86: Exec format error"
>
> Anybody has a clue? Thanks in advance.
>
> Mi Yan
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] run OpenMPI job on heterogeneous processor

2008-08-20 Thread Mi Yan

I have one MPI job consisting of two parts.  One is "foo_x86", the other
is "foo_ppc", and there is MPI communication between "foo_x86" and
"foo_ppc".
   "foo_x86" is built on the X86 box "b1", "foo_ppc" is built on the PPC box
"b2".  Can anyone tell me how to start this MPI job?

I tried "mpirun -np 1 foo_x86 : -np 1 foo_ppc -H b1,b2"

I tried the above command on "b1", the X86 box, and I got "foo_ppc:
Exec format error".
I tried on "b2", the PPC box, and I got "foo_x86: Exec format error".

Does anybody have a clue?  Thanks in advance.

Mi Yan