Re: [OMPI users] valgrind complaint in openmpi1.3 (mca_mpool_sm_alloc)

2009-03-14 Thread George Bosilca
I set it based on the only available information we have in the init  
function. This way the variable is always initialized, and the upper  
layer (whatever it is) has the responsibility to set it to something  
useful.


Looking at the code it seems that the upper layer in question is the  
mpool sm component who has this information. r20780 fixes this problem.


  george.
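
For readers following along, the pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual r20780 change: the `demo_` types and functions are hypothetical stand-ins for the real mpool sm structures, and only the `mem_node` field and the `>= 0` check are taken from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Open MPI types. */
typedef struct {
    int mem_node;   /* memory node to allocate from, or -1 for "none" */
} demo_mpool_sm_module_t;

typedef struct {
    int mem_node;   /* value precomputed by the caller (assumed field) */
} demo_mpool_resources_t;

/* Module-level init: always leave mem_node in a defined state. */
void demo_module_init(demo_mpool_sm_module_t *module)
{
    assert(NULL != module);
    module->mem_node = -1;   /* legal value, but not yet a useful one */
}

/* Upper layer: it has the resources, so it overrides the default. */
void demo_component_init(demo_mpool_sm_module_t *module,
                         const demo_mpool_resources_t *resources)
{
    demo_module_init(module);
    if (NULL != resources) {
        module->mem_node = resources->mem_node;
    }
}

/* Mirrors the check Valgrind flagged: safe once mem_node is initialized. */
int demo_has_mem_node(const demo_mpool_sm_module_t *module)
{
    return module->mem_node >= 0;
}
```

With this split, the `>= 0` test never reads an uninitialized value, even if the upper layer forgets to set anything.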

On Mar 14, 2009, at 09:23 , Jeff Squyres wrote:


George --

Any particular reason you fixed it this way?


On Mar 10, 2009, at 1:40 PM, Åke Sandgren wrote:


On Tue, 2009-03-10 at 09:23 -0800, Eugene Loh wrote:
> Åke Sandgren wrote:
>
> >Hi!
> >
> >Valgrind seems to think that there is a use of an uninitialized value in
> >mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) {
> >Backtracking that, I found that mem_node is not set during initialization
> >in mca_mpool_sm_init.
> >The resources parameter is never used and mpool_module->mem_node is
> >never initialized.
> >
> >Bug or not?
> >
> >
> Apparently George fixed this in the trunk in r19257
> https://svn.open-mpi.org/source/history/ompi-trunk/ompi/mca/mpool/sm/mpool_sm_module.c
> .  So, the resources parameter is never used, but you call
> mca_mpool_sm_module_init(), which has the decency to set mem_node to
> -1.  Not a helpful value, but a legal one.

So why not set it in the calling function, which has access to the
precomputed resources value?

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems







[OMPI users] core dump while running openmpi

2009-03-14 Thread Ted Yu
Hi there:

I'm trying to install an old version of openmpi, 1.1.1,
on a 32-bit cluster and run a program with it.  This program runs fine on
another, 64-bit cluster which has openmpi 1.1.1 installed, but when running it
on the 32-bit cluster, I get the following error:

/var/spool/pbs/mom_priv/jobs/282832.borg.SC: line 37: 13154 Segmentation fault  (core dumped) /ul/tedhyu/openmpi/openmpi-1.1.1/install/bin/mpirun -machinefile ${PBS_NODEFILE} -np ${NPROCS} ${CODE}

Has anybody encountered this error before?  If you have any advice, it would be 
much appreciated.

Regards,

Ted


  

Re: [OMPI users] valgrind complaint in openmpi1.3 (mca_mpool_sm_alloc)

2009-03-14 Thread Jeff Squyres

George --

Any particular reason you fixed it this way?


On Mar 10, 2009, at 1:40 PM, Åke Sandgren wrote:


On Tue, 2009-03-10 at 09:23 -0800, Eugene Loh wrote:
> Åke Sandgren wrote:
>
> >Hi!
> >
> >Valgrind seems to think that there is a use of an uninitialized value in
> >mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) {
> >Backtracking that, I found that mem_node is not set during initialization
> >in mca_mpool_sm_init.
> >The resources parameter is never used and mpool_module->mem_node is
> >never initialized.
> >
> >Bug or not?
> >
> >
> Apparently George fixed this in the trunk in r19257
> https://svn.open-mpi.org/source/history/ompi-trunk/ompi/mca/mpool/sm/mpool_sm_module.c
> .  So, the resources parameter is never used, but you call
> mca_mpool_sm_module_init(), which has the decency to set mem_node to
> -1.  Not a helpful value, but a legal one.

So why not set it in the calling function, which has access to the
precomputed resources value?

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se




--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Run-time problem

2009-03-14 Thread Jeff Squyres
Sorry for the delay in replying; this week unexpectedly turned
exceptionally hectic for several of us...


On Mar 9, 2009, at 2:53 PM, justin oppenheim wrote:

Yes. As I indicated earlier, I did use these options to compile my  
program


MPI_CXX=/programs/openmpi/bin/mpicxx
MPI_CC=/programs/openmpi/bin/mpicc
MPI_INCLUDE=/programs/openmpi/include/
MPI_LIB=mpi /programs/openmpi/
MPI_LIBDIR=/programs/openmpi/lib/
MPI_LINKERFORPROGRAMS=/programs/openmpi/bin/mpicxx


Ah; I think Ralph was asking because we don't know exactly how these
"environment variables" are being used to build your application.


where /programs/openmpi/ is the chosen location for installing the  
openmpi package (specifically, openmpi-1.3.tar.gz)  that I  
downloaded from  www.open-mpi.org.


Can you ensure that you have exactly the same version of Open MPI
installed on all nodes in exactly the same location in the filesystem
(it doesn't *have* to be the same location on the filesystem on all
the nodes, but it sure is easier if it is)?  Also be sure that when
you mpirun across multiple nodes, the same version of Open MPI
(both executables and libraries) is being found on all nodes.




Any clue? Again, my system is Suse 10.3 64-bit, which should be  
pretty standard. Would another package openmpi-1.3-1.src.rpm work  
better for my system?


Thanks,

JO





--- On Mon, 3/9/09, Ralph Castain  wrote:
From: Ralph Castain 
Subject: Re: [OMPI users] Run-time problem
To: jl09...@yahoo.com
Cc: us...@open-mpi.org
Date: Monday, March 9, 2009, 7:59 AM

Did you try compiling your program with the provided mpicc (or  
mpiCC, mpif90, etc. - as appropriate) wrapper compiler? The wrapper  
compilers contain all the required library definitions to make the  
application work.


Compiling without the wrapper compilers is a very bad idea...

Ralph


On Mar 6, 2009, at 11:02 AM, justin oppenheim wrote:

Please let me go over it again; maybe it helps clarify
things a bit. All the OSes involved are Suse 10.3.

I have a place for the installed programs, say /programs.

In /programs I have installed openmpi and my mpi program, say  
my_mpi_program.  When I am in the working directory, my  
LD_LIBRARY_PATH does include both


/programs/my_mpi_program/lib
/programs/openmpi/lib

And my PATH includes
/programs/my_mpi_program/bin
/programs/openmpi/bin

So, then I do

mpirun -machinefile machinefile  -np 20 my_mpi_program 

and I get

/programs/my_mpi_program: symbol lookup error: /programs/openmpi/lib/libmpi_cxx.so.0: undefined symbol: ompi_registered_datareps


When I configured openmpi, I did

./configure --prefix=/programs/openmpi

and then compiled it. Subsequently, I compiled my_mpi_program with  
the options:


MPI_CXX=/programs/openmpi/bin/mpicxx
MPI_CC=/programs/openmpi/bin/mpicc
MPI_INCLUDE=/programs/openmpi/include/
MPI_LIB=mpi
MPI_LIBDIR=/programs/openmpi/lib/
MPI_LINKERFORPROGRAMS=/programs/openmpi/bin/mpicxx


Any clue? The directory /programs is NFS mounted on the nodes.

Many thanks again,

JO










--- On Thu, 3/5/09, justin oppenheim  wrote:
From: justin oppenheim 
Subject: Re: [OMPI users] Run-time problem
To: "Ralph Castain" 
Date: Thursday, March 5, 2009, 5:28 PM

Hi Ralph:

Sorry for my ignorance, but in your option 2: to which command should I
add the option --prefix=path-to-install? When I configure openmpi? I already did
that when I configured and compiled openmpi.  Also, in response to
your option 1, I did add the paths to the openmpi libraries to
LD_LIBRARY_PATH in the .cshrc of the nodes.


Thank you,
JO

--- On Thu, 3/5/09, Ralph Castain  wrote:
From: Ralph Castain 
Subject: Re: [OMPI users] Run-time problem
To: jl09...@yahoo.com
Cc: "Open MPI Users " 
Date: Thursday, March 5, 2009, 12:46 PM

First, you can add --launch-agent rsh to the command line and that  
will have OMPI use rsh.


It sounds like your remote nodes may not be seeing your OMPI  
install directory. Several ways you can resolve that - here are a  
couple:


1. add the install directory to your LD_LIBRARY_PATH in your .cshrc  
(or whatever shell rc you are using) - be sure this is being  
executed on the remote nodes


2. add --prefix=path-to-install on your cmd line - this will direct  
your remote procs to the proper libraries


Ralph


On Mar 5, 2009, at 10:18 AM, justin oppenheim wrote:


Maybe I should also add that the program
my_mpi_executable is locally installed under the same root  
directory as that under which  openmpi-1.3 is installed. This root  
directory is NFS mounted on the working nodes.


Thanks,
JO

--- On Thu, 3/5/09, justin oppenheim  wrote:
From: justin oppenheim 
Subject: Re: [OMPI users] Run-time problem
To: "Ralph Castain" 
Date: Thursday, March 5, 2009, 12:04 PM

Hi Ralph:

Thanks for your 

Re: [OMPI users] Can't start program across network

2009-03-14 Thread Jeff Squyres

Can you send all the information here:

http://www.open-mpi.org/community/help/

(including the network information)

Thanks!


On Mar 13, 2009, at 9:12 PM, Raymond Wan wrote:



Hi Jeff,


Jeff Squyres wrote:
> On Mar 13, 2009, at 6:17 AM, Raymond Wan wrote:
>
>> What doesn't work is:
>>
>> [On Y] mpirun --host Y,Z --np 2 uname -a
>> [On Y] mpirun --host X,Y,Z --np 3 uname -a
>>
>> ...and similarly for machine Z.  I can confirm that from any of the 3
>
> Do you see "rsh" or "ssh" in the output of "ps -eadf" when mpirun is
> hanging, perchance?  If you do, what happens if you copy-n-paste those
> command lines and run them manually?
>


No, I don't see either rsh or ssh when mpirun is hanging.  Is that  
odd?  Something I'm doing wrong?


I only see an mpirun command and an orted command.


rwan 22800 22761  0 09:52 pts/2    00:00:00 mpirun --host X,Y,Z --np 3 sleep 1000
rwan 22804     1  0 09:52 ?        00:00:00 orted --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0 --nodename Y --universe rwan@Y:default-universe-22800 --nsreplica "0.0.0;tcp://Y:36889" --gprreplica "0.0.0;tcp://Y:36889" --set-sid



Actually, when I run the above mpirun command, I don't see "sleep"  
running locally on machine Y, either.  However, if I did this:


mpirun --host Y --np 3 sleep 1000

I see 3 instances of "sleep" when I do ps -aedf.  Does mpirun try to  
"ssh" all networked machines first before it starts the program  
(even if one of those instances will run locally?).  Perhaps  
unrelated...but when I am on Y and I do an rsh to Z, I get a "No  
route to host".  I asked the sysadmin about it (I'm not the sysadmin  
of Y or Z) and he doesn't know why but as we should be using ssh  
anyway, he isn't going to address the problem (unless it is a side-effect
of my mpirun problem). I only presume rsh hasn't been set up
properly; ssh works fine, though.


Thank you!

Ray





--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Problem in MPI::Finalize when freeing intercommunicators

2009-03-14 Thread Jeff Squyres

On Mar 13, 2009, at 5:15 PM, Mikael Djurfeldt wrote:

On Fri, Mar 13, 2009 at 9:28 PM, Jeff Squyres wrote:

> No you should not need to do this.
>
> Is there any chance you could upgrade to Open MPI v1.3?

Yes. It works without a Barrier under v1.3.  Is this a known problem?



Possibly...?  I can't name any particular issue offhand that is a  
known culprit for this, but it's possible someone else can.  There are  
many changes and fixes in the v1.3 series as compared to the v1.2  
series.



What is the best way for me to test in my configure script that I'm
running under OpenMPI version >= 1.3 so that I can disable the Barrier
for such versions?




In mpi.h, we have a few macros that should help you:

-
/*
 * Just in case you need it.  :-)
 */
#define OPEN_MPI 1

/* Major, minor, and release version of Open MPI */
#define OMPI_MAJOR_VERSION 1
#define OMPI_MINOR_VERSION 3
#define OMPI_RELEASE_VERSION 0
-

You should be able to construct a fairly simple AC_TRY_RUN test that  
checks #if defined(), etc.
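
As a rough sketch of what such a test body might look like (an illustration only, not an official Open MPI recipe): the `demo_` function names are made up for this example, and the choice to keep the Barrier for non-Open MPI implementations is an assumption. Only the `OPEN_MPI`, `OMPI_MAJOR_VERSION` and `OMPI_MINOR_VERSION` macros come from mpi.h as quoted above. The `__has_include` guard is there just so the snippet also compiles on a machine without MPI installed; a real configure test would simply `#include <mpi.h>`.

```c
#include <assert.h>

/* Pull in mpi.h only if the compiler can find it (illustrative guard). */
#if defined(__has_include)
#  if __has_include(<mpi.h>)
#    include <mpi.h>
#  endif
#endif

/* Return 1 if (major, minor) >= (want_major, want_minor), else 0. */
int demo_version_at_least(int major, int minor, int want_major, int want_minor)
{
    assert(major >= 0 && minor >= 0);
    if (major != want_major) {
        return major > want_major;
    }
    return minor >= want_minor;
}

/* Return 1 if the workaround Barrier should be kept, 0 if it can go. */
int demo_need_finalize_barrier(void)
{
#if defined(OPEN_MPI) && defined(OMPI_MAJOR_VERSION) && defined(OMPI_MINOR_VERSION)
    /* Open MPI >= 1.3 worked without the Barrier in this thread. */
    return !demo_version_at_least(OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, 1, 3);
#else
    /* Not Open MPI, or version unknown: keep the Barrier (assumption). */
    return 1;
#endif
}
```

Since the macros are compile-time constants, the same `#if` logic could also be used directly in the application source instead of in the configure script.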


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h

2009-03-14 Thread Jeff Squyres
Oops!  I sent the patch to George but didn't send it to everyone  
else.  Here's a patch showing how I propose to fix this problem:


Index: ompi/mca/op/op.h
===================================================================
--- ompi/mca/op/op.h	(revision 20777)
+++ ompi/mca/op/op.h	(working copy)
@@ -258,14 +258,41 @@
 typedef ompi_op_base_handler_fn_1_0_0_t ompi_op_base_handler_fn_t;

 /*
+ * Per the thread starting here:
+ *
+ * http://www.open-mpi.org/community/lists/users/2009/03/8402.php
+ *
+ * We [re-]discovered that AC_C_RESTRICT only checks for "restrict" in
+ * the C compiler.  But this header file is included in components.cc
+ * (i.e., ompi_info), so the "restrict" here may be problematic for
+ * the C++ compiler.
+ *
+ * Since we *know* that this function is only used in C code in OMPI
+ * (e.g., it's not used in ompi_info or the C++ bindings), just
+ * have an "alternate"
+ */
+#if defined(c_plusplus) || defined(__cplusplus)
+#define OMPI_SAFE_RESTRICT
+#else
+#define OMPI_SAFE_RESTRICT restrict
+#endif
+/*
  * Typedef for 3-buffer (two input and one output) op functions.
  */
-typedef void (*ompi_op_base_3buff_handler_fn_1_0_0_t)(void *,
-  void *,
-  void *, int *,
+typedef void (*ompi_op_base_3buff_handler_fn_1_0_0_t)(void *OMPI_SAFE_RESTRICT,
+  void *OMPI_SAFE_RESTRICT,
+  void *OMPI_SAFE_RESTRICT,
+  int *,
   struct ompi_datatype_t **,
   struct ompi_op_base_module_1_0_0_t *);


+/*
+ * We don't want anyone else using OMPI_SAFE_RESTRICT elsewhere in the
+ * code base; this hack is only because we don't have an
+ * AC_CXX_RESTRICT Autoconf test.
+ */
+#undef OMPI_SAFE_RESTRICT
+
 typedef ompi_op_base_3buff_handler_fn_1_0_0_t ompi_op_base_3buff_handler_fn_t;


 /**



On Mar 14, 2009, at 8:22 AM, Jeff Squyres (jsquyres) wrote:


Yes, it does.

In re-looking at this problem, it seemed to me:

1. The real fix is to talk to the AC people and get something like
AC_CXX_RESTRICT.  The PGI compiler is one place where "restrict"
support may be different in the C and C++ compilers.  I'm not sure
what the Right answer is there, but I'll ask them about it.

2. In this specific case, the use of "restrict" *does not matter* in
components.cc.  This particular part of the file is not what
components.cc needs/uses.  So it's ok to #define it away to nothing.

3. Since this problem now exists in at least *2* compilers that we
know about (Sun, PGI), it seemed that -- at least while waiting for
some kind of proper fix from AC -- just #define restrict away for C++
for this particular case was ok, rather than try to adapt to every
compiler.  Rolf's fix was ok previously because we thought it was
specific to one compiler.  But now the door is open to other
compilers, so let's use a broad stroke to work around it for all C++
compilers.

That's why I coded it up this way.



On Mar 14, 2009, at 7:39 AM, Terry Dontje wrote:

> You know this all looks very similar to the reason why rolfv putback
> r20351 which essentially defined out restrict within
> opal_config_bottom.h when using Sun Studio.
>
> --td
>
> Date: Fri, 13 Mar 2009 16:40:49 -0400
> From: Jeff Squyres 
> Subject: Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h
> To: "Open MPI Users" 
> Message-ID: <2aca69ab-5f23-4ae9-8826-77a6348e9...@cisco.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> On Mar 13, 2009, at 4:37 PM, Mostyn Lewis wrote:
>
>
> > > >From config.log
> > >
> > > configure:21522: checking for C/C++ restrict keyword
> > > configure:21558: pgcc -c -DNDEBUG -fast -Msignextend -tp p7-64   conftest.c >&5
> > > configure:21564: $? = 0
> > > configure:21582: result: restrict
> > >
> > > So you only check using pgcc (not pgCC)?
> > >
> >
>
> The AC_C_RESTRICT test only checks the C compiler, yes.  It's an
> Autoconf-builtin test; we didn't write it.
>
> Odd that you get "restrict" and I get "__restrict".  Hrm.
>
> Well, I suppose that one solution might be to disable those prototypes
> in the op.h header file when they're included in components.cc (that's
> a source file in the ompi_info executable; it shouldn't need the
> specific MPI_Op callback prototypes).  Fortunately, we have very little
> C++ code in OMPI, so this isn't a huge issue (C++ is only used for the
> MPI C++ bindings -- of course -- and in some of the command line
> executables).
> Let me see what I can cook up, and then let me see if I can convince
> George that it's the correct answer.   ;-)
> -- Jeff Squyres Cisco Systems

Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h

2009-03-14 Thread Jeff Squyres

Yes, it does.

In re-looking at this problem, it seemed to me:

1. The real fix is to talk to the AC people and get something like  
AC_CXX_RESTRICT.  The PGI compiler is one place where "restrict"  
support may be different in the C and C++ compilers.  I'm not sure  
what the Right answer is there, but I'll ask them about it.


2. In this specific case, the use of "restrict" *does not matter* in  
components.cc.  This particular part of the file is not what  
components.cc needs/uses.  So it's ok to #define it away to nothing.


3. Since this problem now exists in at least *2* compilers that we  
know about (Sun, PGI), it seemed that -- at least while waiting for  
some kind of proper fix from AC -- just #define restrict away for C++  
for this particular case was ok, rather than try to adapt to every  
compiler.  Rolf's fix was ok previously because we thought it was  
specific to one compiler.  But now the door is open to other  
compilers, so let's use a broad stroke to work around it for all C++  
compilers.


That's why I coded it up this way.
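
To make the idiom in points 2 and 3 concrete outside of op.h, here is a small standalone sketch. It is not OMPI code: `DEMO_RESTRICT`, `demo_3buff_fn_t` and `demo_sum_3buff` are hypothetical names, and the signature is simplified to plain `int` buffers. It only shows the shape of the workaround: a macro that expands to `restrict` for C and to nothing for C++, used in a 3-buffer prototype and then undefined so it does not leak into other code.

```c
#include <assert.h>
#include <stddef.h>

/* Expand to "restrict" only when compiled as C; C++ sees nothing. */
#if defined(c_plusplus) || defined(__cplusplus)
#define DEMO_RESTRICT
#else
#define DEMO_RESTRICT restrict
#endif

/* Simplified 3-buffer (two input, one output) handler type. */
typedef void (*demo_3buff_fn_t)(const int *DEMO_RESTRICT in1,
                                const int *DEMO_RESTRICT in2,
                                int *DEMO_RESTRICT out,
                                size_t count);

/* Example handler: element-wise sum of two non-overlapping buffers. */
void demo_sum_3buff(const int *DEMO_RESTRICT in1,
                    const int *DEMO_RESTRICT in2,
                    int *DEMO_RESTRICT out,
                    size_t count)
{
    size_t i;
    assert(NULL != in1 && NULL != in2 && NULL != out);
    for (i = 0; i < count; ++i) {
        out[i] = in1[i] + in2[i];
    }
}

/* Keep the macro local to these declarations. */
#undef DEMO_RESTRICT
```

A C++ translation unit that includes such a header still gets usable prototypes; it just loses the aliasing hint, which does not matter if it never calls the function.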



On Mar 14, 2009, at 7:39 AM, Terry Dontje wrote:


You know this all looks very similar to the reason why rolfv putback
r20351 which essentially defined out restrict within
opal_config_bottom.h when using Sun Studio.

--td

Date: Fri, 13 Mar 2009 16:40:49 -0400
From: Jeff Squyres 
Subject: Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h
To: "Open MPI Users" 
Message-ID: <2aca69ab-5f23-4ae9-8826-77a6348e9...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

On Mar 13, 2009, at 4:37 PM, Mostyn Lewis wrote:


> > >From config.log
> >
> > configure:21522: checking for C/C++ restrict keyword
> > configure:21558: pgcc -c -DNDEBUG -fast -Msignextend -tp p7-64   conftest.c >&5
> > configure:21564: $? = 0
> > configure:21582: result: restrict
> >
> > So you only check using pgcc (not pgCC)?
> >
>

The AC_C_RESTRICT test only checks the C compiler, yes.  It's an
Autoconf-builtin test; we didn't write it.

Odd that you get "restrict" and I get "__restrict".  Hrm.

Well, I suppose that one solution might be to disable those prototypes
in the op.h header file when they're included in components.cc (that's
a source file in the ompi_info executable; it shouldn't need the
specific MPI_Op callback prototypes).  Fortunately, we have very little
C++ code in OMPI, so this isn't a huge issue (C++ is only used for the
MPI C++ bindings -- of course -- and in some of the command line
executables).

Let me see what I can cook up, and then let me see if I can convince
George that it's the correct answer.   ;-)
-- Jeff Squyres Cisco Systems



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h

2009-03-14 Thread Terry Dontje
You know this all looks very similar to the reason why rolfv putback 
r20351 which essentially defined out restrict within 
opal_config_bottom.h when using Sun Studio.


--td

List-Post: users@lists.open-mpi.org
Date: Fri, 13 Mar 2009 16:40:49 -0400
From: Jeff Squyres 
Subject: Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h
To: "Open MPI Users" 
Message-ID: <2aca69ab-5f23-4ae9-8826-77a6348e9...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

On Mar 13, 2009, at 4:37 PM, Mostyn Lewis wrote:



> >From config.log
>
> configure:21522: checking for C/C++ restrict keyword
> configure:21558: pgcc -c -DNDEBUG -fast -Msignextend -tp p7-64   conftest.c >&5

> configure:21564: $? = 0
> configure:21582: result: restrict
>
> So you only check using pgcc (not pgCC)?
>



The AC_C_RESTRICT test only checks the C compiler, yes.  It's an
Autoconf-builtin test; we didn't write it.


Odd that you get "restrict" and I get "__restrict".  Hrm.

Well, I suppose that one solution might be to disable those prototypes  
in the op.h header file when they're included in components.cc (that's  
a source file in the ompi_info executable; it shouldn't need the  
specific MPI_Op callback prototypes).  Fortunately, we have very little
C++ code in OMPI, so this isn't a huge issue (C++ is only used for the
MPI C++ bindings -- of course -- and in some of the command line
executables).


Let me see what I can cook up, and then let me see if I can convince  
George that it's the correct answer.   ;-)

-- Jeff Squyres Cisco Systems


Re: [OMPI users] Compiling ompi for use on another machine

2009-03-14 Thread Raymond Wan


Hi Ben,


ben rodriguez wrote:

I have compiled ompi and another program for use on another rhel5/x86_64
machine. After transferring the binaries and setting up environment variables, is
there anything else I need to do for ompi to run properly? When executing my
prog I get:
--
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 1.


NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--



Just a few thoughts about your problem...

Are the two machines identical in architecture and RH installation?  Is there any reason 
why you cannot compile on the other machine too?  (Sometimes the location of dynamic 
libraries, etc. changes so I try to make a note to always recompile on each machine.)  
Are you having problems running your program on each node individually first?  If not, 
you might try that first (i.e., with "--np 1").

Ray



Re: [OMPI users] MPI jobs ending up in one node

2009-03-14 Thread Peter Teoh
Oops, sorry, it is in the Intel MPI library.   Thanks!!!

On Fri, Mar 13, 2009 at 9:47 PM, Ralph Castain  wrote:
> Hmmm...your comments don't sound like anything relating to Open MPI. Are you
> sure you are not using some other MPI?
>
> Our mpiexec isn't a script, for example, nor do we have anything named
> I_MPI_PIN_PROCESSOR_LIST in our code.
>
> :-)
>
> On Mar 13, 2009, at 4:00 AM, Peter Teoh wrote:
>
>> I saw the following problem posed somewhere - can anyone shed some
>> light?   Thanks.
>>
>> I have a cluster of 8-sock quad core systems running Redhat 5.2. It
>> seems that whenever I try to run multiple MPI jobs to a single node
>> all the jobs end up running on the same processors. For example, if I
>> were to submit 4 8-way jobs to a single box they all end up in CPUs 0
>> to 7, leaving 8 to 31 idle.
>>
>> I then tried all sorts of I_MPI_PIN_PROCESSOR_LIST combinations but
>> short of explicitly listing out the processors at each run, they all
>> end up still hanging on to CPUs 0-7. Browsing through the mpiexec
>> script, I realise that it is doing a taskset on each run.
>> As my jobs are all submitted through a scheduler (PBS in this case) I
>> cannot possibly know at job submission time which CPUs are not used.
>> So is there a simple way to tell mpiexec to set the taskset affinity
>> correctly at each run so that it will choose only the idle processors?
>> Thanks.
>>
>> --
>> Regards,
>> Peter Teoh



-- 
Regards,
Peter Teoh