Re: [OMPI devel] Migrate the OpenMPI to VxWorks

2010-04-16 Thread Ralf Wildenhues
* Ralph Castain wrote on Fri, Apr 16, 2010 at 05:35:37AM CEST:
> I have not personally tried, but I am pretty sure that you can install
> the autotools under VxWorks - have you tried to download the latest
> autotool tarballs and build them?

I don't know if that works well out of the box, but if you build any of
the autotools, run their testsuites and find any failures, then we would
be happy to hear about them at the respective mailing lists.  If the
testsuites pass, you can be fairly confident that they work well on your
system.

To find out where the Open MPI configure script hangs, try running it
after adding 'set -x' as the second line of the script.  The output will
be large, so beware.  If /bin/sh is not a POSIX shell, you might want to
try something like
  CONFIG_SHELL=/bin/bash; export CONFIG_SHELL
  $CONFIG_SHELL ./configure [OPTIONS...]

instead.
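As a concrete sketch of the 'set -x' trick (the file names below are illustrative, not from the original thread), you can apply it to a copy so the original script stays untouched:

```shell
# Demonstrate the 'set -x' tracing trick on a stand-in for configure.
# 'configure.demo' is a tiny illustrative script, not the real OMPI one.
printf '#!/bin/sh\necho "checking environment..."\n' > configure.demo
# Insert 'set -x' as the second line, writing a traced copy.
sed '1a\
set -x' configure.demo > configure.traced
# Run it; the trace on stderr shows the last command reached before a hang.
sh configure.traced 2> trace.log
cat trace.log
```

The last line of trace.log is the command the script was executing when it stopped, which is usually enough to locate the offending shell construct.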

Hope that helps,
Ralf


[OMPI devel] Re: [OMPI devel] Migrate the OpenMPI to VxWorks

2010-04-16 Thread Jing Zhang (张晶)
Hi Castain

Does “install the autotools under VxWorks” mean installing the autotools on
the host or on the target?

Jing Zhang

From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of
Ralph Castain
Sent: April 16, 2010 11:36
To: Open MPI Developers
Subject: Re: [OMPI devel] Migrate the OpenMPI to VxWorks

 

I have not personally tried, but I am pretty sure that you can install the
autotools under VxWorks - have you tried to download the latest autotool
tarballs and build them?

 

On Apr 15, 2010, at 9:30 PM, Jing Zhang (张晶) wrote:





Hello everyone,

In order to port Open MPI to VxWorks, I have set up a VxWorks development
environment with Workbench 3.0. The good news is that Workbench supports the
GNU build tools, bash, and some frequently used commands such as sed, awk,
and so on, but not the Autotools (Autoconf/Automake). I had hoped that a
configure script generated under Linux might work under the Workbench shell
environment, but it was frustrating to find that when I launch configure, it
just hangs without any prompt. I looked into the problem and suspect that the
version of bash in VxWorks is too old to parse the configure script.

Then I came across CMake, which is used as the build tool for Open MPI on
Windows. Because CMake is cross-platform, I think it may be possible to use
it for the port. I also found the boostcmake project, which supports the
VxWorks GNU compile tools using CMake.

But considering my limited knowledge of build systems (Autotools, CMake,
etc.), I need some guidance and advice on whether what I plan to do is
feasible.

Thank you in advance!

Jing Zhang

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

 



Re: [OMPI devel] RFC: Deprecate rankfile?

2010-04-16 Thread Terry Dontje

Ralph,

If you are suggesting that you will write code that breaks a current 
rankfile feature (note I am not talking about adding a new feature that 
rankfile doesn't support, but about something that used to work), then I 
think you are acting in poor form.  At a minimum you should give the 
community a heads up that you are borking a feature.


There are a lot of pieces of code that everyone changes that require 
follow-on changes one is not completely responsible for.  If everyone 
decided it wasn't necessary to maintain/support those pieces, our code 
would soon be useless (maybe it already is).


--td

Ralph Castain wrote:

I read the other "no" votes, and I can understand the points made. I noted that 
neither respondent offered to maintain this feature, though both indicated that it has some value.

There really isn't any need to make a decision about this because it will take care of 
itself. Since no-one is maintaining it (and I really don't have time to continue to do 
so), and it tends to break easily, this module will "self-deprecate" rather 
soon. When that happens, we can wait and see if anyone cares enough to step forward and 
take responsibility for maintaining it.

If not, then we can note that support for this feature went the way of other 
things that died due to lack of interest within the developer community (e.g., 
xgrid)...which is totally okay since this is, after all, an open source effort.


On Apr 15, 2010, at 4:00 PM, Jeff Squyres wrote:

  

WHAT: Deprecate the "rankfile" component in the v1.5 series; remove it in the 
1.7 series

WHY: It's old, creaky, and difficult to maintain.  It'll require maintenance 
someday soon, when we support hardware / hyperthreads in OMPI.

WHERE: svn rm orte/mca/rmaps/rank_file

WHEN: Deprecate in 1.5, remove in 1.7.

TIMEOUT: Teleconf on Tue, 27 April 2010

-

Now that we have nice paffinity binding options, can we deprecate rankfile in 
the 1.5 series and remove it in 1.7?

I only ask because it's kind of a pain to maintain.  It's not a huge deal, but 
Ralph and I were talking about another paffinity issue today and we both 
groaned at the prospect of extending rank file to support hyperthreads (and/or 
boards).  Perhaps it should just go away...?

Pro: less maintenance, especially since the original developers no longer 
maintain it

Con: it's the only way to get completely custom affinity bindings (i.e., outside of our 
pre-constructed "bind to socket", "bind to core", etc. options).  ...do any 
other MPIs offer completely custom binding options?  I.e., do any real users care?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/









--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



Re: [OMPI devel] RFC: Deprecate rankfile?

2010-04-16 Thread Jeff Squyres
On Apr 16, 2010, at 6:43 AM, Terry Dontje wrote:

> If you are suggesting that you will write code that breaks a current rankfile 
> feature (note I am not talking about adding a new feature that rankfile 
> doesn't support, but about something that used to work), then I think you are 
> acting in poor form.  At a minimum you should give the community a heads up 
> that you are borking a feature.

Er... no.

There is nothing nefarious going on here.  Ralph and I were just chatting 
yesterday about some process affinity issues and the topic of rank_file came up 
(again).  Remember that rank_file was a "throw over the wall" kind of code 
contribution and has historically been difficult to maintain.  Neither of us 
were excited at the prospect of adding hyperthreading support (once hwloc is 
finally released -- unfortunately, it's blocking on me, at the moment...) and 
also having to extend rank file to support it.

I asked Ralph if we should deprecate rank_file since the other binding options 
are available.  He assumed (correctly, it turns out) that no one would go for 
that.  But I figured I'd ask anyway.

I think all Ralph is saying is that we're (I'm) likely to add hyperthreading 
support in the not-distant future (and maybe Oracle will add support for 
boards).  This work is not likely to *break* rank_file, but neither of us are 
excited about extending rank_file to support hyperthreading.  If no one else 
steps up to extend it, then it may become obsolete over time because it doesn't 
support the things that people want.

Terry -- perhaps it's time to resurrect the new processor affinity proposal 
that you've been promising me for many months.  If rank_file were replaced with 
Something Better, I'd certainly be happy.  ;-)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Deprecate rankfile?

2010-04-16 Thread Terry Dontje

Jeff Squyres wrote:

On Apr 16, 2010, at 6:43 AM, Terry Dontje wrote:

  

If you are suggesting that you will write code that breaks a current rankfile 
feature (note I am not talking about adding a new feature that rankfile doesn't 
support, but about something that used to work), then I think you are acting in 
poor form.  At a minimum you should give the community a heads up that you are 
borking a feature.



Er... no.

There is nothing nefarious going on here.  Ralph and I were just chatting yesterday about 
some process affinity issues and the topic of rank_file came up (again).  Remember that 
rank_file was a "throw over the wall" kind of code contribution and has 
historically been difficult to maintain.  Neither of us were excited at the prospect of 
adding hyperthreading support (once hwloc is finally released -- unfortunately, it's 
blocking on me, at the moment...) and also having to extend rank file to support it.

I asked Ralph if we should deprecate rank_file since the other binding options 
are available.  He assumed (correctly, it turns out) that no one would go for 
that.  But I figured I'd ask anyway.

I think all Ralph is saying is that we're (I'm) likely to add hyperthreading 
support in the not-distant future (and maybe Oracle will add support for 
boards).  This work is not likely to *break* rank_file, but neither of us are 
excited about extending rank_file to support hyperthreading.  If no one else 
steps up to extend it, then it may become obsolete over time because it doesn't 
support the things that people want.

  
I am ok with the above. 

Terry -- perhaps it's time to resurrect the new processor affinity proposal 
that you've been promising me for many months.  If rank_file were replaced with 
Something Better, I'd certainly be happy.  ;-)

  

Can we then have Ralph implement it :-)...  That was a joke Ralph!!!


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



Re: [OMPI devel] RFC: Deprecate rankfile?

2010-04-16 Thread Ralph Castain
To be clear, I wasn't implying anyone would intentionally break rank_file. 
However, it is rarely (if ever?) tested before we release - AFAIK, none of the 
MTT tests run by the community test this feature. Thus, it inevitably breaks 
without detection as changes are made elsewhere in the system. We typically 
don't know it is broken until someone complains about it, which usually is 
several months after the release.

So I'll stand by my "self deprecate" comment. It has been the history of this 
feature, and I don't see anything changing to improve that situation.

Now if you implement a replacement... :-)




Re: [OMPI devel] Re: Migrate the OpenMPI to VxWorks

2010-04-16 Thread Ralph Castain
You would install it on the host where you are doing development. Only the 
eventual OMPI libraries get moved to the target.




Re: [OMPI devel] RFC: Deprecate rankfile?

2010-04-16 Thread Terry Dontje

Ralph Castain wrote:
To be clear, I wasn't implying anyone would intentionally break 
rank_file. However, it is rarely (if ever?) tested before we release - 
AFAIK, none of the MTT tests run by the community test this feature. 
Thus, it inevitably breaks without detection as changes are made 
elsewhere in the system. We typically don't know it is broken until 
someone complains about it, which usually is several months after the 
release.



Fair enough.  I guess my yellow fever shot has made me cranky today.

So I'll stand by my "self deprecate" comment. It has been the history 
of this feature, and I don't see anything changing to improve that 
situation.


Now if you implement a replacement... :-)
I'll get right on that after you approve the RFC that I am also supposed 
to send out :-).


-td









--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



[OMPI devel] Re: [OMPI devel] Re: Migrate the OpenMPI to VxWorks

2010-04-16 Thread Jing Zhang (张晶)
Hi Castain

 

I think I should switch my host from Windows to Linux now, or I will have
little chance of building the Autotools. Thank you for your advice!

 

JING ZHANG

 

From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of
Ralph Castain
Sent: April 16, 2010 20:36
To: Open MPI Developers
Subject: Re: [OMPI devel] Re: Migrate the OpenMPI to VxWorks

 

You would install it on the host where you are doing development. Only the
eventual OMPI libraries get moved to the target.

 


 



Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-16 Thread Ralph Castain
Well, I guess I got sucked back into paffinity again...sigh.

I have committed a solution to this issue in r22984 and r22985. I have tested 
it against a range of scenarios, but hardly an exhaustive test. So please do 
stress it.

The following comments are by no means intended as criticism, but rather as me 
taking advantage of an opportunity to educate the general community regarding 
this topic. Since my available time to maintain this capability has diminished, 
the more people who understand all the nuances of it, the more likely we are to 
efficiently execute changes such as this one.

I couldn't use the provided patch for several reasons:

* you cannot print a message out of an odls module after the fork occurs unless 
you also report an error - i.e., you cannot print a message out as a warning 
and then continue processing. If you do so, everything will appear correct when 
you are operating in an environment where no processes are local to mpirun - 
e.g., when running under slurm as it is normally configured. However, when 
processes are local to mpirun, then using orte_show_help after the fork causes 
the print statement to occur in separate process instances. This prevents 
mpirun from realizing that multiple copies of the message are being printed, 
and thus it cannot aggregate them.

As a result, the provided patch generated one warning for every local process, 
plus one aggregated warning for all the remote processes. This isn't what we 
wanted users to see.

The correct solution was to write an integer indicating the warning to be 
issued back to the parent process, and then let that process output the actual 
warning. This allows mpirun to aggregate the result.
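The report-back pattern described above (the child sends a numeric warning code over a pipe; the parent turns codes into a single aggregated message) can be sketched in shell. All names here are illustrative; the real odls/mpirun code does this in C:

```shell
# Sketch of the report-back pattern: a forked child writes a small integer
# warning code to a pipe, and the parent emits the message exactly once,
# rather than each child printing its own copy.
fifo=./warn.fifo
mkfifo "$fifo"
( echo 42 > "$fifo" ) &            # child: report warning code 42
read -r code < "$fifo"             # parent: collect the code
printf '%s\n' "$code" > collected.code
echo "warning $code (printed once by the parent, not per child)"
wait
rm -f "$fifo"
```

With many children, the parent would read one code per child and print each distinct warning once with a count, which is exactly the aggregation mpirun performs.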

Nadia wasn't the only one to make this mistake. I found that I had also made it 
in an earlier revision when reporting the "could not bind" message. So it is an 
easy mistake to make, but one we need to avoid.

* it didn't address the full range of binding scenarios - it only addressed 
bind-to-socket. While I know that solved Nadia's immediate problem, it helps if 
we try to address the broader issue when making such changes. Otherwise, we 
wind up with a piecemeal approach to the problem. So I added support for all 
the binding methods in the odls_default module.

* it missed the use-case where processes are launched outside of mpirun with 
paffinity_alone or slot-list set - e.g., when direct-launching processes in 
slurm. In this case, MPI_Init actually attempts to set the process affinity - 
the odls is never called.

Here is why it is important to remember that use-case. While implementing the 
warning message there, I discovered that the code in ompi_mpi_init.c would 
actually deal with Nadia's scenario incorrectly. It would identify the process 
as unbound because it had been "bound" to all available processors. Since 
paffinity_alone is set, it would then have automatically bound the process to a 
single core based on that process' node rank.

So even though the odls had "bound" us to the socket, mpi_init would turn 
around and bind us to a core - which is not at all what Nadia wanted to have 
happen.

The solution here was to pass a parameter to the spawned process indicating 
that mpirun had "bound" it, even if the "binding" was a no-op. This value is 
then checked in mpi_init - if set, mpi_init makes no attempt to re-bind the 
process. If not set, then mpi_init is free to do whatever it deems appropriate.

So looking at all the use-cases can expose some unintended interactions. 
Unfortunately, I suspect that many people are unaware of this second method of 
setting affinity, and so wouldn't realize that their intended actions were not 
getting the desired result.

Again, no criticism intended here. Hopefully, the above explanation will help 
future changes!
Ralph


On Apr 13, 2010, at 5:34 AM, Nadia Derbey wrote:

> On Tue, 2010-04-13 at 01:27 -0600, Ralph Castain wrote:
>> On Apr 13, 2010, at 1:02 AM, Nadia Derbey wrote:
>> 
>>> On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
 By definition, if you bind to all available cpus in the OS, you are
 bound to nothing (i.e., "unbound") as your process runs on any
 available cpu.
 
 
 PLPA doesn't care, and I personally don't care. I was just explaining
 why it generates an error in the odls.
 
 
 A user app would detect its binding by (a) getting the affiinity mask
 from the OS, and then (b) seeing if the bits are set to '1' for all
 available processors. If it is, then you are not bound - there is no
 mechanism available for checking "are the bits set only for the
 processors I asked to be bound to". The OS doesn't track what you
 asked for, it only tracks where you are bound - and a mask with all
 '1's is defined as "unbound".
 
 
 So the reason for my question was simple: a user asked us to "bind"
 their process. If their process checks to see if it is bound, it will
 return "no". The user woul

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-16 Thread Ralph Castain
Forgot to mention this tip for debugging paffinity:

There is a test module in the paffinity framework. The module has mca params 
that let you define the number of sockets/node (default: 4) and the 
#cores/socket (also default: 4). So by setting -mca paffinity test and 
adjusting those two parameters, you can test a fairly wide range of 
configurations without being constrained by available hardware.

Get the param names with: ompi_info --param paffinity test
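A hedged sketch of what such an invocation might look like (the MCA parameter names below are guesses for illustration only; the authoritative names come from the ompi_info command above, and my_mpi_app is a placeholder):

```shell
# Hypothetical invocation: emulate an 8-socket node with 2 cores/socket
# on whatever hardware you actually have, then exercise socket binding.
# Verify the real parameter names with: ompi_info --param paffinity test
mpirun -np 16 \
    -mca paffinity test \
    -mca paffinity_test_num_sockets 8 \
    -mca paffinity_test_num_cores 2 \
    --bind-to-socket ./my_mpi_app
```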

Any contributions to that module that extend its range are welcome.
Ralph
