Re: [OMPI devel] IOF repair

2008-07-11 Thread Bogdan Costescu

On Thu, 10 Jul 2008, Ralph Castain wrote:

> We would appreciate it if people could test this to the extent
> possible over the next few days. Please let us know (good or bad) so
> we can decide whether or not to move it to the 1.3 release branch.


I've tested with r18878 and the strange behaviour mentioned in a 
previous e-mail (which happened with 1.3a1r18769) has disappeared; 
CHARMM can again read its instructions properly from stdin.


Thanks for the quick resolution!

--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [OMPI devel] open ib dependency question

2008-07-11 Thread Bogdan Costescu

On Thu, 10 Jul 2008, Pavel Shamis (Pasha) wrote:


> FYI the issue was resolved - https://svn.open-mpi.org/trac/ompi/ticket/1376


Indeed, the IBCM error message is no longer displayed with r18878. Thank you!

--
Bogdan Costescu



Re: [OMPI devel] === CREATE FAILURE ===

2008-07-11 Thread Jeff Squyres

I can find no reason that this failed.  :-\

I am unable to duplicate the problem, and this area of code has not  
changed in a while -- I don't know why plpa/src/plpa-taskset/parser.c  
would suddenly be left behind.




On Jul 10, 2008, at 9:24 PM, MPI Team wrote:



ERROR: Command returned a non-zero exist status
  make distcheck

Start time: Thu Jul 10 21:00:14 EDT 2008
End time:   Thu Jul 10 21:24:55 EDT 2008

========================================================================

[... previous lines snipped ...]
test -z "" || rm -f
rm -f class/.deps/.dirstamp
rm -f class/.dirstamp
rm -f dss/.deps/.dirstamp
rm -f dss/.dirstamp
rm -f memoryhooks/.deps/.dirstamp
rm -f memoryhooks/.dirstamp
rm -f runtime/.deps/.dirstamp
rm -f runtime/.dirstamp
rm -f threads/.deps/.dirstamp
rm -f threads/.dirstamp
rm -f win32/.deps/.dirstamp
rm -f win32/.dirstamp
rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
make[3]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/opal'
rm -rf class/.deps dss/.deps memoryhooks/.deps runtime/.deps threads/.deps win32/.deps
rm -f Makefile
make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/opal'
Making distclean in contrib
make[2]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/contrib'
test -z "*~ .#*" || rm -f *~ .#*
rm -rf .libs _libs
rm -f *.lo
test -z "" || rm -f
rm -f Makefile
make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/contrib'
Making distclean in config
make[2]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/config'
test -z "*~ .#*" || rm -f *~ .#*
rm -rf .libs _libs
rm -f *.lo
test -z "" || rm -f
rm -f Makefile
make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/config'
Making distclean in .
make[2]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build'
test -z "*~ .#*" || rm -f *~ .#*
rm -rf .libs _libs
rm -f *.lo
test -z "ompi/include/ompi/version.h orte/include/orte/version.h opal/include/opal/version.h" || rm -f ompi/include/ompi/version.h orte/include/orte/version.h opal/include/opal/version.h
rm -f libtool
rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build'
rm -f config.status config.cache config.log configure.lineno config.status.lineno
rm -f Makefile
ERROR: files left in build directory after distclean:
./opal/mca/paffinity/linux/plpa/src/plpa-taskset/parser.c
make[1]: *** [distcleancheck] Error 1
make[1]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build'
make: *** [distcheck] Error 2
========================================================================


Your friendly daemon,
Cyrador
___
testing mailing list
test...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/testing



--
Jeff Squyres
Cisco Systems
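
The failing step in the log above is a "files left behind" check: after
distclean, the build directory is listed and anything still present is
reported as an error. Below is a rough Python sketch of that kind of check,
not the actual automake rule. The helper names snapshot() and leftovers()
and the src/parser.c path are made up for illustration.

```python
import os
import tempfile


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def leftovers(before, after):
    """Files present after cleaning that were not there before the build."""
    return sorted(after - before)


# Tiny self-contained demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as build_dir:
    before = snapshot(build_dir)  # fresh build directory: nothing in it yet
    # Simulate a generated file that the clean step forgot to delete
    # (hypothetical path, chosen only to resemble the log above).
    stray = os.path.join(build_dir, "src", "parser.c")
    os.makedirs(os.path.dirname(stray))
    with open(stray, "w") as handle:
        handle.write("/* generated */\n")
    left = leftovers(before, snapshot(build_dir))
    for path in left:
        print("left behind:", path)
```

Running a comparison like this right after a local "make distclean" in a
scratch build tree might help narrow down which rule generates the stray
file, assuming the problem can be reproduced at all.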



Re: [OMPI devel] v1.3 RM: need a ruling

2008-07-11 Thread Terry Dontje

Jeff Squyres wrote:
> Check that -- Ralph and I talked more about #1383 and have come up
> with a decent/better solution that a) is not wonky and b) does not
> involve MCA parameter synonyms.  We're working on it in an hg and will
> put it back when done (probably within a business day or three).
>
> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.

I am not dead yet!!!

So, there was also the name change of pls_rsh_agent to plm_rsh_agent 
because the pls's were sucked into plm's (or so I believe).  Anyways, 
this seems like another case to support synonyms.  Also are there other 
pls mca parameters that have had their names changed to plm?

--td

> I think the MCA param synonyms and "deprecated" stuff is useful for
> the future, but at this point, nothing in v1.3 would use it.  So my
> $0.02 is that we should leave it out.




On Jul 10, 2008, at 2:00 PM, Jeff Squyres (jsquyres) wrote:


K, will do.  Note that it turns out that we did not yet solve the
mpi_paffinity_alone issue, but we're working on it.  I'm working on
the IOF issue ATM; will return to mpi_paffinity_alone in a bit...


On Jul 10, 2008, at 1:56 PM, George Bosilca wrote:

> I'm 100% with Brad on this. Please go ahead and include this feature
> in the 1.3.
>
>  george.
>
> On Jul 10, 2008, at 11:33 AM, Brad Benton wrote:
>
>> I think this is very reasonable to go ahead and include for 1.3.  I
>> find that preferable to a 1.3-specific "wonky" workaround.  Plus,
>> this sounds like something that is very good to have in general.
>>
>> --brad
>>
>>
>> On Wed, Jul 9, 2008 at 8:49 PM, Jeff Squyres 
>> wrote:
>> v1.3 RMs: Due to some recent work, the MCA parameter
>> mpi_paffinity_alone disappeared -- it was moved and renamed to be
>> opal_paffinity_alone.  This is Bad because we have a lot of
>> historical precedent based on the MCA param name
>> "mpi_paffinity_alone" (FAQ, PPT presentations, e-mails on public
>> lists, etc.).  So it needed to be restored for v1.3.  I just
>> noticed that I hadn't opened a ticket on this -- sorry -- I opened
>> #1383 tonight.
>>
>> For a variety of reasons described in the commit message r1383,
>> Lenny and I first decided that it would be best to fix this problem
>> by the functionality committed in r18770 (have the ability to find
>> out where an MCA parameter was set).  This would allow us to
>> register two MCA params: mpi_paffinity_alone and
>> opal_paffinity_alone, and generally do the Right Thing (because we
>> could then tell if a user had set a value or whether it was a
>> default MCA param value).  This functionality will also be useful
>> in the openib BTL, where there is a blend of MCA parameters and INI
>> file parameters.
>>
>> However, after doing that, it seemed like only a few more steps to
>> implement an overall better solution: implement "synonyms" for MCA
>> parameters.  I.e., register the name "mpi_paffinity_alone" as a
>> synonym for opal_paffinity_alone.  Along the way, it was trivial to
>> add a "deprecated" flag for MCA parameters that we no longer want
>> to use anymore (this deprecated flag is also useful in the OB1 PML
>> and openib BTL).
>>
>> So to fix a problem that needed to be fixed for v1.3 (restore the
>> MCA parameter "mpi_paffinity_alone"), I ended up implementing new
>> functionality.
>>
>> Can this go into v1.3, or do we need to implement some kind of
>> alternate fix?  (I admit to not having thought through what it
>> would take to fix without the new MCA parameter functionality -- it
>> might be kinda wonky)
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems








Re: [OMPI devel] v1.3 RM: need a ruling

2008-07-11 Thread Ralph H Castain



On 7/11/08 7:48 AM, "Terry Dontje"  wrote:

> Jeff Squyres wrote:
>> Check that -- Ralph and I talked more about #1383 and have come up
>> with a decent/better solution that a) is not wonky and b) does not
>> involve MCA parameter synonyms.  We're working on it in an hg and will
>> put it back when done (probably within a business day or three).
>> 
>> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.
>> 
> I am not dead yet!!!
> 
> So, there was also the name change of pls_rsh_agent to plm_rsh_agent
> because the pls's were sucked into plm's (or so I believe).  Anyways,
> this seems like another case to support synonyms.  Also are there other
> pls mca parameters that have had their names changed to plm?

I think you're opening a really ugly can of worms. How far back do you want
to go? v1.0? v0.1? We have a history of changing mca param names across
major releases, so trying to keep everything alive could well become a
nightmare.

I would hate to try and figure out all the changes - and what about the
params that simply have disappeared, or had their functionality absorbed by
some combination of other params?

My head aches already... :-)

Ralph


[OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Ralph H Castain
Since the question of backward compatibility of params came up... ;-)

I've been perusing the various PLM modules to check consistency. One thing I
noted right away is that -every- PLM module registers an MCA param to let
the user specify an orted cmd. I believe this specifically was done so
people could insert their favorite debugger in front of the "orted" on the
spawned command line - e.g., "valgrind orted".

The problem is that this forces the user to have to figure out the name of
the PLM module being used as the param is called "-mca plm_rsh_agent", or
"-mca plm_lsf_orted", or...you name it.

For users that only ever operate in one environment, who cares. However,
many users (at least around here) operate in multiple environments, and this
creates confusion.

I propose to create a single MCA param name for this value - something like
"-mca plm_launch_agent" or whatever - and get rid of all these individual
registrations to reduce the user confusion.

Comments? I'll put my helmet on
Ralph
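
A minimal sketch of how the single-name proposal above could behave, written
in Python purely for illustration (it is not Open MPI code). The name
"plm_launch_agent" is the one floated in the message. Falling back to the
per-component names is an extra assumption added here so that existing
command lines keep working, and the "orted" default is also assumed.

```python
def resolve_launch_agent(params, component, default="orted"):
    """Pick the launch agent command for the active PLM component.

    params: dict of MCA-style parameter names to user-supplied values.
    component: name of the launcher component in use, e.g. "rsh".
    """
    # Unified name from the proposal above (hypothetical until implemented).
    unified = params.get("plm_launch_agent")
    if unified is not None:
        return unified
    # Assumption: honor the per-component names quoted in the message so
    # that old scripts do not silently lose their setting.
    legacy_names = {"rsh": "plm_rsh_agent", "lsf": "plm_lsf_orted"}
    legacy = legacy_names.get(component)
    if legacy is not None and legacy in params:
        return params[legacy]
    return default


# The same command line works no matter which launcher ends up being used.
print(resolve_launch_agent({"plm_launch_agent": "valgrind orted"}, "rsh"))
print(resolve_launch_agent({"plm_lsf_orted": "valgrind orted"}, "lsf"))
print(resolve_launch_agent({}, "slurm"))
```

The point of the design is that the user never has to know which component
was selected in order to wrap the daemon in a debugger.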




Re: [OMPI devel] v1.3 RM: need a ruling

2008-07-11 Thread Jeff Squyres

On Jul 11, 2008, at 9:48 AM, Terry Dontje wrote:

>> Check that -- Ralph and I talked more about #1383 and have come up
>> with a decent/better solution that a) is not wonky and b) does not
>> involve MCA parameter synonyms.  We're working on it in an hg and
>> will put it back when done (probably within a business day or three).
>>
>> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.
>
> I am not dead yet!!!
>
> So, there was also the name change of pls_rsh_agent to plm_rsh_agent
> because the pls's were sucked into plm's (or so I believe).
> Anyways, this seems like another case to support synonyms.  Also are
> there other pls mca parameters that have had their names changed to
> plm?



All of them, right?  The whole pls framework is gone -- replaced by plm.

There are some OB1 and openib parameters that got renamed, too  
(probably in other BTLs as well -- the pipeline parameters, etc.).  So  
if we want to bring this functionality over, we can, but we should  
then also commit to adding deprecated synonyms for all the old names.   
It's not hard to do (2 function calls per deprecated name: 1) look up  
the index of the new name, 2) register a deprecated synonym for that  
new name), but it does involve some menial labor.


Ralph raises a good point -- perhaps we should have a definitive  
policy (that starts in v1.3) about MCA parameters.  I know that Sun  
has examples for this stuff.  Is there one that we can implement easily?


--
Jeff Squyres
Cisco Systems
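
The two-call pattern described above (look up the index of the new name,
then register a deprecated synonym for it) can be illustrated with a toy
parameter table. This is a Python sketch under stated assumptions, not the
Open MPI MCA API: the class ParamRegistry, its method names, and the
"example-agent" default are all invented here. Only the parameter names
come from the thread.

```python
import warnings


class ParamRegistry:
    """Toy parameter table with deprecated synonyms (illustration only)."""

    def __init__(self):
        self._names = []      # index -> canonical parameter name
        self._defaults = []   # index -> default value
        self._synonyms = {}   # old name -> (index, deprecated flag)

    def register(self, name, default):
        """Register a canonical parameter and return its index."""
        self._names.append(name)
        self._defaults.append(default)
        return len(self._names) - 1

    def find(self, name):
        """Call 1: look up the index of the new (canonical) name."""
        return self._names.index(name)

    def register_synonym(self, index, old_name, deprecated=True):
        """Call 2: attach an old name to the parameter at that index."""
        self._synonyms[old_name] = (index, deprecated)

    def lookup(self, user_settings, name):
        """Return the value for a canonical name, honoring synonyms."""
        index = self.find(name)
        if name in user_settings:
            return user_settings[name]  # the new name always wins
        for old_name, (target, deprecated) in self._synonyms.items():
            if target == index and old_name in user_settings:
                if deprecated:
                    warnings.warn(f"{old_name} is deprecated; use {name}",
                                  DeprecationWarning)
                return user_settings[old_name]
        return self._defaults[index]


registry = ParamRegistry()
registry.register("plm_rsh_agent", "example-agent")   # placeholder default
index = registry.find("plm_rsh_agent")                 # call 1
registry.register_synonym(index, "pls_rsh_agent")      # call 2
value = registry.lookup({"pls_rsh_agent": "ssh"}, "plm_rsh_agent")
print(value)
```

An old script that still sets pls_rsh_agent keeps working and gets a
deprecation warning, which is the behavior being asked for in this thread.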



Re: [OMPI devel] v1.3 RM: need a ruling

2008-07-11 Thread Terry Dontje

Ralph H Castain wrote:
> On 7/11/08 7:48 AM, "Terry Dontje" wrote:
>
>> Jeff Squyres wrote:
>>> Check that -- Ralph and I talked more about #1383 and have come up
>>> with a decent/better solution that a) is not wonky and b) does not
>>> involve MCA parameter synonyms.  We're working on it in an hg and will
>>> put it back when done (probably within a business day or three).
>>>
>>> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.
>>
>> I am not dead yet!!!
>>
>> So, there was also the name change of pls_rsh_agent to plm_rsh_agent
>> because the pls's were sucked into plm's (or so I believe).  Anyways,
>> this seems like another case to support synonyms.  Also are there other
>> pls mca parameters that have had their names changed to plm?
>
> I think you're opening a really ugly can of worms. How far back do you want
> to go? v1.0? v0.1? We have a history of changing mca param names across
> major releases, so trying to keep everything alive could well become a
> nightmare.

I am only asking to be compatible with the last release (however, that 
might have an interpretation of infinity :-).  Seriously, though, I 
think we need to be very careful about renaming MCA parameters because 
this will screw production sites and ISVs that use scripts to launch 
their apps.  So a change could render their scripts useless (the 
paffinity param is a perfect example of this).

I don't really want to promote keeping everything alive forever, but in 
cases where the only change is a 3-4 letter prefix, it almost looks 
random to people outside of the community.

> I would hate to try and figure out all the changes - and what about the
> params that simply have disappeared, or had their functionality absorbed by
> some combination of other params?

So, I think if a functionality is not supported, or the way you drive it 
is completely different, then I agree with you: trying to make a round 
peg fit in a square hole is silly.  But if the feature is one for one 
except in name only, then I think we need to ask ourselves if we really 
want/need to drop the original name.

> My head aches already... :-)

Take two aspirins...

--td


Re: [OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Jeff Squyres
Sounds good to me.  We've done similar things in other frameworks --  
put in MCA base params for things that all components could use.  How  
about plm_base_launch_agent?




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Don Kerr
For something as fundamental as launch, do we still need to specify the 
component? Could it just be "launch_agent"?


Jeff Squyres wrote:
> Sounds good to me.  We've done similar things in other frameworks --
> put in MCA base params for things that all components could use.  How
> about plm_base_launch_agent?








Re: [OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Ralph H Castain
I suppose we could even just make it an mpirun cmd line param, at that
point. As an MCA param, though, we have typically insisted on a particular
syntax that includes framework and component...


On 7/11/08 8:41 AM, "Don Kerr"  wrote:

> For something as fundamental as launch do we still need to specify the
> component, could it just be "launch_agent"?
> 
> Jeff Squyres wrote:
>> Sounds good to me.  We've done similar things in other frameworks --
>> put in MCA base params for things that all components could use.  How
>> about plm_base_launch_agent?
>> 
>> 
>> On Jul 11, 2008, at 10:17 AM, Ralph H Castain wrote:
>> 
>>> Since the question of backward compatibility of params came up... ;-)
>>> 
>>> I've been perusing the various PLM modules to check consistency. One
>>> thing I
>>> noted right away is that -every- PLM module registers an MCA param to
>>> let
>>> the user specify an orted cmd. I believe this specifically was done so
>>> people could insert their favorite debugger in front of the "orted"
>>> on the
>>> spawned command line - e.g., "valgrind orted".
>>> 
>>> The problem is that this forces the user to have to figure out the
>>> name of
>>> the PLM module being used as the param is called "-mca
>>> plm_rsh_agent", or
>>> "-mca plm_lsf_orted", or...you name it.
>>> 
>>> For users that only ever operate in one environment, who cares. However,
>>> many users (at least around here) operate in multiple environments,
>>> and this
>>> creates confusion.
>>> 
>>> I propose to create a single MCA param name for this value -
>>> something like
>>> "-mca plm_launch_agent" or whatever - and get rid of all these
>>> individual
>>> registrations to reduce the user confusion.
>>> 
>>> Comments? I'll put my helmet on
>>> Ralph




[OMPI devel] PLM consistency: priority

2008-07-11 Thread Ralph H Castain
Okay, another fun one. Some of the PLM modules use MCA params to adjust
their relative selection priority. This can lead to very unexpected behavior
as which module gets selected will depend on the priorities of the other
selectable modules - which changes from release to release as people
independently make adjustments and/or new modules are added.

Fortunately, this doesn't bite us too often since many environments only
support one module, and since there is nothing to tell the user that the plm
module whose priority they raised actually -didn't- get used!

However, in the interest of "least astonishment", some of us working on the
RTE had changed our coding approach to avoid this confusion. Given that we
have this nice mca component select logic that takes the specified module -
i.e., "-mca plm foo" always yields foo if it can run, or errors out if it
can't - then the safest course is to remove MCA params that adjust module
priorities and have the user simply tell us which module they want us to
use.

Do we want to make this consistent, at least in the PLM? Or do you want to
leave the user guessing? :-)

Ralph
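
To make the two behaviors above concrete, here is a small Python sketch (an
illustration only, not the actual MCA selection code). The function name,
the component names in the example table, and all priority numbers are made
up. An explicit request either yields that component or fails loudly; with
no request, the highest-priority runnable component wins.

```python
def select_component(components, requested=None):
    """Choose one launcher component.

    components: dict name -> (priority, can_run) for every built component.
    requested: name given by the user (as in "-mca plm foo"), or None.
    """
    if requested is not None:
        # Explicit request: use it or fail, never fall back silently.
        if requested not in components:
            raise LookupError(f"unknown component: {requested}")
        _priority, can_run = components[requested]
        if not can_run:
            raise RuntimeError(f"component {requested} cannot run here")
        return requested
    # No request: pick the runnable component with the highest priority.
    runnable = {name: prio for name, (prio, ok) in components.items() if ok}
    if not runnable:
        raise RuntimeError("no runnable component")
    return max(runnable, key=runnable.get)


# Invented table: priorities and availability are for illustration only.
components = {"rsh": (10, True), "slurm": (75, False), "tm": (75, True)}
print(select_component(components))          # priority-based choice
print(select_component(components, "rsh"))   # explicit choice
```

With the explicit form the user always learns when the requested module
could not be used, which is the "least astonishment" property argued for
above. Raising a priority gives no such feedback.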




Re: [OMPI devel] PLM consistency: priority

2008-07-11 Thread Aurélien Bouteiller
We don't want the user to have to select by hand the best PML. The  
logic inside the current selection process selects the best pml for  
the underlying network. However changing the priority is pretty  
meaningless from the user's point of view. So while retaining the  
selection process including priorities, we might want to remove the  
priority parameter, and use only the pml=ob1,cm syntax from the user's  
point of view.


Aurelien






Re: [OMPI devel] PLM consistency: priority

2008-07-11 Thread Ralph H Castain
Ummm...I actually was talking about the "PLM", not the "PML".

But I believe what you suggest concurs with what I said. In the PLM, you
could still provide multiple components you want considered, though it has
less meaning there. My suggestion really is only that we eliminate the
params to adjust relative priority as they are just confusing the user and
potentially misleading them as to what is going to happen.

Ralph
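
One way to read the combined suggestion above: the user may name a list of
acceptable components, the built-in priorities still decide among them, and
there is no user-visible priority knob. A hedged Python sketch of that idea
follows. The function name and the priority values are assumptions, and the
"ob1,cm" string is only the syntax example quoted in this thread.

```python
def select_from_list(spec, components):
    """Pick the best runnable component among those named in spec.

    spec: comma-separated component names, e.g. "ob1,cm".
    components: dict name -> (built-in priority, can_run).
    """
    names = [name.strip() for name in spec.split(",") if name.strip()]
    # Keep only components that exist and report that they can run.
    candidates = [n for n in names if n in components and components[n][1]]
    if not candidates:
        raise RuntimeError(f"none of {names} can run here")
    # Internal priorities break the tie; the user cannot adjust them.
    return max(candidates, key=lambda n: components[n][0])


# Invented availability and priorities, for illustration only.
available = {"ob1": (20, True), "cm": (30, False), "other": (50, True)}
print(select_from_list("ob1,cm", available))
```

Here "other" is never chosen even though its priority is highest, because
the user did not list it.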



On 7/11/08 9:07 AM, "Aurélien Bouteiller"  wrote:

> We don't want the user to have to select by hand the best PML. The
> logic inside the current selection process selects the best pml for
> the underlying network. However changing the priority is pretty
> meaningless from the user's point of view. So while retaining the
> selection process including priorities, we might want to remove the
> priority parameter, and use only the pml=ob1,cm syntax from the user's
> point of view.
> 
> Aurelien
> 
> On 11 Jul 08, at 10:56, Ralph H Castain wrote:
> 
>> Okay, another fun one. Some of the PLM modules use MCA params to
>> adjust
>> their relative selection priority. This can lead to very unexpected
>> behavior
>> as which module gets selected will depend on the priorities of the
>> other
>> selectable modules - which changes from release to release as people
>> independently make adjustments and/or new modules are added.
>> 
>> Fortunately, this doesn't bite us too often since many environments
>> only
>> support one module, and since there is nothing to tell the user that
>> the plm
>> module whose priority they raised actually -didn't- get used!
>> 
>> However, in the interest of "least astonishment", some of us working
>> on the
>> RTE had changed our coding approach to avoid this confusion. Given
>> that we
>> have this nice mca component select logic that takes the specified
>> module -
>> i.e., "-mca plm foo" always yields foo if it can run, or errors out
>> if it
>> can't - then the safest course is to remove MCA params that adjust
>> module
>> priorities and have the user simply tell us which module they want
>> us to
>> use.
>> 
>> Do we want to make this consistent, at least in the PLM? Or do you
>> want to
>> leave the user guessing? :-)
>> 
>> Ralph





[OMPI devel] SM latency regression

2008-07-11 Thread Terry Dontje
Has anyone else seen the trunk incur approximately a 10% increase in 
latency?  I think this has happened in the last couple of weeks.  I have 
verified that it isn't due to the recheck put into the 
sm_component_progress.  I am about ready to try and track this down but 
wanted to throw this out there just in case someone is ahead of me.


--td


[OMPI devel] User request: add envar?

2008-07-11 Thread Ralph H Castain
Yo folks

For those not following the user list, this request was generated today:

>>>> Absolutely, these are useful time and time again so should be part of
>>>> the API and hence stable.  Care to mention what they are and I'll add it
>>>> to my note as something to change when upgrading to 1.3 (we are looking
>>>> at testing a snapshot in the near future).
>>> 
>>> Surely:
>>> 
>>> OMPI_COMM_WORLD_SIZE        #procs in the job
>>> OMPI_COMM_WORLD_LOCAL_SIZE  #procs in this job that are sharing the node
>>> OMPI_UNIVERSE_SIZE          total #slots allocated to this user
>>>                             (across all nodes)
>>> OMPI_COMM_WORLD_RANK        proc's rank
>>> OMPI_COMM_WORLD_LOCAL_RANK  local rank on node - lowest rank'd proc on
>>>                             the node is given local_rank=0
>>> 
>>> If there are others that would be useful, now is definitely the time to
>>> speak up!
>> 
>> The only other one I'd like to see is some kind of global identifier for
>> the job but as far as I can see I don't believe that openmpi has such a
>> concept.
> 
> Not really - of course, many environments have a jobid they assign at time
> of allocation. We could create a unified identifier from that to ensure a
> consistent name was always available, but the problem would be that not all
> environments provide it (e.g., rsh). To guarantee that the variable would
> always be there, we would have to make something up in those cases.

I could easily create such an envar, even for non-managed environments. The
plan would be to use the RM-provided jobid where one was available, and to
use the mpirun jobid where not.

My thought was to call it OMPI_JOB_ID, unless someone has another
suggestion.

Any objection to my doing so, and/or suggestions on alternative
implementations?

Ralph
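
A small Python sketch of the fallback plan above: use the resource
manager's job id when one is available, otherwise use the mpirun jobid.
This only illustrates the idea and is not how Open MPI would implement it.
The function names are made up, OMPI_JOB_ID is still only a proposal, and
the resource-manager variable names (SLURM_JOB_ID, PBS_JOBID, LSB_JOBID)
are used here as examples of what an RM might export.

```python
import os

# Example resource-manager variables to consult, in order (an assumption).
RM_JOBID_VARIABLES = ("SLURM_JOB_ID", "PBS_JOBID", "LSB_JOBID")


def job_identifier(environ, mpirun_jobid, rm_variables=RM_JOBID_VARIABLES):
    """Return the RM-provided job id if present, else mpirun's jobid."""
    for name in rm_variables:
        value = environ.get(name)
        if value:
            return value
    return str(mpirun_jobid)


def describe_process(environ):
    """Read the rank/size variables listed above, defaulting to one proc."""
    return {
        "rank": int(environ.get("OMPI_COMM_WORLD_RANK", 0)),
        "size": int(environ.get("OMPI_COMM_WORLD_SIZE", 1)),
        "local_rank": int(environ.get("OMPI_COMM_WORLD_LOCAL_RANK", 0)),
    }


# Managed environment: the RM id is used.  Under rsh: mpirun's id is used.
print(job_identifier({"SLURM_JOB_ID": "4242"}, mpirun_jobid=17))
print(job_identifier({}, mpirun_jobid=17))
print(describe_process(os.environ))
```

Either way the application sees a value, so a script can rely on the
variable being set in both managed and unmanaged environments.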