Re: [OMPI devel] why does --rankfile need hostlist?

2009-06-22 Thread Lenny Verkhovsky
I personally prefer the way it is now.
This way guarantees me total control over mapping and allocating slots.
When I am using the rankfile mapper, I know exactly what I am putting where;
the OS can easily oversubscribe my CPU with processes left unmapped by the
rankfile. I am also not sure how it will affect users that have schedulers.
I am also not sure that users who are used to working with a hostfile would
change their scripts according to the mapper.
Lenny.

On Mon, Jun 22, 2009 at 1:23 AM, Ralph Castain  wrote:

> Had a chance to think about how this might be done, and looked at it for
> awhile after getting home. I -think- I found a way to do it, but there are a
> couple of caveats:
> 1. Len's point about oversubscribing without warning would definitely hold
> true - this would positively be a "user beware" option
>
> 2. there could be no RM-provided allocation, hostfile, or -host options
> specified. Basically, I would be adding the "read rankfile" option to the
> end of the current allocation determination procedure
>
> I would still allow more procs than shown in the rankfile (mapping the rest
> bynode on the nodes specified in the rankfile - can't do byslot because I
> don't know how many slots are on each node), which means the only change in
> behavior would be the forced bynode mapping of unspecified procs.
>
> So use of this option will entail some risks and a slight difference in
> behavior, but would relieve you from the burden of having to provide a
> hostfile. I'm not personally convinced it is worth the risk and probable
> user complaints of "it didn't work", but since we don't use this option, I
> don't have a strong opinion on the matter.
>
> Let's just avoid going back-and-forth over wanting it, or how it should be
> implemented - let's get it all ironed out, and then implement it once, like
> we finally did at the end with the whole hostfile thing.
>
> Let me know if you want me to do this - it obviously isn't at the top of my
> priority list, but still could be done in the next few weeks.
>
> Ralph
>
>
> On Jun 21, 2009, at 9:00 AM, Lenny Verkhovsky wrote:
>
> Sorry for the delay in response,
> I totally agree with Ralph that it's not as easy as it seems,
> 1. rankfile mapper uses already allocated machines ( by scheduler or
> hostfile ), by using rankfile as a hostfile we can run into a problem when
> trying to use unallocated nodes, which can hang the run.
> 2. we can't define in rankfile the number of slots on each machine, which
> means oversubscribing can take place without any warning.
> 3. I personally don't see any problem using hostfile, even if it has
> redundant info, hostfile and rankfile belong to different layers in the
> system and solve different problems. The original hostfile ( if I recall
> correctly ) could bind rank to the node, but the syntax wasn't very flexible
> and clear.
> Lenny.
>
> On Sun, Jun 21, 2009 at 5:15 PM, Ralph Castain  wrote:
>
>> Let me suggest a two-step process, then:
>> 1. let's change the error message as this is easily done and thus can be
>> done now
>>
>> 2. I can look at how to eat the rankfile as a hostfile. This may not even
>> be possible - the problem is that the entire system is predicated on certain
>> ordering due to our framework architecture. So we get an allocation, and
>> then do a mapping against that allocation, filtering the allocation through
>> hostfiles, -host, and other options.
>>
>> By the time we reach the rankfile mapper, we have already determined that
>> we don't have an allocation and have to abort. It is the rankfile mapper
>> itself that looks for the -rankfile option, so the system can have no
>> knowledge that someone has specified that option before that point - and
>> thus, even if I could parse the rankfile, I don't know it was given!
>>
>> What will take time is to figure out a way to either:
>>
>> (a) allow us to run the mapper even though we don't have any nodes we know
>> about, and allow the mapper to insert the nodes itself - without causing
>> non-rankfile uses to break (which could be a major feat); or
>>
>> (b) have the overall system check for the rankfile option and pass it as a
>> hostfile as well, assuming that a hostfile wasn't also given, no RM-based
>> allocation exists, etc. - which breaks our abstraction rules and also opens
>> a possible can of worms.
>>
>> Either way, I also then have to teach the hostfile parser how to realize
>> it is a rankfile format and convert the info in it into what we expected to
>> receive from a hostfile - another non-trivial problem.
>>
>> I'm willing to give it a try - just trying to make clear why my response
>> was negative. It isn't as simple as it sounds...which is why Len and I
>> didn't pursue it when this was originally developed.
>>
>> Ralph
>>
>>
>> On Sun, Jun 21, 2009 at 5:28 AM, Terry Dontje wrote:
>>
>>> Being a part of these discussions I can understand your reticence to
>>> reopen this discussion.  However, I think this is a major usability issue
>>> with this fe

Re: [OMPI devel] why does --rankfile need hostlist?

2009-06-22 Thread Terry Dontje

Let us think about this some more.  We'll try and reply later today.

--td

Ralph Castain wrote:
Had a chance to think about how this might be done, and looked at it 
for awhile after getting home. I -think- I found a way to do it, but 
there are a couple of caveats:


1. Len's point about oversubscribing without warning would definitely 
hold true - this would positively be a "user beware" option


2. there could be no RM-provided allocation, hostfile, or -host 
options specified. Basically, I would be adding the "read rankfile" 
option to the end of the current allocation determination procedure


I would still allow more procs than shown in the rankfile (mapping the 
rest bynode on the nodes specified in the rankfile - can't do byslot 
because I don't know how many slots are on each node), which means the 
only change in behavior would be the forced bynode mapping of 
unspecified procs.
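
The forced bynode placement of unspecified procs described above can be pictured with a small sketch. This is a hypothetical illustration, not ORTE code: the function name, the array layout, and the use of -1 for "not listed in the rankfile" are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not ORTE code: place ranks that the rankfile
 * did not mention onto the rankfile's nodes in round-robin ("bynode") order.
 * node_of_rank[r] holds a node index, or -1 if rank r was left unspecified. */
void map_unspecified_bynode(int *node_of_rank, size_t num_ranks, size_t num_nodes)
{
    size_t next_node = 0;

    assert(node_of_rank != NULL && num_nodes > 0);
    for (size_t r = 0; r < num_ranks; r++) {
        if (node_of_rank[r] < 0) {
            /* No slot counts are known, so just cycle through the nodes. */
            node_of_rank[r] = (int)next_node;
            next_node = (next_node + 1) % num_nodes;
        }
    }
}
```

Because nothing in the sketch knows how many slots a node has, it cannot fill nodes "byslot"; cycling through the nodes is the only ordering available, which matches the limitation stated above.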


So use of this option will entail some risks and a slight difference 
in behavior, but would relieve you from the burden of having to 
provide a hostfile. I'm not personally convinced it is worth the risk 
and probable user complaints of "it didn't work", but since we don't 
use this option, I don't have a strong opinion on the matter.


Let's just avoid going back-and-forth over wanting it, or how it 
should be implemented - let's get it all ironed out, and then 
implement it once, like we finally did at the end with the whole 
hostfile thing.


Let me know if you want me to do this - it obviously isn't at the top 
of my priority list, but still could be done in the next few weeks.


Ralph


On Jun 21, 2009, at 9:00 AM, Lenny Verkhovsky wrote:

Sorry for the delay in response, 
I totally agree with Ralph that it's not as easy as it seems, 
1. rankfile mapper uses already allocated machines ( by scheduler or 
hostfile ), by using rankfile as a hostfile we can run into a problem 
when trying to use unallocated nodes, which can hang the run.
2. we can't define in rankfile the number of slots on each machine, which 
means oversubscribing can take place without any warning.
3. I personally don't see any problem using hostfile, even if it has 
redundant info, hostfile and rankfile belong to different layers in 
the system and solve different problems. The original hostfile ( if I 
recall correctly ) could bind rank to the node, but the syntax wasn't 
very flexible and clear.

Lenny.
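
Point 2 in the list above (no slot counts in a rankfile, hence no oversubscription warning) can be sketched as follows. The names and the SLOTS_UNKNOWN sentinel are hypothetical and exist only for this illustration; the real mapper may represent missing slot information differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: a node that was named only in a rankfile,
 * which carries no slot count. */
#define SLOTS_UNKNOWN (-1)

/* Returns true only when oversubscription can actually be detected. */
bool oversubscription_detected(int procs_on_node, int slots_on_node)
{
    assert(procs_on_node >= 0);
    if (slots_on_node == SLOTS_UNKNOWN) {
        /* Nothing to compare against, so no warning can be issued. */
        return false;
    }
    return procs_on_node > slots_on_node;
}
```

With a hostfile the slot count is known and the comparison is meaningful; with a rankfile alone the function can only ever return false, which is the silent case being warned about.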

On Sun, Jun 21, 2009 at 5:15 PM, Ralph Castain wrote:


Let me suggest a two-step process, then:

1. let's change the error message as this is easily done and thus
can be done now

2. I can look at how to eat the rankfile as a hostfile. This may
not even be possible - the problem is that the entire system is
predicated on certain ordering due to our framework architecture.
So we get an allocation, and then do a mapping against that
allocation, filtering the allocation through hostfiles, -host,
and other options.

By the time we reach the rankfile mapper, we have already
determined that we don't have an allocation and have to abort. It
is the rankfile mapper itself that looks for the -rankfile
option, so the system can have no knowledge that someone has
specified that option before that point - and thus, even if I
could parse the rankfile, I don't know it was given!
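
The ordering problem described above can be reduced to a toy model. Everything here is hypothetical (the enum, the function, and its parameters are invented for the illustration); it only shows why a rankfile cannot help when the allocation stage has already found no nodes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for this sketch only. */
enum launch_result { LAUNCH_ABORT_NO_ALLOCATION, LAUNCH_MAPPED };

/* Stage 1 sees only the RM allocation, hostfiles and -host; stage 2 (the
 * mapper) is the only code that knows whether a rankfile was given. */
enum launch_result launch(int allocated_nodes, bool rankfile_given)
{
    assert(allocated_nodes >= 0);
    if (allocated_nodes == 0) {
        /* The rankfile is never consulted here, so it cannot rescue the run. */
        return LAUNCH_ABORT_NO_ALLOCATION;
    }
    (void)rankfile_given;  /* only examined inside the mapper stage */
    return LAUNCH_MAPPED;
}
```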

What will take time is to figure out a way to either:

(a) allow us to run the mapper even though we don't have any
nodes we know about, and allow the mapper to insert the nodes
itself - without causing non-rankfile uses to break (which could
be a major feat); or

(b) have the overall system check for the rankfile option and
pass it as a hostfile as well, assuming that a hostfile wasn't
also given, no RM-based allocation exists, etc. - which breaks
our abstraction rules and also opens a possible can of worms.

Either way, I also then have to teach the hostfile parser how to
realize it is a rankfile format and convert the info in it into
what we expected to receive from a hostfile - another non-trivial
problem.

I'm willing to give it a try - just trying to make clear why my
response was negative. It isn't as simple as it sounds...which is
why Len and I didn't pursue it when this was originally developed.

Ralph


On Sun, Jun 21, 2009 at 5:28 AM, Terry Dontje wrote:

Being a part of these discussions I can understand your
reticence to reopen this discussion.  However, I think this
is a major usability issue with this feature which actually
is fairly important in order to get things to run performant.
Which IMO is important.

That being said I think there are one of two things that
could be done to mitigate the issue.

1.  To eliminate the element of surprise by changing mpirun
to eat rankfile without the hostfile.
2.  To

Re: [OMPI devel] MPI_REAL16

2009-06-22 Thread Iain Bason

Jeff Squyres wrote:

Thanks for looking into this, David.

So if I understand that correctly, it means you have to assign all 
literals in your fortran program with a "_16" suffix. I don't know if 
that's standard Fortran or not. 


Yes, it is.

Iain



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21480

2009-06-22 Thread Iain Bason

Ralph Castain wrote:

I'm sorry, but this change is incorrect.

If you look in orte/mca/ess/base/ess_base_std_orted.c, you will see 
that -all- orteds, regardless of how they are launched, open and 
select the PLM.


I believe you are mistaken.  Look in plm_base_launch_support.c:

   /* The daemon will attempt to open the PLM on the remote
    * end. Only a few environments allow this, so the daemon
    * only opens the PLM -if- it is specifically told to do
    * so by giving it a specific PLM module. To ensure we avoid
    * confusion, do not include any directives here
    */
   if (0 == strcmp(orted_cmd_line[i+1], "plm")) {
       continue;
   }

That code strips out anything like "-mca plm rsh" from the command
line passed to a remote daemon.

Meanwhile, over in ess_base_std_orted.c:

   /* some environments allow remote launches - e.g., ssh - so
    * open the PLM and select something -only- if we are given
    * a specific module to use
    */
   mca_base_param_reg_string_name("plm", NULL,
                                  "Which plm component to use (empty = none)",
                                  false, false,
                                  NULL, &plm_to_use);

   if (NULL == plm_to_use) {
       plm_in_use = false;
   } else {
       plm_in_use = true;

       if (ORTE_SUCCESS != (ret = orte_plm_base_open())) {
           ORTE_ERROR_LOG(ret);
           error = "orte_plm_base_open";
           goto error;
       }

       if (ORTE_SUCCESS != (ret = orte_plm_base_select())) {
           ORTE_ERROR_LOG(ret);
           error = "orte_plm_base_select";
           goto error;
       }
   }

So a PLM is loaded only if specified with "-mca plm foo", but that -mca
flag is stripped out when launching the remote daemon.
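
To make the stripping step concrete, here is a minimal standalone sketch of dropping "-mca plm <value>" triples from an argument vector. It is not the ORTE implementation: the function name and the calling convention are assumptions made for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not the ORTE implementation: copy argv into out,
 * dropping every "-mca plm <value>" triple. Returns the number of
 * arguments kept; out must have room for argc entries. */
int strip_plm_directives(int argc, char **argv, char **out)
{
    int kept = 0;

    assert(argv != NULL && out != NULL);
    for (int i = 0; i < argc; i++) {
        if (i + 2 < argc &&
            0 == strcmp(argv[i], "-mca") &&
            0 == strcmp(argv[i + 1], "plm")) {
            i += 2;          /* skip "-mca", "plm" and the module name */
            continue;
        }
        out[kept++] = argv[i];
    }
    return kept;
}
```

Once the triple is gone, a daemon that opens the PLM only when the "plm" parameter is set has nothing left on its command line to trigger that path.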

I also ran into this issue with tree spawning.  (I didn't putback a fix
because I couldn't get tree spawning actually to improve performance.  My
fix was not to strip out the "-mca plm foo" parameters if tree spawning had
been requested.)

Iain



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21480

2009-06-22 Thread Ralph Castain
Yes, but look at orte/mca/plm/rsh/plm_rsh_module.c:


/* ensure that only the ssh plm is selected on the remote daemon */
var = mca_base_param_environ_variable("plm", NULL, NULL);
opal_setenv(var, "rsh", true, &env);
free(var);

This is done in "ssh_child", right before we fork_exec the ssh command to
launch the remote daemon. This is why slave spawn works, for example.

I agree that tree_spawn doesn't seem to work right now, but it is not due to
the plm not being selected. There are other factors involved.

Ralph



On Mon, Jun 22, 2009 at 9:58 AM, Iain Bason  wrote:

> Ralph Castain wrote:
>
>> I'm sorry, but this change is incorrect.
>>
>> If you look in orte/mca/ess/base/ess_base_std_orted.c, you will see that
>> -all- orteds, regardless of how they are launched, open and select the PLM.
>>
>
> I believe you are mistaken.  Look in plm_base_launch_support.c:
>
>   /* The daemon will attempt to open the PLM on the remote
>    * end. Only a few environments allow this, so the daemon
>    * only opens the PLM -if- it is specifically told to do
>    * so by giving it a specific PLM module. To ensure we avoid
>    * confusion, do not include any directives here
>    */
>   if (0 == strcmp(orted_cmd_line[i+1], "plm")) {
>       continue;
>   }
>
> That code strips out anything like "-mca plm rsh" from the command
> line passed to a remote daemon.
>
> Meanwhile, over in ess_base_std_orted.c:
>
>   /* some environments allow remote launches - e.g., ssh - so
>    * open the PLM and select something -only- if we are given
>    * a specific module to use
>    */
>   mca_base_param_reg_string_name("plm", NULL,
>                                  "Which plm component to use (empty = none)",
>                                  false, false,
>                                  NULL, &plm_to_use);
>
>   if (NULL == plm_to_use) {
>       plm_in_use = false;
>   } else {
>       plm_in_use = true;
>
>       if (ORTE_SUCCESS != (ret = orte_plm_base_open())) {
>           ORTE_ERROR_LOG(ret);
>           error = "orte_plm_base_open";
>           goto error;
>       }
>
>       if (ORTE_SUCCESS != (ret = orte_plm_base_select())) {
>           ORTE_ERROR_LOG(ret);
>           error = "orte_plm_base_select";
>           goto error;
>       }
>   }
>
> So a PLM is loaded only if specified with "-mca plm foo", but that -mca
> flag is stripped out when launching the remote daemon.
>
> I also ran into this issue with tree spawning.  (I didn't putback a fix
> because
> I couldn't get tree spawning actually to improve performance.  My fix was
> not to strip out the "-mca plm foo" parameters if tree spawning had been
> requested.)
>
> Iain
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] MPI_REAL16

2009-06-22 Thread N.M. Maclaren

On Jun 22 2009, Iain Bason wrote:

Jeff Squyres wrote:


Thanks for looking into this, David.

So if I understand that correctly, it means you have to assign all 
literals in your fortran program with a "_16" suffix. I don't know if 
that's standard Fortran or not. 


Yes, it is.


Sorry - no, it isn't.  It's syntactically standard, but has an undefined
meaning.

KIND parameters are processor dependent, and do NOT mean the size in bytes,
words or anything else.  On a VAX or Alpha, and potentially on IBM and Intel
systems in the future, you could have several different floating-point types
of the same length.  Currently, not all compilers use the same conventions,
even on the same system.

The correct way to do it is to have a module that defines a suitable 
parameter, include that module everywhere, and use that parameter. For 
example:


MODULE double
   INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(12)
END MODULE double

Include 'USE double' at the start of every procedure and module, and then
use 1.23_DP.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  n...@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679




Re: [OMPI devel] MPI_REAL16

2009-06-22 Thread Jeff Squyres
Given that I'll inevitably get the language wrong, can someone suggest  
proper verbiage for this statement in the OMPI README:


- MPI_REAL16 and MPI_COMPLEX32 are only supported on platforms where a
  portable C datatype can be found that matches the Fortran type
  REAL*16, both in size and bit representation.  The Intel v11
  compiler, for example, supports these types, but requires the use of
  the "_16" suffix in Fortran when assigning constants to REAL*16
  variables.

Thanks!


On Jun 22, 2009, at 12:34 PM, N.M. Maclaren wrote:


On Jun 22 2009, Iain Bason wrote:
>Jeff Squyres wrote:
>
>> Thanks for looking into this, David.
>>
>> So if I understand that correctly, it means you have to assign all
>> literals in your fortran program with a "_16" suffix. I don't know if
>> that's standard Fortran or not.
>
>Yes, it is.

Sorry - no, it isn't.  It's syntactically standard, but has an undefined
meaning.

KIND parameters are processor dependent, and do NOT mean the size in bytes,
words or anything else.  On a VAX or Alpha, and potentially on IBM and Intel
systems in the future, you could have several different floating-point types
of the same length.  Currently, not all compilers use the same conventions,
even on the same system.

The correct way to do it is to have a module that defines a suitable
parameter, include that module everywhere, and use that parameter. For
example:

MODULE double
   INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(12)
END MODULE double

Include 'USE double' at the start of every procedure and module, and then
use 1.23_DP.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  n...@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679






--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21480

2009-06-22 Thread Iain Bason

Ralph Castain wrote:

Yes, but look at orte/mca/plm/rsh/plm_rsh_module.c:

   
/* ensure that only the ssh plm is selected on the remote daemon */
var = mca_base_param_environ_variable("plm", NULL, NULL);
opal_setenv(var, "rsh", true, &env);
free(var);
   
This is done in "ssh_child", right before we fork_exec the ssh command 
to launch the remote daemon. This is why slave spawn works, for example.


My ssh does not preserve environment variables:

bash-3.2$ export MY_VERY_OWN_ENVIRONMENT_VARIABLE=yes
bash-3.2$ ssh cubbie env | grep MY_VERY_OWN
WARNING: This is a restricted access server. If you do not have explicit 
permission to access this server, please disconnect immediately. 
Unauthorized access to this system is considered gross misconduct and 
may result in disciplinary action, including revocation of SWAN access 
privileges, immediate termination of employment, and/or prosecution to 
the fullest extent of the law.

bash-3.2$

The rsh man page explicitly states that the local environment is not 
passed to the remote shell.
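
Since ssh/rsh may not forward the caller's environment, one possible workaround is to carry the setting on the remote command line itself. The sketch below is only an illustration of that idea and not necessarily what Open MPI does: the helper name is hypothetical, and the variable name OMPI_MCA_plm used in the usage note is an assumption about what mca_base_param_environ_variable("plm", NULL, NULL) produces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a remote command line that carries one
 * environment assignment explicitly. Returns 0 on success, -1 if buf is
 * too small. No shell quoting is attempted; value is assumed to be a
 * plain token without spaces or metacharacters. */
int build_remote_cmd(char *buf, size_t len,
                     const char *name, const char *value, const char *cmd)
{
    int n;

    assert(buf != NULL && name != NULL && value != NULL && cmd != NULL);
    assert(strlen(name) > 0);
    n = snprintf(buf, len, "env %s=%s %s", name, value, cmd);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling build_remote_cmd(buf, sizeof buf, "OMPI_MCA_plm", "rsh", "orted") yields a string that can be handed to ssh as the remote command, so the setting no longer depends on environment forwarding.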


I haven't checked qrsh.  Maybe it works with that.

I agree that tree_spawn doesn't seem to work right now, but it is not 
due to the plm not being selected. 


It was for me.  I don't know whether it is because your rsh/ssh work 
differently, or for some other reason, but there is no question that my 
tree spawn failed because no PLM was loaded.



There are other factors involved.


The other factors that I came across were:

   * I didn't have my .ssh/config file set up to forward
     authentication.  I added a -A flag to the ssh command in
     plm_base_rsh_support.

   * In plm_rsh_module.c:setup_launch, a NULL orted_cmd made asprintf
     crash.  I used (orted_cmd == NULL ? "" : orted_cmd) in the call to
     asprintf.


Once I fixed those, tree spawning worked for me.  (I believe you 
mentioned a race condition in another conversation.  I haven't run into 
that.)
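
The NULL guard from the second item above can be shown in isolation. This sketch uses snprintf rather than asprintf so that it stays within standard C, and the function name and the "prefix" argument are hypothetical; only the (orted_cmd == NULL ? "" : orted_cmd) expression comes from the message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the formatting step: passing NULL to a %s
 * conversion is undefined behavior, so substitute an empty string first.
 * Returns what snprintf returns (the length of the formatted text). */
int format_launch_cmd(char *buf, size_t len,
                      const char *prefix, const char *orted_cmd)
{
    assert(buf != NULL && prefix != NULL);
    return snprintf(buf, len, "%s %s", prefix,
                    orted_cmd == NULL ? "" : orted_cmd);
}
```

Some C libraries print "(null)" for a NULL %s argument and others crash, so the guard removes a platform difference rather than relying on either behavior.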


Iain



Re: [OMPI devel] MPI_REAL16

2009-06-22 Thread Iain Bason
(Thanks, Nick, for explaining that kind values are compiler-dependent. I 
was too lazy to do that.)


Jeff Squyres wrote:
Given that I'll inevitably get the language wrong, can someone suggest 
proper verbiage for this statement in the OMPI README:


- MPI_REAL16 and MPI_COMPLEX32 are only supported on platforms where a
portable C datatype can be found that matches the Fortran type
REAL*16, both in size and bit representation. The Intel v11
compiler, for example, supports these types, but requires the use of
the "_16" suffix in Fortran when assigning constants to REAL*16
variables. 


The _16 suffix really has nothing to do with whether there is a C 
datatype that corresponds to REAL*16. There are two separate issues here:


  1. In Fortran code, any floating point literal has the default kind
 unless otherwise specified. That means that you can get surprising
 results from a simple program designed to test whether a C
 compiler has a data type that corresponds to REAL*16: the least
 significant bits of a REAL*16 variable will be set to zero when
 the literal is assigned to it.
  2. Open MPI requires the C compiler to have a data type that has the
 same bit representation as the Fortran compiler's REAL*16. If the
 C compiler does not have such a data type, then Open MPI cannot
 support REAL*16 in its Fortran interface.
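
For the second issue, a rough C-side check can be sketched under one loud assumption: that the Fortran compiler's REAL*16 is IEEE 754 binary128. That is not guaranteed, and this is not the test Open MPI's configure script runs; the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustration only: IEEE 754 binary128 occupies 16 bytes and has a
 * 113-bit significand. Matching the size alone is not enough, because an
 * 80-bit x87 extended type is often padded to 16 bytes as well. */
bool looks_like_binary128(size_t size_bytes, int mantissa_bits)
{
    assert(mantissa_bits > 0);
    return size_bytes == 16 && mantissa_bits == 113;
}

/* Applies the check to the C compiler's own long double. */
bool long_double_is_binary128(void)
{
    return looks_like_binary128(sizeof(long double), LDBL_MANT_DIG);
}
```

The point of checking the significand width as well as the size is that "same size" and "same bit representation" are different requirements, and only the latter lets data pass unchanged between the two languages.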

My understanding is that the Intel representative said that there is 
some compiler switch that allows the C compiler to have such a data 
type. I didn't pay enough attention to see whether there was some reason 
not to use the switch.


She also pointed out a bug in the Fortran test code that checks for the 
presence of the C data type. She suggested using a _16 suffix on a 
literal in that test code. Nick pointed out that that _16 suffix means, 
"make this literal a KIND=16 literal," which may mean different things 
to different compilers. In particular, REAL*16 may not be the same as 
REAL(KIND=16).
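
The literal pitfall is not specific to Fortran. As an analogy only (it says nothing about any particular Fortran compiler), the same effect appears in C when a float value is widened to double: the precision was fixed before the assignment, so the extra low-order bits are not recovered. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* C analogue of the Fortran pitfall: the literal's own type decides its
 * precision before the assignment widens it. */
bool narrow_literal_loses_bits(void)
{
    /* The comparison is only meaningful if double is wider than float. */
    assert(sizeof(double) > sizeof(float));

    float  narrow  = 0.1f;    /* rounded to float precision */
    double widened = narrow;  /* widening cannot restore the lost bits */
    double direct  = 0.1;     /* rounded directly to double precision */

    return widened != direct;
}
```

A default-kind literal assigned to a REAL*16 variable behaves the same way, which is why a test program that omits the kind suffix can wrongly conclude that the C and Fortran types do not match.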


However, there is no standard way to specify, "make this literal a 
REAL*16 literal." That means that you have to do one of:


   * Declare the variable REAL(KIND=16) and use the _16 suffix on the
     literal.
   * Define some parameter QUAD using the SELECTED_REAL_KIND intrinsic,
     declare the variable REAL(KIND=QUAD), and use the _QUAD suffix on
     the literal.
   * Assume that REAL*16 is the same as REAL(KIND=16) and use the _16
     suffix on the literal.

That assumption turns out to be safer than one might imagine. It is 
certainly true for the Sun and Intel compilers. I am pretty sure it is 
true for the PGI, Pathscale, and GNU compilers. I am not aware of any 
compilers for which it is not true, but that doesn't mean there is no 
such compiler.


All of which is a long-winded way of saying that maybe the README ought 
to just say:


   MPI_REAL16 and MPI_COMPLEX32 are only supported on platforms where a
   portable C datatype can be found that matches the Fortran type
   REAL*16, both in size and bit representation.


Iain



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21480

2009-06-22 Thread Ralph Castain
Ah - now that is easily fixed, without breaking the support for everyone
else. I'll commit the fix right away.

Thanks
Ralph


On Mon, Jun 22, 2009 at 11:12 AM, Iain Bason  wrote:

> Ralph Castain wrote:
>
>> Yes, but look at orte/mca/plm/rsh/plm_rsh_module.c:
>>
>> /* ensure that only the ssh plm is selected on the remote daemon */
>> var = mca_base_param_environ_variable("plm", NULL, NULL);
>> opal_setenv(var, "rsh", true, &env);
>> free(var);
>>
>> This is done in "ssh_child", right before we fork_exec the ssh command
>> to launch the remote daemon. This is why slave spawn works, for example.
>>
>
> My ssh does not preserve environment variables:
>
> bash-3.2$ export MY_VERY_OWN_ENVIRONMENT_VARIABLE=yes
> bash-3.2$ ssh cubbie env | grep MY_VERY_OWN
> WARNING: This is a restricted access server. If you do not have explicit
> permission to access this server, please disconnect immediately.
> Unauthorized access to this system is considered gross misconduct and may
> result in disciplinary action, including revocation of SWAN access
> privileges, immediate termination of employment, and/or prosecution to the
> fullest extent of the law.
> bash-3.2$
>
> The rsh man page explicitly states that the local environment is not passed
> to the remote shell.
>
> I haven't checked qrsh.  Maybe it works with that.
>
>> I agree that tree_spawn doesn't seem to work right now, but it is not due
>> to the plm not being selected.
>>
>
> It was for me.  I don't know whether it is because your rsh/ssh work
> differently, or for some other reason, but there is no question that my tree
> spawn failed because no PLM was loaded.
>
>> There are other factors involved.
>>
>
> The other factors that I came across were:
>
>   * I didn't have my .ssh/config file set up to forward
> authentication.  I added a -A flag to the ssh command in
> plm_base_rsh_support.
>
>   * In plm_rsh_module.c:setup_launch, a NULL orted_cmd made asprintf
> crash.  I used (orted_cmd == NULL ? "" : orted_cmd) in the call to
> asprintf.
>
>
> Once I fixed those, tree spawning worked for me.  (I believe you mentioned
> a race condition in another conversation.  I haven't run into that.)
>
>
> Iain
>
>