Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-25 Thread Ralph Castain
FWIW: doing it this way locks you to bash. What you could do instead
is use ompi_prefix, and then add the other libraries via
orte_launch_agent. The reason I suggest this is that ompi_prefix will
cause the ssh launcher to check which shell is being invoked on the
remote end, and automatically use the correct syntax to formulate the
setenv (or equivalent) command to get the libraries right. Otherwise,
the envar change in your approach will be rejected if, for example,
the remote shell is tcsh.
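
To make the shell dependence concrete: the same environment change is spelled differently in sh-family and csh-family shells. The function below is only a sketch of that idea, not Open MPI's launcher code; `env_cmd_for_shell` and the path `/opt/extra/lib` are made-up names for illustration.

```shell
#!/bin/sh
# Sketch only: print the command that sets VAR to VALUE in a given shell.
# env_cmd_for_shell is a hypothetical helper, not part of Open MPI.
env_cmd_for_shell() {
    shell_name=$(basename "$1")   # e.g. /bin/tcsh -> tcsh
    var=$2
    value=$3
    case "$shell_name" in
        csh|tcsh)
            # csh-family shells use setenv and reject "export VAR=value"
            printf 'setenv %s %s\n' "$var" "$value"
            ;;
        *)
            # bash, ksh, zsh and other POSIX-style shells
            printf 'export %s=%s\n' "$var" "$value"
            ;;
    esac
}

env_cmd_for_shell /bin/bash LD_LIBRARY_PATH /opt/extra/lib
env_cmd_for_shell /bin/tcsh LD_LIBRARY_PATH /opt/extra/lib
```

A wrapper script that hard-codes the `export` form is what ties the approach to bash-like shells on the remote end.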


May not matter for what you are doing - so feel free to ignore. :-)

We don't really have a complete design document. A definite hole, but  
priorities and time have just gotten in the way. Feel free to ask  
questions, though, and we do try to fill in the gaps as time permits.  
There are some presentations (and audio walkthroughs) starting to show  
up on the www.open-mpi.org site, so you might keep an eye open there.


Ralph

On Sep 24, 2008, at 1:59 PM, Will Portnoy wrote:


Thank you.  I was able to make everything work by using
orte_launch_agent and bash's $@ to pass the necessary parameters to
orted within my shell script.

I needed to add additional paths to my LD_LIBRARY_PATH/PATH variables
for other necessary libraries, which is why I was pushing on the
orte_launch_agent solution.

Is there a document that covers the design of Open MPI a bit?  It looks
pretty interesting, and there are quite a few acronyms that I had
trouble finding on the internet (e.g. "ess").


Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Will Portnoy
Thank you.  I was able to make everything work by using
orte_launch_agent and bash's $@ to pass the necessary parameters to
orted within my shell script.

I needed to add additional paths to my LD_LIBRARY_PATH/PATH variables
for other necessary libraries, which is why I was pushing on the
orte_launch_agent solution.
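
The script itself was not posted, so the following is only a guess at its general shape: set up the environment, then pass every argument the launcher supplied on to orted via "$@". `OMPI_HOME`, `EXTRA_LIB`, the `ORTED` override, and the function name `run_orted` are placeholders, not values from this thread.

```shell
#!/bin/bash
# Sketch of a wrapper that orte_launch_agent could point at (assumed layout).
# OMPI_HOME and EXTRA_LIB are placeholder paths; ORTED lets the daemon name
# be overridden for testing and is not an Open MPI setting.
run_orted() {
    ompi_home=${OMPI_HOME:-/opt/openmpi}
    extra_lib=${EXTRA_LIB:-/opt/extra/lib}

    # Prepend the Open MPI bin/lib directories and the extra library path.
    export PATH="$ompi_home/bin${PATH:+:$PATH}"
    export LD_LIBRARY_PATH="$ompi_home/lib:$extra_lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

    # Replace this shell with orted, forwarding all arguments unchanged.
    exec "${ORTED:-orted}" "$@"
}

# The launcher always supplies arguments; do nothing when there are none.
if [ "$#" -gt 0 ]; then
    run_orted "$@"
fi
```

Quoting "$@" keeps each argument intact even if it contains spaces, which matters because orted depends on the parameters it is handed.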

Is there a document that covers the design of Open MPI a bit?  It looks
pretty interesting, and there are quite a few acronyms that I had
trouble finding on the internet (e.g. "ess").

On Wed, Sep 24, 2008 at 3:40 PM, Ralph Castain  wrote:
> Yes - you don't want to use orte_launch_agent at all for that purpose. What
> you need to set is an info_key in your comm_spawn command for "ompi_prefix",
> with the value set to the install path. The ssh launcher will assemble the
> launch cmd using that info.
> Ralph

Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Ralph Castain
Yes - you don't want to use orte_launch_agent at all for that purpose.  
What you need to set is an info_key in your comm_spawn command for  
"ompi_prefix", with the value set to the install path. The ssh  
launcher will assemble the launch cmd using that info.


Ralph



On Sep 24, 2008, at 1:28 PM, Will Portnoy wrote:


Yes, your first sentence is correct.  I intend to use the unmodified
orted, but I need to set up the unix environment after the ssh has
completed but before orted is executed.

In particular, one of the more important tasks for me to do after ssh
connects is to set LD_LIBRARY_PATH and PATH to include the paths of
Open MPI's install lib and bin directories, respectively.
Otherwise, orted will not be on the PATH, and its dependent libraries
will not be in LD_LIBRARY_PATH.

Is there a recommended method to set LD_LIBRARY_PATH and PATH when ssh
is used to connect to other hosts when running an mpi job?

thank you,

Will


Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Will Portnoy
Yes, your first sentence is correct.  I intend to use the unmodified
orted, but I need to set up the unix environment after the ssh has
completed but before orted is executed.

In particular, one of the more important tasks for me to do after ssh
connects is to set LD_LIBRARY_PATH and PATH to include the paths of
Open MPI's install lib and bin directories, respectively.
Otherwise, orted will not be on the PATH, and its dependent libraries
will not be in LD_LIBRARY_PATH.

Is there a recommended method to set LD_LIBRARY_PATH and PATH when ssh
is used to connect to other hosts when running an mpi job?

thank you,

Will

On Wed, Sep 24, 2008 at 2:36 PM, Ralph Castain  wrote:
> So this is a singleton comm_spawn scenario, that requires you specify a
> launch_agent to execute? Just trying to ensure I understand.
>
> First, let me ensure we have a common understanding of what
> orte_launch_agent does. Basically, that param stipulates the command to be
> used in place of "orted" - it doesn't substitute for "ssh". So if you set
> -mca orte_launch_agent foo, what will happen is: "ssh nodename foo" instead
> of "ssh nodename orted".
>
> The intent was to provide a way to do things like run valgrind on the orted
> itself. So you could do -mca orte_launch_agent "valgrind orted", and we
> would dutifully run "ssh nodename valgrind orted".
>
> Or if you wanted to write your own orted (e.g., bar-orted), you could
> substitute it for our "orted".
>
> Or if you wanted to set mca params solely to be seen on the backend
> nodes/procs, you could set -mca orte_launch_agent "orted -mca foo bar", and
> we would launch "ssh nodename orted -mca foo bar". This allows us to set mca
> params without having mpirun see them - helps us to look at debug output,
> for example, from only the backend procs.
>
> If what you need to do is set something in the environment for the orted,
> there are certain cmd line options that will do that for you -
> orte_launch_agent may or may not be a good method.
>
> Perhaps it would help if you could tell me exactly what you wanted to have
> orte_launch_agent actually do?
>
> Thanks
> Ralph
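
The substitution described in the quoted message can be pictured with a small sketch. It is a simplification under stated assumptions: `build_launch_cmd` is a hypothetical helper, not an Open MPI function, and it omits the extra arguments the real launcher passes to the daemon.

```shell
#!/bin/sh
# Sketch only: show which remote command an ssh-based launch would run.
# build_launch_cmd is a hypothetical helper, not an Open MPI function.
build_launch_cmd() {
    node=$1
    agent=${2:-orted}   # plain orted when no launch agent is given
    printf 'ssh %s %s\n' "$node" "$agent"
}

build_launch_cmd nodename                        # ssh nodename orted
build_launch_cmd nodename foo                    # ssh nodename foo
build_launch_cmd nodename "valgrind orted"       # ssh nodename valgrind orted
build_launch_cmd nodename "orted -mca foo bar"   # ssh nodename orted -mca foo bar
```

In every case the agent string replaces only the word orted; the ssh part of the command is left alone.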

Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Will Portnoy
Sorry for the miscommunication: The processes are started by my
program with MPI_Comm_spawn, so there was no mpirun involved.

If you can suggest a test program I can use with mpirun to validate my
openmpi environment and install, that would probably produce the
output you would like to see.

But I'm not sure that will make it clear how the file pointed to by
"orte_launch_agent" in "mca-params.conf" should be written to set up an
environment and start orted.

Will

On Wed, Sep 24, 2008 at 2:17 PM, Ralph Castain  wrote:
> Afraid I am confused. This was the entire output from the job?? If so, then
> that means mpirun itself wasn't able to find a launch environment it could
> use, so you never got to the point of actually launching an orted.
>
> Do you have ssh in your path? My best immediate guess is that you don't, and
> that mpirun therefore doesn't see anything it can use to launch a job. We
> have discussed internally that we need to improve that error message - could
> be this is another case emphasizing that point.
>
> 1.3 is fine to use - still patching some bugs, but nothing that should
> impact this issue.
>
> Ralph

Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Ralph Castain
Afraid I am confused. This was the entire output from the job?? If so,  
then that means mpirun itself wasn't able to find a launch environment  
it could use, so you never got to the point of actually launching an  
orted.


Do you have ssh in your path? My best immediate guess is that you  
don't, and that mpirun therefore doesn't see anything it can use to  
launch a job. We have discussed internally that we need to improve  
that error message - could be this is another case emphasizing that  
point.
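
A quick way to answer the PATH question from the shell is sketched below. It only checks the caller's own PATH and says nothing about how mpirun probes for a launcher; `find_launcher` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: print the first of the named commands that is found on PATH.
find_launcher() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if launcher=$(find_launcher ssh rsh); then
    echo "launcher found: $launcher"
else
    echo "neither ssh nor rsh is on PATH" >&2
fi
```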


1.3 is fine to use - still patching some bugs, but nothing that should  
impact this issue.


Ralph

On Sep 24, 2008, at 12:11 PM, Will Portnoy wrote:


That was the output with plm_base_verbose set to 99 - it's the same
output with 1.

Yes, I'd like to use ssh.

orted wasn't starting properly with orte_launch_agent (which was
needed because my environment on the target machine wasn't set up), so
that's why I thought I would try it directly on the command line on
localhost.  I thought this was a simpler case: to verify that orted
could find all of its necessary components without the complexity of
everything else I'm doing.

If I needed to use orte_launch_agent, how should I pass the necessary
parameters to start orted after I set up my environment?

Am I better off using trunk over 1.3?

thank you,

Will

On Wed, Sep 24, 2008 at 2:01 PM, Ralph Castain  wrote:
Could you rerun that with -mca plm_base_verbose 1? What environment
are you in - I assume rsh/ssh?

I would like to see the cmd line being used to launch the orted.
What this indicates is that we are not getting the cmd line correct.
Could just be that some patch in the trunk didn't get completely
applied to the 1.3 branch.

BTW: you probably can't run orted directly off of the cmd line. It
likely needs some cmd line params to get critical info.

Ralph

On Sep 24, 2008, at 9:47 AM, Will Portnoy wrote:


I'm trying to use MPI_Comm_spawn with MPI_Info's host key to spawn
processes from a process not started with mpirun.  This works with the
host key set to the localhost's hostname, but it does not work when I
use other hosts.

I'm using version 1.3a1r19602.  I need to use orte_launch_agent to set
up my environment a bit before orted is started, but it fails with
errors listed below.

When I try to run orted directly on the command line with some of the
verbosity flags turned to "11", I receive the same messages.

Does anybody have any suggestions?

thank you,

Will


[fqdn:24761] mca: base: components_open: Looking for ess components
[fqdn:24761] mca: base: components_open: opening ess components
[fqdn:24761] mca: base: components_open: found loaded component env
[fqdn:24761] mca: base: components_open: component env has no register function
[fqdn:24761] mca: base: components_open: component env open function successful
[fqdn:24761] mca: base: components_open: found loaded component hnp
[fqdn:24761] mca: base: components_open: component hnp has no register function
[fqdn:24761] mca: base: components_open: component hnp open function successful
[fqdn:24761] mca: base: components_open: found loaded component singleton
[fqdn:24761] mca: base: components_open: component singleton has no register function
[fqdn:24761] mca: base: components_open: component singleton open function successful
[fqdn:24761] mca: base: components_open: found loaded component slurm
[fqdn:24761] mca: base: components_open: component slurm has no register function
[fqdn:24761] mca: base: components_open: component slurm open function successful
[fqdn:24761] mca: base: components_open: found loaded component tool
[fqdn:24761] mca: base: components_open: component tool has no register function
[fqdn:24761] mca: base: components_open: component tool open function successful
[fqdn:24761] mca:base:select: Auto-selecting ess components
[fqdn:24761] mca:base:select:(  ess) Querying component [env]
[fqdn:24761] mca:base:select:(  ess) Skipping component [env]. Query failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [hnp]
[fqdn:24761] mca:base:select:(  ess) Skipping component [hnp]. Query failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [singleton]
[fqdn:24761] mca:base:select:(  ess) Skipping component [singleton]. Query failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [slurm]
[fqdn:24761] mca:base:select:(  ess) Skipping component [slurm]. Query failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [tool]
[fqdn:24761] mca:base:select:(  ess) Skipping component [tool]. Query failed to return a module
[fqdn:24761] mca:base:select:(  ess) No component selected!
[fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
--
It looks like orte_init failed for some reason; your parallel process

Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Will Portnoy
That was the output with plm_base_verbose set to 99 - it's the same
output with 1.

Yes, I'd like to use ssh.

orted wasn't starting properly with orte_launch_agent (which was
needed because my environment on the target machine wasn't set up), so
that's why I thought I would try it directly on the command line on
localhost.  I thought this was a simpler case: to verify that orted
could find all of its necessary components without the complexity of
everything else I'm doing.

If I needed to use orte_launch_agent, how should I pass the necessary
parameters to start orted after I set up my environment?

Am I better off using trunk over 1.3?

thank you,

Will

On Wed, Sep 24, 2008 at 2:01 PM, Ralph Castain  wrote:
> Could you rerun that with -mca plm_base_verbose 1? What environment are you
> in - I assume rsh/ssh?
>
> I would like to see the cmd line being used to launch the orted. What this
> indicates is that we are not getting the cmd line correct. Could just be
> that some patch in the trunk didn't get completely applied to the 1.3
> branch.
>
> BTW: you probably can't run orted directly off of the cmd line. It likely
> needs some cmd line params to get critical info.
>
> Ralph
>
> On Sep 24, 2008, at 9:47 AM, Will Portnoy wrote:
>
>> I'm trying to use MPI_Comm_spawn with MPI_Info's host key to spawn
>> processes from a process not started with mpirun.  This works with the
>> host key set to the localhost's hostname, but it does not work when I
>> use other hosts.
>>
>> I'm using version 1.3a1r19602.  I need to use orte_launch_agent to set
>> up my environment a bit before orted is started, but it fails with
>> errors listed below.
>>
>> When I try to run orted directly on the command line with some of the
>> verbosity flags turned to "11", I receive the same messages.
>>
>> Does anybody have any suggestions?
>>
>> thank you,
>>
>> Will
>>
>>
>> [fqdn:24761] mca: base: components_open: Looking for ess components
>> [fqdn:24761] mca: base: components_open: opening ess components
>> [fqdn:24761] mca: base: components_open: found loaded component env
>> [fqdn:24761] mca: base: components_open: component env has no register
>> function
>> [fqdn:24761] mca: base: components_open: component env open function
>> successful
>> [fqdn:24761] mca: base: components_open: found loaded component hnp
>> [fqdn:24761] mca: base: components_open: component hnp has no register
>> function
>> [fqdn:24761] mca: base: components_open: component hnp open function
>> successful
>> [fqdn:24761] mca: base: components_open: found loaded component singleton
>> [fqdn:24761] mca: base: components_open: component singleton has no
>> register function
>> [fqdn:24761] mca: base: components_open: component singleton open
>> function successful
>> [fqdn:24761] mca: base: components_open: found loaded component slurm
>> [fqdn:24761] mca: base: components_open: component slurm has no
>> register function
>> [fqdn:24761] mca: base: components_open: component slurm open function
>> successful
>> [fqdn:24761] mca: base: components_open: found loaded component tool
>> [fqdn:24761] mca: base: components_open: component tool has no register
>> function
>> [fqdn:24761] mca: base: components_open: component tool open function
>> successful
>> [fqdn:24761] mca:base:select: Auto-selecting ess components
>> [fqdn:24761] mca:base:select:(  ess) Querying component [env]
>> [fqdn:24761] mca:base:select:(  ess) Skipping component [env]. Query
>> failed to return a module
>> [fqdn:24761] mca:base:select:(  ess) Querying component [hnp]
>> [fqdn:24761] mca:base:select:(  ess) Skipping component [hnp]. Query
>> failed to return a module
>> [fqdn:24761] mca:base:select:(  ess) Querying component [singleton]
>> [fqdn:24761] mca:base:select:(  ess) Skipping component [singleton].
>> Query failed to return a module
>> [fqdn:24761] mca:base:select:(  ess) Querying component [slurm]
>> [fqdn:24761] mca:base:select:(  ess) Skipping component [slurm]. Query
>> failed to return a module
>> [fqdn:24761] mca:base:select:(  ess) Querying component [tool]
>> [fqdn:24761] mca:base:select:(  ess) Skipping component [tool]. Query
>> failed to return a module
>> [fqdn:24761] mca:base:select:(  ess) No component selected!
>> [fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> runtime/orte_init.c at line 125
>> --
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>  orte_ess_base_select failed
>>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --
>> 

Re: [OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Ralph Castain
Could you rerun that with -mca plm_base_verbose 1? What environment  
are you in - I assume rsh/ssh?


I would like to see the cmd line being used to launch the orted. What  
this indicates is that we are not getting the cmd line correct. Could  
just be that some patch in the trunk didn't get completely applied to  
the 1.3 branch.


BTW: you probably can't run orted directly off of the cmd line. It  
likely needs some cmd line params to get critical info.
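
For illustration only: a minimal sketch of a launch-agent wrapper that sets up the environment and then hands every argument it received on to orted unchanged, so the params supplied by the launcher are not lost. Later in this thread Will reports that forwarding bash's $@ from a shell script worked; the Python version below is the same idea. The directory names and the helper function names are hypothetical placeholders, not anything defined by Open MPI.

```python
#!/usr/bin/env python3
"""Hypothetical launch-agent wrapper: extend the environment, then exec orted."""
import os
import sys

# Assumed locations of the extra libraries and binaries (placeholders).
EXTRA_LIB_DIRS = ["/opt/example/lib"]
EXTRA_BIN_DIRS = ["/opt/example/bin"]


def prepend_paths(current, extra):
    """Return a PATH-style string with the `extra` entries ahead of `current`."""
    parts = list(extra)
    if current:
        parts.append(current)
    return os.pathsep.join(parts)


def build_env(base_env):
    """Copy base_env and extend PATH and LD_LIBRARY_PATH."""
    env = dict(base_env)
    env["PATH"] = prepend_paths(env.get("PATH", ""), EXTRA_BIN_DIRS)
    env["LD_LIBRARY_PATH"] = prepend_paths(
        env.get("LD_LIBRARY_PATH", ""), EXTRA_LIB_DIRS
    )
    return env


def build_command(argv):
    """orted followed by every argument the launcher passed to this wrapper."""
    return ["orted"] + list(argv)


def main():
    cmd = build_command(sys.argv[1:])
    # execvpe replaces this process and looks orted up on the new PATH.
    os.execvpe(cmd[0], cmd, build_env(os.environ))


# Only exec when arguments were supplied: orted started with no
# launcher-provided params is not expected to come up cleanly.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

The wrapper would be named as the orte_launch_agent in place of plain orted. Unlike an approach based on shell syntax, it does not depend on which shell is invoked on the remote end, but it does assume a Python interpreter is available there.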


Ralph

On Sep 24, 2008, at 9:47 AM, Will Portnoy wrote:


I'm trying to use MPI_Comm_Spawn with MPI_Info's host key to spawn
processes from a process not started with mpirun.  This works with the
host key set to the localhost's hostname, but it does not work when I
use other hosts.

I'm using version 1.3a1r19602.  I need to use orte_launch_agent to set
up my environment a bit before orted is started, but it fails with
errors listed below.

When I try to run orted directly on the command line with some of the
verbosity flags turned to "11", I receive the same messages.

Does anybody have any suggestions?

thank you,

Will


[fqdn:24761] mca: base: components_open: Looking for ess components
[fqdn:24761] mca: base: components_open: opening ess components
[fqdn:24761] mca: base: components_open: found loaded component env
[fqdn:24761] mca: base: components_open: component env has no register function
[fqdn:24761] mca: base: components_open: component env open function successful
[fqdn:24761] mca: base: components_open: found loaded component hnp
[fqdn:24761] mca: base: components_open: component hnp has no register function
[fqdn:24761] mca: base: components_open: component hnp open function successful
[fqdn:24761] mca: base: components_open: found loaded component singleton
[fqdn:24761] mca: base: components_open: component singleton has no
register function
[fqdn:24761] mca: base: components_open: component singleton open
function successful
[fqdn:24761] mca: base: components_open: found loaded component slurm
[fqdn:24761] mca: base: components_open: component slurm has no
register function
[fqdn:24761] mca: base: components_open: component slurm open function
successful
[fqdn:24761] mca: base: components_open: found loaded component tool
[fqdn:24761] mca: base: components_open: component tool has no register function
[fqdn:24761] mca: base: components_open: component tool open function successful
[fqdn:24761] mca:base:select: Auto-selecting ess components
[fqdn:24761] mca:base:select:(  ess) Querying component [env]
[fqdn:24761] mca:base:select:(  ess) Skipping component [env]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [hnp]
[fqdn:24761] mca:base:select:(  ess) Skipping component [hnp]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [singleton]
[fqdn:24761] mca:base:select:(  ess) Skipping component [singleton].
Query failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [slurm]
[fqdn:24761] mca:base:select:(  ess) Skipping component [slurm]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [tool]
[fqdn:24761] mca:base:select:(  ess) Skipping component [tool]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) No component selected!
[fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
runtime/orte_init.c at line 125
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_ess_base_select failed
 --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
[fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
orted/orted_main.c at line 315
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] mca:base:select:( ess) No component selected!

2008-09-24 Thread Will Portnoy
I'm trying to use MPI_Comm_Spawn with MPI_Info's host key to spawn
processes from a process not started with mpirun.  This works with the
host key set to the localhost's hostname, but it does not work when I
use other hosts.

I'm using version 1.3a1r19602.  I need to use orte_launch_agent to set
up my environment a bit before orted is started, but it fails with
errors listed below.

When I try to run orted directly on the command line with some of the
verbosity flags turned to "11", I receive the same messages.

Does anybody have any suggestions?

thank you,

Will


[fqdn:24761] mca: base: components_open: Looking for ess components
[fqdn:24761] mca: base: components_open: opening ess components
[fqdn:24761] mca: base: components_open: found loaded component env
[fqdn:24761] mca: base: components_open: component env has no register function
[fqdn:24761] mca: base: components_open: component env open function successful
[fqdn:24761] mca: base: components_open: found loaded component hnp
[fqdn:24761] mca: base: components_open: component hnp has no register function
[fqdn:24761] mca: base: components_open: component hnp open function successful
[fqdn:24761] mca: base: components_open: found loaded component singleton
[fqdn:24761] mca: base: components_open: component singleton has no
register function
[fqdn:24761] mca: base: components_open: component singleton open
function successful
[fqdn:24761] mca: base: components_open: found loaded component slurm
[fqdn:24761] mca: base: components_open: component slurm has no
register function
[fqdn:24761] mca: base: components_open: component slurm open function
successful
[fqdn:24761] mca: base: components_open: found loaded component tool
[fqdn:24761] mca: base: components_open: component tool has no register function
[fqdn:24761] mca: base: components_open: component tool open function successful
[fqdn:24761] mca:base:select: Auto-selecting ess components
[fqdn:24761] mca:base:select:(  ess) Querying component [env]
[fqdn:24761] mca:base:select:(  ess) Skipping component [env]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [hnp]
[fqdn:24761] mca:base:select:(  ess) Skipping component [hnp]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [singleton]
[fqdn:24761] mca:base:select:(  ess) Skipping component [singleton].
Query failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [slurm]
[fqdn:24761] mca:base:select:(  ess) Skipping component [slurm]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) Querying component [tool]
[fqdn:24761] mca:base:select:(  ess) Skipping component [tool]. Query
failed to return a module
[fqdn:24761] mca:base:select:(  ess) No component selected!
[fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
runtime/orte_init.c at line 125
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
[fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
orted/orted_main.c at line 315
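
A log like the one above can be condensed to show which components each framework queried and rejected. The following is a rough sketch that assumes the line format shown in this message; the function name and the layout of the returned dictionary are made up for illustration.

```python
import re

# Matches lines such as:
#   [fqdn:24761] mca:base:select:(  ess) Skipping component [env]. Query
SKIP_RE = re.compile(
    r"mca:base:select:\(\s*(\w+)\) Skipping component \[(\w+)\]"
)
# Matches lines such as:
#   [fqdn:24761] mca:base:select:(  ess) No component selected!
NONE_RE = re.compile(r"mca:base:select:\(\s*(\w+)\) No component selected!")


def summarize_select(log_text):
    """Return {framework: {"skipped": [components], "none_selected": bool}}."""
    summary = {}

    def entry(framework):
        return summary.setdefault(
            framework, {"skipped": [], "none_selected": False}
        )

    for line in log_text.splitlines():
        skipped = SKIP_RE.search(line)
        if skipped:
            framework, component = skipped.groups()
            entry(framework)["skipped"].append(component)
            continue
        none_selected = NONE_RE.search(line)
        if none_selected:
            entry(none_selected.group(1))["none_selected"] = True
    return summary
```

Applied to the output above, this would report that every queried ess component was skipped and that none was selected, which is the condition orte_init then reports as "Not found".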