[OMPI devel] OpenMPI-v1.3.1 Tentatives dates release eversion

2008-05-09 Thread Mukesh K Srivastava
Hi.

What are the tentative release dates for OpenMPI-v1.3.1?

Any idea?

BR


Re: [OMPI devel] [OMPI users] OpenMPI Internals & Static-Analysis.

2008-05-09 Thread Jeff Squyres

On May 8, 2008, at 1:20 PM, Mukesh K Srivastava wrote:

The OMPI community should consider producing an OpenMPI Internals
document. Having an internals document would certainly help
developers of OpenMPI.


Yes, it will.  It's something we've talked about many times, but it  
has unfortunately always come down to a time/resources issue -- no one  
has the time or people to do it (and keep the document up-to-date with  
the ever-changing code base).


What does one have to do to get started on an OMPI Internals
document? Is there any link or project repository within OMPI
where work on such a document could begin?


You might want to talk to the docs sub-project -- their first goal was  
to make user-level documentation, but they've gone kinda quiet over  
the last month or three.  Regardless, they may have some good opinions  
about documentation format, technology, etc.


http://www.open-mpi.org/projects/user-docs/

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] OpenMPI-v1.3.1 Tentatives dates release eversion

2008-05-09 Thread Ralph Castain
Well, history would indicate that 1.3.1 will be released about 1 week after
we release 1.3.0... ;-)

The release date for 1.3.0 remains uncertain - see the wiki for the last
guess (currently July).

https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3


On 5/8/08 10:58 PM, "Mukesh K Srivastava"  wrote:

> Hi.
> 
> What are the tentative release dates for OpenMPI-v1.3.1?
> 
> Any idea?
> 
> BR
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] OpenMPI-v1.3.1 Tentatives dates release eversion

2008-05-09 Thread Jeff Squyres

On May 9, 2008, at 12:58 AM, Mukesh K Srivastava wrote:


What are the tentative release dates for OpenMPI-v1.3.1?



Please don't CC both mailing lists on future replies to this thread;  
one or the other would be fine; thanks!


Brad Benton and George Bosilca are the release managers for the v1.3  
series.  They're maintaining a wiki for the v1.3 series here:


https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3

We're [finally] darn near feature complete, meaning that we talked  
this week about branching for v1.3 next week.  Then assume that we'll  
test and debug for about 2 months after that.


These are total SWAGs, of course...

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Recv from MTL module hanging on pml_cm_recv.c:mca_pml_cm_recv()

2008-05-09 Thread Caciano Machado
Thank you very much George. I'll check this today.

Caciano

2008/5/8 George Bosilca :

> Caciano,
>
> It's a little bit more complex than that. In fact you should never set the
> req_complete flag to true yourself. Instead you should use
> ompi_request_complete (defined in ompi/request/request.h) which will set the
> flag and trigger a condition broadcast or signal for you. This will allow
> the upper level to be released from the requests condition, and therefore
> discover that the request is completed.
>
>  george.
>
>
>
> On May 8, 2008, at 8:27 PM, Caciano Machado wrote:
>
>  Hi,
>>
>> I'm finishing the implementation of a MTL module but something went wrong.
>> This module is using PML/cm and the Recv operations are hanging in the
>> ompi_request_wait_completion() call in pml_cm_recv.c:mca_pml_cm_recv(). I
>> think that I must set the variable recvreq->req_base.req_ompi.req_complete
>> somewhere but I don't know exactly where is the right place. When I comment
>> out the ompi_request_wait_completion() call the application messages are
>> delivered correctly with my backend.
>>
>> Regards,
>> Caciano
>>
>
>
>
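
George's point (complete the request through a helper that both sets the flag and wakes any waiter, rather than writing req_complete directly) can be illustrated with a generic sketch. This is not Open MPI code: the sketch_* names and the pthread-based structure are assumptions made purely for illustration, and the real ompi_request_complete may work differently internally.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request object; NOT the real ompi_request_t. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            complete;
} sketch_request_t;

void sketch_request_init(sketch_request_t *req)
{
    assert(req != NULL);
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->cond, NULL);
    req->complete = false;
}

/* Completion helper: sets the flag AND broadcasts on the condition,
 * so a thread blocked in sketch_request_wait() is released. */
void sketch_request_complete(sketch_request_t *req)
{
    assert(req != NULL);
    pthread_mutex_lock(&req->lock);
    req->complete = true;
    pthread_cond_broadcast(&req->cond);
    pthread_mutex_unlock(&req->lock);
}

/* Waiter: blocks until the flag has been set through the helper. */
void sketch_request_wait(sketch_request_t *req)
{
    assert(req != NULL);
    pthread_mutex_lock(&req->lock);
    while (!req->complete) {
        pthread_cond_wait(&req->cond, &req->lock);
    }
    pthread_mutex_unlock(&req->lock);
}
```

Setting only the flag, without the broadcast, is the kind of mistake that could leave a waiter blocked, which is consistent with the hang described above.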


Re: [OMPI devel] [RFC] mca_base_select()

2008-05-09 Thread Ralph Castain
I just hit a problem with this logic - should be a minor change.

We have several frameworks with components that are only allowed to be
selected if the user specifically requests them by stating -mca foo bar.
Because it is possible for there to be no other components that want to be
selected, and because it is permissible for no components to be selected for
that framework, we set bar's priority to be -1.

The new select logic will not allow a negative priority to be selected, even
if the user specifically requested that component.

If we set the priority to be 0, then the system will allow the component to
be automatically selected. This is not allowed as it can lead to bad
behavior.

So what we need the select system to do is say "if someone specified a
specific component, don't worry about the returned priority - just use it"

Josh: could you please modify this?

Thanks!
Ralph
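
The behavior requested above can be sketched roughly as follows. This is only an illustration under stated assumptions: sketch_component_t and sketch_select() are hypothetical names, not part of mca_base_select(), and the real function works on opened component lists rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a component for this sketch. */
typedef struct {
    const char *name;
    int         priority;  /* negative: never pick automatically */
} sketch_component_t;

/* Returns the index of the selected component, or -1 if none is selected.
 * 'requested' is the component named by the user, or NULL if none. */
int sketch_select(const sketch_component_t *comps, size_t n,
                  const char *requested)
{
    assert(comps != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (requested != NULL) {
            /* Explicit request: ignore the returned priority entirely. */
            if (strcmp(comps[i].name, requested) == 0) {
                return (int)i;
            }
            continue;
        }
        /* Automatic selection: negative priorities are not eligible. */
        if (comps[i].priority < 0) {
            continue;
        }
        if (best < 0 || comps[i].priority > comps[best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

With a single component "bar" at priority -1, the sketch selects nothing automatically but does select "bar" when it is requested by name.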



On 5/8/08 7:04 PM, "Pak Lui"  wrote:

> Thanks very much Josh! Will try it out soon.
> 
> Josh Hursey wrote:
>> Sorry about that. I didn't test that type of option. It should be
>> working in r18418. Let me know if you see any more issues.
>> 
>> -- Josh
>> 
>> On May 8, 2008, at 6:04 PM, Pak Lui wrote:
>> 
>>> I think I have a problem but I am not sure. I used to be able to use the
>>> circumflex (^) to switch between the gridengine launcher and the ssh
>>> launchers by doing something like this, e.g. -mca plm ^gridengine, to
>>> exclude some of the components plm (and also in ras). It doesn't seem
>>> like the 'negate' is in mca_base_component anymore. I guess I just have
>>>   to spell out which component I want explicitly...
>>> 
>>> Josh Hursey wrote:
>>>> This has been committed in r18381
>>>>
>>>> Please let me know if you have any problems with this commit.
>>>>
>>>> Cheers,
>>>> Josh
>>>>
>>>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>>>
>>>>> Awesome.
>>>>>
>>>>> The branch is updated to the latest trunk head. I encourage folks to
>>>>> check out this repository and make sure that it builds on their
>>>>> system. A normal build of the branch should be enough to find out if
>>>>> there are any cut-n-paste problems (though I tried to be careful,
>>>>> mistakes do happen).
>>>>>
>>>>> I haven't heard any problems so this is looking like it will come in
>>>>> tomorrow after the teleconf. I'll ask again there to see if there are
>>>>> any voices of concern.
>>>>>
>>>>> Cheers,
>>>>> Josh
>>>>>
>>>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>>>
>>>>>> This all sounds good to me!
>>>>>>
>>>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>>>>
>>>>>>> What:  Add mca_base_select() and adjust frameworks & components to
>>>>>>> use it.
>>>>>>> Why:   Consolidation of code for general goodness.
>>>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>>>> When:  Code ready now. Documentation ready soon.
>>>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>>>>
>>>>>>> Discussion:
>>>>>>> ---
>>>>>>> For a number of years a few developers have been talking about
>>>>>>> creating an MCA base component selection function. For various
>>>>>>> reasons this was never implemented. Recently I decided to give it
>>>>>>> a try.
>>>>>>>
>>>>>>> A base select function will allow Open MPI to provide completely
>>>>>>> consistent selection behavior for many of its frameworks (18 of 31
>>>>>>> to be exact at the moment). The primary goal of this work is to
>>>>>>> improve code maintainability through code reuse. Other benefits
>>>>>>> also result, such as a slightly smaller memory footprint.
>>>>>>>
>>>>>>> The mca_base_select() function represents the most commonly used
>>>>>>> logic for component selection: Select the one component with the
>>>>>>> highest priority and close all of the not selected components. This
>>>>>>> function can be found at the path below in the branch:
>>>>>>> opal/mca/base/mca_base_components_select.c
>>>>>>>
>>>>>>> To support this I had to formalize a query() function in the
>>>>>>> mca_base_component_t of the form:
>>>>>>> int mca_base_query_component_fn(mca_base_module_t **module, int *priority);
>>>>>>>
>>>>>>> This function is specified after the open and close component
>>>>>>> functions in this structure so as to allow compatibility with
>>>>>>> frameworks that do not use the base selection logic. Frameworks
>>>>>>> that do *not* use this function are *not* affected by this commit.
>>>>>>> However, every component in the frameworks that use the
>>>>>>> mca_base_select function must adjust their component query function
>>>>>>> to fit that specified above.
>>>>>>>
>>>>>>> 18 frameworks in Open MPI have been changed. I have updated all of
>>>>>>> the components in the 18 frameworks available in the trunk on my
>>>>>>> branch. The affected frameworks are:
>>>>>>> - OPAL Carto
>>>>>>> - OPAL crs
>>>>>>> - OPAL maffinity
>>>>>>> - OPAL memchecker
>>>>>>> - OPAL paffinity
>>>>>>> - ORT

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-09 Thread Josh Hursey

Ralph,

Can you give me an example of a component that I can look at? It will  
allow me to test the fix before committing, and to better understand  
the problem.


-- Josh

On May 9, 2008, at 10:41 AM, Ralph Castain wrote:


I just hit a problem with this logic - should be a minor change.

We have several frameworks where we have components that are only allowed
to be selected if the user specifically requests them by stating -mca foo
bar. Because it is possible for there to be no other components that want
to be selected, and because it is permissible for no components to be
selected for that framework, we set bar's priority to be -1.

The new select logic will not allow a negative priority to be selected,
even if the user specifically requested that component.

If we set the priority to be 0, then the system will allow the component
to be automatically selected. This is not allowed as it can lead to bad
behavior.

So what we need the select system to do is say "if someone specified a
specific component, don't worry about the returned priority - just use it"


Josh: could you please modify this?

Thanks!
Ralph



On 5/8/08 7:04 PM, "Pak Lui"  wrote:


Thanks very much Josh! Will try it out soon.

Josh Hursey wrote:

Sorry about that. I didn't test that type of option. It should be
working in r18418. Let me know if you see any more issues.

-- Josh

On May 8, 2008, at 6:04 PM, Pak Lui wrote:

I think I have a problem but I am not sure. I used to be able to use the
circumflex (^) to switch between the gridengine launcher and the ssh
launchers by doing something like this, e.g. -mca plm ^gridengine, to
exclude some of the components plm (and also in ras). It doesn't seem
like the 'negate' is in mca_base_component anymore. I guess I just have
to spell out which component I want explicitly...

Josh Hursey wrote:

This has been committed in r18381

Please let me know if you have any problems with this commit.

Cheers,
Josh

On May 5, 2008, at 10:41 AM, Josh Hursey wrote:


Awesome.

The branch is updated to the latest trunk head. I encourage folks to
check out this repository and make sure that it builds on their
system. A normal build of the branch should be enough to find out if
there are any cut-n-paste problems (though I tried to be careful,
mistakes do happen).

I haven't heard any problems so this is looking like it will come in
tomorrow after the teleconf. I'll ask again there to see if there are
any voices of concern.

Cheers,
Josh

On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:


This all sounds good to me!

On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:

What:  Add mca_base_select() and adjust frameworks & components to
use it.
Why:   Consolidation of code for general goodness.
Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
When:  Code ready now. Documentation ready soon.
Timeout: May 6, 2008 (After teleconf) [1 week]

Discussion:
---
For a number of years a few developers have been talking about
creating an MCA base component selection function. For various
reasons this was never implemented. Recently I decided to give it a try.

A base select function will allow Open MPI to provide completely
consistent selection behavior for many of its frameworks (18 of 31
to be exact at the moment). The primary goal of this work is to
improve code maintainability through code reuse. Other benefits also
result, such as a slightly smaller memory footprint.

The mca_base_select() function represents the most commonly used
logic for component selection: Select the one component with the
highest priority and close all of the not selected components. This
function can be found at the path below in the branch:
opal/mca/base/mca_base_components_select.c

To support this I had to formalize a query() function in the
mca_base_component_t of the form:
int mca_base_query_component_fn(mca_base_module_t **module, int *priority);

This function is specified after the open and close component
functions in this structure so as to allow compatibility with
frameworks that do not use the base selection logic. Frameworks that do
*not* use this function are *not* affected by this commit. However, every
component in the frameworks that use the mca_base_select function
must adjust their component query function to fit that specified above.
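
A component query function matching the shape quoted above might look like the sketch below. Everything here is illustrative: sketch_module_t stands in for mca_base_module_t, and the component name, the priority value of 50, and the 0 return code for success are assumptions, not values taken from the branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mca_base_module_t. */
typedef struct {
    const char *name;
} sketch_module_t;

/* The single module this illustrative component can offer. */
static sketch_module_t sketch_foo_module = { "foo" };

/* Query function in the quoted form: it hands back a module pointer
 * and the priority at which this component would like to be selected. */
int sketch_foo_component_query(sketch_module_t **module, int *priority)
{
    assert(module != NULL && priority != NULL);
    *module   = &sketch_foo_module;
    *priority = 50;   /* arbitrary illustrative priority */
    return 0;         /* 0 stands for success in this sketch */
}
```

A selection routine would call such a function once per opened component and compare the returned priorities.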


18 frameworks in Open MPI have been changed. I have updated all of
the components in the 18 frameworks available in the trunk on my branch.
The affected frameworks are:
- OPAL Carto
- OPAL crs
- OPAL maffinity
- OPAL memchecker
- OPAL paffinity
- ORTE errmgr
- ORTE ess
- ORTE Filem
- ORTE grpcomm
- ORTE odls
- ORTE plm
- ORTE ras
- ORTE rmaps
- ORTE routed
- ORTE snapc
- OMPI crcp
- OMPI dpm
- OMPI pubsub

There was a question of the memory footprint change as a result of
this commit. I used 'pmap' to determine process memory footprint of a
hello world MPI program. Static and Shared build numb

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-09 Thread Ralph Castain
Sure - take a look at the hg repository Jeff and I are working on:

http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/channel

The opal/mca/filter framework illustrates the problem. I have one component
in there right now, with a default module defined in the base. That
component must only be selected if the user explicitly requests it. With the
current select logic, I can't do this: if the priority is >= 0, it is always
automatically selected; if the priority is < 0, it is never selectable, even
if specified.

Thanks
Ralph



On 5/9/08 8:52 AM, "Josh Hursey"  wrote:

> Ralph,
> 
> Can you give me an example of a component that I can look at? It will
> allow me to test the fix before committing, and to better understand
> the problem.
> 
> -- Josh
> 
> On May 9, 2008, at 10:41 AM, Ralph Castain wrote:
> 
>> I just hit a problem with this logic - should be a minor change.
>> 
>> We have several frameworks where we have components that are only
>> allowed be
>> selected if the user specifically requests them by stating -mca foo
>> bar.
>> Because it is possible for there to be no other components that want
>> to be
>> selected, and because it is permissible for no components to be
>> selected for
>> that framework, we set bar's priority to be -1.
>> 
>> The new select logic will not allow a negative priority to be
>> selected, even
>> if the user specifically requested that component.
>> 
>> If we set the priority to be 0, then the system will allow the
>> component to
>> be automatically selected. This is not allowed as it can lead to bad
>> behavior.
>> 
>> So what we need the select system to do is say "if someone specified a
>> specific component, don't worry about the returned priority - just
>> use it"
>> 
>> Josh: could you please modify this?
>> 
>> Thanks!
>> Ralph
>> 
>> 
>> 

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-09 Thread Josh Hursey
Ok I think I understand the problem a bit better now. I attached a  
patch that should fix this, but I want you to check it out before I  
commit just to make sure.


If you specify '-mca filter xml' on the command line then only the  
'xml' component should be opened by mca_base_open. The problem was  
that the selection logic used -1 as the lowest acceptable priority,  
which conflicted with the set priority of the 'xml' component. This  
patch sets this value to INT32_MIN which should be well below any  
negative priority that a component would set for itself.


Let me know if this works for you and I'll commit it.

Cheers,
Josh



select.patch
Description: Binary data
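
The conflict Josh describes can be reproduced with a small sketch. This models only the description in the message above, not the attached patch (which is not shown here): the function name and the strict "greater than" comparison are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the highest priority strictly above 'floor',
 * or -1 when no entry exceeds it. 'floor' plays the role of the
 * "lowest acceptable priority" sentinel mentioned above. */
int sketch_best_above(const int32_t *prio, size_t n, int32_t floor)
{
    assert(prio != NULL || n == 0);
    int     best      = -1;
    int32_t best_prio = floor;
    for (size_t i = 0; i < n; ++i) {
        if (prio[i] > best_prio) {
            best_prio = prio[i];
            best      = (int)i;
        }
    }
    return best;
}
```

With a sentinel of -1, a lone component at priority -1 is never chosen; with INT32_MIN as the sentinel it is. Note that this changes the floor for every caller, which is not the same as honoring a negative priority only when the user names the component.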



On May 9, 2008, at 11:14 AM, Ralph Castain wrote:


Sure - take a look at the hg repository Jeff and I are working on:

http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/channel

The opal/mca/filter framework illustrates the problem. I have one component
in there right now, with a default module defined in the base. That
component must only be selected if the user calls it. With the current
select logic, I can't do this - if the priority is >=0, then it always is
automatically selected. Priority < 0, never selectable even if specified.


Thanks
Ralph



On 5/9/08 8:52 AM, "Josh Hursey"  wrote:


Ralph,

Can you give me an example of a component that I can look at? It will
allow me to test the fix before committing, and to better understand
the problem.

-- Josh

On May 9, 2008, at 10:41 AM, Ralph Castain wrote:


I just hit a problem with this logic - should be a minor change.

We have several frameworks where we have components that are only
allowed be
selected if the user specifically requests them by stating -mca foo
bar.
Because it is possible for there to be no other components that want
to be
selected, and because it is permissible for no components to be
selected for
that framework, we set bar's priority to be -1.

The new select logic will not allow a negative priority to be
selected, even
if the user specifically requested that component.

If we set the priority to be 0, then the system will allow the
component to
be automatically selected. This is not allowed as it can lead to bad
behavior.

So what we need the select system to do is say "if someone specified a
specific component, don't worry about the returned priority - just
use it"

Josh: could you please modify this?

Thanks!
Ralph




Re: [OMPI devel] [RFC] mca_base_select()

2008-05-09 Thread Ralph Castain
Not quite, Josh - I fixed it in our branch. Will send you a revised patch
that does the job off-list for your review.

Thanks
Ralph



On 5/9/08 9:35 AM, "Josh Hursey"  wrote:

> Ok I think I understand the problem a bit better now. I attached a
> patch that should fix this, but I want you to check it out before I
> commit just to make sure.
> 
> If you specify '-mca filter xml' on the command line then only the
> 'xml' component should be opened by mca_base_open. The problem was
> that the selection logic used -1 as the lowest acceptable priority,
> which conflicted with the set priority of the 'xml' component. This
> patch sets this value to INT32_MIN which should be well below any
> negative priority that a component would set for itself.
> 
> Let me know if this works for you and I'll commit it.
> 
> Cheers,
> Josh
> 
> 
> 
> On May 9, 2008, at 11:14 AM, Ralph Castain wrote:
> 
>> Sure - take a look at the hg repository Jeff and I are working on:
>> 
>> http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/channel
>> 
>> The opal/mca/filter framework illustrates the problem. I have one
>> component
>> in there right now, with a default module defined in the base. That
>> component must only be selected if the user calls it. With the current
>> select logic, I can't do this - if the priority is >=0, then it
>> always is
>> automatically selected. Priority < 0, never selectable even if
>> specified.
>> 
>> Thanks
>> Ralph
>> 
>> 
>> 
>> On 5/9/08 8:52 AM, "Josh Hursey"  wrote:
>> 
>>> Ralph,
>>> 
>>> Can you give me an example of a component that I can look at? It will
>>> allow me to test the fix before committing, and to better understand
>>> the problem.
>>> 
>>> -- Josh
>>> 
>>> On May 9, 2008, at 10:41 AM, Ralph Castain wrote:
>>> 
>>>> I just hit a problem with this logic - should be a minor change.
>>>>
>>>> We have several frameworks where we have components that are only
>>>> allowed be
>>>> selected if the user specifically requests them by stating -mca foo
>>>> bar.
>>>> Because it is possible for there to be no other components that want
>>>> to be
>>>> selected, and because it is permissible for no components to be
>>>> selected for
>>>> that framework, we set bar's priority to be -1.
>>>>
>>>> The new select logic will not allow a negative priority to be
>>>> selected, even
>>>> if the user specifically requested that component.
>>>>
>>>> If we set the priority to be 0, then the system will allow the
>>>> component to
>>>> be automatically selected. This is not allowed as it can lead to bad
>>>> behavior.
>>>>
>>>> So what we need the select system to do is say "if someone
>>>> specified a
>>>> specific component, don't worry about the returned priority - just
>>>> use it"
>>>>
>>>> Josh: could you please modify this?
>>>>
>>>> Thanks!
>>>> Ralph

[OMPI devel] Changes: opal_output and opal_show_help

2008-05-09 Thread Jeff Squyres
Per the teleconf this week, Ralph and I worked up two new features  
that we're nearly ready to put back in the trunk:


1. IBM+LANL needed a way to XML-ize all output that comes out of OMPI  
so that 3rd party tools can parse and use it intelligently (e.g., the  
PTP debugger can now distinguish between OMPI error messages and  
stderr from the MPI app).


2. In order to do #1, we created separate logical channels (vs. just  
throwing everything in stderr and letting IOF relay it back to the  
HNP) for the following:

   - stdout/stderr from the MPI app
   - opal_show_help() messages (***)
   - opal_output*() messages (***)

As a side effect, we now filter show_help() messages and only print  
them *once* at the HNP (this has been a very long-standing goal of  
mine).  So if your MPI app barfs, you will no longer see the same  
show_help() error message N times -- you'll see it only once, possibly  
accompanied with a "...and we got the same error message from N other  
processes" notice.
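
The duplicate filtering described above could be sketched along these lines. This is not the ORTE implementation: the sketch_* names, the fixed-size table, and keying on a topic string are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_TOPICS 32
#define SKETCH_TOPIC_LEN  64

/* One entry per distinct help topic seen so far. */
typedef struct {
    char topic[SKETCH_TOPIC_LEN];
    int  count;               /* how many processes reported it */
} sketch_seen_t;

typedef struct {
    sketch_seen_t seen[SKETCH_MAX_TOPICS];
    int           n;
} sketch_filter_t;

/* Returns 1 if the message should be printed (first report of this
 * topic), 0 if it duplicates a topic that was already printed. */
int sketch_filter_report(sketch_filter_t *f, const char *topic)
{
    assert(f != NULL && topic != NULL);
    for (int i = 0; i < f->n; ++i) {
        if (strncmp(f->seen[i].topic, topic, SKETCH_TOPIC_LEN - 1) == 0) {
            f->seen[i].count++;
            return 0;
        }
    }
    if (f->n < SKETCH_MAX_TOPICS) {
        snprintf(f->seen[f->n].topic, SKETCH_TOPIC_LEN, "%s", topic);
        f->seen[f->n].count = 1;
        f->n++;
    }
    return 1;
}

/* Number of suppressed duplicates for a topic, usable for a
 * "same error message from N other processes" style notice. */
int sketch_filter_duplicates(const sketch_filter_t *f, const char *topic)
{
    assert(f != NULL && topic != NULL);
    for (int i = 0; i < f->n; ++i) {
        if (strncmp(f->seen[i].topic, topic, SKETCH_TOPIC_LEN - 1) == 0) {
            return f->seen[i].count - 1;
        }
    }
    return 0;
}
```

In this sketch the first report of a topic is printed and later identical reports only bump a counter, which a single collecting process could then summarize.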


(***) To make both #1 and #2 work, we had to raise the abstraction  
level.  That is, there had to be job-level intelligence about the  
different kinds of output.  So we have created orte_output() (and  
friends) and orte_show_help().  The OPAL variants still exist, but  
they *SHOULD NOT BE USED* by the MPI layer.  Specifically, the OPAL  
variants are for what OPAL does best: single process stuff.  The ORTE  
variants provide the job-level intelligence, such as duplicate  
show_help filtering, relaying to the HNP in a different channel than  
IOF, etc.


So when this stuff hits the trunk, you'll see a ton of
s/opal_output/orte_output/g and s/opal_show_help/orte_show_help/g changes
throughout the code base.  Do not be alarmed.  :-)


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Changes: opal_output and opal_show_help

2008-05-09 Thread George Bosilca

I do have some questions about this.

1) If I understood correctly, we need orte_output and orte_show_help
in order to distinguish between the application's stdout/stderr and the
MPI library's? Who is applying the filter: the local daemon or the HNP?
How do we make sure that the remote outputs are not interleaved?

2) Who is really generating the error message? In your item #2, I
wonder how you distinguish between what needs to be printed once (such
as the PML initialization error) and what is supposed to be printed
multiple times (such as a BTL TCP connection failure). If the HNP is
managing these error messages, this will force us to always install all
error files; otherwise this approach cannot work in a heterogeneous
environment (for example, where the local installation doesn't have
InfiniBand support but the remote one includes it).

3) What is the OMPI layer supposed to use: opal_output, orte_output,
or maybe ompi_output?


  george.

On May 9, 2008, at 5:52 PM, Jeff Squyres wrote:


Per the teleconf this week, Ralph and I worked up two new features
that we're nearly ready to put back in the trunk:

1. IBM+LANL needed a way to XML-ize all output that comes out of OMPI
so that 3rd party tools can parse and use it intelligently (e.g., the
PTP debugger can now distinguish between OMPI error messages and
stderr from the MPI app).

2. In order to do #1, we created separate logical channels (vs, just
throwing everything in stderr and letting IOF relay it back to the
HNP) for the following:
   - stdout/stderr from the MPI app
   - opal_show_help() messages (***)
   - opal_output*() messages (***)
As a side effect, we now filter show_help() messages and only print
them *once* at the HNP (this has been a very long-standing goal of
mine).  So if your MPI app barfs, you will no longer see the same
show_help() error message N times -- you'll see it only once, possibly
accompanied with a "...and we got the same error message from N other
processes" notice.

(***) To make both #1 and #2 work, we had to raise the abstraction
level.  That is, there had to be job-level intelligence about the
different kinds of output.  So we have created orte_output() (and
friends) and orte_show_help().  The OPAL variants still exist, but
they *SHOULD NOT BE USED* by the MPI layer.  Specifically, the OPAL
variants are for what OPAL does best: single process stuff.  The ORTE
variants provide the job-level intelligence, such as duplicate
show_help filtering, relaying to the HNP in a different channel than
IOF, etc.

So when this stuff hits the trunk, you'll see a ton of
s/opal_output/orte_output/g and s/opal_show_help/orte_show_help/g changes
throughout the code base.  Do not be alarmed.  :-)

--
Jeff Squyres
Cisco Systems







Re: [OMPI devel] Changes: opal_output and opal_show_help

2008-05-09 Thread Josh Hursey

Is there an RFC telling us when we might expect this?

On May 9, 2008, at 5:52 PM, Jeff Squyres wrote:


So when this stuff hits the trunk,