[OMPI devel] OpenMPI-v1.3.1 Tentatives dates release eversion
Hi. What are the tentative release dates for OpenMPI-v1.3.1? Any idea? BR
Re: [OMPI devel] [OMPI users] OpenMPI Internals & Static-Analysis.
On May 8, 2008, at 1:20 PM, Mukesh K Srivastava wrote:

> The OMPI community should think to come with OpenMPI Internals document. Probably having an Internal document will certainly help developers of OpenMPI.

Yes, it will. It's something we've talked about many times, but it has unfortunately always come down to a time/resources issue -- no one has the time or people to do it (and keep the document up-to-date with the ever-changing code base).

> What one has to do - if one is thinking to come with OMPI Internals document to start with? Is there any link or project repository within OMPI to start working on OMPI Internals document.

You might want to talk to the docs sub-project -- their first goal was to make user-level documentation, but they've gone kinda quiet over the last month or three. Regardless, they may have some good opinions about documentation format, technology, etc.

http://www.open-mpi.org/projects/user-docs/

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] OpenMPI-v1.3.1 Tentatives dates release eversion
Well, history would indicate that 1.3.1 will be released about 1 week after we release 1.3.0... ;-)

The release date for 1.3.0 remains uncertain; see the wiki for the latest guess (currently July).

https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3

On 5/8/08 10:58 PM, "Mukesh K Srivastava" wrote:

> Hi.
>
> What are the tentative release dates for OpenMPI-v1.3.1?
>
> Any idea?
>
> BR
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] OpenMPI-v1.3.1 Tentatives dates release eversion
On May 9, 2008, at 12:58 AM, Mukesh K Srivastava wrote:

> What are the tentative release dates for OpenMPI-v1.3.1?

Please don't CC both mailing lists on future replies to this thread; one or the other would be fine. Thanks!

Brad Benton and George Bosilca are the release managers for the v1.3 series. They're maintaining a wiki for the v1.3 series here:

https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3

We're [finally] darn near feature complete, meaning that we talked this week about branching for v1.3 next week. Then assume that we'll test and debug for about 2 months after that. These are total SWAGs, of course...

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Recv from MTL module hanging on pml_cm_recv.c:mca_pml_cm_recv()
Thank you very much George. I'll check this today.

Caciano

2008/5/8 George Bosilca:

> Caciano,
>
> It's a little bit more complex than that. In fact you should never set the req_complete flag to true yourself. Instead you should use ompi_request_complete (defined in ompi/request/request.h) which will set the flag and trigger a condition broadcast or signal for you. This will allow the upper level to be released from the requests condition, and therefore discover that the request is completed.
>
> george.
>
> On May 8, 2008, at 8:27 PM, Caciano Machado wrote:
>
>> Hi,
>>
>> I'm finishing the implementation of a MTL module but something went wrong. This module is using PML/cm and the Recv operations are hanging in the ompi_request_wait_completion() call in pml_cm_recv.c:mca_pml_cm_recv(). I think that I must set the variable recvreq->req_base.req_ompi.req_complete somewhere but I don't know exactly where is the right place. When I comment out the ompi_request_wait_completion() call the application messages are delivered correctly with my backend.
>>
>> Regards,
>> Caciano
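George's point (use a completion helper rather than setting the flag by hand) can be illustrated with a toy model. This is only a sketch under stated assumptions: every name below (toy_request_t, toy_cond_t, toy_request_complete, and so on) is hypothetical and is not Open MPI's API, and the model assumes a single-threaded wait that spins on a progress callback until a condition's signal counter becomes non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT Open MPI's real types. */
typedef struct {
    int signaled;               /* number of pending wake-ups */
} toy_cond_t;

typedef struct {
    bool        complete;       /* plays the role of a req_complete-style flag */
    toy_cond_t *cond;           /* condition the waiter blocks on */
} toy_request_t;

/* A "progress" callback: the place where a backend would finish a receive. */
typedef void (*toy_progress_fn)(toy_request_t *req);

/* Completion helper: set the flag AND signal the condition. */
static void toy_request_complete(toy_request_t *req)
{
    assert(req != NULL && req->cond != NULL);
    req->complete = true;
    req->cond->signaled++;
}

/* Spin on progress until the condition is signaled. max_spins exists only so
 * that this demo terminates; it returns false where a real wait would hang. */
static bool toy_cond_wait(toy_request_t *req, toy_progress_fn progress, int max_spins)
{
    toy_cond_t *cond = req->cond;
    for (int i = 0; i < max_spins && cond->signaled == 0; i++) {
        progress(req);
    }
    if (cond->signaled == 0) {
        return false;           /* never woken up */
    }
    cond->signaled--;
    return true;
}

/* Blocking wait: loop until the request is complete. */
static bool toy_request_wait(toy_request_t *req, toy_progress_fn progress, int max_spins)
{
    while (!req->complete) {
        if (!toy_cond_wait(req, progress, max_spins)) {
            return false;
        }
    }
    return true;
}

/* Backend that only flips the flag: the waiter is never released. */
static void toy_backend_flag_only(toy_request_t *req) { req->complete = true; }

/* Backend that goes through the helper: the waiter is released. */
static void toy_backend_with_helper(toy_request_t *req) { toy_request_complete(req); }

/* Run one wait against the given backend; true means the wait returned. */
static bool toy_run(toy_progress_fn backend)
{
    toy_cond_t cond = { 0 };
    toy_request_t req = { false, &cond };
    return toy_request_wait(&req, backend, 1000);
}
```

In this model, toy_run(toy_backend_with_helper) returns true, while toy_run(toy_backend_flag_only) returns false because nothing ever signals the condition the waiter is spinning on.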
Re: [OMPI devel] [RFC] mca_base_select()
I just hit a problem with this logic - should be a minor change.

We have several frameworks where we have components that are only allowed to be selected if the user specifically requests them by stating -mca foo bar. Because it is possible for there to be no other components that want to be selected, and because it is permissible for no components to be selected for that framework, we set bar's priority to be -1.

The new select logic will not allow a negative priority to be selected, even if the user specifically requested that component.

If we set the priority to be 0, then the system will allow the component to be automatically selected. This is not allowed as it can lead to bad behavior.

So what we need the select system to do is say "if someone specified a specific component, don't worry about the returned priority - just use it".

Josh: could you please modify this?

Thanks!
Ralph

On 5/8/08 7:04 PM, "Pak Lui" wrote:

> Thanks very much Josh! Will try it out soon.
>
> Josh Hursey wrote:
>> Sorry about that. I didn't test that type of option. It should be working in r18418. Let me know if you see any more issues.
>>
>> -- Josh
>>
>> On May 8, 2008, at 6:04 PM, Pak Lui wrote:
>>
>>> I think I have a problem but I am not sure. I used to be able to use the circumflex (^) to switch between the gridengine launcher and the ssh launchers by doing something like this, e.g. -mca plm ^gridengine, to exclude some of the components plm (and also in ras). It doesn't seem like the 'negate' is in mca_base_component anymore. I guess I just have to spell out which component I want explicitly...
Re: [OMPI devel] [RFC] mca_base_select()
Ralph,

Can you give me an example of a component that I can look at? It will allow me to test the fix before committing, and to better understand the problem.

-- Josh

On May 9, 2008, at 10:41 AM, Ralph Castain wrote:

> I just hit a problem with this logic - should be a minor change.
>
> We have several frameworks where we have components that are only allowed to be selected if the user specifically requests them by stating -mca foo bar. Because it is possible for there to be no other components that want to be selected, and because it is permissible for no components to be selected for that framework, we set bar's priority to be -1.
>
> The new select logic will not allow a negative priority to be selected, even if the user specifically requested that component.
>
> If we set the priority to be 0, then the system will allow the component to be automatically selected. This is not allowed as it can lead to bad behavior.
>
> So what we need the select system to do is say "if someone specified a specific component, don't worry about the returned priority - just use it".
>
> Josh: could you please modify this?
>
> Thanks!
> Ralph
>
> On 5/8/08 7:04 PM, "Pak Lui" wrote:
>
>> Thanks very much Josh! Will try it out soon.
>>
>> Josh Hursey wrote:
>>> Sorry about that. I didn't test that type of option. It should be working in r18418. Let me know if you see any more issues.
>>>
>>> -- Josh
>>>
>>> On May 8, 2008, at 6:04 PM, Pak Lui wrote:
>>>
>>>> I think I have a problem but I am not sure. I used to be able to use the circumflex (^) to switch between the gridengine launcher and the ssh launchers by doing something like this, e.g. -mca plm ^gridengine, to exclude some of the components plm (and also in ras). It doesn't seem like the 'negate' is in mca_base_component anymore. I guess I just have to spell out which component I want explicitly...
>>>>
>>>> Josh Hursey wrote:
>>>>> This has been committed in r18381
>>>>>
>>>>> Please let me know if you have any problems with this commit.
>>>>>
>>>>> Cheers,
>>>>> Josh
>>>>>
>>>>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>>>>
>>>>>> Awesome.
>>>>>>
>>>>>> The branch is updated to the latest trunk head. I encourage folks to check out this repository and make sure that it builds on their system. A normal build of the branch should be enough to find out if there are any cut-n-paste problems (though I tried to be careful, mistakes do happen).
>>>>>>
>>>>>> I haven't heard any problems so this is looking like it will come in tomorrow after the teleconf. I'll ask again there to see if there are any voices of concern.
>>>>>>
>>>>>> Cheers,
>>>>>> Josh
>>>>>>
>>>>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>>>>
>>>>>>> This all sounds good to me!
>>>>>>>
>>>>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>>>>>
>>>>>>>> What: Add mca_base_select() and adjust frameworks & components to use it.
>>>>>>>> Why: Consolidation of code for general goodness.
>>>>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>>>>> When: Code ready now. Documentation ready soon.
>>>>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>>>>>
>>>>>>>> Discussion:
>>>>>>>> ---
>>>>>>>> For a number of years a few developers have been talking about creating an MCA base component selection function. For various reasons this was never implemented. Recently I decided to give it a try.
>>>>>>>>
>>>>>>>> A base select function will allow Open MPI to provide completely consistent selection behavior for many of its frameworks (18 of 31 to be exact at the moment). The primary goal of this work is to improve code maintainability through code reuse. Other benefits also result, such as a slightly smaller memory footprint.
>>>>>>>>
>>>>>>>> The mca_base_select() function represents the most commonly used logic for component selection: Select the one component with the highest priority and close all of the not selected components. This function can be found at the path below in the branch:
>>>>>>>> opal/mca/base/mca_base_components_select.c
>>>>>>>>
>>>>>>>> To support this I had to formalize a query() function in the mca_base_component_t of the form:
>>>>>>>> int mca_base_query_component_fn(mca_base_module_t **module, int *priority);
>>>>>>>>
>>>>>>>> This function is specified after the open and close component functions in this structure so as to allow compatibility with frameworks that do not use the base selection logic. Frameworks that do *not* use this function are *not* affected by this commit. However, every component in the frameworks that use the mca_base_select function must adjust their component query function to fit that specified above.
>>>>>>>>
>>>>>>>> 18 frameworks in Open MPI have been changed. I have updated all of the components in the 18 frameworks available in the trunk on my branch. The affected frameworks are:
>>>>>>>> - OPAL Carto
>>>>>>>> - OPAL crs
>>>>>>>> - OPAL maffinity
>>>>>>>> - OPAL memchecker
>>>>>>>> - OPAL paffinity
>>>>>>>> - ORTE errmgr
>>>>>>>> - ORTE ess
>>>>>>>> - ORTE Filem
>>>>>>>> - ORTE grpcomm
>>>>>>>> - ORTE odls
>>>>>>>> - ORTE pml
>>>>>>>> - ORTE ras
>>>>>>>> - ORTE rmaps
>>>>>>>> - ORTE routed
>>>>>>>> - ORTE snapc
>>>>>>>> - OMPI crcp
>>>>>>>> - OMPI dpm
>>>>>>>> - OMPI pubsub
>>>>>>>>
>>>>>>>> There was a question of the memory footprint change as a result of this commit. I used 'pmap' to determine process memory footprint of a hello world MPI program. Static and Shared build numb
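The selection rule described in the quoted RFC (query every component, keep the one with the highest priority, close the rest) can be sketched roughly as follows. This is a minimal illustration, not the code in mca_base_components_select.c: the toy_* types and functions are hypothetical, only the shape of the query callback follows the signature given in the RFC, and the sketch deliberately does not model any minimum acceptable priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a module and a component. */
typedef struct { const char *name; } toy_module_t;

/* Same shape as the query function in the RFC: the component hands back a
 * module and a priority, and returns 0 if it is willing to be selected. */
typedef int (*toy_query_fn)(toy_module_t **module, int *priority);

typedef struct {
    const char  *name;
    toy_query_fn query;
    int          is_open;   /* 1 while the component is open */
} toy_component_t;

static toy_module_t toy_module_a = { "a" };
static toy_module_t toy_module_b = { "b" };

static int toy_query_a(toy_module_t **module, int *priority)
{
    *module = &toy_module_a;
    *priority = 10;
    return 0;
}

static int toy_query_b(toy_module_t **module, int *priority)
{
    *module = &toy_module_b;
    *priority = 50;
    return 0;
}

/* A component that declines to be selected. */
static int toy_query_c(toy_module_t **module, int *priority)
{
    *module = NULL;
    *priority = 0;
    return -1;
}

/* Pick the willing component with the highest priority and close all others.
 * Returns the index of the winner, or -1 if no component was willing. */
static int toy_select(toy_component_t *components, size_t count,
                      toy_module_t **best_module)
{
    int best_index = -1;
    int best_priority = 0;

    assert(best_module != NULL);
    *best_module = NULL;

    for (size_t i = 0; i < count; i++) {
        toy_module_t *module = NULL;
        int priority = 0;

        if (components[i].query(&module, &priority) != 0 || module == NULL) {
            continue;   /* component declined */
        }
        if (best_index < 0 || priority > best_priority) {
            best_index = (int)i;
            best_priority = priority;
            *best_module = module;
        }
    }

    /* Close every component that was not selected. */
    for (size_t i = 0; i < count; i++) {
        components[i].is_open = ((int)i == best_index);
    }
    return best_index;
}

/* Three components: priority 10, priority 50, and one that declines. */
static int toy_demo_select(void)
{
    toy_component_t components[] = {
        { "a", toy_query_a, 1 },
        { "b", toy_query_b, 1 },
        { "c", toy_query_c, 1 },
    };
    toy_module_t *module = NULL;
    int chosen = toy_select(components, 3, &module);

    assert(chosen < 0 || module != NULL);
    return chosen;
}
```

Here toy_demo_select() picks index 1, the component that reported priority 50. Keeping the loop in one place is the kind of consolidation the RFC argues for, since each framework would otherwise carry its own copy of it.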
Re: [OMPI devel] [RFC] mca_base_select()
Sure - take a look at the hg repository Jeff and I are working on:

http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/channel

The opal/mca/filter framework illustrates the problem. I have one component in there right now, with a default module defined in the base. That component must only be selected if the user calls it. With the current select logic, I can't do this - if the priority is >= 0, then it is always automatically selected. If the priority is < 0, it is never selectable, even if specified.

Thanks
Ralph

On 5/9/08 8:52 AM, "Josh Hursey" wrote:

> Ralph,
>
> Can you give me an example of a component that I can look at? It will allow me to test the fix before committing, and to better understand the problem.
>
> -- Josh
Re: [OMPI devel] [RFC] mca_base_select()
Ok I think I understand the problem a bit better now. I attached a patch that should fix this, but I want you to check it out before I commit just to make sure.

If you specify '-mca filter xml' on the command line then only the 'xml' component should be opened by mca_base_open. The problem was that the selection logic used -1 as the lowest acceptable priority, which conflicted with the set priority of the 'xml' component. This patch sets this value to INT32_MIN which should be well below any negative priority that a component would set for itself.

Let me know if this works for you and I'll commit it.

Cheers,
Josh

[attachment: select.patch]

On May 9, 2008, at 11:14 AM, Ralph Castain wrote:

> Sure - take a look at the hg repository Jeff and I are working on:
>
> http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/channel
>
> The opal/mca/filter framework illustrates the problem. I have one component in there right now, with a default module defined in the base. That component must only be selected if the user calls it. With the current select logic, I can't do this - if the priority is >=0, then it always is automatically selected. Priority < 0, never selectable even if specified.
>
> Thanks
> Ralph
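The effect of the "lowest acceptable priority" value that Josh describes can be shown with a small sketch. It assumes (this is a guess at the mechanism, not a reading of the patch) that the selection loop starts from that value and only accepts a strictly greater priority; toy_pick is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the highest priority strictly above `floor`,
 * or -1 if no priority clears it. `floor` plays the role of the
 * "lowest acceptable priority" mentioned in the message above. */
static int toy_pick(const int *priorities, size_t count, int floor)
{
    int best_index = -1;
    int best_priority = floor;

    assert(priorities != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (priorities[i] > best_priority) {
            best_priority = priorities[i];
            best_index = (int)i;
        }
    }
    return best_index;
}
```

With a single opened component that reports priority -1, toy_pick(..., -1) returns -1 (nothing selected), while toy_pick(..., INT32_MIN) returns 0 (the component is selected). Under this assumption, lowering the floor is only safe when the list of opened components has already been narrowed to what the user asked for.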
Re: [OMPI devel] [RFC] mca_base_select()
Not quite, Josh - I fixed it in our branch. Will send you a revised patch that does the job off-list for your review.

Thanks
Ralph

On 5/9/08 9:35 AM, "Josh Hursey" wrote:

> Ok I think I understand the problem a bit better now. I attached a patch that should fix this, but I want you to check it out before I commit just to make sure.
>
> If you specify '-mca filter xml' on the command line then only the 'xml' component should be opened by mca_base_open. The problem was that the selection logic used -1 as the lowest acceptable priority, which conflicted with the set priority of the 'xml' component. This patch sets this value to INT32_MIN which should be well below any negative priority that a component would set for itself.
>
> Let me know if this works for you and I'll commit it.
>
> Cheers,
> Josh
[OMPI devel] Changes: opal_output and opal_show_help
Per the teleconf this week, Ralph and I worked up two new features that we're nearly ready to put back in the trunk:

1. IBM+LANL needed a way to XML-ize all output that comes out of OMPI so that 3rd party tools can parse and use it intelligently (e.g., the PTP debugger can now distinguish between OMPI error messages and stderr from the MPI app).

2. In order to do #1, we created separate logical channels (vs. just throwing everything in stderr and letting IOF relay it back to the HNP) for the following:

   - stdout/stderr from the MPI app
   - opal_show_help() messages (***)
   - opal_output*() messages (***)

As a side effect, we now filter show_help() messages and only print them *once* at the HNP (this has been a very long-standing goal of mine). So if your MPI app barfs, you will no longer see the same show_help() error message N times -- you'll see it only once, possibly accompanied with a "...and we got the same error message from N other processes" notice.

(***) To make both #1 and #2 work, we had to raise the abstraction level. That is, there had to be job-level intelligence about the different kinds of output. So we have created orte_output() (and friends) and orte_show_help(). The OPAL variants still exist, but they *SHOULD NOT BE USED* by the MPI layer. Specifically, the OPAL variants are for what OPAL does best: single process stuff. The ORTE variants provide the job-level intelligence, such as duplicate show_help filtering, relaying to the HNP in a different channel than IOF, etc.

So when this stuff hits the trunk, you'll see a ton of s/opal_output/orte_output/g and s/opal_show_help/orte_show_help/g changes throughout the code base. Do not be alarmed. :-)

--
Jeff Squyres
Cisco Systems
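The "print once, count the duplicates" behavior described above could be modeled along these lines. This is a rough sketch only: the toy_help_* names, the fixed-size table, and the idea of keying on a help file name plus topic are all assumptions made for illustration, not the orte_show_help() implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_TOPICS 16
#define TOY_KEY_LEN    128

/* One entry per distinct message seen so far. */
typedef struct {
    char key[TOY_KEY_LEN];
    int  count;                 /* how many processes reported it */
} toy_topic_t;

typedef struct {
    toy_topic_t topics[TOY_MAX_TOPICS];
    int         used;
} toy_help_filter_t;

/* Find the entry for (file, topic); returns its index or -1.
 * Keys longer than TOY_KEY_LEN - 1 characters are truncated. */
static int toy_help_find(const toy_help_filter_t *filter,
                         const char *file, const char *topic)
{
    char key[TOY_KEY_LEN];

    assert(filter != NULL && file != NULL && topic != NULL);
    snprintf(key, sizeof key, "%s:%s", file, topic);
    for (int i = 0; i < filter->used; i++) {
        if (strcmp(filter->topics[i].key, key) == 0) {
            return i;
        }
    }
    return -1;
}

/* Record one report arriving at the aggregation point.
 * Returns 1 if this is the first report (the caller should print it),
 * 0 if it is a duplicate (only counted), or -1 if the table is full. */
static int toy_help_report(toy_help_filter_t *filter,
                           const char *file, const char *topic)
{
    int index = toy_help_find(filter, file, topic);

    if (index >= 0) {
        filter->topics[index].count++;
        return 0;
    }
    if (filter->used == TOY_MAX_TOPICS) {
        return -1;
    }
    snprintf(filter->topics[filter->used].key, TOY_KEY_LEN, "%s:%s", file, topic);
    filter->topics[filter->used].count = 1;
    filter->used++;
    return 1;
}

/* Total number of reports recorded for (file, topic). */
static int toy_help_count(const toy_help_filter_t *filter,
                          const char *file, const char *topic)
{
    int index = toy_help_find(filter, file, topic);
    return index >= 0 ? filter->topics[index].count : 0;
}
```

A caller would print the message when toy_help_report() returns 1 and later use toy_help_count() minus one as the N in a "...same error message from N other processes" notice. The sketch does not address messages that should legitimately appear more than once.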
Re: [OMPI devel] Changes: opal_output and opal_show_help
I do have some questions about this.

1) If I understood correctly, we need orte_output and orte_show_help in order to be able to distinguish between the application's stdout/stderr and the MPI library's? Who is applying the filter? The local daemon or the HNP? How do we make sure that the remote outputs are not interlaced?

2) Who is really generating the error message? In your item #2, I wonder how you distinguish between what needs to be printed once (such as the PML initialization error) and what is supposed to be printed multiple times (such as a BTL TCP connection failure)? If the HNP is managing these error messages, this will force us to always install all error files; otherwise this approach cannot work in a heterogeneous environment (such as when the local installation doesn't have InfiniBand support but the remote one includes it).

3) What is the OMPI layer supposed to use? opal_output? orte_output? Or maybe ompi_output?

george.

On May 9, 2008, at 5:52 PM, Jeff Squyres wrote:

> Per the teleconf this week, Ralph and I worked up two new features that we're nearly ready to put back in the trunk:
>
> 1. IBM+LANL needed a way to XML-ize all output that comes out of OMPI so that 3rd party tools can parse and use it intelligently (e.g., the PTP debugger can now distinguish between OMPI error messages and stderr from the MPI app).
>
> 2. In order to do #1, we created separate logical channels (vs. just throwing everything in stderr and letting IOF relay it back to the HNP) for the following:
>
>    - stdout/stderr from the MPI app
>    - opal_show_help() messages (***)
>    - opal_output*() messages (***)
>
> As a side effect, we now filter show_help() messages and only print them *once* at the HNP (this has been a very long-standing goal of mine). So if your MPI app barfs, you will no longer see the same show_help() error message N times -- you'll see it only once, possibly accompanied with a "...and we got the same error message from N other processes" notice.
>
> (***) To make both #1 and #2 work, we had to raise the abstraction level. That is, there had to be job-level intelligence about the different kinds of output. So we have created orte_output() (and friends) and orte_show_help(). The OPAL variants still exist, but they *SHOULD NOT BE USED* by the MPI layer. Specifically, the OPAL variants are for what OPAL does best: single process stuff. The ORTE variants provide the job-level intelligence, such as duplicate show_help filtering, relaying to the HNP in a different channel than IOF, etc.
>
> So when this stuff hits the trunk, you'll see a ton of s/opal_output/orte_output/g and s/opal_show_help/orte_show_help/g changes throughout the code base. Do not be alarmed. :-)
>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI devel] Changes: opal_output and opal_show_help
Is there an RFC telling us when we might expect this?

On May 9, 2008, at 5:52 PM, Jeff Squyres wrote:

> So when this stuff hits the trunk,