Hmmm....well, I hit a problem (of course!). I have mca-no-build on the
filem framework on my Mac. If I just mpirun -n 3 ./hello, I get the
following error:

--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_filem_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS

--------------------------------------------------------------------------

After looking at the source code for filem_select, I found I can run just
fine if I specify -mca filem none on the command line. Otherwise, it looks
like your select logic insists that at least one component must be built
and selectable?
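
In other words, this invocation runs to completion:

  mpirun -mca filem none -n 3 ./hello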

Is that generally true, or is your filem framework the exception? I think
this would not be a good general requirement - frankly, I don't think it is
good for any framework to have such a requirement.

Ralph



On 5/6/08 12:09 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

> This has been committed in r18381
> 
> Please let me know if you have any problems with this commit.
> 
> Cheers,
> Josh
> 
> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
> 
>> Awesome.
>> 
>> The branch is updated to the latest trunk head. I encourage folks to
>> check out this repository and make sure that it builds on their
>> system. A normal build of the branch should be enough to find out if
>> there are any cut-n-paste problems (though I tried to be careful,
>> mistakes do happen).
>> 
>> I haven't heard of any problems, so this is looking like it will come
>> in tomorrow after the teleconf. I'll ask again there to see if there
>> are any voices of concern.
>> 
>> Cheers,
>> Josh
>> 
>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>> 
>>> This all sounds good to me!
>>> 
>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>> 
>>>> What:  Add mca_base_select() and adjust frameworks & components to
>>>> use it.
>>>> Why:   Consolidation of code for general goodness.
>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>> When:  Code ready now. Documentation ready soon.
>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>> 
>>>> Discussion:
>>>> -----------
>>>> For a number of years a few developers have been talking about
>>>> creating an MCA base component selection function. For various
>>>> reasons this was never implemented. Recently I decided to give it
>>>> a try.
>>>> 
>>>> A base select function will allow Open MPI to provide completely
>>>> consistent selection behavior for many of its frameworks (18 of 31,
>>>> to be exact, at the moment). The primary goal of this work is to
>>>> improve code maintainability through code reuse. Other benefits also
>>>> result, such as a slightly smaller memory footprint.
>>>> 
>>>> The mca_base_select() function implements the most commonly used
>>>> logic for component selection: select the one component with the
>>>> highest priority and close all of the components that were not
>>>> selected. This function can be found at the following path in the
>>>> branch:
>>>>   opal/mca/base/mca_base_components_select.c
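>>>>
>>>> To make the behavior concrete, here is a small self-contained C
>>>> sketch of that selection pattern (toy types and priorities, not the
>>>> actual branch code):
>>>>
>>>>   #include <stdio.h>
>>>>
>>>>   typedef struct {
>>>>       const char *name;
>>>>       int (*query)(int *priority);
>>>>   } component_t;
>>>>
>>>>   static int query_a(int *p) { *p = 10; return 0; }
>>>>   static int query_b(int *p) { *p = 20; return 0; }
>>>>
>>>>   int main(void)
>>>>   {
>>>>       component_t comps[] = { { "a", query_a }, { "b", query_b } };
>>>>       component_t *best = NULL;
>>>>       int best_pri = -1, pri;
>>>>
>>>>       /* query each opened component; highest priority wins */
>>>>       for (int i = 0; i < 2; ++i) {
>>>>           if (0 == comps[i].query(&pri) && pri > best_pri) {
>>>>               best = &comps[i];
>>>>               best_pri = pri;
>>>>           }
>>>>       }
>>>>       /* in the real code, every component except 'best' is closed */
>>>>       if (NULL == best) {
>>>>           return 1;  /* no selectable component */
>>>>       }
>>>>       printf("selected: %s (priority %d)\n", best->name, best_pri);
>>>>       return 0;
>>>>   }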
>>>> 
>>>> To support this I had to formalize a query() function in the
>>>> mca_base_component_t structure, of the form:
>>>>
>>>>   int mca_base_query_component_fn(mca_base_module_t **module,
>>>>                                   int *priority);
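>>>>
>>>> For illustration, a component's query function under this scheme
>>>> would look something like the following (a sketch; the "example"
>>>> component name, module symbol, and priority value are made up):
>>>>
>>>>   static mca_base_module_t example_module;   /* the component's module */
>>>>
>>>>   static int example_component_query(mca_base_module_t **module,
>>>>                                      int *priority)
>>>>   {
>>>>       *priority = 20;       /* relative selection priority */
>>>>       *module = &example_module;
>>>>       return OPAL_SUCCESS;  /* this component is selectable */
>>>>   }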
>>>> 
>>>> This function is specified after the open and close component
>>>> functions in this structure so as to allow compatibility with
>>>> frameworks that do not use the base selection logic. Frameworks that
>>>> do *not* use this function are *not* affected by this commit.
>>>> However, every component in the frameworks that use the
>>>> mca_base_select function must adjust its component query function to
>>>> fit the signature specified above.
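>>>>
>>>> A framework that opts in can then replace its hand-rolled selection
>>>> loop with a single call. Roughly (a hedged sketch; the exact
>>>> mca_base_select() signature and the variable names here are
>>>> illustrative, so check the branch for the real ones):
>>>>
>>>>   int orte_filem_base_select(void)
>>>>   {
>>>>       /* pick the highest-priority component and close the rest */
>>>>       return mca_base_select("filem", orte_filem_base_output,
>>>>                  &orte_filem_base_components_available,
>>>>                  (mca_base_module_t **) &orte_filem,
>>>>                  (mca_base_component_t **) &orte_filem_base_component);
>>>>   }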
>>>> 
>>>> 18 frameworks in Open MPI have been changed. I have updated all of
>>>> the components in the 18 frameworks available in the trunk on my
>>>> branch. The affected frameworks are:
>>>> - OPAL carto
>>>> - OPAL crs
>>>> - OPAL maffinity
>>>> - OPAL memchecker
>>>> - OPAL paffinity
>>>> - ORTE errmgr
>>>> - ORTE ess
>>>> - ORTE filem
>>>> - ORTE grpcomm
>>>> - ORTE odls
>>>> - ORTE plm
>>>> - ORTE ras
>>>> - ORTE rmaps
>>>> - ORTE routed
>>>> - ORTE snapc
>>>> - OMPI crcp
>>>> - OMPI dpm
>>>> - OMPI pubsub
>>>> 
>>>> There was a question about the memory footprint change as a result
>>>> of this commit. I used 'pmap' to determine the process memory
>>>> footprint of a hello world MPI program. Static and shared build
>>>> numbers are below, along with variations on launching locally and to
>>>> a single node allocated by SLURM. All of this was on Indiana
>>>> University's Odin machine. We compare against the trunk (r18276),
>>>> representing the last SVN sync point of the branch.
>>>> 
>>>>  Process(shared)| Trunk    | Branch  | Diff (Improvement)
>>>>  ---------------+----------+---------+-------
>>>>  mpirun (orted) |   39976K |  36828K | 3148K
>>>>  hello (0)      |  229288K | 229268K |   20K
>>>>  hello (1)      |  229288K | 229268K |   20K
>>>>  ---------------+----------+---------+-------
>>>>  mpirun         |   40032K |  37924K | 2108K
>>>>  orted          |   34720K |  34660K |   60K
>>>>  hello (0)      |  228404K | 228384K |   20K
>>>>  hello (1)      |  228404K | 228384K |   20K
>>>> 
>>>>  Process(static)| Trunk    | Branch  | Diff (Improvement)
>>>>  ---------------+----------+---------+-------
>>>>  mpirun (orted) |   21384K |  21372K |  12K
>>>>  hello (0)      |  194000K | 193980K |  20K
>>>>  hello (1)      |  194000K | 193980K |  20K
>>>>  ---------------+----------+---------+-------
>>>>  mpirun         |   21384K |  21372K |  12K
>>>>  orted          |   21208K |  21196K |  12K
>>>>  hello (0)      |  193116K | 193096K |  20K
>>>>  hello (1)      |  193116K | 193096K |  20K
>>>> 
>>>> As you can see, there are some small memory footprint improvements
>>>> on my branch that result from this work. The size of the Open MPI
>>>> project shrinks a bit as well: this commit cuts between 2,000 and
>>>> 3,500 lines of code (depending on how you count), so about a ~1%
>>>> code shrink.
>>>> 
>>>> The branch is stable in all of the testing I have done, but there
>>>> are
>>>> some platforms on which I cannot test. So please give this branch a
>>>> try and let me know if you find any problems.
>>>> 
>>>> Cheers,
>>>> Josh
>>>> 
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> Cisco Systems