From the stack trace, this doesn't look like a problem with base_select, but rather with 'orte_util_encode_pidmap'. You may want to start looking there.

-- Josh

On May 11, 2008, at 1:30 PM, Lenny Verkhovsky wrote:

Hi,
I tried r18423 with the rank_file component and got a segfault.
(I increased the priority of the component when rmaps_rank_file_path exists.)


/home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun -np 4 -hostfile hostfile_ompi -mca rmaps_rank_file_path rankfile -mca paffinity_base_verbose 5 ./mpi_p_SMD -t bw -output 1 -order 1
[witch1:25456] mca:base:select: Querying component [linux]
[witch1:25456] mca:base:select: Query of component [linux] set priority to 10
[witch1:25456] mca:base:select: Selected component [linux]
[witch1:25456] *** Process received signal ***
[witch1:25456] Signal: Segmentation fault (11)
[witch1:25456] Signal code: Invalid permissions (2)
[witch1:25456] Failing at address: 0x2b2875530030
[witch1:25456] [ 0] /lib64/libpthread.so.0 [0x2b28759dfc10]
[witch1:25456] [ 1] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e2bb6]
[witch1:25456] [ 2] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e23b6]
[witch1:25456] [ 3] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e22fd]
[witch1:25456] [ 4] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_util_encode_pidmap+0x2f4) [0x2b287527f412]
[witch1:25456] [ 5] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x989) [0x2b28752934f5]
[witch1:25456] [ 6] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1a3) [0x2b287529e60b]
[witch1:25456] [ 7] /home/USERS/lenny/OMPI_ORTE_SMD/lib/openmpi/mca_plm_rsh.so [0x2b287612f788]
[witch1:25456] [ 8] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x4032bf]
[witch1:25456] [ 9] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402b53]
[witch1:25456] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b2875b06154]
[witch1:25456] [11] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402aa9]
[witch1:25456] *** End of error message ***
Segmentation fault




On Tue, May 6, 2008 at 9:09 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
This has been committed in r18381.

Please let me know if you have any problems with this commit.

Cheers,
Josh

On May 5, 2008, at 10:41 AM, Josh Hursey wrote:

> Awesome.
>
> The branch is updated to the latest trunk head. I encourage folks to
> check out this repository and make sure that it builds on their
> system. A normal build of the branch should be enough to find out if
> there are any cut-n-paste problems (though I tried to be careful,
> mistakes do happen).
>
> I haven't heard any problems so this is looking like it will come in
> tomorrow after the teleconf. I'll ask again there to see if there are
> any voices of concern.
>
> Cheers,
> Josh
>
> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>
>> This all sounds good to me!
>>
>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>
>>> What:  Add mca_base_select() and adjust frameworks & components to
>>> use
>>> it.
>>> Why:   Consolidation of code for general goodness.
>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>> When:  Code ready now. Documentation ready soon.
>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>
>>> Discussion:
>>> -----------
>>> For a number of years a few developers have been talking about
>>> creating a MCA base component selection function. For various
>>> reasons
>>> this was never implemented. Recently I decided to give it a try.
>>>
>>> A base select function will allow Open MPI to provide completely
>>> consistent selection behavior for many of its frameworks (18 of 31
>>> to
>>> be exact at the moment). The primary goal of this work is to improve
>>> code maintainability through code reuse. Other benefits, such as a
>>> slightly smaller memory footprint, also result.
>>>
>>> The mca_base_select() function implements the most commonly used
>>> logic for component selection: select the one component with the
>>> highest priority and close all of the components that were not
>>> selected. This function can be found at the following path in the
>>> branch:
>>> opal/mca/base/mca_base_components_select.c
>>>
>>> To support this I had to formalize a query() function in the
>>> mca_base_component_t of the form:
>>> int mca_base_query_component_fn(mca_base_module_t **module, int
>>> *priority);
>>>
>>> This function is specified after the open and close component
>>> functions in this structure so as to allow compatibility with
>>> frameworks that do not use the base selection logic. Frameworks that
>>> do *not* use this function are *not* affected by this commit.
>>> However, every component in the frameworks that use the
>>> mca_base_select function must adjust its component query function to
>>> match the signature specified above.
>>>
>>> 18 frameworks in Open MPI have been changed. I have updated all of
>>> the
>>> components in the 18 frameworks available in the trunk on my branch.
>>> The affected frameworks are:
>>> - OPAL Carto
>>> - OPAL crs
>>> - OPAL maffinity
>>> - OPAL memchecker
>>> - OPAL paffinity
>>> - ORTE errmgr
>>> - ORTE ess
>>> - ORTE Filem
>>> - ORTE grpcomm
>>> - ORTE odls
>>> - ORTE pml
>>> - ORTE ras
>>> - ORTE rmaps
>>> - ORTE routed
>>> - ORTE snapc
>>> - OMPI crcp
>>> - OMPI dpm
>>> - OMPI pubsub
>>>
>>> There was a question of the memory footprint change as a result of
>>> this commit. I used 'pmap' to measure the process memory footprint
>>> of a hello-world MPI program. Static and shared build numbers are
>>> below,
>>> along with variations on launching locally and to a single node
>>> allocated by SLURM. All of this was on Indiana University's Odin
>>> machine. We compare against the trunk (r18276) representing the last
>>> SVN sync point of the branch.
>>>
>>>  Process(shared)| Trunk    | Branch  | Diff (Improvement)
>>>  ---------------+----------+---------+-------
>>>  mpirun (orted) |   39976K |  36828K | 3148K
>>>  hello (0)      |  229288K | 229268K |   20K
>>>  hello (1)      |  229288K | 229268K |   20K
>>>  ---------------+----------+---------+-------
>>>  mpirun         |   40032K |  37924K | 2108K
>>>  orted          |   34720K |  34660K |   60K
>>>  hello (0)      |  228404K | 228384K |   20K
>>>  hello (1)      |  228404K | 228384K |   20K
>>>
>>>  Process(static)| Trunk    | Branch  | Diff (Improvement)
>>>  ---------------+----------+---------+-------
>>>  mpirun (orted) |   21384K |  21372K |  12K
>>>  hello (0)      |  194000K | 193980K |  20K
>>>  hello (1)      |  194000K | 193980K |  20K
>>>  ---------------+----------+---------+-------
>>>  mpirun         |   21384K |  21372K |  12K
>>>  orted          |   21208K |  21196K |  12K
>>>  hello (0)      |  193116K | 193096K |  20K
>>>  hello (1)      |  193116K | 193096K |  20K
>>>
>>> As you can see there are some small memory footprint improvements on
>>> my branch that result from this work. The size of the Open MPI
>>> project shrinks a bit as well: this commit cuts between 2,000 and
>>> 3,500 lines of code (depending on how you count), about a 1% code
>>> shrink.
>>>
>>> The branch is stable in all of the testing I have done, but there
>>> are
>>> some platforms on which I cannot test. So please give this branch a
>>> try and let me know if you find any problems.
>>>
>>> Cheers,
>>> Josh
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>
