I -think- I may have found the problem here, but I don't have a real test case
- try r18429 and see if it works.


On 5/11/08 4:32 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

> From the stack trace, this doesn't look like a problem with
> base_select, but with 'orte_util_encode_pidmap'. You may want to
> start looking there.
> 
> -- Josh
> 
> On May 11, 2008, at 1:30 PM, Lenny Verkhovsky wrote:
> 
>> Hi,
>> I tried r18423 with the rank_file component and got a segfault
>> (I increased the priority of the component when rmaps_rank_file_path is set).
>> 
>> 
>> /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun -np 4 -hostfile
>> hostfile_ompi -mca rmaps_rank_file_path rankfile -mca
>> paffinity_base_verbose 5 ./mpi_p_SMD -t bw -output 1 -order 1
>> [witch1:25456] mca:base:select: Querying component [linux]
>> [witch1:25456] mca:base:select: Query of component [linux] set
>> priority to 10
>> [witch1:25456] mca:base:select: Selected component [linux]
>> [witch1:25456] *** Process received signal ***
>> [witch1:25456] Signal: Segmentation fault (11)
>> [witch1:25456] Signal code: Invalid permissions (2)
>> [witch1:25456] Failing at address: 0x2b2875530030
>> [witch1:25456] [ 0] /lib64/libpthread.so.0 [0x2b28759dfc10]
>> [witch1:25456] [ 1] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-
>> pal.so.0 [0x2b28753e2bb6]
>> [witch1:25456] [ 2] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-
>> pal.so.0 [0x2b28753e23b6]
>> [witch1:25456] [ 3] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-
>> pal.so.0 [0x2b28753e22fd]
>> [witch1:25456] [ 4] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-
>> rte.so.0(orte_util_encode_pidmap+0x2f4) [0x2b287527f412]
>> [witch1:25456] [ 5] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-
>> rte.so.0(orte_odls_base_default_get_add_procs_data+0x989)
>> [0x2b28752934f5]
>> [witch1:25456] [ 6] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-
>> rte.so.0(orte_plm_base_launch_apps+0x1a3) [0x2b287529e60b]
>> [witch1:25456] [ 7] /home/USERS/lenny/OMPI_ORTE_SMD/lib/openmpi/
>> mca_plm_rsh.so [0x2b287612f788]
>> [witch1:25456] [ 8] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun
>> [0x4032bf]
>> [witch1:25456] [ 9] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun
>> [0x402b53]
>> [witch1:25456] [10] /lib64/libc.so.6(__libc_start_main+0xf4)
>> [0x2b2875b06154]
>> [witch1:25456] [11] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun
>> [0x402aa9]
>> [witch1:25456] *** End of error message ***
>> Segmentation fault
>> 
>> 
>> 
>> 
>> On Tue, May 6, 2008 at 9:09 PM, Josh Hursey <jjhur...@open-mpi.org>
>> wrote:
>> This has been committed in r18381
>> 
>> Please let me know if you have any problems with this commit.
>> 
>> Cheers,
>> Josh
>> 
>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>> 
>>> Awesome.
>>> 
>>> The branch is updated to the latest trunk head. I encourage folks to
>>> check out this repository and make sure that it builds on their
>>> system. A normal build of the branch should be enough to find out if
>>> there are any cut-n-paste problems (though I tried to be careful,
>>> mistakes do happen).
>>> 
>>> I haven't heard of any problems, so this is looking like it will
>>> come in tomorrow after the teleconf. I'll ask again there to see if
>>> there are any voices of concern.
>>> 
>>> Cheers,
>>> Josh
>>> 
>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>> 
>>>> This all sounds good to me!
>>>> 
>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>> 
>>>>> What:  Add mca_base_select() and adjust frameworks & components
>>>>> to use it.
>>>>> Why:   Consolidation of code for general goodness.
>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>> When:  Code ready now. Documentation ready soon.
>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>> 
>>>>> Discussion:
>>>>> -----------
>>>>> For a number of years, a few developers have been talking about
>>>>> creating an MCA base component selection function. For various
>>>>> reasons this was never implemented. Recently I decided to give it
>>>>> a try.
>>>>> 
>>>>> A base select function will allow Open MPI to provide completely
>>>>> consistent selection behavior for many of its frameworks (18 of
>>>>> 31, to be exact, at the moment). The primary goal of this work is
>>>>> to improve code maintainability through code reuse. Other benefits
>>>>> also result, such as a slightly smaller memory footprint.
>>>>> 
>>>>> The mca_base_select() function represents the most commonly used
>>>>> logic for component selection: select the one component with the
>>>>> highest priority and close all of the non-selected components.
>>>>> This function can be found at the path below in the branch:
>>>>> opal/mca/base/mca_base_components_select.c
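>>>>> 
>>>>> As a rough illustration of that logic, here is a minimal C sketch
>>>>> of the selection loop. The types and the select_best() name below
>>>>> are simplified stand-ins invented for illustration; the real
>>>>> implementation operates on opal_list_t and lives in the file named
>>>>> above.
>>>>> 
>>>>> #include <stddef.h>
>>>>> 
>>>>> typedef struct mca_base_module_t    mca_base_module_t;    /* stand-in */
>>>>> typedef struct mca_base_component_t mca_base_component_t; /* stand-in */
>>>>> 
>>>>> /* The query signature formalized by this work (see below). */
>>>>> typedef int (*mca_base_query_component_fn_t)(mca_base_module_t **module,
>>>>>                                              int *priority);
>>>>> 
>>>>> struct mca_base_component_t {
>>>>>     mca_base_query_component_fn_t query; /* simplified to the query hook */
>>>>> };
>>>>> 
>>>>> /* Query every opened component, keep the one reporting the highest
>>>>>  * priority, and (in the real code) close every component that was
>>>>>  * not selected. */
>>>>> static int select_best(mca_base_component_t **comps, size_t n,
>>>>>                        mca_base_component_t **best_comp,
>>>>>                        mca_base_module_t **best_mod)
>>>>> {
>>>>>     int best_pri = -1;
>>>>>     *best_comp = NULL;
>>>>>     *best_mod  = NULL;
>>>>>     for (size_t i = 0; i < n; ++i) {
>>>>>         mca_base_module_t *mod = NULL;
>>>>>         int pri = -1;
>>>>>         if (0 != comps[i]->query(&mod, &pri)) {
>>>>>             continue; /* component declined to run in this environment */
>>>>>         }
>>>>>         if (pri > best_pri) {
>>>>>             best_pri   = pri;
>>>>>             *best_comp = comps[i];
>>>>>             *best_mod  = mod;
>>>>>         }
>>>>>     }
>>>>>     return (NULL == *best_comp) ? -1 : 0; /* -1: nothing selected */
>>>>> }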
>>>>> 
>>>>> To support this, I had to formalize a query() function in
>>>>> mca_base_component_t of the form:
>>>>> int mca_base_query_component_fn(mca_base_module_t **module, int
>>>>> *priority);
>>>>> 
>>>>> This function is specified after the open and close component
>>>>> functions in this structure so as to allow compatibility with
>>>>> frameworks that do not use the base selection logic. Frameworks
>>>>> that do *not* use this function are *not* affected by this commit.
>>>>> However, every component in the frameworks that use the
>>>>> mca_base_select function must adjust its component query function
>>>>> to fit the signature specified above.
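>>>>> 
>>>>> For example, a component's query function under this scheme could
>>>>> look like the hypothetical sketch below (all names invented, types
>>>>> again simplified stand-ins):
>>>>> 
>>>>> #include <stddef.h>
>>>>> 
>>>>> typedef struct mca_base_module_t mca_base_module_t; /* stand-in */
>>>>> 
>>>>> /* Module this component provides; a real component points this at
>>>>>  * its static module structure, typically during component open. */
>>>>> static mca_base_module_t *example_module = NULL;
>>>>> 
>>>>> static int example_component_query(mca_base_module_t **module,
>>>>>                                    int *priority)
>>>>> {
>>>>>     /* Decide whether this component can run in the current
>>>>>      * environment; return non-zero to decline selection. */
>>>>>     *priority = 10;  /* higher priority wins the selection */
>>>>>     *module   = example_module;
>>>>>     return 0;        /* success: willing to be selected */
>>>>> }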
>>>>> 
>>>>> 18 frameworks in Open MPI have been changed. I have updated all
>>>>> of the components in those 18 frameworks available in the trunk
>>>>> on my branch. The affected frameworks are:
>>>>> - OPAL carto
>>>>> - OPAL crs
>>>>> - OPAL maffinity
>>>>> - OPAL memchecker
>>>>> - OPAL paffinity
>>>>> - ORTE errmgr
>>>>> - ORTE ess
>>>>> - ORTE filem
>>>>> - ORTE grpcomm
>>>>> - ORTE odls
>>>>> - ORTE plm
>>>>> - ORTE ras
>>>>> - ORTE rmaps
>>>>> - ORTE routed
>>>>> - ORTE snapc
>>>>> - OMPI crcp
>>>>> - OMPI dpm
>>>>> - OMPI pubsub
>>>>> 
>>>>> There was a question about the memory footprint change as a
>>>>> result of this commit. I used 'pmap' to determine the process
>>>>> memory footprint of a hello world MPI program. Static and shared
>>>>> build numbers are below, along with variations on launching
>>>>> locally and to a single node allocated by SLURM. All of this was
>>>>> on Indiana University's Odin machine. We compare against the
>>>>> trunk (r18276), representing the last SVN sync point of the
>>>>> branch.
>>>>> 
>>>>>  Process(shared)| Trunk    | Branch  | Diff (Improvement)
>>>>>  ---------------+----------+---------+-------
>>>>>  mpirun (orted) |   39976K |  36828K | 3148K
>>>>>  hello (0)      |  229288K | 229268K |   20K
>>>>>  hello (1)      |  229288K | 229268K |   20K
>>>>>  ---------------+----------+---------+-------
>>>>>  mpirun         |   40032K |  37924K | 2108K
>>>>>  orted          |   34720K |  34660K |   60K
>>>>>  hello (0)      |  228404K | 228384K |   20K
>>>>>  hello (1)      |  228404K | 228384K |   20K
>>>>> 
>>>>>  Process(static)| Trunk    | Branch  | Diff (Improvement)
>>>>>  ---------------+----------+---------+-------
>>>>>  mpirun (orted) |   21384K |  21372K |  12K
>>>>>  hello (0)      |  194000K | 193980K |  20K
>>>>>  hello (1)      |  194000K | 193980K |  20K
>>>>>  ---------------+----------+---------+-------
>>>>>  mpirun         |   21384K |  21372K |  12K
>>>>>  orted          |   21208K |  21196K |  12K
>>>>>  hello (0)      |  193116K | 193096K |  20K
>>>>>  hello (1)      |  193116K | 193096K |  20K
>>>>> 
>>>>> As you can see, there are some small memory footprint
>>>>> improvements on my branch that result from this work. The size of
>>>>> the Open MPI project shrinks a bit as well: this commit cuts
>>>>> between 2,000 and 3,500 lines of code (depending on how you
>>>>> count), roughly a 1% code shrink.
>>>>> 
>>>>> The branch is stable in all of the testing I have done, but
>>>>> there are some platforms on which I cannot test. So please give
>>>>> this branch a try and let me know if you find any problems.
>>>>> 
>>>>> Cheers,
>>>>> Josh
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> 
>>>> 
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems