I -think- I may have found the problem here, but don't have a real test case - try r18429 and see if it works.
On 5/11/08 4:32 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

> From the stacktrace, this doesn't look like a problem with
> base_select, but with 'orte_util_encode_pidmap'. You may want to
> start looking there.
>
> -- Josh
>
> On May 11, 2008, at 1:30 PM, Lenny Verkhovsky wrote:
>
>> Hi,
>> I tried r18423 with the rank_file component and got a segfault
>> (I increased the priority of the component when rmaps_rank_file_path
>> is set).
>>
>> /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun -np 4 -hostfile hostfile_ompi -mca rmaps_rank_file_path rankfile -mca paffinity_base_verbose 5 ./mpi_p_SMD -t bw -output 1 -order 1
>> [witch1:25456] mca:base:select: Querying component [linux]
>> [witch1:25456] mca:base:select: Query of component [linux] set priority to 10
>> [witch1:25456] mca:base:select: Selected component [linux]
>> [witch1:25456] *** Process received signal ***
>> [witch1:25456] Signal: Segmentation fault (11)
>> [witch1:25456] Signal code: Invalid permissions (2)
>> [witch1:25456] Failing at address: 0x2b2875530030
>> [witch1:25456] [ 0] /lib64/libpthread.so.0 [0x2b28759dfc10]
>> [witch1:25456] [ 1] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e2bb6]
>> [witch1:25456] [ 2] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e23b6]
>> [witch1:25456] [ 3] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e22fd]
>> [witch1:25456] [ 4] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_util_encode_pidmap+0x2f4) [0x2b287527f412]
>> [witch1:25456] [ 5] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x989) [0x2b28752934f5]
>> [witch1:25456] [ 6] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1a3) [0x2b287529e60b]
>> [witch1:25456] [ 7] /home/USERS/lenny/OMPI_ORTE_SMD/lib/openmpi/mca_plm_rsh.so [0x2b287612f788]
>> [witch1:25456] [ 8] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x4032bf]
>> [witch1:25456] [ 9] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402b53]
>> [witch1:25456] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b2875b06154]
>> [witch1:25456] [11] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402aa9]
>> [witch1:25456] *** End of error message ***
>> Segmentation fault
>>
>> On Tue, May 6, 2008 at 9:09 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
>> This has been committed in r18381.
>>
>> Please let me know if you have any problems with this commit.
>>
>> Cheers,
>> Josh
>>
>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>
>>> Awesome.
>>>
>>> The branch is updated to the latest trunk head. I encourage folks to
>>> check out this repository and make sure that it builds on their
>>> systems. A normal build of the branch should be enough to find out if
>>> there are any cut-and-paste problems (though I tried to be careful,
>>> mistakes do happen).
>>>
>>> I haven't heard of any problems, so this is looking like it will come
>>> in tomorrow after the teleconf. I'll ask again there to see if there
>>> are any voices of concern.
>>>
>>> Cheers,
>>> Josh
>>>
>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>
>>>> This all sounds good to me!
>>>>
>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>>
>>>>> What: Add mca_base_select() and adjust frameworks & components to
>>>>> use it.
>>>>> Why: Consolidation of code for general goodness.
>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>> When: Code ready now. Documentation ready soon.
>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>>
>>>>> Discussion:
>>>>> -----------
>>>>> For a number of years a few developers have been talking about
>>>>> creating an MCA base component selection function. For various
>>>>> reasons this was never implemented. Recently I decided to give it
>>>>> a try.
>>>>>
>>>>> A base select function will allow Open MPI to provide completely
>>>>> consistent selection behavior for many of its frameworks (18 of
>>>>> 31, to be exact, at the moment). The primary goal of this work is
>>>>> to improve code maintainability through code reuse. Other benefits
>>>>> also result, such as a slightly smaller memory footprint.
>>>>>
>>>>> The mca_base_select() function implements the most commonly used
>>>>> logic for component selection: select the one component with the
>>>>> highest priority and close all of the non-selected components.
>>>>> This function can be found at the path below in the branch:
>>>>> opal/mca/base/mca_base_components_select.c
>>>>>
>>>>> To support this I had to formalize a query() function in the
>>>>> mca_base_component_t of the form:
>>>>> int mca_base_query_component_fn(mca_base_module_t **module, int *priority);
>>>>>
>>>>> This function is specified after the open and close component
>>>>> functions in this structure so as to allow compatibility with
>>>>> frameworks that do not use the base selection logic. Frameworks
>>>>> that do *not* use this function are *not* affected by this commit.
>>>>> However, every component in the frameworks that use the
>>>>> mca_base_select function must adjust its component query function
>>>>> to match the signature specified above.
>>>>>
>>>>> 18 frameworks in Open MPI have been changed. I have updated all of
>>>>> the components in the 18 frameworks available in the trunk on my
>>>>> branch.
>>>>>
>>>>> The affected frameworks are:
>>>>> - OPAL carto
>>>>> - OPAL crs
>>>>> - OPAL maffinity
>>>>> - OPAL memchecker
>>>>> - OPAL paffinity
>>>>> - ORTE errmgr
>>>>> - ORTE ess
>>>>> - ORTE filem
>>>>> - ORTE grpcomm
>>>>> - ORTE odls
>>>>> - ORTE pml
>>>>> - ORTE ras
>>>>> - ORTE rmaps
>>>>> - ORTE routed
>>>>> - ORTE snapc
>>>>> - OMPI crcp
>>>>> - OMPI dpm
>>>>> - OMPI pubsub
>>>>>
>>>>> There was a question of the memory footprint change as a result of
>>>>> this commit. I used 'pmap' to determine the process memory
>>>>> footprint of a hello world MPI program. Static and shared build
>>>>> numbers are below, along with variations on launching locally and
>>>>> to a single node allocated by SLURM. All of this was on Indiana
>>>>> University's Odin machine. We compare against the trunk (r18276),
>>>>> representing the last SVN sync point of the branch.
>>>>>
>>>>> Process(shared)| Trunk    | Branch  | Diff (Improvement)
>>>>> ---------------+----------+---------+-------
>>>>> mpirun (orted) | 39976K   | 36828K  | 3148K
>>>>> hello (0)      | 229288K  | 229268K | 20K
>>>>> hello (1)      | 229288K  | 229268K | 20K
>>>>> ---------------+----------+---------+-------
>>>>> mpirun         | 40032K   | 37924K  | 2108K
>>>>> orted          | 34720K   | 34660K  | 60K
>>>>> hello (0)      | 228404K  | 228384K | 20K
>>>>> hello (1)      | 228404K  | 228384K | 20K
>>>>>
>>>>> Process(static)| Trunk    | Branch  | Diff (Improvement)
>>>>> ---------------+----------+---------+-------
>>>>> mpirun (orted) | 21384K   | 21372K  | 12K
>>>>> hello (0)      | 194000K  | 193980K | 20K
>>>>> hello (1)      | 194000K  | 193980K | 20K
>>>>> ---------------+----------+---------+-------
>>>>> mpirun         | 21384K   | 21372K  | 12K
>>>>> orted          | 21208K   | 21196K  | 12K
>>>>> hello (0)      | 193116K  | 193096K | 20K
>>>>> hello (1)      | 193116K  | 193096K | 20K
>>>>>
>>>>> As you can see, there are some small memory footprint improvements
>>>>> on my branch that result from this work. The size of the Open MPI
>>>>> project shrinks a bit as well.
>>>>> This commit cuts between 2,000 and 3,500 lines of code (depending
>>>>> on how you count), so about a ~1% code shrink.
>>>>>
>>>>> The branch is stable in all of the testing I have done, but there
>>>>> are some platforms on which I cannot test. So please give this
>>>>> branch a try and let me know if you find any problems.
>>>>>
>>>>> Cheers,
>>>>> Josh
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems