Nathan,

Thanks for your response.

Yes. The results in my previous mail were already obtained with that
section uncommented. I have now also pulled the latest varList source,
in which the section you mentioned is uncommented, but the result is
the same.

If MPI_T_cvar_get_info is supposed to return MPI_T_ERR_INVALID_INDEX
for variables of unloaded components, then the problem is that it does
not return it in this case.

I ran varList under GDB and found that MPI_T_cvar_get_info returns
MPI_T_ERR_INVALID_INDEX for shmem_sysv_priority (this is sane), but it
returns MPI_SUCCESS for shmem_sysv_major_version.

The difference is in the mbv_flags values. At the time of the
MPI_T_cvar_get_info call, mbv_flags is 0x44 for shmem_sysv_priority,
so mca_base_var_get in opal/mca/base/mca_base_var.c returns
OPAL_ERR_NOT_FOUND. For shmem_sysv_major_version, mbv_flags is 0x10003,
so mca_base_var_get returns OPAL_SUCCESS.

Are control variables for unloaded components perhaps not deregistered
completely? I can track this down further when I have time.

My environment:

  OS: Debian GNU/Linux wheezy
  CPU: x86_64
  Run: mpiexec -n 1 varList
  Open MPI source: trunk r32338 (almost latest)
  Open MPI configure:
    enable_picky=yes
    enable_debug=yes
    enable_mem_debug=yes
    enable_mem_profile=yes
    enable_memchecker=no
    enable_mca_no_build=btl-elan,btl-gm,btl-mx,btl-ofud,btl-portals,btl-sctp,btl-template,btl-udapl,common-mx,common-portals,ess-alps,ess-cnos,ess-lsf,ess-portals_utcp,ess-singleton,ess-slurm,grpcomm-cnos,mpool-fake,mtl,notifier,plm-alps,plm-ccp,plm-lsf,plm-process,plm-slurm,plm-submit,plm-tm,plm-xgrid,pml-cm,pml-csum,pml-example,pml-v,ras
    enable_contrib_no_build=vt
    enable_mpi_cxx=no
    enable_mpi_f77=no
    enable_mpi_f90=no
    enable_ipv6=no
    enable_mpi_io=no
    with_devel_headers=no
    with_wrapper_cflags=-g
    with_wrapper_cxxflags=-g
    with_wrapper_fflags=-g
    with_wrapper_fcflags=-g

Regards,
KAWASHIMA Takahiro

> The problem is the code in question does not check the return code of
> MPI_T_cvar_handle_alloc . We are returning an error and they still try
> to use the handle (which is stale).
> Uncomment this section of the code:
>
>
> //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This
> variable is not recognized by Mvapich. It is OpenMPI specific.
> // continue;
>
>
> Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich
> must not have implemented it (and thus should not claim to be MPI 3.0).
>
> -Nathan
>
> On Wed, Jul 30, 2014 at 12:04:55AM +0900, KAWASHIMA Takahiro wrote:
> > Hi,
> >
> > I encountered the same SEGV reported on the users list when
> > running varList program.
> >
> > http://www.open-mpi.org/community/lists/users/2014/07/24792.php
> >
> > mpiexec -n 1 ./varList:
> > ----------------------------------------------------------------
> > ... snip ...
> > event                               U/D-2  CHAR  n/a  ALL
> > event_base_verbose                  D/D-8  INT   n/a  LOCAL    0
> > event_libevent2021_event_include    U/A-3  CHAR  n/a  LOCAL    poll
> > opal_event_include                  U/A-3  CHAR  n/a  LOCAL    poll
> > event_libevent2021_major_version    D/A-9  INT   n/a  UNKNOWN  1
> > event_libevent2021_minor_version    D/A-9  INT   n/a  UNKNOWN  9
> > event_libevent2021_release_version  D/A-9  INT   n/a  UNKNOWN  0
> > shmem                               U/D-2  CHAR  n/a  ALL
> > shmem_base_verbose                  D/D-8  INT   n/a  LOCAL    0
> > shmem_base_RUNTIME_QUERY_hint       D/A-9  CHAR  n/a  ALL-EQ
> > shmem_mmap_priority                 U/A-3  INT   n/a  ALL      50
> > shmem_mmap_enable_nfs_warning       D/A-9  INT   n/a  LOCAL    true
> > shmem_mmap_relocate_backing_file    D/A-9  INT   n/a  ALL      0
> > shmem_mmap_backing_file_base_dir    D/A-9  CHAR  n/a  ALL      /dev/shm
> > shmem_mmap_major_version            D/A-9  INT   n/a  UNKNOWN  1
> > shmem_mmap_minor_version            D/A-9  INT   n/a  UNKNOWN  9
> > shmem_mmap_release_version          D/A-9  INT   n/a  UNKNOWN  0
> > shmem_posix_major_version           D/A-9  INT   n/a  UNKNOWN  1201644720
> > shmem_posix_minor_version           D/A-9  INT   n/a  UNKNOWN  32756
> > shmem_posix_release_version         D/A-9  INT   n/a  UNKNOWN  6
> > [ppc:12688] *** Process received signal ***
> > [ppc:12688] Signal: Segmentation fault (11)
> > [ppc:12688] Signal code: Invalid permissions (2)
> > [ppc:12688] Failing at address: 0x7ff4479f83d8
> > [ppc:12688] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x325c0)[0x7ff4493015c0]
> > [ppc:12688] [ 1] /home/rivis/opt/openmpi-trunk-debug/lib/libmpi.so.0(PMPI_T_cvar_read+0xbc)[0x7ff44970abb7]
> > [ppc:12688] [ 2] ./varlist(list_cvars+0x56a)[0x4029bc]
> > [ppc:12688] [ 3] ./varlist(main+0x42b)[0x403598]
> > [ppc:12688] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff4492edeed]
> > [ppc:12688] [ 5] ./varlist[0x4016c9]
> > [ppc:12688] *** End of error message ***
> > ----------------------------------------------------------------
> >
> > I tracked this error and found that this seems related to DSO.
> >
> > The error occurs when accessing value->intval for the
> > control variable shmem_sysv_major_version in MPI_T_cvar_read.
> >
> > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/tool/cvar_read.c
> >
> > The 'value' was gotten by mca_base_var_get_value and it points
> > mca_shmem_sysv_component.super.base_version.mca_component_major_version,
> > which was dlclose'd in MPI_INIT for DSO.
> > (component mmap is selected on my environment)
> >
> > Abnormal shmem_posix_{major,minor,release}_version values in
> > my output above are the same reason. SEGV occurs if the memory
> > was returned to kernel, and abnormal values are printed
> > if not yet.
> >
> > So this SEGV doesn't occur if I configure Open MPI with
> > --disable-dlopen option. I think it's the reason why Nathan
> > doesn't see this error.
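
P.S. A note on the two mbv_flags values reported at the top of this
mail: 0x44 and 0x10003 differ in bit 0x10000, which would be consistent
with a "valid" bit that is cleared for shmem_sysv_priority but still set
for shmem_sysv_major_version. The following is only a minimal sketch of
such a check. It assumes that bit 0x10000 is the validity flag (this is
inferred from the two values alone and has not been checked against
mca_base_var.h), and every SKETCH_/sketch_ name is hypothetical, not an
Open MPI identifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: bit 0x10000 marks a variable as valid (still registered).
 * Inferred only from the two observed values 0x44 and 0x10003. */
#define SKETCH_VAR_FLAG_VALID 0x00010000u

/* Hypothetical stand-ins for OPAL_SUCCESS and OPAL_ERR_NOT_FOUND. */
enum sketch_rc { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_FOUND = -1 };

/* True when the assumed validity bit is set in the flags word. */
bool sketch_var_is_valid(uint32_t flags)
{
    return (flags & SKETCH_VAR_FLAG_VALID) != 0;
}

/* Mimics the observed lookup behaviour: a variable whose validity bit
 * is cleared is reported as not found, any other one as success. */
enum sketch_rc sketch_var_get(uint32_t flags)
{
    return sketch_var_is_valid(flags) ? SKETCH_SUCCESS
                                      : SKETCH_ERR_NOT_FOUND;
}
```

Under that assumption, sketch_var_get(0x44) gives the not-found result
and sketch_var_get(0x10003) gives success, which matches what I saw in
GDB. If this reading is right, the version variables of the unloaded
sysv component keep their validity bit while their storage is gone.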