FYI: We have pinged the upstream/LLNL authors of varlist about this issue.
On Jul 29, 2014, at 11:38 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> The problem is that the code in question does not check the return code of
> MPI_T_cvar_handle_alloc. We are returning an error, and they still try
> to use the handle (which is stale). Uncomment this section of the code:
>
>     //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
>     //    continue;
>
> Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard, but mvapich
> must not have implemented it (and thus should not claim to be MPI 3.0).
>
> -Nathan
>
> On Wed, Jul 30, 2014 at 12:04:55AM +0900, KAWASHIMA Takahiro wrote:
>> Hi,
>>
>> I encountered the same SEGV reported on the users list when
>> running the varList program.
>>
>> http://www.open-mpi.org/community/lists/users/2014/07/24792.php
>>
>> mpiexec -n 1 ./varList:
>> ----------------------------------------------------------------
>> ... snip ...
>> event                               U/D-2 CHAR n/a ALL
>> event_base_verbose                  D/D-8 INT  n/a LOCAL   0
>> event_libevent2021_event_include    U/A-3 CHAR n/a LOCAL   poll
>> opal_event_include                  U/A-3 CHAR n/a LOCAL   poll
>> event_libevent2021_major_version    D/A-9 INT  n/a UNKNOWN 1
>> event_libevent2021_minor_version    D/A-9 INT  n/a UNKNOWN 9
>> event_libevent2021_release_version  D/A-9 INT  n/a UNKNOWN 0
>> shmem                               U/D-2 CHAR n/a ALL
>> shmem_base_verbose                  D/D-8 INT  n/a LOCAL   0
>> shmem_base_RUNTIME_QUERY_hint       D/A-9 CHAR n/a ALL-EQ
>> shmem_mmap_priority                 U/A-3 INT  n/a ALL     50
>> shmem_mmap_enable_nfs_warning       D/A-9 INT  n/a LOCAL   true
>> shmem_mmap_relocate_backing_file    D/A-9 INT  n/a ALL     0
>> shmem_mmap_backing_file_base_dir    D/A-9 CHAR n/a ALL     /dev/shm
>> shmem_mmap_major_version            D/A-9 INT  n/a UNKNOWN 1
>> shmem_mmap_minor_version            D/A-9 INT  n/a UNKNOWN 9
>> shmem_mmap_release_version          D/A-9 INT  n/a UNKNOWN 0
>> shmem_posix_major_version           D/A-9 INT  n/a UNKNOWN 1201644720
>> shmem_posix_minor_version           D/A-9 INT  n/a UNKNOWN 32756
>> shmem_posix_release_version         D/A-9 INT  n/a UNKNOWN 6
>> [ppc:12688] *** Process received signal ***
>> [ppc:12688] Signal: Segmentation fault (11)
>> [ppc:12688] Signal code: Invalid permissions (2)
>> [ppc:12688] Failing at address: 0x7ff4479f83d8
>> [ppc:12688] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x325c0)[0x7ff4493015c0]
>> [ppc:12688] [ 1] /home/rivis/opt/openmpi-trunk-debug/lib/libmpi.so.0(PMPI_T_cvar_read+0xbc)[0x7ff44970abb7]
>> [ppc:12688] [ 2] ./varlist(list_cvars+0x56a)[0x4029bc]
>> [ppc:12688] [ 3] ./varlist(main+0x42b)[0x403598]
>> [ppc:12688] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff4492edeed]
>> [ppc:12688] [ 5] ./varlist[0x4016c9]
>> [ppc:12688] *** End of error message ***
>> ----------------------------------------------------------------
>>
>> I tracked down this error and found that it seems to be related to DSO.
>>
>> The error occurs in MPI_T_cvar_read when it accesses value->intval for
>> the control variable shmem_sysv_major_version.
>>
>> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/tool/cvar_read.c
>>
>> The 'value' was obtained by mca_base_var_get_value, and it points to
>> mca_shmem_sysv_component.super.base_version.mca_component_major_version,
>> which was dlclose'd in MPI_INIT for DSO.
>> (The mmap component is selected in my environment.)
>>
>> The abnormal shmem_posix_{major,minor,release}_version values in
>> my output above have the same cause. A SEGV occurs if the memory
>> has already been returned to the kernel; otherwise, the abnormal
>> values are printed.
>>
>> So this SEGV doesn't occur if I configure Open MPI with the
>> --disable-dlopen option. I think that is the reason why Nathan
>> doesn't see this error.
>>
>> Regards,
>> KAWASHIMA Takahiro
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15304.php
> _______________________________________________
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15306.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/