FYI: We have pinged the upstream/LLNL authors of varlist about this issue.

On Jul 29, 2014, at 11:38 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> 
> The problem is that the code in question does not check the return code of
> MPI_T_cvar_handle_alloc. We are returning an error and they still try
> to use the handle (which is stale). Uncomment this section of the code:
> 
> 
>                //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
>                //      continue;
> 
> 
> Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich
> must not have implemented it (and thus should not claim to be MPI 3.0).
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 12:04:55AM +0900, KAWASHIMA Takahiro wrote:
>> Hi,
>> 
>> I encountered the same SEGV reported on the users list when
>> running the varList program.
>> 
>>  http://www.open-mpi.org/community/lists/users/2014/07/24792.php
>> 
>> mpiexec -n 1 ./varList:
>> ----------------------------------------------------------------
>> ... snip ...
>> event                                             U/D-2 CHAR   n/a      ALL
>> event_base_verbose                                D/D-8 INT    n/a      LOCAL    0
>> event_libevent2021_event_include                  U/A-3 CHAR   n/a      LOCAL    poll
>> opal_event_include                                U/A-3 CHAR   n/a      LOCAL    poll
>> event_libevent2021_major_version                  D/A-9 INT    n/a      UNKNOWN  1
>> event_libevent2021_minor_version                  D/A-9 INT    n/a      UNKNOWN  9
>> event_libevent2021_release_version                D/A-9 INT    n/a      UNKNOWN  0
>> shmem                                             U/D-2 CHAR   n/a      ALL
>> shmem_base_verbose                                D/D-8 INT    n/a      LOCAL    0
>> shmem_base_RUNTIME_QUERY_hint                     D/A-9 CHAR   n/a      ALL-EQ
>> shmem_mmap_priority                               U/A-3 INT    n/a      ALL      50
>> shmem_mmap_enable_nfs_warning                     D/A-9 INT    n/a      LOCAL    true
>> shmem_mmap_relocate_backing_file                  D/A-9 INT    n/a      ALL      0
>> shmem_mmap_backing_file_base_dir                  D/A-9 CHAR   n/a      ALL      /dev/shm
>> shmem_mmap_major_version                          D/A-9 INT    n/a      UNKNOWN  1
>> shmem_mmap_minor_version                          D/A-9 INT    n/a      UNKNOWN  9
>> shmem_mmap_release_version                        D/A-9 INT    n/a      UNKNOWN  0
>> shmem_posix_major_version                         D/A-9 INT    n/a      UNKNOWN  1201644720
>> shmem_posix_minor_version                         D/A-9 INT    n/a      UNKNOWN  32756
>> shmem_posix_release_version                       D/A-9 INT    n/a      UNKNOWN  6
>> [ppc:12688] *** Process received signal ***
>> [ppc:12688] Signal: Segmentation fault (11)
>> [ppc:12688] Signal code: Invalid permissions (2)
>> [ppc:12688] Failing at address: 0x7ff4479f83d8
>> [ppc:12688] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x325c0)[0x7ff4493015c0]
>> [ppc:12688] [ 1] /home/rivis/opt/openmpi-trunk-debug/lib/libmpi.so.0(PMPI_T_cvar_read+0xbc)[0x7ff44970abb7]
>> [ppc:12688] [ 2] ./varlist(list_cvars+0x56a)[0x4029bc]
>> [ppc:12688] [ 3] ./varlist(main+0x42b)[0x403598]
>> [ppc:12688] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff4492edeed]
>> [ppc:12688] [ 5] ./varlist[0x4016c9]
>> [ppc:12688] *** End of error message ***
>> ----------------------------------------------------------------
>> 
>> I tracked down this error and found that it seems to be related to DSO builds.
>> 
>> The error occurs when accessing value->intval for the
>> control variable shmem_sysv_major_version in MPI_T_cvar_read.
>> 
>>  https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/tool/cvar_read.c
>> 
>> The 'value' was obtained from mca_base_var_get_value, and it points to
>> mca_shmem_sysv_component.super.base_version.mca_component_major_version,
>> which belongs to a component that was dlclose'd in MPI_INIT in a DSO build
>> (component mmap is selected in my environment).
>> 
>> The abnormal shmem_posix_{major,minor,release}_version values in
>> my output above have the same cause. The SEGV occurs if the memory
>> has already been returned to the kernel; abnormal values are printed
>> if it has not.
>> 
>> So this SEGV does not occur if I configure Open MPI with the
>> --disable-dlopen option. I think that is why Nathan
>> does not see this error.
>> 
>> Regards,
>> KAWASHIMA Takahiro
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15304.php
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15306.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
