Hi PETSc-developers,

I'm trying to run ex143 on a cluster (alcf-theta). I compiled PETSc on a
login node with cray-fftw-3.3.8.1, and neither configure nor make reported
any errors.

When I run ex143 with 1 MPI rank on a compute node, everything works fine,
but with 2 MPI ranks it crashes with signal 4 (illegal instruction), which
PETSc reports as likely due to memory corruption. I tried running it under
valgrind, but the valgrind module available on theta fails with the error
`valgrind: failed to start tool 'memcheck' for platform 'amd64-linux': No
such file or directory`.

To get around this, I ran it with gdb4hpc and have attached the backtrace,
which shows the SIGILL being raised in a call made from MatMult_MPIFFTW.
I have also attached the output from a run with the -start_in_debugger
option.

What could cause this error, and how do I fix it?

Thank You,
Sajid Ali
Applied Physics
Northwestern University
sajid@thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw> aprun -n 2 --cc depth -d 1 -j 1 -r 1 ./ex143 -start_in_debugger -log_view &> out
sajid@thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw> cat out
PETSC: Attaching gdb to ./ex143 of pid 62260 on display :0.0 on machine nid03832
PETSC: Attaching gdb to ./ex143 of pid 62259 on display :0.0 on machine nid03832
xterm: xterm: Xt error: Can't open display: :0.0
Xt error: Can't open display: :0.0
xterm: xterm: DISPLAY is not set
DISPLAY is not set
Use PETSc-FFTW interface...1-DIM: 30
[1]PETSC ERROR: [0]PETSC ERROR: ------------------------------------------------------------------------
------------------------------------------------------------------------
[0]PETSC ERROR: [1]PETSC ERROR: Caught signal number 4 Illegal instruction: Likely due to memory corruption
Caught signal number 4 Illegal instruction: Likely due to memory corruption
[0]PETSC ERROR: [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: [0]PETSC ERROR: likely location of problem given in stack below
likely location of problem given in stack below
[1]PETSC ERROR: [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
---------------------  Stack Frames ------------------------------------
[0]PETSC ERROR: [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: [1]PETSC ERROR:       INSTEAD the line number of the start of the function
      INSTEAD the line number of the start of the function
[0]PETSC ERROR: [1]PETSC ERROR:       is given.
      is given.
[0]PETSC ERROR: [1]PETSC ERROR: [0] MatMult_MPIFFTW line 236 /gpfs/mira-home/sajid/packages/petsc/src/mat/impls/fft/fftw/fftw.c
[1] MatMult_MPIFFTW line 236 /gpfs/mira-home/sajid/packages/petsc/src/mat/impls/fft/fftw/fftw.c
[0]PETSC ERROR: [1]PETSC ERROR: [1] MatMult line 2402 /gpfs/mira-home/sajid/packages/petsc/src/mat/interface/matrix.c
[0] MatMult line 2402 /gpfs/mira-home/sajid/packages/petsc/src/mat/interface/matrix.c
[1]PETSC ERROR: [0]PETSC ERROR: User provided function() line 0 in  unknown file (null)
User provided function() line 0 in  unknown file (null)
_pmiu_daemon(SIGCHLD): [NID 03832] [c7-1c2s14n0] [Mon Jun  3 04:10:53 2019] PE RANK 0 exit signal Aborted
[NID 03832] 2019-06-03 04:10:53 Apid 13751865: initiated application termination
Application 13751865 exit codes: 134
Application 13751865 resources: utime ~0s, stime ~2s, Rss ~27708, inblocks ~9678, outblocks ~0
sajid@thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw>
sajid@thetamom1:/gpfs/mira-home/sajid/sajid_proj/test_fftw> gdb4hpc
gdb4hpc 3.0 - Cray Line Mode Parallel Debugger
With Cray Comparative Debugging Technology.
Copyright 2007-2018 Cray Inc. All Rights Reserved.
Copyright 1996-2016 University of Queensland. All Rights Reserved.

Type "help" for a list of commands.
Type "help <cmd>" for detailed help about a command.
dbg all> maint set unsafe on
dbg all> launch $a{2} ./ex143
Starting application, please wait...
Creating MRNet communication network...
Waiting for debug servers to attach to MRNet communications network...
Timeout in 400 seconds. Please wait for the attach to complete.
Number of dbgsrvs connected: [0];  Timeout Counter: [1]
Number of dbgsrvs connected: [1];  Timeout Counter: [0]
Number of dbgsrvs connected: [1];  Timeout Counter: [1]
Number of dbgsrvs connected: [2];  Timeout Counter: [0]
Finalizing setup...
Launch complete.
a{0..1}: Initial breakpoint, main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:27
dbg all> step 20
a{0..1}: main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:100
dbg all> step 20
<$a>: Use PETSc-FFTW interface...1-DIM: 30
a{0..1}: main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:128
dbg all> step 20
a{0..1}: Program received signal SIGILL.
a{0..1}: In sadt at :0
dbg all> backtrace
a{0..1}: #0  0x00002aaab5c769c2 in sadt
a{0..1}: #1  0x00002aaab26399f6 in MatMult_MPIFFTW
a{0..1}: #2  0x00002aaab2579c2a in MatMult
a{0..1}: #3  0x0000000000404ded in main at /lus/theta-fs0/projects/large3dxrayADSP/sajid_proj/test_fftw/ex143.c:128
dbg all>
