- Not sure what SIGUSR1 means in this context.

- The stack doesn't show any PETSc/user code. Was this code compiled
  with a debug version of PETSc?
- It could be that gdb is unable to look at the Intel compilers' stack
  [normally gdb should work]. If that's the case, you could run with
  '-start_in_debugger idb'.

- It appears that this breakage is from user code which calls Fortran
  I/O [for_write_dir_xmit()]. There is no Fortran I/O from the PETSc
  side of the code, so I think it could still be a bug in the user code.
  However, PETSc does try to detect the availability of
  _intel_fast_memcpy() and use it from the C side. I don't think this is
  the cause, but to verify you could remove the flag
  PETSC_HAVE__INTEL_FAST_MEMCPY from petscconf.h and rebuild the
  libraries (a sketch of this appears after the quoted thread below).

Satish

On Sun, 28 May 2006, Randall Mackie wrote:

> Satish,
>
> Thanks, using method (2) worked. However, when I run a bt in gdb,
> I get the following output:
>
> Loaded symbols for /lib/libnss_files.so.2
> 0x080b2631 in d3inv_3_3 () at d3inv_3_3.F:2063
> 2063         call VecAssemblyBegin(xyz,ierr)
> (gdb) cont
> Continuing.
>
> Program received signal SIGUSR1, User defined signal 1.
> [Switching to Thread 1082952160 (LWP 23496)]
> 0x088cd729 in _intel_fast_memcpy.J ()
> Current language: auto; currently fortran
> (gdb) bt
> #0  0x088cd729 in _intel_fast_memcpy.J ()
> #1  0x40620628 in for_write_dir_xmit ()
>     from /opt/intel_fc_80/lib/libifcore.so.5
> #2  0xbfffa6b0 in ?? ()
> #3  0x00000008 in ?? ()
> #4  0xbfff986c in ?? ()
> #5  0xbfff9890 in ?? ()
> #6  0x406873a8 in __dtors_list_end () from /opt/intel_fc_80/lib/libifcore.so.5
> #7  0x00000002 in ?? ()
> #8  0x00000000 in ?? ()
> (gdb)
>
> This all makes me think this is an Intel compiler bug, and has nothing to
> do with my code.
>
> Any ideas?
>
> Randy
>
>
> Satish Balay wrote:
> > Looks like you have direct access to all the cluster nodes. Perhaps
> > you have admin access? You can do either of the following:
> >
> > * If the cluster frontend/compute nodes have a common filesystem
> >   [i.e. all machines can see the same file for ~/.Xauthority] and you
> >   can get the 'sshd' settings on the frontend changed, then:
> >
> >   - configure sshd with 'X11UseLocalhost no' - this way xterms on the
> >     compute nodes can connect to the 'ssh-x11' port on the frontend -
> >     run the PETSc app with: '-display frontend:ssh-x11-port'
> >
> > * However, if the above is not possible, but you can ssh directly to
> >   all the compute nodes [perhaps from the frontend], then you can
> >   cascade X11 forwarding with:
> >
> >   - ssh from the desktop to the frontend
> >   - ssh from the frontend to node-9 [if you know which machine is
> >     node-9 from the machine file]
> >   - If you don't know which one is node-9, then ssh from the frontend
> >     to all the nodes :). Most likely all nodes will get a display
> >     'localhost:10.0'
> >   - so now you can run the executable with the option
> >     -display localhost:10.0
> >
> > The other alternative that might work [for interactive runs] is:
> >
> >   -start_in_debugger noxterm -debugger_nodes 9
> >
> > Satish
> >
> > On Sat, 27 May 2006, Randall Mackie wrote:
> >
> > > I can't seem to get the debugger to pop up on my screen.
> > >
> > > When I'm logged into the cluster I'm working on, I can
> > > type xterm &, and an xterm pops up on my display. So I know
> > > I can get something from the remote cluster.
> > >
> > > Now, when I try this using PETSc, I'm getting the following error
> > > message, for example:
> > >
> > > ------------------------------------------------------------------------
> > > [17]PETSC ERROR: PETSC: Attaching gdb to
> > > /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc of pid 3628 on display
> > > 24.5.142.138:0.0 on machine compute-0-23.local
> > > ------------------------------------------------------------------------
> > >
> > > I'm using this in my command file:
> > >
> > > source ~/.bashrc
> > > time /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
> > >    /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
> > >    -start_in_debugger \
> > >    -debugger_node 1 \
> > >    -display 24.5.142.138:0.0 \
> > >    -em_ksp_type bcgs \
> > >    -em_sub_pc_type ilu \
> > >    -em_sub_pc_factor_levels 8 \
> > >    -em_sub_pc_factor_fill 4 \
> > >    -em_sub_pc_factor_reuse_ordering \
> > >    -em_sub_pc_factor_reuse_fill \
> > >    -em_sub_pc_factor_mat_ordering_type rcm \
> > >    -divh_ksp_type cr \
> > >    -divh_sub_pc_type icc \
> > >    -ppc_sub_pc_type ilu \
> > >    << EOF
> > >
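For reference, a minimal sketch of the verification step suggested at the
top of this message (dropping PETSC_HAVE__INTEL_FAST_MEMCPY and rebuilding).
The petscconf.h location shown assumes the bmake/$PETSC_ARCH layout of
PETSc 2.3-era installs, and 'make all' as the library rebuild target; both
vary by PETSc version, so adjust to your install:

    cd $PETSC_DIR
    # comment out the flag in the generated configuration header
    # (assumed location - newer versions keep it under $PETSC_ARCH/include)
    sed -i.bak \
      's|^#define PETSC_HAVE__INTEL_FAST_MEMCPY.*|/* PETSC_HAVE__INTEL_FAST_MEMCPY removed for testing */|' \
      bmake/$PETSC_ARCH/petscconf.h
    # rebuild the libraries so the C side falls back to plain memcpy()
    make all
    # then relink and rerun the application to see if the crash persists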

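Similarly, a minimal sketch of the '-start_in_debugger noxterm
-debugger_nodes' alternative suggested further up in the thread, reusing
the mpirun and executable paths from the command file above (solver options
omitted, and the node number is illustrative):

    /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
        /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
        -start_in_debugger noxterm \
        -debugger_nodes 9
    # gdb attaches only on rank 9 and talks over the run's stdin/stdout,
    # so no xterm, X11 forwarding, or -display option is needed
    # (useful for interactive runs only)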