Arrghhh, some bad compilers use SIGUSR1 to communicate with themselves. I have had this happen. Just keep typing 'cont' until you hit the actual SEGV.

Matt
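A related gdb convenience (a sketch, assuming a stock gdb): instead of repeatedly typing 'cont', you can tell gdb to pass SIGUSR1 through to the program without stopping, so the only break left is the real fault:

    (gdb) handle SIGUSR1 nostop noprint pass
    (gdb) cont

The 'pass' keyword keeps delivering the signal to the program, so anything in the runtime that actually depends on SIGUSR1 still works.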
On 5/28/06, Randall Mackie <randy at geosystem.us> wrote:
>
> Satish,
>
> Yes, PETSc was compiled in debug mode.
>
> Since I'm simply storing vectors in a temporary file, could I get
> around this by using VecView and writing each vector to the
> same Viewer in binary format, then reading them later?
>
> In other words:
>
> do loop=1,n
>    call VecView (xvec(:,loop).....)
> end do
>
> then later
>
> do loop=1,n
>    call VecLoad (xvec(:,loop)....)
> end do
>
> Randy
>
> ps. I'll try your other suggestions as well. However, this code has worked
> flawlessly until now, with a model much much larger than I've used in the
> past.
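On the VecView/VecLoad question above: successive VecView calls do append to the same binary viewer, and VecLoad reads the objects back in the order they were written. A minimal sketch of that pattern, assuming the usual PETSc Fortran includes are in place, that xvec(1:n) holds already-created Vec handles (adjust the indexing to your layout), and a current-style interface - the exact VecLoad calling sequence has varied between PETSc releases, so check the man page for your version:

      PetscViewer    viewer
      PetscErrorCode ierr

!     write all n vectors, appended to one binary file
      call PetscViewerBinaryOpen(PETSC_COMM_WORLD,'xvec.bin',
     &     FILE_MODE_WRITE,viewer,ierr)
      do loop=1,n
         call VecView(xvec(loop),viewer,ierr)
      end do
      call PetscViewerDestroy(viewer,ierr)

!     ... later: read them back in the same order
      call PetscViewerBinaryOpen(PETSC_COMM_WORLD,'xvec.bin',
     &     FILE_MODE_READ,viewer,ierr)
      do loop=1,n
         call VecLoad(xvec(loop),viewer,ierr)
      end do
      call PetscViewerDestroy(viewer,ierr)

The file name 'xvec.bin' and the loop variable are placeholders; with this calling sequence the vectors on the read side must already be created before VecLoad fills them.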
>
>
> Satish Balay wrote:
> > - Not sure what SIGUSR1 means in this context.
> >
> > - The stack doesn't show any PETSc/user code. Was
> >   this code compiled with a debug version of PETSc?
> >
> > - It could be that gdb is unable to look at the Intel compiler's stack
> >   [normally gdb should work]. If that's the case - you could run with
> >   '-start_in_debugger idb'
> >
> > - It appears that this breakage is from user code which calls Fortran
> >   I/O [for_write_dir_xmit()]. There is no Fortran I/O from the PETSc side
> >   of the code. I think it could still be a bug in the user code.
> >
> >   However, PETSc does try to detect the availability of
> >   _intel_fast_memcpy() and use it from the C side. I don't think this is
> >   the cause. But to verify you could remove the flag
> >   PETSC_HAVE__INTEL_FAST_MEMCPY from petscconf.h and rebuild the libraries.
> >
> > Satish
> >
> > On Sun, 28 May 2006, Randall Mackie wrote:
> >
> >> Satish,
> >>
> >> Thanks, using method (2) worked. However, when I run a bt in gdb,
> >> I get the following output:
> >>
> >> Loaded symbols for /lib/libnss_files.so.2
> >> 0x080b2631 in d3inv_3_3 () at d3inv_3_3.F:2063
> >> 2063       call VecAssemblyBegin(xyz,ierr)
> >> (gdb) cont
> >> Continuing.
> >>
> >> Program received signal SIGUSR1, User defined signal 1.
> >> [Switching to Thread 1082952160 (LWP 23496)]
> >> 0x088cd729 in _intel_fast_memcpy.J ()
> >> Current language:  auto; currently fortran
> >> (gdb) bt
> >> #0  0x088cd729 in _intel_fast_memcpy.J ()
> >> #1  0x40620628 in for_write_dir_xmit ()
> >>     from /opt/intel_fc_80/lib/libifcore.so.5
> >> #2  0xbfffa6b0 in ?? ()
> >> #3  0x00000008 in ?? ()
> >> #4  0xbfff986c in ?? ()
> >> #5  0xbfff9890 in ?? ()
> >> #6  0x406873a8 in __dtors_list_end () from /opt/intel_fc_80/lib/libifcore.so.5
> >> #7  0x00000002 in ?? ()
> >> #8  0x00000000 in ?? ()
> >> (gdb)
> >>
> >> This all makes me think this is an INTEL compiler bug, and has nothing to
> >> do with my code.
> >>
> >> Any ideas?
> >>
> >> Randy
> >>
> >> Satish Balay wrote:
> >>> Looks like you have direct access to all the cluster nodes. Perhaps
> >>> you have admin access? You can do either of the following:
> >>>
> >>> * If the cluster frontend/compute nodes have a common filesystem [i.e.
> >>>   all machines can see the same file for ~/.Xauthority] and you can get
> >>>   the 'sshd' settings on the frontend changed - then:
> >>>
> >>>   - configure sshd with 'X11UseLocalhost no' - this way xterms on the
> >>>     compute nodes can connect to the 'ssh-x11' port on the frontend - run
> >>>     the PETSc app with: '-display frontend:ssh-x11-port'
> >>>
> >>> * However, if the above is not possible - but you can ssh directly to
> >>>   all the compute nodes [perhaps from the frontend] - then you can
> >>>   cascade X11 forwarding with:
> >>>
> >>>   - ssh from the desktop to the frontend
> >>>   - ssh from the frontend to node-9 [if you know which machine is node-9
> >>>     from the machine file]
> >>>   - if you don't know which one is node-9 - then ssh from the frontend
> >>>     to all the nodes :). Most likely all nodes will get a display
> >>>     'localhost:10.0'
> >>>   - so now you can run the executable with the option
> >>>     -display localhost:10.0
> >>>
> >>> The other alternative that might work [for interactive runs] is:
> >>>
> >>>   -start_in_debugger noxterm -debugger_nodes 9
> >>>
> >>> Satish
> >>>
> >>> On Sat, 27 May 2006, Randall Mackie wrote:
> >>>
> >>>> I can't seem to get the debugger to pop up on my screen.
> >>>>
> >>>> When I'm logged into the cluster I'm working on, I can
> >>>> type xterm &, and an xterm pops up on my display. So I know
> >>>> I can get something from the remote cluster.
> >>>>
> >>>> Now, when I try this using PETSc, I'm getting the following error
> >>>> message, for example:
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>> [17]PETSC ERROR: PETSC: Attaching gdb to
> >>>> /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc of pid 3628 on display
> >>>> 24.5.142.138:0.0 on machine compute-0-23.local
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>> I'm using this in my command file:
> >>>>
> >>>> source ~/.bashrc
> >>>> time /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
> >>>>   /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
> >>>>   -start_in_debugger \
> >>>>   -debugger_node 1 \
> >>>>   -display 24.5.142.138:0.0 \
> >>>>   -em_ksp_type bcgs \
> >>>>   -em_sub_pc_type ilu \
> >>>>   -em_sub_pc_factor_levels 8 \
> >>>>   -em_sub_pc_factor_fill 4 \
> >>>>   -em_sub_pc_factor_reuse_ordering \
> >>>>   -em_sub_pc_factor_reuse_fill \
> >>>>   -em_sub_pc_factor_mat_ordering_type rcm \
> >>>>   -divh_ksp_type cr \
> >>>>   -divh_sub_pc_type icc \
> >>>>   -ppc_sub_pc_type ilu \
> >>>>   << EOF
> >>
> >
>
> --
> Randall Mackie
> GSY-USA, Inc.
> PMB# 643
> 2261 Market St.,
> San Francisco, CA 94114-1600
> Tel (415) 469-8649
> Fax (415) 469-5044
>
> California Registered Geophysicist
> License No. GP 1034
>

-- 
"Failure has a thousand explanations. Success doesn't need one"
     -- Sir Alec Guinness
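To make the cascaded X11-forwarding route Satish describes above concrete, a sketch assuming OpenSSH on the desktop, frontend, and nodes (user and host names are placeholders; the rest is taken from the command file quoted above):

    # step 1 - from the desktop: log in to the frontend with X11 forwarding
    ssh -X user@frontend

    # step 2 - from the frontend: open a forwarded session to the debug node
    #          (keep this session open; it provides the tunnel)
    ssh -X node-9
    echo $DISPLAY            # typically something like localhost:10.0

    # step 3 - back on the frontend, launch the job as before, but point the
    #          debugger xterm at the forwarded display on the node:
    time /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
        /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
        -start_in_debugger \
        -debugger_nodes 9 \
        -display localhost:10.0 \
        ...                  # remaining solver options as in the command file

Note that Satish's example uses '-debugger_nodes' (plural), while the command file above has '-debugger_node'; that spelling is worth double-checking, and the display string should match whatever $DISPLAY reports on the node.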
