RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi,

We have for now concluded that this is probably an issue related to lam-7.1.4. There were a few other users with mdrun crashes/hangs. What is the status of your problems?

Berk

Date: Tue, 13 Jan 2009 13:02:47 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Hi Berk,
it hangs after approximately 45000 steps (the system is a simple DLPC bilayer), and a cpt file was generated, but it was written [09:48] before the run started to hang [09:58]:

[fu...@cumin 2]$ ls -ltrh
[snip]
-rw-r--r-- 1 fuchs dsimb 384K janv. 13 09:33 traj.trr
-rw-r--r-- 1 fuchs dsimb 385K janv. 13 09:48 state.cpt
-rw-r--r-- 1 fuchs dsimb  66K janv. 13 09:57 md.log
-rw-r--r-- 1 fuchs dsimb 5,4M janv. 13 09:58 traj.xtc
-rw-r--r-- 1 fuchs dsimb  92K janv. 13 09:58 ener.edr
[fu...@cumin 2]$ date
Tue Jan 13 10:16:22 CET 2009

The version of MPI is: LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University.

So shall I send you the tpr and cpt files off list?

Ciao,
Patrick

Berk Hess a écrit :
Hi,
This is strange. You run on 4 nodes and all processes hang at the same MPI call. I see no reason why they should hang if they are all at the correct call. After how many steps does this happen? If it is not much I can try to see if it also hangs on our system. Otherwise, could you try to generate a checkpoint file with which it hangs quickly? What version of MPI are you using?
Berk

Date: Tue, 13 Jan 2009 10:53:25 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Hi Berk,
I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and lam-7.1.4), using a slightly upgraded version of gcc compared to my previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same hardware, but it still hangs (so both FC9 and FC10 give the same problem, while FC8 does not).
Finally I could test mdrun_mpi in the debugger, and here are the results of my tests. You were right, it seems that mdrun hangs at an MPI call. Here are the outputs of each xterm:

XTERM1
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8285)]
NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=0 argc=1
:-)  G R O M A C S  (-:
Giant Rising Ordinary Mutants for A Clerical Setup
VERSION 4.0.2
[snip]
starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
500 steps, 1.0 ps.
^C
Program received signal SIGINT, Interrupt.
0x003b978cc087 in sched_yield () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
(gdb) where
#0  0x003b978cc087 in sched_yield () from /lib64/libc.so.6
#1  0x00770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#3  0x0074a1e0 in _mpi_req_advance ()
#4  0x0073ced0 in lam_send ()
#5  0x0075328e in MPI_Send ()
#6  0x0074d7ec in MPI_Sendrecv ()
#7  0x004aebfd in gmx_sum_qgrid_dd ()
#8  0x004b40bb in gmx_pme_do ()
#9  0x00479a58 in do_force_lowlevel ()
#10 0x004d1d32 in do_force ()
#11 0x004214d2 in do_md ()
#12 0x0041bea0 in mdrunner ()
#13 0x00422b94 in main ()
(gdb)
===
XTERM2
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
On Wed, 2009-01-14 at 12:27 +0100, Berk Hess wrote:
Hi, We have for now concluded that this is probably an issue related to lam-7.1.4. There were a few other users with mdrun crashes/hangs. What is the status of your problems?

You can try with the version in Fedora, which is debuggable and compiled against OpenMPI:

# yum -y install gromacs gromacs-mpi

All binaries have been renamed to start with g_, e.g. g_grompp, g_mdrun and so on. Suffixes:
g_mdrun        single precision version
g_mdrun_d      double precision version
g_mdrun_mpi    single precision, MPI version
g_mdrun_mpi_d  double precision, MPI version

PS. Could somebody please add the Fedora specifics to the installation part of the webpage? At least switching to the new SRPMS would be good. Feel free to use my spec and include it in the GROMACS source distribution.
SPEC: https://cvs.fedoraproject.org/viewvc/devel/gromacs/gromacs.spec?view=log
--
Jussi Lehtola, FM, Tohtorikoulutettava
Fysiikan laitos, Helsingin Yliopisto
jussi.leht...@helsinki.fi, p. 191 50632
--
Mr. Jussi Lehtola, M.Sc., Doctoral Student
Department of Physics, University of Helsinki, Finland
jussi.leht...@helsinki.fi
--
___
gmx-users mailing list  gmx-users@gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/mailing_lists/users.php
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi all,

finally we (Berk and I) found that there is a problem with lam-7.1.4 under Fedora 9/Fedora 10. Initially I thought it affected only gromacs-4, but a PhD student in my lab reported identical problems (hangs) with gromacs-3.3, while under FC8 I had no problem at all with the same hardware. So if you want to run gromacs-4 (or any version) under FC9/FC10, the fix I tested and that works is to use OpenMPI instead of lam-7.1.4 (I only tested the latest version, openmpi-1.2.8). I didn't test other versions of LAM (7.0.?), but it seems that the developers advise switching to OpenMPI. So, for the two other users (Bernhard and Antoine) who reported identical problems to the mailing list (see http://www.gromacs.org/pipermail/gmx-users/2008-December/038594.html and http://www.gromacs.org/pipermail/gmx-users/2008-December/038623.html): can you please check that it works on your hardware using OpenMPI?

Hope it helps,
Patrick

Berk Hess a écrit :
Hi,
We have for now concluded that this is probably an issue related to lam-7.1.4. There were a few other users with mdrun crashes/hangs. What is the status of your problems?
Berk

[snip]
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
Missing separate debuginfos, use: debuginfo-install e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
(gdb) where
#0  0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
#1  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#2  0x0074a1e0 in _mpi_req_advance ()
#3  0x0073ced0 in lam_send ()
#4  0x0075328e in MPI_Send ()
#5  0x0074d7ec in MPI_Sendrecv ()
#6  0x004aed44 in gmx_sum_qgrid_dd ()
#7  0x004b40bb in gmx_pme_do ()
#8  0x00479a58 in do_force_lowlevel ()
#9  0x004d1d32 in do_force ()
#10 0x004214d2 in do_md ()
#11 0x0041bea0 in mdrunner ()
#12 0x00422b94 in main ()
(gdb)
===
XTERM4
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8267)]
NNODES=4, MYRANK=3, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=3 argc=1
^C
Program received signal SIGINT, Interrupt.
0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
Missing separate debuginfos, use: debuginfo-install e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
(gdb) where
#0  0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
#1  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#2  0x0074a1e0 in _mpi_req_advance ()
#3  0x0073ea90 in MPI_Wait ()
#4  0x0074d800 in MPI_Sendrecv ()
#5  0x004aebfd in gmx_sum_qgrid_dd ()
#6  0x004b40bb in gmx_pme_do ()
#7  0x00479a58 in do_force_lowlevel ()
#8  0x004d1d32 in do_force ()
#9  0x004214d2 in do_md ()
#10 0x0041bea0 in mdrunner ()
#11 0x00422b94 in main ()
(gdb)
===

Cheers,
Patrick

Berk Hess a écrit :
Hi,
You can do something like:
mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun
with the appropriate settings for your system. You will have to type "run" in every xterm to make mdrun run. Or you can make some scripts (gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds). When you think it hangs, type Ctrl-C in an xterm and type "where" to see where it hangs. I would guess this would be in an MPI call.
Berk

Date: Mon, 15 Dec 2008 23:53:45 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Hi Berk,
I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC). I recompiled it with CFLAGS=-g and it still hangs... Now, how can we run it in the debugger?
Thanks,
Patrick

Berk Hess a écrit :
Hi,
What compiler (and compiler version) are you using? Could you configure with CFLAGS=-g and see if it still hangs? If it also hangs in that case, we can run it in the debugger and find out where it hangs.
Berk

Date: Mon, 15 Dec 2008 16:32:31 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi,
I have exactly the same problem under Fedora 9 on a dual quad-core (Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 hangs (same for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes it even hangs very quickly, before the simulation reaches the writing of the first checkpoint file (in fact the time before the hang occurs is erratic: sometimes a couple of minutes, sometimes a few seconds). The CPUs are still loaded but nothing goes to the output (in any file: log, xtc, trr, edr...). All gromacs binaries were compiled in the standard way with --enable-mpi and the latest lam-7.1.4. Like Bernhard and Antoine, I don't see anything strange in the log file. I have another computer, a single quad-core (Intel Xeon E5430, 2.66 GHz) under Fedora 8, and the same system (same mdp, topology etc.) runs fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So could it be that something is wrong with FC9 and lam-7.1.4...?
Cheers,
Patrick
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi,

This is strange. You run on 4 nodes and all processes hang at the same MPI call. I see no reason why they should hang if they are all at the correct call. After how many steps does this happen? If it is not much I can try to see if it also hangs on our system. Otherwise, could you try to generate a checkpoint file with which it hangs quickly? What version of MPI are you using?

Berk

Date: Tue, 13 Jan 2009 10:53:25 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Hi Berk,
I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and lam-7.1.4), using a slightly upgraded version of gcc compared to my previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same hardware, but it still hangs (so both FC9 and FC10 give the same problem, while FC8 does not). Finally I could test mdrun_mpi in the debugger, and here are the results of my tests. You were right, it seems that mdrun hangs at an MPI call. Here are the outputs of each xterm:

XTERM1
===
[snip]
===
XTERM2
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8294)]
NNODES=4, MYRANK=1, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=1 argc=1
^C
Program received signal SIGINT, Interrupt.
0x003b978cc087 in sched_yield () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
(gdb) where
#0  0x003b978cc087 in sched_yield () from /lib64/libc.so.6
#1  0x00770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#3  0x0074a1e0 in _mpi_req_advance ()
#4  0x0073ea90 in MPI_Wait ()
#5  0x0074d800 in MPI_Sendrecv ()
#6  0x004aed44 in gmx_sum_qgrid_dd ()
#7  0x004b40bb in gmx_pme_do ()
#8  0x00479a58 in do_force_lowlevel ()
#9  0x004d1d32 in do_force ()
#10 0x004214d2 in do_md ()
#11 0x0041bea0 in mdrunner ()
#12 0x00422b94 in main ()
(gdb)
===
XTERM3
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi Berk,

it hangs after approximately 45000 steps (the system is a simple DLPC bilayer), and a cpt file was generated, but it was written [09:48] before the run started to hang [09:58]:

[fu...@cumin 2]$ ls -ltrh
[snip]
-rw-r--r-- 1 fuchs dsimb 384K janv. 13 09:33 traj.trr
-rw-r--r-- 1 fuchs dsimb 385K janv. 13 09:48 state.cpt
-rw-r--r-- 1 fuchs dsimb  66K janv. 13 09:57 md.log
-rw-r--r-- 1 fuchs dsimb 5,4M janv. 13 09:58 traj.xtc
-rw-r--r-- 1 fuchs dsimb  92K janv. 13 09:58 ener.edr
[fu...@cumin 2]$ date
Tue Jan 13 10:16:22 CET 2009

The version of MPI is: LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University.

So shall I send you the tpr and cpt files off list?

Ciao,
Patrick

Berk Hess a écrit :
Hi,
This is strange. You run on 4 nodes and all processes hang at the same MPI call. I see no reason why they should hang if they are all at the correct call. After how many steps does this happen? If it is not much I can try to see if it also hangs on our system. Otherwise, could you try to generate a checkpoint file with which it hangs quickly? What version of MPI are you using?
Berk

[snip]
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi Berk,

I tried this fix, but mdrun_mpi is still hanging. I'll try to use the debugger by the end of the week and let you know.

Cheers,
Patrick

Berk Hess a écrit :
Hi,
My guess is that this is not the problem. But it is very easy to try, so please do. The diff for src/gmxlib/pbc.c is:

392c392,393
try[d] == 0;
---
try[d] = 0;
pos[d] = 0;

Berk

Date: Tue, 6 Jan 2009 18:37:20 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Berk wrote:
Hi, I also fixed a problem with uninitialized variables for pbc calculations in triclinic boxes. But up till now I have not observed any effect of this bug. Is your box triclinic?

Yes it is. So shall I test your corrected version?
Patrick

--
Patrick FUCHS
Equipe de Bioinformatique Genomique et Moleculaire
INTS, INSERM UMR-S726, Université Paris Diderot,
6 rue Alexandre Cabanel, 75015 Paris
Tel : +33 (0)1-44-49-30-57 - Fax : +33 (0)1-47-34-74-31
Web Site: http://www.dsimb.inserm.fr/~fuchs
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi,

I just fixed a bug with virtual sites that were a single charge group. Do you have virtual sites in your system?

Berk

Date: Wed, 17 Dec 2008 16:55:55 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Hi Berk,
thanks for the trick. Unfortunately I'm not in my lab right now and can't easily open xterms over the network. I'll try to catch up once I'm back (end of December), unless Bernhard or Antoine find the solution.
Cheers,
Patrick

[snip]

Berk Hess a écrit :
Hi,
If your simulations no longer produce output, but still run and there is no error or warning message, my guess would be that they are waiting for MPI communication. But the developers and many users are using 4.0 and I have not heard of problems like this, so I wonder if the problem could be somewhere else. Could you (or have you tried to) continue your simulation from the last checkpoint (mdrun option -cpi) before the hang, to see if it crashes quickly then?
Berk

Date: Fri, 12 Dec 2008 13:42:43 +0100
From: bernhard.kn...@meduniwien.ac.at
To: gmx-users@gromacs.org
Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Mark wrote:
What's happening in the log files? What's the latest information in the checkpoint files? Could there be some issue with file system availability?

Hi Mark,
Unfortunately I already deleted the simulation files which got stuck after 847 ps. But here is the output of another simulation done on the same system but with another pdb file. This one gets stuck after 179 ps with the following output. The latest thing it says is:

imb F  3% step 89700, will finish Wed Jul  1 09:11:00 2009
imb F  3% step 89800, will finish Wed Jul  1 09:02:51 2009

The prediction for the 1st of July is not surprising, since I always set up the simulation for 200 ns to avoid having to restart it if something interesting happens in the last frames.

For the .log file it is:

Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008
Energies (kJ/mol)
G96Angle  Proper Dih.  Improper Dih.  LJ
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi Berk, no I don't have virtual sites so this might not be the cause of my problem. Ciao, Patrick Berk Hess a écrit : Hi, I just fixed a bug with virtual sites that were a single charge group. Do you have virtual sites in your system? Berk Date: Wed, 17 Dec 2008 16:55:55 +0100 From: patrick.fu...@univ-paris-diderot.fr To: gmx-users@gromacs.org Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug? Hi Berk, thanks for the trick. Unfortunately I'm not in my lab right now and can't open easily xterms over the network. I'll try to catch up once I'm back (end of December), unless Bernhard or Antoine find the solution. Cheers, Patrick Berk Hess a écrit : Hi, You can do something like: mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun with the appropriate settings for your system. You will have to type run in every xterm to make mdrun run. Or you can make some scripts (gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds). When you think it hangs, type ctrl-c in an xterm and type where to see where it hangs. I would guess this would be in an MPI call. Berk Date: Mon, 15 Dec 2008 23:53:45 +0100 From: patrick.fu...@univ-paris-diderot.fr To: gmx-users@gromacs.org Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug? Hi Berk, I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC). I recompiled it with CFLAGS=-g and it still hangs... Now, how can we run it in the debugger ? Thanks, Patrick Berk Hess a écrit : Hi, What compiler (and compiler version) are you using? Could you configure with CFLAGS=-g and see if it still hangs? If it also hangs in that case, we can run it in the debugger and find out where it hangs. Berk Date: Mon, 15 Dec 2008 16:32:31 +0100 From: patrick.fu...@univ-paris-diderot.fr To: gmx-users@gromacs.org Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug? Hi, I have exactly the same problem under Fedora 9 on a dual-quadricore (Intel Xeon E5430, 2.66 GHz) computer. 
Gromacs-4.0.2 is hanging (same for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes, it even hangs very quickly before the simulation reaches the writing of the first checkpoint file (in fact the time length before the hang occurs is chaotic, sometimes a couple of minutes, or a few seconds). The CPUs are still loaded but nothing goes to the output (on any file log, xtc, trr, edr...). All gromacs binaries were standardly compiled with --enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't see anything strange in the log file. I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz) under Fedora 8 and the same system (same mdp, topology etc...) is running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So would it be possible that there's something wrong going on with FC9 and lam-7.1.4...? Cheers, Patrick Berk Hess a écrit : Hi, If your simulations no longer produce output, but still run and there is no error or warning message, my guess would be that they are waiting for MPI communication. But the developers any many users are using 4.0 and I have not heard from problems like this, so I wonder if the problem could be somewhere else. Could you (or have your tried to) continue your simulation from the last checkpoint (mdrun option -cpi) before the hang, to see if it crashes quickly then? Berk Date: Fri, 12 Dec 2008 13:42:43 +0100 From: bernhard.kn...@meduniwien.ac.at To: gmx-users@gromacs.org Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug? Mark wrote: What's happening in the log files? What's the latest information in the checkpoint files? Could there be some issue with file system availability? Hi Mark Unfortunaltey I already deleted the simulation files which got stuck after 847ps. But here is the output of another simulation done on the same system but with an other pdb file. 
This one gets stuck after 179 ps with the following output. The latest thing the checkpoint file says is:
imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009
The prediction for the 1st of July is not surprising since I always set the simulation length to 200 ns, to avoid having to restart it if something interesting happens in the last frames. for the .log file
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, I also fixed a problem with uninitialized variables for PBC calculations in triclinic boxes. But up till now I have not observed any effect of this bug. Is your box triclinic? Berk
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, I also fixed a problem with uninitialized variables for PBC calculations in triclinic boxes. But up till now I have not observed any effect of this bug. Is your box triclinic?
Yes it is. So shall I test your corrected version? Patrick
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, My guess is that this is not the problem. But it is very easy to try, so please do. The diff for src/gmxlib/pbc.c is:

392c392,393
<     try[d] == 0;
---
>     try[d] = 0;
>     pos[d] = 0;

Berk
___
gmx-users mailing list gmx-users@gromacs.org http://www.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/mailing_lists/users.php
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi Berk, thanks for the trick. Unfortunately I'm not in my lab right now and can't easily open xterms over the network. I'll try to catch up once I'm back (end of December), unless Bernhard or Antoine find the solution. Cheers, Patrick
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, You can do something like: mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun with the appropriate settings for your system. You will have to type run in every xterm to make mdrun run. Or you can make some scripts (gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds). When you think it hangs, type ctrl-c in an xterm and type where to see where it hangs. I would guess this would be in an MPI call. Berk
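Berk's recipe can be scripted so each rank's gdb starts automatically. A sketch under the thread's assumptions (LAM/MPI's mpirun, X11 xterms, a debug build of mdrun_mpi; the binary path is an example and must match your install):

```shell
# gdb executes the commands in this file on startup (-x), so there is
# no need to type "run" by hand in every xterm.
cat > gdb_cmds <<'EOF'
run
EOF

# One xterm+gdb per MPI rank. When the run seems hung, press Ctrl-C in
# an xterm and type "where": per Berk's guess, the backtrace should end
# in an MPI call. Echoed here rather than executed, since it needs X11
# and a LAM environment.
echo mpirun -np 4 xterm -e gdb -x gdb_cmds /usr/local/gromacs-4.0.2/bin/mdrun_mpi
```

This is exactly the procedure Patrick later used to capture the backtraces ending in MPI_Sendrecv shown above.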
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, I have exactly the same problem under Fedora 9 on a dual quad-core (Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes it even hangs very quickly, before the simulation reaches the writing of the first checkpoint file (in fact the time before the hang occurs is erratic: sometimes a couple of minutes, sometimes a few seconds). The CPUs are still loaded but nothing goes to the output (to any file: log, xtc, trr, edr...). All gromacs binaries were compiled in the standard way with --enable-mpi and the latest lam-7.1.4. Like Bernhard and Antoine, I don't see anything strange in the log file. I have another computer, a single quad-core (Intel Xeon E5430, 2.66 GHz) under Fedora 8, and the same system (same mdp, topology etc.) is running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So could it be that something wrong is going on with FC9 and lam-7.1.4? Cheers, Patrick
Date: Fri, 12 Dec 2008 13:42:43 +0100 From: bernhard.kn...@meduniwien.ac.at To: gmx-users@gromacs.org Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Mark wrote: What's happening in the log files? What's the latest information in the checkpoint files? Could there be some issue with file system availability?
Hi Mark, Unfortunately I already deleted the simulation files which got stuck after 847 ps. But here is the output of another simulation done on the same system but with another pdb file. This one gets stuck after 179 ps with the following output. The latest thing the checkpoint file says is:
imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009
The prediction for the 1st of July is not surprising since I always set the simulation length to 200 ns, to avoid having to restart it if something interesting happens in the last frames. For the .log file it is:
Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008
Energies (kJ/mol)
       G96Angle  Proper Dih.  Improper Dih.        LJ-14   Coulomb-14
    7.83753e+03  3.64068e+03    2.45951e+03  1.29167e+03  5.13688e+04
        LJ (SR) Coulomb (SR)   Coul. recip.    Potential  Kinetic En.
    3.82346e+05 -2.48883e+06   -3.51313e+05 -2.39119e+06  4.57648e+05
   Total Energy  Temperature Pressure (bar) Cons. rmsd ()
   -1.93355e+06  3.10014e+02    1.09267e-01   2.14030e-05
DD step 88999 load imb.: force 3.1%
           Step         Time       Lambda
          89000        178.2          0.0
Energies (kJ/mol)
       G96Angle  Proper Dih.  Improper Dih.        LJ-14   Coulomb-14
    8.03089e+03  3.59681e+03    2.42628e+03  1.20942e+03  5.12341e+04
        LJ (SR) Coulomb (SR)   Coul. recip.    Potential  Kinetic En.
    3.81539e+05 -2.48602e+06   -3.51307e+05 -2.38929e+06  4.56901e+05
   Total Energy  Temperature Pressure (bar) Cons. rmsd ()
   -1.93239e+06  3.09508e+02    1.64627e+01   2.08518e-05
The disk is also free: df -h says 2.3G out of 666G used. The only difference between the system with gromacs 3.3 and gromacs 4 is that gromacs 4 is running under suse 11, while gromacs 3.3 is running on a node with suse 10. But I don't think this can be the problem? cheers Bernhard
--
new E-mail address: patrick.fu...@univ-paris-diderot.fr new postal address !!! Patrick FUCHS Equipe de Bioinformatique Genomique et
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, What compiler (and compiler version) are you using? Could you configure with CFLAGS=-g and see if it still hangs? If it also hangs in that case, we can run it in the debugger and find out where it hangs. Berk
RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, If your simulations no longer produce output, but still run and there is no error or warning message, my guess would be that they are waiting for MPI communication. But the developers and many users are using 4.0 and I have not heard of problems like this, so I wonder if the problem could be somewhere else. Could you (or have you tried to) continue your simulation from the last checkpoint (mdrun option -cpi) before the hang, to see if it crashes quickly then? Berk
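Berk's -cpi suggestion amounts to the following restart sketch. The file names are the GROMACS 4.0 defaults seen in this thread (topol.tpr, state.cpt), and -np 4 matches Patrick's runs; both are assumptions for your own setup.

```shell
# Restart from the last checkpoint if one exists, otherwise start fresh.
# The command is echoed rather than executed, since running it needs a
# LAM/MPI environment and the actual input files.
tpr=topol.tpr
cpt=state.cpt
if [ -f "$cpt" ]; then
    cmd="mpirun -np 4 mdrun_mpi -s $tpr -cpi $cpt"
else
    cmd="mpirun -np 4 mdrun_mpi -s $tpr"
fi
echo "$cmd"
```

If the hang reproduces shortly after such a restart, the .tpr plus the .cpt is a compact reproducer, which is exactly what Berk asks Patrick to send later in the thread.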
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi, all gmx users and devs Just a (long) word to tell you a meet the same issue as Bernhard : mdrun stucks as in an infinite loop or lose some output file pointers after a while. The story : mdrun hangs after a variable number of steps (40.000 to 260.000 steps). Outputs are suspended in the shell (mdrun with -v option) as in the md log file (idem Bernhard's ouputs) but CPU(s) still runnning endless (10 hours before i killed the job). Sometimes it induces a complete freeze of the machine and a reboot is needed. No error logged (md.log or syslog). Conditions : This appends with gromacs 4.0.0 and 4.0.2 recompiled from binaries, using mpi or not (mpi3.3.3-1.x86_64 from gromacs website rpm). Computer is an Intel q9...@3.5ghz running OpenSuSE 11.0_x86_64 with ~400Go free (raid 1 HD) and 4Go DDR3. Tries : I first thought this was because of my overclocking parameters but other jobs run perfectly with full cpu load over several days and mdrun also hangs with standard clock settings. OpenSuSE is stable, no problem of any kind with file management or long duration jobs (docking jobs running fine). So now i suspect my md parameters (excessive cutoff distances with PBC perhaps or use of temperature and presure coupling ?). As I'm a noob in md i first suspect the fault is mine and try to fix it by myself (without success for now) before asking help. Still some tries to do but ... Consolation : If Bernhard can run his job with gromacs 3.3 and not with 4.0 perhaps i'm not so stupid ... I follow this thread with interrest ! Antoine -- Antoine Fortuné Ingenieur Modelisation Moleculaire DPM - UMR5063 UJF/CNRS (http://dpm.ujf-grenoble.fr) Pole Chimie bat. E - BP53 - 38042 GRENOBLE CEDEX 9 Tel : 33+ 0 476635292 - Fax : 33+ 0 476635298 ___ gmx-users mailing listgmx-users@gromacs.org http://www.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/search before posting! Please don't post (un)subscribe requests to the list. 
Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/mailing_lists/users.php
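[Editor's note: for readers wanting to check the parameters Antoine suspects (cutoffs with PBC, temperature and pressure coupling), these are the relevant .mdp entries. The values below are purely illustrative defaults for a membrane-type run, not Antoine's actual settings, which are not given in the thread.]

```
; Neighbour searching and cutoffs -- with pbc = xyz, each cutoff must
; be shorter than half the smallest box dimension
pbc              = xyz
rlist            = 1.0
coulombtype      = PME
rcoulomb         = 1.0
rvdw             = 1.0

; Temperature and pressure coupling -- the couplings Antoine mentions
tcoupl           = berendsen
tc-grps          = System
tau_t            = 0.1
ref_t            = 300
pcoupl           = berendsen
tau_p            = 1.0
ref_p            = 1.0
compressibility  = 4.5e-5
```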
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Hi Berk,

I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC). I recompiled with CFLAGS=-g and it still hangs... Now, how can we run it in the debugger?

Thanks,
Patrick

Berk Hess wrote:

Hi,

What compiler (and compiler version) are you using? Could you configure with CFLAGS=-g and see if it still hangs? If it also hangs in that case, we can run it in the debugger and find out where it hangs.

Berk

Date: Mon, 15 Dec 2008 16:32:31 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Hi,

I have exactly the same problem under Fedora 9 on a dual quad-core (Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 hangs (same for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes it even hangs very quickly, before the simulation reaches the writing of the first checkpoint file (in fact the time before the hang occurs is erratic: sometimes a couple of minutes, sometimes a few seconds). The CPUs are still loaded but nothing goes to the output (in any file: log, xtc, trr, edr...). All gromacs binaries were compiled in the standard way with --enable-mpi and the latest lam-7.1.4. Like Bernhard and Antoine, I don't see anything strange in the log file.

I have another computer, a single quad-core (Intel Xeon E5430, 2.66 GHz) under Fedora 8, and the same system (same mdp, topology etc.) runs fine there with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So could it be that something is wrong with FC9 and lam-7.1.4...?

Cheers,
Patrick

Berk Hess wrote:

Hi,

If your simulations no longer produce output, but still run and there is no error or warning message, my guess would be that they are waiting for MPI communication. But the developers and many users are using 4.0 and I have not heard of problems like this, so I wonder if the problem could be somewhere else.
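[Editor's note: the usual way to do what Berk suggests with LAM/MPI is to launch each rank inside its own xterm running gdb. This is a sketch of standard practice, not a procedure spelled out in the thread; the binary path and rank count are taken from the gdb sessions quoted above.]

```shell
# Reconfigure with debug symbols so gdb can show source-level backtraces
./configure --enable-mpi CFLAGS="-g -O0"
make && make install

# Boot the LAM runtime, then start one xterm + gdb per MPI rank
# (4 ranks, as in the sessions quoted in this thread)
lamboot
mpirun -np 4 xterm -e gdb /usr/local/gromacs-4.0.2/bin/mdrun_mpi

# In each gdb window: type "run", wait for the hang, press Ctrl-C,
# then type "where" to get that rank's backtrace
```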
Could you (or have you tried to) continue your simulation from the last checkpoint before the hang (mdrun option -cpi), to see if it crashes quickly then?

Berk

Date: Fri, 12 Dec 2008 13:42:43 +0100
From: bernhard.kn...@meduniwien.ac.at
To: gmx-users@gromacs.org
Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Mark wrote: What's happening in the log files? What's the latest information in the checkpoint files? Could there be some issue with file system availability?

Hi Mark

Unfortunately I already deleted the simulation files which got stuck after 847 ps. But here is the output of another simulation done on the same system but with another pdb file. This one gets stuck after 179 ps with the following output. The latest thing the checkpoint file says is:

imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009

The prediction for the 1st of July is not surprising, since I always set up the simulation for 200 ns to avoid having to restart it if something interesting happens in the last frames. For the .log file it is:

Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008

Energies (kJ/mol)
    G96Angle  Proper Dih. Improper Dih.        LJ-14   Coulomb-14
 7.83753e+03  3.64068e+03  2.45951e+03  1.29167e+03  5.13688e+04
     LJ (SR) Coulomb (SR)  Coul. recip.    Potential  Kinetic En.
 3.82346e+05 -2.48883e+06 -3.51313e+05 -2.39119e+06  4.57648e+05
Total Energy  Temperature Pressure (bar) Cons. rmsd ()
-1.93355e+06  3.10014e+02  1.09267e-01  2.14030e-05

DD step 88999 load imb.: force 3.1%

Step Time Lambda
89000 178.2 0.0

Energies (kJ/mol)
    G96Angle  Proper Dih. Improper Dih.        LJ-14   Coulomb-14
 8.03089e+03  3.59681e+03  2.42628e+03  1.20942e+03  5.12341e+04
     LJ (SR) Coulomb (SR)  Coul. recip.    Potential  Kinetic En.
 3.81539e+05 -2.48602e+06 -3.51307e+05 -2.38929e+06  4.56901e+05
Total Energy  Temperature Pressure (bar) Cons. rmsd ()
-1.93239e+06  3.09508e+02  1.64627e+01  2.08518e-05

The disk is also fine: df -h says 2.3G out of 666G used.
The only difference between the system with gromacs 3.3 and gromacs 4 is that gromacs 4 is running under SuSE 11 while gromacs 3.3 is running on a node with SuSE 10. But I don't think this can be the problem?

cheers
Bernhard
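[Editor's note: Berk's suggestion of continuing from the last checkpoint uses mdrun's -cpi option; a minimal sketch is below. The file names are the GROMACS defaults seen elsewhere in this thread (state.cpt, topol.tpr); the rank count is illustrative.]

```shell
# Restart from the last checkpoint written before the hang;
# if the hang reproduces quickly, the .cpt + .tpr pair makes a
# small, shareable test case for the developers
mpirun -np 4 mdrun_mpi -s topol.tpr -cpi state.cpt
```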
Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Mark wrote: What's happening in the log files? What's the latest information in the checkpoint files? Could there be some issue with file system availability?

Hi Mark

Unfortunately I already deleted the simulation files which got stuck after 847 ps. But here is the output of another simulation done on the same system but with another pdb file. This one gets stuck after 179 ps with the following output. The latest thing the checkpoint file says is:

imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009

The prediction for the 1st of July is not surprising, since I always set up the simulation for 200 ns to avoid having to restart it if something interesting happens in the last frames. For the .log file it is:

Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008

Energies (kJ/mol)
    G96Angle  Proper Dih. Improper Dih.        LJ-14   Coulomb-14
 7.83753e+03  3.64068e+03  2.45951e+03  1.29167e+03  5.13688e+04
     LJ (SR) Coulomb (SR)  Coul. recip.    Potential  Kinetic En.
 3.82346e+05 -2.48883e+06 -3.51313e+05 -2.39119e+06  4.57648e+05
Total Energy  Temperature Pressure (bar) Cons. rmsd ()
-1.93355e+06  3.10014e+02  1.09267e-01  2.14030e-05

DD step 88999 load imb.: force 3.1%

Step Time Lambda
89000 178.2 0.0

Energies (kJ/mol)
    G96Angle  Proper Dih. Improper Dih.        LJ-14   Coulomb-14
 8.03089e+03  3.59681e+03  2.42628e+03  1.20942e+03  5.12341e+04
     LJ (SR) Coulomb (SR)  Coul. recip.    Potential  Kinetic En.
 3.81539e+05 -2.48602e+06 -3.51307e+05 -2.38929e+06  4.56901e+05
Total Energy  Temperature Pressure (bar) Cons. rmsd ()
-1.93239e+06  3.09508e+02  1.64627e+01  2.08518e-05

The disk is also fine: df -h says 2.3G out of 666G used. The only difference between the system with gromacs 3.3 and gromacs 4 is that gromacs 4 is running under SuSE 11 while gromacs 3.3 is running on a node with SuSE 10. But I don't think this can be the problem?
cheers
Bernhard
Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
Bernhard Knapp wrote:

Mark wrote: What's happening in the log files? What's the latest information in the checkpoint files? Could there be some issue with file system availability?

Hi Mark

Unfortunately I already deleted the simulation files which got stuck after 847 ps. But here is the output of another simulation done on the same system but with another pdb file. This one gets stuck after 179 ps with the following output. The latest thing the checkpoint file says is:

imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009

OK, so there's no indication from GROMACS that it's experiencing trauma within the simulation. So now you need to try to restart close to the point where you saw problems, to see if the problem is reproducible. If it's not reproducible, then my guess, as last time, is that an NFS share is becoming unavailable, or some such. There are lots of other possibilities; a bug in GROMACS seems unlikely at this stage.

Mark