RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-14 Thread Berk Hess

Hi,



We have for now concluded that this is probably an issue related to lam-7.1.4.



There were a few other users with mdrun crashes/hangs.

What is the status of your problems?



Berk


 Date: Tue, 13 Jan 2009 13:02:47 +0100
 From: patrick.fu...@univ-paris-diderot.fr
 To: gmx-users@gromacs.org
 Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 Hi Berk,
 it hangs after approximately 45000 steps (the system is a simple DLPC 
 bilayer), and a cpt file had been generated (but it was 
 written [09:48] before it started to hang [9:58]):
 -
 [fu...@cumin 2]$ ls -ltrh
 [snip]
 -rw-r--r-- 1 fuchs dsimb 384K janv. 13 09:33 traj.trr
 -rw-r--r-- 1 fuchs dsimb 385K janv. 13 09:48 state.cpt
 -rw-r--r-- 1 fuchs dsimb  66K janv. 13 09:57 md.log
 -rw-r--r-- 1 fuchs dsimb 5,4M janv. 13 09:58 traj.xtc
 -rw-r--r-- 1 fuchs dsimb  92K janv. 13 09:58 ener.edr
 [fu...@cumin 2]$ date
 Tue Jan 13 10:16:22 CET 2009
 -
 The version of MPI is: LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University.
 So shall I send you the tpr and cpt files off list ?
 Ciao,
 
 Patrick
 
 Berk Hess wrote:
  Hi,
  
  This is strange.
  You run on 4 nodes and all processes hang at the same MPI call.
  I see no reason why they should hang if they are all at the correct call.
  
  After how many steps does this happen?
  If it is not much I can try to see if it also hangs on our system.
  Otherwise, could you try to generate a checkpoint file with
  which it hangs quickly?
  
  What version of MPI are you using?
  
  Berk
  
  
Date: Tue, 13 Jan 2009 10:53:25 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi Berk,
I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and
lam-7.1.4), using a slightly upgraded version of gcc compared to my
 previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same
hardware but it still hangs (so both FC9 and FC10 give the same problem,
while FC8 does not). Finally I could test mdrun_mpi in the debugger and
here are the results of my tests. You were right, it seems that mdrun
hangs at an MPI call, here are the outputs of each xterm:
   
XTERM1
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type show copying
and show warranty for details.
This GDB was configured as x86_64-redhat-linux-gnu...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8285)]
NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=0 argc=1
:-) G R O M A C S (-:
   
Giant Rising Ordinary Mutants for A Clerical Setup
   
:-) VERSION 4.0.2 (-:
   
[snip]
   
starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
500 steps, 1.0 ps.
^C
Program received signal SIGINT, Interrupt.
0x003b978cc087 in sched_yield () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
(gdb) where
#0 0x003b978cc087 in sched_yield () from /lib64/libc.so.6
#1 0x00770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2 0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#3 0x0074a1e0 in _mpi_req_advance ()
#4 0x0073ced0 in lam_send ()
#5 0x0075328e in MPI_Send ()
#6 0x0074d7ec in MPI_Sendrecv ()
#7 0x004aebfd in gmx_sum_qgrid_dd ()
#8 0x004b40bb in gmx_pme_do ()
#9 0x00479a58 in do_force_lowlevel ()
#10 0x004d1d32 in do_force ()
#11 0x004214d2 in do_md ()
#12 0x0041bea0 in mdrunner ()
#13 0x00422b94 in main ()
(gdb)
===
   
   
XTERM2
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type show copying
and show warranty for details.
This GDB was configured as x86_64-redhat-linux-gnu...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread

RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-14 Thread Jussi Lehtola
On Wed, 2009-01-14 at 12:27 +0100, Berk Hess wrote:
 Hi,
 
 We have for now concluded that this is probably an issue related to
 lam-7.1.4.
 
 There were a few other users with mdrun crashes/hangs.
 What is the status of your problems?

You can try with the version in Fedora, which is debuggable and compiled
against OpenMPI.

 # yum -y install gromacs gromacs-mpi

All binaries have been renamed to start with g_, e.g. g_grompp, g_mdrun
and so on.

Suffixes:
g_mdrun single precision version
g_mdrun_d   double precision version
g_mdrun_mpi single precision, MPI version
g_mdrun_mpi_d   double precision, MPI version
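For reference, a minimal sketch of trying the Fedora packages as suggested above; the run name (md) and the process count are invented for the example and are not from this thread:

 # install the OpenMPI-linked packages, then launch the renamed MPI binary
 yum -y install gromacs gromacs-mpi
 mpirun -np 4 g_mdrun_mpi -deffnm md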

PS. Could somebody please add the Fedora specifics to the installation
part of the webpage? At least switching to new SRPMS would be good. Feel
free to use my spec and include it in the GROMACS source distribution.

SPEC:
https://cvs.fedoraproject.org/viewvc/devel/gromacs/gromacs.spec?view=log
-- 
--
Jussi Lehtola, FM, Tohtorikoulutettava
Fysiikan laitos, Helsingin Yliopisto
jussi.leht...@helsinki.fi, p. 191 50632
--
Mr. Jussi Lehtola, M. Sc., Doctoral Student
Department of Physics, University of Helsinki, Finland
jussi.leht...@helsinki.fi
--



Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-14 Thread patrick fuchs

Hi all,
finally we (Berk and I) found that there is a problem with 
lam-7.1.4 under Fedora 9/Fedora 10. Initially I thought it affected only 
gromacs-4, but a PhD student in my lab reported identical problems 
(hangs) with gromacs-3.3, while under FC8 I had no problem at all 
with the same hardware. So if you want to run gromacs-4 (or any version) 
under FC9/FC10, the fix I tested and that works is to use openmpi 
instead of lam-7.1.4 (I only tested the latest version, openmpi-1.2.8). 
I didn't test other versions of lam (7.0.?) but it seems that the 
developers advise switching to openmpi. So, for the two other users 
(Bernhard and Antoine) who reported identical problems to the mailing 
list (see 
http://www.gromacs.org/pipermail/gmx-users/2008-December/038594.html and 
http://www.gromacs.org/pipermail/gmx-users/2008-December/038623.html): 
can you please check that it works on your hardware using openmpi?
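For anyone attempting the switch, a rough sketch of rebuilding the MPI mdrun against OpenMPI with the GROMACS 4 autoconf build; only --enable-mpi appears in this thread, while the mpicc wrapper name, the program suffix and the mdrun-only make targets are assumptions that may need adjusting on your system:

 # build mdrun against OpenMPI's compiler wrapper
 ./configure --enable-mpi --program-suffix=_mpi CC=mpicc
 make mdrun
 make install-mdrun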

Hope it helps,

Patrick

Berk Hess wrote:

Hi,

We have for now concluded that this is probably an issue related to 
lam-7.1.4.


There were a few other users with mdrun crashes/hangs.
What is the status of your problems?

Berk


  Date: Tue, 13 Jan 2009 13:02:47 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  it hangs after approximately 45000 steps (the system is a simple DLPC
  bilayer), and a cpt file had been generated (but it was
  written [09:48] before it started to hang [9:58]):
  -
  [fu...@cumin 2]$ ls -ltrh
  [snip]
  -rw-r--r-- 1 fuchs dsimb 384K janv. 13 09:33 traj.trr
  -rw-r--r-- 1 fuchs dsimb 385K janv. 13 09:48 state.cpt
  -rw-r--r-- 1 fuchs dsimb 66K janv. 13 09:57 md.log
  -rw-r--r-- 1 fuchs dsimb 5,4M janv. 13 09:58 traj.xtc
  -rw-r--r-- 1 fuchs dsimb 92K janv. 13 09:58 ener.edr
  [fu...@cumin 2]$ date
  Tue Jan 13 10:16:22 CET 2009
  -
  The version of MPI is: LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University.
  So shall I send you the tpr and cpt files off list ?
  Ciao,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   This is strange.
   You run on 4 nodes and all processes hang at the same MPI call.
   I see no reason why they should hang if they are all at the correct call.

  
   After how many steps does this happen?
   If it is not much I can try to see if it also hangs on our system.
   Otherwise, could you try to generate a checkpoint file with
   which it hangs quickly?
  
   What version of MPI are you using?
  
   Berk
  
  
Date: Tue, 13 Jan 2009 10:53:25 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi Berk,
I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and
lam-7.1.4), using a slightly upgraded version of gcc compared to my
previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same
hardware but it still hangs (so both FC9 and FC10 give the same problem,
while FC8 does not). Finally I could test mdrun_mpi in the debugger and
here are the results of my tests. You were right, it seems that mdrun
hangs at an MPI call, here are the outputs of each xterm:
   
XTERM1
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type show copying
and show warranty for details.
This GDB was configured as x86_64-redhat-linux-gnu...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8285)]
NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=0 argc=1
:-) G R O M A C S (-:
   
Giant Rising Ordinary Mutants for A Clerical Setup
   
:-) VERSION 4.0.2 (-:
   
[snip]
   
starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
500 steps, 1.0 ps.
^C
Program received signal SIGINT, Interrupt.
0x003b978cc087 in sched_yield () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
(gdb) where
#0 0x003b978cc087 in sched_yield () from /lib64/libc.so.6
#1 0x00770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2 0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#3 0x0074a1e0 in _mpi_req_advance ()
#4 0x0073ced0 in lam_send ()
#5 0x0075328e

Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-13 Thread patrick fuchs
.
0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
Missing separate debuginfos, use: debuginfo-install 
e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 
libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 
libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 
libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64

(gdb) where
#0  0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
#1  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#2  0x0074a1e0 in _mpi_req_advance ()
#3  0x0073ced0 in lam_send ()
#4  0x0075328e in MPI_Send ()
#5  0x0074d7ec in MPI_Sendrecv ()
#6  0x004aed44 in gmx_sum_qgrid_dd ()
#7  0x004b40bb in gmx_pme_do ()
#8  0x00479a58 in do_force_lowlevel ()
#9  0x004d1d32 in do_force ()
#10 0x004214d2 in do_md ()
#11 0x0041bea0 in mdrunner ()
#12 0x00422b94 in main ()
(gdb)
===


XTERM4
===
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
http://gnu.org/licenses/gpl.html

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-redhat-linux-gnu...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8267)]
NNODES=4, MYRANK=3, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=3 argc=1
^C
Program received signal SIGINT, Interrupt.
0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
Missing separate debuginfos, use: debuginfo-install 
e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 
libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 
libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 
libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64

(gdb) where
#0  0x00770c70 in lam_ssi_rpi_usysv_proc_read_env ()
#1  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
#2  0x0074a1e0 in _mpi_req_advance ()
#3  0x0073ea90 in MPI_Wait ()
#4  0x0074d800 in MPI_Sendrecv ()
#5  0x004aebfd in gmx_sum_qgrid_dd ()
#6  0x004b40bb in gmx_pme_do ()
#7  0x00479a58 in do_force_lowlevel ()
#8  0x004d1d32 in do_force ()
#9  0x004214d2 in do_md ()
#10 0x0041bea0 in mdrunner ()
#11 0x00422b94 in main ()
(gdb)
===


Cheers,

Patrick

Berk Hess wrote:

Hi,

You can do something like:
 mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun

with the appropriate settings for your system.

You will have to type run in every xterm to make mdrun run.
Or you can make some scripts
(gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds).

When you think it hangs, type ctrl-c in an xterm
and type where to see where it hangs.
I would guess this would be in an MPI call.

Berk


  Date: Mon, 15 Dec 2008 23:53:45 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
  I recompiled it with CFLAGS=-g and it still hangs...
  Now, how can we run it in the debugger ?
  Thanks,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   What compiler (and compiler version) are you using?
  
   Could you configure with CFLAGS=-g
   and see if it still hangs?
   If it also hangs in that case, we can run it in the debugger
   and find out where it hangs.
  
   Berk
  
Date: Mon, 15 Dec 2008 16:32:31 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi,
I have exactly the same problem under Fedora 9 on a dual-quadricore
(Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same
for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes,
it even hangs very quickly before the simulation reaches the writing of
the first checkpoint file (in fact the time length before the hang
occurs is chaotic, sometimes a couple of minutes, or a few seconds). The
CPUs are still loaded but nothing goes to the output (on any file log,
xtc, trr, edr...). All gromacs binaries were standardly compiled with
--enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't
see anything strange in the log file.
I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz)
under Fedora 8 and the same system (same mdp, topology etc...) is
running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
would it be possible that there's something wrong going on with FC9 and
lam-7.1.4

RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-13 Thread Berk Hess

Hi,

This is strange.
You run on 4 nodes and all processes hang at the same MPI call.
I see no reason why they should hang if they are all at the correct call.

After how many steps does this happen?
If it is not much I can try to see if it also hangs on our system.
Otherwise, could you try to generate a checkpoint file with
which it hangs quickly?
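As a side note, restarting from such a checkpoint uses the mdrun -cpi option mentioned elsewhere in this thread; a minimal sketch, assuming the default file names (topol.tpr, state.cpt) and a 4-process run:

 mpirun -np 4 mdrun_mpi -s topol.tpr -cpi state.cpt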

What version of MPI are you using?

Berk


 Date: Tue, 13 Jan 2009 10:53:25 +0100
 From: patrick.fu...@univ-paris-diderot.fr
 To: gmx-users@gromacs.org
 Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 Hi Berk,
 I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and 
 lam-7.1.4), using a slightly upgraded version of gcc compared to my 
  previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same 
 hardware but it still hangs (so both FC9 and FC10 give the same problem, 
 while FC8 does not). Finally I could test mdrun_mpi in the debugger and 
 here are the results of my tests. You were right, it seems that mdrun 
 hangs at an MPI call, here are the outputs of each xterm:
 
 XTERM1
 ===
 GNU gdb Fedora (6.8-29.fc10)
 Copyright (C) 2008 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later 
 http://gnu.org/licenses/gpl.html
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type show copying
 and show warranty for details.
 This GDB was configured as x86_64-redhat-linux-gnu...
 (gdb) run
 Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
 [Thread debugging using libthread_db enabled]
 [New Thread 0x12df30 (LWP 8285)]
 NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
 NODEID=0 argc=1
   :-)  G  R  O  M  A  C  S  (-:
 
 Giant Rising Ordinary Mutants for A Clerical Setup
 
  :-)  VERSION 4.0.2  (-:
 
 [snip]
 
 starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
 500 steps,  1.0 ps.
 ^C
 Program received signal SIGINT, Interrupt.
 0x003b978cc087 in sched_yield () from /lib64/libc.so.6
 Missing separate debuginfos, use: debuginfo-install 
 e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 
 libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 
 libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 
 libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
 (gdb) where
 #0  0x003b978cc087 in sched_yield () from /lib64/libc.so.6
 #1  0x00770c83 in lam_ssi_rpi_usysv_proc_read_env ()
 #2  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
 #3  0x0074a1e0 in _mpi_req_advance ()
 #4  0x0073ced0 in lam_send ()
 #5  0x0075328e in MPI_Send ()
 #6  0x0074d7ec in MPI_Sendrecv ()
 #7  0x004aebfd in gmx_sum_qgrid_dd ()
 #8  0x004b40bb in gmx_pme_do ()
 #9  0x00479a58 in do_force_lowlevel ()
 #10 0x004d1d32 in do_force ()
 #11 0x004214d2 in do_md ()
 #12 0x0041bea0 in mdrunner ()
 #13 0x00422b94 in main ()
 (gdb)
 ===
 
 
 XTERM2
 ===
 GNU gdb Fedora (6.8-29.fc10)
 Copyright (C) 2008 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later 
 http://gnu.org/licenses/gpl.html
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type show copying
 and show warranty for details.
 This GDB was configured as x86_64-redhat-linux-gnu...
 (gdb) run
 Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
 [Thread debugging using libthread_db enabled]
 [New Thread 0x12df30 (LWP 8294)]
 NNODES=4, MYRANK=1, HOSTNAME=cumin.dsimb.inserm.fr
 NODEID=1 argc=1
 ^C
 Program received signal SIGINT, Interrupt.
 0x003b978cc087 in sched_yield () from /lib64/libc.so.6
 Missing separate debuginfos, use: debuginfo-install 
 e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 
 libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 
 libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 
 libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
 (gdb) where
 #0  0x003b978cc087 in sched_yield () from /lib64/libc.so.6
 #1  0x00770c83 in lam_ssi_rpi_usysv_proc_read_env ()
 #2  0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
 #3  0x0074a1e0 in _mpi_req_advance ()
 #4  0x0073ea90 in MPI_Wait ()
 #5  0x0074d800 in MPI_Sendrecv ()
 #6  0x004aed44 in gmx_sum_qgrid_dd ()
 #7  0x004b40bb in gmx_pme_do ()
 #8  0x00479a58 in do_force_lowlevel ()
 #9  0x004d1d32 in do_force ()
 #10 0x004214d2 in do_md ()
 #11 0x0041bea0 in mdrunner ()
 #12 0x00422b94 in main ()
 (gdb)
 ===
 
 
 XTERM3

Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-13 Thread patrick fuchs

Hi Berk,
it hangs after approximately 45000 steps (the system is a simple DLPC 
bilayer), and a cpt file had been generated (but it was 
written [09:48] before it started to hang [9:58]):

-
[fu...@cumin 2]$ ls -ltrh
[snip]
-rw-r--r-- 1 fuchs dsimb 384K janv. 13 09:33 traj.trr
-rw-r--r-- 1 fuchs dsimb 385K janv. 13 09:48 state.cpt
-rw-r--r-- 1 fuchs dsimb  66K janv. 13 09:57 md.log
-rw-r--r-- 1 fuchs dsimb 5,4M janv. 13 09:58 traj.xtc
-rw-r--r-- 1 fuchs dsimb  92K janv. 13 09:58 ener.edr
[fu...@cumin 2]$ date
Tue Jan 13 10:16:22 CET 2009
-
The version of MPI is: LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University.
So shall I send you the tpr and cpt files off list ?
Ciao,

Patrick

Berk Hess wrote:

Hi,

This is strange.
You run on 4 nodes and all processes hang at the same MPI call.
I see no reason why they should hang if they are all at the correct call.

After how many steps does this happen?
If it is not much I can try to see if it also hangs on our system.
Otherwise, could you try to generate a checkpoint file with
which it hangs quickly?

What version of MPI are you using?

Berk


  Date: Tue, 13 Jan 2009 10:53:25 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and
  lam-7.1.4), using a slightly upgraded version of gcc compared to my
   previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same
  hardware but it still hangs (so both FC9 and FC10 give the same problem,
  while FC8 does not). Finally I could test mdrun_mpi in the debugger and
  here are the results of my tests. You were right, it seems that mdrun
  hangs at an MPI call, here are the outputs of each xterm:
 
  XTERM1
  ===
  GNU gdb Fedora (6.8-29.fc10)
  Copyright (C) 2008 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later
  http://gnu.org/licenses/gpl.html
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law. Type show copying
  and show warranty for details.
  This GDB was configured as x86_64-redhat-linux-gnu...
  (gdb) run
  Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
  [Thread debugging using libthread_db enabled]
  [New Thread 0x12df30 (LWP 8285)]
  NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
  NODEID=0 argc=1
  :-) G R O M A C S (-:
 
  Giant Rising Ordinary Mutants for A Clerical Setup
 
  :-) VERSION 4.0.2 (-:
 
  [snip]
 
  starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
  500 steps, 1.0 ps.
  ^C
  Program received signal SIGINT, Interrupt.
  0x003b978cc087 in sched_yield () from /lib64/libc.so.6
  Missing separate debuginfos, use: debuginfo-install
  e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
  libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
  libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
  libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
  (gdb) where
  #0 0x003b978cc087 in sched_yield () from /lib64/libc.so.6
  #1 0x00770c83 in lam_ssi_rpi_usysv_proc_read_env ()
  #2 0x00784a39 in lam_ssi_rpi_usysv_advance_common ()
  #3 0x0074a1e0 in _mpi_req_advance ()
  #4 0x0073ced0 in lam_send ()
  #5 0x0075328e in MPI_Send ()
  #6 0x0074d7ec in MPI_Sendrecv ()
  #7 0x004aebfd in gmx_sum_qgrid_dd ()
  #8 0x004b40bb in gmx_pme_do ()
  #9 0x00479a58 in do_force_lowlevel ()
  #10 0x004d1d32 in do_force ()
  #11 0x004214d2 in do_md ()
  #12 0x0041bea0 in mdrunner ()
  #13 0x00422b94 in main ()
  (gdb)
  ===
 
 
  XTERM2
  ===
  GNU gdb Fedora (6.8-29.fc10)
  Copyright (C) 2008 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later
  http://gnu.org/licenses/gpl.html
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law. Type show copying
  and show warranty for details.
  This GDB was configured as x86_64-redhat-linux-gnu...
  (gdb) run
  Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
  [Thread debugging using libthread_db enabled]
  [New Thread 0x12df30 (LWP 8294)]
  NNODES=4, MYRANK=1, HOSTNAME=cumin.dsimb.inserm.fr
  NODEID=1 argc=1
  ^C
  Program received signal SIGINT, Interrupt.
  0x003b978cc087 in sched_yield () from /lib64/libc.so.6
  Missing separate debuginfos, use: debuginfo-install
  e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
  libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
  libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
  libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
  (gdb) where
  #0

Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-08 Thread patrick fuchs

Hi Berk,
I tried this fix, but mdrun_mpi is still hanging. I'll try to use the 
debugger by the end of the week and let you know.

Cheers,

Patrick

Berk Hess wrote:

Hi,

My guess is that this is not the problem.
But it is very easy to try, so please do.
The diff for src/gmxlib/pbc.c is:
392c392,393
< try[d] == 0;
---
> try[d] = 0;
> pos[d] = 0;

Berk

  Date: Tue, 6 Jan 2009 18:37:20 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi,
 
   I also fixed a problem with uninitialized variables for pbc calculations
   in triclinic boxes.
   But up till now I have not observed any effect of this bug.
   Is your box triclinic?
  Yes it is. So shall I test your corrected version ?
 
  Patrick
 





--
_
 new E-mail address: patrick.fu...@univ-paris-diderot.fr 
 new postal address !!!
Patrick FUCHS
Equipe de Bioinformatique Genomique et Moleculaire
INTS, INSERM UMR-S726, Université Paris Diderot,
6 rue Alexandre Cabanel, 75015 Paris
Tel : +33 (0)1-44-49-30-57 - Fax : +33 (0)1-47-34-74-31
Web Site: http://www.dsimb.inserm.fr/~fuchs



RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-06 Thread Berk Hess

Hi,

I just fixed a bug with virtual sites that were a single charge group.
Do you have virtual sites in your system?

Berk

 Date: Wed, 17 Dec 2008 16:55:55 +0100
 From: patrick.fu...@univ-paris-diderot.fr
 To: gmx-users@gromacs.org
 Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 Hi Berk,
 thanks for the trick. Unfortunately I'm not in my lab right now and 
 can't open easily xterms over the network. I'll try to catch up once I'm 
 back (end of December), unless Bernhard or Antoine find the solution.
 Cheers,
 
 Patrick
 
 Berk Hess wrote:
  Hi,
  
  You can do something like:
   mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun
  
  with the appropriate settings for your system.
  
  You will have to type run in every xterm to make mdrun run.
  Or you can make some scripts
  (gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds).
  
  When you think it hangs, type ctrl-c in an xterm
  and type where to see where it hangs.
  I would guess this would be in an MPI call.
  
  Berk
  
  
Date: Mon, 15 Dec 2008 23:53:45 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi Berk,
I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
I recompiled it with CFLAGS=-g and it still hangs...
Now, how can we run it in the debugger ?
Thanks,
   
Patrick
   
Berk Hess wrote:
 Hi,

 What compiler (and compiler version) are you using?

 Could you configure with CFLAGS=-g
 and see if it still hangs?
 If it also hangs in that case, we can run it in the debugger
 and find out where it hangs.

 Berk

  Date: Mon, 15 Dec 2008 16:32:31 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi,
  I have exactly the same problem under Fedora 9 on a dual-quadricore
  (Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same
  for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes,
  it even hangs very quickly before the simulation reaches the writing of
  the first checkpoint file (in fact the time length before the hang
  occurs is chaotic, sometimes a couple of minutes, or a few seconds). The
  CPUs are still loaded but nothing goes to the output (on any file log,
  xtc, trr, edr...). All gromacs binaries were standardly compiled with
  --enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't
  see anything strange in the log file.
  I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz)
  under Fedora 8 and the same system (same mdp, topology etc...) is
  running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
  would it be possible that there's something wrong going on with FC9 and
  lam-7.1.4...?
  Cheers,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   If your simulations no longer produce output, but still run
   and there is no error or warning message,
   my guess would be that they are waiting for MPI communication.
  But the developers and many users are using 4.0 and I have
  not heard of problems like this, so I wonder if the problem
   could be somewhere else.
  
  Could you (or have you tried to) continue your simulation
   from the last checkpoint (mdrun option -cpi) before the hang,
   to see if it crashes quickly then?
  
   Berk
  
Date: Fri, 12 Dec 2008 13:42:43 +0100
From: bernhard.kn...@meduniwien.ac.at
To: gmx-users@gromacs.org
Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Mark wrote:
   
 What's happening in the log files? What's the latest information in the
 checkpoint files? Could there be some issue with file system availability?
   
Hi Mark
   
 Unfortunately I already deleted the simulation files which got stuck
 after 847ps. But here is the output of another simulation done on the
 same system but with another pdb file. This one gets stuck after 179ps
with the following output:
   
The latest thing the checkpoint file says is:
   
imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009
   
 The prediction for the 1st of July is not surprising since I am always
 parameterizing the simulation with 200ns to avoid having to restart it if
 something interesting happens in the last frames.
   
for the .log file it is:
   
Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008
   
Energies (kJ/mol)
G96Angle Proper Dih. Improper Dih. LJ

Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-06 Thread patrick fuchs

Hi Berk,
no I don't have virtual sites so this might not be the cause of my problem.
Ciao,

Patrick

Berk Hess wrote:

Hi,

I just fixed a bug with virtual sites that were a single charge group.
Do you have virtual sites in your system?

Berk

  Date: Wed, 17 Dec 2008 16:55:55 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  thanks for the trick. Unfortunately I'm not in my lab right now and
  can't open easily xterms over the network. I'll try to catch up once I'm
  back (end of December), unless Bernhard or Antoine find the solution.
  Cheers,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   You can do something like:
   mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun
  
   with the appropriate settings for your system.
  
   You will have to type run in every xterm to make mdrun run.
   Or you can make some scripts
   (gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds).
  
   When you think it hangs, type ctrl-c in an xterm
   and type where to see where it hangs.
   I would guess this would be in an MPI call.
  
   Berk
  
  
Date: Mon, 15 Dec 2008 23:53:45 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi Berk,
I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
I recompiled it with CFLAGS=-g and it still hangs...
Now, how can we run it in the debugger ?
Thanks,
   
Patrick
   
Berk Hess wrote:
 Hi,

 What compiler (and compiler version) are you using?

 Could you configure with CFLAGS=-g
 and see if it still hangs?
 If it also hangs in that case, we can run it in the debugger
 and find out where it hangs.

 Berk

  Date: Mon, 15 Dec 2008 16:32:31 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi,
  I have exactly the same problem under Fedora 9 on a dual-quadricore
  (Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same
  for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes,
  it even hangs very quickly before the simulation reaches the writing of
  the first checkpoint file (in fact the time length before the hang
  occurs is chaotic, sometimes a couple of minutes, or a few seconds). The
  CPUs are still loaded but nothing goes to the output (on any file log,
  xtc, trr, edr...). All gromacs binaries were standardly compiled with
  --enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't
  see anything strange in the log file.
  I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz)
  under Fedora 8 and the same system (same mdp, topology etc...) is
  running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
  would it be possible that there's something wrong going on with FC9 and
  lam-7.1.4...?
  Cheers,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   If your simulations no longer produce output, but still run
   and there is no error or warning message,
   my guess would be that they are waiting for MPI communication.
   But the developers and many users are using 4.0 and I have
   not heard of problems like this, so I wonder if the problem
   could be somewhere else.
  
   Could you (or have you tried to) continue your simulation
   from the last checkpoint (mdrun option -cpi) before the hang,
   to see if it crashes quickly then?
  
   Berk
  
Date: Fri, 12 Dec 2008 13:42:43 +0100
From: bernhard.kn...@meduniwien.ac.at
To: gmx-users@gromacs.org
Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Mark wrote:
   
  What's happening in the log files? What's the latest information in the
  checkpoint files? Could there be some issue with file system availability?
   
Hi Mark
   
 Unfortunately I already deleted the simulation files which got stuck
 after 847ps. But here is the output of another simulation done on the
 same system but with another pdb file. This one gets stuck after 179ps
with the following output:
   
The latest thing the checkpoint file says is:
   
imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009
   
 The prediction for the 1st of July is not surprising since I am always
 parameterizing the simulation with 200ns to avoid having to restart it if
 something interesting happens in the last frames.
   
for the .log file

RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-06 Thread Berk Hess

Hi,

I also fixed a problem with uninitialized variables for pbc calculations in 
triclinic boxes.
But up till now I have not observed any effect of this bug.
Is your box triclinic?

Berk


 Date: Tue, 6 Jan 2009 17:08:57 +0100
 From: patrick.fu...@univ-paris-diderot.fr
 To: gmx-users@gromacs.org
 Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 Hi Berk,
 no I don't have virtual sites so this might not be the cause of my problem.
 Ciao,
 
 Patrick
 
 Berk Hess wrote:
  Hi,
  
  I just fixed a bug with virtual sites that were a single charge group.
  Do you have virtual sites in your system?
  
  Berk
  
Date: Wed, 17 Dec 2008 16:55:55 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi Berk,
thanks for the trick. Unfortunately I'm not in my lab right now and
can't open easily xterms over the network. I'll try to catch up once I'm
back (end of December), unless Bernhard or Antoine find the solution.
Cheers,
   
Patrick
   
Berk Hess wrote:
 Hi,

 You can do something like:
 mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun

 with the appropriate settings for your system.

 You will have to type run in every xterm to make mdrun run.
 Or you can make some scripts
 (gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds).

 When you think it hangs, type ctrl-c in an xterm
 and type where to see where it hangs.
 I would guess this would be in an MPI call.

 Berk


  Date: Mon, 15 Dec 2008 23:53:45 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
  I recompiled it with CFLAGS=-g and it still hangs...
  Now, how can we run it in the debugger ?
  Thanks,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   What compiler (and compiler version) are you using?
  
   Could you configure with CFLAGS=-g
   and see if it still hangs?
   If it also hangs in that case, we can run it in the debugger
   and find out where it hangs.
  
   Berk
  
Date: Mon, 15 Dec 2008 16:32:31 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi,
I have exactly the same problem under Fedora 9 on a dual-quadricore
(Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same
for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes,
it even hangs very quickly before the simulation reaches the writing of
the first checkpoint file (in fact the time length before the hang
occurs is chaotic, sometimes a couple of minutes, or a few seconds). The
CPUs are still loaded but nothing goes to the output (on any file log,
xtc, trr, edr...). All gromacs binaries were standardly compiled with
--enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't
see anything strange in the log file.
I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz)
under Fedora 8 and the same system (same mdp, topology etc...) is
running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
would it be possible that there's something wrong going on with FC9 and
lam-7.1.4...?
Cheers,
   
Patrick
   
Berk Hess wrote:
 Hi,

 If your simulations no longer produce output, but still run
 and there is no error or warning message,
 my guess would be that they are waiting for MPI communication.
 But the developers and many users are using 4.0 and I have
 not heard of problems like this, so I wonder if the problem
 could be somewhere else.

 Could you (or have you tried to) continue your simulation
 from the last checkpoint (mdrun option -cpi) before the hang,
 to see if it crashes quickly then?

 Berk

  Date: Fri, 12 Dec 2008 13:42:43 +0100
  From: bernhard.kn...@meduniwien.ac.at
  To: gmx-users@gromacs.org
  Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Mark wrote:
 
   What's happening in the log files? What's the latest
   information in
 the
   checkpoint files? Could there be some issue with file 
  system
 availability?
 
  Hi Mark
 
   Unfortunately I already deleted the simulation files which got stuck
   after 847ps. But here

Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-06 Thread patrick fuchs

Hi,

I also fixed a problem with uninitialized variables for pbc calculations 
in triclinic boxes.

But up till now I have not observed any effect of this bug.
Is your box triclinic?

Yes it is. So shall I test your corrected version ?

Patrick



Berk


  Date: Tue, 6 Jan 2009 17:08:57 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  no I don't have virtual sites so this might not be the cause of my 
problem.

  Ciao,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   I just fixed a bug with virtual sites that were a single charge group.
   Do you have virtual sites in your system?
  
   Berk
  
Date: Wed, 17 Dec 2008 16:55:55 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi Berk,
thanks for the trick. Unfortunately I'm not in my lab right now and
can't open easily xterms over the network. I'll try to catch up 
once I'm

back (end of December), unless Bernhard or Antoine find the solution.
Cheers,
   
Patrick
   
Berk Hess wrote:
 Hi,

 You can do something like:
 mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun

 with the appropriate settings for your system.

 You will have to type run in every xterm to make mdrun run.
 Or you can make some scripts
 (gdb -x gdb_cmds will read the gdb commands from the file 
gdb_cmds).


 When you think it hangs, type ctrl-c in an xterm
 and type where to see where it hangs.
 I would guess this would be in an MPI call.

 Berk


  Date: Mon, 15 Dec 2008 23:53:45 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
  I recompiled it with CFLAGS=-g and it still hangs...
  Now, how can we run it in the debugger ?
  Thanks,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   What compiler (and compiler version) are you using?
  
   Could you configure with CFLAGS=-g
   and see if it still hangs?
   If it also hangs in that case, we can run it in the debugger
   and find out where it hangs.
  
   Berk
  
Date: Mon, 15 Dec 2008 16:32:31 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi,
I have exactly the same problem under Fedora 9 on a dual-quadricore
(Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same
for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes,
it even hangs very quickly before the simulation reaches the writing of
the first checkpoint file (in fact the time length before the hang
occurs is chaotic, sometimes a couple of minutes, or a few seconds). The
CPUs are still loaded but nothing goes to the output (on any file log,
xtc, trr, edr...). All gromacs binaries were standardly compiled with
--enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't
see anything strange in the log file.
I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz)
under Fedora 8 and the same system (same mdp, topology etc...) is
running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
would it be possible that there's something wrong going on with FC9 and
lam-7.1.4...?
Cheers,
   
Patrick
   
Berk Hess wrote:
 Hi,

 If your simulations no longer produce output, but still run
 and there is no error or warning message,
 my guess would be that they are waiting for MPI 
communication.

 But the developers and many users are using 4.0 and I have
 not heard of problems like this, so I wonder if the problem
 could be somewhere else.

 Could you (or have you tried to) continue your simulation
 from the last checkpoint (mdrun option -cpi) before the hang,
 to see if it crashes quickly then?

 Berk

  Date: Fri, 12 Dec 2008 13:42:43 +0100
  From: bernhard.kn...@meduniwien.ac.at
  To: gmx-users@gromacs.org
  Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Mark wrote:
 
   What's happening in the log files? What's the latest
   information in
 the
   checkpoint files? Could there be some issue with file
   system
 availability?
 
  Hi Mark
 
   Unfortunately

RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2009-01-06 Thread Berk Hess

Hi,

My guess is that this is not the problem.
But it is very easy to try, so please do.
The diff for src/gmxlib/pbc.c is:
392c392,393
< try[d] == 0;
---
> try[d] = 0;
> pos[d] = 0;

Berk

 Date: Tue, 6 Jan 2009 18:37:20 +0100
 From: patrick.fu...@univ-paris-diderot.fr
 To: gmx-users@gromacs.org
 Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 Hi,
 
  I also fixed a problem with uninitialized variables for pbc calculations 
  in triclinic boxes.
  But up till now I have not observed any effect of this bug.
  Is your box triclinic?
 Yes it is. So shall I test your corrected version ?
 
 Patrick
 



Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-17 Thread patrick fuchs

Hi Berk,
thanks for the trick. Unfortunately I'm not in my lab right now and 
can't open easily xterms over the network. I'll try to catch up once I'm 
back (end of December), unless Bernhard or Antoine find the solution.

Cheers,

Patrick

Berk Hess wrote:

Hi,

You can do something like:
 mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun

with the appropriate settings for your system.

You will have to type run in every xterm to make mdrun run.
Or you can make some scripts
(gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds).

When you think it hangs, type ctrl-c in an xterm
and type where to see where it hangs.
I would guess this would be in an MPI call.

Berk


  Date: Mon, 15 Dec 2008 23:53:45 +0100
  From: patrick.fu...@univ-paris-diderot.fr
  To: gmx-users@gromacs.org
  Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Hi Berk,
  I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
  I recompiled it with CFLAGS=-g and it still hangs...
  Now, how can we run it in the debugger ?
  Thanks,
 
  Patrick
 
  Berk Hess wrote:
   Hi,
  
   What compiler (and compiler version) are you using?
  
   Could you configure with CFLAGS=-g
   and see if it still hangs?
   If it also hangs in that case, we can run it in the debugger
   and find out where it hangs.
  
   Berk
  
Date: Mon, 15 Dec 2008 16:32:31 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi,
I have exactly the same problem under Fedora 9 on a dual-quadricore
(Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same
for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes,
it even hangs very quickly before the simulation reaches the writing of
the first checkpoint file (in fact the time length before the hang
occurs is chaotic, sometimes a couple of minutes, or a few seconds). The
CPUs are still loaded but nothing goes to the output (on any file log,
xtc, trr, edr...). All gromacs binaries were standardly compiled with
--enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't
see anything strange in the log file.
I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz)
under Fedora 8 and the same system (same mdp, topology etc...) is
running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
would it be possible that there's something wrong going on with FC9 and
lam-7.1.4...?
Cheers,
   
Patrick
   
Berk Hess wrote:
 Hi,

 If your simulations no longer produce output, but still run
 and there is no error or warning message,
 my guess would be that they are waiting for MPI communication.
  But the developers and many users are using 4.0 and I have
  not heard of problems like this, so I wonder if the problem
 could be somewhere else.

  Could you (or have you tried to) continue your simulation
 from the last checkpoint (mdrun option -cpi) before the hang,
 to see if it crashes quickly then?

 Berk

  Date: Fri, 12 Dec 2008 13:42:43 +0100
  From: bernhard.kn...@meduniwien.ac.at
  To: gmx-users@gromacs.org
  Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Mark wrote:
 
   What's happening in the log files? What's the latest
   information in
 the
   checkpoint files? Could there be some issue with file system
 availability?
 
  Hi Mark
 
  Unfortunately I already deleted the simulation files which got stuck
  after 847ps. But here is the output of another simulation done on the
  same system but with another pdb file. This one gets stuck after 179ps
  with the following output:
 
  The latest thing the checkpoint file says is:
 
  imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
  imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009
 
  The prediction for the 1st of July is not surprising since I am always
  parameterizing the simulation with 200ns to avoid having to restart it if
  something interesting happens in the last frames.
 
  for the .log file it is:
 
  Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008
 
  Energies (kJ/mol)
  G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
  7.83753e+03 3.64068e+03 2.45951e+03 1.29167e+03 5.13688e+04
  LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
  3.82346e+05 -2.48883e+06 -3.51313e+05 -2.39119e+06 4.57648e+05
  Total Energy Temperature Pressure (bar) Cons. rmsd ()
  -1.93355e+06 3.10014e+02 1.09267e-01 2.14030e-05
 
  DD step 88999 load imb.: force 3.1%
 
  Step Time Lambda
  89000 178.2 0.0
 
  Energies (kJ/mol)
  G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
  8.03089e+03 3.59681e+03

RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-16 Thread Berk Hess

Hi,

You can do something like:
 mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun

with the appropriate settings for your system.

You will have to type run in every xterm to make mdrun run.
Or you can make some scripts
(gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds).

When you think it hangs, type ctrl-c in an xterm
and type where to see where it hangs.
I would guess this would be in an MPI call.
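A minimal sketch of the command-file variant mentioned above, assuming a file called gdb_cmds and reusing the mdrun_mpi path from Patrick's gdb output elsewhere in this thread. The command file:

 # gdb_cmds: commands each gdb instance executes on startup
 set pagination off
 run

and the corresponding launch line:

 mpirun -np 4 xterm -e gdb -x gdb_cmds /usr/local/gromacs-4.0.2/bin/mdrun_mpi

After interrupting a hung process with ctrl-c in its xterm, typing where prints the backtrace, as described above.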

Berk


 Date: Mon, 15 Dec 2008 23:53:45 +0100
 From: patrick.fu...@univ-paris-diderot.fr
 To: gmx-users@gromacs.org
 Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 Hi Berk,
 I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
 I recompiled it with CFLAGS=-g and it still hangs...
 Now, how can we run it in the debugger ?
 Thanks,
 
 Patrick
 
 Berk Hess wrote:
  Hi,
  
  What compiler (and compiler version) are you using?
  
  Could you configure with CFLAGS=-g
  and see if it still hangs?
  If it also hangs in that case, we can run it in the debugger
  and find out where it hangs.
  
  Berk
  
Date: Mon, 15 Dec 2008 16:32:31 +0100
From: patrick.fu...@univ-paris-diderot.fr
To: gmx-users@gromacs.org
Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
   
Hi,
I have exactly the same problem under Fedora 9 on a dual-quadricore
(Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same
for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes,
it even hangs very quickly before the simulation reaches the writing of
the first checkpoint file (in fact the time length before the hang
occurs is chaotic, sometimes a couple of minutes, or a few seconds). The
CPUs are still loaded but nothing goes to the output (on any file log,
xtc, trr, edr...). All gromacs binaries were standardly compiled with
--enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't
see anything strange in the log file.
I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz)
under Fedora 8 and the same system (same mdp, topology etc...) is
running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
would it be possible that there's something wrong going on with FC9 and
lam-7.1.4...?
Cheers,
   
Patrick
   
Berk Hess wrote:
 Hi,

 If your simulations no longer produce output, but still run
 and there is no error or warning message,
 my guess would be that they are waiting for MPI communication.
  But the developers and many users are using 4.0 and I have
  not heard of problems like this, so I wonder if the problem
 could be somewhere else.

  Could you (or have you tried to) continue your simulation
 from the last checkpoint (mdrun option -cpi) before the hang,
 to see if it crashes quickly then?

 Berk

  Date: Fri, 12 Dec 2008 13:42:43 +0100
  From: bernhard.kn...@meduniwien.ac.at
  To: gmx-users@gromacs.org
  Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  Mark wrote:
 
   What's happening in the log files? What's the latest 
  information in
 the
   checkpoint files? Could there be some issue with file system
 availability?
 
  Hi Mark
 
  Unfortunately I already deleted the simulation files which got stuck
  after 847ps. But here is the output of another simulation done on the
  same system but with another pdb file. This one gets stuck after 179ps
  with the following output:
 
  The latest thing the checkpoint file says is:
 
  imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
  imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009
 
  The prediction for the 1st of July is not surprising since I am always
  parameterizing the simulation with 200ns to avoid having to restart it if
  something interesting happens in the last frames.
 
  for the .log file it is:
 
  Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008
 
  Energies (kJ/mol)
  G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
  7.83753e+03 3.64068e+03 2.45951e+03 1.29167e+03 5.13688e+04
  LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
  3.82346e+05 -2.48883e+06 -3.51313e+05 -2.39119e+06 4.57648e+05
  Total Energy Temperature Pressure (bar) Cons. rmsd ()
  -1.93355e+06 3.10014e+02 1.09267e-01 2.14030e-05
 
  DD step 88999 load imb.: force 3.1%
 
  Step Time Lambda
  89000 178.2 0.0
 
  Energies (kJ/mol)
  G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
  8.03089e+03 3.59681e+03 2.42628e+03 1.20942e+03 5.12341e+04
  LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
  3.81539e+05 -2.48602e+06 -3.51307e+05 -2.38929e+06 4.56901e+05
  Total Energy Temperature Pressure (bar) Cons. rmsd ()
  -1.93239e+06 3.09508e+02 1.64627e+01 2.08518e-05

Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-15 Thread patrick fuchs

Hi,
I have exactly the same problem under Fedora 9 on a dual-quadricore 
(Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 is hanging (same 
for gromacs-4.0.0) after a couple of minutes of simulation. Sometimes, 
it even hangs very quickly before the simulation reaches the writing of 
the first checkpoint file (in fact the time length before the hang 
occurs is chaotic, sometimes a couple of minutes, or a few seconds). The 
CPUs are still loaded but nothing goes to the output (on any file log, 
xtc, trr, edr...). All gromacs binaries were standardly compiled with 
--enable-mpi and the latest lam-7.1.4. As Bernhard and Antoine I don't 
see anything strange in the log file.
I have another computer single quadricore (Intel Xeon E5430, 2.66 GHz) 
under Fedora 8 and the same system (same mdp, topology etc...) is 
running fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So 
would it be possible that there's something wrong going on with FC9 and 
lam-7.1.4...?

Cheers,

Patrick

Berk Hess wrote:

Hi,

If your simulations no longer produce output, but still run
and there is no error or warning message,
my guess would be that they are waiting for MPI communication.
But the developers and many users are using 4.0 and I have
not heard of problems like this, so I wonder if the problem
could be somewhere else.

Could you (or have you tried to) continue your simulation
from the last checkpoint (mdrun option -cpi) before the hang,
to see if it crashes quickly then?

Berk

  Date: Fri, 12 Dec 2008 13:42:43 +0100
  From: bernhard.kn...@meduniwien.ac.at
  To: gmx-users@gromacs.org
  Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
  [snip]







--
_
 new E-mail address: patrick.fu...@univ-paris-diderot.fr 
 new postal address !!!
Patrick FUCHS
Equipe de Bioinformatique Genomique et

RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-15 Thread Berk Hess

Hi,

What compiler (and compiler version) are you using?

Could you configure with CFLAGS=-g
and see if it still hangs?
If it also hangs in that case, we can run it in the debugger
and find out where it hangs.
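A minimal sketch of such a rebuild, assuming the usual autoconf build with MPI
enabled; adding -O0 is an extra assumption that makes backtraces easier to read:

# rebuild GROMACS with debug symbols
./configure --enable-mpi CFLAGS="-g -O0"
make clean && make && make install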

Berk

 Date: Mon, 15 Dec 2008 16:32:31 +0100
 From: patrick.fu...@univ-paris-diderot.fr
 To: gmx-users@gromacs.org
 Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 [snip]

RE: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-15 Thread Berk Hess

Hi,

If your simulations no longer produce output, but still run
and there is no error or warning message,
my guess would be that they are waiting for MPI communication.
But the developers and many users are using 4.0 and I have
not heard of problems like this, so I wonder if the problem
could be somewhere else.

Could you (or have you tried to) continue your simulation
from the last checkpoint (mdrun option -cpi) before the hang,
to see if it crashes quickly then?
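A minimal sketch of such a restart, assuming the default file names and a
4-process LAM/MPI run:

# continue the run from the last checkpoint written before the hang
mpirun -np 4 mdrun_mpi -s topol.tpr -cpi state.cpt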

Berk

 Date: Fri, 12 Dec 2008 13:42:43 +0100
 From: bernhard.kn...@meduniwien.ac.at
 To: gmx-users@gromacs.org
 Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
 
 [snip]


Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-15 Thread Antoine Fortuné
Hi, all gmx users and devs

Just a (long) word to tell you that I meet the same issue as Bernhard:  
mdrun gets stuck as if in an infinite loop, or loses some output file  
pointers after a while.

The story:
mdrun hangs after a variable number of steps (40,000 to 260,000  
steps). Output is suspended in the shell (mdrun with the -v option) as  
well as in the md log file (same as Bernhard's output), but the CPU(s)  
keep running endlessly (10 hours before I killed the job). Sometimes it  
induces a complete freeze of the machine and a reboot is needed. No error  
is logged (md.log or syslog).

Conditions:
This happens with gromacs 4.0.0 and 4.0.2, recompiled or installed from  
binaries, using mpi or not (mpi 3.3.3-1.x86_64 from the gromacs website  
rpm). The computer is an Intel q9...@3.5GHz running OpenSuSE 11.0_x86_64  
with ~400GB free (RAID 1 HD) and 4GB DDR3.

Tries:
I first thought this was because of my overclocking parameters, but  
other jobs run perfectly with full cpu load over several days, and  
mdrun also hangs with standard clock settings. OpenSuSE is stable, with  
no problem of any kind with file management or long-duration jobs  
(docking jobs run fine).

So now I suspect my md parameters (excessive cutoff distances with PBC  
perhaps, or the use of temperature and pressure coupling?). As I'm a noob  
in md I first suspect the fault is mine and try to fix it by myself  
(without success for now) before asking for help. Still some tries to do,  
but ...
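For readers less familiar with the jargon, these are the kinds of .mdp
settings being referred to; the values below are purely illustrative, not
Antoine's actual input:

; illustrative cut-off settings under periodic boundary conditions
pbc             = xyz
rlist           = 1.2
rcoulomb        = 1.2
rvdw            = 1.4
; illustrative temperature and pressure coupling settings
tcoupl          = berendsen
tc_grps         = System
tau_t           = 0.1
ref_t           = 300
pcoupl          = berendsen
tau_p           = 1.0
compressibility = 4.5e-5
ref_p           = 1.0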

Consolation:
If Bernhard can run his job with gromacs 3.3 and not with 4.0, perhaps  
I'm not so stupid ...

I follow this thread with interest!
Antoine

-- 
Antoine Fortuné
Molecular Modelling Engineer
DPM - UMR5063 UJF/CNRS (http://dpm.ujf-grenoble.fr)
Pole Chimie bat. E - BP53 - 38042 GRENOBLE CEDEX 9
Tel : 33+ 0 476635292 - Fax : 33+ 0 476635298



Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-15 Thread patrick fuchs

Hi Berk,
I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
I recompiled it with CFLAGS=-g and it still hangs...
Now, how can we run it in the debugger?
Thanks,

Patrick
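One common way to do this with LAM/MPI, sketched under the assumption that the
nodes can open X windows and that the MPI-enabled binary is in the current
directory, is to start every rank inside its own xterm running gdb:

# launch each MPI rank under gdb in its own xterm; adjust -np and the binary path
mpirun -np 4 xterm -e gdb ./mdrun_mpi
# in every xterm type: run
# when the job hangs, hit Ctrl-C and type: where   (shows that rank's call stack)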

Berk Hess wrote:

[snip]

Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-12 Thread Bernhard Knapp

Mark wrote:

What's happening in the log files? What's the latest information in the 
checkpoint files? Could there be some issue with file system availability?


Hi Mark

Unfortunately I already deleted the simulation files which got stuck 
after 847ps. But here is the output of another simulation done on the 
same system but with another pdb file. This one gets stuck after 179ps 
with the following output:


The latest thing the checkpoint file says is:

imb F  3% step 89700, will finish Wed Jul  1 09:11:00 2009
imb F  3% step 89800, will finish Wed Jul  1 09:02:51 2009

The prediction for the 1st of July is not surprising since I always 
parameterize the simulation for 200ns to avoid having to restart it if 
something interesting happens in the last frames.


for the .log file it is:

Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008

  Energies (kJ/mol)
  G96AngleProper Dih.  Improper Dih.  LJ-14 Coulomb-14
   7.83753e+033.64068e+032.45951e+031.29167e+035.13688e+04
   LJ (SR)   Coulomb (SR)   Coul. recip.  PotentialKinetic En.
   3.82346e+05   -2.48883e+06   -3.51313e+05   -2.39119e+064.57648e+05
  Total EnergyTemperature Pressure (bar)  Cons. rmsd ()
  -1.93355e+063.10014e+021.09267e-012.14030e-05

DD  step 88999 load imb.: force  3.1%

  Step   Time Lambda
 89000  178.20.0

  Energies (kJ/mol)
  G96AngleProper Dih.  Improper Dih.  LJ-14 Coulomb-14
   8.03089e+033.59681e+032.42628e+031.20942e+035.12341e+04
   LJ (SR)   Coulomb (SR)   Coul. recip.  PotentialKinetic En.
   3.81539e+05   -2.48602e+06   -3.51307e+05   -2.38929e+064.56901e+05
  Total EnergyTemperature Pressure (bar)  Cons. rmsd ()
  -1.93239e+063.09508e+021.64627e+012.08518e-05


The disk also has plenty of free space; df -h says 2.3G out of 666G used.

The only difference between the system with gromacs 3.3 and gromacs 4 is 
that gromacs 4 is running under suse 11 while gromacs 3.3 is running on 
a node with suse 10. But I don't think this can be the problem?


cheers
Bernhard




Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?

2008-12-12 Thread Mark Abraham

Bernhard Knapp wrote:

Mark wrote:

What's happening in the log files? What's the latest information in 
the checkpoint files? Could there be some issue with file system 
availability?


Hi Mark

Unfortunately I already deleted the simulation files which got stuck 
after 847ps. But here is the output of another simulation done on the 
same system but with another pdb file. This one gets stuck after 179ps 
with the following output:


The latest thing the checkpoint file says is:

imb F  3% step 89700, will finish Wed Jul  1 09:11:00 2009
imb F  3% step 89800, will finish Wed Jul  1 09:02:51 2009


OK so there's no indication from GROMACS that it's experiencing trauma 
within the simulation. So now you need to try to restart close to the 
point you saw problems to see if the problem is reproducible. If it's 
not reproducible, then my guess, as last time, is that an NFS share is 
becoming unavailable, or some such. There are lots of other 
possibilities - a bug in GROMACS seems unlikely at this stage.


Mark