Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3)
Hi Carsten,

Thanks for your suggestion! But my simulation will run for about 200 ns at roughly 10 ns per day (24 hours is the maximum duration of a single job on the cluster I am using), so -noappend would leave me with about 20 separate trajectory files. Can anyone find the reason for this error?

regards,
Baofu Qiao

On 11/26/2010 09:07 AM, Carsten Kutzner wrote:
Hi, as a workaround you could run with -noappend and later concatenate the output files. Then you should have no problems with locking.
Carsten

On Nov 25, 2010, at 9:43 PM, Baofu Qiao wrote:
Hi all, I just recompiled GMX 4.0.7. This error does not occur there, but 4.0.7 is about 30% slower than 4.5.3, so I would really appreciate it if anyone can help me with this.
best regards, Baofu Qiao

On 2010-11-25 20:17, Baofu Qiao wrote:
Hi all, I got the error message below when extending the simulation with the following command:

mpiexec -np 64 mdrun -deffnm pre -npme 32 -maxh 2 -table table -cpi pre.cpt -append

The previous simulation finished successfully. I wonder why pre.log is locked, and what the strange warning "Function not implemented" means. Any suggestion is appreciated!

Getting Loaded...
Reading file pre.tpr, VERSION 4.5.3 (single precision)
Reading checkpoint file pre.cpt generated: Thu Nov 25 19:43:25 2010

-------------------------------------------------------
Program mdrun, VERSION 4.5.3
Source code file: checkpoint.c, line: 1750

Fatal error:
Failed to lock: pre.log. Function not implemented.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

It Doesn't Have to Be Tip Top (Pulp Fiction)

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 64

gcq#147: It Doesn't Have to Be Tip Top (Pulp Fiction)

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
mpiexec has exited due to process rank 0 with PID 32758 on
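For reference, the part files from Carsten's -noappend workaround can be merged after the fact; a minimal sketch, assuming the default part numbering produced by two restarts (the actual file names depend on how often the job was resubmitted):

    # merge the -noappend part files back into single output files
    trjcat  -f pre.xtc pre.part0002.xtc pre.part0003.xtc -o pre_full.xtc   # trajectory parts
    eneconv -f pre.edr pre.part0002.edr pre.part0003.edr -o pre_full.edr   # energy files
    cat pre.log pre.part0002.log pre.part0003.log > pre_full.log           # logs are plain text

With -append none of this bookkeeping is needed, which is why the locking failure below is still worth tracking down.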
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3): none of 4.5.1, 4.5.2, or 4.5.3 works
Hi all,

I just made some tests using GMX 4.5.1, 4.5.2 and 4.5.3. None of them works for the continuation:

---
Program mdrun, VERSION 4.5.1
Source code file: checkpoint.c, line: 1727
Fatal error: Failed to lock: pre.log. Already running simulation?

---
Program mdrun, VERSION 4.5.2
Source code file: checkpoint.c, line: 1748
Fatal error: Failed to lock: pre.log. Already running simulation?

---
Program mdrun, VERSION 4.5.3
Source code file: checkpoint.c, line: 1750
Fatal error: Failed to lock: pre.log. Function not implemented.

The test system is 895 SPC/E waters in a 3 nm box (genbox -box 3 -cs). The pre.mdp is attached.

I have tested two clusters:
Cluster A: 1) compiler gnu/4.3, 2) OpenMPI 1.2.8 (gnu-4.3), 3) FFTW 3.3.2, 4) GMX 4.5.1/4.5.2/4.5.3
Cluster B: 1) compiler gnu/4.3, 2) OpenMPI 1.4.2 (gnu-4.3), 3) FFTW 3.3.2, 4) GMX 4.5.3

GMX command:
mpiexec -np 8 mdrun -deffnm pre -npme 2 -maxh 0.15 -cpt 5 -cpi pre.cpt -append

Can anyone provide further help? Thanks a lot!

best regards,
Baofu Qiao
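For anyone who wants to reproduce the test, the setup above boils down to roughly the following sketch (water.gro and topol.top are placeholder names, and topol.top is assumed to contain the SPC/E water topology with the matching number of SOL molecules; the run settings are those in the attached pre.mdp):

    # build a ~3 nm SPC/E water box, run a short job, then try the appending restart
    genbox -box 3 -cs -o water.gro
    grompp -f pre.mdp -c water.gro -p topol.top -o pre.tpr
    mpiexec -np 8 mdrun -deffnm pre -npme 2 -maxh 0.15 -cpt 5
    # the continuation below is the step that fails with "Failed to lock: pre.log"
    mpiexec -np 8 mdrun -deffnm pre -npme 2 -maxh 0.15 -cpt 5 -cpi pre.cpt -append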
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3)
Baofu,

what operating system are you using, and on what file system do you try to store the log file? The error (should) mean that the file system you use doesn't support locking of files. Try to store the log file on some other file system; if you want, you can still store the (large) trajectory files on the same file system.

Roland
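To see quickly which file system the run directory actually lives on, standard Linux tools are enough; for example, from the directory that holds pre.log:

    df -T .            # device and file system type for the current directory
    stat -f -c %T .    # file system type name only (GNU coreutils)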
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3)
Hi Roland,

Thanks a lot! The OS is Scientific Linux 5.5, but the data are stored on a separate file system called WORKSPACE, which is different from the regular storage system. Maybe this is the reason. I'll try what you suggest!

regards,
Baofu Qiao
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3)
Hi Baofu,

could you provide more information about the file system? The command mount lists the file systems in use. If it is a network file system, then the operating system and file system used on the file server are also of interest.

Roland
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3)
Hi Roland,

The output of mount is:

/dev/mapper/grid01-root on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
172.30.100.254:/home on /home type nfs (rw,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.254)
172.30.100.210:/opt on /opt type nfs (rw,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.210)
172.30.100.210:/var/spool/torque/server_logs on /var/spool/pbs/server_logs type nfs (ro,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.210)
none on /ipathfs type ipathfs (rw)
172.31.100@o2ib,172.30.100@tcp:172.31.100@o2ib,172.30.100@tcp:/lprod on /lustre/ws1 type lustre (rw,noatime,nodiratime)
172.31.100@o2ib,172.30.100@tcp:172.31.100@o2ib,172.30.100@tcp:/lbm on /lustre/lbm type lustre (rw,noatime,nodiratime)
172.30.100.219:/export/necbm on /nfs/nec type nfs (ro,bg,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.219)
172.30.100.219:/export/necbm-home on /nfs/nec/home type nfs (rw,bg,tcp,nfsvers=3,actimeo=10,hard,rsize=65536,wsize=65536,timeo=600,addr=172.30.100.219)
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3)
To make things short: the file system used is Lustre.

/Flo
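Independent of mdrun, one can probe whether a directory supports locking at all; a rough sketch, with the path as an assumption (note that flock(1) uses the flock() system call while mdrun uses fcntl() locks, so treat the result as an indication rather than proof):

    cd /lustre/ws1/<your workspace>        # the directory where pre.log is written (assumption)
    touch locktest
    flock -n locktest -c 'echo locking works' || echo "locking not supported here"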
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3): SOLVED
Hi all,

What Roland said is right: the Lustre file system causes the locking problem. Now I have copied all the files to a folder under /tmp and run the continuation from there. It works! Thanks!

regards,
Baofu Qiao
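For completeness, the copy-to-/tmp workaround could look roughly like this inside the batch job. This is only a sketch: the scratch path, the list of input files (table.xvg is assumed from the -table option) and the Torque variables $PBS_JOBID and $PBS_O_WORKDIR are assumptions, and on a multi-node run the scratch directory must exist on the node where the output-writing rank runs.

    # stage the run through node-local /tmp to avoid the lock problem on Lustre
    SCRATCH=/tmp/$USER.$PBS_JOBID
    mkdir -p "$SCRATCH"
    cp pre.tpr pre.cpt table.xvg "$SCRATCH"/
    cd "$SCRATCH"
    mpiexec -np 64 mdrun -deffnm pre -npme 32 -maxh 2 -table table -cpi pre.cpt -append
    cp pre.* "$PBS_O_WORKDIR"/        # copy the results back when the job ends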
Re: [gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3): SOLVED
Hi,

we use Lustre too and it doesn't cause any problems. I found this message on the Lustre list:
http://lists.lustre.org/pipermail/lustre-discuss/2008-May/007366.html

According to your mount output, Lustre on your machine is mounted with neither the flock nor the localflock option. This seems to be the reason for the problem. Thus, if you would like to run the simulation directly on Lustre, you have to ask the sysadmin to mount it with flock or localflock (I don't recommend localflock; it doesn't guarantee correct locking).

If you would like to have an option to disable the locking, please file a bug report on Bugzilla. The reason we lock the log file is that we want to make sure only one simulation is appending to the same set of files; otherwise the files could get corrupted. This is why the locking is on by default and currently can't be disabled.

Roland
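For the admins, the relevant change is the client-side Lustre mount option described in the linked thread; schematically (placeholders only, the real MGS specification is the one shown in the mount output above, and whether this can be applied to a live mount depends on the Lustre version):

    # enable POSIX (fcntl/flock) locking on the Lustre client mount
    mount -t lustre -o flock,noatime,nodiratime <mgs_nids>:/lprod /lustre/ws1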
[gmx-users] Re: Failed to lock: pre.log (Gromacs 4.5.3)
Hi all,

I just recompiled GMX 4.0.7. This error does not occur there, but 4.0.7 is about 30% slower than 4.5.3, so I would really appreciate it if anyone can help me with this.

best regards,
Baofu Qiao

On 2010-11-25 20:17, Baofu Qiao wrote:
Hi all, I got the error message below when extending the simulation with the following command:

mpiexec -np 64 mdrun -deffnm pre -npme 32 -maxh 2 -table table -cpi pre.cpt -append

The previous simulation finished successfully. I wonder why pre.log is locked, and what the strange warning "Function not implemented" means. Any suggestion is appreciated!

Program mdrun, VERSION 4.5.3
Source code file: checkpoint.c, line: 1750
Fatal error: Failed to lock: pre.log. Function not implemented.