Re: [gmx-users] GPU-gromacs
On Oct 25, 2013, at 4:07 PM, aixintiankong aixintiank...@126.com wrote:
> Dear prof., I want to install gromacs on a multi-core workstation with a GPU (Tesla C2075). Should I install openmpi or mpich2?

If you want to run Gromacs on just one workstation with a single GPU, you do not need to install an MPI library at all!

Carsten

--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa
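For a setup like this, a minimal sketch of the build and launch (assuming GROMACS 4.6 with CUDA already installed; paths and file names are placeholders):

  cmake .. -DGMX_GPU=ON            # the built-in thread-MPI is enabled by default, so no OpenMPI/MPICH is required
  make && make install
  mdrun -s topol.tpr -deffnm md    # on one node, mdrun detects the single GPU and spreads work over all CPU cores itself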
Re: [gmx-users] GPU version of Gromacs
On 8/19/13 5:38 AM, grita wrote:
> Hey guys,
> Is it possible to run an SD simulation using the pull code in the GPU version of Gromacs?

Have you tried it?

-Justin

--
Justin A. Lemkul, Ph.D.
Postdoctoral Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 601
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201
jalem...@outerbanks.umaryland.edu | (410) 706-7441
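For what it is worth, the pull code runs on the CPU while only the non-bonded kernels are offloaded, so nothing obvious forbids combining it with the sd integrator on a GPU build. A minimal, untested sketch of the relevant mdp lines (group names are made up for illustration):

  integrator    = sd
  cutoff-scheme = Verlet      ; required for native GPU acceleration in 4.6
  pull          = umbrella
  pull_geometry = distance
  pull_ngroups  = 1
  pull_group0   = reference   ; hypothetical index group
  pull_group1   = pulled      ; hypothetical index group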
Re: [gmx-users] GPU metadynamics
On 08/15/2013 11:21 AM, Jacopo Sgrignani wrote:
> Dear Albert
> to run parallel jobs on multiple GPUs you should use something like this:
> mpirun -np (number of parallel MPI processes) mdrun_mpi ... -gpu_id
> so you will have 4 processes for the 4 GPUs.
> Jacopo

Thanks a lot for the reply, but there is some problem with the following command:

mpirun -np 4 mdrun_mpi -s md.tpr -v -g md.log -o md.trr -x md.xtc -plumed plumed2.dat -e md.edr -gpu_id 0123

---log---
4 GPUs detected on host node3:
  #0: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible
  #1: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible
  #2: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible
  #3: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible

---
Program mdrun_mpi, VERSION 4.6.3
Source code file: /home/albert/install/source/gromacs-4.6.3/src/gmxlib/gmx_detect_hardware.c, line: 349

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.
mdrun_mpi was started with 1 PP MPI process per node, but you provided 4 GPUs.

For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
---
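The error says that only one PP rank was started while four GPU ids were supplied, so the two counts have to match. A hedged sketch of two ways to fix it, assuming the node has enough cores (the -deffnm form simply abbreviates the explicit output flags above):

  # make sure four MPI ranks are really launched, one per GPU:
  mpirun -np 4 mdrun_mpi -s md.tpr -deffnm md -plumed plumed2.dat -gpu_id 0123

  # or, with a thread-MPI (non-MPI) mdrun binary on the same node:
  mdrun -ntmpi 4 -gpu_id 0123 -s md.tpr -deffnm md -plumed plumed2.dat

If the first form still reports a single PP rank per node, the mpirun on the PATH probably belongs to a different MPI library than the one mdrun_mpi was built against.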
Re: [gmx-users] GPU + surface
Hello. Have you removed periodicity? You may only be seeing traversal of water molecules among copies of the periodic system.

Lucio Montero
Ph.D. student, Instituto de Biotecnologia, UNAM, Mexico

On 08/08/13 07:39, Ondrej Kroutil wrote:
> Dear GMX users.
> I have run a simulation of ions and water near a quartz surface (ClayFF) using a GPU (GTX 580) and Gromacs (4.6.1, single precision, 64 bit, SSE4.1, fftw-3.3.3) and have observed strange behavior of the water and ions. It is an NVT simulation with frozen surface atoms (see the .mdp below) and a negative charge on the surface (deprotonated silanols); the system is overall neutral. I used the same mdp for the normal CPU simulation and the GPU simulation, and just added the -testverlet option for the GPU run.
> In the CPU simulation ions and water behaved as expected (see http://i1315.photobucket.com/albums/t587/Andrew_Twister/cpu-simul_zpscf784b46.png), but in the GPU simulation there was a visible flow of ions toward the image of the lower surface, and all water molecules were oriented with hydrogens facing downward and oxygens facing upward (see http://i1315.photobucket.com/albums/t587/Andrew_Twister/gpu-simul_zps2c160ea6.png). It looks as if an electric field were applied, but there is none.
> Do you think there is a problem in the initial setup of parameters in the mdp file? Or maybe a problem with the freeze groups? With no freezing the situation is better, but there is still a visible flow and pairing of like ions (see http://i1315.photobucket.com/albums/t587/Andrew_Twister/gpu-no_freeze_zps72ef3938.png). It looks like an electrostatics problem. Do you have any hints, please? And sorry if I missed a similar topic on the mailing list, but I couldn't find anything similar.
> Ondrej Kroutil
>
> integrator           = md
> dt                   = 0.001
> nsteps               = 10
> comm_mode            = linear
> nstcomm              = 1000
> nstxout              = 0
> nstxtcout            = 1000
> nstvout              = 0
> nstfout              = 0
> nstlog               = 1000
> xtc_precision        = 1
> nstlist              = 10
> ns_type              = grid
> rlist                = 1.2
> coulombtype          = PME
> rcoulomb             = 1.2
> rvdw                 = 1.2
> constraints          = hbonds
> constraint_algorithm = lincs
> lincs_iter           = 1
> fourierspacing       = 0.1
> pme_order            = 4
> ewald_rtol           = 1e-5
> ewald_geometry       = 3dc
> optimize_fft         = yes
> ; Nose-Hoover temperature coupling
> Tcoupl               = nose-hoover
> tau_t                = 1
> tc_grps              = system
> ref_t                = 298.15
> ; No pressure coupling
> ; Pcoupl             = Parrinello-Rahman
> pcoupltype           = semiisotropic
> tau_p                = 1.0
> compressibility      = 0 4.6e-5
> ref_p                = 0 1.0
> ; OTHER
> periodic_molecules   = no
> pbc                  = xyz
> ;energygrps          = SOL SOH
> freezegrps           = BULK
> freezedim            = Y Y Y
> gen_vel              = yes
> gen_temp             = 298.15
> gen_seed             = -1
RE: [gmx-users] GPU + surface
Hi,

The -testverlet option is only for testing (as the name implies). Please set the mdp option cutoff-scheme = Verlet. Also please update to 4.6.3, as this potential issue might already have been resolved. With the Verlet scheme the CPU and GPU should give the same result, correct or incorrect.

Could it be that your system is located partially above and partially below z=0? This will cause problems with ewald-geometry = 3dc. To use this option you need to ensure your whole system is in the same periodic image.

Cheers,
Berk
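For reference, a minimal sketch of the mdp change Berk asks for (option names as in the 4.6 manual; the cut-off values themselves stay whatever the force field prescribes):

  cutoff-scheme = Verlet    ; native GPU path in 4.6, replacing the -testverlet hack
  nstlist       = 10        ; mdrun may raise this automatically for GPU runs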
Re: Re: Re: [gmx-users] GPU-based workstation
I may be late with the reply, but here are my 2 cents.

If you need a single very fast machine (i.e. maximum single-simulation performance), you should get:
- either a very fast desktop CPU: an i7-3930, or for 2x more the 3970, which, BTW, I think is not worth it ($600-1000);
- or 1-2 fast Xeon E5s; depending on how many and which, these will be $1k-2k each.

For a single-CPU setup two Titans may be overkill, and (at least with the current code) you may get very little extra performance from using two instead of one GPU. With a dual-socket machine (and decently fast CPUs), if you have a large enough input system, two GPUs will work nicely.

However, if you care about total simulation throughput and you have multiple simulations to run, I'd suggest that you buy 2-3 machines with the components that give the best ns/day/$: something like an i7-4670 or 4770 with a GTX 680/770 (or 780).

--
Szilárd

On Thu, Jun 27, 2013 at 1:01 PM, James Starlight jmsstarli...@gmail.com wrote:
> Back to my question: I want to build a GPU-based workstation with two GeForce Titans. My current budget only allows a high-end 6-core i7-3930 and a motherboard with 5 PCI-E slots (like the Asus Rampage IV series). Would this system be balanced with two GPUs? Should I use two 6-8 core Xeons instead of the i7?
> James
Re: [gmx-users] gpu cluster explanation
Hi Richard,

Thank you for the help, and sorry for the delay in my reply. I tried some test runs changing some parameters (e.g. removing PME) and was able to reach 20 ns/day, so I think that 9-11 ns/day is about the maximum I can obtain for my setup.

Thank you again for your help.

Cheers,
Fra
Re: [gmx-users] gpu cluster explanation
On 12/07/13 13:26, Francesco wrote:
> Hi all,
> I'm working with a 200K-atom system (protein + explicit water), and after a while using a CPU cluster I had to switch to a GPU cluster. I read both the "Acceleration and parallelization" and the Gromacs-GPU documentation pages (http://www.gromacs.org/Documentation/Acceleration_and_parallelization and http://www.gromacs.org/Documentation/Installation_Instructions_4.5/GROMACS-OpenMM), but it's a bit confusing and I need help to check whether I have really understood correctly. :)
> I have 2 types of nodes:
> - 3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon E5649 @ 2.53 GHz)
> - 8 GPUs and 2 CPUs (6 cores each)
> 1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can have at most 3 MPI ranks.
> 2) Because I have 12 cores I can open 4 OpenMP threads per MPI rank, because 4x3 = 12.
> Now, if I have a node with 8 GPUs, I can use 4 GPUs: 4 MPI ranks and 3 OpenMP threads. Is that right? Is it possible to use 8 GPUs and only 8 cores?

You could set -ntomp 0 and set up MPI/thread-MPI to use 8 cores. However, a system that unbalanced (a huge amount of GPU power to comparatively little CPU power) is unlikely to get great performance.

> Using gromacs 4.6.2 and 144 CPU cores I reach 35 ns/day, while with 3 GPUs and 12 cores I get 9-11 ns/day.

That slowdown is in line with what I got when I tried a similar CPU-GPU setup. That said, others might have some advice that will improve your performance.

> The command that I use is:
>   mdrun -dlb yes -s input_50.tpr -deffnm 306s_50 -v
> with the number of GPUs set via the queueing script: #BSUB -n 3
> I also tried to set -npme / -nt / -ntmpi / -ntomp, but nothing changes.
> The mdp file and some statistics follow:
>
> START MDP
> title       = G6PD wt molecular dynamics (2bhl.pdb) - NPT MD
> ; Run parameters
> integrator  = md        ; Algorithm options
> nsteps      = 2500      ; maximum number of steps to perform [50 ns]
> dt          = 0.002     ; 2 fs = 0.002 ps
> ; Output control
> nstxout     = 1         ; [steps] freq to write coordinates to trajectory; the last coordinates are always written
> nstvout     = 1         ; [steps] freq to write velocities to trajectory; the last velocities are always written
> nstlog      = 1         ; [steps] freq to write energies to log file; the last energies are always written
> nstenergy   = 1         ; [steps] write energies to disk every nstenergy steps
> nstxtcout   = 1         ; [steps] freq to write coordinates to xtc trajectory
> xtc_precision = 1000    ; precision to write to xtc trajectory (1000 = default)
> xtc_grps    = system    ; which coordinate group(s) to write to disk
> energygrps  = system    ; which energy group(s) to write
> ; Bond parameters
> continuation = yes      ; restarting from npt
> constraints  = all-bonds ; bond types to replace by constraints
> constraint_algorithm = lincs ; holonomic constraints
> lincs_iter   = 1        ; accuracy of LINCS
> lincs_order  = 4        ; also related to accuracy
> lincs_warnangle = 30    ; [degrees] maximum angle that a bond can rotate before LINCS will complain

That seems a little loose for constraints, but setting that up and checking that it conserves energy and preserves bond lengths is something you'll have to do yourself.

Richard

> ; Neighborsearching
> ns_type       = grid    ; method of updating neighbor list
> cutoff-scheme = Verlet
> nstlist       = 10      ; [steps] frequency to update neighbor list (10)
> rlist         = 1.0     ; [nm] cut-off distance for the short-range neighbor list (1 default)
> rcoulomb      = 1.0     ; [nm] long range electrostatic cut-off
> rvdw          = 1.0     ; [nm] long range Van der Waals cut-off
> ; Electrostatics
> coulombtype   = PME     ; treatment of long range electrostatic interactions
> vdwtype       = cut-off ; treatment of Van der Waals interactions
> ; Periodic boundary conditions
> pbc           = xyz
> ; Dispersion correction
> DispCorr      = EnerPres ; applying long range dispersion corrections
> ; Ewald
> fourierspacing = 0.12   ; grid spacing for FFT - controls the highest magnitude of wave vectors (0.12)
> pme_order     = 4       ; interpolation order for PME, 4 = cubic
> ewald_rtol    = 1e-5    ; relative strength of Ewald-shifted potential at rcoulomb
> ; Temperature coupling
> tcoupl        = nose-hoover ; temperature coupling with Nose-Hoover ensemble
> tc_grps       = Protein Non-Protein
> tau_t         = 0.4 0.4 ; [ps] time constant
> ref_t         = 310 310 ; [K] reference temperature for coupling
> ; Pressure coupling
> pcoupl        = parrinello-rahman
> pcoupltype    =
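To make the rank/thread mapping explicit instead of relying on defaults, a hedged sketch for the 3-GPU/12-core nodes described above (this assumes the thread-MPI build; with an external-MPI build the three ranks would come from mpirun -np 3 and -ntomp stays the same):

  mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v
  # 3 PP ranks, one per Tesla M2090; 3 ranks x 4 OpenMP threads = all 12 cores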
Re: Re: Re: [gmx-users] GPU-based workstation
Back to my question: I want to build a GPU-based workstation with two GeForce Titans. My current budget only allows a high-end 6-core i7-3930 and a motherboard with 5 PCI-E slots (like the Asus Rampage IV series). Would this system be balanced with two GPUs? Should I use two 6-8 core Xeons instead of the i7?

James
Re: [gmx-users] GPU / CPU load imbalance
On 6/25/13 6:33 PM, Dwey wrote:
> Hi gmx-users,
> I used an 8-core AMD CPU with a GTX 680 GPU (1536 CUDA cores) to run an example of umbrella sampling provided by Justin. I am happy that GPU acceleration indeed reduces the computation time significantly in this example (from 34 hours to 7 hours). However, I found a NOTE on the screen like:
>
>   The GPU has 20% more load than the CPU. This imbalance causes
>   performance loss, consider using a shorter cut-off and a finer PME grid
>
> Given a 20% load imbalance, I wonder if someone can give suggestions as to how to avoid performance loss, in terms of hardware (GPU/CPU) improvement or modification of the mdp file (see below).

I would avoid tweaking the .mdp settings. There have been several reports where people hacked at nonbonded cutoffs to get better performance, and it resulted in totally useless output. These settings are part of the force field. Avoid changing them.

> In terms of hardware, does this NOTE suggest that I should use a higher-capacity GPU like a GTX 780 (2304 CUDA cores) to balance the load or catch up in speed? If so, would it help to add another GTX 680 card in the same box? Or will that cause a GPU/CPU load imbalance again, with two GPUs waiting for the 8-core CPU?

There has been a lot of discussion on hardware, GPU/CPU balancing, etc. in recent days. Please check the archive. Some of the threads are quite detailed.

-Justin

> Second:
>
>   Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
>   For optimal performance this ratio should be close to 1
>
> I have no idea how this is evaluated, with 4.006 ms and 2.578 ms for GPU and CPU time, respectively. It would be very helpful to modify the attached mdp for a better load balance between GPU and CPU. I appreciate kind advice and hints to improve this mdp file.
> Thanks,
> Dwey
>
> ### courtesy of Justin ###
> title       = Umbrella pulling simulation
> define      = -DPOSRES_B
> ; Run parameters
> integrator  = md
> dt          = 0.002
> tinit       = 0
> nsteps      = 500       ; 10 ns
> nstcomm     = 10
> ; Output parameters
> nstxout     = 5         ; every 100 ps
> nstvout     = 5
> nstfout     = 5000
> nstxtcout   = 5000      ; every 10 ps
> nstenergy   = 5000
> ; Bond parameters
> constraint_algorithm = lincs
> constraints = all-bonds
> continuation = yes
> ; Single-range cutoff scheme
> nstlist     = 5
> ns_type     = grid
> rlist       = 1.4
> rcoulomb    = 1.4
> rvdw        = 1.4
> ; PME electrostatics parameters
> coulombtype = PME
> fourierspacing = 0.12
> fourier_nx  = 0
> fourier_ny  = 0
> fourier_nz  = 0
> pme_order   = 4
> ewald_rtol  = 1e-5
> optimize_fft = yes
> ; Temperature coupling is on in two groups
> Tcoupl      = Nose-Hoover
> tc_grps     = Protein Non-Protein
> tau_t       = 0.5 0.5
> ref_t       = 310 310
> ; Pressure coupling is on
> Pcoupl      = Parrinello-Rahman
> pcoupltype  = isotropic
> tau_p       = 1.0
> compressibility = 4.5e-5
> ref_p       = 1.0
> refcoord_scaling = com
> ; Generate velocities is off
> gen_vel     = no
> ; Periodic boundary conditions are on in all directions
> pbc         = xyz
> ; Long-range dispersion correction
> DispCorr    = EnerPres
> cutoff-scheme = Verlet
> ; Pull code
> pull        = umbrella
> pull_geometry = distance
> pull_dim    = N N Y
> pull_start  = yes
> pull_ngroups = 1
> pull_group0 = Chain_B
> pull_group1 = Chain_A
> pull_init1  = 0
> pull_rate1  = 0.0
> pull_k1     = 1000      ; kJ mol^-1 nm^-2
> pull_nstxout = 1000     ; every 2 ps
> pull_nstfout = 1000     ; every 2 ps

--
Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
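One knob that does not touch the force-field settings is mdrun's own PME tuning, which scales rcoulomb and the PME grid together at constant accuracy to shift short-range work between GPU and CPU. It is on by default in 4.6; a hedged sketch (file name hypothetical):

  mdrun -s umbrella.tpr -deffnm umbrella -gpu_id 0 -tunepme   # -notunepme would disable the automatic balancing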
Re: [gmx-users] GPU ECC question
On Sat, Jun 8, 2013 at 9:21 PM, Albert mailmd2...@gmail.com wrote:
> Hello:
> Recently I found a strange issue with Gromacs-4.6.2 on a GPU workstation. On my GTX 690 machine, when I run an MD production I found that ECC is on. However, on another GTX 590 machine, ECC was off:
>
> 4 GPUs detected:
>   #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>   #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>   #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>   #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>
> Moreover, there are only two GTX 590s in the machine; I don't know why Gromacs claimed 4 GPUs detected. However, on another Linux machine which also has two GTX 590s, Gromacs-4.6.2 only finds 2 GPUs, and ECC is still off. I am just wondering:
>
> (1) Why can ECC be on with the GTX 690 while it is off on my GTX 590? I compiled Gromacs with the same options and the same version of the Intel compiler.

Unless your 690 is in fact a Tesla K10, it surely does not support ECC! Note that ECC is not something I personally think you really need.

> (2) Why, on two machines each physically containing two GTX 590 cards, is one detected with 4 GPUs while the other is claimed to contain only two?

Both the GTX 590 and the 690 are dual-chip boards, which means two independent processing units with their own memory are mounted on the same card and connected by a PCI switch (NVIDIA NF200). Hence, the two GPUs on these dual-chip boards will be enumerated as separate devices. You can double-check this in nvidia-smi, which should show the same devices as what mdrun reports. I suspect that the machine which is shown to have only two GPUs suffers from some hardware or software issue.

Regards,
Szilard

> Thank you very much.
> Best,
> Albert
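To see the same enumeration outside GROMACS, nvidia-smi can be queried (a sketch; output details vary with driver version):

  nvidia-smi -L           # each chip of a dual-GPU GTX 590/690 board appears as its own device
  nvidia-smi -q -d ECC    # ECC state; consumer GeForce boards report it as unsupported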
Re: [gmx-users] GPU problem
Hi Albert,

I think using the -nt flag (e.g. -nt 16) with mdrun would solve your problem.

Chandan

--
Chandan Kumar Choudhury
NCL, Pune, INDIA

On Tue, Jun 4, 2013 at 12:56 PM, Albert mailmd2...@gmail.com wrote:
> Dear:
> I've got four GPUs in one workstation. I am trying to run two GPU jobs with the commands:
>
>   mdrun -s md.tpr -gpu_id 01
>   mdrun -s md.tpr -gpu_id 23
>
> There are 32 CPU cores in this workstation. I found that each job tries to use the whole CPU, and there were 64 threads when these two GPU mdrun jobs were submitted. Moreover, one of the jobs stopped after running for a short time, probably because of the CPU issue.
> I am just wondering, how can we distribute the CPU cores when we run two GPU jobs on a single workstation?
> Thank you very much.
> Best,
> Albert
Re: [gmx-users] GPU problem
On 06/04/2013 11:22 AM, Chandan Choudhury wrote:
> Hi Albert,
> I think using the -nt flag (e.g. -nt 16) with mdrun would solve your problem.
> Chandan

Thank you so much. It works well now.

ALBERT
Re: [gmx-users] GPU problem
-nt is mostly a backward-compatibility option and sets the total number of threads (per rank). Instead, you should set both -ntmpi (or -np with MPI) and -ntomp.

However, note that unless a single mdrun uses *all* cores/hardware threads on a node, it won't pin the threads to cores. Failing to pin threads will lead to considerable performance degradation; I just tried, and depending on how (un)lucky the thread placement and migration is, I get a 1.5-2x performance degradation when running two mdruns on a single dual-socket node without pinning threads.

My advice is (yet again) that you should check the http://www.gromacs.org/Documentation/Acceleration_and_parallelization wiki page, in particular the section on how to run simulations. If things are not clear, please ask for clarification - input and constructive criticism should help us improve the wiki. We have been patiently pointing everyone to the wiki, so asking without reading up first is neither productive nor really fair.

Cheers,
--
Szilárd
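As a concrete, hedged illustration of the above for Albert's case (two jobs on a 32-core, 4-GPU box; the tpr names are placeholders, and the offsets assume 32 real cores without hyperthreading):

  mdrun -ntmpi 2 -ntomp 8 -gpu_id 01 -pin on -pinoffset 0  -s md1.tpr -deffnm md1 &
  mdrun -ntmpi 2 -ntomp 8 -gpu_id 23 -pin on -pinoffset 16 -s md2.tpr -deffnm md2 &
  # each job uses 2 PP ranks x 8 OpenMP threads = 16 cores; -pinoffset keeps the two jobs on disjoint cores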
RE: [gmx-users] GPU problem
Dear All,

A stupid question: does anyone know of a script to convert a 53a6 force-field .top that only references the gromacs/top directory into something like a ligand .itp with explicit parameters? This would be useful at the moment. Example:

  [ bonds ]
  6 7 2 gb_5

to

  [ bonds ]
  ; ai aj fu c0, c1, ...
  6 7 2 0.139 1080.0 0.139 1080.0 ; C CH

for everything (a protein/DNA complex), inclusive of angles and dihedrals? I've been playing with some of the gromacs user-supplied files, but nothing yet.

Stephan Watkins
Re: [gmx-users] GPU problem
On 6/4/13 3:52 PM, lloyd riggs wrote:
> Dear All,
> A stupid question: does anyone know of a script to convert a 53a6 force-field .top that only references the gromacs/top directory into something like a ligand .itp with explicit parameters (e.g. gb_5 expanded to its bond length and force constant), for everything in a protein/DNA complex, inclusive of angles and dihedrals?

Sounds like something grompp -pp should take care of.

-Justin

--
Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
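For concreteness, a hedged sketch of what that looks like (file names are placeholders): grompp -pp writes out the fully pre-processed topology, with all #includes pulled in and the gb_*/ga_* macros expanded to numeric parameters, from which the ligand's [ moleculetype ] section can be copied into an .itp:

  grompp -f md.mdp -c conf.gro -p topol.top -pp processed.top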
Aw: Re: [gmx-users] GPU problem
Thanks, that's exactly what I was looking for.

Stephan

Sent: Tuesday, 4 June 2013, 22:28
From: Justin Lemkul jalem...@vt.edu
To: Discussion list for GROMACS users gmx-users@gromacs.org
Subject: Re: [gmx-users] GPU problem
> Sounds like something grompp -pp should take care of.
> -Justin
Re: Re: [gmx-users] GPU-based workstation
Dear all,

As far as I understand, the OP is interested in hardware for *running* GROMACS 4.6 rather than developing code or running LINPACK.

To get the best performance it is important to use a machine with hardware balanced for GROMACS' workloads. Too little GPU resources will result in CPU idling; too much GPU resources will lead to the runs being CPU or multi-GPU-scaling bound, and above a certain level GROMACS won't be able to make use of additional GPUs. Of course, the balance will depend both on the hardware and on the simulation settings (mostly the LJ cut-off used).

An additional factor to consider is typical system size. To reach near-peak pair-force throughput on GPUs you typically need 20k-40k particles/GPU (depending on the architecture), and throughput drops below these values. Hence, in most cases it is preferable to use fewer and faster GPUs rather than more.

Without knowing the budget and intended use of the machine it is hard to make suggestions, but I would say that for a budget desktop box a quad-core Intel Ivy Bridge or the top-end AMD Piledriver CPU with a fast Kepler GTX card (e.g. GTX 680 or GTX 770/780) should work well. If you're considering dual-socket workstations, I suggest you go with the higher core-count and higher-frequency Intel CPUs (6+ cores, 2.2+ GHz), otherwise you may not see as much benefit as you would expect based on the insane price tag (especially if you compare to an i7 3930K or its IVB successor).

Cheers,
--
Szilárd

On Sat, May 25, 2013 at 1:02 PM, lloyd riggs lloyd.ri...@gmx.ch wrote:
> The more RAM the better, and the best I have seen is a 4-GPU workstation. I can use/have used 4. A GPU takes 2 slots though, so a 7-8 slot PCIe board really holds 3-4 GPUs, except the Tyan mentioned (those are designed as blades, so an 8- or 10-slot board really holds 8 or 10 GPUs). There are cooling problems with GPUs, as they are packed together on the board, so extra cooling may help not to blow a GPU, but I would look for good ones (ask around), as it's a video-game market and they go for looks even though everything sits inside a case.
> The external RAM (not onboard GPU RAM) helps if you do a larger sim, but I don't know about performance; for the onboard GPU RAM, the more the merrier. So yes, on normal workstations you can get 4 GPUs on a 300 US$ board, but then the price goes way up (3-4000 US$ for an 8-10 GPU board). RAM ordered abroad is also cheap, 8 or 16 GB, compared to shop prices. I have used 4 GPUs, but only with test software, not Gromacs, so it would be nice to see performance. For a small 100-atom molecule and 500 solvent, using just the CPU I get a 1 ns sim to run in 5-10 minutes real time, but simple large runs (800 amino acids, 25,000 solvent) at NVT or NPT equilibration clock in at around 1 hour real time for, say, 50 ps.
> Stephan
>
> Sent: Saturday, 25 May 2013, 07:54
> From: James Starlight jmsstarli...@gmail.com
> To: Discussion list for GROMACS users gmx-users@gromacs.org
> Subject: Re: [gmx-users] GPU-based workstation
>
> Dear Dr. Watkins!
> Thank you for the suggestions!
> In the local shops I've found only Core i7 with 6 cores (like the Core i7-39xx) and with 4 cores. Should I obtain much better performance with 6 cores than with 4 cores in the case of an i7 CPU (assuming that I run the simulation in cpu+gpu mode)?
> Also, you've mentioned motherboards with 4 PCIe slots. Does this mean that a modern workstation could have 4 GPUs in one home-like desktop? According to my current task I suppose that 2 GPUs would be suitable for my simulations (assuming that I use a typical ASUS MB and a 650 W power unit). Has someone tried to use several GPUs in one workstation? What attributes of the MB should be taken into account for best performance on such a multi-GPU station?
> James
>
> 2013/5/25 lloyd riggs lloyd.ri...@gmx.ch
> There's also these, but 1 chip runs 6K US$; they can get performance up to 2.3 teraflops per chip, though double precision, but I have no clue about integration with GPUs. Intel also sells their chips on PCIe cards, but those get only about 350 Gflops and run 1K US$.
> http://en.wikipedia.org/wiki/Field-programmable_gate_array and vendor http://www.xilinx.com/
> They can design them to fit a PCIe slot and run about the same, but you still need the board, RAM, etc. Mostly just to dream about; they say you can order them with radiation shielding as well... so...
> Stephan Watkins
>
> Sent: Friday, 24 May 2013, 13:17
> From: James Starlight jmsstarli...@gmail.com
> To: Discussion list for GROMACS users gmx-users@gromacs.org
> Subject: [gmx-users] GPU-based workstation
>
> Dear Gromacs Users!
> I'd like to build a new workstation for performing simulations on GPU with Gromacs 4.6 native CUDA support. Recently I've used such a setup with a Core i5 CPU and an nvidia GTX 670 video card and obtained good performance (~20 ns/day for a typical 60,000-atom system with the SD integrator).
> Now I'd like to build a multi-GPU workstation. My question: how many GPUs would give me the best performance on a typical home-like workstation. What
Re: Aw: Re: [gmx-users] GPU-based workstation
On Sat, May 25, 2013 at 2:16 PM, Broadbent, Richard richard.broadben...@imperial.ac.uk wrote:
> I've been running on my university's GPU nodes; these have one Xeon E5 (6 cores, 12 threads) and 4 NVIDIA GTX 690s. My system is 93,000 atoms of DMF under NVE. The performance has been a little disappointing,

That sounds like a very imbalanced system for GROMACS: you have essentially 8 GPUs with rather poor PCI-E performance (the two chips on a board share a single PCI-E bus) and only 12 CPU cores to drive the simulation.

> ~10 ns/day. On my home system using a Core i5-2500 and an nvidia 560 Ti I get 5.4 ns/day for the same system. On our HPC system, using 32 nodes each with 2 quad-core Xeon processors, I get 30-40 ns/day.

That sounds somewhat low if these are all moderately fast CPUs and GPUs.

> I think that to achieve reasonable performance the system has to be balanced between CPUs and GPUs; probably getting 2 high-end GPUs and a top-end Xeon E5 or Core i7 would be a good choice.
> Richard

Indeed. Even two GPUs may be too much, unless the CPU in question is a very high-end i7 or E5.

Cheers,
--
Szilárd
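On such an unbalanced node, one pragmatic option is simply not to use all eight GPU chips. A hedged sketch with the 4.6 thread-MPI build, driving two chips with all twelve cores (whether ids 0 and 2 really sit on different boards should first be checked with nvidia-smi, since enumeration order is not guaranteed; the tpr name is a placeholder):

  mdrun -ntmpi 2 -ntomp 6 -gpu_id 02 -s dmf.tpr -deffnm dmf
  # 2 PP ranks x 6 OpenMP threads = 12 cores, one rank per selected GPU chip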
Aw: Re: Re: [gmx-users] GPU-based workstation
Dear Dr. Páll,

Thank you,

Stephan Watkins

Sent: Tuesday, 28 May 2013, 19:50
From: Szilárd Páll szilard.p...@cbr.su.se
To: Discussion list for GROMACS users gmx-users@gromacs.org
Subject: Re: Re: [gmx-users] GPU-based workstation
Re: Re: Re: [gmx-users] GPU-based workstation
Dear Dr. Pall! Thank you for your suggestions! Assuming that I have a budget of 5000 $ and want to build a GPU-based desktop with that money. Previously I've used a single 4-core i5 with a GTX 670 and obtained on average 10 ns/day for 70k-atom systems (1.0 nm cutoffs, no virtual sites, sd integrator). Now I'd like to build a system based on 2 high-end GeForces (e.g. like the TITAN). Should that system include 2 CPUs for good balancing? (e.g. two 6-core Xeons with faster clocks could, for instance, be better for simulations than an i7, couldn't they?) What additional properties of the MB should I consider for such a system? James

2013/5/28 lloyd riggs lloyd.ri...@gmx.ch Dear Dr. Pali, Thank you, Stephan Watkins *Gesendet:* Dienstag, 28. Mai 2013 um 19:50 Uhr *Von:* Szilárd Páll szilard.p...@cbr.su.se *An:* Discussion list for GROMACS users gmx-users@gromacs.org *Betreff:* Re: Re: [gmx-users] GPU-based workstation

Dear all, As far as I understand, the OP is interested in hardware for *running* GROMACS 4.6 rather than developing code or running LINPACK. To get the best performance it is important to use a machine with hardware balanced for GROMACS' workloads. Too little GPU capacity will result in CPU idling; too much GPU capacity will lead to the runs being CPU or multi-GPU scaling bound, and above a certain level GROMACS won't be able to make use of additional GPUs. Of course, the balance will depend both on hardware and simulation settings (mostly the LJ cut-off used). An additional factor to consider is typical system size. To reach near-peak pair-force throughput on GPUs you typically need 20k-40k particles/GPU (depending on the architecture), and throughput drops below these values. Hence, in most cases it is preferable to use fewer and faster GPUs rather than more. Without knowing the budget and intended use of the machine it is hard to make suggestions, but I would say for a budget desktop box a quad-core Intel Ivy Bridge or the top-end AMD Piledriver CPU with a fast Kepler GTX card (e.g. GTX 680 or GTX 770/780) should work well. If you're considering dual-socket workstations, I suggest you go with the higher core-count and higher frequency Intel CPUs (6+ cores, 2.2 GHz), otherwise you may not see as much benefit as you would expect based on the insane price tag (especially if you compare to an i7 3930K or its IVB successor). Cheers, -- Szilárd
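For a concrete picture of how such a two-GPU, dual-socket box would typically be driven with the GROMACS 4.6 thread-MPI build, one PP rank per GPU, a minimal sketch (the core count and file names are placeholders, not taken from this thread):

    mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -s topol.tpr -deffnm md

Each of the two thread-MPI ranks gets one GPU (ids 0 and 1) plus six OpenMP threads, which is roughly the CPU/GPU balance discussed above.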
Re: Re: Re: [gmx-users] GPU-based workstation
On Nvidia benchmark pages I've found suggestions to use two 6-core CPUs for systems with 2 GPUs. Assuming that I'll be using two GTX 680 cards with a 256-bit bus and 4 GB RAM (not professional NVIDIA cards like Tesla), which CPUs would give me the best performance: one i7 with 8 cores, or two Xeon E5s with 6 cores each? Is it meaningful to use 2 separate CPUs, each with several cores, for the 2 GPUs? James

2013/5/26 lloyd riggs lloyd.ri...@gmx.ch You can also look at profiling on various web sites; the high-end Nvidia cards run only slightly better than the 2-year-old ones, so from an individual's point of view not worth the money yet, but if you have the money? as I've been browsing. Also, the sim I did on the cluster was 180-190,000 atoms, so exactly the same performance the other person had. Stephan
Re: [gmx-users] GPU-based workstation
Dear Dr. Watkins! Thank you for the suggestions! In the local shops I've found only Core i7 CPUs with 6 cores (like the Core i7-39xx) and with 4 cores. Should I get much better performance with 6 cores than with 4 cores in the case of an i7 CPU (assuming that I run the simulation in CPU+GPU mode)? Also, you've mentioned a 4-PCIe MB. Does it mean that a modern workstation could have 4 GPUs in one home-like desktop? According to my current task I suppose that 2 GPUs would be suitable for my simulations (assuming that I use a typical ASUS MB and a 650 W power unit). Has anyone tried to use several GPUs in one workstation? What attributes of the MB should be taken into account for best performance on such a multi-GPU station? James

2013/5/25 lloyd riggs lloyd.ri...@gmx.ch There's also these, but 1 chip runs 6K US$; they can get performance up to 2.3 teraflops per chip, though double precision...but I have no clue about integration with GPUs...Intel also sells their chips on PCIe cards...but those get only about 350 Gflops and run 1K US$. http://en.wikipedia.org/wiki/Field-programmable_gate_array and vendor http://www.xilinx.com/ They can design them to fit a PCIe slot and run about the same, but you still need the board, RAM etc... Mostly just to dream about; they say you can order them with radiation shielding as well...so... Stephan Watkins

*Gesendet:* Freitag, 24. Mai 2013 um 13:17 Uhr *Von:* James Starlight jmsstarli...@gmail.com *An:* Discussion list for GROMACS users gmx-users@gromacs.org *Betreff:* [gmx-users] GPU-based workstation Dear Gromacs Users! I'd like to build a new workstation for performing simulations on GPU with the GROMACS 4.6 native CUDA support. Recently I've used such a setup with a Core i5 CPU and an NVIDIA GTX 670 card and obtained good performance (~20 ns/day for a typical 60,000-atom system with the SD integrator). Now I'd like to build a multi-GPU workstation. My question: how many GPUs would give me the best performance on a typical home-like workstation, and what kind of NVIDIA multi-GPU setup should I use (e.g. SLI etc.)? Thanks for help, James
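As an aside on the SLI question: GROMACS does not use SLI at all; each GPU is addressed individually and assigned to a PP rank through mdrun's -gpu_id string. A minimal sketch with the 4.6 thread-MPI build, mapping four ranks onto two GPUs (file names are placeholders):

    mdrun -ntmpi 4 -gpu_id 0011 -s topol.tpr -deffnm md

Here ranks 0 and 1 share GPU 0 and ranks 2 and 3 share GPU 1; with one rank per GPU the string would simply be 01.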
Aw: Re: [gmx-users] GPU-based workstation
The more RAM the better, and the best I have seen is a 4-GPU workstation. I can use/have used 4. The GPU takes 2 slots though, so a 7-8 slot PCIe board is really 3-4 GPUs, except the Tyan mentioned (those are designed as blades, so an 8- or 10-slot board really holds 8 or 10 GPUs). There are cooling problems though with GPUs, as on a board they're packed, so extra cooling may help not blow a GPU, but I would look for good ones (ask around), as it's a video game market and they go for looks even though it's in a casing. The external RAM (not onboard GPU RAM) helps if you do a larger sim, but I don't know performance-wise; for the onboard GPU RAM, the more the merrier...so yes, on normal workstations you can get 4 GPUs on a 300 US$ board, but then the price goes way up (3-4000 US$ for an 8-10 GPU board). RAM ordered abroad is also cheap, 8 or 16 GB, vs. the shop...I have used 4 GPUs but only with test software, not Gromacs, so it would be nice to see performance...for a small 100-atom molecule and 500 solvent, using just the CPU I get it to run in 5-10 minutes real time for a 1 ns sim, but I tried a simple large 800-amino-acid, 25,000-solvent eq (NVT or NPT) run and it clocks at around 1 hour real time for, say, 50 ps of eq. Stephan

Gesendet: Samstag, 25. Mai 2013 um 07:54 Uhr Von: James Starlight jmsstarli...@gmail.com An: Discussion list for GROMACS users gmx-users@gromacs.org Betreff: Re: [gmx-users] GPU-based workstation
Re: Aw: Re: [gmx-users] GPU-based workstation
I've been running on my University's GPU nodes; these each have one Xeon E5 (6 cores, 12 threads) and 4 NVIDIA GTX 690s. My system is 93,000 atoms of DMF under NVE. The performance has been a little disappointing, ~10 ns/day. On my home system, using a Core i5-2500 and an NVIDIA 560 Ti, I get 5.4 ns/day for the same system. On our HPC system, using 32 nodes each with 2 quad-core Xeon processors, I get 30-40 ns/day. I think that to achieve reasonable performance the system has to be balanced between CPUs and GPUs; probably getting 2 high-end GPUs and a top-end Xeon E5 or Core i7 would be a good choice. Richard

From: lloyd riggs lloyd.ri...@gmx.ch Reply-To: Discussion users gmx-users@gromacs.org Date: Saturday, 25 May 2013 12:02 To: Discussion users gmx-users@gromacs.org Subject: Aw: Re: [gmx-users] GPU-based workstation
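One practical way to judge whether a given CPU/GPU combination is balanced, along the lines Richard describes, is the timing report mdrun writes to its log; a quick way to pull it out (file name is a placeholder, and the exact wording of the line may differ between 4.6 minor versions):

    grep -i "gpu/cpu" md.log

In GROMACS 4.6 this reports the force-evaluation time on the GPU versus the CPU, and mdrun also prints a note when one side is left idling.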
Re: Aw: Re: [gmx-users] GPU-based workstation
Richard, thanks for the suggestion! Assuming that I'm using 2 high-end GeForces, which would give better performance: 1) one i7 (4 or 6 cores), or 2) an 8-core Xeon like the Intel Xeon E5-2650, 2.0 GHz / 8 cores? What properties of the MB should I take into account primarily for such a Xeon-based system? Do such MBs support multi-GPU (I noticed that many of them seem to lack enough PCIe slots)? James

2013/5/25 Broadbent, Richard richard.broadben...@imperial.ac.uk
Aw: Re: Re: [gmx-users] GPU-based workstation
I'd go for the i7 6-core. To the other message, funny. I bought ATIs as they clock faster and cost 1/3 the price of Nvidias, but then the software all went to Nvidia. The new ATI with twice the shaders runs at the same speed (around 1-1.3 teraflops) due to the same I/O problems the Nvidias ran into (or maybe onboard RAM does solve the problem if they went up to 16 or 32 MB). Gromacs, etc. doesn't run on ATIs, and I've been hoping they, AMD, catch up, but all I ever see is the constant "in 6 months" and then nothing. I ran around 40 4-ns simulations on University blades with 8 AMD quad cores; using 3 blades I was only able to get 1 ns/day, but never pressed it as far as why so slow, as I needed to finish. With the Nvidia at even 5 ns/day I, or a lot of people, could do some really nice work as far as publishing, with raw data in 2 weeks' time, so now I feel a bit saddened... I also just found OpenCL profiling with CUDA 5 that will take any C or C++ software and mark all the sections you need to convert to OpenCL, but the trial software is 30 days, then 250 US$... Stephan

Gesendet: Samstag, 25. Mai 2013 um 15:19 Uhr Von: James Starlight jmsstarli...@gmail.com An: Discussion list for GROMACS users gmx-users@gromacs.org Betreff: Re: Aw: Re: [gmx-users] GPU-based workstation
Aw: Re: Re: [gmx-users] GPU-based workstation
You can also look at profiling on various web sites; the high-end Nvidia cards run only slightly better than the 2-year-old ones, so from an individual's point of view not worth the money yet, but if you have the money? as I've been browsing. Also, the sim I did on the cluster was 180-190,000 atoms, so exactly the same performance the other person had. Stephan

Gesendet: Samstag, 25. Mai 2013 um 15:19 Uhr Von: James Starlight jmsstarli...@gmail.com An: Discussion list for GROMACS users gmx-users@gromacs.org Betreff: Re: Aw: Re: [gmx-users] GPU-based workstation
Re: [gmx-users] GPU job often stopped
the problem is still there... :-( On 04/29/2013 06:06 PM, Szilárd Páll wrote: On Mon, Apr 29, 2013 at 3:51 PM, Albertmailmd2...@gmail.com wrote: On 04/29/2013 03:47 PM, Szilárd Páll wrote: In that case, while it isn't very likely, the issue could be caused by some implementation detail which aims to avoid performance loss caused by an issue in the NVIDIA drivers. Try running with the GMX_CUDA_STREAMSYNC environment variable set. Btw, were there any other processes using the GPU while mdrun was running? Cheers, -- Szilárd thanks for kind reply. There is no any other process when I am running Gromacs. do you mean I should set GMX_CUDA_STREAMSYNC in the job script like: export GMX_CUDA_STREAMSYNC=/opt/cuda-5.0 Sort of, but the value does not matter. So if your shell is bash, the above as well as simply export GMX_CUDA_STREAMSYNC= will work fine. Let us know if this avoided the crash - when you have simulated long enough to be able to judge. Cheers, -- Szilárd -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU job often stopped
Have you tried running on CPUs only just to see if the issue persists? Unless the issue does not occur with the same binary on the same hardware running on CPUs only, I doubt it's a problem in the code. Do you have ECC on? -- Szilárd On Sun, Apr 28, 2013 at 5:27 PM, Albert mailmd2...@gmail.com wrote: Dear: I am running MD jobs in a workstation with 4 K20 GPU and I found that the job always failed with following messages from time to time: [tesla:03432] *** Process received signal *** [tesla:03432] Signal: Segmentation fault (11) [tesla:03432] Signal code: Address not mapped (1) [tesla:03432] Failing at address: 0xfffe02de67e0 [tesla:03432] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f4666da1cb0] [tesla:03432] [ 1] mdrun_mpi() [0x47dd61] [tesla:03432] [ 2] mdrun_mpi() [0x47d8ae] [tesla:03432] [ 3] /opt/intel/lib/intel64/libiomp5.so(__kmp_invoke_microtask+0x93) [0x7f46667904f3] [tesla:03432] *** End of error message *** -- mpirun noticed that process rank 0 with PID 3432 on node tesla exited on signal 11 (Segmentation fault). -- I can continue the jobs with mdrun option -append -cpi, but it still stopped from time to time. I am just wondering what's the problem? thank you very much Albert -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU job often stopped
Hello: yes, I tried the CPU only version, it goes well and didn't stop. I am not sure whether I have ECC on or not. There are 4 Tesla K20 and one GTX650 in the workstation, after compilation, I simple submit the jobs with command: mdrun -s md.tpr -gpu_id 0234 I submit the same system in another GTX690 machine, it also goes well. I compiled Gromacs with the same options in that machine. thank you very much best Albert On 04/29/2013 01:19 PM, Szilárd Páll wrote: Have you tried running on CPUs only just to see if the issue persists? Unless the issue does not occur with the same binary on the same hardware running on CPUs only, I doubt it's a problem in the code. Do you have ECC on? -- Szilárd -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
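One quick way to check the ECC state outside of GROMACS is nvidia-smi; for example (a sketch, and the output layout varies with driver version):

    nvidia-smi -q -d ECC

which prints the current and pending ECC mode for each board. The GPU-detection lines that mdrun writes to the log and stderr report the same per-GPU "ECC: yes/no" flag.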
Re: [gmx-users] GPU job often stopped
On 04/28/2013 05:45 PM, Justin Lemkul wrote: Frequent failures suggest instability in the simulated system. Check your .log file or stderr for informative Gromacs diagnostic information. -Justin my log file didn't have any errors; the end of the stopped log file looks something like:

DD step 2259 vol min/aver 0.967 load imb.: force 0.8%

           Step           Time         Lambda
           2260        45200.0        0.0

   Energies (kJ/mol)
          Angle            U-B    Proper Dih.  Improper Dih.          LJ-14
    9.86437e+03    4.02406e+04    3.52809e+04    6.13542e+02    8.61815e+03
     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
    1.25055e+04    3.05477e+04   -9.05956e+03   -6.02400e+05    1.58357e+03
 Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
    1.39149e+02   -4.72066e+05    1.37165e+05   -3.34901e+05    3.11958e+02
 Pres. DC (bar) Pressure (bar)   Constr. rmsd
   -2.94092e+02   -7.91535e+01    1.79812e-05

also in the output file I only got: step 13300, will finish Tue Apr 30 14:41 NOTE: Turning on dynamic load balancing

Probably the machine was restarted from time to time? best Albert -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU job often stopped
On Mon, Apr 29, 2013 at 2:41 PM, Albert mailmd2...@gmail.com wrote: Probably the machine was restarted from time to time? The segv indicates that mdrun crashed and not that the machine was restarted. The GPU detection output (both on stderr and log) should show whether ECC is on (and so does the nvidia-smi tool). Cheers, -- Szilárd -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU job often stopped
On 04/29/2013 03:31 PM, Szilárd Páll wrote: The segv indicates that mdrun crashed and not that the machine was restarted. The GPU detection output (both on stderr and log) should show whether ECC is on (and so does the nvidia-smi tool). Cheers, -- Szilárd yes it was on: Reading file heavy.tpr, VERSION 4.6.1 (single precision) Using 4 MPI threads Using 8 OpenMP threads per tMPI thread 5 GPUs detected: #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat: compatible #2: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible #3: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible #4: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible 4 GPUs user-selected for this run: #0, #2, #3, #4 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU job often stopped
In that case, while it isn't very likely, the issue could be caused by some implementation detail which aims to avoid performance loss caused by an issue in the NVIDIA drivers. Try running with the GMX_CUDA_STREAMSYNC environment variable set. Btw, were there any other processes using the GPU while mdrun was running? Cheers, -- Szilárd On Mon, Apr 29, 2013 at 3:32 PM, Albert mailmd2...@gmail.com wrote: On 04/29/2013 03:31 PM, Szilárd Páll wrote: The segv indicates that mdrun crashed and not that the machine was restarted. The GPU detection output (both on stderr and log) should show whether ECC is on (and so does the nvidia-smi tool). Cheers, -- Szilárd yes it was on: Reading file heavy.tpr, VERSION 4.6.1 (single precision) Using 4 MPI threads Using 8 OpenMP threads per tMPI thread 5 GPUs detected: #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat: compatible #2: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible #3: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible #4: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible 4 GPUs user-selected for this run: #0, #2, #3, #4 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
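On the question of other processes using the GPUs: running plain

    nvidia-smi

while mdrun is active is usually enough to check; the "Compute processes" table at the bottom of its output lists every process holding a context on each GPU (the same table appears in the nvidia-smi output quoted in the GPU performance thread later in this digest).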
Re: [gmx-users] GPU job often stopped
On 04/29/2013 03:47 PM, Szilárd Páll wrote: In that case, while it isn't very likely, the issue could be caused by some implementation detail which aims to avoid performance loss caused by an issue in the NVIDIA drivers. Try running with the GMX_CUDA_STREAMSYNC environment variable set. Btw, were there any other processes using the GPU while mdrun was running? Cheers, -- Szilárd thanks for kind reply. There is no any other process when I am running Gromacs. do you mean I should set GMX_CUDA_STREAMSYNC in the job script like: export GMX_CUDA_STREAMSYNC=/opt/cuda-5.0 ? THX Albert -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU job often stopped
On Mon, Apr 29, 2013 at 3:51 PM, Albert mailmd2...@gmail.com wrote: On 04/29/2013 03:47 PM, Szilárd Páll wrote: In that case, while it isn't very likely, the issue could be caused by some implementation detail which aims to avoid performance loss caused by an issue in the NVIDIA drivers. Try running with the GMX_CUDA_STREAMSYNC environment variable set. Btw, were there any other processes using the GPU while mdrun was running? Cheers, -- Szilárd thanks for kind reply. There is no any other process when I am running Gromacs. do you mean I should set GMX_CUDA_STREAMSYNC in the job script like: export GMX_CUDA_STREAMSYNC=/opt/cuda-5.0 Sort of, but the value does not matter. So if your shell is bash, the above as well as simply export GMX_CUDA_STREAMSYNC= will work fine. Let us know if this avoided the crash - when you have simulated long enough to be able to judge. Cheers, -- Szilárd ? THX Albert -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
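For reference, a minimal sketch of the relevant part of a bash job script (the value assigned is arbitrary, as noted above; the mdrun arguments are simply the ones used earlier in this thread):

    export GMX_CUDA_STREAMSYNC=1
    mdrun -s md.tpr -gpu_id 0234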
Re: [gmx-users] GPU job often stopped
On 4/28/13 11:27 AM, Albert wrote: Dear: I am running MD jobs in a workstation with 4 K20 GPU and I found that the job always failed with following messages from time to time: [tesla:03432] *** Process received signal *** [tesla:03432] Signal: Segmentation fault (11) [tesla:03432] Signal code: Address not mapped (1) [tesla:03432] Failing at address: 0xfffe02de67e0 [tesla:03432] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f4666da1cb0] [tesla:03432] [ 1] mdrun_mpi() [0x47dd61] [tesla:03432] [ 2] mdrun_mpi() [0x47d8ae] [tesla:03432] [ 3] /opt/intel/lib/intel64/libiomp5.so(__kmp_invoke_microtask+0x93) [0x7f46667904f3] [tesla:03432] *** End of error message *** -- mpirun noticed that process rank 0 with PID 3432 on node tesla exited on signal 11 (Segmentation fault). -- I can continue the jobs with mdrun option -append -cpi, but it still stopped from time to time. I am just wondering what's the problem? Frequent failures suggest instability in the simulated system. Check your .log file or stderr for informative Gromacs diagnostic information. -Justin -- Justin A. Lemkul, Ph.D. Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU efficiency question
Probably the part of the calculation done on the GPU is not rate limiting. There's no point having four chefs to make one dish... Look at the beginning and end of your .log files for diagnostic information. If this is a single node, you should be using threadMPI, not real MPI. Generally four CPU cores vs four GPU cores will require an extremely large PP load for the GPUs to all be effective. Mark On Fri, Apr 26, 2013 at 8:35 PM, Albert mailmd2...@gmail.com wrote: Dear: I've got two GTX690 in a a workstation and I found that when I run the md production with following two command: mpirun -np 4 md_run_mpi or mpirun -np 2 md_run_mpi the efficiency are the same. I notice that gromacs can detect 4 GPU (probably because GTX690 have two core..): 4 GPUs detected on host node4: #0: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible #2: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible #3: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC: no, stat: compatible why the -np 2 and -np 4 are the same efficiency? shouldn't it be faster for -np 4 ? thank you very much Albert -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-usershttp://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Searchhttp://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Listshttp://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
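To make Mark's point about thread-MPI concrete: on a single node the thread-MPI build starts its own ranks, so instead of "mpirun -np 4 mdrun_mpi ..." one would launch something like (a sketch; file names are placeholders):

    mdrun -ntmpi 4 -gpu_id 0123 -s md.tpr -deffnm md

with -ntmpi 2 -gpu_id 01 as the two-rank comparison. Whether four ranks beat two then depends on whether the CPU cores, not the GPUs, are the bottleneck.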
Re: [gmx-users] GPU performance
On Wed, Apr 10, 2013 at 3:34 AM, Benjamin Bobay bgbo...@ncsu.edu wrote: Szilárd - First, many thanks for the reply. Second, I am glad that I am not crazy. Ok so based on your suggestions, I think I know what the problem is/was. There was a sander process running on 1 of the CPUs. Clearly GROMACS was trying to use 4 with Using 4 OpenMP thread. I just did not catch that. Sorry! Rookie mistake. Which I guess leads me to my next question (sorry if its too naive): (1) When running GROMACS (or a I guess any other CUDA based programs), its best to have all the CPUs free, right? I guess based on my results I have pretty much answered that question. Although I thought that as long as I have one CPU available to run the GPU it would be good: would setting -ntmpi 1 -ntomp 1 help or would I take a major hit in ns/day as well? Such a behavior is not specific to GROMACS or CUDA-accelerated codes, but all compute-intensive codes that expect to be running alone on the set of CPU cores they are started on. As you could see on the output, mdrun automatically detected that you have 4 CPU cores and as Mark saied, it tries to use all of them along the GPU. As one of the cores was busy, you ended up in a situation in which four threads of mdrun plus the (presumably) one thread of sander are competing for four cores. This is made even worse by the fact that when using a full machine, mdrun locks its threads to physical cores to prevent the OS from moving them around (which can cause performance loss). Secondly, using a single core with a GPU will not result in a very good performance in GROMACS. The current GROMACS acceleration expects to run on a couple of CPU cores together with a GPU - which is the typical balance of CPU-GPU hardware most clusters (1 GPU/socket) as well as many home users would have (1-2 GPUs for 4-8 CPU cores). If I try the benchmarks again just to see (for fun) with Using 4 OpenMP thread, under top I have - so I think the CPU is fine : PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 24791 bobayb20 0 48.3g 51m 7576 R 299.1 0.2 11:32.90 mdrun Nope, that just means, roughly speaking, that sander is probably fully using one core and the four thread of mdrun are crammed on the remaining three cores - which is bad. However, you can simply run mdrun using three threads which will run fine along sander. Whether this will be efficient or not, you'll have to see. Note that if some other program is using the GPU as well, don't expect full performance - but the difference will be much less than in the case of oversubscribed CPU cores. Cheers, -- Szilárd When I have a chance (after this sander run is done - hopefully soon) I can try the benchmarks again. Thanks again for the help! Ben -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
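As a sketch of the "three cores next to sander" idea mentioned above (all values illustrative, not a tested recipe):

    mdrun -ntmpi 1 -ntomp 3 -pin off -s topol.tpr -deffnm md

i.e. one thread-MPI rank with three OpenMP threads and pinning disabled, leaving the fourth core to the other job; how well this performs has to be checked case by case, as Szilárd notes.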
Re: [gmx-users] GPU performance
Hi Ben, That performance is not reasonable at all - neither for CPU only run on your quad-core Sandy Bridge, nor for the CPU+GPU run. For the latter you should be getting more like 50 ns/day or so. What's strange about your run is that the CPU-GPU load balancing is picking a *very* long cut-off which means that your CPU is for some reason performing very badly. Check how is mdrun behaving while running in top/htop nad if you are not seeing ~400% CPU utilization, there is something wrong - perhaps threads getting locked to the same core (to check that try -pin off). Secondly, note that you are using OpenMM-specific settings from the old GROMACS-OpenMM comparison benchmarks in which the grid spacing is overly coarse (you could use something like a fourier-spacing=0.125 or even larger with rc=1.0). Cheers, -- Szilárd On Tue, Apr 9, 2013 at 10:27 PM, Benjamin Bobay bgbo...@ncsu.edu wrote: Good afternoon - I recently installed gromacs-4.6 on CentOS6.3 and the installation went just fine. I have a Tesla C2075 GPU. I then downloaded the benchmark directories and ran a bench mark on the GPU/ dhfr-solv-PME.bench This is what I got: Using 1 MPI thread Using 4 OpenMP threads 1 GPU detected: #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible 1 GPU user-selected for this run: #0 Back Off! I just backed up ener.edr to ./#ener.edr.1# starting mdrun 'Protein in water' -1 steps, infinite ps. step 40: timed with pme grid 64 64 64, coulomb cutoff 1.000: 4122.9 M-cycles step 80: timed with pme grid 56 56 56, coulomb cutoff 1.143: 3685.9 M-cycles step 120: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3110.8 M-cycles step 160: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3365.1 M-cycles step 200: timed with pme grid 40 40 40, coulomb cutoff 1.600: 3499.0 M-cycles step 240: timed with pme grid 52 52 52, coulomb cutoff 1.231: 3982.2 M-cycles step 280: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3129.2 M-cycles step 320: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3425.4 M-cycles step 360: timed with pme grid 42 42 42, coulomb cutoff 1.524: 2979.1 M-cycles optimal pme grid 42 42 42, coulomb cutoff 1.524 step 4300 performance: 1.8 ns/day and from the nvidia-smi output: Tue Apr 9 10:13:46 2013 +--+ | NVIDIA-SMI 4.304.37 Driver Version: 304.37 | |---+--+--+ | GPU Name | Bus-IdDisp. | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===+==+==| | 0 Tesla C2075 | :03:00.0 On | 0 | | 30% 67CP080W / 225W | 4% 200MB / 5375MB | 4% Default | +---+--+--+ +-+ | Compute processes: GPU Memory | | GPU PID Process name Usage | |=| |0 22568 mdrun 59MB | +-+ So I am only getting 1.8ns/day ! Is that right? It seems very very small compared to the CPU test where I am getting the same: step 200 performance: 1.8 ns/dayvol 0.79 imb F 14% From the md.log of the GPU test: Detecting CPU-specific acceleration. Present hardware specification: Vendor: GenuineIntel Brand: Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz Family: 6 Model: 45 Stepping: 7 Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2a pic Acceleration most likely to fit this hardware: AVX_256 Acceleration selected at GROMACS compile time: AVX_256 1 GPU detected: #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible 1 GPU user-selected for this run: #0 Will do PME sum in reciprocal space. Any thoughts as to why it is so slow? many thanks! 
Ben
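The .mdp change Szilárd recommends above (moving away from the old GROMACS-OpenMM benchmark settings) would look roughly like the following; the numbers are assumed starting values in the range he mentions, to be re-balanced per system:

; PME settings for a CPU+GPU run with the Verlet scheme (example values, not from the thread)
cutoff-scheme   = Verlet
coulombtype     = PME
fourier-spacing = 0.125   ; coarser reciprocal-space grid than the OpenMM-era benchmark settings
rcoulomb        = 1.0     ; short-range cut-off; mdrun's PME tuning may scale this up at run time
rvdw            = 1.0

With a sane grid spacing, and all four cores actually free, the automatic CPU-GPU load balancing should settle on a much shorter cut-off than the 1.524 nm seen in the log above.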
Re: [gmx-users] GPU performance
Szilárd - First, many thanks for the reply. Second, I am glad that I am not crazy. Ok so based on your suggestions, I think I know what the problem is/was. There was a sander process running on 1 of the CPUs. Clearly GROMACS was trying to use 4 with Using 4 OpenMP thread. I just did not catch that. Sorry! Rookie mistake. Which I guess leads me to my next question (sorry if its too naive): (1) When running GROMACS (or a I guess any other CUDA based programs), its best to have all the CPUs free, right? I guess based on my results I have pretty much answered that question. Although I thought that as long as I have one CPU available to run the GPU it would be good: would setting -ntmpi 1 -ntomp 1 help or would I take a major hit in ns/day as well? If I try the benchmarks again just to see (for fun) with Using 4 OpenMP thread, under top I have - so I think the CPU is fine : PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 24791 bobayb20 0 48.3g 51m 7576 R 299.1 0.2 11:32.90 mdrun When I have a chance (after this sander run is done - hopefully soon) I can try the benchmarks again. Thanks again for the help! Ben -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU performance
On Apr 10, 2013 3:34 AM, Benjamin Bobay bgbo...@ncsu.edu wrote: Szilárd - First, many thanks for the reply. Second, I am glad that I am not crazy. Ok so based on your suggestions, I think I know what the problem is/was. There was a sander process running on 1 of the CPUs. Clearly GROMACS was trying to use 4 with Using 4 OpenMP thread. I just did not catch that. Sorry! Rookie mistake. Which I guess leads me to my next question (sorry if its too naive): (1) When running GROMACS (or a I guess any other CUDA based programs), its best to have all the CPUs free, right? I guess based on my results I have pretty much answered that question. Although I thought that as long as I have one CPU available to run the GPU it would be good: would setting -ntmpi 1 -ntomp 1 help or would I take a major hit in ns/day as well? Some codes might treat the CPU as a I/O, MPI and memory-serving co-processor of the GPU; those codes will tend to be insensitive to the CPU config. GROMACS goes to great lengths to use all the hardware in a dynamically load-balanced way, so CPU load and config tend to affect the bottom line immediately. Mark If I try the benchmarks again just to see (for fun) with Using 4 OpenMP thread, under top I have - so I think the CPU is fine : PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 24791 bobayb20 0 48.3g 51m 7576 R 299.1 0.2 11:32.90 mdrun When I have a chance (after this sander run is done - hopefully soon) I can try the benchmarks again. Thanks again for the help! Ben -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster
Hi Szilard Thanks for this tip; it was extremely useful. The problem was indeed the incompatibility between the installed NVIDIA driver and the CUDA 5.0 runtime library. Installation of an older driver solved the problem. The programs devideQuery etc can now detect the GPU. GROMACS can also detect now the card but unfortunately aborts with the following error Fatal error: Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node. mdrun_mpi was started with 12 PP MPI processes per node, but only 1 GPU were detected. Here is my command line mpirun -np 12 mdrun_mpi -s test.tpr -deffnm test_out -nb gpu What can be the problem? Thanks again Hi George, As I said before, that just means that most probably the GPU driver is not compatible with the CUDA runtime (libcudart) that you installed with the CUDA toolkit. I've no clue about the Mac OS installers and releases, you'll have to do the research on that. Let us know if you have further (GROMACS-related) issues. Cheers, -- Szil?rd On Fri, Mar 1, 2013 at 2:48 PM, George Patargias g...@bioacademy.gr wrote: Hi Szilαrd Thanks for your reply. I have run the deviceQuery utility and what I got back is /deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) cudaGetDeviceCount returned 38 - no CUDA-capable device is detected Should I understand from this that the CUDA driver was not installed from the MAC OS X CUDA 5.0 Production Release? George HI, That looks like the driver does not work or is incompatible with the runtime. Please get the SDK, compile a simple program, e.g. deviceQuery and see if that works (I suspect that it won't). Regarding your machines, just FYI, the Quadro 4000 is a pretty slow card (somewhat slower than a GTX 460) so you'll hava a quite strong resource imbalance: a lot of CPU compute power (2x Xeon 5xxx, right?) and little GPU compute power which will lead to the CPU idling while waiting for the GPU. Cheers, -- Szilαrd On Thu, Feb 28, 2013 at 4:52 PM, George Patargias g...@bioacademy.grwrote: Hello We are trying to install the GPU version of GROMACS 4.6 on our own MacOS cluster. So for the cluster nodes that have the NVIDIA Quadro 4000 cards: - We have downloaded and install the MAC OS X CUDA 5.0 Production Release from here: https://developer.nvidia.com/cuda-downloads placing the libraries contained in this download in /usr/local/cuda/lib - We have managed to compile GROMACS 4.6 linking it statically with these CUDA libraries and the MPI libraries (with BUILD_SHARED_LIBS=OFF and GMX_PREFER_STATIC_LIBS=ON) Unfortunately, when we tried to run a test job with the generated mdrun_mpi, GROMACS reported that it cannot detect any CUDA-enabled devices. It also reports 0.0 version for CUDA driver and runtime. Is the actual CUDA driver missing from the MAC OS X CUDA 5.0 Production Release that we installed? Do we need to install it from here: http://www.nvidia.com/object/cuda-mac-driver.html Or is something else that we need to do? Many thanks in advance. George Dr. George Patargias Postdoctoral Researcher Biomedical Research Foundation Academy of Athens 4, Soranou Ephessiou 115 27 Athens Greece Office: +302106597568 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? 
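Two checks come up repeatedly in this thread: Szilárd's driver/runtime sanity test with the CUDA samples, and matching the number of PP MPI ranks to the number of detected GPUs. A sketch of both, with the samples path an assumption for a typical CUDA 5.0 install:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery && make && ./deviceQuery
# if deviceQuery now reports the Quadro 4000, launch one PP rank per GPU instead of 12:
mpirun -np 1 mdrun_mpi -ntomp 12 -s test.tpr -deffnm test_out -nb gpu -gpu_id 0

One MPI rank with 12 OpenMP threads avoids the "mismatching number of PP MPI processes and GPUs per node" error; whether 12 threads per rank is efficient on a dual-socket Xeon node is a separate tuning question.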
Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster
Hi Szilαrd Thanks for your reply. I have run the deviceQuery utility and what I got back is /deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) cudaGetDeviceCount returned 38 - no CUDA-capable device is detected Should I understand from this that the CUDA driver was not installed from the MAC OS X CUDA 5.0 Production Release? George HI, That looks like the driver does not work or is incompatible with the runtime. Please get the SDK, compile a simple program, e.g. deviceQuery and see if that works (I suspect that it won't). Regarding your machines, just FYI, the Quadro 4000 is a pretty slow card (somewhat slower than a GTX 460) so you'll hava a quite strong resource imbalance: a lot of CPU compute power (2x Xeon 5xxx, right?) and little GPU compute power which will lead to the CPU idling while waiting for the GPU. Cheers, -- Szilαrd On Thu, Feb 28, 2013 at 4:52 PM, George Patargias g...@bioacademy.grwrote: Hello We are trying to install the GPU version of GROMACS 4.6 on our own MacOS cluster. So for the cluster nodes that have the NVIDIA Quadro 4000 cards: - We have downloaded and install the MAC OS X CUDA 5.0 Production Release from here: https://developer.nvidia.com/cuda-downloads placing the libraries contained in this download in /usr/local/cuda/lib - We have managed to compile GROMACS 4.6 linking it statically with these CUDA libraries and the MPI libraries (with BUILD_SHARED_LIBS=OFF and GMX_PREFER_STATIC_LIBS=ON) Unfortunately, when we tried to run a test job with the generated mdrun_mpi, GROMACS reported that it cannot detect any CUDA-enabled devices. It also reports 0.0 version for CUDA driver and runtime. Is the actual CUDA driver missing from the MAC OS X CUDA 5.0 Production Release that we installed? Do we need to install it from here: http://www.nvidia.com/object/cuda-mac-driver.html Or is something else that we need to do? Many thanks in advance. George Dr. George Patargias Postdoctoral Researcher Biomedical Research Foundation Academy of Athens 4, Soranou Ephessiou 115 27 Athens Greece Office: +302106597568 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists Dr. George Patargias Postdoctoral Researcher Biomedical Research Foundation Academy of Athens 4, Soranou Ephessiou 115 27 Athens Greece Office: +302106597568 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster
Hi George, As I said before, that just means that most probably the GPU driver is not compatible with the CUDA runtime (libcudart) that you installed with the CUDA toolkit. I've no clue about the Mac OS installers and releases, you'll have to do the research on that. Let us know if you have further (GROMACS-related) issues. Cheers, -- Szilárd On Fri, Mar 1, 2013 at 2:48 PM, George Patargias g...@bioacademy.gr wrote: Hi Szilαrd Thanks for your reply. I have run the deviceQuery utility and what I got back is /deviceQuery Starting... CUDA Device Query (Runtime API) version (CUDART static linking) cudaGetDeviceCount returned 38 - no CUDA-capable device is detected Should I understand from this that the CUDA driver was not installed from the MAC OS X CUDA 5.0 Production Release? George HI, That looks like the driver does not work or is incompatible with the runtime. Please get the SDK, compile a simple program, e.g. deviceQuery and see if that works (I suspect that it won't). Regarding your machines, just FYI, the Quadro 4000 is a pretty slow card (somewhat slower than a GTX 460) so you'll hava a quite strong resource imbalance: a lot of CPU compute power (2x Xeon 5xxx, right?) and little GPU compute power which will lead to the CPU idling while waiting for the GPU. Cheers, -- Szilαrd On Thu, Feb 28, 2013 at 4:52 PM, George Patargias g...@bioacademy.grwrote: Hello We are trying to install the GPU version of GROMACS 4.6 on our own MacOS cluster. So for the cluster nodes that have the NVIDIA Quadro 4000 cards: - We have downloaded and install the MAC OS X CUDA 5.0 Production Release from here: https://developer.nvidia.com/cuda-downloads placing the libraries contained in this download in /usr/local/cuda/lib - We have managed to compile GROMACS 4.6 linking it statically with these CUDA libraries and the MPI libraries (with BUILD_SHARED_LIBS=OFF and GMX_PREFER_STATIC_LIBS=ON) Unfortunately, when we tried to run a test job with the generated mdrun_mpi, GROMACS reported that it cannot detect any CUDA-enabled devices. It also reports 0.0 version for CUDA driver and runtime. Is the actual CUDA driver missing from the MAC OS X CUDA 5.0 Production Release that we installed? Do we need to install it from here: http://www.nvidia.com/object/cuda-mac-driver.html Or is something else that we need to do? Many thanks in advance. George Dr. George Patargias Postdoctoral Researcher Biomedical Research Foundation Academy of Athens 4, Soranou Ephessiou 115 27 Athens Greece Office: +302106597568 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists Dr. 
George Patargias, Postdoctoral Researcher, Biomedical Research Foundation, Academy of Athens
Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster
The easiest solution is to kill MacOS and switch to Linux. ;-) Albert On 03/01/2013 06:03 PM, Szilárd Páll wrote: Hi George, As I said before, that just means that most probably the GPU driver is not compatible with the CUDA runtime (libcudart) that you installed with the CUDA toolkit. I've no clue about the Mac OS installers and releases; you'll have to do the research on that. Let us know if you have further (GROMACS-related) issues. Cheers, -- Szilárd
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
On 12/17/2012 08:06 PM, Justin Lemkul wrote: It seems to me that the system is simply crashing like any other that becomes unstable. Does the simulation run at all on plain CPU? -Justin Thank you very much Justin, it's really helpful. I've checked that the structure after minization and found that there is some problem with my ligand. I regenerated the ligand toplogy with acpype, and resubmit for mimization and NVT. Now it goes well. So probably the problems comes from the incorrect ligand topolgy which make the system very unstable. best Albert -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
Hi, That unfortunately tell exactly about the reason why mdrun is stuck. Can you reproduce the issue on another machines or with different launch configurations? At which step does it get stuck (-stepout 1 can help)? Please try the following: - try running on a single GPU; - try running on CPUs only (-nb cpu and to match closer the GPU setup with -ntomp 12); - try running in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var set (and to match closer the GPU setup with -ntomp 12) - provide a backtrace (using gdb). Cheers, -- Szilárd On Mon, Dec 17, 2012 at 5:37 PM, Albert mailmd2...@gmail.com wrote: hello: I am running GMX-4.6 beta2 GPU work in a 24 CPU core workstation with two GTX590, it stacked there without any output i.e the .xtc file size is always 0 after hours of running. Here is the md.log file I found: Using CUDA 8x8x8 non-bonded kernels Potential shift: LJ r^-12: 0.112 r^-6 0.335, Ewald 1.000e-05 Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536 Removing pbc first time Pinning to Hyper-Threading cores with 12 physical cores in a compute node There are 1 flexible constraints WARNING: step size for flexible constraining = 0 All flexible constraints will be rigid. Will try to keep all flexible constraints at their original length, but the lengths may exhibit some drift. Initializing Parallel LINear Constraint Solver Linking all bonded interactions to atoms There are 161872 inter charge-group exclusions, will use an extra communication step for exclusion forces for PME The initial number of communication pulses is: X 1 The initial domain decomposition cell size is: X 1.83 nm The maximum allowed distance for charge groups involved in interactions is: non-bonded interactions 1.200 nm (the following are initial values, they could change due to box deformation) two-body bonded interactions (-rdd) 1.200 nm multi-body bonded interactions (-rdd) 1.200 nm atoms separated by up to 5 constraints (-rcon) 1.826 nm When dynamic load balancing gets turned on, these settings will change to: The maximum number of communication pulses is: X 1 The minimum size for domain decomposition cells is 1.200 nm The requested allowed shrink of DD cells (option -dds) is: 0.80 The allowed shrink of domain decomposition cells is: X 0.66 The maximum allowed distance for charge groups involved in interactions is: non-bonded interactions 1.200 nm two-body bonded interactions (-rdd) 1.200 nm multi-body bonded interactions (-rdd) 1.200 nm atoms separated by up to 5 constraints (-rcon) 1.200 nm Making 1D domain decomposition grid 4 x 1 x 1, home cell index 0 0 0 Center of mass motion removal mode is Linear We have the following groups for center of mass motion removal: 0: Protein_LIG_POPC 1: Water_and_ions PLEASE READ AND CITE THE FOLLOWING REFERENCE G. Bussi, D. Donadio and M. Parrinello Canonical sampling through velocity rescaling J. Chem. Phys. 126 (2007) pp. 014101 --- Thank You --- THX -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-usershttp://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Searchhttp://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? 
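Szilárd's checklist translates into commands along these lines; a sketch for the 12-core node described above, with the file names taken from the thread:

mdrun_mpi -s nvt.tpr -deffnm nvt -gpu_id 0 -stepout 1            # single GPU, report every step
mdrun_mpi -s nvt.tpr -deffnm nvt -nb cpu -ntomp 12 -stepout 1    # CPU-only with a matching thread count
GMX_EMULATE_GPU=1 mdrun_mpi -s nvt.tpr -deffnm nvt -ntomp 12     # GPU emulation path
gdb --args mdrun_mpi -s nvt.tpr -deffnm nvt                      # then 'run' and 'bt' after the crash

The first variant that still hangs or segfaults narrows down whether the problem is in the GPU kernels, the CPU path, or the input itself (as turned out to be the case here, where a bad ligand topology was the real culprit).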
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
hello: I reduced the GPU to two, and it said: Back Off! I just backed up nvt.log to ./#nvt.log.1# Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision) NOTE: GPU(s) found, but the current simulation can not use GPUs To use a GPU, set the mdp option: cutoff-scheme = Verlet (for quick performance testing you can use the -testverlet option) Using 2 MPI processes 4 GPUs detected on host CUDANodeA: #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible Making 1D domain decomposition 2 x 1 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. when I run it with single GPU, it produced lots of pdb file with prefix step, and then it crashed with messages: Wrote pdb files with previous and current coordinates Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 which is larger than the 1-4 table size 2.200 nm These are ignored for the rest of the simulation This usually means your system is exploding, if not, you should increase table-extension in your mdp file or with user tables increase the table size [CUDANodeA:20659] *** Process received signal *** [CUDANodeA:20659] Signal: Segmentation fault (11) [CUDANodeA:20659] Signal code: Address not mapped (1) [CUDANodeA:20659] Failing at address: 0xc7aa00dc [CUDANodeA:20659] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x2ab25c76d2d0] [CUDANodeA:20659] [ 1] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x11020f) [0x2ab259e0720f] [CUDANodeA:20659] [ 2] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x111c94) [0x2ab259e08c94] [CUDANodeA:20659] [ 3] /opt/gromacs-4.6/lib/libmd_mpi.so.6(gmx_pme_do+0x1d2e) [0x2ab259e0cbae] [CUDANodeA:20659] [ 4] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_lowlevel+0x1eef) [0x2ab259ddd62f] [CUDANodeA:20659] [ 5] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_cutsGROUP+0x1495) [0x2ab259e72a45] [CUDANodeA:20659] [ 6] mdrun_mpi(do_md+0x8133) [0x4334c3] [CUDANodeA:20659] [ 7] mdrun_mpi(mdrunner+0x19e9) [0x411639] [CUDANodeA:20659] [ 8] mdrun_mpi(main+0x17db) [0x4373db] [CUDANodeA:20659] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ab25c999bfd] [CUDANodeA:20659] [10] mdrun_mpi() [0x407f09] [CUDANodeA:20659] *** End of error message *** [1]Segmentation faultmdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc here is the .mdp file I used: title = NVT equilibration for OR-POPC system define = -DPOSRES -DPOSRES_LIG ; Protein is position restrained (uses the posres.itp file information) ; Parameters describing the details of the NVT simulation protocol integrator = md; Algorithm (md = molecular dynamics [leap-frog integrator]; md-vv = md using velocity verlet; sd = stochastic dynamics) dt = 0.002 ; Time-step (ps) nsteps = 25; Number of steps to run (0.002 * 25 = 500 ps) ; Parameters controlling output writing nstxout = 0 ; Write coordinates to output .trr file every 2 ps nstvout = 0 ; Write velocities to output .trr file every 2 ps nstfout = 0 nstxtcout = 1000 nstenergy = 1000 ; Write energies to output .edr file every 2 ps nstlog = 1000 ; Write output to .log file every 2 ps ; Parameters describing neighbors searching and 
details about interaction calculations ns_type = grid ; Neighbor list search method (simple, grid) nstlist = 50; Neighbor list update frequency (after every given number of steps) rlist = 1.2 ; Neighbor list search cut-off distance (nm) rlistlong = 1.4 rcoulomb= 1.2 ; Short-range Coulombic interactions cut-off distance (nm) rvdw= 1.2 ; Short-range van der Waals cutoff distance (nm) pbc = xyz ; Direction in which to use Perodic Boundary Conditions (xyz, xy, no) cutoff-scheme =Verlet ; GPU running ; Parameters for treating bonded interactions continuation= no; Whether a fresh start or a continuation from a previous run (yes/no) constraint_algorithm = LINCS; Constraint algorithm (LINCS / SHAKE) constraints = all-bonds ; Which bonds/angles to constrain (all-bonds / hbonds / none / all-angles / h-angles) lincs_iter = 1 ; Number of iterations to correct for rotational lengthening in LINCS (related to accuracy) lincs_order = 4 ; Highest order in the expansion of the constraint
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
Hi, How about GPU emulation or CPU-only runs? Also, please try setting the number of therads to 1 (-ntomp 1). -- Szilárd On Mon, Dec 17, 2012 at 6:01 PM, Albert mailmd2...@gmail.com wrote: hello: I reduced the GPU to two, and it said: Back Off! I just backed up nvt.log to ./#nvt.log.1# Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision) NOTE: GPU(s) found, but the current simulation can not use GPUs To use a GPU, set the mdp option: cutoff-scheme = Verlet (for quick performance testing you can use the -testverlet option) Using 2 MPI processes 4 GPUs detected on host CUDANodeA: #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible Making 1D domain decomposition 2 x 1 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. when I run it with single GPU, it produced lots of pdb file with prefix step, and then it crashed with messages: Wrote pdb files with previous and current coordinates Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 which is larger than the 1-4 table size 2.200 nm These are ignored for the rest of the simulation This usually means your system is exploding, if not, you should increase table-extension in your mdp file or with user tables increase the table size [CUDANodeA:20659] *** Process received signal *** [CUDANodeA:20659] Signal: Segmentation fault (11) [CUDANodeA:20659] Signal code: Address not mapped (1) [CUDANodeA:20659] Failing at address: 0xc7aa00dc [CUDANodeA:20659] [ 0] /lib64/libpthread.so.0(+**0xf2d0) [0x2ab25c76d2d0] [CUDANodeA:20659] [ 1] /opt/gromacs-4.6/lib/libmd_**mpi.so.6(+0x11020f) [0x2ab259e0720f] [CUDANodeA:20659] [ 2] /opt/gromacs-4.6/lib/libmd_**mpi.so.6(+0x111c94) [0x2ab259e08c94] [CUDANodeA:20659] [ 3] /opt/gromacs-4.6/lib/libmd_**mpi.so.6(gmx_pme_do+0x1d2e) [0x2ab259e0cbae] [CUDANodeA:20659] [ 4] /opt/gromacs-4.6/lib/libmd_** mpi.so.6(do_force_lowlevel+**0x1eef) [0x2ab259ddd62f] [CUDANodeA:20659] [ 5] /opt/gromacs-4.6/lib/libmd_** mpi.so.6(do_force_cutsGROUP+**0x1495) [0x2ab259e72a45] [CUDANodeA:20659] [ 6] mdrun_mpi(do_md+0x8133) [0x4334c3] [CUDANodeA:20659] [ 7] mdrun_mpi(mdrunner+0x19e9) [0x411639] [CUDANodeA:20659] [ 8] mdrun_mpi(main+0x17db) [0x4373db] [CUDANodeA:20659] [ 9] /lib64/libc.so.6(__libc_start_**main+0xfd) [0x2ab25c999bfd] [CUDANodeA:20659] [10] mdrun_mpi() [0x407f09] [CUDANodeA:20659] *** End of error message *** [1]Segmentation faultmdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc here is the .mdp file I used: title = NVT equilibration for OR-POPC system define = -DPOSRES -DPOSRES_LIG ; Protein is position restrained (uses the posres.itp file information) ; Parameters describing the details of the NVT simulation protocol integrator = md; Algorithm (md = molecular dynamics [leap-frog integrator]; md-vv = md using velocity verlet; sd = stochastic dynamics) dt = 0.002 ; Time-step (ps) nsteps = 25; Number of steps to run (0.002 * 25 = 500 ps) ; Parameters controlling output writing nstxout = 0 ; Write coordinates to output .trr file every 2 ps nstvout = 0 ; Write velocities to output .trr 
file every 2 ps nstfout = 0 nstxtcout = 1000 nstenergy = 1000 ; Write energies to output .edr file every 2 ps nstlog = 1000 ; Write output to .log file every 2 ps ; Parameters describing neighbors searching and details about interaction calculations ns_type = grid ; Neighbor list search method (simple, grid) nstlist = 50; Neighbor list update frequency (after every given number of steps) rlist = 1.2 ; Neighbor list search cut-off distance (nm) rlistlong = 1.4 rcoulomb= 1.2 ; Short-range Coulombic interactions cut-off distance (nm) rvdw= 1.2 ; Short-range van der Waals cutoff distance (nm) pbc = xyz ; Direction in which to use Perodic Boundary Conditions (xyz, xy, no) cutoff-scheme =Verlet ; GPU running ; Parameters for treating bonded interactions continuation= no; Whether a fresh start or a continuation from a previous run (yes/no) constraint_algorithm = LINCS; Constraint algorithm (LINCS / SHAKE) constraints = all-bonds ; Which bonds/angles
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
On 12/17/2012 06:08 PM, Szilárd Páll wrote: Hi, How about GPU emulation or CPU-only runs? Also, please try setting the number of therads to 1 (-ntomp 1). -- Szilárd hello: I am running in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var set (and to match closer the GPU setup with -ntomp 12), it failed with log: Back Off! I just backed up step33b.pdb to ./#step33b.pdb.2# Back Off! I just backed up step33c.pdb to ./#step33c.pdb.2# Wrote pdb files with previous and current coordinates [CUDANodeA:20753] *** Process received signal *** [CUDANodeA:20753] Signal: Segmentation fault (11) [CUDANodeA:20753] Signal code: Address not mapped (1) [CUDANodeA:20753] Failing at address: 0x106ae6a00 [1]Segmentation faultmdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc -ntomp 12 I also tried , number of therads to 1 (-ntomp 1), it failed with following messages: Back Off! I just backed up step33c.pdb to ./#step33c.pdb.1# Wrote pdb files with previous and current coordinates [CUDANodeA:20740] *** Process received signal *** [CUDANodeA:20740] Signal: Segmentation fault (11) [CUDANodeA:20740] Signal code: Address not mapped (1) [CUDANodeA:20740] Failing at address: 0x1f74a96ec [CUDANodeA:20740] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x2b351d3022d0] [CUDANodeA:20740] [ 1] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x11020f) [0x2b351a99c20f] [CUDANodeA:20740] [ 2] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x111c94) [0x2b351a99dc94] [CUDANodeA:20740] [ 3] /opt/gromacs-4.6/lib/libmd_mpi.so.6(gmx_pme_do+0x1d2e) [0x2b351a9a1bae] [CUDANodeA:20740] [ 4] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_lowlevel+0x1eef) [0x2b351a97262f] [CUDANodeA:20740] [ 5] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_cutsVERLET+0x1756) [0x2b351aa04736] [CUDANodeA:20740] [ 6] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force+0x3bf) [0x2b351aa0a0df] [CUDANodeA:20740] [ 7] mdrun_mpi(do_md+0x8133) [0x4334c3] [CUDANodeA:20740] [ 8] mdrun_mpi(mdrunner+0x19e9) [0x411639] [CUDANodeA:20740] [ 9] mdrun_mpi(main+0x17db) [0x4373db] [CUDANodeA:20740] [10] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b351d52ebfd] [CUDANodeA:20740] [11] mdrun_mpi() [0x407f09] [CUDANodeA:20740] *** End of error message *** [1]Segmentation faultmdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc -ntomp 1 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
Hi Albert, Thanks for the testing. Last questions. - What version are you using? Is it beta2 release or latest git? if it's the former, getting the latest git might help if... - (do) you happen to be using GMX_GPU_ACCELERATION=None (you shouldn't!)? A bug triggered only with this setting has been fixed recently. If the above doesn't help, please file a bug report and attach a tpr so we can reproduce. Cheers, -- Szilárd On Mon, Dec 17, 2012 at 6:21 PM, Albert mailmd2...@gmail.com wrote: On 12/17/2012 06:08 PM, Szilárd Páll wrote: Hi, How about GPU emulation or CPU-only runs? Also, please try setting the number of therads to 1 (-ntomp 1). -- Szilárd hello: I am running in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var set (and to match closer the GPU setup with -ntomp 12), it failed with log: Back Off! I just backed up step33b.pdb to ./#step33b.pdb.2# Back Off! I just backed up step33c.pdb to ./#step33c.pdb.2# Wrote pdb files with previous and current coordinates [CUDANodeA:20753] *** Process received signal *** [CUDANodeA:20753] Signal: Segmentation fault (11) [CUDANodeA:20753] Signal code: Address not mapped (1) [CUDANodeA:20753] Failing at address: 0x106ae6a00 [1]Segmentation faultmdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc -ntomp 12 I also tried , number of therads to 1 (-ntomp 1), it failed with following messages: Back Off! I just backed up step33c.pdb to ./#step33c.pdb.1# Wrote pdb files with previous and current coordinates [CUDANodeA:20740] *** Process received signal *** [CUDANodeA:20740] Signal: Segmentation fault (11) [CUDANodeA:20740] Signal code: Address not mapped (1) [CUDANodeA:20740] Failing at address: 0x1f74a96ec [CUDANodeA:20740] [ 0] /lib64/libpthread.so.0(+**0xf2d0) [0x2b351d3022d0] [CUDANodeA:20740] [ 1] /opt/gromacs-4.6/lib/libmd_**mpi.so.6(+0x11020f) [0x2b351a99c20f] [CUDANodeA:20740] [ 2] /opt/gromacs-4.6/lib/libmd_**mpi.so.6(+0x111c94) [0x2b351a99dc94] [CUDANodeA:20740] [ 3] /opt/gromacs-4.6/lib/libmd_**mpi.so.6(gmx_pme_do+0x1d2e) [0x2b351a9a1bae] [CUDANodeA:20740] [ 4] /opt/gromacs-4.6/lib/libmd_** mpi.so.6(do_force_lowlevel+**0x1eef) [0x2b351a97262f] [CUDANodeA:20740] [ 5] /opt/gromacs-4.6/lib/libmd_** mpi.so.6(do_force_cutsVERLET+**0x1756) [0x2b351aa04736] [CUDANodeA:20740] [ 6] /opt/gromacs-4.6/lib/libmd_**mpi.so.6(do_force+0x3bf) [0x2b351aa0a0df] [CUDANodeA:20740] [ 7] mdrun_mpi(do_md+0x8133) [0x4334c3] [CUDANodeA:20740] [ 8] mdrun_mpi(mdrunner+0x19e9) [0x411639] [CUDANodeA:20740] [ 9] mdrun_mpi(main+0x17db) [0x4373db] [CUDANodeA:20740] [10] /lib64/libc.so.6(__libc_start_**main+0xfd) [0x2b351d52ebfd] [CUDANodeA:20740] [11] mdrun_mpi() [0x407f09] [CUDANodeA:20740] *** End of error message *** [1]Segmentation faultmdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc -ntomp 1 -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-usershttp://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Searchhttp://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Listshttp://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! 
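Szilárd's two follow-ups (use the current 4.6 code, and do not build with GMX_GPU_ACCELERATION=None) amount to a clean rebuild before filing a bug. A sketch, with the repository URL and branch name as assumptions about the 2012-era layout:

git clone git://git.gromacs.org/gromacs.git && cd gromacs && git checkout release-4-6
mkdir build && cd build
cmake .. -DGMX_GPU=ON -DGMX_MPI=ON    # leave the GPU acceleration setting at its default
make -j 12 && make install

The rebuild simply rules out the already-fixed bug he mentions; if the segfault survives it, the next step is the bug report with the .tpr attached.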
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
On Mon, Dec 17, 2012 at 6:01 PM, Albert mailmd2...@gmail.com wrote: hello: I reduced the GPU to two, and it said: Back Off! I just backed up nvt.log to ./#nvt.log.1# Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision) This is a development version from October 1. Please use the mdrun version you think you're using :-) Mark -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
On Mon, Dec 17, 2012 at 7:56 PM, Mark Abraham mark.j.abra...@gmail.comwrote: On Mon, Dec 17, 2012 at 6:01 PM, Albert mailmd2...@gmail.com wrote: hello: I reduced the GPU to two, and it said: Back Off! I just backed up nvt.log to ./#nvt.log.1# Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision) This is a development version from October 1. Please use the mdrun version you think you're using :-) Thanks Mark, good catch! -- Szilárd Mark -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
Well, that's one of the log files. I've tried VERSION 4.6-dev-20121004-5d6c49d, VERSION 4.6-beta1, VERSION 4.6-beta2, and the latest 5.0 from git; the problems are the same. :-( On 12/17/2012 07:56 PM, Mark Abraham wrote: On Mon, Dec 17, 2012 at 6:01 PM, Albert mailmd2...@gmail.com wrote: hello: I reduced the GPU to two, and it said: Back Off! I just backed up nvt.log to ./#nvt.log.1# Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision) This is a development version from October 1. Please use the mdrun version you think you're using :-) Mark
Re: [gmx-users] GPU running problem with GMX-4.6 beta2
On 12/17/12 2:03 PM, Albert wrote: well, that's one of the log files. I've tried both VERSION 4.6-dev-20121004-5d6c49d VERSION 4.6-beta1 VERSION 4.6-beta2 and the latest 5.0 by git. the problems are the same.:-( It seems to me that the system is simply crashing like any other that becomes unstable. Does the simulation run at all on plain CPU? -Justin On 12/17/2012 07:56 PM, Mark Abraham wrote: On Mon, Dec 17, 2012 at 6:01 PM, Albertmailmd2...@gmail.com wrote: hello: I reduced the GPU to two, and it said: Back Off! I just backed up nvt.log to ./#nvt.log.1# Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision) This is a development version from October 1. Please use the mdrun version you think you're using:-) Mark -- -- Justin A. Lemkul, Ph.D. Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
Hi Thomas, It looks like some gcc 4.7-s don't work with CUDA, although I've been using various Ubuntu/Linaro versions, most recently 4.7.2 and had no issues whatsoever. Some people seem to have bumped into the same problem (see http://goo.gl/1onBz or http://goo.gl/JEnuk) and the suggested fix is to put #undef _GLIBCXX_ATOMIC_BUILTINS #undef _GLIBCXX_USE_INT128 in a header and pre-include it for nvcc by calling it like this: nvcc --pre-include undef_atomics_int128.h Cheers, -- Szilárd On Sun, Dec 9, 2012 at 12:18 PM, Thomas Evangelidis teva...@gmail.comwrote: gcc 4.7.2 is not supported by any CUDA version. I suggest that you just fix it by editing the include/host_config.h and changing the version check macro (line 82 AFAIK). I've never had real problems with using new and officially not supported gcc-s, the version check is more of a promise from NVIDIA that we've tested thoroughly internally and we more or less vouch for thins combination. Cheers, -- Szilárd PS: Disclamer: I don't take responsibility if your machine goes up in flames! ;) Hi Szilárd,, I tried to compile gromacs-4.6beta1, is this the version you suggested? If not, please indicate how to download the source cause I am confused with all these development versions. Anyway, this is the error I get with 4.6beta1, gcc 4.7.2 and cuda 5: [ 0%] Building NVCC (Device) object src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir//./cuda_tools_generated_cudautils.cu.o /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/ext/atomicity.h(48): error: identifier __atomic_fetch_add is undefined /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/ext/atomicity.h(52): error: identifier __atomic_fetch_add is undefined 2 errors detected in the compilation of /tmp/tmpxft_2394_-9_cudautils.compute_30.cpp1.ii. CMake Error at cuda_tools_generated_cudautils.cu.o.cmake:252 (message): Error generating file /home/thomas/Programs/gromacs-4.6-beta1_gnu_cuda5_build/src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir//./cuda_tools_generated_cudautils.cu.o gmake[3]: *** [src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir/./cuda_tools_generated_cudautils.cu.o] Error 1 gmake[2]: *** [src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir/all] Error 2 gmake[1]: *** [src/programs/mdrun/CMakeFiles/mdrun.dir/rule] Error 2 gmake: *** [mdrun] Error 2 Unless I am missing something, cuda 5 does not support gcc 4.7.2. Thomas -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
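Concretely, the workaround is a two-line header plus one extra nvcc flag. A sketch of how it might be wired into a GROMACS 4.6 CMake build; the file location and the use of CUDA_NVCC_FLAGS are assumptions, not from the thread:

// undef_atomics_int128.h -- pre-included to work around the gcc 4.7 / CUDA header clash
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

and at configure time:

cmake .. -DGMX_GPU=ON -DCUDA_NVCC_FLAGS="--pre-include /path/to/undef_atomics_int128.h"

CUDA_NVCC_FLAGS goes onto every nvcc command line via FindCUDA, so the header is pre-included only for device code while the host gcc build is untouched.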
Re: [gmx-users] GPU warnings
Am 11.12.2012 16:04, schrieb Szilárd Páll: It looks like some gcc 4.7-s don't work with CUDA, although I've been using various Ubuntu/Linaro versions, most recently 4.7.2 and had no issues whatsoever. Some people seem to have bumped into the same problem (see http://goo.gl/1onBz or http://goo.gl/JEnuk) and the suggested fix is to put #undef _GLIBCXX_ATOMIC_BUILTINS #undef _GLIBCXX_USE_INT128 in a header and pre-include it for nvcc by calling it like this: nvcc --pre-include undef_atomics_int128.h The same problem occurs in SuSE 12.2/x64 with it's default 4.7.2 (20120920). Another possible fix on SuSE 12.2: install the (older) gcc repository from 12.1/x64 (with lower priority), install the gcc/g++ 4.6 from there as an alternative compiler and select the active gcc through the update-alternatives --config gcc mechanism. This works very well. Regards M. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
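The same can be done per build instead of system-wide by pointing CMake at the older compiler; a sketch, assuming the SuSE 12.1 packages install the binaries as gcc-4.6 and g++-4.6 and that the FindCUDA version in use already knows CUDA_HOST_COMPILER:

CC=gcc-4.6 CXX=g++-4.6 cmake .. -DGMX_GPU=ON -DCUDA_HOST_COMPILER=$(which gcc-4.6)

That keeps CUDA 5.0 away from gcc 4.7 for one build tree without touching update-alternatives.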
Re: [gmx-users] GPU warnings
On Tue, Dec 11, 2012 at 6:49 PM, Mirco Wahab mirco.wa...@chemie.tu-freiberg.de wrote: Am 11.12.2012 16:04, schrieb Szilárd Páll: It looks like some gcc 4.7-s don't work with CUDA, although I've been using various Ubuntu/Linaro versions, most recently 4.7.2 and had no issues whatsoever. Some people seem to have bumped into the same problem (see http://goo.gl/1onBz or http://goo.gl/JEnuk) and the suggested fix is to put #undef _GLIBCXX_ATOMIC_BUILTINS #undef _GLIBCXX_USE_INT128 in a header and pre-include it for nvcc by calling it like this: nvcc --pre-include undef_atomics_int128.h The same problem occurs in SuSE 12.2/x64 with it's default 4.7.2 (20120920). Another possible fix on SuSE 12.2: install the (older) gcc repository from 12.1/x64 (with lower priority), install the gcc/g++ 4.6 from there as an alternative compiler and select the active gcc through the update-alternatives --config gcc mechanism. This works very well. Thanks for the info. The Ubuntu/Linaro version must have a fix for this. Unfortunately, we can't do much about it and gcc 4.7 is anyway blocked by the CUDA 5.0 headers. FYI: Verlet scheme nonbonded kernels (and probably the group scheme as well), especially with AVX, can be quite a bit slower with older gcc versions. I find it really annoying (and stupid) that NVIDIA did not fix their compiler to work with gcc 4.7 which had already been out for almost a half a year at the time of the CUDA 5.0 release. -- Szilárd Regards M. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-usershttp://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Searchhttp://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Listshttp://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU compatibility
Correct, C1060 does not have the CUDA 2.0 compute capability required for GROMACS 4.6. We will not have the ability to support GPU cards of lower capability in the future. Unfortunately, your only GROMACS options are probably to use the OpenMM functionality in 4.5.x (which is still present in 4.6, works as far as we know, but is not in our regular test suite and the feature is probably headed for deprecation). This will not perform as well as the new native GPU acceleration, and supports a smaller range of features, but might be better than wasting the GPUs. Regards, Mark On Mon, Dec 10, 2012 at 7:50 AM, Cara Kreck cara_...@hotmail.com wrote: Hi, We've got a GPU cluster in our group and have really been looking forward to running gromacs on it with full functionality. Unfortunately, it looks like our NVIDIA Tesla C1060 cards aren't supported by the 4.6 beta. I was just wondering if there was any chance that they would be supported in the full version? These cards are only a couple of years old now and were bought specifically for running MD. Thanks, Cara -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
gcc 4.7.2 is not supported by any CUDA version. I suggest that you just fix it by editing the include/host_config.h and changing the version check macro (line 82 AFAIK). I've never had real problems with using new and officially not supported gcc-s, the version check is more of a promise from NVIDIA that we've tested thoroughly internally and we more or less vouch for thins combination. Cheers, -- Szilárd PS: Disclamer: I don't take responsibility if your machine goes up in flames! ;) Hi Szilárd,, I tried to compile gromacs-4.6beta1, is this the version you suggested? If not, please indicate how to download the source cause I am confused with all these development versions. Anyway, this is the error I get with 4.6beta1, gcc 4.7.2 and cuda 5: [ 0%] Building NVCC (Device) object src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir//./cuda_tools_generated_cudautils.cu.o /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/ext/atomicity.h(48): error: identifier __atomic_fetch_add is undefined /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/ext/atomicity.h(52): error: identifier __atomic_fetch_add is undefined 2 errors detected in the compilation of /tmp/tmpxft_2394_-9_cudautils.compute_30.cpp1.ii. CMake Error at cuda_tools_generated_cudautils.cu.o.cmake:252 (message): Error generating file /home/thomas/Programs/gromacs-4.6-beta1_gnu_cuda5_build/src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir//./cuda_tools_generated_cudautils.cu.o gmake[3]: *** [src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir/./cuda_tools_generated_cudautils.cu.o] Error 1 gmake[2]: *** [src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir/all] Error 2 gmake[1]: *** [src/programs/mdrun/CMakeFiles/mdrun.dir/rule] Error 2 gmake: *** [mdrun] Error 2 Unless I am missing something, cuda 5 does not support gcc 4.7.2. Thomas -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
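The version-check edit described above is a one-line change in the CUDA toolkit's include/host_config.h. Roughly what the guard looks like in CUDA 5.0 and how it can be relaxed (exact wording and line numbers vary between toolkit versions, so treat this as a sketch):

/* original guard: reject anything newer than gcc 4.6 */
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 6)
#error -- unsupported GNU version! gcc 4.7 and up are not supported!
#endif

/* relaxed guard: accept gcc 4.7 at your own risk */
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 7)

As the disclaimer in the thread says, this only silences the check; the combination stays unsupported, and the __atomic_fetch_add errors seen with 4.6beta1 may still need the --pre-include workaround from the other thread.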
Re: [gmx-users] GPU warnings
On Sun, Nov 25, 2012 at 8:47 PM, Thomas Evangelidis teva...@gmail.com wrote: Hi Szilárd, I was able to run code compiled with icc 13 on Fedora 17, but as I don't have Intel Compiler v13 on this machine I can't check it now. Please check if it works for you with gcc 4.7.2 (which is the default) and let me know if you succeed. The performance difference between icc and gcc on your processor should be negligible with GPU runs and at most 5-10% with CPU-only runs. As the issue is quite annoying, I'll try to have a look later, probably after the beta is out. gcc 4.7.2 is not supported by any CUDA version. I suggest that you just fix it by editing include/host_config.h and changing the version check macro (line 82, AFAIK). I've never had real problems using new, officially unsupported gcc versions; the version check is more of a promise from NVIDIA along the lines of "we've tested this combination thoroughly internally and we more or less vouch for it." Cheers, -- Szilárd PS: Disclaimer: I don't take responsibility if your machine goes up in flames! ;) Thomas -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
Hi Szilárd, I was able to run code compiled with icc 13 on Fedora 17, but as I don't have Intel Compiler v13 on this machine I can't check it now. Please check if it works for you with gcc 4.7.2 (which is the default) and let me know if you succeed. The performance difference between icc and gcc on your processor should be negligible with GPU runs and at most 5-10% with CPU-only runs. As the issue is quite annoying, I'll try to have a look later, probably after the beta is out. gcc 4.7.2 is not supported by any CUDA version. Thomas -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
On Mon, Nov 19, 2012 at 6:25 PM, Szilárd Páll szilard.p...@cbr.su.sewrote: On Mon, Nov 19, 2012 at 4:09 PM, Thomas Evangelidis teva...@gmail.comwrote: Hi Szilárd, I compiled with the Intel compilers, not gcc. In case I am missing something, these are the versions I have: Indeed, I see it now in the log file. Let me try with icc 13 and will get back to you. I was able to run code compiled with icc 13 on Fedora 17, but as I don't have Intel Compiler v13 on this machine I can't check it now. Please check if it works for you with gcc 4.7.2 (which is the default) and let me know if you succeed. The performance difference between icc and gcc on your processor should be negligible with GPU runs and at most 5-10% with CPU-only runs. As the issue is quite annoying, I'll try to have a look later, probably after the beta is out. Cheers, Sz. glibc.i6862.15-57.fc17 @updates glibc.x86_64 2.15-57.fc17 @updates glibc-common.x86_64 2.15-57.fc17 @updates glibc-devel.i686 2.15-57.fc17 @updates glibc-devel.x86_642.15-57.fc17 @updates glibc-headers.x86_64 2.15-57.fc17 @updates gcc.x86_644.7.2-2.fc17 @updates gcc-c++.x86_644.7.2-2.fc17 @updates gcc-gfortran.x86_64 4.7.2-2.fc17 @updates libgcc.i686 4.7.2-2.fc17 @updates libgcc.x86_64 4.7.2-2.fc17 @updates Thomas On 19 November 2012 16:57, Szilárd Páll szilard.p...@cbr.su.se wrote: Thomas Albert, We are unable to reproduce the issue on FC 17 with glibc 2.15-58 and gcc 4.7.2. Please try to update your packages (you should have updates available for glibc), try recompiling with the latest 4.6 code and report back whether you succeed. Cheers, -- Szilárd On Fri, Nov 16, 2012 at 4:31 PM, Szilárd Páll szilard.p...@cbr.su.se wrote: Hi Albert, Apologies for hijacking your thread. Do you happen to have Fedora 17 as well? -- Szilárd On Sun, Nov 4, 2012 at 10:55 AM, Albert mailmd2...@gmail.com wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--** - WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-users http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Search http://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Lists http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! 
* Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list.
Re: [gmx-users] GPU warnings
Thomas Albert, We are unable to reproduce the issue on FC 17 with glibc 2.15-58 and gcc 4.7.2. Please try to update your packages (you should have updates available for glibc), try recompiling with the latest 4.6 code and report back whether you succeed. Cheers, -- Szilárd On Fri, Nov 16, 2012 at 4:31 PM, Szilárd Páll szilard.p...@cbr.su.sewrote: Hi Albert, Apologies for hijacking your thread. Do you happen to have Fedora 17 as well? -- Szilárd On Sun, Nov 4, 2012 at 10:55 AM, Albert mailmd2...@gmail.com wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--** - WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-usershttp://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Searchhttp://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Listshttp://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
Hi Szilárd, I compiled with the Intel compilers, not gcc. In case I am missing something, these are the versions I have: glibc.i6862.15-57.fc17 @updates glibc.x86_64 2.15-57.fc17 @updates glibc-common.x86_64 2.15-57.fc17 @updates glibc-devel.i686 2.15-57.fc17 @updates glibc-devel.x86_642.15-57.fc17 @updates glibc-headers.x86_64 2.15-57.fc17 @updates gcc.x86_644.7.2-2.fc17 @updates gcc-c++.x86_644.7.2-2.fc17 @updates gcc-gfortran.x86_64 4.7.2-2.fc17 @updates libgcc.i686 4.7.2-2.fc17 @updates libgcc.x86_64 4.7.2-2.fc17 @updates Thomas On 19 November 2012 16:57, Szilárd Páll szilard.p...@cbr.su.se wrote: Thomas Albert, We are unable to reproduce the issue on FC 17 with glibc 2.15-58 and gcc 4.7.2. Please try to update your packages (you should have updates available for glibc), try recompiling with the latest 4.6 code and report back whether you succeed. Cheers, -- Szilárd On Fri, Nov 16, 2012 at 4:31 PM, Szilárd Páll szilard.p...@cbr.su.se wrote: Hi Albert, Apologies for hijacking your thread. Do you happen to have Fedora 17 as well? -- Szilárd On Sun, Nov 4, 2012 at 10:55 AM, Albert mailmd2...@gmail.com wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--** - WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-users http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Search http://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Lists http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. 
Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
On Mon, Nov 19, 2012 at 4:09 PM, Thomas Evangelidis teva...@gmail.comwrote: Hi Szilárd, I compiled with the Intel compilers, not gcc. In case I am missing something, these are the versions I have: Indeed, I see it now in the log file. Let me try with icc 13 and will get back to you. glibc.i6862.15-57.fc17 @updates glibc.x86_64 2.15-57.fc17 @updates glibc-common.x86_64 2.15-57.fc17 @updates glibc-devel.i686 2.15-57.fc17 @updates glibc-devel.x86_642.15-57.fc17 @updates glibc-headers.x86_64 2.15-57.fc17 @updates gcc.x86_644.7.2-2.fc17 @updates gcc-c++.x86_644.7.2-2.fc17 @updates gcc-gfortran.x86_64 4.7.2-2.fc17 @updates libgcc.i686 4.7.2-2.fc17 @updates libgcc.x86_64 4.7.2-2.fc17 @updates Thomas On 19 November 2012 16:57, Szilárd Páll szilard.p...@cbr.su.se wrote: Thomas Albert, We are unable to reproduce the issue on FC 17 with glibc 2.15-58 and gcc 4.7.2. Please try to update your packages (you should have updates available for glibc), try recompiling with the latest 4.6 code and report back whether you succeed. Cheers, -- Szilárd On Fri, Nov 16, 2012 at 4:31 PM, Szilárd Páll szilard.p...@cbr.su.se wrote: Hi Albert, Apologies for hijacking your thread. Do you happen to have Fedora 17 as well? -- Szilárd On Sun, Nov 4, 2012 at 10:55 AM, Albert mailmd2...@gmail.com wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--** - WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-users http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Search http://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Lists http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? 
Read http://www.gromacs.org/Support/Mailing_Lists -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
Hi Thomas, The output you get means that you don't have any of the macros we try to use although your man pages seem to be referring to them. Hence, I'm really clueless why is this happening. Could you please file a bug report on redmine.gromacs.org and add both the initial output as well as my patch and the resulting output. Don't forget to specify version of software you were using. Thanks, -- Szilárd On Thu, Nov 15, 2012 at 3:53 PM, Thomas Evangelidis teva...@gmail.comwrote: Hi Szilárd, This is the warning message I get this time: WARNING: Oversubscribing the available -66 logical CPU cores with 1 thread-MPI threads. This will cause considerable performance loss! I have also attached the md.log file. thanks, Thomas On 14 November 2012 19:48, Szilárd Páll szilard.p...@cbr.su.se wrote: Hi Thomas, Could you please try applying the attached patch (git apply hardware_detect.patch in the 4.6 source root) and let me know what the output is? This should show which sysconf macro is used and what its return value is as well as indicate if none of the macros are in fact defined by your headers. Thanks, -- Szilárd On Sat, Nov 10, 2012 at 5:24 PM, Thomas Evangelidis teva...@gmail.comwrote: On 10 November 2012 03:21, Szilárd Páll szilard.p...@cbr.su.se wrote: Hi, You must have an odd sysconf version! Could you please check what is the sysconf system variable's name in the sysconf man page (man sysconf) where it says something like: _SC_NPROCESSORS_ONLN The number of processors currently online. The first line should be one of the following: _SC_NPROCESSORS_ONLN, _SC_NPROC_ONLN, _SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something different. The following text is taken from man sysconf: These values also exist, but may not be standard. - _SC_PHYS_PAGES The number of pages of physical memory. Note that it is possible for the product of this value and the value of _SC_PAGE_SIZE to overflow. - _SC_AVPHYS_PAGES The number of currently available pages of physical memory. - _SC_NPROCESSORS_CONF The number of processors configured. - _SC_NPROCESSORS_ONLN The number of processors currently online (available). Can you also check what your glibc version is? $ yum list installed | grep glibc glibc.i6862.15-57.fc17 @updates glibc.x86_64 2.15-57.fc17 @updates glibc-common.x86_64 2.15-57.fc17 @updates glibc-devel.i686 2.15-57.fc17 @updates glibc-devel.x86_642.15-57.fc17 @updates glibc-headers.x86_64 2.15-57.fc17 @updates On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis teva...@gmail.comwrote: I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench benchmark with the following command line: mdrun_intel_cuda5 -v -s topol.tpr -testverlet WARNING: Oversubscribing the available 0 logical CPU cores with 1 thread-MPI threads. 0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM That is bizzarre. Could you run with -debug 1 and have a look at the mdrun.debug output which should contain a message like: Detected N processors, will use this as the number of supported hardware threads. I'm wondering, is N=0 in your case!? It says Detected 0 processors, will use this as the number of supported hardware threads. (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4 ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4 cores (8 threads) without the GPU. Yet, I don't see any performance gain with more that 4 -nt threads. 
mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day I guess there is not much point in not using all cores, is it? Note that the performance drops after 4 threads because Hyper-Threading with OpenMP doesn't always help. I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr -testverlet) in case you find it helpful. I don't see it attached. I have attached both mdrun_intel_cuda5.debug and md.log files. They will possibly be filtered by the mailing list but will be delivered to your email. thanksm Thomas -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com
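A quick way to see what the C library itself reports for the processor-count variables discussed in this thread (getconf and nproc are standard Linux tools; a healthy i7-3610QM should report 8, not 0):
  getconf _NPROCESSORS_ONLN   # processors currently online, as sysconf sees them
  getconf _NPROCESSORS_CONF   # processors configured
  nproc                       # coreutils wrapper around the same information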
Re: [gmx-users] GPU warnings
Hi Albert, Apologies for hijacking your thread. Do you happen to have Fedora 17 as well? -- Szilárd On Sun, Nov 4, 2012 at 10:55 AM, Albert mailmd2...@gmail.com wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--**- WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-usershttp://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Searchhttp://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Listshttp://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
On 11/15/12 9:53 AM, Thomas Evangelidis wrote: Hi Szilárd, This is the warning message I get this time: WARNING: Oversubscribing the available -66 logical CPU cores with 1 thread-MPI threads. This will cause considerable performance loss! I have also attached the md.log file. Attachments are rejected by the mailing list. They either have to be copied and pasted, linked, or sent to an individual specifically off-list. -Justin -- Justin A. Lemkul, Ph.D. Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
Hi Thomas, Could you please try applying the attached patch (git apply hardware_detect.patch in the 4.6 source root) and let me know what the output is? This should show which sysconf macro is used and what its return value is as well as indicate if none of the macros are in fact defined by your headers. Thanks, -- Szilárd On Sat, Nov 10, 2012 at 5:24 PM, Thomas Evangelidis teva...@gmail.comwrote: On 10 November 2012 03:21, Szilárd Páll szilard.p...@cbr.su.se wrote: Hi, You must have an odd sysconf version! Could you please check what is the sysconf system variable's name in the sysconf man page (man sysconf) where it says something like: _SC_NPROCESSORS_ONLN The number of processors currently online. The first line should be one of the following: _SC_NPROCESSORS_ONLN, _SC_NPROC_ONLN, _SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something different. The following text is taken from man sysconf: These values also exist, but may not be standard. - _SC_PHYS_PAGES The number of pages of physical memory. Note that it is possible for the product of this value and the value of _SC_PAGE_SIZE to overflow. - _SC_AVPHYS_PAGES The number of currently available pages of physical memory. - _SC_NPROCESSORS_CONF The number of processors configured. - _SC_NPROCESSORS_ONLN The number of processors currently online (available). Can you also check what your glibc version is? $ yum list installed | grep glibc glibc.i6862.15-57.fc17 @updates glibc.x86_64 2.15-57.fc17 @updates glibc-common.x86_64 2.15-57.fc17 @updates glibc-devel.i686 2.15-57.fc17 @updates glibc-devel.x86_642.15-57.fc17 @updates glibc-headers.x86_64 2.15-57.fc17 @updates On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis teva...@gmail.comwrote: I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench benchmark with the following command line: mdrun_intel_cuda5 -v -s topol.tpr -testverlet WARNING: Oversubscribing the available 0 logical CPU cores with 1 thread-MPI threads. 0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM That is bizzarre. Could you run with -debug 1 and have a look at the mdrun.debug output which should contain a message like: Detected N processors, will use this as the number of supported hardware threads. I'm wondering, is N=0 in your case!? It says Detected 0 processors, will use this as the number of supported hardware threads. (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4 ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4 cores (8 threads) without the GPU. Yet, I don't see any performance gain with more that 4 -nt threads. mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day I guess there is not much point in not using all cores, is it? Note that the performance drops after 4 threads because Hyper-Threading with OpenMP doesn't always help. I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr -testverlet) in case you find it helpful. I don't see it attached. I have attached both mdrun_intel_cuda5.debug and md.log files. They will possibly be filtered by the mailing list but will be delivered to your email. 
thanksm Thomas -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
On 10 November 2012 03:21, Szilárd Páll szilard.p...@cbr.su.se wrote: Hi, You must have an odd sysconf version! Could you please check what is the sysconf system variable's name in the sysconf man page (man sysconf) where it says something like: _SC_NPROCESSORS_ONLN The number of processors currently online. The first line should be one of the following: _SC_NPROCESSORS_ONLN, _SC_NPROC_ONLN, _SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something different. The following text is taken from man sysconf: These values also exist, but may not be standard. - _SC_PHYS_PAGES The number of pages of physical memory. Note that it is possible for the product of this value and the value of _SC_PAGE_SIZE to overflow. - _SC_AVPHYS_PAGES The number of currently available pages of physical memory. - _SC_NPROCESSORS_CONF The number of processors configured. - _SC_NPROCESSORS_ONLN The number of processors currently online (available). Can you also check what your glibc version is? $ yum list installed | grep glibc glibc.i6862.15-57.fc17 @updates glibc.x86_64 2.15-57.fc17 @updates glibc-common.x86_64 2.15-57.fc17 @updates glibc-devel.i686 2.15-57.fc17 @updates glibc-devel.x86_642.15-57.fc17 @updates glibc-headers.x86_64 2.15-57.fc17 @updates On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis teva...@gmail.comwrote: I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench benchmark with the following command line: mdrun_intel_cuda5 -v -s topol.tpr -testverlet WARNING: Oversubscribing the available 0 logical CPU cores with 1 thread-MPI threads. 0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM That is bizzarre. Could you run with -debug 1 and have a look at the mdrun.debug output which should contain a message like: Detected N processors, will use this as the number of supported hardware threads. I'm wondering, is N=0 in your case!? It says Detected 0 processors, will use this as the number of supported hardware threads. (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4 ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4 cores (8 threads) without the GPU. Yet, I don't see any performance gain with more that 4 -nt threads. mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day I guess there is not much point in not using all cores, is it? Note that the performance drops after 4 threads because Hyper-Threading with OpenMP doesn't always help. I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr -testverlet) in case you find it helpful. I don't see it attached. I have attached both mdrun_intel_cuda5.debug and md.log files. They will possibly be filtered by the mailing list but will be delivered to your email. thanksm Thomas -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! 
* Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
Hi, On Tue, Nov 6, 2012 at 12:03 AM, Thomas Evangelidis teva...@gmail.comwrote: Hi, I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench benchmark with the following command line: mdrun_intel_cuda5 -v -s topol.tpr -testverlet WARNING: Oversubscribing the available 0 logical CPU cores with 1 thread-MPI threads. 0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM That is bizzarre. Could you run with -debug 1 and have a look at the mdrun.debug output which should contain a message like: Detected N processors, will use this as the number of supported hardware threads. I'm wondering, is N=0 in your case!? (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4 ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4 cores (8 threads) without the GPU. Yet, I don't see any performance gain with more that 4 -nt threads. mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day I guess there is not much point in not using all cores, is it? Note that the performance drops after 4 threads because Hyper-Threading with OpenMP doesn't always help. I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr -testverlet) in case you find it helpful. I don't see it attached. -- Szilárd Thanks, Thomas On 5 November 2012 18:54, Szilárd Páll szilard.p...@cbr.su.se wrote: The first warning indicates that you are starting more threads than the hardware supports which would explain the poor performance. Could share a log file of the suspiciously slow run as well as the command line you used to start mdrun? Cheers, -- Szilárd On Sun, Nov 4, 2012 at 5:32 PM, Albert mailmd2...@gmail.com wrote: well, IC. the performance is rather poor than GTX590. 32ns/day vs 4 ns/day probably that's also something related to the warnings? THX On 11/04/2012 01:59 PM, Justin Lemkul wrote: On 11/4/12 4:55 AM, Albert wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--** - WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. I can't address the first warning, but the second is fairly obvious. You're not using an official release, you're using the development version - let the user beware. The code is not yet production-ready. 
-Justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-users http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Search http://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Lists http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website:
Re: [gmx-users] GPU warnings
Hi, You must have an odd sysconf version! Could you please check what is the sysconf system variable's name in the sysconf man page (man sysconf) where it says something like: _SC_NPROCESSORS_ONLN The number of processors currently online. The first line should be one of the following: _SC_NPROCESSORS_ONLN, _SC_NPROC_ONLN, _SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something different. Can you also check what your glibc version is? Thanks, -- Szilárd On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis teva...@gmail.comwrote: I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench benchmark with the following command line: mdrun_intel_cuda5 -v -s topol.tpr -testverlet WARNING: Oversubscribing the available 0 logical CPU cores with 1 thread-MPI threads. 0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM That is bizzarre. Could you run with -debug 1 and have a look at the mdrun.debug output which should contain a message like: Detected N processors, will use this as the number of supported hardware threads. I'm wondering, is N=0 in your case!? It says Detected 0 processors, will use this as the number of supported hardware threads. (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4 ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4 cores (8 threads) without the GPU. Yet, I don't see any performance gain with more that 4 -nt threads. mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day I guess there is not much point in not using all cores, is it? Note that the performance drops after 4 threads because Hyper-Threading with OpenMP doesn't always help. I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr -testverlet) in case you find it helpful. I don't see it attached. I have attached both mdrun_intel_cuda5.debug and md.log files. They will possibly be filtered by the mailing list but will be delivered to your email. thanksm Thomas -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
The first warning indicates that you are starting more threads than the hardware supports which would explain the poor performance. Could share a log file of the suspiciously slow run as well as the command line you used to start mdrun? Cheers, -- Szilárd On Sun, Nov 4, 2012 at 5:32 PM, Albert mailmd2...@gmail.com wrote: well, IC. the performance is rather poor than GTX590. 32ns/day vs 4 ns/day probably that's also something related to the warnings? THX On 11/04/2012 01:59 PM, Justin Lemkul wrote: On 11/4/12 4:55 AM, Albert wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--** - WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. I can't address the first warning, but the second is fairly obvious. You're not using an official release, you're using the development version - let the user beware. The code is not yet production-ready. -Justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-usershttp://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Searchhttp://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Listshttp://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
Hi, I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench benchmark with the following command line: mdrun_intel_cuda5 -v -s topol.tpr -testverlet WARNING: Oversubscribing the available 0 logical CPU cores with 1 thread-MPI threads. 0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4 ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4 cores (8 threads) without the GPU. Yet, I don't see any performance gain with more that 4 -nt threads. mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr -testverlet) in case you find it helpful. Thanks, Thomas On 5 November 2012 18:54, Szilárd Páll szilard.p...@cbr.su.se wrote: The first warning indicates that you are starting more threads than the hardware supports which would explain the poor performance. Could share a log file of the suspiciously slow run as well as the command line you used to start mdrun? Cheers, -- Szilárd On Sun, Nov 4, 2012 at 5:32 PM, Albert mailmd2...@gmail.com wrote: well, IC. the performance is rather poor than GTX590. 32ns/day vs 4 ns/day probably that's also something related to the warnings? THX On 11/04/2012 01:59 PM, Justin Lemkul wrote: On 11/4/12 4:55 AM, Albert wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---**messages--** - WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. I can't address the first warning, but the second is fairly obvious. You're not using an official release, you're using the development version - let the user beware. The code is not yet production-ready. -Justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/**mailman/listinfo/gmx-users http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/** Support/Mailing_Lists/Search http://www.gromacs.org/Support/Mailing_Lists/Searchbefore posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/**Support/Mailing_Lists http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! 
* Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
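A hedged re-creation of the -nt scan Thomas reports above, using the binary name and topol.tpr from the thread; it collects the ns/day line from each run's log:
  for nt in 1 2 3 4 6 8; do
      mdrun_intel_cuda5 -v -nt $nt -s topol.tpr -testverlet -g nt${nt}.log
      grep -H "Performance:" nt${nt}.log
  done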
Re: [gmx-users] GPU warnings
On 11/4/12 4:55 AM, Albert wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---messages--- WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. I can't address the first warning, but the second is fairly obvious. You're not using an official release, you're using the development version - let the user beware. The code is not yet production-ready. -Justin -- Justin A. Lemkul, Ph.D. Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU warnings
I'm also getting the first warning (oversubscribing the available...) and see no obvious performance gain. Do you know how to avoid that? thanks, Thomas On 4 November 2012 14:59, Justin Lemkul jalem...@vt.edu wrote: On 11/4/12 4:55 AM, Albert wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---messages--- WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. I can't address the first warning, but the second is fairly obvious. You're not using an official release, you're using the development version - let the user beware. The code is not yet production-ready. -Justin -- == Justin A. Lemkul, Ph.D. Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin == -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- == Thomas Evangelidis PhD student University of Athens Faculty of Pharmacy Department of Pharmaceutical Chemistry Panepistimioupoli-Zografou 157 71 Athens GREECE email: tev...@pharm.uoa.gr teva...@gmail.com website: https://sites.google.com/site/thomasevangelidishomepage/ -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
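One stop-gap while the core-detection bug is being chased down is simply to pin the thread count by hand; the value 8 below is an assumption matching the i7-3610QM discussed elsewhere in this thread:
  mdrun -v -nt 8 -s topol.tpr -testverlet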
Re: [gmx-users] GPU warnings
Well, I see. The performance is rather poor compared to the GTX 590: 32 ns/day vs 4 ns/day. Probably that's also related to the warnings? Thanks. On 11/04/2012 01:59 PM, Justin Lemkul wrote: On 11/4/12 4:55 AM, Albert wrote: hello: I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344 CUDA cores), and I got the following warnings: thank you very much. ---messages--- WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node with 2 MPI processes. This will cause considerable performance loss! 2 GPUs detected on host boreas: #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible 2 GPUs auto-selected to be used for this run: #0, #1 Using CUDA 8x8x8 non-bonded kernels Making 1D domain decomposition 1 x 2 x 1 * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING * We have just committed the new CPU detection code in this branch, and will commit new SSE/AVX kernels in a few days. However, this means that currently only the NxN kernels are accelerated! In the mean time, you might want to avoid production runs in 4.6. I can't address the first warning, but the second is fairly obvious. You're not using an official release, you're using the development version - let the user beware. The code is not yet production-ready. -Justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! * Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU-C2075-simulation-solw or GPU only running -reg
On 10/21/12 3:38 PM, venkatesh s wrote: Respected Gromacs people's, my query is my system very slow? how can i improve the speed, its running like or equal to (25 minutes) Intel Core I 7 processors only. Here i am given my entire system information,and i found my system 8 core not taking job (GPU only running). mdrun-gpu -device OpenMM:platform=Cuda,memtest=15,deviceid=0,force-device=yes -v -deffnm nvt Non-supported GPU selected (#0, Tesla C2075), forced continuing.Note, that the simulation can be slow or it migth even crash. Pre-simulation ~15s memtest in progress... Memory test completed without errors. Back Off! I just backed up nvt.log to ./#nvt.log.1# Getting Loaded... Reading file nvt.tpr, VERSION 4.5.5 (single precision) Loaded with Money Back Off! I just backed up nvt.trr to ./#nvt.trr.1# Back Off! I just backed up nvt.edr to ./#nvt.edr.1# WARNING: OpenMM supports only Andersen thermostat with the md/md-vv/md-vv-avek integrators. WARNING: OpenMM provides contraints as a combination of SHAKE, SETTLE and CCMA. Accuracy is based on the SHAKE tolerance set by the shake_tol option. WARNING: Non-supported GPU selected (#0, Tesla C2075), forced continuing.Note, that the simulation can be slow or it migth even crash. Pre-simulation ~15s memtest in progress...done, no errors detected starting mdrun 'Protein in water' 5 steps,100.0 ps. OpenMM run - timing based on wallclock. NODE (s) Real (s) (%) Time: 1319.043 1319.043100.0 21:59 (Mnbf/s) (MFlops) (ns/day) (hour/ns) Performance: 0.000 0.006 6.550 3.664 NVIDIA-SMI -l +--+ | NVIDIA-SMI 3.295.59 Driver Version: 295.59 | |---+--+--+ | Nb. Name | Bus IdDisp. | Volatile ECC SB / DB | | Fan Temp Power Usage /Cap | Memory Usage | GPU Util. Compute M. | |===+==+==| | 0. Tesla C2075 | :01:00.0 On | 0 0 | | 30% 75 C P0 150W / 225W | 8% 435MB / 5375MB | 95% Default| |---+--+--| | Compute processes: GPU Memory | | GPU PID Process name Usage | |=| | 0. 5889 mdrun-gpu 372MB | +-+ system: top top - 22:48:22 up 13 min, 4 users, load average: 0.19, 0.18, 0.09 Tasks: 308 total, 2 running, 304 sleeping, 2 stopped, 0 zombie Cpu0 : 16.4%us, 1.7%sy, 0.0%ni, 81.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 5.4%us, 0.7%sy, 0.0%ni, 94.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 9.3%us, 0.7%sy, 0.0%ni, 90.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.7%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 13.0%us, 0.7%sy, 0.0%ni, 86.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 12188656k total, 1191628k used, 10997028k free,34804k buffers Swap:0k total,0k used,0k free, 418428k cached system? 
protein +sol + NA total atom(nvt.gro) 158 residues 10742234646 npt.mdp file ; Run parameters integrator= md-vv; nsteps= 5; 2 * 5 = 100 ps dt= 0.002; 2 fs ; Output control nstxout= 100; save coordinates every 0.2 ps nstvout= 100; save velocities every 0.2 ps nstenergy= 100; save energies every 0.2 ps nstlog= 100; update log file every 0.2 ps ; Bond parameters continuation= yes; Restarting after NVT constraint_algorithm = lincs; holonomic constraints constraints= all-bonds; all bonds (even heavy atom-H bonds) constrained lincs_iter= 1; accuracy of LINCS lincs_order= 4; also related to accuracy ; Neighborsearching ns_type= grid; search neighboring grid cells nstlist= 5; 10 fs rlist= 1.0; short-range neighborlist cutoff (in nm) rcoulomb= 1.0; short-range electrostatic cutoff (in nm) rvdw= 1.0; short-range van der Waals cutoff (in nm) ; Electrostatics coulombtype= PME; Particle Mesh Ewald for long-range electrostatics pme_order= 4; cubic interpolation fourierspacing= 0.16; grid spacing for FFT ; Temperature coupling is on
Re: [gmx-users] GPU-C2075-simulation-solw -reg
On 10/20/12 1:34 PM, venkatesh s wrote: Respected Gromacs users, I started the energy simulation but it is slow (showing the following): Getting Loaded... Reading file em.tpr, VERSION 4.5.5 (single precision) Loaded with Money WARNING: Non-supported GPU selected (#0, Tesla C2075), forced continuing. Note, that the simulation can be slow or it might even crash. Pre-simulation ~15s memtest in progress...done, no errors detected starting mdrun 'Protein in water' 5 steps, 50.0 ps. What should I do to increase the GPU speed? Kindly provide a prompt solution. No one can suggest a solution without a better statement of the problem. What is your system? How many atoms does it have? How fast is it running? What is your .mdp file? How do the benchmark systems perform? -Justin -- Justin A. Lemkul, Ph.D. Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
Re: [gmx-users] GPU -simulation error -reg
On 10/14/12 8:01 AM, venkatesh s wrote: Respected Gromacs people, my system contains protein + peptide (normally I use the lysozyme tutorial md.mdp; I only change the length of the run). While running mdrun-gpu -v -deffnm md_0_1 I got a fatal error like this: -- Getting Loaded... Reading file md_0_1.tpr, VERSION 4.5.5 (single precision) Loaded with Money WARNING: OpenMM does not support leap-frog, will use velocity-verlet integrator. WARNING: OpenMM supports only Andersen thermostat with the md/md-vv/md-vv-avek integrators. --- Program mdrun-gpu, VERSION 4.5.5 Source code file: /opt/softwares/compile/gromacs-4.5.5/src/kernel/openmm_wrapper.cpp, line: 580 Fatal error: OpenMM does not support multiple temperature coupling groups. For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors --- Kindly provide a prompt answer. The error message is fairly self-explanatory. You are using multiple temperature coupling groups (tc-grps in the .mdp file). You can't do that when running on GPU. Set tc-grps = System. -Justin -- Justin A. Lemkul, Ph.D. Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
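Concretely, the fix Justin describes is a one-line change in the .mdp. A sketch, assuming the poster's file uses the usual Protein / Non-Protein split from the lysozyme tutorial he mentions:

  ; before -- rejected by mdrun-gpu (OpenMM)
  tc-grps = Protein Non-Protein
  tau_t   = 0.1     0.1
  ref_t   = 300     300

  ; after -- a single coupling group for the whole system
  tc-grps = System
  tau_t   = 0.1
  ref_t   = 300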
Re: [gmx-users] GPU
On Wed, Jun 13, 2012 at 3:59 AM, Mark Abraham mark.abra...@anu.edu.au wrote:

On 12/06/2012 10:49 PM, Ehud Schreiber wrote:

Message: 4 Date: Mon, 11 Jun 2012 15:54:39 +1000 From: Mark Abraham mark.abra...@anu.edu.au Subject: Re: [gmx-users] GPU To: Discussion list for GROMACS users gmx-users@gromacs.org Message-ID: 4fd5881f.3040...@anu.edu.au Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 11/06/2012 2:32 AM, ifat shub wrote: Hi, If I understand correctly, currently the Gromacs GPU acceleration does not support energy minimization. Is this so? Are there any plans to include it in the 4.6 version or in a later one (i.e. to allow, say, integrator = steep or cg in mdrun-gpu)? I would find such options extremely useful.

EM is normally so quick that it's not worth putting much effort into accelerating it, compared to the CPU-months that are spent doing subsequent MD. Mark

Currently, my main use of Gromacs entails running multiple minimizations on an ensemble of states. Moreover, these states are not obtained using molecular dynamics but rather using the Concoord algorithm. Therefore, for me the bottleneck is not md but rather minimizations (specifically, cg) and so their acceleration on GPUs would be very advantageous. If such usage is not totally idiosyncratic, I hope the development team would reconsider GPU accelerating also minimizations. I suspect this would not be technically too complex given the work already done on dynamics.

I suspect the upcoming 4.6 release will have GPU-accelerated EM available as a side effect of the new Verlet pair-list scheme for computing non-bonded interactions. This development is unrelated to previous GPU efforts, I understand. See http://www.gromacs.org/Documentation/Acceleration_and_parallelization and http://www.gromacs.org/Documentation/Cut-off_schemes for some advance details. When you hear a call for alpha testers in the next few months, you might want to spend some time on that so that you're sure GROMACS will best meet your future needs. :-) Mark

It does work and has been tested extensively. We are working on the final details, but you can get the code from the nbnxn_hybrid_acc branch -- it's pretty safe to use it for non-production purposes! The pages Mark linked are the resources you want to start with before you start using the NxN kernels.

Cheers, -- Szilárd
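For anyone wanting to try the new native GPU path discussed here once 4.6 appears, the relevant .mdp change is the cut-off scheme. A minimal sketch (option names as described on the Cut-off_schemes page linked above; the rest of the .mdp stays as usual):

  cutoff-scheme = Verlet   ; required for the new NxN / GPU non-bonded kernels
  nstlist       = 10       ; with the Verlet scheme this acts as a minimum and may be increased by mdrun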
Re: [gmx-users] GPU
Message: 4 Date: Mon, 11 Jun 2012 15:54:39 +1000 From: Mark Abraham mark.abra...@anu.edu.au Subject: Re: [gmx-users] GPU To: Discussion list for GROMACS users gmx-users@gromacs.org Message-ID: 4fd5881f.3040...@anu.edu.au Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 11/06/2012 2:32 AM, ifat shub wrote: Hi, If I understand correctly, currently the Gromacs GPU acceleration does not support energy minimization. Is this so? Are there any plans to include it in the 4.6 version or in a later one (i.e. to allow, say, integrator = steep or cg in mdrun-gpu)? I would find such options extremely useful. EM is normally so quick that it's not worth putting much effort into accelerating it, compared to the CPU-months that are spent doing subsequent MD. Mark Currently, my main use of Gromacs entails running multiple minimizations on an ensemble of states. Moreover, these states are not obtained using molecular dynamics but rather using the Concoord algorithm. Therefore, for me the bottleneck is not md but rather minimizations (specifically, cg) and so their acceleration on GPUs would be very advantageous. If such usage is not totally idiosyncratic, I hope the development team would reconsider GPU accelerating also minimizations. I suspect this would not be technically too complex given the work already done on dynamics. Thanks, Ehud Schreiber. -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU
On 12/06/2012 10:49 PM, Ehud Schreiber wrote: Message: 4 Date: Mon, 11 Jun 2012 15:54:39 +1000 From: Mark Abrahammark.abra...@anu.edu.au Subject: Re: [gmx-users] GPU To: Discussion list for GROMACS usersgmx-users@gromacs.org Message-ID:4fd5881f.3040...@anu.edu.au Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 11/06/2012 2:32 AM, ifat shub wrote: Hi, If I understand correctly, currently the Gromacs GPU acceleration does not support energy minimization. Is this so? Are there any plans to include it in the 4.6 version or in a later one (i.e. to allow, say, integrator = steep or cg in mdrun-gpu)? I would find such options extremely useful. EM is normally so quick that it's not worth putting much effort into accelerating it, compared to the CPU-months that are spent doing subsequent MD. Mark Currently, my main use of Gromacs entails running multiple minimizations on an ensemble of states. Moreover, these states are not obtained using molecular dynamics but rather using the Concoord algorithm. Therefore, for me the bottleneck is not md but rather minimizations (specifically, cg) and so their acceleration on GPUs would be very advantageous. If such usage is not totally idiosyncratic, I hope the development team would reconsider GPU accelerating also minimizations. I suspect this would not be technically too complex given the work already done on dynamics. I suspect the upcoming 4.6 release will have GPU-accelerated EM available as a side effect of the new Verlet pair-list scheme for computing non-bonded interactions. This development is unrelated to previous GPU efforts, I understand. See http://www.gromacs.org/Documentation/Acceleration_and_parallelization and http://www.gromacs.org/Documentation/Cut-off_schemes for some advance details. When you hear a call for alpha testers in the next few months, you might want to spend some time on that so that you're sure GROMACS will best meet your future needs. :-) Mark -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU
On 11/06/2012 2:32 AM, ifat shub wrote: Hi, If I understand correctly, currently the Gromacs GPU acceleration does not support energy minimization. Is this so? Are there any plans to include it in the 4.6 version or in a later one (i.e. to allow, say, integrator = steep or cg in mdrun-gpu)? I would find such options extremely useful. EM is normally so quick that it's not worth putting much effort into accelerating it, compared to the CPU-months that are spent doing subsequent MD. Mark -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
Re: [gmx-users] GPU crashes
Did you play with the time step? Just currious, but I woundered what happened with 0.0008, 0.0005, 0.0002. I found if I had a good behaving protein, as soon as I added a small (non-protein) molecule which rotated wildly while attached to the protein, it would crash unless I reduced the time step to the above when constraints were removed after EQ ... always it seemed to me it didnt like the rotation or bond angles, seeing them as a violation but acted like it was an amino acid? (the same bond type but with wider rotation as one end wasnt fixed to a chain) If your loop moves via backbone, the calculated angles, bonds or whatever might appear to the computer to be violating the parameter settings for problems, errors, etc as it cant track them fast enough over the time step. Ie atom 1-2-3 and then delta 1-2-3 with xyz parameters, but then the particular set has additional rotation, etc and may include the chain atoms which bend wildly (n-Ca-Cb-Cg maybe a dihedral) but probab ly not this. Just a thought but probably not the right answere as well, it might be the way it is broken down (above) over GPUs, which convert everything to matricies (non-standard just for basic math operations not real matricies per say) for exicution and then some library problem which would not account for long range rapid (0.0005) movements at the chain (Ca,N,O to something else) and then tries to apply these to Cb-Cg-O-H, etc using the initial points while looking at the parameters for say a single amino acid...Maybe the constraints would cause this, which would make it a pain to EQ, but this allowed me to increase the time step, but would ruin the experiment I had worked on as I needed it unconstrained to show it didnt float away when proteins were pulled, etc...I was using a different integrator though...just normal MD. ANd your cutoffs for vdw, etc...Why are they 0? I dont know if this means a defautl set is then used...but if not ? Wouldnt they try integrating using both types of formula, or would it be just using coulumb or vice versa? (dont know what that would do to the code but assume it means no vdw, and all coulumb but then zeros are alwyas a problem for computers). Thats my thoughts on that. Probably something else though. Good luck, Stephan Original-Nachricht Datum: Wed, 06 Jun 2012 18:42:45 -0400 Von: Justin A. Lemkul jalem...@vt.edu An: Discussion list for GROMACS users gmx-users@gromacs.org Betreff: [gmx-users] GPU crashes Hi All, I'm wondering if anyone has experienced what I'm seeing with Gromacs 4.5.5 on GPU. It seems that certain systems fail inexplicably. The system I am working with is a heterodimeric protein complex bound to DNA. After about 1 ns of simulation time using mdrun-gpu, all the energies become NaN. The simulations don't stop, they just carry on merrily producing nonsense. I would love to see some action regarding http://redmine.gromacs.org/issues/941 for this reason ;) I ran simulations of each of the components of the system individually - each protein alone, and DNA - to try to track down what might be causing this problem. The DNA simulation is perfectly stable out to 10 ns, but each protein fails within 2 ns. Each protein has two domains with a flexible linker, and it seems that as soon as the linker flexes a bit, the simulations go poof. Well-behaved proteins like lysozyme and DHFR (from the benchmark set) seem fine, but anything that twitches even a small amount fails. 
This is very unfortunate for us, as we are hoping to see domain motions on a feasible time scale using implicit solvent on GPU hardware. Has anyone seen anything like this? Our Gromacs implementation is being run on an x86_64 Linux system with Tesla S2050 GPU cards. The CUDA version is 3.1 and Gromacs is linked against OpenMM-2.0. An .mdp file is appended below. I have also tested finite values for cutoffs, but the results were worse (failures occurred more quickly). I have not been able to use the latest git version of Gromacs to test whether anything has been fixed, but will post separately to gmx-developers regarding the reasons for that soon. -Justin

=== md.mdp ===
title      = Implicit solvent test
; Run parameters
integrator = sd
dt         = 0.002
nsteps     = 500        ; 1 ps (10 ns)
nstcomm    = 1
comm_mode  = angular    ; non-periodic system
; Output parameters
nstxout    = 0
nstvout    = 0
nstfout    = 0
nstxtcout  = 1000       ; every 2 ps
nstlog     = 5000       ; every 10 ps
nstenergy  = 1000       ; every 2 ps
; Bond parameters
constraint_algorithm = lincs
constraints          = all-bonds
continuation         = no         ; starting up
; required cutoffs for implicit
nstlist    = 0
ns_type    = grid
rlist      = 0
rcoulomb   = 0
rvdw       = 0
Re: [gmx-users] GPU crashes
On 6/7/12 3:57 AM, lloyd riggs wrote: Did you play with the time step? Just currious, but I woundered what happened with 0.0008, 0.0005, 0.0002. I found if I had a good behaving protein, as soon as I added a small (non-protein) molecule which rotated wildly while attached to the protein, it would crash unless I reduced the time step to the above when constraints were removed after EQ ... always it seemed to me it didnt like the rotation or bond angles, seeing them as a violation but acted like it was an amino acid? (the same bond type but with wider rotation as one end wasnt fixed to a chain) If your loop moves via backbone, the calculated angles, bonds or whatever might appear to the computer to be violating the parameter settings for problems, errors, etc as it cant track them fast enough over the time step. Ie atom 1-2-3 and then delta 1-2-3 with xyz parameters, but then the particular set has additional rotation, etc and may include the chain atoms which bend wildly (n-Ca-Cb-Cg maybe a dihedral) but proba! bly not this. Just a thought but probably not the right answere as well, it might be the way it is broken down (above) over GPUs, which convert everything to matricies (non-standard just for basic math operations not real matricies per say) for exicution and then some library problem which would not account for long range rapid (0.0005) movements at the chain (Ca,N,O to something else) and then tries to apply these to Cb-Cg-O-H, etc using the initial points while looking at the parameters for say a single amino acid...Maybe the constraints would cause this, which would make it a pain to EQ, but this allowed me to increase the time step, but would ruin the experiment I had worked on as I needed it unconstrained to show it didnt float away when proteins were pulled, etc...I was using a different integrator though...just normal MD. I have long wondered if constraints were properly handled by the OpenMM library. I am constraining all bonds, so in principle, dt of 0.002 should not be a problem. The note printed indicates that the constraint algorithm is changed from the one selected (LINCS) to whatever OpenMM uses (SHAKE and a few others in combination). Perhaps I can try running without constraints and a reduced dt, but I'd like to avoid it. I wish I could efficiently test to see if this behavior was GPU-specific, but unfortunately the non-GPU implementation of the implicit code can currently only be run in serial or on 2 CPU due to an existing bug. I can certainly test it, but due to the large number of atoms, it will take several days to even approach 1 ns. ANd your cutoffs for vdw, etc...Why are they 0? I dont know if this means a defautl set is then used...but if not ? Wouldnt they try integrating using both types of formula, or would it be just using coulumb or vice versa? (dont know what that would do to the code but assume it means no vdw, and all coulumb but then zeros are alwyas a problem for computers). The setup is for the all-vs-all kernels. Setting cutoffs equal to zero and using a fixed neighbor list triggers these special optimized kernels. I have also noticed that long, finite cutoffs (on the order of 4.0 nm) lead to unacceptable energy drift and structural instability in well-behaved systems (even the benchmarks). For instance, the backbone RMSD of lysozyme is twice as large in the case of a 4.0-nm cutoff relative to the all-vs-all setup, and the energy drift is quite substantial. -Justin -- Justin A. Lemkul, Ph.D. 
Research Scientist Department of Biochemistry Virginia Tech Blacksburg, VA jalemkul[at]vt.edu | (540) 231-9080 http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
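The unconstrained, reduced-timestep test Justin mentions would amount to something like the following in the .mdp (a sketch only; the dt values follow the suggestions earlier in the thread):

  constraints = none      ; take the OpenMM SHAKE/SETTLE/CCMA combination out of the picture
  dt          = 0.0005    ; 0.5 fs, small enough to integrate unconstrained bonds to hydrogen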
Re: [gmx-users] GPU gets faster with more molecules in system
On 25/01/2011 8:25 AM, Christian Mötzing wrote: Hi, I compiled mdrun-gpu and tried some waterbox systems with different atom counts.

  atoms  |  GPU     |  CPU
  2.400  |  1.015s  |  774s
  4.800  |  1.225s  |  1.202s
  9.600  |  1.142s  |  1.353s
  19.200 |  2.984s  |  2.812s

Why does the system with 9.600 atoms finish faster than the one with 4.800? I triple-checked the simulations and GROMACS itself reports the atom counts as above, so I don't think there is a mistake there. A diff of md.log only shows differences in output values for each step. Is there any explanation for this behaviour?

As a guess, the cost of overheads for molecular simulations tends to have a weaker dependence on system size than the cost of computation (or none at all). Only once the latter dominates the cost do you see scaling with system size. I expect you'd see similar behaviour running systems with 64, 128, 256, 512 atoms on 64 processors. Mark
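As a rough illustration of Mark's point (made-up numbers, not measurements from the runs above): if every run paid a fixed ~900 s of overhead that does not depend on atom count, plus ~0.05 s of computation per atom, then 2.400 atoms would cost about 1020 s, 4.800 about 1140 s and 9.600 about 1380 s -- quadrupling the atoms adds only ~35% to the wall time, and ordinary run-to-run noise can easily swap the order of the two middle sizes.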
Re: [gmx-users] gpu
Hi, Did you read this? http://www.gromacs.org/gpu Rossen On 11/7/10 1:23 PM, Erik Wensink wrote: Dear gmx-users, How to invoke the gpu for simulations, e.g. is there (compiler) flag? Cheers, Erik
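For the 4.5 series the short version of that page is: GPU (OpenMM) support is a separate build that produces an mdrun-gpu binary, which is then pointed at a device at run time. A sketch of the procedure (the cmake flag and device string follow the 4.5-era documentation and the commands quoted elsewhere in this archive; details may differ between versions):

  cmake .. -DGMX_OPENMM=ON        # configure the OpenMM-accelerated build
  make mdrun                      # builds mdrun-gpu
  mdrun-gpu -device "OpenMM:platform=Cuda,memtest=15,deviceid=0" -deffnm md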
Re: [gmx-users] gpu
tnx. Erik --- On Sun, 11/7/10, Rossen Apostolov ros...@kth.se wrote: From: Rossen Apostolov ros...@kth.se Subject: Re: [gmx-users] gpu To: gmx-users@gromacs.org Date: Sunday, November 7, 2010, 4:27 PM Hi, Did you read this? http://www.gromacs.org/gpu Rossen On 11/7/10 1:23 PM, Erik Wensink wrote: Dear gmx-users, How to invoke the gpu for simulations, e.g. is there (compiler) flag? Cheers, Erik
Re: [gmx-users] GPU slower than I7
Hi, My OS is Fedora 13 (64 bits) and I used gcc 4.4.4. I ran the program you sent me. Bellow are the results of 5 runs. As you can see the results are rougly the same [ren...@scrat ~]$ ./time 2.09 2.102991 [ren...@scrat ~]$ ./time 2.09 2.102808 [ren...@scrat ~]$ ./time 2.09 2.104577 [ren...@scrat ~]$ ./time 2.09 2.103943 [ren...@scrat ~]$ ./time 2.09 2.104471 Bellow are part of the /src/configure.h . . . /* Define to 1 if you have the MSVC _aligned_malloc() function. */ /* #undef HAVE__ALIGNED_MALLOC */ /* Define to 1 if you have the gettimeofday() function. */ #define HAVE_GETTIMEOFDAY /* Define to 1 if you have the cbrt() function. */ #define HAVE_CBRT . . . Is this OK? Renato 2010/10/22 Roland Schulz rol...@utk.edu: Hi, On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas renato...@gmail.com wrote: Do you think that the NODE and Real time difference could be attributed to some compilation problem in the mdrun-gpu. Despite I'm asking this I didn't get any error in the compilation. It is very odd that these are different for you system. What operating system and compiler do you use? Is HAVE_GETTIMEOFDAY set in src/config.h? I attached a small test program which uses the two different timers used for NODE and Real time. You can compile it with cc time.c -o time and run it with ./time. Do you get roughly the same time twice with the test program or do you see the same discrepancy as with GROMACS? Roland Thanks, Renato 2010/10/22 Szilárd Páll szilard.p...@cbr.su.se: Hi Renato, First of all, what you're seeing is pretty normal, especially that you have a CPU that is crossing the border of insane :) Why is it normal? The PME algorithms are just simply not very well not well suited for current GPU architectures. With an ill-suited algorithm you won't be able to see the speedups you can often see in other application areas - -even more so that you're comparing to Gromacs on a i7 980X. For more info + benchmarks see the Gromacs-GPU page: http://www.gromacs.org/gpu However, there is one strange thing you also pointed out. The fact that the NODE and Real time in your mdrun-gpu timing summary is not the same, but has 3x deviation is _very_ unusual. I've ran mdrun-gpu on quite a wide variety of hardware but I've never seen those two counter deviate. It might be an artifact from the cycle counters used internally that behave in an unusual way on your CPU. One other thing I should point out is that you would be better off using the standard mdrun which in 4.5 by default has thread-support and therefore will run on a single cpu/node without MPI! Cheers, -- Szilárd On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas renato...@gmail.com wrote: Hi gromacs users, I have installed the lastest version of gromacs (4.5.1) in an i7 980X (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its mpi version. Also I compiled the GPU-accelerated version of gromacs. Then I did a 2 ns simulation using a small system (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi. The results that I got are bellow: My *.mdp is: constraints = all-bonds integrator = md dt = 0.002 ; ps ! nsteps = 100 ; total 2000 ps. 
nstlist = 10 ns_type = grid coulombtype = PME rvdw = 0.9 rlist = 0.9 rcoulomb = 0.9 fourierspacing = 0.10 pme_order = 4 ewald_rtol = 1e-5 vdwtype = cut-off pbc = xyz epsilon_rf = 0 comm_mode = linear nstxout = 1000 nstvout = 0 nstfout = 0 nstxtcout = 1000 nstlog = 1000 nstenergy = 1000 ; Berendsen temperature coupling is on in four groups tcoupl = berendsen tc-grps = system tau-t = 0.1 ref-t = 298 ; Pressure coupling is on Pcoupl = berendsen pcoupltype = isotropic tau_p = 0.5 compressibility = 4.5e-5 ref_p = 1.0 ; Generate velocites is on at 298 K. gen_vel = no RUNNING GROMACS ON GPU mdrun-gpu -s topol.tpr -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 09:52:09 2010 . . . R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-Cycles Seconds % -- Write traj. 1 1021 106.075 31.7 0.2 Rest 1 64125.577 19178.6 99.8
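Roland's time.c attachment is not preserved in the archive, but a minimal reconstruction along the lines he describes -- one CPU-time counter and one wall-clock counter, printed one after the other -- might look like this (an assumption: that the two timers are clock() and gettimeofday(); compile with cc time.c -o time -lm):

  #include <stdio.h>
  #include <time.h>
  #include <sys/time.h>
  #include <math.h>

  int main(void)
  {
      struct timeval w0, w1;
      clock_t        c0, c1;
      double         x = 0.0;
      long           i;

      gettimeofday(&w0, NULL);          /* wall-clock start ("Real" time) */
      c0 = clock();                     /* CPU-clock start ("NODE" time)  */

      for (i = 0; i < 50000000L; i++)   /* burn some CPU for a couple of seconds */
          x += sin((double) i);

      c1 = clock();
      gettimeofday(&w1, NULL);

      printf("%.2f\n", (double) (c1 - c0) / CLOCKS_PER_SEC);
      printf("%f\n", (w1.tv_sec - w0.tv_sec) + (w1.tv_usec - w0.tv_usec) * 1e-6);

      return x > 1e300;                 /* keep the loop from being optimised away */
  }

On a machine that is not oversubscribed, the two printed times should agree closely, as they do in Renato's output above.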
Re: [gmx-users] GPU slower than I7
Hi Renato, First of all, what you're seeing is pretty normal, especially that you have a CPU that is crossing the border of insane :) Why is it normal? The PME algorithms are just simply not very well not well suited for current GPU architectures. With an ill-suited algorithm you won't be able to see the speedups you can often see in other application areas - -even more so that you're comparing to Gromacs on a i7 980X. For more info + benchmarks see the Gromacs-GPU page: http://www.gromacs.org/gpu However, there is one strange thing you also pointed out. The fact that the NODE and Real time in your mdrun-gpu timing summary is not the same, but has 3x deviation is _very_ unusual. I've ran mdrun-gpu on quite a wide variety of hardware but I've never seen those two counter deviate. It might be an artifact from the cycle counters used internally that behave in an unusual way on your CPU. One other thing I should point out is that you would be better off using the standard mdrun which in 4.5 by default has thread-support and therefore will run on a single cpu/node without MPI! Cheers, -- Szilárd On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas renato...@gmail.com wrote: Hi gromacs users, I have installed the lastest version of gromacs (4.5.1) in an i7 980X (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its mpi version. Also I compiled the GPU-accelerated version of gromacs. Then I did a 2 ns simulation using a small system (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi. The results that I got are bellow: My *.mdp is: constraints = all-bonds integrator = md dt = 0.002 ; ps ! nsteps = 100 ; total 2000 ps. nstlist = 10 ns_type = grid coulombtype = PME rvdw = 0.9 rlist = 0.9 rcoulomb = 0.9 fourierspacing = 0.10 pme_order = 4 ewald_rtol = 1e-5 vdwtype = cut-off pbc = xyz epsilon_rf = 0 comm_mode = linear nstxout = 1000 nstvout = 0 nstfout = 0 nstxtcout = 1000 nstlog = 1000 nstenergy = 1000 ; Berendsen temperature coupling is on in four groups tcoupl = berendsen tc-grps = system tau-t = 0.1 ref-t = 298 ; Pressure coupling is on Pcoupl = berendsen pcoupltype = isotropic tau_p = 0.5 compressibility = 4.5e-5 ref_p = 1.0 ; Generate velocites is on at 298 K. gen_vel = no RUNNING GROMACS ON GPU mdrun-gpu -s topol.tpr -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 09:52:09 2010 . . . R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-Cycles Seconds % -- Write traj. 1 1021 106.075 31.7 0.2 Rest 1 64125.577 19178.6 99.8 -- Total 1 64231.652 19210.3 100.0 -- NODE (s) Real (s) (%) Time: 6381.840 19210.349 33.2 1h46:21 (Mnbf/s) (MFlops) (ns/day) (hour/ns) Performance: 0.000 0.001 27.077 0.886 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010 RUNNING GROMACS ON MPI mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 18:30:52 2010 R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-Cycles Seconds % -- Domain decomp. 3 11 1452.166 434.7 0.6 DD comm. load 3 10001 0.745 0.2 0.0 Send X to PME 3 101 249.003 74.5 0.1 Comm. coord. 3 101 637.329 190.8 0.3 Neighbor search 3 11 8738.669 2616.0 3.5 Force 3 101 99210.202 29699.2 39.2 Wait + Comm. F 3 101 3361.591 1006.3 1.3 PME mesh 3 101 66189.554 19814.2 26.2 Wait +
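Szilárd's last suggestion amounts to dropping mpirun and letting the built-in thread parallelism pick up all cores, along these lines (a sketch; the thread count assumes the i7 980X with HT described above):

  mdrun -nt 12 -s topol.tpr -v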
Re: [gmx-users] GPU slower than I7
Hi Roland, In fact I get better performance values using different rcoulomb, fourierspacing and the values of -npme suggested by g_tune_pme using -nt=12. The simulation using GPU was carried out using the dedicated machine, no other programs was runnig, even the graphical interface was stopped. About the CPU vs GPU simulation time, Szilárd explained that the PME algorithms still are not very well suited for current GPU architectures. I just don't know why the NODE and REAL times are not equal. Thanks, Renato 2010/10/21 Roland Schulz rol...@utk.edu: On Thu, Oct 21, 2010 at 5:53 PM, Renato Freitas renato...@gmail.com wrote: Thanks Roland. I will do a newer test using the fourier spacing equal to 0.11. I'd also suggest to look at g_tune_pme and run with different rcoulomb, fourier_spacing. As long as the ratio is the same you get the same accuracy. And you should get better performance (especially on the GPU) for longer cut-off and larger grid-spacing. However, about the performance of GPU versus CPU (mpi) let me try to explain it better: GPU NODE (s) Real (s) (%) Time: 6381.840 19210.349 33.2 1h46:21 (Mnbf/s) (MFlops) (ns/day) (hour/ns) Performance: 0.000 0.001 27.077 0.886 MPI NODE (s) Real (s) (%) Time: 12621.257 12621.257 100.0 3h30:21 (Mnbf/s) (GFlops) (ns/day) (hour/ns) Performance: 388.633 28.773 13.691 1.753 Yes. Sorry I didn't realize that NODE time and Real time is different. Did you run the GPU calculation on a desktop machine which was also doing other things at the time. This might explain it. As far as I know for a dedicated machine not running any other programs NODE and Real time should be the same. Looking abobe we can see that the gromacs prints in the output that the simulation is faster when the GPU is used. But this is not the reality. The truth is that simulation time with MPI was 106 min faster thatn that with GPU. It seems correct to you? As I said before, I was expecting that GPU should take a lower time than the 6 core MPI. Well the exact time depends on a lot of factors. And you probably can speed up both. But I would expect them to be both about similar fast. Roland -- gmx-users mailing list gmx-us...@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
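For reference, the g_tune_pme run being described would look roughly like this (a sketch; -nt is the thread count the poster mentions, and -launch starts the production run with the best settings found):

  g_tune_pme -nt 12 -s topol.tpr -launch

The tool benchmarks short runs with different numbers of dedicated PME ranks (the -npme values referred to above) and reports which split performs best on the machine at hand.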
Re: [gmx-users] GPU slower than I7
Hi Szilárd, Thans for your explanation. Do you know if there will be a new improvement of PME algorithms to take the full advantage of GPU video cards? Do you think that the NODE and Real time difference could be attributed to some compilation problem in the mdrun-gpu. Despite I'm asking this I didn't get any error in the compilation. Thanks, Renato 2010/10/22 Szilárd Páll szilard.p...@cbr.su.se: Hi Renato, First of all, what you're seeing is pretty normal, especially that you have a CPU that is crossing the border of insane :) Why is it normal? The PME algorithms are just simply not very well not well suited for current GPU architectures. With an ill-suited algorithm you won't be able to see the speedups you can often see in other application areas - -even more so that you're comparing to Gromacs on a i7 980X. For more info + benchmarks see the Gromacs-GPU page: http://www.gromacs.org/gpu However, there is one strange thing you also pointed out. The fact that the NODE and Real time in your mdrun-gpu timing summary is not the same, but has 3x deviation is _very_ unusual. I've ran mdrun-gpu on quite a wide variety of hardware but I've never seen those two counter deviate. It might be an artifact from the cycle counters used internally that behave in an unusual way on your CPU. One other thing I should point out is that you would be better off using the standard mdrun which in 4.5 by default has thread-support and therefore will run on a single cpu/node without MPI! Cheers, -- Szilárd On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas renato...@gmail.com wrote: Hi gromacs users, I have installed the lastest version of gromacs (4.5.1) in an i7 980X (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its mpi version. Also I compiled the GPU-accelerated version of gromacs. Then I did a 2 ns simulation using a small system (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi. The results that I got are bellow: My *.mdp is: constraints = all-bonds integrator = md dt = 0.002 ; ps ! nsteps = 100 ; total 2000 ps. nstlist = 10 ns_type = grid coulombtype = PME rvdw = 0.9 rlist = 0.9 rcoulomb = 0.9 fourierspacing = 0.10 pme_order = 4 ewald_rtol = 1e-5 vdwtype = cut-off pbc = xyz epsilon_rf = 0 comm_mode = linear nstxout = 1000 nstvout = 0 nstfout = 0 nstxtcout = 1000 nstlog = 1000 nstenergy = 1000 ; Berendsen temperature coupling is on in four groups tcoupl = berendsen tc-grps = system tau-t = 0.1 ref-t = 298 ; Pressure coupling is on Pcoupl = berendsen pcoupltype = isotropic tau_p = 0.5 compressibility = 4.5e-5 ref_p = 1.0 ; Generate velocites is on at 298 K. gen_vel = no RUNNING GROMACS ON GPU mdrun-gpu -s topol.tpr -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 09:52:09 2010 . . . R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-Cycles Seconds % -- Write traj. 1 1021 106.075 31.7 0.2 Rest 1 64125.577 19178.6 99.8 -- Total 1 64231.652 19210.3 100.0 -- NODE (s) Real (s) (%) Time: 6381.840 19210.349 33.2 1h46:21 (Mnbf/s) (MFlops) (ns/day) (hour/ns) Performance: 0.000 0.001 27.077 0.886 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010 RUNNING GROMACS ON MPI mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 18:30:52 2010 R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-Cycles Seconds % -- Domain decomp. 3 11 1452.166 434.7 0.6 DD comm. load 3 10001 0.745 0.2 0.0 Send X to PME 3 101 249.003 74.5 0.1 Comm.
Re: [gmx-users] GPU slower than I7
Hi, On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas renato...@gmail.com wrote: Do you think that the NODE and Real time difference could be attributed to some compilation problem in the mdrun-gpu. Despite I'm asking this I didn't get any error in the compilation. It is very odd that these are different for you system. What operating system and compiler do you use? Is HAVE_GETTIMEOFDAY set in src/config.h? I attached a small test program which uses the two different timers used for NODE and Real time. You can compile it with cc time.c -o time and run it with ./time. Do you get roughly the same time twice with the test program or do you see the same discrepancy as with GROMACS? Roland Thanks, Renato 2010/10/22 Szilárd Páll szilard.p...@cbr.su.se: Hi Renato, First of all, what you're seeing is pretty normal, especially that you have a CPU that is crossing the border of insane :) Why is it normal? The PME algorithms are just simply not very well not well suited for current GPU architectures. With an ill-suited algorithm you won't be able to see the speedups you can often see in other application areas - -even more so that you're comparing to Gromacs on a i7 980X. For more info + benchmarks see the Gromacs-GPU page: http://www.gromacs.org/gpu However, there is one strange thing you also pointed out. The fact that the NODE and Real time in your mdrun-gpu timing summary is not the same, but has 3x deviation is _very_ unusual. I've ran mdrun-gpu on quite a wide variety of hardware but I've never seen those two counter deviate. It might be an artifact from the cycle counters used internally that behave in an unusual way on your CPU. One other thing I should point out is that you would be better off using the standard mdrun which in 4.5 by default has thread-support and therefore will run on a single cpu/node without MPI! Cheers, -- Szilárd On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas renato...@gmail.com wrote: Hi gromacs users, I have installed the lastest version of gromacs (4.5.1) in an i7 980X (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its mpi version. Also I compiled the GPU-accelerated version of gromacs. Then I did a 2 ns simulation using a small system (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi. The results that I got are bellow: My *.mdp is: constraints = all-bonds integrator = md dt = 0.002; ps ! nsteps = 100 ; total 2000 ps. nstlist = 10 ns_type = grid coulombtype= PME rvdw= 0.9 rlist = 0.9 rcoulomb= 0.9 fourierspacing = 0.10 pme_order = 4 ewald_rtol = 1e-5 vdwtype = cut-off pbc = xyz epsilon_rf= 0 comm_mode = linear nstxout = 1000 nstvout = 0 nstfout = 0 nstxtcout = 1000 nstlog = 1000 nstenergy = 1000 ; Berendsen temperature coupling is on in four groups tcoupl = berendsen tc-grps = system tau-t = 0.1 ref-t = 298 ; Pressure coupling is on Pcoupl = berendsen pcoupltype = isotropic tau_p = 0.5 compressibility = 4.5e-5 ref_p = 1.0 ; Generate velocites is on at 298 K. gen_vel = no RUNNING GROMACS ON GPU mdrun-gpu -s topol.tpr -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 09:52:09 2010 . . . 
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:      Nodes  Number    G-Cycles    Seconds      %
 ------------------------------------------------------------
 Write traj.         1    1021     106.075       31.7     0.2
 Rest                1           64125.577    19178.6    99.8
 ------------------------------------------------------------
 Total               1           64231.652    19210.3   100.0
 ------------------------------------------------------------

                NODE (s)   Real (s)      (%)
 Time:          6381.840  19210.349     33.2
                1h46:21
               (Mnbf/s)   (MFlops)  (ns/day)  (hour/ns)
 Performance:     0.000      0.001    27.077     0.886

Finished mdrun on node 0 Wed Oct 20 15:12:19 2010

RUNNING GROMACS ON MPI

mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v out

Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 18:30:52 2010 R E A L C Y C L E A N D T I
Re: [gmx-users] GPU slower than I7
On Thu, Oct 21, 2010 at 3:18 PM, Renato Freitas renato...@gmail.com wrote: Hi gromacs users, I have installed the lastest version of gromacs (4.5.1) in an i7 980X (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its mpi version. Also I compiled the GPU-accelerated version of gromacs. Then I did a 2 ns simulation using a small system (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi. The results that I got are bellow: My *.mdp is: constraints = all-bonds integrator = md dt = 0.002; ps ! nsteps = 100 ; total 2000 ps. nstlist = 10 ns_type = grid coulombtype= PME rvdw= 0.9 rlist = 0.9 rcoulomb= 0.9 fourierspacing = 0.10 pme_order = 4 ewald_rtol = 1e-5 vdwtype = cut-off pbc = xyz epsilon_rf= 0 comm_mode = linear nstxout = 1000 nstvout = 0 nstfout = 0 nstxtcout = 1000 nstlog = 1000 nstenergy = 1000 ; Berendsen temperature coupling is on in four groups tcoupl = berendsen tc-grps = system tau-t = 0.1 ref-t = 298 ; Pressure coupling is on Pcoupl = berendsen pcoupltype = isotropic tau_p = 0.5 compressibility = 4.5e-5 ref_p = 1.0 ; Generate velocites is on at 298 K. gen_vel = no RUNNING GROMACS ON GPU mdrun-gpu -s topol.tpr -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 09:52:09 2010 . . . R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-CyclesSeconds % -- Write traj.1 1021106.075 31.7 0.2 Rest 1 64125.577 19178.6 99.8 -- Total 1 64231.652 19210.3 100.0 -- NODE (s)Real (s)(%) Time:6381.84019210.349 33.2 1h46:21 (Mnbf/s) (MFlops) (ns/day)(hour/ns) Performance:0.000 0.001 27.077 0.886 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010 RUNNING GROMACS ON MPI mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 18:30:52 2010 R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-CyclesSeconds % -- Domain decomp. 3 11 1452.166 434.7 0.6 DD comm. load 3 100010.745 0.2 0.0 Send X to PME 3 101249.003 74.5 0.1 Comm. coord. 3 101 637.329190.8 0.3 Neighbor search3 11 8738.669 2616.0 3.5 Force 3 101 99210.202 29699.239.2 Wait + Comm. F 3 101 3361.591 1006.3 1.3 PME mesh 3 101 66189.554 19814.2 26.2 Wait + Comm. X/F3 60294.513 8049.5 23.8 Wait + Recv. PME F 3 101801.897240.1 0.3 Write traj. 3 1015 33.464 10.0 0.0 Update 3 1013295.820 986.6 1.3 Constraints 3 101 6317.568 1891.2 2.5 Comm. energies 3 12 70.784 21.2 0.0 Rest3 2314.844 693.0 0.9 -- Total6 252968.14875727.5 100.0 -- -- PME redist. X/F3 2021945.551 582.4 0.8 PME spread/gather 3 202
Re: [gmx-users] GPU slower than I7
Thanks Roland. I will do a newer test using the fourier spacing equal to 0.11. However, about the performance of GPU versus CPU (mpi) let me try to explain it better: The simulation using gromacs with GPU started and finished: Started mdrun on node 0 Wed Oct 20 09:52:09 2010 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010 Total time = 320 min The simulation using gromacs with mpi started and finished: Started mdrun on node 0 Wed Oct 20 18:30:52 2010 Finished mdrun on node 0 Wed Oct 20 22:01:14 2010 Total time = 211 min Based on this numbers, it was the CPU with mpi that was faster than the GPU, by aproximately 106 min. But looking at the end of each output I have: GPU NODE (s)Real (s)(%) Time:6381.84019210.34933.2 1h46:21 (Mnbf/s) (MFlops) (ns/day)(hour/ns) Performance:0.000 0.001 27.077 0.886 MPI NODE (s) Real (s)(%) Time:12621.257 12621.257 100.0 3h30:21 (Mnbf/s) (GFlops) (ns/day)(hour/ns) Performance: 388.633 28.77313.691 1.753 Looking abobe we can see that the gromacs prints in the output that the simulation is faster when the GPU is used. But this is not the reality. The truth is that simulation time with MPI was 106 min faster thatn that with GPU. It seems correct to you? As I said before, I was expecting that GPU should take a lower time than the 6 core MPI. Thanks, Renato 2010/10/21 Roland Schulz rol...@utk.edu: On Thu, Oct 21, 2010 at 3:18 PM, Renato Freitas renato...@gmail.com wrote: Hi gromacs users, I have installed the lastest version of gromacs (4.5.1) in an i7 980X (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its mpi version. Also I compiled the GPU-accelerated version of gromacs. Then I did a 2 ns simulation using a small system (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi. The results that I got are bellow: My *.mdp is: constraints = all-bonds integrator = md dt = 0.002 ; ps ! nsteps = 100 ; total 2000 ps. nstlist = 10 ns_type = grid coulombtype = PME rvdw = 0.9 rlist = 0.9 rcoulomb = 0.9 fourierspacing = 0.10 pme_order = 4 ewald_rtol = 1e-5 vdwtype = cut-off pbc = xyz epsilon_rf = 0 comm_mode = linear nstxout = 1000 nstvout = 0 nstfout = 0 nstxtcout = 1000 nstlog = 1000 nstenergy = 1000 ; Berendsen temperature coupling is on in four groups tcoupl = berendsen tc-grps = system tau-t = 0.1 ref-t = 298 ; Pressure coupling is on Pcoupl = berendsen pcoupltype = isotropic tau_p = 0.5 compressibility = 4.5e-5 ref_p = 1.0 ; Generate velocites is on at 298 K. gen_vel = no RUNNING GROMACS ON GPU mdrun-gpu -s topol.tpr -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 09:52:09 2010 . . . R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-Cycles Seconds % -- Write traj. 1 1021 106.075 31.7 0.2 Rest 1 64125.577 19178.6 99.8 -- Total 1 64231.652 19210.3 100.0 -- NODE (s) Real (s) (%) Time: 6381.840 19210.349 33.2 1h46:21 (Mnbf/s) (MFlops) (ns/day) (hour/ns) Performance: 0.000 0.001 27.077 0.886 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010 RUNNING GROMACS ON MPI mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v out Here is a part of the md.log: Started mdrun on node 0 Wed Oct 20 18:30:52 2010 R E A L C Y C L E A N D T I M E A C C O U N T I N G Computing: Nodes Number G-Cycles Seconds % -- Domain decomp. 3 11 1452.166 434.7 0.6 DD comm. load 3
Re: [gmx-users] GPU slower than I7
On Thu, Oct 21, 2010 at 5:53 PM, Renato Freitas renato...@gmail.com wrote: Thanks Roland. I will do a newer test using the fourier spacing equal to 0.11. I'd also suggest to look at g_tune_pme and run with different rcoulomb, fourier_spacing. As long as the ratio is the same you get the same accuracy. And you should get better performance (especially on the GPU) for longer cut-off and larger grid-spacing. However, about the performance of GPU versus CPU (mpi) let me try to explain it better: GPU NODE (s)Real (s)(%) Time:6381.84019210.34933.2 1h46:21 (Mnbf/s) (MFlops) (ns/day)(hour/ns) Performance:0.000 0.001 27.077 0.886 MPI NODE (s) Real (s)(%) Time:12621.257 12621.257 100.0 3h30:21 (Mnbf/s) (GFlops) (ns/day)(hour/ns) Performance: 388.633 28.77313.691 1.753 Yes. Sorry I didn't realize that NODE time and Real time is different. Did you run the GPU calculation on a desktop machine which was also doing other things at the time. This might explain it. As far as I know for a dedicated machine not running any other programs NODE and Real time should be the same. Looking abobe we can see that the gromacs prints in the output that the simulation is faster when the GPU is used. But this is not the reality. The truth is that simulation time with MPI was 106 min faster thatn that with GPU. It seems correct to you? As I said before, I was expecting that GPU should take a lower time than the 6 core MPI. Well the exact time depends on a lot of factors. And you probably can speed up both. But I would expect them to be both about similar fast. Roland -- gmx-users mailing listgmx-users@gromacs.org http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists