Hyperthreading seems to give better performance.

1 MPI thread, 12 OpenMP threads. Command:
mdrun_d -deffnm npt2 -ntomp 12 -ntmpi 1 -pin on -pinoffset 0
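As a quick sanity check on the timing tables below: mdrun's (%) column is Core t / Wall t × 100, and (hour/ns) is 24 / (ns/day). Recomputing from the printed numbers of the first run (a sketch with awk; mdrun rounds from unrounded internal values, so the last digit can differ from the log):

```shell
# Re-derive mdrun's reported timing fields from the 12-OpenMP-thread run.
awk 'BEGIN {
    core_t = 6638.388; wall_t = 553.519   # Core t (s), Wall t (s)
    ns_day = 3.297                        # reported Performance (ns/day)
    printf "CPU usage : %.1f %%\n", 100 * core_t / wall_t   # log shows 1199.3
    printf "hour/ns   : %.3f\n", 24 / ns_day                # log shows 7.280
}'
```

The CPU usage near 1200 % on 12 physical cores confirms all cores were busy for this run.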
               Core t (s)   Wall t (s)        (%)
       Time:     6638.388      553.519     1199.3
                 (ns/day)    (hour/ns)
Performance:        3.297        7.280

Finished mdrun on rank 0 Thu Sep 11 06:24:52 2014

24 MPI threads, 1 OpenMP thread per tMPI thread. Command:
mdrun_d -deffnm npt2

NOTE: 12.1 % performance was lost because the PME ranks
      had less work to do than the PP ranks.

               Core t (s)   Wall t (s)        (%)
       Time:     7064.222      294.611     2397.8
                 (ns/day)    (hour/ns)
Performance:        4.036        5.947

Finished mdrun on rank 0 Thu Sep 11 06:39:47 2014

I also tried 12 tMPI threads with 1 OpenMP thread each and -pin on;
performance was 3.5 ns/day.

I used to compile GROMACS 4.6.5 single precision with the Intel compiler
and MKL. This GROMACS 5.0.1 double-precision build was compiled with gcc
4.4.7, because installing the Intel compiler now requires root.

On Thu, Sep 11, 2014 at 9:48 AM, Johnny Lu <johnny.lu...@gmail.com> wrote:

> This mailing list thread talks about it:
> https://www.mail-archive.com/gromacs.org_gmx-users@maillist.sys.kth.se/msg06331.html
>
>
> On Thu, Sep 11, 2014 at 9:45 AM, Johnny Lu <johnny.lu...@gmail.com> wrote:
>
>> The GROMACS wiki also says that mixing MPI and OpenMP is bad on small
>> computers.
>>
>> On Thu, Sep 11, 2014 at 9:44 AM, Johnny Lu <johnny.lu...@gmail.com>
>> wrote:
>>
>>> Ah. Thanks a lot.
>>> As suggested by (
>>> https://www.ibm.com/developerworks/community/blogs/brian/entry/linux_show_the_number_of_cpu_cores_on_your_system17?lang=en
>>> ):
>>>
>>> $ cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
>>> 2
>>> $ cat /proc/cpuinfo | egrep "core id|physical id" | tr -d "\n" | sed s/physical/\\nphysical/g | grep -v ^$ | sort | uniq | wc -l
>>> 12
>>>
>>> There are 12 real cores.
>>> Typing "top" and then pressing 1 sometimes shows double the number of
>>> real cores, but sometimes doesn't (tested on different machines).
>>>
>>> How do I run "an MPI rank per core"? Like this, with
>>> "OMP_NUM_THREADS=12 mdrun" on a 12-core machine?
>>>
>>> I tried OpenMP threads instead of MPI threads because the GROMACS wiki
>>> says OpenMP threads are faster than MPI-based parallelization.
>>>
>>> From the GROMACS wiki (
>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Multi-level_parallelization.3a_MPI_and_OpenMP
>>> ):
>>>
>>> In GROMACS 4.6 compiled with thread-MPI, OpenMP-only parallelization is
>>> the default with the Verlet scheme when using up to 8 cores on AMD
>>> platforms and up to 12 and 16 cores on Intel Nehalem and Sandy Bridge,
>>> respectively. Note that even running across two CPUs (in different
>>> sockets) on Intel platforms, OpenMP multithreading is, in the majority
>>> of cases, significantly faster than MPI-based parallelization.
>>>
>>> ...
>>>
>>> Assuming that there are N cores available, the following commands are
>>> equivalent:
>>>
>>> mdrun -ntomp N -ntmpi 1
>>> OMP_NUM_THREADS=N mdrun
>>> mdrun  # assuming that N <= 8 on AMD or N <= 12/16 on Intel Nehalem/Sandy Bridge
>>>
-- 
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
mail to gmx-users-requ...@gromacs.org.
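Regarding the NOTE above about 12.1 % lost to PME/PP load imbalance in the 24-rank run: the number of separate PME ranks can be set explicitly with mdrun's -npme flag instead of letting mdrun guess. A hedged sketch, not a recommendation; the value 6 is purely illustrative and the right PP/PME split depends on the system and cutoff settings:

```shell
# Fix the number of separate PME ranks by hand; 6 of 24 is only an
# illustrative value -- benchmark a few splits to find the best one.
mdrun_d -deffnm npt2 -ntmpi 24 -npme 6
```

GROMACS also ships a tune_pme tool that benchmarks several PP/PME splits (and PME grid settings) automatically and reports the fastest combination.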