Hi,

On Wed, Oct 25, 2017 at 4:24 AM Matthew W Hanley <mwhan...@syr.edu> wrote:
> > There's several dozen lines of performance analysis at the end of the log
> > file, which you need to inspect and compare if you want to start to
> > understand what is going on :-)
>
> Thank you for the feedback. Fair warning, I'm more of a system
> administrator than a regular gromacs user. What is it that I should be
> focused on, and more importantly how do I find the bottleneck? Gromacs
> does recommend using AVX2_256, but I was unable to get Gromacs to build
> using that.

That's your first thing to do, then :-) Presumably you need an updated
toolchain (e.g. devtoolset), because the stability focus of CentOS makes it
unsuitable for HPC, inasmuch as "stable" basically means "old, and lacking
good support for newer hardware."

However, that's not going to help your issue with scaling across nodes.
Given that you're not using PME, and assuming that your system is large
enough (e.g. at least a few tens of thousands of particles), the most
likely issue is that the network latency is unsuitable. GROMACS can work
over gigabit ethernet, but its latency typically limits how far a run can
scale across nodes.

> Here is more of the log file:
>
> On 32 MPI ranks
>
> Computing:            Num   Num      Call    Wall time    Giga-Cycles
>                       Ranks Threads  Count      (s)       total sum    %
> -----------------------------------------------------------------------------
> Domain decomp.          32    1       1666      18.920     1509.802   3.8
> DD comm. load           32    1       1666       0.017        1.394   0.0
> DD comm. bounds         32    1       1666       0.206       16.406   0.0
> Vsite constr.           32    1      50001       4.624      369.013   0.9
> Neighbor search         32    1       1667      19.646     1567.793   4.0
> Comm. coord.            32    1      48334       8.291      661.640   1.7
> Force                   32    1      50001     339.477    27090.350  68.6
> Wait + Comm. F          32    1      50001      12.691     1012.783   2.6
> NB X/F buffer ops.      32    1     146669      13.563     1082.352   2.7
> Vsite spread            32    1      50001       8.716      695.518   1.8
> Write traj.             32    1          2       0.080        6.366   0.0
> Update                  32    1      50001      37.268     2973.983   7.5
> Constraints             32    1      50001      25.674     2048.789   5.2
> Comm. energies          32    1       5001       0.965       77.013   0.2
> Rest                                             4.385      349.931   0.9
> -----------------------------------------------------------------------------
> Total                                          494.524    39463.132 100.0
> -----------------------------------------------------------------------------
>
> If that's not helpful, I would need more specifics on what part of the
> log file would be.

That all looks very normal. Seeing where the bottleneck emerges requires
comparing multiple log files, however. If the network is the problem, then
most fields apart from Force will increase their % share of run time. You
could upload some log files to a file-sharing service and share links if
you want some feedback.

> Failing that, if anyone could recommend some good documentation for
> optimizing performance I would greatly appreciate it, thank you!

http://manual.gromacs.org/documentation/2016.4/user-guide/mdrun-performance.html
covers this, but many points won't apply because you're not using PME, so
you're in the easy case. You should be able to scale to under 500 particles
per core, but the actual target varies heavily with the hardware, the use
of vsites, and the network performance.

Mark
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.
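The log-file comparison suggested in the reply can be automated. Below is a minimal Python sketch (not part of GROMACS; the regex and field names are inferred from the 2016.4 "Computing:" table quoted above, so treat the format assumptions as approximate) that pulls each field's % share of wall time out of md.log files, making a shift of run time away from Force easy to spot:

```python
import re

# Matches one accounting row such as:
#  Force    32    1    50001    339.477    27090.350   68.6
# Field names may contain spaces, '+', '.', and '/' (e.g. "Wait + Comm. F",
# "NB X/F buffer ops."); the last numeric column is the % of wall time.
TABLE_ROW = re.compile(
    r"\s*([A-Za-z][A-Za-z+./ ]*?)"   # field name (non-greedy)
    r"\s+[\d.]+(?:\s+[\d.]+)*"       # intermediate numeric columns
    r"\s+([\d.]+)\s*$"               # final column: % of total wall time
)

def parse_timing_table(log_text):
    """Return {field_name: percent_of_wall_time} from the 'Computing:'
    accounting table printed at the end of a GROMACS md.log."""
    percents = {}
    in_table = False
    for line in log_text.splitlines():
        if line.strip().startswith("Computing:"):
            in_table = True
            continue
        # Skip everything before the table, and the dashed separator rows.
        if not in_table or set(line.strip()) == {"-"}:
            continue
        m = TABLE_ROW.match(line)
        if m:
            name = m.group(1).strip()
            percents[name] = float(m.group(2))
            if name == "Total":
                break
    return percents

def compare_runs(log_a, log_b):
    """Print per-field % shares side by side for two runs; if fields other
    than Force grow between runs, communication is the likely bottleneck."""
    a, b = parse_timing_table(log_a), parse_timing_table(log_b)
    for field in sorted(set(a) | set(b)):
        print(f"{field:20s} {a.get(field, 0.0):6.1f}  {b.get(field, 0.0):6.1f}")
```

For example, `compare_runs(open('run_1node.log').read(), open('run_2node.log').read())` prints each field's % share for both runs side by side (the file names here are hypothetical).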