As a side note, your mdrun invocation does not look well suited to GPU-accelerated runs; you would most likely be better off running fewer ranks.
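For example, with 32 cores and 4 GPUs per node, something along these lines may work better (a sketch only; the option names match your GROMACS 5.1.2 command line, but the best rank/thread split depends on the system, so it is worth benchmarking):

mpirun -np 8 gmx_mpi mdrun -deffnm MD -maxh $maxh -npme 0 -gpu_id 00112233 -ntomp 4 -notunepme

That is 8 PP ranks with 4 OpenMP threads each (still 32 cores in total) and two ranks per GPU, in line with the 2-6 threads-per-rank note in your own log. Note that -dd 4 4 2 is dropped: it forces 32 domains, so with fewer ranks you should let mdrun pick the domain decomposition.

-- Szilárd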
On Fri, Mar 23, 2018 at 9:26 PM, Christopher Neale <chris.ne...@alum.utoronto.ca> wrote:
> Hello,
>
> I am running gromacs 5.1.2 on single nodes where the run is set to use 32
> cores and 4 GPUs. The run command is:
>
> mpirun -np 32 gmx_mpi mdrun -deffnm MD -maxh $maxh -dd 4 4 2 -npme 0 -gpu_id 00000000111111112222222233333333 -ntomp 1 -notunepme
>
> Some of my runs die with this error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> Below is the relevant part of the .log file. Searching the internet didn't
> turn up any solutions. I'll contact the sysadmins if you think this is likely
> some problem with the hardware or rogue jobs. In my testing, a collection of
> 24 jobs had 6 die with this same error message (including the "1024128 bytes"
> and "pmalloc_cuda.cu, line: 70"), all on different nodes, and all of those
> nodes next took repeat jobs that ran fine. When the error occurred, it was
> always right at the start of the run.
>
> Thank you for your help,
> Chris.
>
> Command line:
>   gmx_mpi mdrun -deffnm MD -maxh 0.9 -dd 4 4 2 -npme 0 -gpu_id 00000000111111112222222233333333 -ntomp 1 -notunepme
>
> Number of logical cores detected (72) does not match the number reported by
> OpenMP (2).
> Consider setting the launch configuration manually!
>
> Running on 1 node with total 36 cores, 72 logical cores, 4 compatible GPUs
> Hardware detected on host ko026.localdomain (the node of MPI rank 0):
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
>     SIMD instructions most likely to fit this hardware: AVX2_256
>     SIMD instructions selected at GROMACS compile time: AVX2_256
>   GPU info:
>     Number of GPUs detected: 4
>     #0: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #1: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #2: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #3: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>
> Reading file MD.tpr, VERSION 5.1.2 (single precision)
> Can not increase nstlist because verlet-buffer-tolerance is not set or used
> Using 32 MPI processes
> Using 1 OpenMP thread per MPI process
>
> On host ko026.localdomain 4 GPUs user-selected for this run.
> Mapping of GPU IDs to the 32 PP ranks in this node:
>   0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3
>
> NOTE: You assigned GPUs to multiple MPI processes.
>
> NOTE: Your choice of number of MPI ranks and amount of resources results in
>       using 1 OpenMP threads per rank, which is most likely inefficient. The
>       optimum is usually between 2 and 6 threads per rank.
>
> NOTE: GROMACS was configured without NVML support hence it can not exploit
>       application clocks of the detected Tesla P100-PCIE-16GB GPU to improve
>       performance.
>       Recompile with the NVML library (compatible with the driver used) or
>       set application clocks manually.
>
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.2
> Source code file:
> /net/scratch3/cneale/exe/KODIAK/GROMACS/source/gromacs-5.1.2/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu,
> line: 70
>
> Fatal error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Halting parallel program gmx mdrun on rank 31 out of 32
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31
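To rule out a node-level driver or pinned-memory problem independently of GROMACS, you could run a tiny standalone test of the same call that fails. Below is a minimal sketch (my own code, not GROMACS's; it assumes nvcc is available on the compute nodes, and the 1024128-byte size just mirrors your error message):

// pinned_test.cu -- check that pinned (page-locked) host allocation
// works, mimicking the cudaMallocHost call in pmalloc_cuda.cu.
// Build with: nvcc pinned_test.cu -o pinned_test
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    void  *buf   = NULL;
    size_t bytes = 1024128;  /* same size as in the mdrun error */

    cudaError_t err = cudaMallocHost(&buf, bytes);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaMallocHost(%zu) failed: %s\n",
                bytes, cudaGetErrorString(err));
        return 1;
    }
    printf("cudaMallocHost(%zu) succeeded\n", bytes);
    cudaFreeHost(buf);
    return 0;
}

If this also fails with "unknown error" on an affected node right when a job would have started, the problem sits below GROMACS (driver state, exhausted locked-memory limits, or something the batch system is doing to the node) and is worth taking to your sysadmins.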