As a side note, your mdrun invocation does not look well suited to GPU-accelerated runs; you would most likely be better off running fewer ranks.
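For example, with 32 cores and 4 GPUs per node, something along these lines may work better (a sketch only; the option names match your GROMACS 5.1.2 command line, but the best rank/thread split depends on the system, so it is worth benchmarking):

mpirun -np 8 gmx_mpi mdrun -deffnm MD -maxh $maxh -npme 0 -gpu_id 00112233 -ntomp 4 -notunepme

That is 8 PP ranks with 4 OpenMP threads each (still 32 cores in total) and two ranks per GPU, in line with the 2-6 threads-per-rank note in your own log. Note that -dd 4 4 2 is dropped: it forces 32 domains, so with fewer ranks you should let mdrun pick the domain decomposition.

-- Szilárd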
On Fri, Mar 23, 2018 at 9:26 PM, Christopher Neale <chris.ne...@alum.utoronto.ca> wrote:
> Hello,
>
> I am running gromacs 5.1.2 on single nodes where the run is set to use 32
> cores and 4 GPUs. The run command is:
>
> mpirun -np 32 gmx_mpi mdrun -deffnm MD -maxh $maxh -dd 4 4 2 -npme 0 -gpu_id 00000000111111112222222233333333 -ntomp 1 -notunepme
>
> Some of my runs die with this error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> Below is the relevant part of the .log file. Searching the internet didn't
> turn up any solutions. I'll contact the sysadmins if you think this is likely
> some problem with the hardware or rogue jobs. In my testing, a collection of
> 24 jobs had 6 die with this same error message (including the "1024128 bytes"
> and "pmalloc_cuda.cu, line: 70"), all on different nodes, and all of those
> nodes next took repeat jobs that ran fine. When the error occurred, it was
> always right at the start of the run.
>
> Thank you for your help,
> Chris.
>
> Command line:
>   gmx_mpi mdrun -deffnm MD -maxh 0.9 -dd 4 4 2 -npme 0 -gpu_id 00000000111111112222222233333333 -ntomp 1 -notunepme
>
> Number of logical cores detected (72) does not match the number reported by
> OpenMP (2).
> Consider setting the launch configuration manually!
>
> Running on 1 node with total 36 cores, 72 logical cores, 4 compatible GPUs
> Hardware detected on host ko026.localdomain (the node of MPI rank 0):
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
>     SIMD instructions most likely to fit this hardware: AVX2_256
>     SIMD instructions selected at GROMACS compile time: AVX2_256
>   GPU info:
>     Number of GPUs detected: 4
>     #0: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #1: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #2: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>     #3: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
>
> Reading file MD.tpr, VERSION 5.1.2 (single precision)
> Can not increase nstlist because verlet-buffer-tolerance is not set or used
> Using 32 MPI processes
> Using 1 OpenMP thread per MPI process
>
> On host ko026.localdomain 4 GPUs user-selected for this run.
> Mapping of GPU IDs to the 32 PP ranks in this node:
>   0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3
>
> NOTE: You assigned GPUs to multiple MPI processes.
>
> NOTE: Your choice of number of MPI ranks and amount of resources results in
>       using 1 OpenMP threads per rank, which is most likely inefficient. The
>       optimum is usually between 2 and 6 threads per rank.
>
> NOTE: GROMACS was configured without NVML support hence it can not exploit
>       application clocks of the detected Tesla P100-PCIE-16GB GPU to improve
>       performance.
>       Recompile with the NVML library (compatible with the driver used) or
>       set application clocks manually.
>
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.2
> Source code file:
> /net/scratch3/cneale/exe/KODIAK/GROMACS/source/gromacs-5.1.2/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu,
> line: 70
>
> Fatal error:
> cudaMallocHost of size 1024128 bytes failed: unknown error
>
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Halting parallel program gmx mdrun on rank 31 out of 32
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31
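To rule out a node-level driver or pinned-memory problem independently of GROMACS, you could run a tiny standalone test of the same call that fails. Below is a minimal sketch (my own code, not GROMACS's; it assumes nvcc is available on the compute nodes, and the 1024128-byte size just mirrors your error message):

// pinned_test.cu -- check that pinned (page-locked) host allocation
// works, mimicking the cudaMallocHost call in pmalloc_cuda.cu.
// Build with: nvcc pinned_test.cu -o pinned_test
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    void  *buf   = NULL;
    size_t bytes = 1024128;  /* same size as in the mdrun error */

    cudaError_t err = cudaMallocHost(&buf, bytes);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaMallocHost(%zu) failed: %s\n",
                bytes, cudaGetErrorString(err));
        return 1;
    }
    printf("cudaMallocHost(%zu) succeeded\n", bytes);
    cudaFreeHost(buf);
    return 0;
}

If this also fails with "unknown error" on an affected node right when a job would have started, the problem sits below GROMACS (driver state, exhausted locked-memory limits, or something the batch system is doing to the node) and is worth taking to your sysadmins.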