[gmx-users] cudaMallocHost failed: unknown error

2018-03-23 Thread Christopher Neale
Hello,

I am running GROMACS 5.1.2 on single nodes, with each run set to use 32 cores 
and 4 GPUs. The run command is:

mpirun -np 32 gmx_mpi mdrun -deffnm MD -maxh $maxh -dd 4 4 2 -npme 0 -gpu_id 
 -ntomp 1 -notunepme

Some of my runs die with this error:
cudaMallocHost of size 1024128 bytes failed: unknown error

Below is the relevant part of the .log file. Searching the internet didn't turn 
up any solutions. I'll contact the sysadmins if you think this is likely a 
problem with the hardware or with rogue jobs. In my testing, 6 out of a 
collection of 24 jobs died with this same error message (including the 
"1024128 bytes" and "pmalloc_cuda.cu, line: 70"). They were all on different 
nodes, and all of those nodes subsequently took repeat jobs that ran fine. 
When the error occurred, it was always right at the start of the run.


Thank you for your help,
Chris.



Command line:
  gmx_mpi mdrun -deffnm MD -maxh 0.9 -dd 4 4 2 -npme 0 -gpu_id 
 -ntomp 1 -notunepme


Number of logical cores detected (72) does not match the number reported by 
OpenMP (2).
Consider setting the launch configuration manually!

Running on 1 node with total 36 cores, 72 logical cores, 4 compatible GPUs
Hardware detected on host ko026.localdomain (the node of MPI rank 0):
  CPU info:
Vendor: GenuineIntel
Brand:  Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256
  GPU info:
Number of GPUs detected: 4
#0: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
#1: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
#2: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible
#3: NVIDIA Tesla P100-PCIE-16GB, compute cap.: 6.0, ECC: yes, stat: compatible

Reading file MD.tpr, VERSION 5.1.2 (single precision)
Can not increase nstlist because verlet-buffer-tolerance is not set or used
Using 32 MPI processes
Using 1 OpenMP thread per MPI process

On host ko026.localdomain 4 GPUs user-selected for this run.
Mapping of GPU IDs to the 32 PP ranks in this node: 
0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3

NOTE: You assigned GPUs to multiple MPI processes.

NOTE: Your choice of number of MPI ranks and amount of resources results in 
using 1 OpenMP threads per rank, which is most likely inefficient. The optimum 
is usually between 2 and 6 threads per rank.


NOTE: GROMACS was configured without NVML support hence it can not exploit
      application clocks of the detected Tesla P100-PCIE-16GB GPU to improve performance.
      Recompile with the NVML library (compatible with the driver used) or set application clocks manually.


---
Program gmx mdrun, VERSION 5.1.2
Source code file: /net/scratch3/cneale/exe/KODIAK/GROMACS/source/gromacs-5.1.2/src/gromacs/gmxlib/cuda_tools/pmalloc_cuda.cu, line: 70

Fatal error:
cudaMallocHost of size 1024128 bytes failed: unknown error

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
---

Halting parallel program gmx mdrun on rank 31 out of 32
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 31




Re: [gmx-users] cudaMallocHost failed: unknown error

2018-03-24 Thread Mark Abraham
Hi,

This looks like rogue behavior from the GPU driver's previous workload, or
something like that. cudaMallocHost asks the driver to allocate page-locked
(pinned) memory on the CPU, but for the sizes GROMACS requests this should
never fail from, e.g., a lack of resources.

Mark



Re: [gmx-users] cudaMallocHost failed: unknown error

2018-03-26 Thread Szilárd Páll
As a side note, your mdrun invocation does not look well suited to
GPU-accelerated runs; you would most likely be better off running fewer ranks.
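
For example, something along these lines (an illustrative sketch only, not a
command from the thread: it assumes the same 36-core, 4-GPU node, drops the
explicit -dd grid, maps two PP ranks to each GPU via the -gpu_id string, and
uses 4 OpenMP threads per rank, in the 2-6 range the log's own NOTE suggests;
the best split still needs benchmarking):

mpirun -np 8 gmx_mpi mdrun -deffnm MD -maxh $maxh -npme 0 -gpu_id 00112233 -ntomp 4 -notunepme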
--
Szilárd

