[QE-users] [QE-GPU] High GPU oversubscription detected
Dear Quantum ESPRESSO experts,

I am encountering an unexpected message about GPU oversubscription when running pw.x with GPU support on an HPC system. The system has 4 GPUs per node. I did not receive any warning when running with 1 node and 1 GPU (ntasks=1, npools=1) or with 1 node and 2 GPUs (ntasks=2, npools=2). However, when running with 1 node and 4 GPUs (ntasks=4, npools=4), I receive the following message:

    Message from routine print_cuda_info:
    High GPU oversubscription detected. Are you sure this is what you want?

Here is the SLURM batch script I used:

    #!/bin/bash -x
    #SBATCH --nodes=1
    #SBATCH --ntasks=4
    #SBATCH --ntasks-per-node=4
    #SBATCH --cpus-per-task=1
    #SBATCH --output=TbHf.%j
    #SBATCH --error=mpi-err.%j
    #SBATCH --gres=gpu:4

    export OMP_NUM_THREADS=1
    srun ~/qe-7.2-gpu/bin/pw.x -nk 4 -nb 1 -nt 1 -nd 1 < inp_pwscf > out_pwscf

Based on the environment.f90 file, this message is triggered when nproc > ndev * nnode * 2. If I understand correctly, I have nproc (number of MPI processes) = 4, ndev (number of GPU devices per node) = 4, and nnode (number of nodes) = 1, so the condition should evaluate to false (4 > 8 does not hold). Despite this, the message still appears, and all 4 GPUs were active during the run.

Could you please help me understand why this message appears under these conditions? Any insights or suggestions to address it would be greatly appreciated. Thank you in advance for your help!

Yin-Ying Ting
PhD student, Forschungszentrum Jülich
Institute of Energy and Climate Research
Theory and Computation of Energy Materials (IEK-13)
E-mail: y.t...@fz-juelich.de

Forschungszentrum Jülich GmbH
52425 Jülich
Sitz der Gesellschaft: Jülich
Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Stefan Müller
Geschäftsführung: Prof. Dr. Astrid Lambrecht (Vorsitzende), Karsten Beneke (stellv. Vorsitzender), Dr. Ir. Pieter Jansens

___
The Quantum ESPRESSO community stands by the Ukrainian people and expresses its concerns about the devastating effects that the Russian military offensive has on their country and on the free and peaceful scientific, cultural, and economic cooperation amongst peoples
___
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list
users@lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users
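[Editor's note] The check quoted from environment.f90 can be mimicked in shell arithmetic to see when the warning fires. This is a sketch, not QE's actual Fortran code; the function name check_oversub is invented here, and the second call illustrates the hypothetical case where each MPI rank only sees one GPU device.

```shell
# Sketch of the oversubscription check from environment.f90,
# redone in shell arithmetic (variable names mirror the Fortran ones).
check_oversub() {
  local nproc=$1 ndev=$2 nnode=$3
  if [ "$nproc" -gt $(( ndev * nnode * 2 )) ]; then
    echo "oversubscribed"
  else
    echo "ok"
  fi
}

check_oversub 4 4 1   # 4 > 4*1*2 = 8 is false -> prints "ok"
check_oversub 4 1 1   # if each rank sees only 1 GPU: 4 > 2 -> prints "oversubscribed"
```

With the values the poster expects (ndev = 4), the warning indeed should not appear; it only fires if one of the three numbers differs from what is assumed.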
Re: [QE-users] [QE-GPU] High GPU oversubscription detected
Dear Paolo,

Thank you for your prompt response; your suggestion was very helpful. I reviewed the numbers and discovered that, regardless of the value set in --gres=gpu:X, ndev consistently remains 1. Our HPC documentation indicates that --gres=gpu:X is the correct way to request GPUs, and each node has 4 GPUs. Here is the output when I set --gres=gpu:4:

    GPU acceleration is ACTIVE.
    GPU-aware MPI enabled
    nproc (MPI processes): 4
    ndev (GPUs per node): 1
    nnode (nodes): 1
    Message from routine print_cuda_info:
    High GPU oversubscription detected. Are you sure this is what you want?

I monitored GPU usage every 10 seconds; all 4 GPUs appear to be active when --gres=gpu:4 is set:

    utilization.gpu [%], utilization.memory [%]
    96 %, 37 %
    95 %, 76 %
    95 %, 50 %
    95 %, 76 %
    time = 70 s

For reference, here is my sbatch submission script:

    #!/bin/bash -x
    #SBATCH --gres=gpu:4 --partition=dc-gpu
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4
    #SBATCH --time=00:00:20

    export OMP_NUM_THREADS=1
    module load NVHPC/23.7-CUDA-12
    module load CUDA/12
    module load OpenMPI/4.1.5
    module load mpi-settings/CUDA
    module load imkl/2023.2.0

    monitor_gpu_usage() {
      while true; do
        nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv >> gpu_usage_$SLURM_JOB_ID.csv
        sleep 10
      done
    }
    monitor_gpu_usage &

    srun -n 4 pw.x -nk 4 -nd 1 -nb 1 -nt 1 < inp_pwscf > out_pwscf

Could you please provide guidance on resolving the oversubscription issue? Thank you very much in advance.

Kind regards,
Yin-Ying Ting

On 29.11.23 15:53, Paolo Giannozzi wrote:
> On 11/27/23 11:32, Yin-Ying Ting wrote:
>> Based on the environment.f90 file, this message is triggered when nproc > ndev * nnode * 2. If I understand correctly, I have nproc (number of MPI processes) = 4, ndev (number of GPU devices per node) = 4, and nnode (number of nodes) = 1. This condition seems to be false (4 > 8). Despite this, the message still appears. All 4 GPUs were active during the run.
>
> Funny. Even funnier, the number of GPUs actually used does not seem to be written anywhere on output. Add a line printing nproc, ndev, nnode just before the warning is issued, recompile, and re-run. One (at least) of those numbers is not what you expect. Computers are not among the most reliable machines, but they should be able to find out which of 4 and 8 is larger.
>
> Paolo
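[Editor's note] The output above shows ndev = 1 even though --gres=gpu:4 is set. One possible explanation (an assumption, not confirmed in the thread) is that the scheduler exports CUDA_VISIBLE_DEVICES with a single device per task, so each rank, and hence QE's device count, sees only one GPU. If each node-local rank is deliberately pinned to its own GPU, the mapping can be sketched like this; the helper name local_gpu_id and the wrapper usage are hypothetical:

```shell
# Hypothetical helper: map a node-local MPI rank to one of the node's GPUs.
# SLURM exports SLURM_LOCALID as the task's node-local rank (0, 1, 2, ...).
local_gpu_id() {
  local localid=$1 gpus_per_node=$2
  echo $(( localid % gpus_per_node ))
}

# In a wrapper script launched as `srun ./wrap.sh pw.x ...` one might do:
#   export CUDA_VISIBLE_DEVICES=$(local_gpu_id "$SLURM_LOCALID" 4)
#   exec "$@"
local_gpu_id 2 4   # prints 2
```

Under such a per-task binding, each rank really has one GPU to itself, so the "oversubscription" condition (4 > 1*1*2) is triggered even though no GPU is actually shared.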
[QE-users] Hubbard U Parameters in pmw.x
Dear Quantum ESPRESSO community,

I am currently working with the poor man's Wannier code, pmw.x, as demonstrated in example05 of PP. In this example, the procedure begins with a self-consistent field (SCF) calculation without Wannier functions and with very small Hubbard parameters (around 0.001). This is followed by a run of pmw.x and then a second SCF calculation, this time including the Wannier functions and the Hubbard U parameters.

My question concerns the rationale behind this two-step SCF approach, particularly with respect to the Hubbard U parameters: why is it not recommended, or not feasible, to apply the Hubbard U parameters right from the first SCF calculation? Understanding the reasoning behind this procedure would greatly improve my grasp of the method and its applications.

I appreciate any insights or explanations you can provide. Thank you for your time and assistance.

Best regards,
Yin-Ying Ting
PhD student, Forschungszentrum Jülich
Institute of Energy and Climate Research
Theory and Computation of Energy Materials (IEK-13)
E-mail: y.t...@fz-juelich.de
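[Editor's note] The three-stage sequence described above can be sketched as a dry-run driver. The input file names are invented for illustration (example05 uses its own names), and the run helper only echoes the commands instead of executing them:

```shell
# Dry-run sketch of the example05 sequence (input names are hypothetical).
# Replace the echo in run() with actual execution to use it for real.
run() { echo "would run: $*"; }

run pw.x -in scf_smallU.in   # 1) SCF with tiny Hubbard U (~0.001)
run pmw.x -in pmw.in         # 2) build poor man's Wannier functions from the SCF states
run pw.x -in scf_wf_U.in     # 3) SCF with Wannier projectors and the full Hubbard U
```

The small U in step 1 serves only to enable the Hubbard machinery so that pmw.x can later construct the Wannier projectors; the physically meaningful U enters in step 3.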
[QE-users] [QE-GPU] Significant Slowdown in GPU Phonon Calculation Preparation
Dear Quantum ESPRESSO experts,

I have been running a phonon calculation with QE 7.2 on both CPU and GPU and have noticed significantly different runtimes in the preparation step. I would appreciate insights on how to improve the GPU performance.

CPU setup:
    Nodes: 8
    Cores per node: 128
    Parallelization: 8 images, 4 pools (the calculation has 4 k-points)

GPU setup:
    Nodes: 8
    GPUs per node: 4
    OMP_NUM_THREADS: 1
    Parallelization: 8 images, 4 pools (the calculation has 4 k-points)

The preparation step, which includes "Compute atoms", the Ewald sum, and the charge-density calculations, took around 9 hours on the CPU setup but over 24 hours on the GPU setup. Below is a section of the CPU calculation output just before the SCF of each representation mode:

    Compute atoms: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,

    Alpha used in Ewald sum = 2.8000
    negative rho (up, down): 9.781E-01 0.000E+00

    PHONON : 8h51m CPU 8h56m WALL

    Representation # 1 mode # 1

    Self-consistent Calculation

Why might the preparation step of the GPU calculation take so much longer? Are there specific optimizations or configurations I can apply to improve GPU performance in QE phonon calculations?

Thank you in advance for your help!

Best regards,
Yin-Ying Ting
PhD student, Forschungszentrum Jülich
Institute of Energy and Climate Research
Theory and Computation of Energy Materials (IEK-13)
E-mail: y.t...@fz-juelich.de
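[Editor's note] The task layout implied by the GPU setup above can be worked out explicitly. This sketch assumes one MPI task per GPU (8 nodes x 4 GPUs = 32 tasks), which the post does not state outright:

```shell
# Sketch of the MPI task layout for the GPU run described above
# (assuming one MPI task per GPU: 8 nodes x 4 GPUs/node = 32 tasks).
ntasks=$(( 8 * 4 ))    # total MPI tasks
nimage=8               # phonon images (-ni 8)
npool=4                # k-point pools (-nk 4; the calculation has 4 k-points)

tasks_per_image=$(( ntasks / nimage ))
tasks_per_pool=$(( tasks_per_image / npool ))

echo "tasks per image: $tasks_per_image"   # 4
echo "tasks per pool:  $tasks_per_pool"    # 1
```

Under this assumption each pool is left with a single GPU task and no plane-wave parallelism, whereas the CPU run (8 nodes x 128 cores = 1024 tasks) would have 32 tasks per pool; the two setups therefore distribute the preparation work very differently even though images and pools match.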