Re: [QE-users] QE Multi-GPU mpirun run command

2023-01-02 Thread Louis Stuber via users
Dear Vijeta,

Based on the error-code convention in mp_world.f90, I think this error is
CUDA_ERROR_UNSUPPORTED_PTX_VERSION. It can come from an outdated driver, or from not
passing the right architecture flags to NVCC ("-arch sm_70") or to the PGI/NVHPC
compiler ("-gpu=cc70"). Are you sure that your build really runs on 1 GPU? If it does,
it may be that the 2 GPUs have different architectures; in that case, use something
like "-gpu=cc70,cc80" to specify multiple architectures.
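
For reference, a minimal build sketch along these lines (assuming an NVHPC/PGI
toolchain and the --with-cuda* options of recent QE configure scripts; the versions
and compute capabilities below are examples to adjust to your hardware):

```
# List the GPUs on the node; V100 corresponds to cc70, A100 to cc80, etc.
nvidia-smi -L

# Configure QE-GPU for a specific compute capability. On a node mixing V100 and
# A100 cards you would instead pass "-gpu=cc70,cc80" in the compiler flags.
./configure --with-cuda=$CUDA_HOME \
            --with-cuda-cc=70 \
            --with-cuda-runtime=11.0
make -j pw
```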

Best,
Louis

From: users  On Behalf Of Vijeta 
Sharma
Sent: Monday, January 2, 2023 1:05 PM
To: users@lists.quantum-espresso.org
Subject: [QE-users] QE Multi-GPU mpirun run command

Dear QE researchers,

I am running QE-GPU with the following command for 2 GPU (1 process each GPU):

mpirun -np 2 pw.x -inp data.in

But it is giving the following error:


*** error in Message Passing (mp) module ***
*** error code: 9422
*** error in Message Passing (mp) module ***
*** error code: 9422

Can anyone provide the correct mpirun command to run QE on multiple GPUs?



-
Regards,
Vijeta Sharma
Pune, India




Re: [QE-users] [QE 7.0/GPU] Run built with CMake fails with cuFFT error

2022-05-23 Thread Louis Stuber via users
Hi Robert,

I think it's an issue with the PGI compiler shipped in NVHPC 22.3. Please use NVHPC
21.9 (PGI compiler) instead.

Normally the configure script should catch this and refuse NVHPC 22.3; unfortunately,
that check was introduced only after QE 7.0.
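
As an illustration only, a build along those lines might look like the following
sketch (the module names are site-specific assumptions; the -DQE_* options are the
ones from the report quoted below):

```
# Load an NVHPC 21.9 toolchain instead of 22.3 (module names are site-specific)
module load NVHPC/21.9 OpenMPI CUDA

# Configure QE 7.0 with CMake, reusing the GPU-related options from the report below
cmake -DCMAKE_Fortran_COMPILER=mpif90 \
      -DCMAKE_C_COMPILER=mpicc \
      -DQE_ENABLE_CUDA=1 \
      -DQE_ENABLE_OPENMP=1 \
      -DQE_ENABLE_LIBXC=1 \
      -DQE_FFTW_VENDOR=Internal ..
make -j
```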

Best,
Louis

From: users  On Behalf Of Robert 
MIJAKOVIC
Sent: Monday, May 23, 2022 5:23 PM
To: users@lists.quantum-espresso.org
Subject: [QE-users] [QE 7.0/GPU] Run built with CMake fails with cuFFT error


# Summary
QE 7.0/GPU compiled with CMake fails on our system in "routine 
fft_scalar_cuFFT: cft_1z_gpu (8)".

# Version
qe-7.0-ReleasePack.tgz

# Environment
## Hardware
1. 2xAMD EPYC 7452
2. 4xNVIDIA A100
3. 512 GB RAM
## Software
1. OS: Rocky Linux release 8.5 (Green Obsidian)
2. NVHPC 22.3
3. OpenMPI 4.1.3 built with NVHPC 22.3
4. CUDA 11.3.1 with Driver 470.82.01
5. libxc 5.1.5
6. CMake 3.20.1
7. M4 1.4.19

# Steps to reproduce
## Configured with:
`-DQE_ENABLE_CUDA=1 -DQE_FFTW_VENDOR=Internal -DQE_ENABLE_LIBXC=1 
-DQE_ENABLE_OPENMP=1 `
## Prebuild options
`cp $EBROOTLIBXC/include/*.mod Modules/mod/qe_modules && export FPP='nvfortran 
-Mpreprocess -E' && export CPP='cpp -E' && export FCPP='cpp -E' && `
## make options
`make all epw`
## Execute
srun

## Input files
QEF AUSURF112 benchmark

# Observed behaviour
When the example is started, it fails with:
```
%%
Error in routine fft_scalar_cuFFT: cft_1z_gpu (8):
cufftPlanMany failed
```

# Questions
1. What am I doing wrong?
2. Why is there no option to set FFTW_VENDOR to cuFFT?
3. Why does it get linked against cuFFT if FFTW_VENDOR is set to Internal?

Dr. rer. nat. Robert Mijaković | HPC System Software Architect

LuxProvide
3, Op der Poukewiss | L-7795 Bissen
Grand-Duchy of Luxembourg
M (+352) 691 396 474
robert.mijako...@lxp.lu | 
www.luxprovide.lu


Re: [QE-users] [QE-GPU] Performance of the NGC Container

2021-07-16 Thread Louis Stuber via users
Hi Jonathan,

Thanks for your message, and apologies for the late reply. As Paolo mentioned, the GPU
version should never be slower than the CPU one, except when it calls routines that are
not GPU-accelerated (fortunately, the one you mention has been implemented recently).


  *   CUDA-aware MPI is nice. It appears that the container is configured to 
use the MPI libraries in the container instead of those installed for the local 
cluster. Is this true? Can users take advantage of their local CUDA-aware MPI 
libraries?

Yes, a container will almost never see or use what is on your local cluster, apart from
low-level driver/kernel components. It is not possible to use your own MPI installation
without rebuilding the container; however, the container uploaded to NGC already uses
CUDA-aware MPI, if I recall correctly, so it should already perform well in that regard.
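
For example, a typical containerized run might look like this sketch (assuming
Singularity and the NGC image referenced in the thread below; the image tag and the
input file name are placeholders to adapt):

```
# Pull the QE container from NGC (the tag is only an example; check the catalog page)
singularity pull qe.sif docker://nvcr.io/hpc/quantum_espresso:qe-7.0

# One MPI rank per GPU, using the CUDA-aware MPI stack shipped inside the container
singularity exec --nv qe.sif mpirun -np 2 pw.x -inp ausurf.in
```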

Best,
Louis
From: users  On Behalf Of Paolo 
Giannozzi
Sent: Tuesday, July 6, 2021 8:47 PM
To: Quantum ESPRESSO users Forum 
Subject: Re: [QE-users] [QE-GPU] Performance of the NGC Container


The GPU acceleration of DFT-D3, using OpenACC, as well as its MPI parallelization, was
implemented only a few days ago and will appear in the next release (soon). Apparently
DFT-D3 takes a non-negligible amount of time. Without MPI parallelization or GPU
acceleration, it can easily become a bottleneck when running on many processors or on
GPUs.

Paolo

On Tue, Jul 6, 2021 at 7:44 PM Jonathan D. Halverson <halver...@princeton.edu> wrote:
Hello (@Louis Stuber),

The QE container on NGC
(https://ngc.nvidia.com/catalog/containers/hpc:quantum_espresso) appears to be running
very well for us on a node with two A100s for the "AUSURF112, Gold surface (112 atoms),
DEISA pw" benchmark. We see a speed-up of 8x in comparison to running on 80 Skylake
CPU-cores (no GPUs), where the code was built from source.

The procedure we used for the above is here:
https://researchcomputing.princeton.edu/support/knowledge-base/quantum-espresso

However, for one system we see a slow down (i.e., the code runs faster using 
only CPU-cores). Can you tell if the system below should perform well using the 
container?

"My system is basically just two carbon dioxide molecules and doing a single 
point calculation on them using the PBE-D3 functional and basically just 
altering the distance between the two molecules in the atomic coordinates."

Can someone comment in general on when one would expect the container running 
on GPUs to outperform a build-from-source executable running on CPU-cores?

CUDA-aware MPI is nice. It appears that the container is configured to use the 
MPI libraries in the container instead of those installed for the local 
cluster. Is this true? Can users take advantage of their local CUDA-aware MPI 
libraries?

Jon


--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222