Re: [QE-users] Optimal pw command line for large systems and only Gamma point
Ciao Nicola

You're right, I mixed two different things together, with a misleading result. The first point was "use Gaussian smearing because, in my experience, it makes the scf more stable". The second was "if you use Gaussian smearing and scf_must_converge=.false., then you may reduce the smearing to lower values, which avoids spreading too much charge density across the semiconductor band gap (if there is any in such nanoclusters...) and partially occupying orbitals that should be empty".

Thanks for the clarification, I think it will be useful to Antonio.

Best
Giuseppe

GIUSEPPE MATTIOLI
CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
Via Salaria Km 29,300 - C.P. 10
I-00015 - Monterotondo Scalo (RM)
Mob (*preferred*) +39 373 7305625
Tel +39 06 90672342 - Fax +39 06 90672316
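(One illustrative way to check whether a given degauss is spilling charge into states that should stay empty - an addition of this note, not something suggested in the thread - is to ask pw.x to print the band occupations and inspect the states around the gap:

  &CONTROL
    verbosity = 'high'   ! pw.x then prints occupation numbers together with the eigenvalues
  /

If the highest bands show non-negligible fractional occupations, the smearing is probably too large for the system at hand.)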
Re: [QE-users] Optimal pw command line for large systems and only Gamma point
On 13/05/2024 17:26, Giuseppe Mattioli wrote:
> occupations= 'smearing'
> smearing= 'cold'
> degauss= 0.05 ! I know it's quite large, but necessary to stabilize the SCF at this preliminary stage (no geometry step done yet)
> mixing_beta= 0.4
>
> If you want to stabilize the scf it is better to use a Gaussian smearing and to reduce degauss (to 0.01) and mixing_beta (to 0.1 or even 0.05~0.01). In the case of a relax calculation with a difficult first step, try scf_must_converge=.false. and a reasonable electron_maxstep (30~50). It often helps when the scf is not going completely astray.

Ciao Giuseppe, I would agree that in a semiconductor it might be more natural to use Gaussian smearing (although even for cold smearing things are now sorted out - https://journals.aps.org/prb/abstract/10.1103/PhysRevB.107.195122); but I wonder why reducing the smearing would help convergence. To me, the smaller the smearing, the more you can be affected by level-crossing instabilities?

nicola

--
Prof Nicola Marzari, Chair of Theory and Simulation of Materials, EPFL
Director, National Centre for Competence in Research NCCR MARVEL, SNSF
Head, Laboratory for Materials Simulations, Paul Scherrer Institut
Contact info and websites at http://theossrv1.epfl.ch/Main/Contact
Re: [QE-users] Optimal pw command line for large systems and only Gamma point
Dear Antonio

> The actual time spent per scf cycle is about 33 minutes.

This is not so bad. :-)

> The relevant parameters in the input file are the following:

Some relevant parameters are not shown.

> input_dft= 'pz'
> ecutwfc= 25

Which kind of pseudopotential? You didn't set ecutrho... What about ibrav and celldm? I suppose that you really want to perform LDA calculations for some reason.

> occupations= 'smearing'
> smearing= 'cold'
> degauss= 0.05 ! I know it's quite large, but necessary to stabilize the SCF at this preliminary stage (no geometry step done yet)
> mixing_beta= 0.4

If you want to stabilize the scf it is better to use a Gaussian smearing and to reduce degauss (to 0.01) and mixing_beta (to 0.1 or even 0.05~0.01). In the case of a relax calculation with a difficult first step, try scf_must_converge=.false. and a reasonable electron_maxstep (30~50). It often helps when the scf is not going completely astray.

> nbnd= 2010
> diagonalization= 'ppcg'

Davidson should be faster.

> And, if possible, also to reduce the number of nodes?
> Estimated total dynamical RAM > 1441.34 GB

You may try with 7-8 nodes according to this estimate.

HTH
Giuseppe
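(Put together, Giuseppe's suggestions would look roughly like the following in the &SYSTEM and &ELECTRONS namelists; the values are the ones he quotes, used here only as an illustrative sketch, not a tested input for this cluster:

  &SYSTEM
    occupations       = 'smearing'
    smearing          = 'gaussian'
    degauss           = 0.01          ! Ry; reduced from 0.05 as suggested
    nbnd              = 2010
  /
  &ELECTRONS
    diagonalization   = 'david'       ! Davidson instead of ppcg
    mixing_mode       = 'plain'
    mixing_beta       = 0.1
    electron_maxstep  = 50
    scf_must_converge = .false.       ! only for a difficult first step of a 'relax' run
  /
)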
Re: [QE-users] Optimal pw command line for large systems and only Gamma point
I did some tests. For 1000 Si atoms, I use 2010 bands because I need to get the band gap value; moreover, being a cluster, the surface states of the truncated bonds might close the gap, especially at the first steps of the geometry optimization, so it's better if I use a few empty bands. I managed to run the calculation by using 10 nodes and a maximum of 40 cores per node.

My question now is: can you suggest optimal command line options and/or input settings to speed up the calculation? And, if possible, also to reduce the number of nodes? The relevant parameters in the input file are the following:

input_dft= 'pz'
ecutwfc= 25
occupations= 'smearing'
smearing= 'cold'
degauss= 0.05 ! I know it's quite large, but necessary to stabilize the SCF at this preliminary stage (no geometry step done yet)
nbnd= 2010
diagonalization= 'ppcg'
mixing_mode= 'plain'
mixing_beta= 0.4

The actual time spent per scf cycle is about 33 minutes. I use QE v. 7.3 compiled with OpenMPI and ScaLAPACK. I have access to the Intel compilers too, but I did some tests and the difference is just tens of seconds. I have only the Gamma point; here is some info about the grid and the estimated RAM usage:

Dense grid: 24616397 G-vectors FFT dimensions: ( 375, 375, 375)
Dynamical RAM for wfc: 235.91 MB
Dynamical RAM for wfc (w. buffer): 235.91 MB
Dynamical RAM for str. fact: 0.94 MB
Dynamical RAM for local pot: 0.00 MB
Dynamical RAM for nlocal pot: 2112.67 MB
Dynamical RAM for qrad: 0.80 MB
Dynamical RAM for rho,v,vnew: 6.04 MB
Dynamical RAM for rhoin: 2.01 MB
Dynamical RAM for rho*nmix: 15.03 MB
Dynamical RAM for G-vectors: 3.99 MB
Dynamical RAM for h,s,v(r/c): 0.46 MB
Dynamical RAM for : 552.06 MB
Dynamical RAM for wfcinit/wfcrot: 1305.21 MB
Estimated static dynamical RAM per process > 2.31 GB
Estimated max dynamical RAM per process > 3.60 GB
Estimated total dynamical RAM > 1441.34 GB

Thanks a lot in advance for your kind help.
All the best
Antonio
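(A quick sanity check on the band count, assuming the usual 4 valence electrons per Si atom in a norm-conserving PZ pseudopotential - an assumption, since the pseudopotential is not stated in the thread:

  1000 Si atoms x 4 valence electrons = 4000 electrons
  4000 electrons / 2 electrons per band = 2000 occupied bands
  nbnd = 2010  ->  10 empty bands above the gap

This is consistent with Paolo's "1000 Si atoms = 2000 bands" below.)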
Re: [QE-users] Optimal pw command line for large systems and only Gamma point
On 5/10/24 08:58, Antonio Cammarata via users wrote:
> pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out

Too many processors for linear-algebra parallelization. 1000 Si atoms = 2000 bands (assuming an insulator with no spin polarization). Use a few tens of processors at most.

> "some processors have no G-vectors for symmetrization"

which sounds strange to me: with the Gamma point, symmetrization is not even needed.

> Dense grid: 30754065 G-vectors FFT dimensions: ( 400, 400, 400)

This is what a 256-atom Si supercell with a 30 Ry cutoff yields:

Dense grid: 825897 G-vectors FFT dimensions: ( 162, 162, 162)

I guess you may reduce the size of your supercell.

Paolo

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine Italy, +39-0432-558216
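(A sketch of what "a few tens of processors at most" for linear algebra could look like on the command line; 64 is only an illustrative value, chosen because pw.x distributes the linear-algebra work on a square grid of processes:

  pw.x -nd 64 -inp qe.in > qe.out
)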
Re: [QE-users] Optimal pw command line for large systems and only Gamma point
Dear Antonio

Before struggling with the parallelization setup, I see that

> Estimated total dynamical RAM > 1900.41 GB

your calculation requires roughly 2 TB of RAM. I can't see your full setup (e.g., the supercell containing your cluster), but I suggest you ask yourself if there is something in it that can be reduced. Moreover, you need at least 10 nodes because of the RAM requirements, but you likely don't need 1280 processes: try to reduce them heavily while still using 10 (or more) nodes. Have you tried launching pw.x without any parallelization flags at all? The simpler, the better.

HTH
Giuseppe

GIUSEPPE MATTIOLI
CNR - ISTITUTO DI STRUTTURA DELLA MATERIA
Via Salaria Km 29,300 - C.P. 10
I-00015 - Monterotondo Scalo (RM)
Mob (*preferred*) +39 373 7305625
Tel +39 06 90672342 - Fax +39 06 90672316
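(One way to realize "fewer processes, same number of nodes" is to undersubscribe the nodes. A sketch assuming Open MPI on the 10-node, 200 GB/node machine described below; the task counts are illustrative only:

  # 10 nodes x 200 GB = 2 TB total, enough for the ~1.9 TB estimate
  # 32 MPI tasks per node instead of 128 -> roughly 6 GB available per task
  mpirun -np 320 --map-by ppr:32:node pw.x -inp qe.in > qe.out

Fewer tasks per node leaves more memory per MPI process, at the cost of some speed.)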
[QE-users] Optimal pw command line for large systems and only Gamma point
Dear all,

I have a silicon nanocluster with 1000 atoms and a 1 1 1 k-mesh (only the Gamma point). I cannot manage to run the calculation due to a memory issue. I use a computational cluster with 128 cores/node and 200 GB RAM per node. I am using PWSCF v. 7.3. In the input I set ecutwfc= 29 and cg diagonalization to save memory. Following https://www.quantum-espresso.org/Doc/user_guide/node20.html I tried several command line parameters, the last being

pw.x -nk 1 -nt 1 -nb 1 -nd 768 -inp qe.in > qe.out

for a run on 6 nodes. I tried up to 12 nodes, but beyond 7 nodes I get the warning message "some processors have no G-vectors for symmetrization". Here is some info that may be relevant for the issue:

Dense grid: 30754065 G-vectors FFT dimensions: ( 400, 400, 400)
Dynamical RAM for wfc: 153.50 MB
Dynamical RAM for wfc (w. buffer): 153.50 MB
Dynamical RAM for str. fact: 0.61 MB
Dynamical RAM for local pot: 0.00 MB
Dynamical RAM for nlocal pot: 1374.66 MB
Dynamical RAM for qrad: 0.87 MB
Dynamical RAM for rho,v,vnew: 5.50 MB
Dynamical RAM for rhoin: 1.83 MB
Dynamical RAM for rho*nmix: 9.78 MB
Dynamical RAM for G-vectors: 2.60 MB
Dynamical RAM for h,s,v(r/c): 0.25 MB
Dynamical RAM for : 552.06 MB
Dynamical RAM for wfcinit/wfcrot: 977.20 MB
Estimated static dynamical RAM per process > 1.51 GB
Estimated max dynamical RAM per process > 2.47 GB
Estimated total dynamical RAM > 1900.41 GB

I managed to run a simulation with 512 atoms, cg diagonalization and 3 nodes on the same machine with the command line

pw.x -nk 1 -nt 1 -nd 484 -inp qe.in > qe.out

Please, do you have any suggestion on how to set optimal parallelization parameters to avoid the memory issue and run the calculation? I am also planning to run simulations on nanoclusters with more than 1000 atoms.

Thanks a lot in advance for your kind help.

Antonio

--
Antonio Cammarata, PhD in Physics
Associate Professor in Applied Physics
Advanced Materials Group
Department of Control Engineering - KN:G-204
Faculty of Electrical Engineering
Czech Technical University in Prague
Karlovo Náměstí, 13
121 35, Prague 2, Czech Republic
Phone: +420 224 35 5711
Fax: +420 224 91 8646
ORCID: orcid.org/-0002-5691-0682
ResearcherID: A-4883-2014