Re: [Pw_forum] Problem with MPI parallelization: Error in routine zsqmred
Dear Jan,

Paolo is right: you are providing very little information for us to help. Please create a tar.gz containing:

- your make.sys
- the file "install/config.log"
- the submission script you used to run the job
- the input file
- the pseudo-potentials required to run the example
- some technical details about your workstation / server / cluster

On Sep 2, 2016, at 8:43 AM, Jan Oliver Oelerich wrote:
> Hi QE users,
>
> I am trying to run QE 5.4.0 with MPI parallelization on a mid-size
> cluster. I successfully tested the installation using 8 processes
> distributed on 2 nodes, so communication across nodes is not a problem.
> When I, however, run the same calculation on 64 cores, I am getting the
> following error messages in the stdout:

--
Filippo SPIGA ~ Quantum ESPRESSO Foundation ~ http://www.quantum-espresso.org
___
Pw_forum mailing list
Pw_forum@pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum
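The files requested in the list above could be gathered with a short script. This is only a sketch: `bundle_for_report` and the archive name are made up for illustration, and apart from make.sys and install/config.log every file name in the usage comment is an assumption that has to be replaced with the real one.

```python
import tarfile
from pathlib import Path


def bundle_for_report(paths, archive="qe_report.tar.gz"):
    """Pack the existing entries of `paths` into a gzip-compressed tar archive.

    Returns the list of paths that were not found, so they can be added by hand.
    """
    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Store under the bare name; directories are added recursively.
                tar.add(p, arcname=p.name)
            else:
                missing.append(str(p))
    return missing


# Example call (all names except the first two are hypothetical):
# missing = bundle_for_report([
#     "make.sys",
#     "install/config.log",
#     "job.sh",       # submission script
#     "pw.in",        # input file
#     "pseudo",       # directory holding the pseudo-potentials
#     "cluster.txt",  # notes on the workstation / server / cluster
# ])
# print("not found:", missing)
```

Because entries are stored under their bare names, two files with the same name in different directories would collide; in that case pass them through separate subdirectories or rename them first.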
Re: [Pw_forum] Problem with MPI parallelization: Error in routine zsqmred
First of all, try to figure out whether the problem is reproducible on another machine, or with another software configuration (compilers, libraries, etc.). Nobody has ever reported such an error.

Paolo

On Fri, Sep 2, 2016 at 9:43 AM, Jan Oliver Oelerich <jan.oliver.oeler...@physik.uni-marburg.de> wrote:
> Hi QE users,
>
> I am trying to run QE 5.4.0 with MPI parallelization on a mid-size
> cluster. I successfully tested the installation using 8 processes
> distributed on 2 nodes, so communication across nodes is not a problem.
> When I, however, run the same calculation on 64 cores, I am getting the
> following error messages in the stdout:
>
>     iteration # 1 ecut=30.00 Ry beta=0.70
>     Davidson diagonalization with overlap
>
>     %%
>     Error in routine zsqmred (8):
>     somthing wrong with row 3
>     %%
>
>     stopping ...
>
>     %%
>     Error in routine zsqmred (4):
>     %%
>     somthing wrong with row 3
>     Error in routine zsqmred (12):
>     %%
>     somthing wrong with row 3
>     %%
>     stopping ...
>
>     stopping ...
>
> The cluster queue's stderr shows that some MPI processes exited:
>
> PSIlogger: Child with rank 28 exited with status 12.
> PSIlogger: Child with rank 8 exited with status 4.
> application called MPI_Abort(MPI_COMM_WORLD, 12) - process 28application
> called MPI_Abort(MPI_COMM_WORLD, 4) - process 8application called
> MPI_Abort(MPI_COMM_WORLD, 8) - process 18kvsprovider[12375]: sighandler:
> Terminating the job.
> PSIlogger: Child with rank 18 exited with status 8.
> PSIlogger: Child with rank 4 exited with status 1.
> PSIlogger: Child with rank 15 exited with status 1.
> PSIlogger: Child with rank 53 exited with status 1.
> PSIlogger: Child with rank 30 exited with status 1.
>
> The cluster is running some sort of Sun Grid Engine and I used Intel
> MPI. I see no other error messages. Could you give me a hint how to
> debug this further? Verbosity is already 'high'.
>
> Thank you very much and best regards,
> Jan Oliver Oelerich
>
> --
> Dr. Jan Oliver Oelerich
> Faculty of Physics and Material Sciences Center
> Philipps-Universität Marburg
>
> Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
> Phone: +49 6421 2822260
> Mail : jan.oliver.oeler...@physik.uni-marburg.de
> Web : http://academics.oelerich.org

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
[Pw_forum] Problem with MPI parallelization: Error in routine zsqmred
Hi QE users,

I am trying to run QE 5.4.0 with MPI parallelization on a mid-size cluster. I successfully tested the installation using 8 processes distributed on 2 nodes, so communication across nodes is not a problem. When I, however, run the same calculation on 64 cores, I am getting the following error messages in the stdout:

    iteration # 1 ecut=30.00 Ry beta=0.70
    Davidson diagonalization with overlap

    %%
    Error in routine zsqmred (8):
    somthing wrong with row 3
    %%

    stopping ...

    %%
    Error in routine zsqmred (4):
    %%
    somthing wrong with row 3
    Error in routine zsqmred (12):
    %%
    somthing wrong with row 3
    %%
    stopping ...

    stopping ...

The cluster queue's stderr shows that some MPI processes exited:

    PSIlogger: Child with rank 28 exited with status 12.
    PSIlogger: Child with rank 8 exited with status 4.
    application called MPI_Abort(MPI_COMM_WORLD, 12) - process 28application called MPI_Abort(MPI_COMM_WORLD, 4) - process 8application called MPI_Abort(MPI_COMM_WORLD, 8) - process 18kvsprovider[12375]: sighandler: Terminating the job.
    PSIlogger: Child with rank 18 exited with status 8.
    PSIlogger: Child with rank 4 exited with status 1.
    PSIlogger: Child with rank 15 exited with status 1.
    PSIlogger: Child with rank 53 exited with status 1.
    PSIlogger: Child with rank 30 exited with status 1.

The cluster is running some sort of Sun Grid Engine and I used Intel MPI. I see no other error messages. Could you give me a hint how to debug this further? Verbosity is already 'high'.

Thank you very much and best regards,
Jan Oliver Oelerich

--
Dr. Jan Oliver Oelerich
Faculty of Physics and Material Sciences Center
Philipps-Universität Marburg

Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 2822260
Mail : jan.oliver.oeler...@physik.uni-marburg.de
Web : http://academics.oelerich.org
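As a small debugging aid, the PSIlogger lines from the stderr above can be grouped by exit status to see which ranks aborted on their own and which were merely terminated afterwards. This is a sketch, not part of QE or of the cluster software: `ranks_by_status` is a hypothetical helper, and the regular expression assumes the exact "Child with rank N exited with status M" wording shown in this report.

```python
import re
from collections import defaultdict

# Matches lines such as "PSIlogger: Child with rank 28 exited with status 12."
EXIT_RE = re.compile(r"Child with rank (\d+) exited with status (\d+)")


def ranks_by_status(stderr_text):
    """Group MPI ranks by exit status, based on PSIlogger lines in stderr."""
    groups = defaultdict(list)
    for rank, status in EXIT_RE.findall(stderr_text):
        groups[int(status)].append(int(rank))
    return {status: sorted(ranks) for status, ranks in groups.items()}


# The PSIlogger lines exactly as reported in the message above.
log = """\
PSIlogger: Child with rank 28 exited with status 12.
PSIlogger: Child with rank 8 exited with status 4.
PSIlogger: Child with rank 18 exited with status 8.
PSIlogger: Child with rank 4 exited with status 1.
PSIlogger: Child with rank 15 exited with status 1.
PSIlogger: Child with rank 53 exited with status 1.
PSIlogger: Child with rank 30 exited with status 1.
"""

print(ranks_by_status(log))
# {12: [28], 4: [8], 8: [18], 1: [4, 15, 30, 53]}
```

For this log, ranks 8, 18 and 28 exit with statuses 4, 8 and 12, the same values passed to MPI_Abort and printed in parentheses after "Error in routine zsqmred", while the remaining listed ranks exit with status 1.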