Re: [Pw_forum] Problem with MPI parallelization: Error in routine zsqmred

2016-09-02 Thread Filippo SPIGA
Dear Jan,

Paolo is right: you are giving us very little information to work with. Please
create a tar.gz containing:
- your make.sys
- the file "install/config.log"
- the submission script you used to run the job
- the input file
- the pseudo-potentials required to run the example
- some technical details about your workstation / server / cluster
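A minimal way to collect all of this into one archive (a sketch; every file name below is an example -- adjust the paths to your own build tree and job):

```shell
# Sketch: gather the requested debug files into a single tar.gz.
# All paths are examples; replace them with your actual files.
mkdir -p qe-debug
for f in make.sys install/config.log job.sh pw.in; do
    cp "$f" qe-debug/ 2>/dev/null || echo "missing: $f"
done
tar czf qe-debug.tar.gz qe-debug
```

Copy your pseudopotential `.UPF` files and a short note on the hardware into `qe-debug/` as well before creating the archive.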


On Sep 2, 2016, at 8:43 AM, Jan Oliver Oelerich 
 wrote:
> 
> Hi QE users,
> 
> I am trying to run QE 5.4.0 with MPI parallelization on a mid-size
> cluster. I successfully tested the installation using 8 processes
> distributed over 2 nodes, so communication across nodes is not a problem.
> However, when I run the same calculation on 64 cores, I get the
> following error messages on stdout:

--
Filippo SPIGA ~ Quantum ESPRESSO Foundation ~ http://www.quantum-espresso.org


___
Pw_forum mailing list
Pw_forum@pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum


Re: [Pw_forum] Problem with MPI parallelization: Error in routine zsqmred

2016-09-02 Thread Paolo Giannozzi
First of all, try to figure out whether the problem is reproducible on another
machine, or with another software configuration (compilers, libraries, etc.).
Nobody has ever reported such an error.
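One way to vary the software configuration without changing machines, sketched as a fragment of a batch-job script (the executable path, input file name, and group sizes are assumptions; `-ndiag` is pw.x's command-line flag for the size of the distributed-diagonalization group, which is the code path `zsqmred` belongs to):

```shell
# Job-script fragment -- pw.x location and input file are examples.
# Re-run the failing 64-core job with distributed subspace
# diagonalization disabled (serial diagonalization, one process):
mpirun -np 64 pw.x -ndiag 1 -in pw.in > pw.64.ndiag1.out

# If that completes, the fault is likely in the distributed linear
# algebra layer; try intermediate group sizes to find the threshold:
mpirun -np 64 pw.x -ndiag 16 -in pw.in > pw.64.ndiag16.out
```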

Paolo

On Fri, Sep 2, 2016 at 9:43 AM, Jan Oliver Oelerich <
jan.oliver.oeler...@physik.uni-marburg.de> wrote:

> Hi QE users,
>
> I am trying to run QE 5.4.0 with MPI parallelization on a mid-size
> cluster. I successfully tested the installation using 8 processes
> distributed over 2 nodes, so communication across nodes is not a problem.
> However, when I run the same calculation on 64 cores, I get the
> following error messages on stdout:
>
>
>iteration #  1 ecut=30.00 Ry beta=0.70
>Davidson diagonalization with overlap
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     Error in routine  zsqmred (8):
>     somthing wrong with row 3
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
>     stopping ...
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     Error in routine  zsqmred (4):
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     somthing wrong with row 3
>     Error in routine  zsqmred (12):
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     somthing wrong with row 3
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>     stopping ...
>
>     stopping ...
>
> The cluster queue's stderr shows that some MPI processes exited:
>
>
> PSIlogger: Child with rank 28 exited with status 12.
> PSIlogger: Child with rank 8 exited with status 4.
> application called MPI_Abort(MPI_COMM_WORLD, 12) - process 28
> application called MPI_Abort(MPI_COMM_WORLD, 4) - process 8
> application called MPI_Abort(MPI_COMM_WORLD, 8) - process 18
> kvsprovider[12375]: sighandler: Terminating the job.
> PSIlogger: Child with rank 18 exited with status 8.
> PSIlogger: Child with rank 4 exited with status 1.
> PSIlogger: Child with rank 15 exited with status 1.
> PSIlogger: Child with rank 53 exited with status 1.
> PSIlogger: Child with rank 30 exited with status 1.
>
>
> The cluster runs some flavor of Sun Grid Engine, and I used Intel
> MPI. I see no other error messages. Could you give me a hint on how to
> debug this further? Verbosity is already set to 'high'.
>
> Thank you very much and best regards,
> Jan Oliver Oelerich
>
>
>
>
> --
> Dr. Jan Oliver Oelerich
> Faculty of Physics and Material Sciences Center
> Philipps-Universität Marburg
>
> Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
> Phone: +49 6421 2822260
> Mail : jan.oliver.oeler...@physik.uni-marburg.de
> Web  : http://academics.oelerich.org




-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222

[Pw_forum] Problem with MPI parallelization: Error in routine zsqmred

2016-09-02 Thread Jan Oliver Oelerich
Hi QE users,

I am trying to run QE 5.4.0 with MPI parallelization on a mid-size
cluster. I successfully tested the installation using 8 processes
distributed over 2 nodes, so communication across nodes is not a problem.
However, when I run the same calculation on 64 cores, I get the
following error messages on stdout:


   iteration #  1 ecut=30.00 Ry beta=0.70
   Davidson diagonalization with overlap

 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Error in routine  zsqmred (8):
    somthing wrong with row 3
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    stopping ...

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Error in routine  zsqmred (4):
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    somthing wrong with row 3
    Error in routine  zsqmred (12):
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    somthing wrong with row 3

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    stopping ...

    stopping ...

The cluster queue's stderr shows that some MPI processes exited:


PSIlogger: Child with rank 28 exited with status 12.
PSIlogger: Child with rank 8 exited with status 4.
application called MPI_Abort(MPI_COMM_WORLD, 12) - process 28
application called MPI_Abort(MPI_COMM_WORLD, 4) - process 8
application called MPI_Abort(MPI_COMM_WORLD, 8) - process 18
kvsprovider[12375]: sighandler: Terminating the job.
PSIlogger: Child with rank 18 exited with status 8.
PSIlogger: Child with rank 4 exited with status 1.
PSIlogger: Child with rank 15 exited with status 1.
PSIlogger: Child with rank 53 exited with status 1.
PSIlogger: Child with rank 30 exited with status 1.


The cluster runs some flavor of Sun Grid Engine, and I used Intel
MPI. I see no other error messages. Could you give me a hint on how to
debug this further? Verbosity is already set to 'high'.

Thank you very much and best regards,
Jan Oliver Oelerich




-- 
Dr. Jan Oliver Oelerich
Faculty of Physics and Material Sciences Center
Philipps-Universität Marburg

Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 2822260
Mail : jan.oliver.oeler...@physik.uni-marburg.de
Web  : http://academics.oelerich.org