Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance Issues

2021-06-03 Thread Ye Luo
Hi Brad,
1. Your output files differ. One says 'Writing output data file
./pwscf.save' and the other doesn't. Does one run do I/O and the other not?
2. Your simulation is small and you are running 16 MPI ranks, so it is
largely exercising MPI overhead. Run it a couple of times and see if the
timing is reproducible. Does your machine have 16 physical cores, or 8 cores
with 16 hyperthreads?
3. To verify that it is actually a compiler regression, run with 1 MPI rank
and compare the timings.
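Steps 2 and 3 above amount to a small timing experiment. A minimal Python sketch of it follows, assuming `mpirun` and `pw.x` are on the PATH and that `scf.in` (a hypothetical name) is the test input; passing a different `exe` path lets you time each of the two builds in turn:

```python
import shlex
import subprocess
import time


def build_command(nranks, exe="pw.x", input_file="scf.in"):
    """Return the mpirun command line for one pw.x run.

    'scf.in' is a hypothetical input file name; replace it with your own.
    """
    return ["mpirun", "-np", str(nranks), exe, "-in", input_file]


def time_run(cmd):
    """Run cmd once, discard its standard output, and return the wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def time_builds(exes, rank_counts=(1, 16), repeats=3):
    """Time every build at every rank count, several times each.

    Repeating each run shows whether the timing is reproducible; the
    1-rank runs take MPI overhead out of the comparison.
    """
    for exe in exes:
        for nranks in rank_counts:
            cmd = build_command(nranks, exe=exe)
            times = [time_run(cmd) for _ in range(repeats)]
            print(shlex.join(cmd), " ".join(f"{t:.2f}s" for t in times))


# Example (paths are hypothetical):
# time_builds(["/opt/qe-parallel-studio/bin/pw.x", "/opt/qe-oneapi/bin/pw.x"])
```

If the gap between the two builds disappears at 1 rank, slower compiled code becomes a less likely explanation than MPI or machine effects.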

Ye
===
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory


On Thu, Jun 3, 2021 at 3:36 PM Baer, Bradly 
wrote:

> Hello Users,
>
> I have a working QE 6.7 install built with Intel's Parallel Studio from
> 2020.  I want to compile the d3q code, but I have found that my Parallel
> Studio license has expired, and I must switch to Intel's new OneAPI
> distribution to continue using ifort, icc, etc.  I have configured
> everything in the same way as with the Parallel Studio version, including
> using the same make.inc file, but my parallel performance is very poor when
> using the OneAPI version.
>
> Attached are the make.inc file that I used for both compiles and example
> output files from pw.x compiled with Parallel Studio and with OneAPI. The
> Parallel Studio calculation had a CPU/wall time of 5.89 s/5.97 s, but the
> OneAPI version shows almost a 50% performance loss, with times of
> 5.92 s/8.71 s.  Both were run with the same inputs.
>
> Has anyone had experience compiling with the new OneAPI versions of
> things?  Have I missed some small but important change in how the libraries
> are linked now?
>
> Thanks,
> Brad
>
> 
> Bradly Baer
> Graduate Research Assistant, Walker Lab
> Interdisciplinary Materials Science
> Vanderbilt University
>
>
> ___
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users@lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users
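The timings quoted in the message above can be checked with a little arithmetic. The sketch below uses hypothetical helpers and assumes the final pw.x timing line has the short-run form `PWSCF : 5.89s CPU 5.97s WALL`; longer runs may be printed with minutes and hours and would need a richer pattern. It extracts CPU and wall time from an output file and compares two builds using the figures Brad quoted:

```python
import re

# Assumed format of the final pw.x timing line for short runs, e.g.
#      PWSCF        :      5.89s CPU      5.97s WALL
TIMING_RE = re.compile(r"PWSCF\s*:\s*([\d.]+)s CPU\s+([\d.]+)s WALL")


def parse_timing(output_text):
    """Return (cpu_seconds, wall_seconds) from the last PWSCF timing line."""
    matches = TIMING_RE.findall(output_text)
    if not matches:
        raise ValueError("no PWSCF timing line in seconds found")
    cpu, wall = matches[-1]
    return float(cpu), float(wall)


def compare(ref, new):
    """Print CPU and wall-time ratios of a new build against a reference build."""
    (cpu0, wall0), (cpu1, wall1) = ref, new
    print(f"CPU  ratio: {cpu1 / cpu0:.3f}")
    print(f"wall ratio: {wall1 / wall0:.3f}")
    # Wall time not accounted for by CPU time on the timed process.
    print(f"wall minus CPU: {wall0 - cpu0:.2f} s -> {wall1 - cpu1:.2f} s")


# Numbers quoted in the thread (16 MPI ranks): Parallel Studio vs. OneAPI.
compare((5.89, 5.97), (5.92, 8.71))
```

With these inputs the CPU ratio is about 1.005 and the wall ratio about 1.459: CPU time is essentially unchanged while wall time grows by roughly 46%, and the wall time not covered by CPU time rises from 0.08 s to 2.79 s. One plausible reading, in line with Ye's points, is that the extra time goes to waiting (communication, I/O, or contention) rather than to slower compiled code.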

Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance Issues

2021-06-03 Thread Baer, Bradly
Hello,

1) Ah, I did not notice that.  I generally suppress I/O for test jobs, but on one
run I did a follow-up phonon calculation to test timings on that, so I/O was turned on.
The timing difference persists regardless of I/O and also carried over into
the phonon calculation (roughly 1 hr vs. 1.5 hr).

2) I have run it multiple times and the timing is reproducible.  The timing issue
exists in ph.x as well, as mentioned above.  I have 16 physical cores and 32
hyperthreads.

3) I have run with mpirun -np 1, which is what I think 1 MPI rank means.  The
CPU/wall timings are much more consistent, but I must confess that I am not
experienced enough to understand what this result indicates about the cause of my issue.
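The physical-cores vs. hyperthreads question can be answered from `/proc/cpuinfo` on Linux (`lscpu` reports the same information, and Python's `os.cpu_count()` counts logical CPUs only). A minimal sketch, assuming an x86 Linux kernel that reports `physical id` and `core id` for each logical CPU; the function names are hypothetical:

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo text.

    Physical cores are counted as distinct (physical id, core id) pairs,
    which assumes an x86 Linux kernel that reports both fields.
    """
    logical = 0
    cores = set()
    phys_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys_id = value          # socket of the current logical CPU
        elif key == "core id":
            cores.add((phys_id, value))
    return logical, len(cores)


def count_local_cpus(path="/proc/cpuinfo"):
    """Read the local /proc/cpuinfo and count CPUs (Linux only)."""
    with open(path) as f:
        return count_cpus(f.read())


# Example: logical, physical = count_local_cpus()
```

On the machine described here it should report 32 logical CPUs and 16 physical cores, so 16 MPI ranks correspond to one rank per physical core.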

Thanks,
Brad


Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University



From: users  on behalf of Ye Luo 

Sent: Thursday, June 3, 2021 4:22 PM
To: Quantum ESPRESSO users Forum 
Subject: Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance 
Issues

Attachments: MPI1ParallelStudio.out, MPI1OneAPI.out

Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance Issues

2021-06-03 Thread Ye Luo
This time OneAPI runs faster. The ifort in OneAPI should be very similar to
the one in previous Parallel Studio releases.
I think the performance difference comes from your machine. Neither QE nor the
compiler plays a role here.
Ye
===
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory


On Thu, Jun 3, 2021 at 4:46 PM Baer, Bradly 
wrote:


Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance Issues

2021-06-03 Thread Baer, Bradly
Ye,

I am not quite sure what you mean by the difference coming from my machine.  Do
you suspect that there is something in the new OneAPI version that does not
work well with my specific hardware that was not present in the Parallel
Studio release?

Thanks,
Brad


Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University



From: users  on behalf of Ye Luo 

Sent: Thursday, June 3, 2021 4:58 PM
To: Quantum ESPRESSO users Forum 
Subject: Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance 
Issues


Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance Issues

2021-06-03 Thread Ye Luo
I mean there seem to be factors unrelated to OneAPI/Parallel Studio and
unrelated to QE that may affect the code performance.
1. Turbo frequency and CPU power management. Your two runs may have different
timings due to different clock frequencies: the first one heats up the CPU a lot,
and during the second one the CPU decides not to run at max turbo frequency.
2. The node may be shared with others, with other things running on it.
You probably have the best knowledge of your machine.
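The turbo-frequency point can be checked directly by watching clock speeds while each build runs. A rough sketch, assuming x86 Linux where `/proc/cpuinfo` lists a `cpu MHz` value per logical CPU (each read is only a snapshot); the function names are hypothetical:

```python
import statistics
import time


def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every logical CPU in /proc/cpuinfo text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def watch_clocks(seconds=10, interval=1.0, path="/proc/cpuinfo"):
    """Print min/mean/max clock across CPUs every `interval` s for `seconds` s.

    Run it in a second terminal while pw.x runs, once per build.
    Assumes the kernel reports 'cpu MHz' (x86 Linux); otherwise the
    list is empty and min() raises ValueError.
    """
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        with open(path) as f:
            mhz = cpu_mhz(f.read())
        print(f"min {min(mhz):6.0f}  mean {statistics.mean(mhz):6.0f}  "
              f"max {max(mhz):6.0f} MHz")
        time.sleep(interval)
```

If one build's runs consistently see lower clocks than the other's, thermal or power management is a more likely explanation for the gap than the compiler.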

If you really think OneAPI has a regression, you may contact Intel support, as
they should care about their product.
Ye
===
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory


On Thu, Jun 3, 2021 at 5:07 PM Baer, Bradly 
wrote:


Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance Issues

2021-06-03 Thread Baer, Bradly
Ye,

Thanks for your advice on this matter.

All of this is running on my personal workstation. All of these calculations
were run today and I compiled everything myself, so I don't think this
should be caused by some environmental factor.  I will look into contacting
Intel's support, then, and see if they can provide any advice on how to proceed.

Once again, thanks for your help.

Thanks,
Brad

Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University



From: users  on behalf of Ye Luo 

Sent: Thursday, June 3, 2021 5:18 PM
To: Quantum ESPRESSO users Forum 
Subject: Re: [QE-users] Compiling with Intel's OneAPI - Parallel Performance 
Issues
