Re: [Users] ET on KNL.

2017-04-03 Thread Eloisa Bentivegna
On 24/03/17 14:17, Erik Schnetter wrote:
> On Fri, Mar 24, 2017 at 8:43 AM, Eloisa Bentivegna
> <eloisa.bentive...@ct.infn.it> wrote:
> 
> Hi all,
> 
> still on the KNL topic, I realize there was a question I ignored, and
> which perhaps is worth pursuing.
> 
> On 01/03/17 21:43, Ian Hinder wrote:
> > Eloisa: does Carpet report the vector size of the KNL as 8?  From the
> > wikipedia entry, I would expect that to be the case, but I think you
> > mentioned to me that it was reporting 4.
> 
> That's correct. In my output I find:
> 
> ===
> INFO (Vectors): Using vector size 4 for architecture AVX (64-bit
> precision)
> ===
> 
> Is this not supposed to be so? And in this case, is there anything I can
> do about it?
> 
> 
> You should see a vector size of 8, and a message containing "AVX512
> (64-bit precision)". If you do not see this, then there is a problem with
> the compiler options -- this decision is made at compile time. You might
> need to specify "-march=knl" to GCC, or a similar option to the Intel
> compiler, or you could build on a KNL node and use "-march=native" or
> "-xHost".

Hi all,

I have now tried all of these options and continue getting the message

===
INFO (Vectors): Using vector size 4 for architecture AVX (64-bit
precision)
===

Can linking to system libraries compiled for the BDW architecture lead
to this message, even if the Cactus executable itself was built on a KNL
with the KNL flags?

In any case, I have created a pull request for Simfactory with the KNL
configuration I am using now. At least it's a starting point.

Best,
Eloisa
___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users


Re: [Users] ET on KNL

2017-03-29 Thread David Radice

>> Why is it gone???
> 
> It is a dropbox link. From the link it is not clear who the user is though, 
> so we cannot ask. All that we can see from that is that it is apparently user 
> "17077801". The change on the wiki was done anonymously, from 172.56.34.196, 
> a T-Mobile IP, which is little better.

I think they were probably on my dropbox. I have committed updated versions of 
the optionlist and submission scripts to the batchtools repository 
(https://bitbucket.org/dradice/batchtools). Look in the "templates/cactus" 
folder. I guess that it should be easy to adapt them to work with simfactory.

FYI, batchtools is an experimental replacement for simfactory that I have been 
using for some time. I do not necessarily recommend it to anyone, because it is 
not meant to be user friendly, but it is out there in case somebody has a 
workflow similar to mine.

David




Re: [Users] ET on KNL

2017-03-27 Thread Frank Loeffler

On Sat, Mar 25, 2017 at 12:00:07PM -0400, Erik Schnetter wrote:

Why is it gone???


It is a dropbox link. From the link it is not clear who the user is 
though, so we cannot ask. All that we can see from that is that it is 
apparently user "17077801". The change on the wiki was done anonymously, 
from 172.56.34.196, a T-Mobile IP, which is little better.


Frank





Re: [Users] ET on KNL

2017-03-25 Thread Steven R. Brandt
IDK. Clicking on the link says "File Not Found." Is there some reason 
people are reluctant to add to the simfactory repo?


--Steve


On 03/25/2017 11:00 AM, Erik Schnetter wrote:

Why is it gone???

-erik

On Friday, March 24, 2017, Steven R. Brandt wrote:

> Dumb question. Where can I find a cfg file for KNL? The one that was
> here is gone: https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop
>
> --Steve
>

--
Erik Schnetter <schnet...@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/






Re: [Users] ET on KNL

2017-03-25 Thread Erik Schnetter
Why is it gone???

-erik

On Friday, March 24, 2017, Steven R. Brandt  wrote:
> Dumb question. Where can I find a cfg file for KNL? The one that was
> here is gone: https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop
>
> --Steve
>

-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [Users] ET on KNL

2017-03-24 Thread David Radice
Hi Steve,

I made two cfg files for Stampede/KNL (attached). One is with the Intel 
compiler and one is with GCC.

David



stampede-knl-gcc.cfg
Description: Binary data


stampede-knl.cfg
Description: Binary data

> On Mar 24, 2017, at 2:33 PM, Steven R. Brandt  wrote:
> 
> Dumb question. Where can I find a cfg file for KNL? The one that was
> here is gone: https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop
> 
> --Steve
> 





Re: [Users] ET on KNL.

2017-03-24 Thread Erik Schnetter
On Fri, Mar 24, 2017 at 8:43 AM, Eloisa Bentivegna <
eloisa.bentive...@ct.infn.it> wrote:

> Hi all,
>
> still on the KNL topic, I realize there was a question I ignored, and
> which perhaps is worth pursuing.
>
> On 01/03/17 21:43, Ian Hinder wrote:
> > Eloisa: does Carpet report the vector size of the KNL as 8?  From the
> > wikipedia entry, I would expect that to be the case, but I think you
> > mentioned to me that it was reporting 4.
>
> That's correct. In my output I find:
>
> ===
> INFO (Vectors): Using vector size 4 for architecture AVX (64-bit precision)
> ===
>
> Is this not supposed to be so? And in this case, is there anything I can
> do about it?
>

You should see a vector size of 8, and a message containing "AVX512 (64-bit
precision)". If you do not see this, then there is a problem with the
compiler options -- this decision is made at compile time. You might need
to specify "-march=knl" to GCC, or a similar option to the Intel compiler,
or you could build on a KNL node and use "-march=native" or "-xHost".
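For reference, the relevant pieces of a Cactus optionlist might look roughly like this. This is a hedged sketch, not a tested configuration: the exact flag spellings depend on the compiler version, and `-xMIC-AVX512` is the Intel flag believed to target KNL explicitly.

```
# Hypothetical optionlist fragment for KNL (flag names are assumptions):

# GCC, targeting KNL explicitly:
CFLAGS    = -O2 -march=knl
CXXFLAGS  = -O2 -march=knl
F90FLAGS  = -O2 -march=knl

# Intel, building on a KNL node (-xHost), or targeting KNL explicitly:
# CFLAGS   = -O2 -xHost
# CFLAGS   = -O2 -xMIC-AVX512
```

If the flags take effect, the Vectors thorn should report "vector size 8" and "AVX512" at startup instead of "vector size 4" and "AVX".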

-erik

> Eloisa, are you using LoopControl?
>
> Yes.
>
> Eloisa



-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [Users] ET on KNL.

2017-03-24 Thread Eloisa Bentivegna
Hi all,

still on the KNL topic, I realize there was a question I ignored, and
which perhaps is worth pursuing.

On 01/03/17 21:43, Ian Hinder wrote:
> Eloisa: does Carpet report the vector size of the KNL as 8?  From the
> wikipedia entry, I would expect that to be the case, but I think you
> mentioned to me that it was reporting 4.

That's correct. In my output I find:

===
INFO (Vectors): Using vector size 4 for architecture AVX (64-bit precision)
===

Is this not supposed to be so? And in this case, is there anything I can
do about it?

> Eloisa, are you using LoopControl?

Yes.

Eloisa


Re: [Users] ET on KNL.

2017-03-02 Thread Erik Schnetter
On Thu, Mar 2, 2017 at 10:03 AM, Ian Hinder  wrote:

>
> On 2 Mar 2017, at 14:37, Erik Schnetter  wrote:
>
> I am currently redesigning the tiling infrastructure, also to allow
> multithreading via Qthreads instead of OpenMP and to allow for aligning
> arrays with cache line boundaries. The new approach (different from the
> current LoopControl) is to choose a fixed tile size, either globally or per
> loop, and then assign individual tiles to threads. This also works well
> with DG derivatives, where the DG element size dictates a granularity for the
> tile size, and with the new efficient tiled derivative operators. Most of this
> is still in flux. I have seen large efficiency improvements in the RHS
> calculation, but two puzzling items remain:
>
> (1) It remains more efficient to use MPI than multi-threading for
> parallelization, at least on regular CPUs. On KNL my results are still
> somewhat random.
>
>
> When using MPI vs multi-threading on the same number of cores, the
> component will be smaller, meaning that more of it is likely to fit in the
> cache.  Would that explain this observation?
>

My wild guess is that an explicit MPI parallelization exhibits more data
locality, leading to better performance.

-erik

(2) MoL_Add is quite expensive compared to the RHS evaluation.
>
>
> That is indeed odd.
>
> The main thing that changed since our last round of thorough benchmarks is
> that CPUs have become much more powerful while memory bandwidth hasn't. I'm
> beginning to think that things such as vectorization or parallelization
> basically don't matter any more if we ensure that we pull data from memory
> into caches efficiently.
>
> I have not yet collected PAPI statistics.
>
> -erik
>
>
> On Thu, Mar 2, 2017 at 6:57 AM, Ian Hinder  wrote:
>
>>
>> On 1 Mar 2017, at 22:10, David Radice 
>> wrote:
>>
>> Hi Ian, Erik, Eloisa,
>>
>> I attach a very brief report of some results I obtained in 2015 after
>> attending a KNC workshop.
>>
>> Conclusions: By using 244 threads, with the domain split into tiles of
>> size 8 × 4 × 4 points, and OpenMP threads assigned one per tile as they
>> become available, the MIC was able to outperform the single CPU by a factor
>> of 1.5. The same tiling strategy was used on the CPU, as it has been found
>> to give good performance there in the past. Since we have not yet optimised
>> the code for the MIC architecture, we believe that further speed
>> improvements will be possible, and that solving the Einstein equations on
>> the MIC architecture should be feasible.
>>
>> Eloisa, are you using LoopControl?  There are tiling parameters which can
>> also help with performance on these devices.
>>
>>
>> how does tiling work with LoopControl? Is it documented somewhere? I
>> naively thought that the point of tiling was to have chunks of data stored
>> contiguously in memory...
>>
>>
>> Ideally yes, but this would need to be done in Carpet not LoopControl,
>> and I think you would then require ghost zones around each tile.  Since we
>> have huge numbers of ghost zones, I'm not sure it is practical.
>>
>> LoopControl has parameters such as tilesize and loopsize, but Erik would
>> know better how to use these. It was a long time ago, and I can't
>> immediately find my parameter files.
>>
>> BTW, at the moment I am using this macro for all of my loop needs:
>>
>> #define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK)  \
>>     _Pragma("omp for collapse(3)")                 \
>>     for(int I = SI; I < EI; ++I)                   \
>>     for(int J = SJ; J < EJ; ++J)                   \
>>     for(int K = SK; K < EK; ++K)
>>
>> How would I convert it to something equivalent using LoopControl?
>>
>> Thanks,
>>
>> David
>>
>> PS. Seeing that Eloisa was able to compile bbox.cc with the intel-17.0.0
>> with -no-vec, I made a patch to disable vectorization using pragmas inside
>> bbox.cc (to avoid having to compile it manually):
>>
>> https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff
>>
>>
>> --
>> Ian Hinder
>> http://members.aei.mpg.de/ianhin
>>
>>
>
>
> --
> Erik Schnetter 
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>
> --
> Ian Hinder
> http://members.aei.mpg.de/ianhin
>
>


-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [Users] ET on KNL.

2017-03-02 Thread Ian Hinder

On 2 Mar 2017, at 14:37, Erik Schnetter  wrote:

> I am currently redesigning the tiling infrastructure, also to allow 
> multithreading via Qthreads instead of OpenMP and to allow for aligning 
> arrays with cache line boundaries. The new approach (different from the 
> current LoopControl) is to choose a fixed tile size, either globally or per 
> loop, and then assign individual tiles to threads. This also works well with 
> DG derivatives, where the DG element size dictates a granularity for the tile 
> size, and with the new efficient tiled derivative operators. Most of this is still 
> in flux. I have seen large efficiency improvements in the RHS calculation, 
> but two puzzling items remain:
> 
> (1) It remains more efficient to use MPI than multi-threading for 
> parallelization, at least on regular CPUs. On KNL my results are still 
> somewhat random.

When using MPI vs multi-threading on the same number of cores, the component 
will be smaller, meaning that more of it is likely to fit in the cache.  Would 
that explain this observation?

> (2) MoL_Add is quite expensive compared to the RHS evaluation.

That is indeed odd.

> The main thing that changed since our last round of thorough benchmarks is 
> that CPUs have become much more powerful while memory bandwidth hasn't. I'm 
> beginning to think that things such as vectorization or parallelization 
> basically don't matter any more if we ensure that we pull data from memory 
> into caches efficiently.
> 
> I have not yet collected PAPI statistics.
> 
> -erik
> 
> 
> On Thu, Mar 2, 2017 at 6:57 AM, Ian Hinder  wrote:
> 
> On 1 Mar 2017, at 22:10, David Radice  wrote:
> 
>> Hi Ian, Erik, Eloisa,
>> 
>>> I attach a very brief report of some results I obtained in 2015 after 
>>> attending a KNC workshop.
>>>> Conclusions: By using 244 threads, with the domain split into tiles of 
>>>> size 8 × 4 × 4 points, and OpenMP threads assigned one per tile as they 
>>>> become available, the MIC was able to outperform the single CPU by a 
>>>> factor of 1.5. The same tiling strategy was used on the CPU, as it has 
>>>> been found to give good performance there in the past. Since we have not 
>>>> yet optimised the code for the MIC architecture, we believe that further 
>>>> speed improvements will be possible, and that solving the Einstein 
>>>> equations on the MIC architecture should be feasible.
>>>> 
>>> Eloisa, are you using LoopControl?  There are tiling parameters which can 
>>> also help with performance on these devices.
>> 
>> how does tiling work with LoopControl? Is it documented somewhere? I naively 
>> thought that the point of tiling was to have chunks of data stored 
>> contiguously in memory...
> 
> Ideally yes, but this would need to be done in Carpet not LoopControl, and I 
> think you would then require ghost zones around each tile.  Since we have 
> huge numbers of ghost zones, I'm not sure it is practical.
> 
> LoopControl has parameters such as tilesize and loopsize, but Erik would know 
> better how to use these. It was a long time ago, and I can't immediately find 
> my parameter files.
> 
>> BTW, at the moment I am using this macro for all of my loop needs:
>> 
>> #define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK)  \
>>     _Pragma("omp for collapse(3)")                 \
>>     for(int I = SI; I < EI; ++I)                   \
>>     for(int J = SJ; J < EJ; ++J)                   \
>>     for(int K = SK; K < EK; ++K)
>> 
>> How would I convert it to something equivalent using LoopControl?
>> 
>> Thanks,
>> 
>> David
>> 
>> PS. Seeing that Eloisa was able to compile bbox.cc with the intel-17.0.0 
>> with -no-vec, I made a patch to disable vectorization using pragmas inside 
>> bbox.cc (to avoid having to compile it manually):
>> 
>> https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff
> 
> -- 
> Ian Hinder
> http://members.aei.mpg.de/ianhin
> 
> 
> 
> 
> -- 
> Erik Schnetter 
> http://www.perimeterinstitute.ca/personal/eschnetter/

-- 
Ian Hinder
http://members.aei.mpg.de/ianhin



Re: [Users] ET on KNL.

2017-03-02 Thread Erik Schnetter
I am currently redesigning the tiling infrastructure, also to allow
multithreading via Qthreads instead of OpenMP and to allow for aligning
arrays with cache line boundaries. The new approach (different from the
current LoopControl) is to choose a fixed tile size, either globally or per
loop, and then assign individual tiles to threads. This also works well
with DG derivatives, where the DG element size dictates a granularity for the
tile size, and with the new efficient tiled derivative operators. Most of this
is still in flux. I have seen large efficiency improvements in the RHS
calculation, but two puzzling items remain:
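As a rough illustration of the scheme just described (a fixed tile size, with each tile an independent unit of work that could be handed to a thread), here is a minimal serial sketch. All names and sizes are invented for the example; this is not the actual Carpet/LoopControl code.

```c
#include <string.h>

/* Example grid and tile dimensions (assumed values). */
enum { NX = 10, NY = 7, NZ = 5, TILE = 4 };

static int visits[NX][NY][NZ];

/* Apply the kernel to one tile, clamping its extent at the domain
   boundary so partial tiles at the edges are handled correctly. */
static void process_tile(int i0, int j0, int k0) {
  const int i1 = (i0 + TILE < NX) ? i0 + TILE : NX;
  const int j1 = (j0 + TILE < NY) ? j0 + TILE : NY;
  const int k1 = (k0 + TILE < NZ) ? k0 + TILE : NZ;
  for (int i = i0; i < i1; ++i)
    for (int j = j0; j < j1; ++j)
      for (int k = k0; k < k1; ++k)
        ++visits[i][j][k]; /* stand-in for the RHS computation */
}

/* Outer loop over tiles.  In the threaded design, each iteration
   would instead be enqueued for an available worker thread. */
static void loop_tiled(void) {
  memset(visits, 0, sizeof visits);
  for (int i0 = 0; i0 < NX; i0 += TILE)
    for (int j0 = 0; j0 < NY; j0 += TILE)
      for (int k0 = 0; k0 < NZ; k0 += TILE)
        process_tile(i0, j0, k0);
}
```

Because tiles never overlap and are clamped at the boundary, every grid point is touched exactly once regardless of whether the domain size divides evenly by the tile size.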

(1) It remains more efficient to use MPI than multi-threading for
parallelization, at least on regular CPUs. On KNL my results are still
somewhat random.

(2) MoL_Add is quite expensive compared to the RHS evaluation.

The main thing that changed since our last round of thorough benchmarks is
that CPUs have become much more powerful while memory bandwidth hasn't. I'm
beginning to think that things such as vectorization or parallelization
basically don't matter any more if we ensure that we pull data from memory
into caches efficiently.
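To make the bandwidth argument concrete, here is a back-of-envelope machine-balance calculation. The peak-flop and bandwidth figures used below are rough assumptions for illustration, not measurements of any particular system.

```c
/* Machine balance: how many floating-point operations must be
   performed per byte moved from memory for a code to be
   compute-bound rather than bandwidth-bound. */
static double flops_per_byte_required(double peak_gflop_s,
                                      double bw_gbyte_s) {
  return peak_gflop_s / bw_gbyte_s;
}
```

With an assumed ~3000 GFlop/s double-precision peak and ~90 GB/s of DDR bandwidth, the balance comes out around 33 flops per byte. A typical finite-difference kernel performs only a few flops per 8-byte load, far below that, which is consistent with the view that such kernels are bandwidth-bound and that vectorization alone cannot help much.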

I have not yet collected PAPI statistics.

-erik


On Thu, Mar 2, 2017 at 6:57 AM, Ian Hinder  wrote:

>
> On 1 Mar 2017, at 22:10, David Radice  wrote:
>
> Hi Ian, Erik, Eloisa,
>
> I attach a very brief report of some results I obtained in 2015 after
> attending a KNC workshop.
>
> Conclusions: By using 244 threads, with the domain split into tiles of
> size 8 × 4 × 4 points, and OpenMP threads assigned one per tile as they
> become available, the MIC was able to outperform the single CPU by a factor
> of 1.5. The same tiling strategy was used on the CPU, as it has been found
> to give good performance there in the past. Since we have not yet optimised
> the code for the MIC architecture, we believe that further speed
> improvements will be possible, and that solving the Einstein equations on
> the MIC architecture should be feasible.
>
> Eloisa, are you using LoopControl?  There are tiling parameters which can
> also help with performance on these devices.
>
>
> how does tiling work with LoopControl? Is it documented somewhere? I
> naively thought that the point of tiling was to have chunks of data stored
> contiguously in memory...
>
>
> Ideally yes, but this would need to be done in Carpet not LoopControl, and
> I think you would then require ghost zones around each tile.  Since we have
> huge numbers of ghost zones, I'm not sure it is practical.
>
> LoopControl has parameters such as tilesize and loopsize, but Erik would
> know better how to use these. It was a long time ago, and I can't
> immediately find my parameter files.
>
> BTW, at the moment I am using this macro for all of my loop needs:
>
> #define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK)  \
>     _Pragma("omp for collapse(3)")                 \
>     for(int I = SI; I < EI; ++I)                   \
>     for(int J = SJ; J < EJ; ++J)                   \
>     for(int K = SK; K < EK; ++K)
>
> How would I convert it to something equivalent using LoopControl?
>
> Thanks,
>
> David
>
> PS. Seeing that Eloisa was able to compile bbox.cc with the intel-17.0.0
> with -no-vec, I made a patch to disable vectorization using pragmas inside
> bbox.cc (to avoid having to compile it manually):
>
> https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff
>
>
> --
> Ian Hinder
> http://members.aei.mpg.de/ianhin
>
>


-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [Users] ET on KNL.

2017-03-02 Thread Ian Hinder

On 1 Mar 2017, at 22:10, David Radice  wrote:

> Hi Ian, Erik, Eloisa,
> 
>> I attach a very brief report of some results I obtained in 2015 after 
>> attending a KNC workshop.
>>> Conclusions: By using 244 threads, with the domain split into tiles of size 
>>> 8 × 4 × 4 points, and OpenMP threads assigned one per tile as they become 
>>> available, the MIC was able to outperform the single CPU by a factor of 
>>> 1.5. The same tiling strategy was used on the CPU, as it has been found to 
>>> give good performance there in the past. Since we have not yet optimised 
>>> the code for the MIC architecture, we believe that further speed 
>>> improvements will be possible, and that solving the Einstein equations on 
>>> the MIC architecture should be feasible.
>>> 
>> Eloisa, are you using LoopControl?  There are tiling parameters which can 
>> also help with performance on these devices.
> 
> how does tiling work with LoopControl? Is it documented somewhere? I naively 
> thought that the point of tiling was to have chunks of data stored 
> contiguously in memory...

Ideally yes, but this would need to be done in Carpet not LoopControl, and I 
think you would then require ghost zones around each tile.  Since we have huge 
numbers of ghost zones, I'm not sure it is practical.

LoopControl has parameters such as tilesize and loopsize, but Erik would know 
better how to use these. It was a long time ago, and I can't immediately find 
my parameter files.

> BTW, at the moment I am using this macro for all of my loop needs:
> 
> #define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK)  \
>     _Pragma("omp for collapse(3)")                 \
>     for(int I = SI; I < EI; ++I)                   \
>     for(int J = SJ; J < EJ; ++J)                   \
>     for(int K = SK; K < EK; ++K)
> 
> How would I convert it to something equivalent using LoopControl?
> 
> Thanks,
> 
> David
> 
> PS. Seeing that Eloisa was able to compile bbox.cc with the intel-17.0.0 with 
> -no-vec, I made a patch to disable vectorization using pragmas inside bbox.cc 
> (to avoid having to compile it manually):
> 
> https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff

--
Ian Hinder
http://members.aei.mpg.de/ianhin





Re: [Users] ET on KNL.

2017-03-01 Thread David Radice
Hi Ian, Erik, Eloisa,

> I attach a very brief report of some results I obtained in 2015 after 
> attending a KNC workshop.
>> Conclusions: By using 244 threads, with the domain split into tiles of size 
>> 8 × 4 × 4 points, and OpenMP threads assigned one per tile as they become 
>> available, the MIC was able to outperform the single CPU by a factor of 1.5. 
>> The same tiling strategy was used on the CPU, as it has been found to give 
>> good performance there in the past. Since we have not yet optimised the code 
>> for the MIC architecture, we believe that further speed improvements will be 
>> possible, and that solving the Einstein equations on the MIC architecture 
>> should be feasible.
>> 
> Eloisa, are you using LoopControl?  There are tiling parameters which can 
> also help with performance on these devices.

how does tiling work with LoopControl? Is it documented somewhere? I naively 
thought that the point of tiling was to have chunks of data stored contiguously 
in memory...

BTW, at the moment I am using this macro for all of my loop needs:

#define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK)  \
_Pragma("omp for collapse(3)") \
for(int I = SI; I < EI; ++I)   \
for(int J = SJ; J < EJ; ++J)   \
for(int K = SK; K < EK; ++K)

How would I convert it to something equivalent using LoopControl?
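As a starting point for that conversion, a rough sketch of the LoopControl form is below. Treat it as pseudocode: the macro names and argument order are from memory and should be checked against LoopControl's header before use.

```
/* Hypothetical sketch -- verify against LoopControl's actual API. */
#include <loopcontrol.h>

#pragma omp parallel
LC_LOOP3(my_kernel, i, j, k,
         SI, SJ, SK,     /* lower bounds */
         EI, EJ, EK,     /* upper bounds (exclusive) */
         NI, NJ, NK)     /* allocated array extents */
{
  /* loop body, indexed by (i, j, k); LoopControl handles the
     tiling and thread assignment internally */
}
LC_ENDLOOP3(my_kernel);
```

Unlike the `collapse(3)` macro, which fixes the iteration schedule at compile time, LoopControl can then tune tile sizes at run time via its parameters.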

Thanks,

David

PS. Seeing that Eloisa was able to compile bbox.cc with the intel-17.0.0 with 
-no-vec, I made a patch to disable vectorization using pragmas inside bbox.cc 
(to avoid having to compile it manually):

https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff




Re: [Users] ET on KNL.

2017-03-01 Thread Erik Schnetter
On Wed, Mar 1, 2017 at 7:04 AM, Eloisa Bentivegna <
eloisa.bentive...@ct.infn.it> wrote:

> On 28/02/17 23:17, David Radice wrote:
> > Hello Eloisa,
> >
> > sorry for the delay in the reply. For the record, I did manage to
> > compile and run ET on KNL (stampede), but I did not manage to run any
> > benchmark with it yet. The current status is:
> >
> > * intel-17: the compiler fails to compile Carpet and either gives an
> > internal error or segfaults. * gcc-6.3: used to compile and run with
> > Erik's spack installation (it is currently broken). I did not really
> > manage to benchmark it since even a low-resolution TOV test did not
> > run to completion (meaning less than 4 coarse grid steps) within 30
> > minutes on 4 nodes.
> >
> > This was using the current stable release of the ET (2016-11) and
> > WhiskyTHC. You might have more luck with GRHydro / pure-vacuum runs.
>
> Hi David and all,
>
> thanks for all the help. It turned out that consolidating my
> configuration made things significantly better: I was using Intel 16 (to
> avoid the Carpet problem with Intel 17) along with a strange mix of
> libraries (mostly compiled with Intel 17, and the only available on
> Marconi), and that seemed to impact the performance quite strongly. With
> everything Intel 17 (and using -no-vec on bbox.cc), I now obtain a
> runspeed on a Marconi KNL node which is around 80% of a Xeon E5 v4.
>
> There are still some puzzling features, though. One is that using
> -no-vec, along with the settings:
>
> VECTORISE   = no
> VECTORISE_ALIGNED_ARRAYS= no
> VECTORISE_INLINE= no
> VECTORISE_ALIGN_FOR_CACHE   = no
> VECTORISE_ALIGN_INTERIOR= no
>

I would expect that VECTORISE=yes (keep the others at "no") might improve
performance, in particular if you do not use hyperthreading, so that each
thread has more L1 cache space available.

> in my optionlist, I obtain essentially the same throughput. This is a
> vacuum McLachlan run with very little else turned on (but I can run a
> QC0 benchmark for definiteness, if people are interested). I too am
> using the November release.
>
> Second, hyperthreading decreases the runspeed significantly. I am using
> 272 threads on the 68-core KNL, and for what I can gather from the
> Carpet output, all of the cores are engaged. More cores are reported,
> however, than available on the node:
>
> INFO (Carpet): MPI is enabled
> INFO (Carpet): Carpet is running on 1 processes
> INFO (Carpet): This is process 0
> INFO (Carpet): OpenMP is enabled
> INFO (Carpet): This process contains 272 threads, this is thread 0
> INFO (Carpet): There are 272 threads in total
> INFO (Carpet): There are 272 threads per process
> INFO (Carpet): This process runs on host r098c04s01, pid=2465840
> INFO (Carpet): This process runs on 272 cores: 0-271
> INFO (Carpet): Thread 0 runs on 1 core: 0
> INFO (Carpet): Thread 1 runs on 1 core: 68
> INFO (Carpet): Thread 2 runs on 1 core: 136
> INFO (Carpet): Thread 3 runs on 1 core: 204
> …
>

The nomenclature is inconsistent since it changes so often. This output
looks correct. (Carpet cannot easily distinguish between hyperthreads and
cores.) As long as there is only one thread per core, this is fine.

> Notice that I am requesting hyperthreading by using num-smt=4 and
> num-threads=272. Is this correct?
>

This looks correct. You might also need to play with "ppn=" and "ppn-used=".
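For one thread per physical core on a 68-core KNL node, the relevant simfactory settings might look roughly like this. This is a sketch; apart from the keys mentioned above, the names and values are assumptions to check against simfactory's machine database files.

```
# Hypothetical simfactory settings for a 68-core KNL node:
ppn         = 272   # hardware threads per node (68 cores x 4 SMT)
ppn-used    = 68    # schedule one thread per physical core
num-smt     = 1     # do not place Cactus threads on hyperthreads
num-threads = 68    # OpenMP threads for a single-process run
```

The idea is that requesting 272 threads with num-smt=4 oversubscribes the L1 caches, whereas one thread per core leaves each thread the full cache, which may explain the slowdown observed with hyperthreading.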

-erik

-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [Users] ET on KNL.

2017-03-01 Thread Eloisa Bentivegna
On 28/02/17 23:17, David Radice wrote:
> Hello Eloisa,
> 
> sorry for the delay in the reply. For the record, I did manage to
> compile and run ET on KNL (stampede), but I did not manage to run any
> benchmark with it yet. The current status is:
> 
> * intel-17: the compiler fails to compile Carpet and either gives an
> internal error or segfaults. * gcc-6.3: used to compile and run with
> Erik's spack installation (it is currently broken). I did not really
> manage to benchmark it since even a low-resolution TOV test did not
> run to completion (meaning less than 4 coarse grid steps) within 30
> minutes on 4 nodes.
> 
> This was using the current stable release of the ET (2016-11) and
> WhiskyTHC. You might have more luck with GRHydro / pure-vacuum runs.

Hi David and all,

thanks for all the help. It turned out that consolidating my
configuration made things significantly better: I was using Intel 16 (to
avoid the Carpet problem with Intel 17) along with a strange mix of
libraries (mostly compiled with Intel 17, and the only ones available on
Marconi), and that seemed to impact the performance quite strongly. With
everything Intel 17 (and using -no-vec on bbox.cc), I now obtain a
runspeed on a Marconi KNL node which is around 80% of a Xeon E5 v4.

There are still some puzzling features, though. One is that using
-no-vec, along with the settings:

VECTORISE   = no
VECTORISE_ALIGNED_ARRAYS= no
VECTORISE_INLINE= no
VECTORISE_ALIGN_FOR_CACHE   = no
VECTORISE_ALIGN_INTERIOR= no

in my optionlist, I obtain essentially the same throughput. This is a
vacuum McLachlan run with very little else turned on (but I can run a
QC0 benchmark for definiteness, if people are interested). I too am
using the November release.

Second, hyperthreading decreases the runspeed significantly. I am using
272 threads on the 68-core KNL, and for what I can gather from the
Carpet output, all of the cores are engaged. More cores are reported,
however, than available on the node:

INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 1 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 272 threads, this is thread 0
INFO (Carpet): There are 272 threads in total
INFO (Carpet): There are 272 threads per process
INFO (Carpet): This process runs on host r098c04s01, pid=2465840
INFO (Carpet): This process runs on 272 cores: 0-271
INFO (Carpet): Thread 0 runs on 1 core: 0
INFO (Carpet): Thread 1 runs on 1 core: 68
INFO (Carpet): Thread 2 runs on 1 core: 136
INFO (Carpet): Thread 3 runs on 1 core: 204
…

Notice that I am requesting hyperthreading by using num-smt=4 and
num-threads=272. Is this correct?

Thanks again,
Eloisa


Re: [Users] ET on KNL.

2017-02-28 Thread David Radice
Hello Eloisa,

sorry for the delay in the reply. For the record, I did manage to compile and 
run ET on KNL (stampede), but I did not manage to run any benchmark with it 
yet. The current status is:

* intel-17: the compiler fails to compile Carpet and either gives an 
internal error or segfaults.
* gcc-6.3: used to compile and run with Erik's spack installation (it 
is currently broken). I did not really manage to benchmark it since even a 
low-resolution TOV test did not run to completion (meaning less than 4 coarse 
grid steps) within 30 minutes on 4 nodes.

This was using the current stable release of the ET (2016-11) and WhiskyTHC. 
You might have more luck with GRHydro / pure-vacuum runs.

Best wishes,

David

> On Feb 22, 2017, at 5:53 PM, Haas, Roland  wrote:
> 
> Hello Eloisa,
> 
> yup, widely off. David with Erik's help had a try at ET on KNL (stampede) and 
> you can find some information here: 
> https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop
> 
> Yours,
> Roland
> 
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.
> 
> 
> From: users-boun...@einsteintoolkit.org [users-boun...@einsteintoolkit.org] 
> on behalf of Eloisa Bentivegna [eloisa.bentive...@ct.infn.it]
> Sent: Wednesday, February 22, 2017 16:24
> To: Einstein Toolkit Users
> Subject: [Users] ET on KNL.
> 
> Dear all,
> 
> I was wondering if anybody is using the ET on the Knights Landing
> architecture, and what sort of performance one could expect to get.
> Admittedly without much optimization, I am measuring a performance per
> core which is almost two orders of magnitude smaller than that of a
> Broadwell Xeon. Does this sound wildly off?
> 
> Eloisa
> ___
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
> ___
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
> 



___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users


Re: [Users] ET on KNL.

2017-02-23 Thread Roland Haas
Hello Eloisa,

> thanks for the tips. I will try the stampede configuration (the machine
> I am trying this on is CINECA's Marconi, for the record) and see if it
> improves things.
There's also a configuration for the DOE Cori system for KNL; however,
that one is a Cray, so it may look very different.

Yours,
Roland

-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://keys.gnupg.net.


___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users


Re: [Users] ET on KNL.

2017-02-23 Thread Eloisa Bentivegna
On 23/02/17 00:14, Erik Schnetter wrote:
> Eloisa
> 
> The general consensus (I heard that as well from TACC staff) is that a
> KNL node is about as fast as a modern Xeon node. That agrees with what I
> measured on Cori.
> 
> The per-core performance is lower (maybe by a factor of two or three)
> because the cores are slower, but there are also more cores.

Roland, Erik,

thanks for the tips. I will try the stampede configuration (the machine
I am trying this on is CINECA's Marconi, for the record) and see if it
improves things.

Eloisa
___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users


Re: [Users] ET on KNL.

2017-02-22 Thread Erik Schnetter
Hee Il

Yes, thorn Vectors was updated from KNC to KNL.

-erik

On Wed, Feb 22, 2017 at 6:05 PM, Hee Il Kim  wrote:

> Hi,
>
> Thanks, Roland. So, was the Intel compiler issue fixed? Is no upgrade of
> the Vectors thorn required? Please take a look at my previous post.
> http://lists.einsteintoolkit.org/pipermail/users/2016-November/005097.html
>
> Hee Il
>
>
>
> 2017-02-23 7:53 GMT+09:00 Haas, Roland :
>
>> Hello Eloisa,
>>
>> yup, wildly off. David with Erik's help had a try at ET on KNL (stampede)
>> and you can find some information here:
>> https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop
>>
>> Yours,
>> Roland
>>
>> --
>> My email is as private as my paper mail. I therefore support encrypting
>> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>>
>> 
>> From: users-boun...@einsteintoolkit.org [users-bounces@einsteintoolkit.org]
>> on behalf of Eloisa Bentivegna [eloisa.bentive...@ct.infn.it]
>> Sent: Wednesday, February 22, 2017 16:24
>> To: Einstein Toolkit Users
>> Subject: [Users] ET on KNL.
>>
>> Dear all,
>>
>> I was wondering if anybody is using the ET on the Knights Landing
>> architecture, and what sort of performance one could expect to get.
>> Admittedly without much optimization, I am measuring a performance per
>> core which is almost two orders of magnitude smaller than that of a
>> Broadwell Xeon. Does this sound wildly off?
>>
>> Eloisa
>> ___
>> Users mailing list
>> Users@einsteintoolkit.org
>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>> ___
>> Users mailing list
>> Users@einsteintoolkit.org
>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>>
>
>
> ___
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
>
>


-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/
___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users


Re: [Users] ET on KNL.

2017-02-22 Thread Erik Schnetter
Eloisa

The general consensus (I heard that as well from TACC staff) is that a KNL
node is about as fast as a modern Xeon node. That agrees with what I
measured on Cori.

The per-core performance is lower (maybe by a factor of two or three)
because the cores are slower, but there are also more cores.
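As a toy illustration of that trade-off (all numbers below are assumptions for the sake of the arithmetic, not measurements):

```shell
# Assume a KNL core is ~2.5x slower than a Broadwell core, but a KNL node
# has 68 cores versus 2 sockets x 14 cores on a dual-socket Broadwell node.
# Integer arithmetic in tenths to avoid floating point in shell:
knl_equiv=$((68 * 10 / 25))   # ~27 "Broadwell-core equivalents" per KNL node
bdw_cores=$((2 * 14))         # 28 cores per Broadwell node
echo "$knl_equiv vs $bdw_cores"   # prints "27 vs 28"
```

Under these assumed ratios the two node types come out roughly even, which is consistent with the "about as fast as a modern Xeon node" consensus above.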

-erik


On Wed, Feb 22, 2017 at 5:53 PM, Haas, Roland  wrote:

> Hello Eloisa,
>
> yup, wildly off. David with Erik's help had a try at ET on KNL (stampede)
> and you can find some information here:
> https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop
>
> Yours,
> Roland
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>
> 
> From: users-boun...@einsteintoolkit.org [users-boun...@einsteintoolkit.org]
> on behalf of Eloisa Bentivegna [eloisa.bentive...@ct.infn.it]
> Sent: Wednesday, February 22, 2017 16:24
> To: Einstein Toolkit Users
> Subject: [Users] ET on KNL.
>
> Dear all,
>
> I was wondering if anybody is using the ET on the Knights Landing
> architecture, and what sort of performance one could expect to get.
> Admittedly without much optimization, I am measuring a performance per
> core which is almost two orders of magnitude smaller than that of a
> Broadwell Xeon. Does this sound wildly off?
>
> Eloisa
> ___
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
> ___
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
>



-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/
___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users


Re: [Users] ET on KNL.

2017-02-22 Thread Hee Il Kim
Hi,

Thanks, Roland. So, was the Intel compiler issue fixed? Is no upgrade of
the Vectors thorn required? Please take a look at my previous post.
http://lists.einsteintoolkit.org/pipermail/users/2016-November/005097.html

Hee Il



2017-02-23 7:53 GMT+09:00 Haas, Roland :

> Hello Eloisa,
>
> yup, wildly off. David with Erik's help had a try at ET on KNL (stampede)
> and you can find some information here:
> https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop
>
> Yours,
> Roland
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>
> 
> From: users-boun...@einsteintoolkit.org [users-boun...@einsteintoolkit.org]
> on behalf of Eloisa Bentivegna [eloisa.bentive...@ct.infn.it]
> Sent: Wednesday, February 22, 2017 16:24
> To: Einstein Toolkit Users
> Subject: [Users] ET on KNL.
>
> Dear all,
>
> I was wondering if anybody is using the ET on the Knights Landing
> architecture, and what sort of performance one could expect to get.
> Admittedly without much optimization, I am measuring a performance per
> core which is almost two orders of magnitude smaller than that of a
> Broadwell Xeon. Does this sound wildly off?
>
> Eloisa
> ___
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
> ___
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
>
___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users


Re: [Users] ET on KNL.

2017-02-22 Thread Haas, Roland
Hello Eloisa,

yup, wildly off. David with Erik's help had a try at ET on KNL (stampede) and 
you can find some information here: 
https://docs.einsteintoolkit.org/et-docs/2017_MHD_Workshop

Yours,
Roland

--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://keys.gnupg.net.


From: users-boun...@einsteintoolkit.org [users-boun...@einsteintoolkit.org] on 
behalf of Eloisa Bentivegna [eloisa.bentive...@ct.infn.it]
Sent: Wednesday, February 22, 2017 16:24
To: Einstein Toolkit Users
Subject: [Users] ET on KNL.

Dear all,

I was wondering if anybody is using the ET on the Knights Landing
architecture, and what sort of performance one could expect to get.
Admittedly without much optimization, I am measuring a performance per
core which is almost two orders of magnitude smaller than that of a
Broadwell Xeon. Does this sound wildly off?

Eloisa
___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users
___
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users