Re: [petsc-users] interpreting petsc streams result

Junchao Zhang Tue, 21 Oct 2025 14:18:42 -0700

Hi, Chris,
  I think I am done with the MR:
https://gitlab.com/petsc/petsc/-/merge_requests/7651

  You can look at the sample output there. The array size is now very
large, supporting an aggregate L3 cache size of 1,920 MB.
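
For a rough sense of scale, the sketch below applies the sizing guideline
from the original STREAM benchmark (each array at least 4 times the
combined cache size) to the 1,920 MB figure. The factor of 4, the reading
of MB as 2**20 bytes, and the function name are assumptions for
illustration; the MR may size its arrays differently.

```python
# Minimum STREAM array length for a given aggregate L3 cache size.
# Assumptions (not taken from the MR): the 4x-cache guideline of the
# original STREAM benchmark, 8-byte doubles, and 1 MB = 2**20 bytes.

def min_stream_elements(l3_cache_mb: int, factor: int = 4,
                        bytes_per_elem: int = 8) -> int:
    """Smallest element count such that one array is `factor` times the cache."""
    cache_bytes = l3_cache_mb * 2**20
    return factor * cache_bytes // bytes_per_elem

elements = min_stream_elements(1920)
array_gib = elements * 8 / 2**30
print(f"{elements:,} doubles per array, {array_gib:.1f} GiB each")
```

STREAM uses three such arrays per process, so the total memory footprint
under these assumptions is three times the per-array size.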


--Junchao Zhang


On Tue, Oct 21, 2025 at 6:17 AM Klaij, Christiaan <[email protected]> wrote:

> OK, experiments will have to wait till we get the hardware.
>
> Can you let me know when you are done with the merge request? I
> would like to try with the increased array size; other vendors
> already warned me that "the array in stream is quiet small".
>
> Chris
>
> ________________________________________
> From: Junchao Zhang <[email protected]>
> Sent: Monday, October 20, 2025 6:36 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] interpreting petsc streams result
>
> Hi, Chris,
>   Since we compute the speedup relative to the bandwidth achieved by a
> single MPI process, and a single process can drive all memory channels,
> the maximum speedup can only be determined experimentally, not from the
> number of memory channels.
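
To make the definition concrete: the reported speedup is the aggregate
bandwidth at N processes divided by the bandwidth of one process. The
sketch below uses hypothetical bandwidth values, chosen only to be
consistent with the roughly 0.9e6 MB/s and speedup of 12 quoted in this
thread; the actual single-process figure is not given here.

```python
# Speedup relative to a single MPI process.
# All bandwidth values are hypothetical placeholders, in MB/s.

def speedups(bandwidth_by_nprocs: dict[int, float]) -> dict[int, float]:
    """Divide each aggregate bandwidth by the 1-process bandwidth."""
    base = bandwidth_by_nprocs[1]
    return {n: bw / base for n, bw in bandwidth_by_nprocs.items()}

measured = {1: 75_000.0, 8: 600_000.0, 64: 900_000.0}  # hypothetical
result = speedups(measured)
for n, s in result.items():
    print(f"{n:3d} processes: speedup {s:.1f}")
```

The faster a single process is, the smaller the ratio becomes for the
same aggregate bandwidth, which is why the ceiling cannot be read off
the channel count.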
>
>   --Junchao Zhang
>
>
> On Mon, Oct 20, 2025 at 9:45 AM Klaij, Christiaan <[email protected]> wrote:
> Hi Junchao,
>
> Thanks for your answer. Regarding the speed-up, what would you expect if
> not 24 out of 64, and why?
>
> Chris
>
> ________________________________________
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 | http://www.marin.nl
>
> From: Junchao Zhang <[email protected]>
> Sent: Friday, October 17, 2025 5:01 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] interpreting petsc streams result
>
> Hi, Chris,
> I did have an MR,
> https://gitlab.com/petsc/petsc/-/merge_requests/7651,
> to improve mpistream. I should rework it after Barry's !6903. See my
> inline comments on your questions.
>
> On Fri, Oct 17, 2025 at 3:37 AM Klaij, Christiaan via petsc-users
> <[email protected]> wrote:
> Attached is a petsc streams result kindly provided by a hardware
> vendor for a single dual-socket compute node with two AMD EPYC
> 9355 processors. Each processor has 32 cores, 12 DDR5 memory
> channels and a memory bandwidth of around 600 GB/s.
>
> * It is not immediately clear which line corresponds to which
> y-axis. Could future versions of petsc please color the axis
> label with the matching line color?
> Definitely.
>
>
> * Why would the achieved bandwidth be roughly 0.9 x 1e6 MB/s =
> 900 GB/s and not closer to 1200 GB/s?
> As I recall, reaching the theoretical maximum bandwidth is not simple:
> one has to use special SIMD instructions, compiler flags, streaming
> stores, etc.
>
>
> * The speed-up seems to be 12 out of 64, provided multiples of 8
> cores are used. As expected given 12 memory channels?
> Maybe not; otherwise the speedup should be 24, since you have 24 channels.
>
>
> * Does the zig-zag pattern indicate a pinning problem, or is it
> unavoidable given the 8-core building block of this type of
> processor?
> I checked and found "make mpistream" uses --map-by core. I think we should
> use --map-by socket or --map-by l3cache.
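
On the 900 GB/s versus 1200 GB/s question above, the peak figure can be
reproduced from the channel count. The sketch below assumes DDR5-6400
and an 8-byte data path per channel; neither is stated in this thread,
so treat the result as a back-of-the-envelope check only.

```python
# Theoretical peak DRAM bandwidth versus the achieved STREAM figure.
# Assumptions (not stated in the thread): DDR5-6400 memory and
# 8 bytes transferred per channel per transfer.

def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int = 6400,
                       bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000

per_socket = peak_bandwidth_gbs(12)  # 12 channels per processor
node_peak = 2 * per_socket           # dual-socket node
achieved = 900.0                     # GB/s, as read off the plot
efficiency = achieved / node_peak
print(f"per socket {per_socket:.1f} GB/s, node {node_peak:.1f} GB/s, "
      f"achieved {efficiency:.0%} of peak")
```

Under these assumptions the per-socket peak matches the "around 600 GB/s"
quoted by the vendor, and the achieved figure is roughly three quarters
of the node peak.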
>
>
> Chris
>
