Thanks Junchao, that looks good. The zig-zag pattern is still visible but not as wild. The speed-up is similar.
Chris
________________________________________
From: Junchao Zhang <[email protected]>
Sent: Tuesday, October 21, 2025 11:17 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] interpreting petsc streams result

Hi, Chris,

I think I am done with the MR:
https://gitlab.com/petsc/petsc/-/merge_requests/7651
You can look at the sample output there. The array size is now very large, supporting an aggregated L3 cache size of 1,920 MB.

--Junchao Zhang

On Tue, Oct 21, 2025 at 6:17 AM Klaij, Christiaan <[email protected]> wrote:

OK, experiments will have to wait till we get the hardware. Can you give me a sign when you are done with the merge request? I would like to try with the increased array size; other vendors already warned me that "the array in stream is quite small".

Chris
________________________________________
From: Junchao Zhang <[email protected]>
Sent: Monday, October 20, 2025 6:36 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] interpreting petsc streams result

Hi, Chris,

Since we compute the speed-up relative to the bandwidth achieved by a single MPI process, and a process can drive all memory channels, the maximum speed-up can only be determined by experiment, not from the number of memory channels.

--Junchao Zhang

On Mon, Oct 20, 2025 at 9:45 AM Klaij, Christiaan <[email protected]> wrote:

Hi Junchao,

Thanks for your answer. Regarding the speed-up, what would you expect if not 24 out of 64, and why?

Chris
________________________________________
dr. ir.
Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | http://www.marin.nl

From: Junchao Zhang <[email protected]>
Sent: Friday, October 17, 2025 5:01 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] interpreting petsc streams result

Hi, Chris,

I did have an MR to improve mpistream:
https://gitlab.com/petsc/petsc/-/merge_requests/7651
I should rework it after Barry's !6903.
See my inline comments on your questions.

On Fri, Oct 17, 2025 at 3:37 AM Klaij, Christiaan via petsc-users <[email protected]> wrote:

> Attached is a PETSc streams result kindly provided by a hardware vendor for a single dual-socket compute node with two AMD EPYC 9355 processors. Each processor has 32 cores, 12 DDR5 memory channels and a memory bandwidth of around 600 GB/s.
>
> * It is not immediately clear which line corresponds to which y-axis. Could future versions of PETSc please color each axis label with the matching line color?

Definitely.

> * Why would the achieved bandwidth be roughly 0.9 x 1e6 MB/s = 900 GB/s and not closer to 1200 GB/s?

I recall it is actually not simple to get the theoretical max bandwidth. One has to use special SIMD instructions, compiler flags, streaming stores, etc.

> * The speed-up seems to be 12 out of 64, provided multiples of 8 cores are used. As expected given 12 memory channels?

Maybe not, otherwise the speed-up should be 24, as you have 24 channels.

> * Does the zig-zag pattern indicate a pinning problem, or is it unavoidable given the 8-core building block of this type of processor?

I checked and found "make mpistream" uses --map-by core. I think we should use --map-by socket or --map-by l3cache.

> Chris
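
Editor's note: the bandwidth and speed-up arithmetic discussed in this thread can be written out as a small Python sketch. It uses only the figures quoted above (two sockets, around 600 GB/s each, roughly 0.9 x 1e6 MB/s achieved) and assumes decimal units (1 GB = 1000 MB). The function names and the `sample` bandwidth list are hypothetical, made up for illustration; they are not taken from the vendor's result or from PETSc.

```python
# Figures quoted in the thread for the dual-socket node.
SOCKETS = 2
PEAK_PER_SOCKET_GBS = 600.0   # "around 600 GB/s" per processor
ACHIEVED_MBS = 0.9e6          # read off the plot: roughly 0.9 x 1e6 MB/s


def mb_to_gb(mb_per_s: float) -> float:
    """Convert MB/s to GB/s, assuming decimal units (1 GB = 1000 MB)."""
    return mb_per_s / 1000.0


def efficiency(achieved_gbs: float, peak_gbs: float) -> float:
    """Fraction of the nominal peak bandwidth that was achieved."""
    return achieved_gbs / peak_gbs


def speedup(bandwidths_mbs: list[float]) -> list[float]:
    """Speed-up of each run relative to the first (single-process) run."""
    base = bandwidths_mbs[0]
    return [bw / base for bw in bandwidths_mbs]


peak_gbs = SOCKETS * PEAK_PER_SOCKET_GBS
achieved_gbs = mb_to_gb(ACHIEVED_MBS)
print(f"nominal peak: {peak_gbs:.0f} GB/s")    # 1200 GB/s
print(f"achieved:     {achieved_gbs:.0f} GB/s")  # 900 GB/s
print(f"efficiency:   {efficiency(achieved_gbs, peak_gbs):.0%}")  # 75%

# HYPOTHETICAL bandwidths (MB/s) for an increasing number of MPI processes.
# The first entry plays the role of the single-process baseline.
sample = [40_000.0, 80_000.0, 300_000.0, 900_000.0]
print([round(s, 1) for s in speedup(sample)])  # [1.0, 2.0, 7.5, 22.5]
```

Because the speed-up is a ratio against the single-process bandwidth, its ceiling depends on how much bandwidth that one process already reaches. That is consistent with the point made above that the maximum speed-up has to come from experiments and cannot be read off the number of memory channels.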
