Hi, Chris,
I think I am done with the MR,
https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7651__;!!G_uCfscf7eWS!YrTW8A4OcJU-ZdMQHgpISTnTkIOdDgvE9JdeWugUHYhynmyVAiRsbC2alT9pMknGJMkb559Bgu3olNqbiDcYp3catWNW$
You can look at the sample output there. The array size is now very large, supporting an aggregated L3 cache size of 1,920 MB.
--Junchao Zhang

On Tue, Oct 21, 2025 at 6:17 AM Klaij, Christiaan <[email protected]> wrote:

> OK, experiments will have to wait till we get the hardware.
>
> Can you give me a sign when you are done with the merge request? I
> would like to try with the increased array size, other vendors
> already warned me that "the array in stream is quite small".
>
> Chris
>
> ________________________________________
> From: Junchao Zhang <[email protected]>
> Sent: Monday, October 20, 2025 6:36 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] interpreting petsc streams result
>
> Hi, Chris,
> Since we compute the speed-up off the bandwidth achieved by a single MPI
> process, and a process can drive all memory channels, the maximum speed-up
> can only come from experiments (not from the number of memory channels).
>
> --Junchao Zhang
>
>
> On Mon, Oct 20, 2025 at 9:45 AM Klaij, Christiaan <[email protected]> wrote:
> Hi Junchao,
>
> Thanks for your answer. Regarding the speed-up, what would you expect if not
> 24 out of 64, and why?
>
> Chris
>
> ________________________________________
>
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 |
> https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!YrTW8A4OcJU-ZdMQHgpISTnTkIOdDgvE9JdeWugUHYhynmyVAiRsbC2alT9pMknGJMkb559Bgu3olNqbiDcYp84_t49H$
>
> From: Junchao Zhang <[email protected]>
> Sent: Friday, October 17, 2025 5:01 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] interpreting petsc streams result
>
> Hi, Chris,
> I did have an MR
> https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7651__;!!G_uCfscf7eWS!YrTW8A4OcJU-ZdMQHgpISTnTkIOdDgvE9JdeWugUHYhynmyVAiRsbC2alT9pMknGJMkb559Bgu3olNqbiDcYp3catWNW$
> to improve mpistream. I should rework it after Barry's !6903. See my inlined
> comments to your questions.
>
> On Fri, Oct 17, 2025 at 3:37 AM Klaij, Christiaan via petsc-users <[email protected]> wrote:
> Attached is a petsc streams result kindly provided by a hardware
> vendor for a single compute node, dual socket, with two AMD EPYC
> 9355 processors.
> Each processor has 32 cores, 12 DDR5 memory
> channels and mem BW around 600 GB/s.
>
> * It is not immediately clear which line corresponds to which
> y-axis. Could future versions of petsc please color the axis
> label with the matching line color?
> definitely
>
>
> * Why would the achieved bandwidth be roughly 0.9 x 1e6 MB/s =
> 900 GB/s and not closer to 1200 GB/s?
> I recall it is actually not simple to get the theoretical max bandwidth.
> One has to use special SIMD instructions, compiler flags, streaming
> stores, etc.
>
>
> * The speed-up seems to be 12 out of 64, provided multiples of 8
> cores are used. As expected given 12 memory channels?
> Maybe not, otherwise the speed-up should be 24 as you have 24 channels.
>
>
> * Does the zig-zag pattern indicate a pinning problem, or is it
> unavoidable given the 8-core building block of this type of
> processor?
> I checked and found "make mpistream" uses --map-by core. I think we should
> use --map-by socket or --map-by l3cache.
>
>
> Chris
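
As a quick sanity check on the numbers discussed in this thread, the arithmetic can be sketched in Python. The inputs are the approximate values quoted above (two sockets at about 600 GB/s each, roughly 0.9 x 1e6 MB/s achieved, a speed-up of about 12). The function and variable names are hypothetical and not part of PETSc, and the single-process bandwidth is only what those rounded numbers would imply, not a measurement.

```python
# Approximate figures quoted in the thread (read off a vendor plot, not measured here).
SOCKETS = 2
PEAK_PER_SOCKET_GBS = 600.0   # "mem BW around 600 GB/s" per processor
ACHIEVED_MBS = 0.9e6          # "roughly 0.9 x 1e6 MB/s"
OBSERVED_SPEEDUP = 12.0       # speed-up relative to a single MPI process


def stream_summary(sockets, peak_per_socket_gbs, achieved_mbs, speedup):
    """Return (peak GB/s, achieved GB/s, efficiency, implied single-process GB/s).

    Assumes decimal units (1 GB/s = 1000 MB/s), as in the thread's own
    conversion of 0.9 x 1e6 MB/s to 900 GB/s.
    """
    peak_gbs = sockets * peak_per_socket_gbs
    achieved_gbs = achieved_mbs / 1000.0
    efficiency = achieved_gbs / peak_gbs
    # Speed-up is defined against the bandwidth of one MPI process,
    # so the aggregate divided by the speed-up gives that baseline.
    single_process_gbs = achieved_gbs / speedup
    return peak_gbs, achieved_gbs, efficiency, single_process_gbs


if __name__ == "__main__":
    peak, achieved, eff, single = stream_summary(
        SOCKETS, PEAK_PER_SOCKET_GBS, ACHIEVED_MBS, OBSERVED_SPEEDUP
    )
    print(f"nominal peak:          {peak:.0f} GB/s")
    print(f"achieved:              {achieved:.0f} GB/s ({eff:.0%} of nominal)")
    print(f"implied single process: {single:.0f} GB/s")
```

With these inputs the achieved bandwidth is 75% of the nominal 1200 GB/s, and a speed-up of 12 would correspond to about 75 GB/s for a single MPI process. That baseline is why the speed-up need not track the number of memory channels.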
