Dear Patrick Sanan,
Thank you very much for your answer, especially for your code.
I was able to compile and run your code on 8 nodes with 20 processes per
node. Below are the results:
Testing with 160 MPI ranks
reducing an array of size 32 (256 bytes)
Running 5 burnin runs and 100 tests ... Done.
Hi Randall,
Thanks for providing a pointer to the DMDAGetRay function!
After looking at its implementation, I came up with a solution that creates
a naturally ordered slice vector on the same subset of processors as the
DMDA-ordered slice vector (by scattering from the DMDA-ordered slice to a
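As an aside, the relationship between the two orderings can be illustrated without PETSc at all. Below is a minimal pure-Python sketch (not the PETSc API; `dmda_to_natural_map` is a hypothetical helper) of how the DMDA's per-process numbering relates to the natural lexicographic numbering, which is why a scatter between the two is needed:

```python
# Illustrative only: mimics the idea behind DMDAGlobalToNatural for a
# small 2D grid. PETSc numbers grid points contiguously per process
# subdomain, while the "natural" ordering is global lexicographic.

def dmda_to_natural_map(M, N, px, py):
    """Return perm where perm[dmda_index] = natural_index for an
    M x N grid split into px x py process blocks (assumes M % px == 0
    and N % py == 0 for simplicity)."""
    mx, my = M // px, N // py          # local subdomain sizes
    perm = []
    for pj in range(py):               # processes traversed row-major
        for pi in range(px):
            # each process numbers its own points lexicographically
            for j in range(pj * my, (pj + 1) * my):
                for i in range(pi * mx, (pi + 1) * mx):
                    perm.append(j * M + i)   # natural (global) index
    return perm

# 4x4 grid on a 2x2 process grid: the first process owns the lower-left
# 2x2 block, so DMDA indices 0..3 map to natural indices 0, 1, 4, 5.
print(dmda_to_natural_map(4, 4, 2, 2)[:4])  # → [0, 1, 4, 5]
```

The permutation is exactly the index set a scatter would use to move values from the DMDA-ordered vector into a naturally ordered one.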
On Mon, Jan 25, 2021 at 4:40 AM Roland Richter wrote:
> Hei,
>
> is there a way to gather a distributed dense matrix only on rank 0, not
> on the other ranks? My matrix processing/storage routines in my program
> currently are single-threaded and only operate on rank 0, and therefore
> I assume
Sorry about the delay in responding, but I'll add a couple of points here:

1) It's important to have some reason to believe that pipelining will actually help your problem. Pipelined Krylov methods work by overlapping reductions with operator and preconditioner applications. So, to see speedup, the
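For completeness, pipelined Krylov methods are selected at runtime through PETSc's options database. A sketch, where `./app` and the rank count are placeholders and the underlying MPI must provide asynchronous progress for any overlap to materialize:

```shell
# pipecg is PETSc's pipelined conjugate gradient variant;
# -log_view reports where time is actually spent.
mpiexec -n 160 ./app -ksp_type pipecg -ksp_monitor -log_view
```

Comparing the `-log_view` output against a plain `-ksp_type cg` run is the simplest way to check whether the overlapped reductions help at your scale.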
Hei,
is there a way to gather a distributed dense matrix only on rank 0, not
on the other ranks? My matrix processing/storage routines in my program
currently are single-threaded and only operate on rank 0, and therefore
I assume that I can ignore all other ranks. This should also save a bit
Dear Barry,
Thank you very much for your information.
It seems complicated to set the environment variables that enable
asynchronous progress and pin its threads to cores when using Intel MPI.
$ export I_MPI_ASYNC_PROGRESS=1
$ export I_MPI_ASYNC_PROGRESS_PIN=
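Note that shell assignments must not have spaces around `=`. A sketch with a hypothetical pin list (the actual cores to reserve depend on how your ranks are bound on each node):

```shell
# Enable Intel MPI's asynchronous progress threads.
export I_MPI_ASYNC_PROGRESS=1
# Hypothetical pinning: reserve one logical core per socket for the
# progress threads; adjust to your own node layout and rank binding.
export I_MPI_ASYNC_PROGRESS_PIN=0,20
```

The pinned cores should be left out of the set used for MPI ranks, otherwise the progress threads and ranks compete for the same cores.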