Re: [petsc-users] How to confirm the performance of asynchronous computations

2021-01-25 Thread Viet H.Q.H.
Dear Patrick Sanan, Thank you very much for your answer, especially for your code. I was able to compile and run your code on 8 nodes with 20 processes per node. Below is the result:
Testing with 160 MPI ranks reducing an array of size 32 (256 bytes)
Running 5 burnin runs and 100 tests ... Done.
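
A minimal sketch of the kind of reduction-latency test this output suggests, assuming a simple blocking vs. non-blocking MPI_Allreduce comparison; this is not Patrick Sanan's actual benchmark, and the array size, burn-in count, and test count below merely mirror the numbers above:

  /* Sketch: time blocking vs. non-blocking allreduce of a small array.
   * Not the benchmark from this thread; sizes mirror the output above. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    const int n = 32, nburn = 5, ntest = 100;
    double    in[32], out[32], t0, t_block = 0.0, t_nonblock = 0.0;
    int       i, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < n; i++) in[i] = (double)rank;

    /* burn-in runs followed by timed blocking reductions */
    for (i = 0; i < nburn + ntest; i++) {
      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      if (i >= nburn) t_block += MPI_Wtime() - t0;
    }

    /* same loop with a non-blocking reduction: post, overlap work, wait */
    for (i = 0; i < nburn + ntest; i++) {
      MPI_Request req;
      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      MPI_Iallreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
      /* ... local work to overlap with the reduction would go here ... */
      MPI_Wait(&req, MPI_STATUS_IGNORE);
      if (i >= nburn) t_nonblock += MPI_Wtime() - t0;
    }

    if (rank == 0)
      printf("avg blocking %g s, avg non-blocking %g s\n",
             t_block / ntest, t_nonblock / ntest);
    MPI_Finalize();
    return 0;
  }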

Re: [petsc-users] Convert a 3D DMDA sub-vector to a natural 2D vector

2021-01-25 Thread Sajid Ali
Hi Randall, Thanks for providing a pointer to the DMDAGetRay function! After looking at its implementation, I came up with a solution that creates a naturally ordered slice vector on the same subset of processors as the DMDA-ordered slice vector (by scattering from the DMDA-ordered slice to a
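
For readers following the thread, a minimal sketch of the built-in DMDA natural-ordering machinery that such a solution builds on; this is not Sajid's slice-scattering code, only an illustration of converting a full 3D DMDA-ordered global vector to natural ordering (grid sizes are arbitrary):

  /* Sketch: convert a 3D DMDA-ordered global vector to natural ordering.
   * Illustrative only; not the slice-scattering solution described above. */
  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM             da;
    Vec            g, nat;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                        DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 16, 16, 16,
                        PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                        1, 1, NULL, NULL, NULL, &da); CHKERRQ(ierr);
    ierr = DMSetUp(da); CHKERRQ(ierr);

    ierr = DMCreateGlobalVector(da, &g); CHKERRQ(ierr);
    ierr = VecSet(g, 1.0); CHKERRQ(ierr);

    /* the same data, reordered so i varies fastest, then j, then k */
    ierr = DMDACreateNaturalVector(da, &nat); CHKERRQ(ierr);
    ierr = DMDAGlobalToNaturalBegin(da, g, INSERT_VALUES, nat); CHKERRQ(ierr);
    ierr = DMDAGlobalToNaturalEnd(da, g, INSERT_VALUES, nat); CHKERRQ(ierr);

    ierr = VecDestroy(&nat); CHKERRQ(ierr);
    ierr = VecDestroy(&g); CHKERRQ(ierr);
    ierr = DMDestroy(&da); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }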

Re: [petsc-users] Gathering distributed dense matrix only on rank 0

2021-01-25 Thread Matthew Knepley
On Mon, Jan 25, 2021 at 4:40 AM Roland Richter wrote:
> Hei,
>
> is there a way to gather a distributed dense matrix only on rank 0, not
> on the other ranks? My matrix processing/storage routines in my program
> currently are single-threaded and only operate on rank 0, and therefore
> I assume
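
One common PETSc idiom for this, not necessarily the approach suggested later in the thread, is to pull the whole matrix into a sequential copy that is non-empty only on rank 0 via MatCreateSubMatrices(); a sketch, assuming the matrix type supports that operation:

  /* Sketch: gather a distributed matrix A into a sequential copy held only
   * by rank 0; the other ranks receive an empty 0x0 matrix. */
  #include <petscmat.h>

  PetscErrorCode GatherOnRankZero(Mat A, Mat *Aseq)
  {
    PetscErrorCode ierr;
    PetscMPIInt    rank;
    PetscInt       M, N;
    IS             isrow, iscol;
    Mat            *submats;

    MPI_Comm_rank(PetscObjectComm((PetscObject)A), &rank);
    ierr = MatGetSize(A, &M, &N); CHKERRQ(ierr);

    /* rank 0 requests every row and column; the other ranks request none */
    ierr = ISCreateStride(PETSC_COMM_SELF, rank ? 0 : M, 0, 1, &isrow); CHKERRQ(ierr);
    ierr = ISCreateStride(PETSC_COMM_SELF, rank ? 0 : N, 0, 1, &iscol); CHKERRQ(ierr);

    ierr = MatCreateSubMatrices(A, 1, &isrow, &iscol, MAT_INITIAL_MATRIX, &submats); CHKERRQ(ierr);
    *Aseq = submats[0];
    ierr = PetscObjectReference((PetscObject)*Aseq); CHKERRQ(ierr); /* keep the copy alive */

    ierr = ISDestroy(&isrow); CHKERRQ(ierr);
    ierr = ISDestroy(&iscol); CHKERRQ(ierr);
    ierr = MatDestroySubMatrices(1, &submats); CHKERRQ(ierr);
    return 0;
  }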

Re: [petsc-users] How to confirm the performance of asynchronous computations

2021-01-25 Thread Patrick Sanan
Sorry about the delay in responding, but I'll add a couple of points here: 1) It's important to have some reason to believe that pipelining will actually help your problem. Pipelined Krylov methods work by overlapping reductions with operator and preconditioner applications. So, to see speedup, the
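
As a concrete example of that first point, a pipelined method is selected like any other KSP type; a minimal sketch, with KSPPIPECG standing in for whichever pipelined variant is of interest:

  /* Sketch: switch an existing KSP to a pipelined Krylov method. */
  #include <petscksp.h>

  PetscErrorCode UsePipelinedCG(KSP ksp)
  {
    PetscErrorCode ierr;
    ierr = KSPSetType(ksp, KSPPIPECG); CHKERRQ(ierr); /* or -ksp_type pipecg at run time */
    /* Pipelining only pays off if MPI makes asynchronous progress on the
     * non-blocking reductions while the operator and preconditioner
     * applications are running. */
    ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
    return 0;
  }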

[petsc-users] Gathering distributed dense matrix only on rank 0

2021-01-25 Thread Roland Richter
Hei, is there a way to gather a distributed dense matrix only on rank 0, not on the other ranks? My matrix processing/storage routines in my program currently are single-threaded and only operate on rank 0, and therefore I assume that I can ignore all other ranks. This should also save a bit

Re: [petsc-users] How to confirm the performance of asynchronous computations

2021-01-25 Thread Viet H.Q.H.
Dear Barry, Thank you very much for your information. It seems complicated to set the environment variables that enable asynchronous progress and pin its threads to cores when using Intel MPI.
$ export I_MPI_ASYNC_PROGRESS=1
$ export I_MPI_ASYNC_PROGRESS_PIN=
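
For reference, I_MPI_ASYNC_PROGRESS enables Intel MPI's progress threads and I_MPI_ASYNC_PROGRESS_PIN takes a list of logical cores to pin them to; the placeholder below only stands in for the value that is truncated in the message above:

  $ export I_MPI_ASYNC_PROGRESS=1
  $ export I_MPI_ASYNC_PROGRESS_PIN=<comma-separated list of cores reserved for progress threads>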