On Dec 11, 2021, at 17:56, Rohan Yadav <roh...@alumni.cmu.edu> wrote:
> 40 MPI ranks on a single node should give similar performance to 40 threads. Both PETSc and TACO use a row-based parallelism strategy, so it should line up.

An MPI division of rows is static, and PETSc divides strictly by number of rows. A thread-based system can do things like "schedule(guided)" (OpenMP) and get better load balancing if the rows have widely differing numbers of nonzeros.

Victor.
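To make the difference concrete, here is a minimal sketch in C, not code taken from PETSc or TACO. The function names `spmv_csr_guided` and `max_nnz_static_rows` are hypothetical, the matrix is assumed to be in CSR format, and the static split is assumed to be contiguous equal-sized row blocks with the remainder rows going to the first ranks. The first function is a threaded row loop with `schedule(guided)`; the second reports the largest number of nonzeros any rank would own under the static split, which is what limits the MPI run when row lengths vary a lot.

```c
#include <assert.h>
#include <stddef.h>

/* y = A*x for a CSR matrix. With OpenMP enabled, schedule(guided) hands out
 * rows in chunks of decreasing size at run time, so threads that drew cheap
 * rows come back for more work. Without OpenMP the pragma is ignored and the
 * loop runs serially with the same result. */
void spmv_csr_guided(int nrows, const int *rowptr, const int *colidx,
                     const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[colidx[k]];
        y[i] = sum;
    }
}

/* Largest nonzero count owned by any rank when rows are split statically
 * into contiguous blocks of (nearly) equal row count. Assumption: the first
 * (nrows % nranks) ranks each get one extra row. */
int max_nnz_static_rows(int nrows, const int *rowptr, int nranks)
{
    assert(nranks > 0);
    int base = nrows / nranks;
    int extra = nrows % nranks;
    int worst = 0;
    for (int r = 0; r < nranks; r++) {
        int begin = r * base + (r < extra ? r : extra);
        int end = begin + base + (r < extra ? 1 : 0);
        int nnz = rowptr[end] - rowptr[begin];  /* work owned by rank r */
        if (nnz > worst)
            worst = nnz;
    }
    return worst;
}
```

Comparing `max_nnz_static_rows(...)` against the average `rowptr[nrows] / nranks` gives a rough idea of how unbalanced an equal-row split is for a given matrix. If the two are close, the static split is probably not the reason for a performance gap.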