> > > The way to reduce the memory is to have the all-at-once algorithm (Mark
> > > is an expert on this). But I am not sure how efficiently it could be
> > > implemented.
I have some data from a 3D elasticity problem with 1.4B equations, run on:

/home1/04906/bonnheim/olympus-keaveny/Olympus/olympus.petsc-3.9.3.skx-cxx-O on a skx-cxx-O named c478-062.stampede2.tacc.utexas.edu with 4800 processors, by bonnheim Fri Mar 15 04:48:27 2019
Using Petsc Release Version 3.9.3, unknown

I assume this is on 100 Skylake nodes, but I am not sure. This run used the all-at-once algorithm in my old solver, Prometheus.

There are six levels and thus 5 RAPs. The time for these 5 RAPs is about that of 150 Mat-vecs on the fine grid, and their total flop rate was about 4x the flop rate of those fine-grid Mat-vecs. This is to be expected, as the all-at-once algorithm is simple, not flop optimal, and has high arithmetic intensity.

There is a fair amount of load imbalance in this RAP, though the three coarsest grids have idle processes. The max/min time was 2.6, and about 25% of the time was spent in the communication layer (a hand-written layer that I wrote in grad school). The fine-grid Mat-vecs had a max/min of 3.1.

Anyway, I know this is not very precise data, but maybe it would help to (de)motivate an all-at-once implementation in PETSc.

Mark
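For anyone following along, a minimal sketch of what the "RAP" refers to: the Galerkin coarse-grid operator A_c = R A P with R = P^T, one such triple product per level transition. This toy example (not Prometheus or PETSc code; the 1D Laplacian and piecewise-constant prolongator are stand-ins I made up for illustration) uses scipy.sparse; PETSc computes the same product with specialized kernels, e.g. two-step (A*P then P^T*(AP)) versus an all-at-once formation.

```python
# Toy illustration of one RAP (Galerkin triple product) in algebraic
# multigrid. The operator and prolongator here are hypothetical
# stand-ins, not the elasticity problem from the email.
import numpy as np
import scipy.sparse as sp

n_fine, n_coarse = 8, 4

# 1D Laplacian on the fine grid (stand-in for the fine-grid operator A)
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n_fine, n_fine), format="csr")

# Piecewise-constant aggregation prolongator P: each coarse dof
# maps to a pair of fine dofs
rows = np.arange(n_fine)
cols = rows // 2
P = sp.csr_matrix((np.ones(n_fine), (rows, cols)), shape=(n_fine, n_coarse))

# The RAP: coarse operator A_c = P^T A P
A_coarse = P.T @ A @ P

print(A_coarse.shape)  # (4, 4)
```

With six levels there are five of these products, and the cost comparison in the email is between forming them and applying the fine-grid A as a Mat-vec.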