>
>
> The way to reduce the memory is to have the all-at-once algorithm (Mark is
> an expert on this). But I am not sure how efficient it could be
> implemented.
>

I have some data from a 3D elasticity problem with 1.4B equations on:

/home1/04906/bonnheim/olympus-keaveny/Olympus/olympus.petsc-3.9.3.skx-cxx-O
on a skx-cxx-O named c478-062.stampede2.tacc.utexas.edu with 4800
processors, by bonnheim Fri Mar 15 04:48:27 2019
Using Petsc Release Version 3.9.3, unknown

I assume this is on 100 Skylake nodes, but I am not sure.

This run used the all-at-once algorithm in my old solver, Prometheus. There
are six levels and thus 5 RAPs. The time for these RAPs is about that of
150 Mat-vecs on the fine grid.

The total flop rate for these 5 RAPs was about 4x the flop rate for these
Mat-vecs on the fine grid. This is to be expected, as the all-at-once
algorithm is simple, is not flop optimal, and has high arithmetic intensity.
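
For readers who have not seen it, here is a minimal serial sketch of the
general all-at-once idea for the triple product C = P^T A P. It is not the
Prometheus code and not a PETSc implementation; the dict-of-dicts storage
and the name rap_all_at_once are assumptions made only for illustration.
Each row of A*P is formed, used right away in an outer product with the
matching row of P, and then discarded, so A*P is never stored as a matrix.

```python
from collections import defaultdict


def rap_all_at_once(A, P):
    """Return C = P^T A P, computed in one pass over the rows of A.

    A and P are sparse matrices stored as {row: {col: value}} (an
    illustrative format, not what any real solver uses).
    """
    C = defaultdict(lambda: defaultdict(float))
    for i, a_row in A.items():
        # Row i of A*P: a temporary that lives only for this iteration.
        ap_row = defaultdict(float)
        for k, a_ik in a_row.items():
            for j, p_kj in P.get(k, {}).items():
                ap_row[j] += a_ik * p_kj
        # Scatter the outer product of P[i, :] and (A*P)[i, :] into C.
        # In parallel, contributions to off-process rows of C would be
        # communicated here; this sketch is serial.
        for I, p_iI in P.get(i, {}).items():
            c_row = C[I]
            for j, v in ap_row.items():
                c_row[j] += p_iI * v
    return {I: dict(row) for I, row in C.items()}
```

The memory saving comes from never holding A*P or an explicit P^T; the
cost is the extra work in the scatter loop, which is why the approach is
not flop optimal.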

There is a fair amount of load imbalance in this RAP, but the three
coarsest grids have idle processes. The max/min was 2.6 and about 25% of
the time was in the communication layer. (This is a hand-written
communication layer that I wrote in grad school.)

The fine grid Mat-Vecs had a max/min of 3.1.

Anyway, I know this is not very precise data, but maybe it would help to
(de)motivate its implementation in PETSc.

Mark
