I have an update and a few questions regarding my project. For
introduction and details, please see
http://www.opm-project.org/pipermail/opm/2014-October/000664.html
This will be another wall-of-text, but please bear with me.
Ok, so I have more or less completed my fallback CSR SparseMatrix
representation as well as fixed a few bugs, and started testing the
IncompFlowSolverHybrid (through upscaler-benchmark-relperm) with Petsc
as a backend. So far I've accomplished the following:
* Performance
An issue that was brought up after the announcement was performance.
Using my SparseMatrixBuilder, now modified to rely on std::map instead
of std::vector, I am able to remove all allocation code in the
IncompFlowSolverHybrid. I consider this a win: it significantly
simplifies setting up sparse matrices and leaks fewer implementation
details, since we no longer have to work around Dune::BCRSMatrix's
somewhat clumsy interface (sorry, Markus!).
$ git log --stat IncompFlowSolverHybrid.hpp
opm/porsol/mimetic/IncompFlowSolverHybrid.hpp
1 file changed, 79 insertions(+), 462 deletions(-)
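To make that concrete, the core of the builder idea is simply to let a
std::map keep the coordinates sorted and duplicate-free during assembly,
and only derive the CSR arrays once the structure is complete. A minimal
sketch (illustrative names, not the actual SparseMatrixBuilder code):

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Sketch only: accumulate coefficients in a std::map so that no sparsity
// pattern has to be declared up front, then convert to flat CSR arrays.
class MapCsrBuilder {
public:
    // Repeated (i,j) contributions are summed, which is exactly what an
    // assembly loop needs.
    void add(std::size_t i, std::size_t j, double value) {
        entries_[std::make_pair(i, j)] += value;
        if (i + 1 > rows_) rows_ = i + 1;  // good enough for the square systems we assemble
    }

    // Flatten the map (already sorted row-major) into the usual ia/ja/sa triplet.
    void compress(std::vector<int>& ia, std::vector<int>& ja,
                  std::vector<double>& sa) const {
        ia.assign(rows_ + 1, 0);
        ja.clear(); sa.clear();
        ja.reserve(entries_.size());
        sa.reserve(entries_.size());
        for (const auto& e : entries_) {
            ++ia[e.first.first + 1];                        // count entries per row
            ja.push_back(static_cast<int>(e.first.second));
            sa.push_back(e.second);
        }
        for (std::size_t r = 0; r < rows_; ++r)
            ia[r + 1] += ia[r];                             // prefix sum -> row offsets
    }

private:
    std::map<std::pair<std::size_t, std::size_t>, double> entries_;
    std::size_t rows_ = 0;
};

The trade-off is O(log n) insertion per entry against not having to know
the sparsity pattern up front, which is what lets the allocation code in
IncompFlowSolverHybrid go away.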
In addition, by feeding this construction method into
opm/core/linalg/LinearSolverIstl (which converts from flat-array CSR
into Dune matrices) I am able to reduce the running time of
upscale-benchmark-relperm. I attribute this to more efficient
allocation and instantiation of the matrices, as the BCRSMatrix can now
be built row-wise instead of in the less efficient random mode.
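For reference, row-wise construction against the BCRSMatrix
create-iterator interface looks roughly like this (a sketch of the
pattern, not the actual LinearSolverIstl code):

#include <dune/common/fmatrix.hh>
#include <dune/istl/bcrsmatrix.hh>

typedef Dune::FieldMatrix<double, 1, 1> Block;
typedef Dune::BCRSMatrix<Block>         Matrix;

// Build an n-by-n matrix from flat CSR arrays (ia, ja, sa) in one pass.
// In row_wise mode each row's column indices are announced exactly once,
// so there is no over-allocation and no pattern guessing.
Matrix buildFromCsr(int n, const int* ia, const int* ja, const double* sa)
{
    Matrix A(n, n, ia[n], Matrix::row_wise);
    for (Matrix::CreateIterator row = A.createbegin(); row != A.createend(); ++row)
        for (int k = ia[row.index()]; k < ia[row.index() + 1]; ++k)
            row.insert(ja[k]);          // announce the sparsity pattern
    for (int i = 0; i < n; ++i)
        for (int k = ia[i]; k < ia[i + 1]; ++k)
            A[i][ja[k]] = sa[k];        // fill in the values
    return A;
}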
Running the benchmark with this new implementation using Dune-istl with
CG/ILU(0) on my Intel i7-950@3.07GHz I get the following output:
Wallclock timing:
Input- and grid processing: 2.65775 sec
Upscaling: 143.47 sec
Total wallclock time: 146.128 sec (2 min 26.1279 sec)
Do the numbers look ok? The original, upstream code gives the following:
Wallclock timing:
Input- and grid processing: 2.75897 sec
Upscaling: 171.677 sec
Total wallclock time: 174.436 sec (2 min 54.4357 sec)
Note that this code is almost drop-in compatible with Petsc as the
linear solver backend.
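By "drop-in" I mean that IncompFlowSolverHybrid only ever hands the
assembled flat CSR system to one solve() call, so swapping backends is a
construction-time decision. The shape of that interface is roughly the
following (an illustration following opm-core's LinearSolverInterface;
the exact signature should be checked against the headers, and the
factory functions in the comment are made-up names):

// Both backends sit behind one flat-CSR solve() call, so the assembly
// code never changes when the backend does.
class FlatCsrSolver {                       // illustrative stand-in
public:
    virtual ~FlatCsrSolver() {}
    virtual void solve(int size, int nonzeros,
                       const int* ia, const int* ja, const double* sa,
                       const double* rhs, double* solution) const = 0;
};

// Choosing a backend then becomes a one-line decision at construction
// time, e.g. (hypothetical factory names):
//
//   std::unique_ptr<FlatCsrSolver> solver(
//       use_petsc ? makePetscSolver(param)   // my branch
//                 : makeIstlSolver(param));  // existing dune-istl path
//   solver->solve(n, nnz, ia, ja, sa, rhs, x);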
* Petsc compatibility
The second thing I've accomplished is running the benchmark using Petsc
as the linear solver. Petsc support has yet to be merged into opm-core
upstream, and a few issues still need to be resolved before that can
happen, but it is on the way. The benchmark reports correct results, but
I still experience some performance issues, which I hope can be
discussed in this thread. I consider it a win already, simply because it
proves that it is possible to support multiple solvers behind a simple
interface, possibly with performance improvements to boot!
Running the same benchmark with CG/ILU on petsc:
Wallclock timing:
Input- and grid processing: 5.40389 sec
Upscaling: 445.309 sec
Total wallclock time: 450.713 sec (7 min 30.7128 sec)
Which brings me to the questions:
Petsc obviously performs a LOT worse than Dune. I ran the benchmark in
callgrind, which revealed that it spends ~48% of its time inside Petsc's
PCApply. Another 43% is spent in KSP_MatMult.
Unfortunately I'm not familiar enough with linear methods to judge
whether this is reasonable or not, so I ask here: have I configured
Petsc wrong, or is this to be expected? Inspecting the callgrind output
of the Dune run leads me to think that this is ok, because it spends
approx. 37% of its time in SeqILU0::apply.
It's worth mentioning that both Dune and Petsc use a comparable number
of iterations - the difference between them is at most on the order of 50
iterations, which can probably be attributed to Petsc taking a lot more
parameters. They also both produce correct output. Using ksp_view in
Petsc gives:
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=100000
  tolerances: relative=1e-12, absolute=1e-05, divergence=100000
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: ilu
    ILU: out-of-place factorization
    0 levels of fill
    tolerance for zero pivot 2.22045e-14
    using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
    matrix ordering: natural
    factor fill ratio given 1, needed 1
      Factored matrix follows:
        Matrix Object: 1 MPI processes
          type: seqaij
          rows=102697, cols=102697
          package used to perform factorization: petsc
          total: nonzeros=1030113, allocated nonzeros=1030113
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
  linear system matrix = precond matrix:
  Matrix Object: 1 MPI processes
    type: seqaij
    rows=102697, cols=102697
    total: nonzeros=1030113, allocated nonzeros=0
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines
KSP Iterations 145, Final Residual 9.31856e-06
for a typical application. Does it look mis-configured somehow? Or is it
just that Dune is THAT much faster? If so I am -very- impressed.
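For reference, the Petsc side is configured essentially as follows (a
condensed sketch rather than the actual code; PetscErrorCode checking
and the matrix/vector assembly are omitted):

#include <petscksp.h>

// CG with ILU(0), relative tolerance 1e-12, nonzero initial guess --
// i.e. the configuration the ksp_view dump above reports.
void solveWithPetsc(Mat A, Vec b, Vec x)
{
    KSP ksp;
    PC  pc;

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);      // petsc >= 3.5 signature; older
                                     // versions take a MatStructure flag
    KSPSetType(ksp, KSPCG);

    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCILU);            // ILU(0) by default

    // rtol, atol, dtol, maxits -- matches the tolerances shown by ksp_view
    KSPSetTolerances(ksp, 1e-12, 1e-5, 100000, 100000);
    KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);

    // allow -ksp_view, -ksp_monitor etc. to override at run time
    KSPSetFromOptions(ksp);

    KSPSolve(ksp, b, x);
    KSPDestroy(&ksp);
}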
Alright, next question:
The problem with unifying several solvers is that they all take
different parameters, name options and methods differently, and so on.
There are plenty of ways to deal with this, including:
#1: Lowest common denominator. We provide a specific feature set that we
support and state that our implementation only allows specific
computations. This has the benefit of providing a simple interface that
allows solvers to be substituted with ease. The obvious drawback is that
some configuration opportunities are discarded.
#2: Use a dynamic configuration method (such as ParameterGroup) that
basically forwards options to the solver. The main drawback here is that
the solvers aren't really unified at all, as every call usually has to
be special-cased for each solver. Of course, this exposes the full power
of the underlying solver.
#3: A hybrid. A well-defined supported interface and operations with an
"unsafe & unportable" feature that allows for direct configuration. This
sort-of breaks encapsulation, but if it is documented as unsafe and is
only used for "emergencies" then I think it could be fine.
With option #1 or #3, some standardized mechanism for translating
between our option "language" and the target solver's option settings is
needed. LinearSolverInterface currently doesn't support this directly -
it does allow options to be passed through ParameterGroup, but it is not
very well defined what exactly the options should look like. I
personally don't like that, because it is impossible to verify
statically.
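To make option #3 a bit more concrete, here is a rough sketch of what I
have in mind (none of these names exist in opm-core today; this is
purely illustrative): a small, statically checkable set of common
options, plus an explicitly marked escape hatch that forwards
backend-specific key/value pairs untouched.

#include <map>
#include <string>

// The "well-defined" part: a small set of options every backend must honour.
struct CommonSolverOptions {
    enum Method         { CG, BiCGStab, GMRES };
    enum Preconditioner { None, ILU0, AMG };

    Method         method             = CG;
    Preconditioner preconditioner     = ILU0;
    double         relative_tolerance = 1e-12;
    int            max_iterations     = 100000;
};

// The "unsafe & unportable" part: raw key/value pairs handed verbatim to
// the backend (e.g. petsc option database entries or dune-istl settings).
// Anything put here is explicitly outside the portability guarantee.
typedef std::map<std::string, std::string> BackendOptions;

class UnifiedLinearSolver {
public:
    virtual ~UnifiedLinearSolver() {}

    // Every backend translates CommonSolverOptions into its own language.
    virtual void configure(const CommonSolverOptions& common) = 0;

    // Escape hatch; a backend is free to reject keys it does not understand.
    virtual void configureUnsafe(const BackendOptions& raw) = 0;
};

Everything in CommonSolverOptions can be verified when it is set, while
anything going through the raw map is clearly the user's own
responsibility.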
The real question is: which solution does the community think is worth
going for? If LinearSolver* is to be used, it will require a little more
work. Personally I prefer solution #3, but I'd love some community
feedback on that.
Sincerely,
Jørgen Kvalsvik