I have an update and a few questions regarding my project. For an introduction and details, please see http://www.opm-project.org/pipermail/opm/2014-October/000664.html

This will be another wall-of-text, but please bear with me.

OK, so I have more or less completed my fallback CSR SparseMatrix representation, fixed a few bugs, and started testing the IncompFlowSolverHybrid (through upscaler-benchmark-relperm) with Petsc as a backend. So far I've accomplished the following:

* Performance
An issue that was brought up after the announcement was performance. Using my SparseMatrixBuilder, now modified to rely on std::map instead of std::vector, I am able to remove all allocation code in the IncompFlowSolverHybrid. I consider this a win, because it significantly simplifies setting up sparse matrices and leaks fewer implementation details, since we no longer have to work around Dune::BCRSMatrix's somewhat clumsy interface (sorry, Markus!).
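To make this concrete, here is a minimal sketch of what such a map-based builder can look like. The class and member names are illustrative only, not the actual SparseMatrixBuilder:

#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Sketch: accumulate coefficients into a row-major-ordered std::map,
// then flatten to CSR. The map's ordering gives sorted, duplicate-free
// entries for free, so no sparsity pattern has to be declared up front.
class MapCsrBuilder {
public:
    // Add a coefficient; repeated (i, j) contributions are summed.
    void add(std::size_t i, std::size_t j, double value) {
        entries_[std::make_pair(i, j)] += value;
        rows_ = std::max(rows_, i + 1);
    }

    // Flatten to CSR: ia = row pointers, ja = column indices, sa = values.
    void to_csr(std::vector<std::size_t>& ia,
                std::vector<std::size_t>& ja,
                std::vector<double>& sa) const {
        ia.assign(rows_ + 1, 0);
        ja.clear();
        sa.clear();
        for (const auto& entry : entries_) {
            ++ia[entry.first.first + 1];      // count nonzeros per row
            ja.push_back(entry.first.second);
            sa.push_back(entry.second);
        }
        for (std::size_t r = 0; r < rows_; ++r)
            ia[r + 1] += ia[r];               // prefix sum -> row pointers
    }

private:
    std::map<std::pair<std::size_t, std::size_t>, double> entries_;
    std::size_t rows_ = 0;
};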

$ git log --stat IncompFlowSolverHybrid.hpp
opm/porsol/mimetic/IncompFlowSolverHybrid.hpp
 1 file changed, 79 insertions(+), 462 deletions(-)

In addition, by feeding this construction method into opm/core/linalg/LinearSolverIstl (which converts from flat-array CSR into Dune matrices) I am able to reduce the running time of upscale-benchmark-relperm. I attribute this to more efficient allocation and instantiation of the matrices, since the BCRSMatrix can now be built row-wise rather than in the less efficient random mode.
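For reference, this is roughly what the row-wise path looks like in Dune-istl terms. Types and the exact conversion inside LinearSolverIstl may differ; this is only a sketch of the build mode:

#include <dune/common/fmatrix.hh>
#include <dune/istl/bcrsmatrix.hh>

typedef Dune::FieldMatrix<double, 1, 1> Block;
typedef Dune::BCRSMatrix<Block> Matrix;

// Sketch: build a BCRSMatrix row-wise from flat CSR arrays
// (ia = row pointers, ja = column indices, sa = values).
Matrix build_from_csr(int n, const int* ia, const int* ja, const double* sa)
{
    Matrix A(n, n, ia[n], Matrix::row_wise);

    // First pass: declare the sparsity pattern, one row at a time.
    for (Matrix::CreateIterator row = A.createbegin(); row != A.createend(); ++row) {
        const int r = row.index();
        for (int k = ia[r]; k < ia[r + 1]; ++k)
            row.insert(ja[k]);
    }

    // Second pass: fill in the values now that the pattern is fixed.
    for (int r = 0; r < n; ++r)
        for (int k = ia[r]; k < ia[r + 1]; ++k)
            A[r][ja[k]] = sa[k];

    return A;
}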

Running the benchmark with this new implementation, using Dune-istl with CG/ILU(0), on my Intel i7-950@3.07GHz I get the following output:

Wallclock timing:
Input- and grid processing: 2.65775 sec
Upscaling:                  143.47 sec
Total wallclock time:       146.128 sec  (2 min 26.1279 sec)

Do the numbers look ok? The original, upstream code gives the following:
Wallclock timing:
Input- and grid processing: 2.75897 sec
Upscaling:                  171.677 sec
Total wallclock time:       174.436 sec  (2 min 54.4357 sec)


Note that this code is almost drop-in compatible with Petsc as the linear solver backend.

* Petsc compatibility
The second thing I've accomplished is running the benchmark with Petsc as the linear solver. Petsc support has yet to be merged into opm-core upstream, and a few issues still need to be resolved before that can happen, but it is on the way. The benchmark reports correct results, but I still see some performance issues, which I hope can be discussed in this thread. I consider it a win already, simply because it proves that it is possible to have simple support for multiple solvers, possibly with performance improvements to boot!
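For the curious, here is a rough sketch of how the flat CSR data can be handed to Petsc and solved with CG/ILU(0), mirroring the Dune-istl setup. This is not the actual opm-core binding; error checking and PetscInitialize are omitted, and on Petsc < 3.5 KSPSetOperators takes an extra MatStructure argument:

#include <petscksp.h>

/* Sketch: wrap existing flat CSR arrays in a Petsc matrix and solve. */
void solve_with_petsc(PetscInt n, PetscInt* ia, PetscInt* ja, PetscScalar* sa,
                      PetscScalar* rhs, PetscScalar* solution)
{
    Mat A;
    Vec x, b;
    KSP ksp;
    PC  pc;

    /* Reuse the CSR arrays directly instead of copying entry by entry. */
    MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, n, n, ia, ja, sa, &A);
    VecCreateSeqWithArray(PETSC_COMM_SELF, 1, n, rhs, &b);
    VecCreateSeqWithArray(PETSC_COMM_SELF, 1, n, solution, &x);

    KSPCreate(PETSC_COMM_SELF, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPCG);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCILU);
    KSPSetTolerances(ksp, 1e-12, 1e-5, 1e5, 100000);
    KSPSetFromOptions(ksp);   /* allow -ksp_* / -pc_* overrides */

    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    VecDestroy(&x);
    VecDestroy(&b);
    MatDestroy(&A);
}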

Running the same benchmark with CG/ILU on petsc:
Wallclock timing:
Input- and grid processing: 5.40389 sec
Upscaling:                  445.309 sec
Total wallclock time:       450.713 sec  (7 min 30.7128 sec)

Which brings me to the questions:
Petsc obviously performs a LOT worse than Dune. I ran the benchmark under callgrind, which revealed that it spends ~48% of its time inside Petsc's PCApply. Another 43% is spent in KSP_MatMult.

Unfortunately I'm not familiar enough with linear methods to judge whether this is reasonable, so I'm asking here: have I configured Petsc wrong, or is this to be expected? Inspecting the callgrind output of the Dune run leads me to think it is OK, because it spends approx. 37% of its time in SeqILU0::apply.
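For anyone who wants to poke at the configuration, the same settings can also be overridden at run time through Petsc's options database (this assumes argv is forwarded to PetscInitialize and KSPSetFromOptions is called; -log_summary is Petsc's built-in profiling report, and <benchmark binary> stands in for the actual executable):

<benchmark binary> <usual arguments> \
    -ksp_type cg -pc_type ilu \
    -ksp_rtol 1e-12 -ksp_atol 1e-5 -ksp_max_it 100000 \
    -ksp_view -ksp_monitor -log_summary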

It's worth mentioning that both Dune and Petsc use a comparable number of iterations - the variance between them is at most on the order of 50 iterations, something that can probably be attributed to Petsc taking a lot more parameters. They also both produce correct output. Using ksp_view in Petsc gives:

KSP Object: 1 MPI processes
  type: cg
  maximum iterations=100000
  tolerances:  relative=1e-12, absolute=1e-05, divergence=100000
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: ilu
    ILU: out-of-place factorization
    0 levels of fill
    tolerance for zero pivot 2.22045e-14
    using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
    matrix ordering: natural
    factor fill ratio given 1, needed 1
      Factored matrix follows:
        Matrix Object:         1 MPI processes
          type: seqaij
          rows=102697, cols=102697
          package used to perform factorization: petsc
          total: nonzeros=1030113, allocated nonzeros=1030113
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
  linear system matrix = precond matrix:
  Matrix Object:   1 MPI processes
    type: seqaij
    rows=102697, cols=102697
    total: nonzeros=1030113, allocated nonzeros=0
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines
KSP Iterations 145, Final Residual 9.31856e-06

for a typical application. Does it look misconfigured somehow? Or is Dune just THAT much faster? If so, I am -very- impressed.


Alright, next question:
The problem with unifying several solvers is that they all take different parameters, name options and methods differently, and so on. There are plenty of ways to deal with this, including:

#1: Lowest common denominator. We provide a specific feature set that we support and say that our implementation only allows specific computations. This has the benefit of providing a simple interface that allows solvers to be substituted with ease. The drawback is obviously that some configuration opportunities are discarded.

#2: Use a dynamic configuration method (such as ParameterGroup) that basically forwards options to the solver. The main drawback is that the solvers aren't really unified at all, as every call usually has to be special-cased for every single solver. On the other hand, it exposes the full power of the underlying solver.

#3: A hybrid. A well-defined, supported interface and set of operations, plus an "unsafe & unportable" feature that allows direct configuration. This sort of breaks encapsulation, but if it is documented as unsafe and only used for "emergencies" then I think it could be fine.

With option #1 or #3, some standardized mechanism is needed for translating between our option "language" and the target solver's option settings. LinearSolverInterface currently doesn't support this directly - it does allow options to be passed through ParameterGroup, but it is not very well defined exactly what those options should look like. I personally don't like that, because it is impossible to verify statically.
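To make #3 a bit more concrete, here is a rough sketch of what such a hybrid interface could look like. None of these names exist today; they are purely illustrative:

#include <stdexcept>
#include <string>

// Sketch of option #3: a small, well-defined set of typed options that
// every backend must honour, plus an explicitly "unsafe" passthrough
// for backend-specific tuning.
struct SolverOptions {
    enum Method  { CG, BiCGStab, GMRES };
    enum Precond { None, ILU0, AMG };

    Method  method             = CG;
    Precond preconditioner     = ILU0;
    double  relative_tolerance = 1e-8;
    int     max_iterations     = 5000;
};

class UnifiedLinearSolver {
public:
    virtual ~UnifiedLinearSolver() {}

    // The portable, statically verifiable part of the interface.
    virtual void configure(const SolverOptions& options) = 0;
    virtual void solve(int n, const int* ia, const int* ja, const double* sa,
                       const double* rhs, double* solution) = 0;

    // The escape hatch: forwarded verbatim to the backend (e.g. Petsc's
    // options database), documented as unsafe and unportable. Backends
    // that cannot honour a key may refuse it.
    virtual void set_raw_option(const std::string& key,
                                const std::string& value) {
        throw std::runtime_error("raw option '" + key + "=" + value
                                 + "' not supported by this backend");
    }
};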

The real question is: which solution does the community think is worth going for? If LinearSolver* is to be used, it will require a little more work. Personally I prefer solution #3, but I'd love some community feedback on that.

Sincerely,
Jørgen Kvalsvik
