I have an update and a few questions regarding my project. For
introduction and details, please see
http://www.opm-project.org/pipermail/opm/2014-October/000664.html
This will be another wall-of-text, but please bear with me.
Ok, so I have more or less completed my fallback CSR SparseMatrix
representation as well as fixed a few bugs, and started testing the
IncompFlowSolverHybrid (through upscaler-benchmark-relperm) with Petsc
as a backend. So far I've accomplished the following:
* Performance
An issue that was brought up after the announcement was performance.
Using my SparseMatrixBuilder, now modified to rely on std::map instead
of std::vector, I am able to remove all allocation code in the
IncompFlowSolverHybrid. I consider this a win: it significantly
simplifies setting up sparse matrices and leaks fewer implementation
details, since we no longer have to work around Dune::BCRSMatrix's
somewhat clumsy interface (sorry, Markus!).
$ git log --stat IncompFlowSolverHybrid.hpp
opm/porsol/mimetic/IncompFlowSolverHybrid.hpp
1 file changed, 79 insertions(+), 462 deletions(-)
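To make that concrete, the core of the builder idea is simply to let a
std::map keep the coordinates sorted and duplicate-free during assembly,
and only derive the CSR arrays once the structure is complete. A minimal
sketch (illustrative names, not the actual SparseMatrixBuilder code):

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Sketch only: accumulate coefficients in a std::map so that no sparsity
// pattern has to be declared up front, then convert to flat CSR arrays.
class MapCsrBuilder {
public:
    // Repeated (i,j) contributions are summed, which is exactly what an
    // assembly loop needs.
    void add(std::size_t i, std::size_t j, double value) {
        entries_[std::make_pair(i, j)] += value;
        if (i + 1 > rows_) rows_ = i + 1;  // good enough for the square systems we assemble
    }

    // Flatten the map (already sorted row-major) into the usual ia/ja/sa triplet.
    void compress(std::vector<int>& ia, std::vector<int>& ja,
                  std::vector<double>& sa) const {
        ia.assign(rows_ + 1, 0);
        ja.clear(); sa.clear();
        ja.reserve(entries_.size());
        sa.reserve(entries_.size());
        for (const auto& e : entries_) {
            ++ia[e.first.first + 1];                        // count entries per row
            ja.push_back(static_cast<int>(e.first.second));
            sa.push_back(e.second);
        }
        for (std::size_t r = 0; r < rows_; ++r)
            ia[r + 1] += ia[r];                             // prefix sum -> row offsets
    }

private:
    std::map<std::pair<std::size_t, std::size_t>, double> entries_;
    std::size_t rows_ = 0;
};

The trade-off is O(log n) insertion per entry against not having to know
the sparsity pattern up front, which is what lets the allocation code in
IncompFlowSolverHybrid go away.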
In addition, by feeding this construction method into
opm/core/linalg/LinearSolverIstl (which converts from flat-array CSR
into Dune matrices) I am able to reduce the running time of
upscale-benchmark-relperm. I attribute this to more efficient
allocation and instantiation of the matrices, as the BCRSMatrix can now
be built row-wise instead of in the less efficient random mode.
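For reference, row-wise construction against the BCRSMatrix
create-iterator interface looks roughly like this (a sketch of the
pattern, not the actual LinearSolverIstl code):

#include <dune/common/fmatrix.hh>
#include <dune/istl/bcrsmatrix.hh>

typedef Dune::FieldMatrix<double, 1, 1> Block;
typedef Dune::BCRSMatrix<Block>         Matrix;

// Build an n-by-n matrix from flat CSR arrays (ia, ja, sa) in one pass.
// In row_wise mode each row's column indices are announced exactly once,
// so there is no over-allocation and no pattern guessing.
Matrix buildFromCsr(int n, const int* ia, const int* ja, const double* sa)
{
    Matrix A(n, n, ia[n], Matrix::row_wise);
    for (Matrix::CreateIterator row = A.createbegin(); row != A.createend(); ++row)
        for (int k = ia[row.index()]; k < ia[row.index() + 1]; ++k)
            row.insert(ja[k]);          // announce the sparsity pattern
    for (int i = 0; i < n; ++i)
        for (int k = ia[i]; k < ia[i + 1]; ++k)
            A[i][ja[k]] = sa[k];        // fill in the values
    return A;
}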
Running the benchmark with this new implementation using Dune-istl with
CG/ILU(0) on my Intel i7-950@3.07GHz I get the following output:
Wallclock timing:
Input- and grid processing: 2.65775 sec
Upscaling: 143.47 sec
Total wallclock time: 146.128 sec (2 min 26.1279 sec)
Do the numbers look ok? The original, upstream code gives the following:
Wallclock timing:
Input- and grid processing: 2.75897 sec
Upscaling: 171.677 sec
Total wallclock time: 174.436 sec (2 min 54.4357 sec)
Note that this code is almost drop-in compatible with Petsc as the
linear solver backend.
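By "drop-in" I mean that IncompFlowSolverHybrid only ever hands the
assembled flat CSR system to one solve() call, so swapping backends is a
construction-time decision. The shape of that interface is roughly the
following (an illustration following opm-core's LinearSolverInterface;
the exact signature should be checked against the headers, and the
factory functions in the comment are made-up names):

// Both backends sit behind one flat-CSR solve() call, so the assembly
// code never changes when the backend does.
class FlatCsrSolver {                       // illustrative stand-in
public:
    virtual ~FlatCsrSolver() {}
    virtual void solve(int size, int nonzeros,
                       const int* ia, const int* ja, const double* sa,
                       const double* rhs, double* solution) const = 0;
};

// Choosing a backend then becomes a one-line decision at construction
// time, e.g. (hypothetical factory names):
//
//   std::unique_ptr<FlatCsrSolver> solver(
//       use_petsc ? makePetscSolver(param)   // my branch
//                 : makeIstlSolver(param));  // existing dune-istl path
//   solver->solve(n, nnz, ia, ja, sa, rhs, x);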
* Petsc compatibility
The second thing I've accomplished is running the benchmark using Petsc
as the linear solver. Petsc support has yet to be merged into opm-core
upstream, and a few issues still need to be resolved before that can
happen, but it is on the way. The benchmark reports correct results, but
I still experience some performance issues, which I hope can be
discussed in this thread. I consider it a win already, simply because it
proves that it is possible to support multiple solvers behind a simple
interface, possibly with performance improvements to boot!
Running the same benchmark with CG/ILU on petsc:
Wallclock timing:
Input- and grid processing: 5.40389 sec
Upscaling: 445.309 sec
Total wallclock time: 450.713 sec (7 min 30.7128 sec)
Which brings me to the questions:
Petsc obviously performs a LOT worse than Dune. I ran the benchmark in
callgrind, which revealed that it spends ~48% of its time inside Petsc's
PCApply. Another 43% is spent in KSP_MatMult.
Unfortunately I'm not familiar enough with linear methods to judge
whether this is reasonable or not, so I ask here: have I configured
Petsc wrong, or is this to be expected? Inspecting the callgrind output
of the Dune run leads me to think that this is ok, because it spends
approx. 37% of its time in SeqILU0::apply.
It's worth mentioning that both Dune and Petsc use a comparable number
of iterations - the difference between them is at most on the order of 50
iterations, which can probably be attributed to Petsc taking a lot more
parameters. They also both produce correct output. Using ksp_view in
Petsc gives:
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=100000
  tolerances: relative=1e-12, absolute=1e-05, divergence=100000
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: ilu
    ILU: out-of-place factorization
    0 levels of fill
    tolerance for zero pivot 2.22045e-14
    using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
    matrix ordering: natural
    factor fill ratio given 1, needed 1
      Factored matrix follows:
        Matrix Object: 1 MPI processes
          type: seqaij
          rows=102697, cols=102697
          package used to perform factorization: petsc
          total: nonzeros=1030113, allocated nonzeros=1030113
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
  linear system matrix = precond matrix:
  Matrix Object: 1 MPI processes
    type: seqaij
    rows=102697, cols=102697
    total: nonzeros=1030113, allocated nonzeros=0
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines
KSP Iterations 145, Final Residual 9.31856e-06
for a typical application. Does it look mis-configured somehow? Or is it
just that Dune is THAT much faster? If so I am -very- impressed.
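For reference, the Petsc side is configured essentially as follows (a
condensed sketch rather than the actual code; PetscErrorCode checking
and the matrix/vector assembly are omitted):

#include <petscksp.h>

// CG with ILU(0), relative tolerance 1e-12, nonzero initial guess --
// i.e. the configuration the ksp_view dump above reports.
void solveWithPetsc(Mat A, Vec b, Vec x)
{
    KSP ksp;
    PC  pc;

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);      // petsc >= 3.5 signature; older
                                     // versions take a MatStructure flag
    KSPSetType(ksp, KSPCG);

    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCILU);            // ILU(0) by default

    // rtol, atol, dtol, maxits -- matches the tolerances shown by ksp_view
    KSPSetTolerances(ksp, 1e-12, 1e-5, 100000, 100000);
    KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);

    // allow -ksp_view, -ksp_monitor etc. to override at run time
    KSPSetFromOptions(ksp);

    KSPSolve(ksp, b, x);
    KSPDestroy(&ksp);
}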
Alright, next question:
The problem with unifying several solvers is that they all take
different parameters, name options and methods differently, and so on.
There are plenty of ways to deal with this, including:
#1: Lowest common denominator. We provide a specific feature set that we
support and state that our implementation only allows specific
computations. This has the benefit of providing a simple interface that
allows solvers to be substituted with ease. The obvious drawback is that
some configuration opportunities are discarded.
#2: Use a dynamic configuration method (such as ParameterGroup) that
basically forwards options to the solver. The main drawback here is that
the solvers aren't really unified at all, as every call usually has to
be special-cased for each solver. Of course, this exposes the full power
of the underlying solver.
#3: A hybrid. A well-defined supported interface and operations with an
"unsafe & unportable" feature that allows for direct configuration. This
sort-of breaks encapsulation, but if it is documented as unsafe and is
only used for "emergencies" then I think it could be fine.
With option #1 or #3, some standardized mechanism for translating
between our option "language" and the target solver's option settings is
needed. LinearSolverInterface currently doesn't support this directly -
it does allow options to be passed through ParameterGroup, but it is not
very well defined what exactly the options should look like. I
personally don't like that, because it is impossible to verify
statically.
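To make option #3 a bit more concrete, here is a rough sketch of what I
have in mind (none of these names exist in opm-core today; this is
purely illustrative): a small, statically checkable set of common
options, plus an explicitly marked escape hatch that forwards
backend-specific key/value pairs untouched.

#include <map>
#include <string>

// The "well-defined" part: a small set of options every backend must honour.
struct CommonSolverOptions {
    enum Method         { CG, BiCGStab, GMRES };
    enum Preconditioner { None, ILU0, AMG };

    Method         method             = CG;
    Preconditioner preconditioner     = ILU0;
    double         relative_tolerance = 1e-12;
    int            max_iterations     = 100000;
};

// The "unsafe & unportable" part: raw key/value pairs handed verbatim to
// the backend (e.g. petsc option database entries or dune-istl settings).
// Anything put here is explicitly outside the portability guarantee.
typedef std::map<std::string, std::string> BackendOptions;

class UnifiedLinearSolver {
public:
    virtual ~UnifiedLinearSolver() {}

    // Every backend translates CommonSolverOptions into its own language.
    virtual void configure(const CommonSolverOptions& common) = 0;

    // Escape hatch; a backend is free to reject keys it does not understand.
    virtual void configureUnsafe(const BackendOptions& raw) = 0;
};

Everything in CommonSolverOptions can be verified when it is set, while
anything going through the raw map is clearly the user's own
responsibility.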
The real question is: which solution does the community think is worth
going for? If LinearSolver* is to be used, it will require a little more
work. Personally I prefer solution #3, but I'd love some community
feedback on that.
Sincerely,
Jørgen Kvalsvik