Hi Gael,

Interesting! I hadn't run a comparison against CholMod yet (but it is obviously 
the gold standard). There is a slight amount of automatic algorithmic switching 
that needs to be added in (https://gitlab.com/hodge_star/catamari/issues/2) 
before I would assume a blackbox comparison across the gamut of matrix types 
would be competitive. Catamari currently implements:
 * Sequential and multithreaded, supernodal right-looking (multifrontal) 
Cholesky, LDL^T, and LDL^H,
 * Sequential and multithreaded, supernodal left-looking Cholesky, LDL^T, and 
LDL^H,
 * Sequential scalar left-looking Cholesky, LDL^T, and LDL^H
 * Sequential scalar up-looking Cholesky, LDL^T, and LDL^H.

At the moment, the first category is used by default, but there are more 
refined categories:
 * For large, reasonably dense, multithreaded environments, the supernodal 
right-looking approach is preferred.
 * For large, reasonably dense, sequential environments, the supernodal 
left-looking approach might be preferred.
 * For small/sparse problems, the up-looking approach should be preferred, but 
there is not yet parallel support for this in catamari (not that it would be 
difficult).

According to CholMod's publications, it intelligently switches between the 
supernodal left-looking factorization and the scalar up-looking factorization 
(based upon the computational intensity as determined from the symbolic 
factorization). For higher core counts, the supernodal right-looking approach 
becomes critical so that the top of the assembly tree is parallelized.

On the subject of Eigen integration: I would defer to Eigen experts on the 
best/most-useful means of integration. If you believe that there would be added 
usefulness in transplanting Catamari's logic into core Eigen (on top of its 
BLAS and matrix classes), I think that could be interesting. I believe that my 
permissively-licensed AMD implementation (quotient) should be roughly 
equivalent to SuiteSparse's (and, I suppose, Eigen's) in terms of performance.

A wrapper might be a good first step, but this conversation has gotten me 
thinking that improving the algorithmic switching logic is a priority.

With Thanks,
Jack

On Sat, Mar 9, 2019, at 2:31 AM, Gael Guennebaud wrote:
> Hi Jack,
> 
> This is a very nice piece of work, congrats! I did one benchmark if 
> catamari+MKL on the "ldoor" matrix [1] and got same speed as 
> suitesparse/Cholmod: about 8s on my Haswell quad-core (15s without openmp).
> 
> Requiring c++14 for new features is not a big deal anymore.
> 
> How do you envision this "Eigen/Catamari" module? A simple wrapper as we do 
> with Cholmod/UmfPack/Pastix/Pardiso? or a built-in Eigen-ified version? As 
> you probably already figured out Eigen already provides several features 
> redundant with what you developed in Catamari:
> - BLAS/LAPACK layer with optional compile-time fallback to any BLAS / LAPACKE 
> / MKL
> - Sparse storage
> - Fill-in reorderings, we have AMD and COLAMD, so yours might rather be 
> complementary.
> 
> Gaƫl
> 
> [1] https://sparse.tamu.edu/GHS_psdef/ldoor
> 
> On Fri, Mar 8, 2019 at 4:59 PM Jack Poulson <[email protected]> wrote:
>> __
>> Dear Eigen Maintainers,
>> 
>> I noticed that Eigen's current documentation only lists a simplicial LDLT 
>> sparse-direct solver; I spent my last several months building a header-only 
>> C++ (optionally multithreaded) supernodal Cholesky/LDL^T/LDL^H sparse-direct 
>> solver (which also implements high-performance Determinantal Point Process 
>> sampling with similar techniques) and am curious if it would be seen as a 
>> good fit for an unsupported Eigen module. The solver achieves as much as 1 
>> TF in double-precision on my 16-core skylake workstation.
>> 
>> The current project is described here:
>> https://hodgestar.com/catamari/
>> I released v0.1 a few days ago, but I have improved the performance in 
>> 'master' substantially since then.
>> 
>> My assumption is that the biggest hurdle will be due to the solver 
>> (catamari) being implemented in C++14, and that Eigen is currently 
>> restricted to C++03. And, while catamari optionally incorporates OpenMP task 
>> scheduling for the multithreaded parallelism, this support is only enabled 
>> if an appropriate preprocessor directive is defined (CATAMARI_OPENMP).
>> 
>> Any advice would be appreciated.
>> 
>> Sincerely,
>> Jack Poulson

Reply via email to