It seems that the ReLAPACK authors were a bit pessimistic about the possibility of a Level 3 BLAS recursive QR algorithm, since back in 2000, Elmroth and Gustavson published some work on exactly this. Their algorithm tends to work best for "tall skinny" matrices with M >> N, since the number of flops grows as O(N^3).
One of the LAPACK authors is currently doing research on this algorithm, and has kindly provided GSL with GPL'd code to implement the Elmroth/Gustavson algorithm, including the decomposition, Q^T b calculation, and calculation of the full Q. All of these operations significantly outperform the current Level 2 implementation for various matrix sizes I have tried. Everything is already on the git. Due to the nature of the new algorithm, I needed to create a new interface, which I have suffixed with _r (i.e. gsl_linalg_QR_decomp_r). So GSL now has "state of the art" algorithms for Cholesky, LU and QR. These are exciting times :) There are other parts of the library which use QR decompositions which will likely benefit from this new algorithm, though I have not yet converted them over. Over the years, there have been discussions about whether we should create GSL interfaces for LAPACK routines. I think these new developments have reduced the need for that, though of course there are many other areas where LAPACK is far superior (i.e. SVD, eigensystems, COD). Enjoy, Patrick On 6/8/19 2:45 PM, Patrick Alken wrote: > The LU decomposition in GSL (both real and complex) is now based on a > recursive Level 3 BLAS algorithm. The performance improvement is quite > dramatic when using an optimized multi-threaded BLAS library like ATLAS. > I'd be interested in hearing feedback from anyone who uses Cholesky/LU > factorizations in their work. GSL may out-perform LAPACK in these areas > now, and the recursive algorithms are surprisingly simple to implement > and fit quite nicely with GSL's codebase. > > Enjoy, > Patrick > > On 5/30/19 9:06 AM, Patrick Alken wrote: >> Hi all, >> >> >> I have recently learned of a project called ReLAPACK >> (https://github.com/HPAC/ReLAPACK, paper here: >> https://arxiv.org/abs/1602.06763) which implements a number of LAPACK >> algorithms (such as LU, Cholesky, Sylvester equations) using recursive >> methods which can use Level 3 BLAS calls. The paper shows that most of >> these algorithms out-perform the block Level 3 algorithms in LAPACK. The >> main advantage is that LAPACK block algorithms require fixing the block >> size ahead of time, which may not be optimal for a given architecture, >> while the recursive methods don't require a block size parameter. >> >> The recursive methods do however require a "base case" - i.e. at what >> size matrix should it switch to the Level 2 BLAS based algorithms. >> ReLAPACK fixes this currently at N=24. >> >> Anyway, the recursive Cholesky variant is quite straightforward to >> implement, and I have already coded it for GSL (both the decomposition >> and inversion). I did some tests for N=10,000 with ATLAS BLAS and found >> that it runs faster than DPOTRF from LAPACK. This fast Cholesky >> decomposition will improve the performance also for the generalized >> symmetric definite eigensolvers, and the least squares modules (linear >> and nonlinear). >> >> So GSL now has a competitive Cholesky solver, which I think should make >> many GSL users happy :) >> >> Work is currently underway to implement the recursive pivoted LU >> decomposition in GSL. >> >> Unfortunately the ReLAPACK authors state that the QR algorithm is not >> amenable to recursive methods, so the block QR seems to still be the >> best choice. It would be nice to implement this for GSL, in case any >> volunteers are looking for a project ;) >> >> Patrick >> >>
