Hi Johan, you wrote:
[snip] > Let's look at just assembly for now. We start by creating an 1e5 x 1e5 > sparse matrix. We then successively insert 1e5 elements in the diagonal. > What happens is that the insertion speed is linear for the first ~50000 > elements, and then grinds to a halt. The initial insertion speed is > somewhere at 1e5 elements/s, after 50000 elements it sharply falls to > 1000 elements/s (to compare, our naive implementation gets 1e6 elements / > s). On 32-bit systems the upper dimensions for sparse_matrix are 65535 x 65535 due to the internal mapping of (i, j) -> (i * size2+j) (or (i + j * size1). I've added a corresponding check to the debug version. For larger dimension consider to use compressed_matrix (FORTRAN CRS/CCS format) or coordinate_matrix (FORTRAN COO format) please. I'm also assuming you're checking the boost_1_29_0 release, so you'll probably have to look into the current CVS version for these (many changes). > This is quite bizarre. > > With an 1e6 x 1e6 matrix, the insertion speed is smoother, but slow > throughout (on the order 1000 elements / s). > > I've tested this on two different computers, one dual Athlon and one PII > laptop, and the result is identical (aside from the absolute speed > numbers). Memory is not at all full, so that can't be an issue. > > The code for this simple test is at the end. > > The compiler used was g++-2.95 (in Debian) and Boost 1.29 (also Debian). > > I've also observed a quadratic time complexity (in the non-zero elements) > in sparse matrix-vector multiplication. I think this has been brought up > before though. Using the experimental (means undocumented ;-( axpy_prod() functions one achieves the correct complexity. > We've also tested MTL, and while it doesn't produce these kinds of wild > irregularities, the performance is a factor 2 or 3 worse than our naive > implementation, and the memory usage is a factor 1.5-2 worse. > > This makes me question the claim that genericity does not add overhead (in > practice). 10-20% overhead is acceptable, but when we have 100-200% > overhead, both in performance and memory, then it makes it impossible to > justify its use. Alexei Novakov's benchmarks show an overhead of around 100% for the better ublas sparse matrix vector multiply implementations currently. He proposed an improvement of sparse matrix iterators already, which we'll tackle immediately after the upcoming 1_30_0 release. Thanks for your feedback, Joerg _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost