On 15.05.2012 09:36, Dave May wrote:
> I have seen similar behaviour comparing umfpack and superlu_dist,
> though the difference wasn't enormous; umfpack was perhaps a factor
> of 1.2-1.4 times faster on 1-4 cores.
> What sort of time differences are you observing? Can you post the
> numbers somewhere?

I attached my data to this mail. For the largest matrix, umfpack
failed after allocating 4 GB of memory; I have not tried to figure out
what the problem is there. As you can see, for these matrices the
distributed solvers are slower by a factor of 2 or 3 compared to
umfpack. For all solvers I used the default parameters, so I have not
played around with the permutation strategies and similar settings.
This may also be the reason why superlu is so much slower than
superlu_dist even on a single core, as the two use different column
and row permutation strategies by default.

> However, umfpack will not work on a distributed memory machine.
> My personal preference is to use superlu_dist in parallel. In my
> experience using it as a coarse grid solver for multigrid, I find it
> much more reliable than mumps. However, when mumps works, it is
> typically slightly faster than superlu_dist. Again, not by a large
> amount - never more than a factor of 2 faster.

In my codes I also use the distributed direct solvers for the coarse
grid problems. Here I just wanted to test how far these solvers are
from their sequential counterparts.
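For reference, this is roughly the setup I benchmark: a minimal,
self-contained sketch assuming the PETSc 3.2-era C API
(PCFactorSetMatSolverPackage, the four-argument KSPSetOperators and
MatGetVecs were renamed in later releases); the 1D Laplacian is just a
hypothetical stand-in for my FEM matrices, and error checking is
omitted:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PC       pc;
  PetscInt i, n = 1000, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Assemble a 1D Laplacian as a stand-in for the FEM matrix. */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSeqAIJSetPreallocation(A, 3, NULL);
  MatMPIAIJSetPreallocation(A, 3, NULL, 1, NULL);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatGetVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* One exact LU solve; the factorization is delegated to an
     external package. */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
  KSPSetType(ksp, KSPPREONLY);   /* no Krylov iterations, direct solve only */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCLU);
  /* alternatively MATSOLVERMUMPS or MATSOLVERUMFPACK */
  PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);
  KSPSetFromOptions(ksp);  /* picks up e.g. -mat_superlu_dist_colperm */
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  MatDestroy(&A);
  VecDestroy(&x);
  VecDestroy(&b);
  PetscFinalize();
  return 0;
}

The same choice can also be made at run time with -pc_type lu
-pc_factor_mat_solver_package <superlu_dist|mumps|umfpack>, and for
the coarse grid of PCMG multigrid via the prefixed variants
-mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package
superlu_dist. The permutation strategies mentioned above could then be
tuned with -mat_superlu_dist_colperm / -mat_superlu_dist_rowperm, or
with the mumps ordering option -mat_mumps_icntl_7.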
Thomas

> The failure rate using mumps is definitely higher (in my experience)
> when running on large numbers of cores compared to superlu_dist. I've
> never got to the bottom of why it fails.
>
> Cheers,
> Dave
>
>
> On 15 May 2012 09:25, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>> I made some comparisons of using umfpack, superlu, superlu_dist and
>> mumps to solve systems with sparse matrices arising from the finite
>> element method. The size of the matrices ranges from around 50000 to
>> more than 3 million unknowns. I used 1, 2, 4, 8 and 16 nodes for the
>> benchmark. Now I notice that in all cases the sequential umfpack was
>> the fastest one, so even with 16 cores superlu_dist and mumps are
>> slower. Can any of you confirm this observation? Are there any other
>> parallel direct solvers around which are more efficient?
>>
>> Thomas
-------------- next part --------------
Solve times, all solvers on one core:

#rows   | umfpack / 1 core | superlu / 1 core | superlu_dist / 1 core | mumps / 1 core |
--------|------------------|------------------|-----------------------|----------------|
  49923 | 0.644            | 4.914            | 2.148                 | 1.731          |
 198147 | 3.992            | 41.53            | 13.05                 | 10.04          |
 792507 | 32.33            | 463.5            | 66.75                 | 52.56          |
3151875 | 4 GB limit       | -                | 394.1                 | 303.2          |

Scaling of superlu_dist:

#rows   | 1 core | 2 cores | 4 cores | 8 cores | 16 cores |
--------|--------|---------|---------|---------|----------|
  49923 | 2.148  | 1.922   | 1.742   | 1.705   | 1.745    |
 198147 | 13.05  | 11.77   | 10.47   | 9.832   | 9.565    |
 792507 | 66.75  | 61.07   | 53.58   | 49.14   | 47.06    |
3151875 | 394.1  | failed  | failed  | failed  | failed   |

Scaling of mumps:

#rows   | 1 core | 2 cores | 4 cores | 8 cores | 16 cores |
--------|--------|---------|---------|---------|----------|
  49923 | 1.731  | 1.562   | 1.485   | 1.426   | 1.418    |
 198147 | 10.04  | 8.959   | 8.468   | 8.144   | 7.978    |
 792507 | 52.56  | 45.38   | 42.56   | 40.22   | 39.12    |
3151875 | 303.2  | 255.7   | 232.2   | 216.0   | 210.0    |