Hello everybody,
for one part of a calculation I have to solve several (about 6-15)
linear systems of the form
M x_i = L_i b
Here M is the mass matrix and L_i is the discretisation of an
integral operator, which is therefore denser than M. The L_i can take
several hundred MB each in some cases. The size of the problems is
moderate, ranging from several tens of thousands to a few hundred
thousand dofs.
This step takes a significant part of the run time. Now I was wondering
whether it might be possible to make better use of the 4 cores of my
computer. As these linear systems are independent, it should be possible
to solve them in parallel. I do not intend to distribute this
calculation to multiple machines. However, I will get access to a machine
with 12 cores in the next few weeks.
In principle I see several different ways to do that. One
could either use tasks or threads to solve the linear systems
simultaneously, or use Trilinos or PETSc to solve them one after
another, but using multiple MPI processes.
I did some tests with tasks, using a taskgroup to solve 8 systems with
around 60000 dofs in parallel, with an incomplete LU decomposition. Each
task has to use its own solver instance, otherwise everything fails. The
CG solver takes around 5 iterations. However, I only observe a speedup of
<= 10 % over the serial solution, which I find a bit disappointing.
Could this be due to a memory bottleneck? And if this is the case, what
are the chances of getting better results with Trilinos or PETSc?
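
For illustration, the one-task-per-system pattern described above can be
sketched with the C++ standard library alone. This is not deal.II code and
not the setup that was actually tested: TridiagonalMatrix, cg_solve and
solve_all are hypothetical names, the matrix is a small 1D mass-like
stand-in for M, the right-hand sides stand in for the products L_i b, and
the CG iteration is unpreconditioned (no incomplete LU). The point it shows
is that every task owns its solver state and only shares the read-only
matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

using Vector = std::vector<double>;

// Hypothetical stand-in for M: a symmetric positive definite tridiagonal
// matrix with rows (1, 4, 1)/6, applied without storing any entries.
struct TridiagonalMatrix {
    std::size_t n;

    void vmult(Vector &dst, const Vector &src) const {
        for (std::size_t i = 0; i < n; ++i) {
            double v = 4.0 * src[i];
            if (i > 0)     v += src[i - 1];
            if (i + 1 < n) v += src[i + 1];
            dst[i] = v / 6.0;
        }
    }
};

double dot(const Vector &a, const Vector &b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Plain conjugate gradient iteration. All work vectors are local to the
// call, so concurrent calls share nothing except the read-only matrix.
Vector cg_solve(const TridiagonalMatrix &M, const Vector &rhs,
                double rel_tol = 1e-12, unsigned max_iter = 1000) {
    assert(rhs.size() == M.n);
    Vector x(M.n, 0.0), r = rhs, p = rhs, Mp(M.n);
    double rr = dot(r, r);
    const double stop = rel_tol * rel_tol * rr;  // squared relative tolerance
    for (unsigned it = 0; it < max_iter && rr > stop; ++it) {
        M.vmult(Mp, p);
        const double alpha = rr / dot(p, Mp);
        for (std::size_t i = 0; i < M.n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Mp[i];
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < M.n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}

// Solve M x_i = rhs_i for every i, launching one task per system.
std::vector<Vector> solve_all(const TridiagonalMatrix &M,
                              const std::vector<Vector> &rhs) {
    std::vector<std::future<Vector>> tasks;
    for (const Vector &r : rhs)
        tasks.push_back(std::async(std::launch::async,
                                   [&M, &r] { return cg_solve(M, r); }));
    std::vector<Vector> x;
    for (auto &t : tasks) x.push_back(t.get());  // wait for each task
    return x;
}
```

Calling solve_all(M, rhs) with one right-hand side per system returns the
solutions in the same order. Timing that call against a plain loop over
cg_solve is one way to see how much the independent solves actually
overlap on a given machine.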
Thank you very much for your efforts
Johannes
_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii