> I did some tests with Tasks, using a taskgroup to solve 8 systems with
> around 60000 dofs in parallel, using an incomplete LU decomposition. Each
> task has to use its own solver instance, otherwise everything fails. The CG
> solver takes around 5 iterations. However, I only observe a speedup of
> <= 10% over the serial solution, which I find a bit disappointing.
>
> Could this be due to a memory bottleneck? And if this is the case, what are
> the chances of getting better results with Trilinos or PETSc?

I would assume so. From your description, your problem is heavily
dominated by memory access. Can't you do the multiplication with L
before the solver?
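
As a rough illustration of why such a solve tends to be limited by memory
traffic rather than by the number of cores, here is a back-of-the-envelope
sketch of the flops-per-byte ratio of one sparse matrix-vector product. It
assumes a generic CSR layout with 8-byte values, 4-byte column indices and
4-byte row pointers, and that each vector is streamed through memory once;
the function name is made up for this sketch and the library's actual storage
may differ.

```cpp
#include <cassert>
#include <cstddef>

// Estimated flops per byte of memory traffic for one CSR product y = A*x.
// Assumptions (not taken from the library): 8-byte values, 4-byte column
// indices, 4-byte row pointers, x read once and y written once.
double csr_spmv_flops_per_byte(std::size_t n_rows, std::size_t nnz)
{
  assert(n_rows > 0 && nnz > 0);

  // One multiply and one add per stored nonzero.
  const double flops = 2.0 * static_cast<double>(nnz);

  // Values and column indices per nonzero, plus the row-pointer array.
  const double matrix_bytes = 12.0 * static_cast<double>(nnz)
                              + 4.0 * static_cast<double>(n_rows + 1);

  // Read x (8 bytes per row) and write y (8 bytes per row).
  const double vector_bytes = 16.0 * static_cast<double>(n_rows);

  return flops / (matrix_bytes + vector_bytes);
}
```

Calling `csr_spmv_flops_per_byte(60000, 60000 * 27)`, where 27 nonzeros per
row is only a guess for illustration, gives well under one flop per byte. With
so little arithmetic per byte loaded, several tasks running at once mostly
compete for the same memory bandwidth.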

Best,
Guido
_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii
