Hi Serbulent,

I ran a few tests with the code that you pasted. I removed the viewers as
they aren't relevant. I tested on three different machines: my laptop with
4 dual-core CPUs, a workstation that has 8 processors, and some large
shared memory machines with more than 48 processors behind a queuing
system. I only ran each job once, so the numbers aren't very scientific.

machine      solver    grid  nproc  time (s)
laptop       pysparse  gmsh  1      37
laptop       trilinos  fipy  1      68
laptop       trilinos  gmsh  1      78
laptop       trilinos  gmsh  2      49
laptop       trilinos  gmsh  4      47
laptop       trilinos  gmsh  8      37
workstation  trilinos  gmsh  1      156
workstation  trilinos  gmsh  2      91
workstation  trilinos  gmsh  4      59
workstation  trilinos  gmsh  8      145
workstation  trilinos  fipy  1      156
cluster      trilinos  gmsh  1      210
cluster      trilinos  gmsh  2      157
cluster      trilinos  gmsh  4      64
cluster      trilinos  gmsh  8      44
cluster      trilinos  gmsh  16     16
cluster      trilinos  gmsh  24     17
cluster      trilinos  gmsh  32     17
cluster      trilinos  gmsh  48     9

 * On my laptop, I see some speed up, but the scaling is generally fairly
poor, which has been my experience in the past.

 * There may have been another job running on the workstation, which would
explain why it didn't scale well to 8 processes.

 * Using a GmshGrid2D ("gmsh") isn't that much slower than using a Grid2D
("fipy") on my laptop, and the two ran at the same speed on the
workstation.

 * PySparse is about twice as fast as Trilinos. This is with 500
iterations, which may be far more than you need for many problems, so
decreasing the number of iterations will reduce the difference between
Trilinos and PySparse. Parallel will still buy you quite a lot under those
circumstances.

 * On the shared memory machine ("cluster") the results are quite
variable. It operates on a queuing system, so there can be other jobs
running on the same machine. However, the scaling to 48 processes seems
quite good, giving a 23-fold speed up. The problem is not that big, so I
can't imagine it scaling that well past 64 nodes.

 * I think that some of the overhead associated with both the Python
interface and firing up preconditioners with PyTrilinos needs some work.

The bottom line is that if you don't spend too long in the solver and you
are on a machine with a lot of CPUs (not a laptop), it is worth looking
into using Trilinos.
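
For reference, here's a rough, untested sketch of what switching to
Trilinos and running in parallel can look like. The FIPY_SOLVERS
environment variable, the mpirun invocation and the toy diffusion problem
are my assumptions rather than your setup, so check them against the FiPy
docs for your version:

    # diffusion_parallel.py -- toy sketch, not a benchmark.  Run with
    # something like
    #   FIPY_SOLVERS=trilinos mpirun -np 8 python diffusion_parallel.py
    # (solver selection and launch command are assumptions -- see the
    # FiPy documentation for the exact mechanism in your version)
    from fipy import CellVariable, Grid2D, DiffusionTerm

    mesh = Grid2D(nx=100, ny=100, dx=1., dy=1.)
    phi = CellVariable(mesh=mesh, value=0.)

    # fixed values on the left and right boundaries
    phi.constrain(1., mesh.facesLeft)
    phi.constrain(0., mesh.facesRight)

    # steady-state diffusion solve; with the Trilinos suite this can be
    # launched under MPI as shown above
    DiffusionTerm(coeff=1.).solve(var=phi)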

On Tue, Sep 30, 2014 at 11:41 AM, Serbulent UNSAL <serbule...@gmail.com>
wrote:

> Thanks for your interest,
>
> I already shut down the preconditioner, as shown in the notebook, with:
>
> solvers.linearPCGSolver.LinearPCGSolver(precon=None, iterations=500,
> tolerance=1e-15)
>
> I think this should turn the preconditioner off, but the run still
> completed in around 55 sec with 8 cores (serial was around 41 sec), so I
> think the problem is beyond the preconditioner. I paste the last version
> of the code below if you'd like to try it.
>


Try changing the number of solver iterations to 1. It should be
embarrassingly parallel at that point. This is just to determine whether
Trilinos is actually the issue and not something else.
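
As a rough, untested sketch of what I mean (reusing the keyword arguments
from your snippet; the fipy.solvers import path and the eq/phi names are
placeholders on my side):

    # same solver settings as in your snippet, but with a single
    # iteration so that almost no time is spent in the actual solve
    from fipy.solvers import LinearPCGSolver

    solver = LinearPCGSolver(precon=None, iterations=1, tolerance=1e-15)

    # then hand it to your equation explicitly, e.g.
    # eq.solve(var=phi, solver=solver)

If the timing still doesn't improve with more processes at 1 iteration,
the bottleneck is probably outside the solver.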

Also, are you certain you are running in parallel? This has caught me out
before. Print "fipy.parallelComm.procID", and also print the mpi4py
version of the procID and Epetra's version, "Epetra.PyComm().MyPID()".
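
Something like the script below, run on every process, should show
distinct ranks if MPI is really active (a rough sketch from memory; the
Nproc/Get_size/NumProc calls are my additions, so adjust to taste):

    # check_parallel.py -- print the rank seen by FiPy, mpi4py and Epetra;
    # every process should report a different ID.
    # Run with something like:  mpirun -np 4 python check_parallel.py
    from fipy import parallelComm
    from mpi4py import MPI
    from PyTrilinos import Epetra

    print("FiPy procID:    %d of %d"
          % (parallelComm.procID, parallelComm.Nproc))
    print("mpi4py rank:    %d of %d"
          % (MPI.COMM_WORLD.Get_rank(), MPI.COMM_WORLD.Get_size()))
    print("Epetra MyPID(): %d of %d"
          % (Epetra.PyComm().MyPID(), Epetra.PyComm().NumProc()))

If all three print 0 on every process, the script is being run as N
independent serial jobs rather than one parallel job.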

-- 
Daniel Wheeler
_______________________________________________
fipy mailing list
fipy@nist.gov
http://www.ctcms.nist.gov/fipy
  [ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]
