Hi Serbulent, I ran a few tests with the code that you pasted. I removed the viewers as they aren't relevant. I tested on three different machines: my laptop with 4 dual-core CPUs, a workstation with 8 processors, and some large shared-memory machines with more than 48 processors behind a queuing system. I only ran each job once, so the numbers are not very scientific.
machine      solver    grid   nproc  time (s)
laptop       pysparse  gmsh       1       37
laptop       trilinos  fipy       1       68
laptop       trilinos  gmsh       1       78
laptop       trilinos  gmsh       2       49
laptop       trilinos  gmsh       4       47
laptop       trilinos  gmsh       8       37
workstation  trilinos  gmsh       1      156
workstation  trilinos  gmsh       2       91
workstation  trilinos  gmsh       4       59
workstation  trilinos  gmsh       8      145
workstation  trilinos  fipy       1      156
cluster      trilinos  gmsh       1      210
cluster      trilinos  gmsh       2      157
cluster      trilinos  gmsh       4       64
cluster      trilinos  gmsh       8       44
cluster      trilinos  gmsh      16       16
cluster      trilinos  gmsh      24       17
cluster      trilinos  gmsh      32       17
cluster      trilinos  gmsh      48        9

* On my laptop I see some speedup, but the scaling is generally fairly
  poor, which matches my experience in the past.
* There may have been another job running on the workstation, which would
  explain why it didn't scale well to 8 processes.
* Using a GmshGrid2D ("gmsh") isn't that much slower than using a Grid2D
  ("fipy") on my laptop, and the two ran in the same time on the
  workstation.
* PySparse is about twice as fast as Trilinos. This is with 500
  iterations, which may be far more iterations than you need for a lot of
  problems, so decreasing the number of iterations will reduce the
  difference between Trilinos and PySparse. Parallel will buy you quite a
  lot under those circumstances.
* On the shared-memory machine ("cluster") the results are quite
  variable. It operates on a queuing system, so there can be other jobs
  running on the same machine. However, the scaling for 48 processes
  seems quite good, giving a 23-fold speedup. The problem is not that
  big, so I can't imagine it scaling that well past 64 nodes.
* I think that some of the overhead associated with both the Python
  interface and firing up preconditioners with PyTrilinos needs some
  work.

The bottom line is that if you don't spend too long in the solver and you
are on a machine with a lot of CPUs (not a laptop), it is worth looking
into using Trilinos.

On Tue, Sep 30, 2014 at 11:41 AM, Serbulent UNSAL <serbule...@gmail.com> wrote:
> Thanks for your interest,
>
> I already shut down the preconditioner as shown in the notebook with
>
>     solvers.linearPCGSolver.LinearPCGSolver(precon=None, iterations=500,
>                                             tolerance=1e-15)
>
> I think this should turn the preconditioner off, but the code still
> completed in around 55 s with 8 cores (serial was around 41 s). So I
> think the problem is beyond the preconditioner. I paste the last version
> of the code below if you'd like to try it.

Try changing the number of solver iterations to 1. It should be
embarrassingly parallel at that point. This is just to determine whether
Trilinos is actually the issue and not something else.

Also, are you certain you are running in parallel? This has caught me out
before. Print "fipy.parallelComm.procID", and also print the mpi4py
version of procID and Epetra's version, "Epetra.PyComm().MyPID()". A
quick sketch of that check is below my signature.

--
Daniel Wheeler
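A minimal sketch of the check, assuming FiPy is set up to use the
Trilinos solvers (e.g. FIPY_SOLVERS=trilinos in the environment) and that
mpi4py and PyTrilinos are installed; the file name is just illustrative:

    # check_parallel.py -- illustrative name, not part of the original script.
    # Assumes mpi4py and PyTrilinos are importable and FiPy is using the
    # Trilinos solvers (e.g. FIPY_SOLVERS=trilinos).
    import fipy
    from fipy import LinearPCGSolver
    from mpi4py import MPI
    from PyTrilinos import Epetra

    # Each process should report its own rank, and all three values should
    # agree within a process. If every process prints 0, the job is not
    # actually running in parallel.
    print("fipy procID:  %d" % fipy.parallelComm.procID)
    print("mpi4py rank:  %d" % MPI.COMM_WORLD.Get_rank())
    print("Epetra MyPID: %d" % Epetra.PyComm().MyPID())

    # One iteration and no preconditioner: the sweep should then be
    # embarrassingly parallel, so any remaining slowdown is overhead
    # rather than the Trilinos solve itself.
    solver = LinearPCGSolver(precon=None, iterations=1, tolerance=1e-15)

Run it with something like "mpirun -np 4 python check_parallel.py", and
pass the one-iteration solver to your sweep()/solve() call via the
solver keyword in place of the 500-iteration one.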