Hello, I am trying to run a PETSc code on a parallel machine (possibly relevant: each node is an SMP unit with four quad-core 64-bit AMD Opteron processors, 16 cores in all, and 32GB of memory), and I'm observing some behaviour I don't understand.
I'm using PETSC_COMM_SELF to construct the same matrix on every process and then solve the system with a different right-hand side vector on each process. When each linear system is around 315x315 (block-sparse), every process solves its system very quickly (roughly 7x10^{-4} seconds), but when I increase the system size to 350x350 or larger, the linear solves stall completely. I've tried a number of different solvers and preconditioners, but nothing seems to help.

This code has worked very well on other machines, although none of them had this architecture in which each node is an SMP unit. Have you observed this kind of issue before? I'm using PETSc 2.3.3, compiled with the Intel 10.1 compiler.

Thanks very much,
David
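P.S. In case it helps, the structure of my per-process solve is essentially the following. This is a simplified sketch rather than my actual code: the block size, nonzero estimate, and assembly/fill details are placeholders, and error checking is omitted.

```c
#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat      A;
  Vec      b, x;
  KSP      ksp;
  PetscInt n = 350;   /* per-process system size (placeholder) */

  PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);

  /* Each process builds its own serial block-sparse matrix;
     block size 5 and 10 nonzero blocks per row are placeholders */
  MatCreateSeqBAIJ(PETSC_COMM_SELF, 5, n, n, 10, PETSC_NULL, &A);
  /* ... insert entries with MatSetValuesBlocked(), then ... */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  VecCreateSeq(PETSC_COMM_SELF, n, &b);
  VecDuplicate(b, &x);
  /* ... each process fills b with its own right-hand side ... */

  /* Serial solve, carried out independently on every process */
  KSPCreate(PETSC_COMM_SELF, &ksp);
  KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);

  KSPDestroy(ksp);
  VecDestroy(b);
  VecDestroy(x);
  MatDestroy(A);
  PetscFinalize();
  return 0;
}
```

The solver and preconditioner are chosen at runtime via KSPSetFromOptions(), which is how I've been switching between the different solvers and preconditioners mentioned above.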