There are two aspects to performance:
- MPI performance [while message passing]
- sequential performance of the numerical kernels
So it could be that the SMP box has better MPI performance. This can be verified with -log_summary from both runs [by looking at the VecScatter times].

However, the sequential numerical code primarily depends upon the bandwidth between the CPU and the memory. On the SMP box, depending upon how the memory subsystem is designed, the effective memory bandwidth per CPU could be a small fraction of the peak memory bandwidth when all CPUs are in use. So you'll have to look at the memory subsystem design of each of these machines and compare the 'memory bandwidth per CPU' (a rough way to measure this is sketched after the quoted message below). The performance from -log_summary, for example the MatMult times, will reflect this [including the above communication overhead].

Satish

On Fri, 2 Feb 2007, Shi Jin wrote:

> Hi there,
>
> I am fairly new to PETSc but have 5 years of MPI
> programming experience. I recently took on a project of
> analyzing a finite element code written in C with PETSc.
> I found out that on a shared-memory machine (60GB RAM,
> 16 CPUs), the code runs around 4 times slower than
> on a distributed-memory cluster (4GB RAM, 4 CPUs/node),
> although they yield identical results.
> There are 1.6 million finite elements in the problem, so
> it is a fairly large calculation. The total memory
> used is 3GB x 16 = 48GB.
>
> Both systems run Linux as the OS, and the same code
> is compiled against the same versions of MPICH-2 and PETSc.
>
> The shared-memory machine is actually a little faster
> than the cluster machines in terms of single-process runs.
>
> I am surprised at this result, since we usually tend to
> think that shared memory would be much faster because
> in-memory operations are much faster than
> network communication.
>
> However, I read the PETSc FAQ and found that "the
> speed of sparse matrix computations is almost totally
> determined by the speed of the memory, not the speed
> of the CPU".
> This makes me wonder whether the poor performance of
> my code on the shared-memory machine is due to the
> competition of different processes for the same memory
> bus. Since the code is still MPI-based, a lot of data
> is moving around inside the memory. Is this a
> reasonable explanation of what I observed?
>
> Thank you very much.
>
> Shi
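To put a rough number on the 'memory bandwidth per CPU' comparison, one option is a small STREAM-style triad test, launched with one copy per CPU so that all processes hit the memory system at the same time. This is only a sketch under assumed conditions: the array size, the file name bw.c, and the mpicc/mpiexec invocation below are placeholders, not anything from the original code.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Rough per-process memory-bandwidth probe (STREAM-style triad).
     * Launch one copy per CPU so all processes stress the memory
     * subsystem simultaneously.  Sketch only: no error checking. */
    int main(int argc, char **argv)
    {
        const long n = 10 * 1000 * 1000;   /* 3 arrays of ~80 MB each     */
        double *a = malloc(n * sizeof(double));
        double *b = malloc(n * sizeof(double));
        double *c = malloc(n * sizeof(double));
        double t;
        long i;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

        MPI_Barrier(MPI_COMM_WORLD);       /* start all ranks together    */
        t = MPI_Wtime();
        for (i = 0; i < n; i++)
            a[i] = b[i] + 3.0 * c[i];      /* triad: 24 bytes per element */
        t = MPI_Wtime() - t;

        /* print a[0] so the compiler cannot discard the loop */
        printf("rank %d: %.0f MB/s (a[0]=%g)\n",
               rank, 24.0 * n / t / 1e6, a[0]);

        MPI_Finalize();
        free(a); free(b); free(c);
        return 0;
    }

Compile with something like 'mpicc -O2 bw.c -o bw' and run, for example, 'mpiexec -n 16 ./bw' on each machine. If the per-rank bandwidth on the SMP box drops sharply as more copies run, that supports the memory-bus-contention explanation; the same contention shows up in the MatMult rows of the -log_summary output.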
