Re: [gmx-users] Performance in ia64 and x86_64

2011-02-25 Thread Ignacio Fernández Galván
Carsten Kutzner wrote:


> With Fortran kernels I got a performance of 0.31 ns/day for an 80,000 atom
> system (with PME) on an Altix 4700, so your 0.5 ns/day for 1,500 waters
> seems too slow to me.
> What processor is this? Are you sure you are using the Fortran and not the
> C kernels?


I don't have physical access to the ia64, and I don't know the details, but 
/proc/cpuinfo says:

model name : Dual-Core Intel(R) Itanium(R) Processor 9140N
cpu MHz: 1598.00


And as for the kernels in the log:

Configuring nonbonded kernels...
Configuring standard C nonbonded kernels...
Configuring single precision Fortran kernels...



Whereas for the x86_64 it's:

model name  : Intel(R) Xeon(R) CPU   E5540  @ 2.53GHz
cpu MHz : 1600.000

Configuring nonbonded kernels...
Configuring standard C nonbonded kernels...
Testing x86_64 SSE2 support... present.




--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists


Re: [gmx-users] Performance in ia64 and x86_64

2011-02-25 Thread Mark Abraham

On 25/02/2011 8:25 PM, Ignacio Fernández Galván wrote:

> I wrote:
>
> > It seems the x86_64 processor has 4 cores and 8 threads support
> > (), so the machine has probably two
> > physical processors. I thought MPI was only needed if there was network
> > communication involved, as in a cluster, but not in SMP, which is what both
> > machines are (single memory, single OS), I guess I was wrong. I'll try
> > compiling with MPI.
>
> Well, I've compiled mdrun with MPI (with fortran kernels in the ia64), and run
> my test system in both machines, with a single processor. The results are still
> worrying (to me). This is a 50 time step (0.5 ns) simulation with 1500 water
> molecules, not a big system, but it still takes some hours:
>
> x86_64: 3.147 ns/day
> ia64: 0.507 ns/day
>
> Is this difference normal? Am I doing anything wrong? what further data should I
> provide? The compilation in both machines was quite straightforward (once I had
> solved the library and path issues).


You should inspect the first few hundred lines of the .log files and 
observe what kernels GROMACS is using at run time.
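
A minimal sketch of that check, assuming the kernel-selection messages look like the ones quoted elsewhere in this thread ("Configuring ... kernels...", "Testing ... support..."). The helper name kernel_lines and the 500-line cut-off are illustrative choices, not part of GROMACS:

```python
import re
from typing import Iterable, List

# Assumption: kernel-related log lines start with "Configuring ... kernels"
# or "Testing ... support", as in the log excerpts quoted in this thread.
KERNEL_LINE = re.compile(r"^\s*(Configuring .*kernels|Testing .* support)")


def kernel_lines(lines: Iterable[str], max_lines: int = 500) -> List[str]:
    """Return the kernel-related lines found in the first max_lines lines."""
    found = []
    for index, line in enumerate(lines):
        if index >= max_lines:
            break
        if KERNEL_LINE.match(line):
            found.append(line.strip())
    return found


# Demonstration on lines copied from the ia64 log quoted in this thread.
sample_log = [
    "Configuring nonbonded kernels...",
    "Configuring standard C nonbonded kernels...",
    "Configuring single precision Fortran kernels...",
    "an unrelated log line",
]
for hit in kernel_lines(sample_log):
    print(hit)
```

To run it on a real log, pass an open file object instead of the list, for example kernel_lines(open("md.log")), where md.log is a placeholder for your own log file name.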


Mark


Re: [gmx-users] Performance in ia64 and x86_64

2011-02-25 Thread Carsten Kutzner
Hello Ignacio,

On Feb 25, 2011, at 10:25 AM, Ignacio Fernández Galván wrote:
> Well, I've compiled mdrun with MPI (with fortran kernels in the ia64), and
> run my test system in both machines, with a single processor. The results
> are still worrying (to me). This is a 50 time step (0.5 ns) simulation with
> 1500 water molecules, not a big system, but it still takes some hours:
>
> x86_64: 3.147 ns/day
> ia64: 0.507 ns/day
>
> Is this difference normal? Am I doing anything wrong? what further data
> should I


Some time ago I compared Itanium and x86 performance; see the fifth slide of
this PDF:

http://www.mpibpc.mpg.de/home/grubmueller/ihp/ckutzne/Talks/PDFs/kutzner07talk-optimizing.pdf

With Fortran kernels I got a performance of 0.31 ns/day for an 80,000 atom
system (with PME) on an Altix 4700, so your 0.5 ns/day for 1,500 waters seems
too slow to me.
What processor is this? Are you sure you are using the Fortran and not the C
kernels?
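
To make the "too slow" comparison concrete, here is a rough back-of-the-envelope sketch that normalizes both figures to atoms times ns/day. It assumes a 3-site water model (so 1,500 waters = 4,500 atoms) and that cost scales roughly linearly with atom count; neither is stated in the thread, and the run settings behind the two numbers may well differ, so treat the ratio as an order-of-magnitude hint only:

```python
def atom_ns_per_day(n_atoms: int, ns_per_day: float) -> float:
    """Crude throughput measure: simulated atoms times ns/day."""
    return n_atoms * ns_per_day


# Altix 4700 figure quoted above: 80,000 atoms at 0.31 ns/day.
altix = atom_ns_per_day(80_000, 0.31)

# Reported ia64 run: 1,500 waters at 0.507 ns/day.
# Assumption: 3 atoms per water molecule.
ia64_run = atom_ns_per_day(1_500 * 3, 0.507)

print(f"Altix 4700: {altix:.0f} atom*ns/day")
print(f"ia64 run:   {ia64_run:.1f} atom*ns/day")
print(f"ratio:      {altix / ia64_run:.1f}x")
```

Under these assumptions the quoted Altix result corresponds to roughly ten times more atom-throughput than the reported ia64 run.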

Carsten


Re: [gmx-users] Performance in ia64 and x86_64

2011-02-25 Thread Ignacio Fernández Galván
I wrote:

> It seems the x86_64 processor has 4 cores and 8 threads support
> (), so the machine has probably two
> physical processors. I thought MPI was only needed if there was network
> communication involved, as in a cluster, but not in SMP, which is what both
> machines are (single memory, single OS), I guess I was wrong. I'll try
> compiling with MPI.

Well, I've compiled mdrun with MPI (with Fortran kernels on the ia64), and run
my test system on both machines, with a single processor. The results are still
worrying (to me). This is a 50 time step (0.5 ns) simulation with 1500 water
molecules, not a big system, but it still takes some hours:

x86_64: 3.147 ns/day
ia64: 0.507 ns/day

Is this difference normal? Am I doing anything wrong? What further data should
I provide? The compilation on both machines was quite straightforward (once I
had solved the library and path issues).

Thanks
Ignacio





Re: [gmx-users] Performance in ia64 and x86_64

2011-02-11 Thread Mark Abraham

On 12/02/2011 12:44 AM, Ignacio Fernández Galván wrote:

> Original Message
> From: Mark Abraham
>
> > We can't say with the information given. For best performance, the number of
> > threads cannot exceed the number of physical cores available to one processor.
> > To go higher, you need to compile and use GROMACS with MPI, not threading. If
> > the IA64 is "dual core" then you are not measuring anything useful. You also
> > need to be sure you're measuring for a decent length of time - a few minutes
> > at least.
>
> It seems the x86_64 processor has 4 cores and 8 threads support
> (), so the machine has probably two
> physical processors.


That's delivered via hyper-threading (see bottom of that page). GROMACS 
is unlikely to get any significant value out of that, because the 
number-crunching loops of the code dominate the execution time, and by 
design GROMACS doesn't do many cache misses, branch mispredictions, etc. 
in those loops, so the second thread doesn't have much dead time to use. 
HT is good for desktop workstations that can expect to do lots of that 
kind of thing. Secondarily, it probably doubles the pressure on the 
cache (but MD is normally fairly cache-friendly).



> I thought MPI was only needed if there was network
> communication involved, as in a cluster, but not in SMP, which is what both
> machines are (single memory, single OS), I guess I was wrong. I'll try compiling
> with MPI.


As I understand it, useful threading has got to do with how many cores 
are on the same piece of processor silicon, not whether the memory is 
shared between processors. Xeon E5540 has 4 physical cores per 
processor, so that's as far as GROMACS will usefully thread. The good 
news is that if you really have shared memory, that does make 
MPI-GROMACS almost indistinguishably fast from a threaded version 
running on the theoretically equivalent hardware. I have a vaguely 
similar machine, but with dual quad-core Xeon X5570 processors. MPI and 
threading work indistinguishably out to 8 processes, then threading stops.
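
As a way to find out how many sockets and physical cores sit behind the logical CPU count, here is a minimal sketch that summarizes /proc/cpuinfo-style text. It assumes the x86 Linux field names "processor", "physical id" and "core id"; other architectures (the ia64 included) may expose different fields. The function name summarize_cpuinfo and the sample data are hypothetical, built to mimic a dual-socket, quad-core machine with hyper-threading:

```python
from typing import Dict


def summarize_cpuinfo(text: str) -> Dict[str, int]:
    """Count logical CPUs, sockets and physical cores in cpuinfo-style text."""
    sockets = set()
    cores = set()
    logical = 0
    # Each logical CPU is described by one block; blocks are separated by
    # a blank line.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        socket = fields.get("physical id")
        core = fields.get("core id")
        if socket is not None:
            sockets.add(socket)
            if core is not None:
                # A physical core is identified by (socket, core id);
                # hyper-threads of the same core share both values.
                cores.add((socket, core))
    return {
        "logical": logical,
        "sockets": len(sockets),
        "physical_cores": len(cores),
    }


# Hypothetical sample: 16 logical CPUs spread over 2 sockets x 4 cores x 2 threads.
sample = "\n\n".join(
    f"processor\t: {n}\nphysical id\t: {n % 2}\ncore id\t\t: {(n // 2) % 4}"
    for n in range(16)
)
print(summarize_cpuinfo(sample))
```

On a real Linux machine the same function can be fed the contents of /proc/cpuinfo. If it reports twice as many logical CPUs as physical cores, the extra ones come from hyper-threading, which matters when choosing a thread count as discussed above.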


Mark


Re: [gmx-users] Performance in ia64 and x86_64

2011-02-11 Thread Carsten Kutzner
Hi Ignacio,

On Feb 11, 2011, at 1:33 PM, Ignacio Fernández Galván wrote:

> Hi all,
> 
> 
> I'm compiling and testing gromacs 4.5.3 in different machines, and I'm
> wondering if it's normal that the ia64 is much slower than the x86_64
>
> I don't know full details of the machines, because I'm not the administrator
> or owner, but /proc/cpuinfo says:
>
> ia64 (128 cores): Dual-Core Intel(R) Itanium(R) Processor 9140N
>
> x86_64 (16 cores): Intel(R) Xeon(R) CPU   E5540  @ 2.53GHz
>
> Just looking at the GHz, one is 2.5 and the other is 1.4, so I'd expect some
> difference, but not a tenfold one: with 8 threads (mdrun -nt 8) I get 0.727
> hours/ns on the x86_64, but 7.607 hours/ns on the ia64. (With 4 threads, it's
> 1.3 and 13.7).
>
> I compiled both cases with gcc, although different versions, and default
> options. I had read assembly or fortran kernels could help with ia64, but
> fortran is apparently incompatible with threads, and when I tried with
> assembly the mdrun seemed stuck (no timestep output was written). Is this
> normal? Is

Yes, there is a problem with the ia64 assembly loops and this is exactly
how it manifests. I did run into that problem several times. What you can
do is to use the fortran kernels and compile with MPI. The performance
of the threaded and MPI versions should be the same, and the fortran 
kernels are nearly as fast as the ia64 assembly. Probably you can speed
things up a few percent by using the Intel compiler.

Cheers,
  Carsten



> there something else I'm missing?
> 
> Also, in the x86_64 system I get much lower performance with 12 or 16
> threads, I guess that could be because of the cores/processors, but I don't
> know what's the exact configuration of the machine. Again: is this normal?
> 
> Thanks,
> Ignacio
> 
> 
> 


--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/home/grubmueller/ihp/ckutzne






Re: [gmx-users] Performance in ia64 and x86_64

2011-02-11 Thread Ignacio Fernández Galván
 Original Message 
From: Mark Abraham 

> We can't say with the information given. For best performance, the number of
> threads cannot exceed the number of physical cores available to one processor.
> To go higher, you need to compile and use GROMACS with MPI, not threading. If
> the IA64 is "dual core" then you are not measuring anything useful. You also
> need to be sure you're measuring for a decent length of time - a few minutes
> at least.


It seems the x86_64 processor has 4 cores and 8 threads support
(), so the machine has probably two
physical processors. I thought MPI was only needed if there was network
communication involved, as in a cluster, but not in SMP, which is what both
machines are (single memory, single OS), I guess I was wrong. I'll try
compiling with MPI.

Thanks,
Ignacio





Re: [gmx-users] Performance in ia64 and x86_64

2011-02-11 Thread Mark Abraham

On 11/02/2011 11:33 PM, Ignacio Fernández Galván wrote:

> Hi all,
>
> I'm compiling and testing gromacs 4.5.3 in different machines, and I'm wondering
> if it's normal that the ia64 is much slower than the x86_64
>
> I don't know full details of the machines, because I'm not the administrator or
> owner, but /proc/cpuinfo says:
>
> ia64 (128 cores): Dual-Core Intel(R) Itanium(R) Processor 9140N
>
> x86_64 (16 cores): Intel(R) Xeon(R) CPU   E5540  @ 2.53GHz
>
> Just looking at the GHz, one is 2.5 and the other is 1.4, so I'd expect some
> difference, but not a tenfold one: with 8 threads (mdrun -nt 8) I get 0.727
> hours/ns on the x86_64, but 7.607 hours/ns on the ia64. (With 4 threads, it's
> 1.3 and 13.7).
>
> I compiled both cases with gcc, although different versions, and default
> options. I had read assembly or fortran kernels could help with ia64, but
> fortran is apparently incompatible with threads, and when I tried with assembly
> the mdrun seemed stuck (no timestep output was written). Is this normal? Is
> there something else I'm missing?


GROMACS assembly kernels for IA64 have been known to have problems (see 
mailing list archives), but IIRC usually in compilation, not execution. 
You will need to inspect the first few hundred lines of the .log files 
where GROMACS reports what kernels are being used for the execution.



> Also, in the x86_64 system I get much lower performance with 12 or 16 threads, I
> guess that could be because of the cores/processors, but I don't know what's the
> exact configuration of the machine. Again: is this normal?


We can't say with the information given. For best performance, the 
number of threads cannot exceed the number of physical cores available 
to one processor. To go higher, you need to compile and use GROMACS with 
MPI, not threading. If the IA64 is "dual core" then you are not 
measuring anything useful. You also need to be sure you're measuring for 
a decent length of time - a few minutes at least.


Mark


[gmx-users] Performance in ia64 and x86_64

2011-02-11 Thread Ignacio Fernández Galván
Hi all,


I'm compiling and testing gromacs 4.5.3 on different machines, and I'm
wondering if it's normal that the ia64 is much slower than the x86_64.

I don't know full details of the machines, because I'm not the administrator or 
owner, but /proc/cpuinfo says:

ia64 (128 cores): Dual-Core Intel(R) Itanium(R) Processor 9140N

x86_64 (16 cores): Intel(R) Xeon(R) CPU   E5540  @ 2.53GHz

Just looking at the GHz, one is 2.5 and the other is 1.4, so I'd expect some 
difference, but not a tenfold one: with 8 threads (mdrun -nt 8) I get 0.727 
hours/ns on the x86_64, but 7.607 hours/ns on the ia64. (With 4 threads, it's 
1.3 and 13.7).
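
For comparison with the ns/day figures used elsewhere in this thread, the hours/ns timings above can be converted with a one-line formula (ns/day = 24 / hours per ns). This is just a worked conversion of the numbers already given; the function name ns_per_day is illustrative:

```python
def ns_per_day(hours_per_ns: float) -> float:
    """Convert a wall-clock cost in hours per ns into throughput in ns/day."""
    return 24.0 / hours_per_ns


# Timings reported above, in hours/ns, keyed by thread count.
timings = {
    8: {"x86_64": 0.727, "ia64": 7.607},
    4: {"x86_64": 1.3, "ia64": 13.7},
}

for threads, machines in timings.items():
    x86 = ns_per_day(machines["x86_64"])
    ia64 = ns_per_day(machines["ia64"])
    print(f"{threads} threads: x86_64 {x86:.2f} ns/day, "
          f"ia64 {ia64:.2f} ns/day, ratio {x86 / ia64:.1f}x")
```

Both thread counts give a ratio of a little over ten, which is the "tenfold" difference mentioned above.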

I compiled both cases with gcc, although different versions, and default 
options. I had read assembly or fortran kernels could help with ia64, but 
fortran is apparently incompatible with threads, and when I tried with assembly 
the mdrun seemed stuck (no timestep output was written). Is this normal? Is 
there something else I'm missing?

Also, in the x86_64 system I get much lower performance with 12 or 16 threads,
I guess that could be because of the cores/processors, but I don't know what's
the exact configuration of the machine. Again: is this normal?

Thanks,
Ignacio


