[Simh] SIMH and Compiler Profiling

2015-12-08 Thread Henry Bent
Hi all,

I did some experiments compiling SIMH with compiler profiling and found
some significant improvements.  My testing was with the vax8600 simulator
running Ultrix 2.0, though similar improvements should be seen with any VAX
simulator and any OS.  I used nbench
(http://www.tux.org/~mayer/linux/bmark.html) to provide a mix of memory
and CPU testing.

For these tests I used gcc 5.2.0 and icc 15.0.3 on a Core 2 Q9500 running
Linux 4.2.5, and the latest version of SIMH from git.  In all tests I
simply booted the VAX to multi-user, ran nbench, and then shut down.  With
gcc, profiling was collected by compiling with -fprofile-generate, running
the benchmark, and then recompiling with -fprofile-use.  icc was similar,
using -prof_gen and -prof_use.  The procedure should be the same for any
reasonably modern version of gcc or icc.  I didn't test clang, but informal
testing in the past showed that it did not produce code that was as fast as
either of the other two compilers.
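
The two-stage build described above can be sketched as a small Python driver.  This is a minimal sketch, not the actual SIMH build.  The source file name vax8600.c, the output name and the training command "./vax8600 training.ini" are hypothetical placeholders.  In practice the flags would go through SIMH's own build rather than a single compiler call.  The point it shows is that both builds use the same base flags and only the profiling flag changes, with a representative training run in between.

```python
import shlex
import subprocess

# Hypothetical placeholders: the real SIMH VAX build compiles many source
# files through its makefile; these names are for illustration only.
SOURCES = ["vax8600.c"]
OUTPUT = "vax8600"
TRAINING_RUN = ["./vax8600", "training.ini"]  # e.g. boot, run nbench, shut down

# (generate, use) flag pairs as given in the post.  Newer icc releases
# document the spellings -prof-gen / -prof-use.
PGO_FLAGS = {
    "gcc": ("-fprofile-generate", "-fprofile-use"),
    "icc": ("-prof_gen", "-prof_use"),
}


def pgo_plan(compiler, base_flags, sources=SOURCES, output=OUTPUT,
             training=TRAINING_RUN):
    """Return the three steps of a profile-guided build as argv lists."""
    gen, use = PGO_FLAGS[compiler]
    build = [compiler, *base_flags]
    return [
        build + [gen, "-o", output, *sources],  # 1. instrumented build
        list(training),                         # 2. training run writes profile data
        build + [use, "-o", output, *sources],  # 3. rebuild guided by the profile
    ]


def run_plan(plan):
    """Execute each step in order, stopping at the first failure."""
    for argv in plan:
        print("+", shlex.join(argv))
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only print the gcc plan; call run_plan() to actually execute it.
    for step in pgo_plan("gcc", ["-O3", "-march=native"]):
        print(shlex.join(step))
```

Calling run_plan(pgo_plan("icc", ["-O3", "-xHOST", "-ipo"])) would follow the same steps with the icc flags from the post.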

Results (nbench index values; higher numbers are better):

Build                                                        mem    integer  floating
gcc -O3 -march=native                                        0.112  0.069    0.128
gcc -O3 -march=native, profiled                              0.202  0.113    0.199
icc -O3 -xHOST -ipo -no-prec-div -fp-model fast=2            0.147  0.087    0.156
icc -O3 -xHOST -ipo -no-prec-div -fp-model fast=2, profiled  0.194  0.109    0.198

For comparison, here are the results from a real VAX 4000/90 running
NetBSD, which I benchmarked many years ago:

System                                                       mem    integer  floating
VAX 4000/90, NetBSD (real hardware)                          0.136  0.132    0.157

As you can see, the performance improvements are fairly dramatic, with gcc
improving by an average of 66% across the three indices.  Of course, one
benchmark is not necessarily indicative of a real-life workload; the best
results would probably come from profiling with whatever workload your
system normally runs.  This was merely meant to illustrate what sort of
gains are possible.
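
The 66% figure can be checked directly from the numbers above.  This is a minimal sketch that assumes "average" means the arithmetic mean of the three per-index percentage gains.  That reading reproduces the gcc figure (about 66.5%), and the same calculation gives roughly 28% for icc.

```python
# nbench index values copied from the results above.
RESULTS = {
    "gcc": {"plain": (0.112, 0.069, 0.128), "profiled": (0.202, 0.113, 0.199)},
    "icc": {"plain": (0.147, 0.087, 0.156), "profiled": (0.194, 0.109, 0.198)},
}
INDICES = ("mem", "integer", "floating")


def gains(plain, profiled):
    """Percentage improvement per index: (profiled / plain - 1) * 100."""
    return [(p / b - 1.0) * 100.0 for b, p in zip(plain, profiled)]


def mean(xs):
    """Arithmetic mean (assumed to be what "average" means here)."""
    return sum(xs) / len(xs)


if __name__ == "__main__":
    for compiler, r in RESULTS.items():
        g = gains(r["plain"], r["profiled"])
        detail = ", ".join(f"{n} {v:+.1f}%" for n, v in zip(INDICES, g))
        print(f"{compiler}: {detail}; mean {mean(g):.1f}%")
```

Profiling narrows the gap between the compilers: unprofiled icc is well ahead of unprofiled gcc, but the profiled builds end up within a few percent of each other.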

I find it fascinating that a Q9500 can be almost as fast as, or faster
than, a real NVAX workstation.  I imagine that the latest Intel processors
would be faster than any real VAX.
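
Dividing the profiled results by the real VAX 4000/90 figures makes the comparison concrete.  This is a rough sketch only: the real machine ran NetBSD rather than Ultrix 2.0 and was measured years earlier, so the ratios are indicative at best.  A ratio above 1.0 means the emulator scored higher.  With the gcc build that holds for memory (about 1.49x) and floating point (about 1.27x), while integer trails (about 0.86x).

```python
INDICES = ("mem", "integer", "floating")
REAL_VAX_4000_90 = (0.136, 0.132, 0.157)  # NetBSD on real hardware
GCC_PROFILED = (0.202, 0.113, 0.199)      # SIMH vax8600 on a Core 2 Q9500
ICC_PROFILED = (0.194, 0.109, 0.198)


def relative_speed(emulated, real):
    """Ratio of emulated to real index; above 1.0 means the emulator is faster."""
    return [e / r for e, r in zip(emulated, real)]


if __name__ == "__main__":
    for name, result in (("gcc", GCC_PROFILED), ("icc", ICC_PROFILED)):
        ratios = relative_speed(result, REAL_VAX_4000_90)
        print(name, ", ".join(f"{n} {x:.2f}x" for n, x in zip(INDICES, ratios)))
```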

I also did some quick testing with the pdp11 simulator and found
significant improvements there too.  I would expect profiling to yield
gains with any of the simulators.

-Henry
___
Simh mailing list
Simh@trailing-edge.com
http://mailman.trailing-edge.com/mailman/listinfo/simh

Re: [Simh] SIMH and Compiler Profiling

2015-12-08 Thread Paul Koning

> On Dec 8, 2015, at 10:47 AM, Henry Bent  wrote:
> 
> Hi all,
> 
> ...
> Results, higher numbers are better:...
> As you can see, the performance improvements are fairly dramatic, with gcc 
> improving an average of 66%.  Of course, one benchmark is not necessarily 
> indicative of a real life workload; the most appropriate improvements would 
> probably be made by profiling using whatever workload your system generally 
> runs.  This was merely meant to illustrate what sort of gains are possible.

Interesting.  I would expect profiling to be most helpful for code that has 
some well-defined hot spots, and computer emulators seem to be a good fit for 
that pattern.

> I find it fascinating that a Q9500 can be almost as fast, or faster, than a 
> real NVAX workstation.  I imagine that the most modern Intel processor would 
> probably be faster than any real VAX.

I'm amazed that a real VAX would be anywhere near as fast as a modern PC 
running the emulator.  Perhaps I'm misled by PDP-11 emulation, which has for 
ages now been vastly faster than the real machines.  (Ditto the CDC 6000 
series, for that matter.)

The other interesting thing is that, with profiling, GCC comes out slightly 
ahead of ICC.  That's a bit surprising, since ICC is totally focused on being 
a good Intel compiler, while GCC is a many-platforms portable compiler.

paul
