I have been skeptical of computer benchmarks for a long
time (40 or more years). If one has time to waste, they
can be amusing - but as an evaluation tool, they are, at
best, rough directional guides. I agree that the ones
proscribing alternative approaches to solving a problem
are biased against j and other "think out of the box"
languages or even creative approaches to solution. But
I have a different question to ask the forum -
NB. This message turned into a, perhaps too, long trip down
memory lane, but there really is a question at the end
about which I would like to hear some discussion - PLEASE
don't quote this whole/long message in replies!
Starting all those 40 years ago, I was impressed at APL's
ability to invert a matrix. The mainframe APLSV system
I had at my disposal (running on a System/360 Model 50) could
invert a 20 20 matrix of random numbers in just a few seconds.
The other APL machine I had access to was a model 75 and it
was quite a bit faster. Soon I decided to increase the
size of my test matrix to 50x50 and the time to invert
that on a S/370-145 was around 11 seconds.
Over the years, I had several complaints about using such
a floating point intensive check to look at system/cpu
speed. But still, people tried it on various machines
and reported the interesting results. In June of 1985,
Roger Moore reported that the 50x50 inverse took 12.5
seconds running on an AT/370 card (this was an ISA card
in a PC/AT which emulated a S/370 and was running Sharp
APL) - Nice that a PC could perform as well as a mainframe
model 145. Machines got better, and later in 1985 I noted
that the Amdahl/V8 we were using took only 0.2 seconds
to invert the 50 50 matrix.
In November of 1985, at COMDEX, I got a chance to try
my test in a Unix APL from STSC running on a Motorola
68010 clocked at 8 MHz (my how speeds have changed!) and
it took about 100 seconds to invert the 50x50... In 1987
I had a chance to use a brand new Hitachi mainframe with
Sharp APL and it was able to invert a 100x100 in under 1
second, so at that time, mainframes still ruled - but the
exponential nature of Moore's Law gave them a relatively
short time to keep that dominance...
Then came j -- here is an email from Eugene McDonnell (in
his usual amazingly thorough style) to Roger reporting
on results from several different machines (this message
was an expansion on one he had sent 4 days earlier):
no. 4353297 filed 17.34.00 mon 16 jul 1990
from eem
to hui
cc dhs jkt kei
subj JKT benchmark on J on various machines
The results of executing:
6X.2'%.?50 50$1000'
using J version 1 on various machines are as follows:
Machine                                                       Result
------------------------------------------------------------  ------------
IBM PC, 8088, 4.77 MHz, MS/DOS, no math chip                  2801.21   v
IBM XT, 8088, 4.77 MHz, MS/DOS                                1680      v
Apple Macintosh Plus, 68000, 7.8336 MHz, MacOS, no math chip  1207.08  *v
Atari ST, 68000, Mac Simulator, no math chip                  935.1
Packard-Bell PC/AT, MS/DOS, no math chip                      525.495  *
QSP Super Micro 286AT, 8 MHz MS/DOS, no math chip             521.099  *v
AT&T 3B1, 68010, 10MHz, UNIX, no math chip                    442.332  *v
IBM PS/2 55, 386 SX, 20 MHz, MS/DOS, no math chip             341.044
IBM PS/2 70, 386, 25 MHz, MS/DOS                              230.879   v
Apple Macintosh IIx, 68030, 16.67 MHz, MacOS                  227.25    v
Apple Macintosh SE/30, 68030, 16.67 MHz, MacOS                201.55    v
Sun 3/60, 68020, 16.67 MHz, UNIX                              162.027  *v
Philips P9070, 68020, 16.67 MHz, UNIX                         158
Apple Macintosh IIci, 68030, 25 MHz, MacOS                    152.35
Sun 386i/250, 25 MHz, DOS window PC/AT emulator               111.758  *
Apple Macintosh IIfx, 68030, 40 MHz, MacOS                    76.83
Sun 386i/250, 25 MHz, UNIX                                    73.947   *
Sun Sparcstation 1+ (Sun 4/65), UNIX                          28.0322  *
IBM RS/6000/320, 20 MHz, AIX                                  20.36
Mips R3240, Mips R3000, 25 MHz, UNIX                          12.05     v
Entries followed by a * are new or have added information.
Timing accuracy in entries followed by a v has been verified by stopwatch.
A few days later, Roger responded with:
no. 4381718 filed 15.50.43 thu 26 jul 1990
from hui
to eem
cc dhs jkt kei
subj Further JKT benchmarks
6X.2 '%.?50 50$1000' on a MIPS machine at Waterloo gave 12 point
something, approximately equal to the 12.05 you reported. On a NeXT
machine, the figure was 101.141. On a VAX, J is not yet stable enough
to execute the benchmark.
Contrary to what I said before, 6X.2 gives CPU time and not elapsed
time.
Through all of this, there were valid criticisms about my
favorite benchmark - e.g.
no. 2772762 filed 22.47.36 tue 10 may 1988
from rdm
to jkt
cc rbe
subj performance
yes but what about the single most important apl expression ever coded:
<Qdivide>50 50<rho>1e9
no. 3492475 filed 20.09.40 mon 12 jun 1989
from rbe
to akr
cc jkt
subj fyi
Note that jkt's favorite benchmark is strongly floating-point biased,
and hence doesn't reflect most sapl site instruction mixes... Bob
In talking about this with Roger Hui in the early days of j, I suggested
that maybe using my favorite benchmark was inappropriate to look at j.
He countered by saying that because of the way he had implemented
matrix inverse, using j primitives to code it rather than hand tuned
assembler code as used in mainframe systems, he reckoned it was a
pretty fair test.
So I have continued to try it to evaluate systems over the 17 years since
then. I don't know if it is still true that %. is a "fair benchmark",
Roger would have to comment on that....
Over the years (and I've spared you the messages from my 30 years
of saved emails...) I was often confused by things I found using
my little test. Now, once again I'm confused and I asked Roger
if he could clear my confusion and he said he had no idea why I
might have observed my puzzling results.
So, finally, here is my current conundrum.
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
I just got my hands on a new 24 inch iMac (what an incredibly
lovely and powerful computer!!). Of course, the first thing I
did was install j602c on it (the new installation process is
light years of improvement over past Mac installations, thank
you again Eric for all the work!) The next thing I did was run
my favorite benchmark (increasing the size of the matrix a bit).
Here is the question I posed to Roger and now pose to the
forum for its collective thoughts -
I started out timing %. 100 100 ?@$ 1000 but that is such
a short time that I changed to 500 500.... The two machines
giving rise to my confusion are a Linux box with dual core
Intel cpu - and an OS 10.4 Mac with dual core Intel cpu.
On the Linux box, an excerpt from cat /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 3
cpu MHz : 2992.720
cache size : 2048 KB
Memory size on the Linux box (Red Hat Fedora Core 6) is 1.5 Gbytes.
On the Mac -
Processor Name: Intel Core 2 Duo
Processor Speed: 2.4 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache (per processor): 4 MB
Memory: 1 GB
Bus Speed: 800 MHz
Both machines running j602c
ts =: 6!:2 , 7!:2@]
On Linux box -
10 ts '%. 500 500 ?@$ 1000'
1.07966 1.57304e7
On Mac -
10 ts '%. 500 500 ?@$ 1000'
0.492236 1.57304e7
Which surprises me since one would guess the 3.00 GHz machine
would be faster than the 2.4 GHz machine - instead it is half
the speed... I thought it might be the cache size, but a smaller
matrix produces:
10 ts '%. 100 100 ?@$ 1000'
0.0135964 984768
and
10 ts '%. 100 100 ?@$ 1000'
0.0088636 984768
respectively, even though the cache would seem not to be an issue.
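As a rough check on the numbers above, the ratios and the raw matrix
sizes can be worked out directly in j. This is only a sketch using
the timings already quoted, and the 8 bytes per element is an
assumption about how floats are stored:

```j
NB. Linux time divided by Mac time for the 500x500 and 100x100 runs
1.07966 0.0135964 % 0.492236 0.0088636

NB. bytes for one matrix of floats, assuming 8 bytes per element
8 * 500 * 500
8 * 100 * 100
```

The ratios come out near 2.2 and 1.5, and a single matrix takes
2000000 and 80000 bytes. The 100x100 run reported under 1 MB of
total space, which fits in the L2 cache of either machine, while the
500x500 run reported about 15.7 MB, which fits in neither.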
I had noticed this 2:1 ratio, so when I was looking at spirals
yesterday I checked for it, but found no such difference, that is -
JKT =: ,~ $ /:@(+/\)@Increments
Increments =: _1&|. @ (# Cycles) @ Repeats
Repeats =: ,~ 2: # i.
Cycles =: # $ (,-)@(,1:)@{:
JKT 5
12 11 10  9 24
13  2  1  8 23
14  3  0  7 22
15  4  5  6 21
16 17 18 19 20
On the 3 GHz Linux box -
ts 'JKT 1001'
0.080173 1.6778e7
On 2.4 GHz Mac -
ts 'JKT 1001'
0.0818748 1.6778e7
So this seems relatively close. I suppose it could be that
the speed difference is in processing floats, but that
surprises me too. Have you experienced similar things?
Any ideas what is going on?
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
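One way to probe the floating point guess might be to time the same
primitive on integer and on floating point data on both machines.
This is only a sketch: the names ints and flts, the data size, and
the choice of + are my own assumptions for illustration, and ts is
repeated here so the lines can be run on their own.

```j
ts   =: 6!:2 , 7!:2@]        NB. time in seconds, space in bytes
ints =: ? 1000000 $ 1000     NB. hypothetical test data: random integers
flts =: ints + 0.5           NB. same values forced to floating point
10 ts 'ints + ints'          NB. integer addition
10 ts 'flts + flts'          NB. floating point addition
```

If the integer timings are close on the two machines while the float
timings differ by something like the factor seen with %. then floats
would look like the culprit; if both are close, the difference
presumably lies elsewhere.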
After posing the above, I watched CPU monitors to verify
that j only uses one core at a time, so it isn't that OS X
somehow automatically uses both. But still, I can't imagine
why a 2.4 GHz machine is twice as fast as a 3 GHz one...
To further my confusion, when I installed OS 10.5 (Leopard)
the timings became even a little more favorable - this was
a surprise as well. The change wasn't large, but 10.5 is
consistently a few percent faster than 10.4...
Comments/thoughts?
- joey