Re: [performance] DaCapo benchmarks

Robin Garner Tue, 21 Nov 2006 19:19:02 -0800

I've also added performance results to the DaCapo regression tests. If you page down on the front page


  http://cs.anu.edu.au/people/Robin.Garner/dacapo/regression/

you'll see a bunch of graphs comparing the performance of JikesRVM with DRLVM. The numbers are all relative to the best figure from any of the commercial VMs I had available.


cheers

Stefano Mazzocchi wrote:

Sergey Kuksenko wrote:

Stefano,
To gauge Harmony's potential, I quickly ran SciMark on a tuned
Harmony release build and compared it with BEA and Sun.


Sergey,

many thanks for doing this.

Hardware: P4 Xeon 3GHz
Windows XP SP2 (It's another platform, but I hope the key things are still
the same).

BEA -
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
BEA JRockit(R) (build R26.3.0-32-58710-1.5.0_06-20060308-2022-win-ia32, )

SUN -
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode)


Harmony -
Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
Foundation or its licensors, as applicable.
java version "1.5.0"
pre-alpha : not complete or compatible
svn = 475925, (Nov 17 2006), Windows/ia32/msvc 1310, release build

I've got the following results

BEA (out of the box):

Composite Score: 435.9674695335291
FFT (1024): 295.33366058958575
SOR (100x100):   474.15229982839213
Monte Carlo : 111.56918839504195
Sparse matmult (N=1000, nz=5000): 551.8821052631578
LU (100x100): 746.9000935914679
----

Sun (out of the box):

Composite Score: 229.70779446543412
FFT (1024): 104.92303791891565
SOR (100x100):   400.44785722405015
Monte Carlo : 13.257380552894444
Sparse matmult (N=1000, nz=5000): 160.07814989061512
LU (100x100): 469.8325467406951

---

Harmony (out of the box):

Composite Score: 109.43208528481887
FFT (1024): 51.30119529411764
SOR (100x100):   257.9591618631154
Monte Carlo : 17.04568642272773
Sparse matmult (N=1000, nz=5000): 129.4666069618598
LU (100x100): 91.38777588227376
----

Harmony (tuned options, server path):

Composite Score: 181.54555681031619
FFT (1024): 91.22597999162443
SOR (100x100):   329.8450882375011
Monte Carlo : 42.51432538579417
Sparse matmult (N=1000, nz=5000): 260.58050602943024
LU (100x100): 183.56188440723088


that's pretty good.
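
For anyone reading the tables above: the composite scores appear to be the plain arithmetic mean of the five kernel scores, and the quoted figures are consistent with that. A minimal sketch to check it follows; the class and method names are made up for illustration and are not part of SciMark.

```java
// Sketch: recompute a SciMark composite score as the mean of its kernel scores.
// SciMarkComposite and composite() are hypothetical names, not SciMark API.
final class SciMarkComposite {

    /** Arithmetic mean of the kernel scores (the assumed composite formula). */
    static double composite(double... kernelScores) {
        if (kernelScores.length == 0) {
            throw new IllegalArgumentException("need at least one kernel score");
        }
        double sum = 0.0;
        for (double s : kernelScores) {
            sum += s;
        }
        return sum / kernelScores.length;
    }

    public static void main(String[] args) {
        // Harmony (out of the box) kernel scores quoted in this thread.
        double harmonyOob = composite(
                51.30119529411764,   // FFT (1024)
                257.9591618631154,   // SOR (100x100)
                17.04568642272773,   // Monte Carlo
                129.4666069618598,   // Sparse matmult
                91.38777588227376);  // LU (100x100)
        System.out.printf("Harmony OOB composite: %.4f%n", harmonyOob);
    }
}
```

Because the composite is a mean, one weak kernel (LU for Harmony out of the box) drags the headline number down noticeably.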

------

When I looked into Harmony OOB I found that all hot methods of SciMark
are compiled by JET (not recompiled by the optimizing JIT compiler). The
way our DRLVM currently recognises hot paths is not suitable for SciMark
because of the short run.


Hmmm, what do you mean by "short run"? Does the entire app run for a short
amount of time in total, or does each hot method run for too short a time
to be recognized as "hot"?

We need to tune DRLVM options to get better results.
Tuned options give good SciMark score improvement (109->181).


Well, to be fair, all the other JVMs could probably do the same.

Which moves Harmony performance close to what Sun OOB shows.


excuse my ignorance, but what's OOB? (google define says "out of
business" or "order of battle"... not sure they apply here ;-)

Our client (default) compilation path was tuned a long time ago and it
probably makes sense to have another round. What we initially did was
run a script that executed a given set of workloads to find the
best configuration for our VM. Having said that, I suggest we choose the
right set of applications/benchmarks, so we can start our tuning once
again.


Maybe it's the analog microelectronics guy in me talking, but every time I
hear something like "let's get reasonable defaults", I think of
introducing a variation and a feedback to reach a local minimum and
stabilize the system.

I know very little about how DRLVM works, but would it be feasible to
start with such "reasonable defaults" and introduce a random variability
to the way the JIT works alongside a very simple method profiler and see
if the performance increases? Think of it as you trying out different things
to see if they work better... but done by the JVM as it runs.
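
The idea above can be sketched as a random-perturbation search: start from a default, nudge a parameter randomly, and keep the change only when the measured cost goes down. Everything in this sketch is an assumption made for illustration: the single "hotness threshold" parameter, the synthetic cost function standing in for a real profiler measurement, and the class name. It does not describe how DRLVM actually works.

```java
import java.util.Random;
import java.util.function.IntToDoubleFunction;

// Sketch of "random variation plus feedback" tuning for one integer parameter.
// AdaptiveTuner is a hypothetical name; nothing here is DRLVM code.
final class AdaptiveTuner {

    /**
     * Randomly perturbs the parameter and keeps a candidate only if the
     * measured cost improves, so the result settles in a local minimum.
     */
    static int tune(int start, int maxStep, int rounds,
                    IntToDoubleFunction cost, Random rng) {
        int best = start;
        double bestCost = cost.applyAsDouble(best);
        for (int i = 0; i < rounds; i++) {
            // Random variation in [-maxStep, +maxStep].
            int delta = rng.nextInt(2 * maxStep + 1) - maxStep;
            int candidate = Math.max(1, best + delta);
            double candidateCost = cost.applyAsDouble(candidate);
            // Feedback: accept only strict improvements.
            if (candidateCost < bestCost) {
                best = candidate;
                bestCost = candidateCost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for "run time measured by a simple profiler":
        // cheapest when the hypothetical threshold is 2000.
        IntToDoubleFunction syntheticCost = t -> Math.abs(t - 2000) + 100.0;
        int tuned = tune(10000, 500, 1000, syntheticCost, new Random(42));
        System.out.println("tuned threshold: " + tuned);
    }
}
```

A real VM would pay for every trial with actual run time, and noisy measurements would need averaging before a candidate is accepted.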

Keep in mind I'm a total newbie in virtual machine design (or CPU
architectures for that matter, despite my degree in microelectronics..
well, to be fair, I was doing analog not digital circuits) so bear with
me if I'm saying stupid things :-)

Currently we have in mind the following list:
- HWA (Hello World Application)
- SciMark
- DaCapo (reasonable set of benchmarks, like fop, hsqldb, chart and xalan)
- Anything else?

What do you think about this? Any additions to the list? Comments?
Questions?


The problem I have with this is that I feel each of these scenarios
might require different tuning parameters... and if that is the
case, you end up with the 'short blanket' problem: you improve here and
you get worse there.

An 'adaptive' scenario, on the other hand, would allow us to:

 1) avoid trying to find the optimal defaults (since we can't possibly
test every scenario that will be useful in a way that is consistent with
real world usage)

 2) avoid the blanket problem, each VM can adapt to the scenario of use

 3) avoid the 'stiffness' problem, each VM can adapt to machine resource
changes and 'retune' itself if the environment changes.

Of course, there is a price to pay in such 'feedback variability' systems
since they have to find the minima over and over again.

So, another solution is to have a JVM "tuning parameters discovery mode":
you run with such "parameter finding" autoprofiling turned on, and the
JVM dumps the tuning results to disk, which you can later use to
initialize the JVM on your own.
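
A minimal sketch of the "dump to disk, reuse later" half of that idea, using a plain java.util.Properties file. The property key, file prefix, and class name are hypothetical; no existing DRLVM option or file format is implied.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Sketch: persist discovered tuning parameters and read them back on a later run.
// TuningProfile and the "jit.hotness.threshold" key are made up for illustration.
final class TuningProfile {

    /** Writes the discovered parameters to a properties file. */
    static void save(Properties tuned, Path file) throws IOException {
        try (Writer out = Files.newBufferedWriter(file)) {
            tuned.store(out, "hypothetical tuning profile");
        }
    }

    /** Reads parameters back, as a later VM start-up might. */
    static Properties load(Path file) throws IOException {
        Properties p = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            p.load(in);
        }
        return p;
    }

    /** Saves to a temporary file, loads it again, and cleans up. */
    static Properties roundTrip(Properties tuned) {
        try {
            Path file = Files.createTempFile("tuning", ".properties");
            try {
                save(tuned, file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties discovered = new Properties();
        discovered.setProperty("jit.hotness.threshold", "2000");
        System.out.println(roundTrip(discovered));
    }
}
```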

Not sure how feasible or complicated to write this is, but how does this
sound on paper?

Thanks,
---
Sergey Kuksenko
Intel Enterprise Solutions Software Division


On 11/17/06, Stefano Mazzocchi <[EMAIL PROTECTED]> wrote:

Alexey Varlamov wrote:

Stefano,

It is a bit unfair to compare *debug* build of Harmony with other
release versions :)

I'm simulating what a journalist with a developer could do.

If there is a way to make it compile in 'release mode' (if such a thing
exists), I'll be very glad to redo the benchmarks.

I suppose all VMs were run in default mode (i.e. no special cmd-line
switches)?

Right. No switches. I'm simulating what users do when they get the JVM:
they run "java"... and if it's not fast enough they buy a new box.

Having command line tuning parameters is mostly useless since most
people don't know the internals of a JVM well enough to guess what
parameters to tune anyway.

So, what people will do once they get a Harmony snapshot is "java
my.class.Name" and see the results.

I want to simulate that and compare it to the same exact experience they
will get with other virtual machines for a variety of common scenarios
(number crunching, xml processing, http serving, database load, etc...)

I will focus on the server because that's where the Apache action (and
my personal interest) is.

So, like I said, if there are 'compile time' switches that I can use to
turn 'release mode' on, please tell me and I'll re-do the tests.

2006/11/17, Stefano Mazzocchi <[EMAIL PROTECTED]>:

There are lies, damn lies and benchmarks.... which don't really tell you
if an implementation of a program is *faster* but at least it tells you
where you're at.

So, as Geir managed to make the DSO linking problem go away in DRLVM, I
was able to start running some benchmarks.

The machine is the following:

Linux harmony-em64t 2.6.15-27-amd64-generic #1 SMP PREEMPT Sat Sep 16
01:50:50 UTC 2006 x86_64 GNU/Linux

dual Intel(R) Pentium(R) D CPU 3.20GHz
bogomips 6410.31 (per CPU)

There is nothing else running on the machine (load is 0.04 at the time
of testing).

The various virtual machines tested are:

harmony
-------
Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
Foundation or its licensors, as applicable.
java version "1.5.0"
pre-alpha : not complete or compatible
svn = r476006, (Nov 16 2006), Linux/em64t/gcc 4.0.3, debug build

sun5
---
java version "1.5.0_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_09-b03, mixed mode)

sun6
----
java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b104, mixed mode)

ibm
---
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build
pxa64dev-20061002a (SR3) )
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64
j9vmxa6423-20061001 (JIT enabled)
J9VM - 20060915_08260_LHdSMr
JIT  - 20060908_1811_r8
GC   - 20060906_AA)
JCL  - 20061002

bea
---
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
BEA JRockit(R) (build
R26.4.0-63-63688-1.5.0_06-20060626-2259-linux-x86_64, )

--------------------------------------------------------------------------


Test #1: java scimark2 (http://math.nist.gov/scimark2/)

command: java jnt.scimark2.commandline

NOTE: bigger number is better

Sun6
Composite Score: 364.5832265230057
FFT (1024): 220.8458713892794
SOR (100x100):   696.1542342357722
Monte Carlo : 149.37978088875656
Sparse matmult (N=1000, nz=5000): 326.37451873283845
LU (100x100): 430.1617273683819

BEA
Composite Score: 359.13480378697835
FFT (1024): 303.8746880751562
SOR (100x100):   454.25628897202307
Monte Carlo : 93.23913192138497
Sparse matmult (N=1000, nz=5000): 530.44112637391
LU (100x100): 413.8627835924175

Sun5
Composite Score: 332.84987587548574
FFT (1024): 216.5144595799027
SOR (100x100):   689.429322146947
Monte Carlo : 25.791262124978065
Sparse matmult (N=1000, nz=5000): 317.5193965699373
LU (100x100): 414.99493895566377

IBM
Composite Score: 259.8249218693683
FFT (1024): 296.8415012789055
SOR (100x100):   428.974881649179
Monte Carlo : 89.15159857584082
Sparse matmult (N=1000, nz=5000): 144.3524241203982
LU (100x100): 339.8042037225181

Harmony
Composite Score: 113.65082278962575
FFT (1024): 203.76641991778123
SOR (100x100):   224.37761309236748
Monte Carlo : 9.063866256533116
Sparse matmult (N=1000, nz=5000): 65.4051866327227
LU (100x100): 65.6410280487242

In this test Harmony is clearly lagging behind: at about 30% of the
performance of the best JVM, it's a little crappy. Please note how FFT's
performance is not so bad, while Monte Carlo is pretty bad compared to
BEA or IBM.

Overall, it seems like there is some serious work to do here to catch up.
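
The "about 30%" figure can be reproduced by dividing each composite score by the best one. A small sketch follows; the class and method names are hypothetical, and the inputs are the composite scores listed in Test #1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: express each VM's composite score as a fraction of the best score.
// RelativeScores is a made-up helper for illustration only.
final class RelativeScores {

    /** Divides every score by the maximum, preserving insertion order. */
    static Map<String, Double> relativeToBest(Map<String, Double> scores) {
        double best = 0.0;
        for (double s : scores.values()) {
            best = Math.max(best, s);
        }
        Map<String, Double> relative = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            relative.put(e.getKey(), e.getValue() / best);
        }
        return relative;
    }

    public static void main(String[] args) {
        // Composite scores from Test #1 in this message.
        Map<String, Double> composite = new LinkedHashMap<>();
        composite.put("Sun6", 364.5832265230057);
        composite.put("BEA", 359.13480378697835);
        composite.put("Sun5", 332.84987587548574);
        composite.put("IBM", 259.8249218693683);
        composite.put("Harmony", 113.65082278962575);

        for (Map.Entry<String, Double> e : relativeToBest(composite).entrySet()) {
            System.out.printf("%-8s %5.1f%% of best%n", e.getKey(), 100.0 * e.getValue());
        }
    }
}
```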

--------------------------------------------------------------------------


Test 2: Dhrystones

(http://www.c-creators.co.jp/okayan/DhrystoneApplet/)

command: java dhry 100000000

NOTE: bigger is better

NB: I modified the code to accept the count as input from the command
line!

sun6:     8552856 dhrystones/sec
sun5:     6605892
bea:      5678914
harmony:   669734
ibm:       501562

The performance here is horrific but what's surprising is that J9 is
even worse. No idea what's going on but it seems like something is not
working as it should (in both harmony and J9)

--------------------------------------------------------------------------


Test 3: Sieve (part of http://www.sax.de/~adlibit/tya18.tgz)

command: java Sieve 30

NB: I modified the test to run for a configurable amount of seconds.

sun6     8545 sieves/sec
sun5     8364
bea      6174
harmony  1836
ibm       225

IBM J9 clearly has something wrong on x86_64 but harmony is clearly
lagging behind.
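
For reference, a time-bounded harness of the kind described in the NB above might look like the sketch below. This is not the modified Sieve from tya18.tgz: the sieve limit, the class name, and the reading of the command-line argument as a duration in seconds are all assumptions.

```java
// Sketch: run a sieve repeatedly for a configurable duration and report the rate.
// TimedSieve is a hypothetical stand-in, not the benchmark used in this thread.
final class TimedSieve {

    /** Counts primes up to and including limit with a sieve of Eratosthenes. */
    static int countPrimes(int limit) {
        boolean[] composite = new boolean[limit + 1];
        int count = 0;
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                count++;
                // Mark multiples starting at i*i; long avoids int overflow.
                for (long j = (long) i * i; j <= limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    /** Runs the sieve until the duration has elapsed and returns runs per second. */
    static double sievesPerSecond(long durationMillis, int limit) {
        long start = System.nanoTime();
        long deadline = start + durationMillis * 1_000_000L;
        long runs = 0;
        do {
            countPrimes(limit);
            runs++;
        } while (System.nanoTime() < deadline);
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return runs / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Assumed meaning of the argument: number of seconds to run.
        int seconds = args.length > 0 ? Integer.parseInt(args[0]) : 30;
        double rate = sievesPerSecond(seconds * 1000L, 8190);
        System.out.printf("%.0f sieves/sec%n", rate);
    }
}
```

Running for a fixed wall-clock time rather than a fixed iteration count gives a JIT more opportunity to recompile hot loops, which matters for the "short run" concern raised elsewhere in this thread.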

Stay tuned for more tests.

--
Stefano.


--
Stefano.



--
Robin Garner
Dept. of Computer Science
Australian National University
http://cs.anu.edu.au/people/Robin.Garner/
