> With the call to trace_system_stack commented out in dod.c, I get 48.5
> generations per second. The full stats are:
> 5000 generations in 103.185356 seconds. 48.456488 generations/sec
> A total of 36608 bytes were allocated
> A total of 42386 DOD runs were made
> A total of 6005 collection runs were made
> Copying a total of 72819800 bytes
> There are 21 active Buffer structs
> There are 1024 total Buffer structs
>
> This compares to the 14th July CVS version:
> 5000 generations in 81.172149 seconds. 61.597482 generations/sec
> A total of 58389 bytes were allocated
> A total of 160793 DOD runs were made
> A total of 1752 collection runs were made
> Copying a total of 1228416 bytes
> There are 81 active Buffer structs
> There are 192 total Buffer structs

I guess this means the examples/benchmarks I was using to test were not
too representative of real-world programs. Or maybe that's the case for
life.pasm. :)

Looking at the above results, I think I can see part of the problem.
What's really annoying is that the more I play with the benchmarks, the
more I realize they are useless. The new Parrot has an initial buffer
count of 256, which helped performance on my system when compared to the
pre-GC commit. The old version had 64 or so. Just this small tuning
difference means:

a) more buffers to DOD and collect
b) less of a need to DOD since we can "live" longer without it
c) more memory usage because we can't collect data in old PMCs until
they've been DOD'ed

Making minor adjustments like inlining functions (which I did the other
day) can give maybe a 1-4% performance gain across the board, each.
However, changing a number like HEADERS_PER_ALLOC can affect performance
by +/-8%, depending on the program.

This makes it rather difficult to optimize the GC, since optimizing for
one program *easily* messes up the performance on other programs.

Setting *_HEADERS_PER_ALLOC back to the original value of 16 improves
performance on life.pasm by 5%, although it causes a corresponding hit on
the examples/benchmarks.

Changing UNITS_PER_ALLOC_GROWTH_FACTOR either way causes a big speed hit.
Changing REPLENISH_LEVEL_FACTOR either way causes a big speed hit.
Changing the logic on when we DOD relative to collection, in any manner,
causes a speed hit.

This leads me to believe that we have a GC that's tuned for life.pasm,
which makes a lot of sense. Before examples/benchmarks, there was only
life, and all GC performance changes were compared on that. In my attempts
to tune for examples/benchmarks, I undoubtedly caused life performance to
suffer. Parrot doesn't have any real-world programs, which makes it
difficult to do any sort of worthwhile tuning.

Hopefully, with Sean's (and everyone else's) work on the Perl6 grammar, we
can start taking these Perl programs (like qsort), running them through,
and benchmarking them on the Parrot VM. Unfortunately, until we have a
wide test suite of programs, or start implementing adaptive adjustment of
GC parameters, I have a feeling we're just going to travel around in
circles.

Mike Lambert
