> With the call to trace_system_stack commented out in dod.c, I get 48.5
> generations per second. The full stats are:
> 5000 generations in 103.185356 seconds. 48.456488 generations/sec
> A total of 36608 bytes were allocated
> A total of 42386 DOD runs were made
> A total of 6005 collection runs were made
> Copying a total of 72819800 bytes
> There are 21 active Buffer structs
> There are 1024 total Buffer structs
>
> This compares to the 14th July CVS version:
> 5000 generations in 81.172149 seconds. 61.597482 generations/sec
> A total of 58389 bytes were allocated
> A total of 160793 DOD runs were made
> A total of 1752 collection runs were made
> Copying a total of 1228416 bytes
> There are 81 active Buffer structs
> There are 192 total Buffer structs

I guess this means the examples/benchmarks I was using to test were not too representative of real-world programs. Or maybe that's the case for life.pasm. :)

Looking at the above results, I think I can see part of the problem. What's really annoying is that the more I play with the benchmarks, the more I realize they are useless.

The new parrot has an initial buffer count of 256, which helped performance on my system when compared to the pre-GC commit. The old version has 64 or so. Just this small tuning difference means:
a) more buffers to DOD and collect
b) less of a need to DOD, since we can "live" longer without it
c) more memory usage, because we can't collect data in old PMCs until they've been DOD'ed

Doing minor adjustments like inlining functions, etc. (which I did the other day) can give maybe a 1-4% performance gain across the board, each. However, changing a number like HEADERS_PER_ALLOC can affect performance by +/-8%, depending on the program. This makes it rather difficult to optimize the GC, since optimizing for one program *easily* messes up the performance on other programs.

Setting *_HEADERS_PER_ALLOC back to the original value of 16 improves performance on life.pasm by 5%, although it causes a corresponding hit on the examples/benchmarks.
Changing UNITS_PER_ALLOC_GROWTH_FACTOR either way causes a big speed hit.
Changing REPLENISH_LEVEL_FACTOR either way causes a big speed hit.
Changing the logic on when we DOD relative to collection, in any manner, causes a speed hit.

This leads me to believe that we have a GC that's tuned for life.pasm, which makes a lot of sense. Before examples/benchmarks, there was only life, and all GC performance changes were compared on that. In my attempts to tune for examples/benchmarks, I undoubtedly caused life performance to suffer.

Parrot doesn't have any real-world programs, which makes it difficult to do any sort of worthwhile tuning.

Hopefully, with Sean's (and everyone else's) work on the Perl6 grammar, we can start taking these Perl programs (like qsort), running them through, and benchmarking them against the Parrot VM. Unfortunately, until we have a wide test suite of programs, or start implementing adaptive adjustment of GC parameters, I have a feeling we're just going to travel around in circles.

Mike Lambert