So finally the patch went in.
Here is a summary of changes:
Configure System
- tests for posix_memalign and memalign, the former has priority
- if neither is found, ARENA_DOD_FLAGS gets disabled
- platform function prototypes are in platform_interface.h now
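
A rough sketch of how the allocator priority above could look at compile time. The SKETCH_* macros and sketch_alloc_aligned are hypothetical names standing in for whatever Configure actually emits; only posix_memalign and memalign are real library calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes posix_memalign in <stdlib.h> */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical results of the configure probes (not the real macro names). */
#define SKETCH_HAS_POSIX_MEMALIGN 1
/* #define SKETCH_HAS_MEMALIGN 1 */

#if defined(SKETCH_HAS_POSIX_MEMALIGN)
/* Preferred: posix_memalign. align must be a power of two and a
 * multiple of sizeof(void *). Returns NULL on failure. */
static void *sketch_alloc_aligned(size_t align, size_t size)
{
    void *p = NULL;
    return posix_memalign(&p, align, size) == 0 ? p : NULL;
}
#  define SKETCH_ARENA_DOD_FLAGS 1
#elif defined(SKETCH_HAS_MEMALIGN)
#  include <malloc.h>
/* Fallback: the older memalign interface. */
static void *sketch_alloc_aligned(size_t align, size_t size)
{
    return memalign(align, size);
}
#  define SKETCH_ARENA_DOD_FLAGS 1
#else
/* Neither function found: the aligned-arena scheme is switched off. */
#  define SKETCH_ARENA_DOD_FLAGS 0
#endif
```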
PMC layout
- metadata, synchronize and next_for_GC are in a separate structure
PMC_EXT, which is either added on demand (by adding a property to
a PerlScalar) or attached at PMC creation time (for everything but
PerlScalars)
The 2 structures are connected by the ->pmc_ext pointer. PMC size is
8 bytes smaller now. The ->data entry will probably follow.
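
To illustrate the split, here is a simplified sketch. Only the names PMC_EXT, pmc_ext, metadata, synchronize, next_for_GC and data come from the description above; the sketch_* types, the field types and the helper are assumptions, not the real declarations.

```c
#include <stdlib.h>

struct sketch_pmc;                       /* forward declaration */

/* Rarely used fields, moved out of the main PMC structure. */
typedef struct sketch_pmc_ext {
    void              *metadata;         /* properties */
    void              *synchronize;
    struct sketch_pmc *next_for_GC;
} sketch_pmc_ext;

/* The slimmed-down PMC keeps only a pointer to the extension. */
typedef struct sketch_pmc {
    unsigned        flags;
    void           *data;
    sketch_pmc_ext *pmc_ext;             /* NULL until first needed */
} sketch_pmc;

/* Attach the extension on demand, e.g. when a property is first added.
 * Returns NULL if the allocation fails. */
static sketch_pmc_ext *sketch_get_ext(sketch_pmc *pmc)
{
    if (pmc->pmc_ext == NULL)
        pmc->pmc_ext = calloc(1, sizeof *pmc->pmc_ext);
    return pmc->pmc_ext;
}
```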
ARENA_DOD_FLAGS
- Can be turned on/off in include/parrot/pobj.h
- If off: old behavior
- If on we get:
  - Aligned, equally sized pool arenas
  - 4 DOD-relevant flags occupy one nibble per PObj and are in
    arena->dod_flags
  - free_lists are per arena, not per pool
  - during the hot path of a DOD run, the PObjs' memory is not touched
  - empty arenas can be freed now, though this doesn't seem to increase
    performance, or we don't have a test case for it yet;
    see REDUCE_ARENAS in dod.c
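
A sketch of why alignment matters for this scheme. It assumes the arena header is found by masking the low bits of an object's address, which is a guess at the mechanism and not taken from the patch. The sizes and all sketch_* names are made up; only dod_flags is a name from the summary above.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ARENA_SIZE 4096u   /* assumed: arenas aligned to their size */
#define SKETCH_OBJ_SIZE   32u     /* assumed object size */
#define SKETCH_N_OBJS     64u     /* assumed objects per arena */

typedef struct sketch_arena {
    unsigned char dod_flags[SKETCH_N_OBJS / 2];          /* one nibble each */
    unsigned char objects[SKETCH_N_OBJS][SKETCH_OBJ_SIZE];
} sketch_arena;

/* Mask off the low address bits to find the owning arena. */
static sketch_arena *sketch_arena_of(const void *obj)
{
    return (sketch_arena *)((uintptr_t)obj
                            & ~(uintptr_t)(SKETCH_ARENA_SIZE - 1));
}

static size_t sketch_index_of(const void *obj)
{
    const sketch_arena *a = sketch_arena_of(obj);
    return (size_t)((const unsigned char *)obj - &a->objects[0][0])
           / SKETCH_OBJ_SIZE;
}

/* Read the 4 flag bits without touching the object's own memory. */
static unsigned sketch_get_flags(const void *obj)
{
    const sketch_arena *a = sketch_arena_of(obj);
    size_t i = sketch_index_of(obj);
    return (a->dod_flags[i / 2] >> (4 * (i & 1))) & 0xFu;
}

/* Even indices use the low nibble of a byte, odd indices the high one. */
static void sketch_set_flags(void *obj, unsigned nibble)
{
    sketch_arena *a = sketch_arena_of(obj);
    size_t i = sketch_index_of(obj);
    unsigned shift = 4u * (unsigned)(i & 1);
    unsigned old = a->dod_flags[i / 2];
    a->dod_flags[i / 2] =
        (unsigned char)((old & ~(0xFu << shift)) | ((nibble & 0xFu) << shift));
}
```

Since marking only reads and writes the small flags array, the objects themselves stay out of the cache during the mark phase.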
DOD
- reintroduced the skip logic per pool: If a DOD run doesn't yield more
than replenish_level free objects, the next DOD run that would be
triggered by a shortage in this pool is skipped.
- tests in free_unused_pobjects are reordered, saving some cycles
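
The skip logic could be pictured roughly like this. Only replenish_level is a name from the description; the struct, the skip_next field and both functions are hypothetical, and the sketch assumes the comparison is made against the number of free objects in the pool after the run.

```c
/* Hypothetical per-pool bookkeeping for the skip logic. */
typedef struct sketch_pool {
    unsigned replenish_level;  /* threshold a DOD run should exceed */
    int      skip_next;        /* set when the last run was not worth it */
} sketch_pool;

/* Call after a DOD run with the number of free objects in this pool. */
static void sketch_after_dod(sketch_pool *pool, unsigned free_after_run)
{
    pool->skip_next = (free_after_run <= pool->replenish_level);
}

/* Call when the pool runs short. Returns 1 if a DOD run should be
 * triggered, 0 if it should be skipped (and a new arena allocated). */
static int sketch_should_run_dod(sketch_pool *pool)
{
    if (pool->skip_next) {
        pool->skip_next = 0;   /* skip only the very next run */
        return 0;
    }
    return 1;
}
```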
PObject life cycle
- All objects are washed now in get_free_{object,PMC}, where it's much more
likely that they will get touched thereafter anyway, and not in
add_free_object. This also implies that we don't need zeroed memory
for the arena memory any more.
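
In sketch form, the change in where washing happens looks something like the following. The object layout and the sketch_* functions are simplified stand-ins for add_free_object and get_free_object, not the real code.

```c
#include <stddef.h>
#include <string.h>

/* Simplified object; the free-list link reuses part of the object. */
typedef struct sketch_obj {
    struct sketch_obj *next_free;
    unsigned           flags;
    void              *data;
} sketch_obj;

/* Freeing writes only the link; stale contents are left in place. */
static void sketch_add_free_object(sketch_obj **free_list, sketch_obj *obj)
{
    obj->next_free = *free_list;
    *free_list = obj;
}

/* Washing happens here, just before the caller touches the object
 * anyway. Returns NULL if the free list is empty. */
static sketch_obj *sketch_get_free_object(sketch_obj **free_list)
{
    sketch_obj *obj = *free_list;
    if (obj == NULL)
        return NULL;
    *free_list = obj->next_free;
    memset(obj, 0, sizeof *obj);
    return obj;
}
```

Because every object is cleared on the way out, a fresh arena does not have to come from a zeroing allocator.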
Misc
- I reintroduced arena->name's (as const char *)
- cleaning up the last interpreter is not done anymore, a command-line
switch for leak testing will follow
- some cleanup in interpreter.c
- GC_DEBUG currently turned off, should probably be reenabled after
doing performance tests
Wanted
- memalign tests for other platforms
- if there is no memalign you could try compiling and linking
the malloc.c in the source tree.
Performance
Both schemes are almost equally fast. ARENA_DOD_FLAGS should be faster
with huge amounts of live objects (and non-linear access to them), i.e.
in real-life programs, and with faster processors. It takes more cycles
during a DOD run and in get_free_object but doesn't pollute the caches.
With the changes already in CVS (WRT list.c) stress.pasm now runs more
than 3 times faster than before (0.27s vs 1.0s, -O3, Athlon 800).
PS All tests are still passing ;-)
PPS Performance tests with ARENA_DOD_FLAGS on/off welcome
Have fun
leo