So finally the patch went in.
Here is a summary of changes:

Configure System
- tests for posix_memalign and memalign; the former takes priority
- if neither is found, ARENA_DOD_FLAGS gets disabled
- platform function prototypes are in platform_interface.h now
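
The priority order above could look roughly like this in C. This is only a
sketch: HAS_POSIX_MEMALIGN, HAS_MEMALIGN and aligned_arena_alloc are made-up
names for illustration, not the symbols the Configure step actually emits.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes posix_memalign in strict modes */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical symbols standing in for whatever Configure defines. */
#define HAS_POSIX_MEMALIGN 1
/* #define HAS_MEMALIGN 1 */

#if !defined(HAS_POSIX_MEMALIGN) && defined(HAS_MEMALIGN)
#  include <malloc.h>             /* memalign lives here on glibc */
#endif

/* Returns memory aligned to `alignment`, or NULL if unavailable. */
static void *
aligned_arena_alloc(size_t alignment, size_t size)
{
    /* posix_memalign wants a power of two that is a multiple of
     * sizeof(void *) */
    assert(alignment >= sizeof(void *)
           && (alignment & (alignment - 1)) == 0);
#if defined(HAS_POSIX_MEMALIGN)
    void *p = NULL;
    return posix_memalign(&p, alignment, size) == 0 ? p : NULL;
#elif defined(HAS_MEMALIGN)
    return memalign(alignment, size);
#else
    (void)size;
    return NULL;   /* neither found: ARENA_DOD_FLAGS would be disabled */
#endif
}
```

Memory obtained this way is released with plain free().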

PMC layout
- metadata, synchronize and next_for_GC are now in a separate structure,
  PMC_EXT, which is either added on demand (by adding a property to
  a PerlScalar) or attached at PMC creation time (for everything but
  PerlScalars).
  The two structures are connected by the ->pmc_ext pointer. PMC size is
  8 bytes smaller now. The ->data entry will probably follow.
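
A stripped-down picture of the split, for illustration only. The field types
and the helper pmc_ensure_ext are assumptions; the real PMC has more members
and allocates PMC_EXT from its own pool rather than with calloc.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PMC PMC;

/* The rarely used fields live here. */
typedef struct PMC_EXT {
    PMC  *metadata;      /* properties */
    void *synchronize;   /* type is a placeholder */
    PMC  *next_for_GC;
} PMC_EXT;

/* Simplified PMC: only the members relevant to this sketch. */
struct PMC {
    void    *data;
    PMC_EXT *pmc_ext;    /* NULL until the extension is needed */
};

/* Hypothetical helper: attach a zeroed PMC_EXT on first use. */
static PMC_EXT *
pmc_ensure_ext(PMC *pmc)
{
    assert(pmc != NULL);
    if (pmc->pmc_ext == NULL)
        pmc->pmc_ext = calloc(1, sizeof(PMC_EXT));
    return pmc->pmc_ext;
}
```

A PerlScalar would start with pmc_ext == NULL and only pay for the extra
structure once a property is set; all other PMCs would call something like
pmc_ensure_ext right at creation.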

ARENA_DOD_FLAGS
- Can be turned on/off in include/parrot/pobj.h
- If off: old behavior
- If on we get:
  - Aligned equally sized pool arenas
  - 4 DOD relevant flags occupy one nibble per PObj and are in
    arena->dod_flags
  - free_lists are per arena not per pool
  - during the hot path of a DOD run, the PObjs' memory is not touched
  - empty arenas can be freed now, though this doesn't seem to increase
    performance, or we don't have a test case for it yet;
    see REDUCE_ARENAS in dod.c
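
To make the nibble layout concrete, here is a small sketch of how two
objects can share one byte of flags, and how alignment lets you get from an
object address to its arena and index with plain arithmetic. The flag names,
ARENA_SIZE, OBJ_SIZE and the assumption that objects start at the arena base
are all invented for the example and are not taken from pobj.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096u   /* assumed: power of two, arenas aligned to it */
#define OBJ_SIZE   32u     /* assumed object size */

/* Hypothetical names for the 4 DOD-relevant bits. */
enum { DOD_LIVE = 1, DOD_ON_FREE_LIST = 2, DOD_SPECIAL = 4, DOD_SPARE = 8 };

/* Object n uses the low nibble of byte n/2 if n is even, else the high. */
static unsigned
get_flags(const uint8_t *dod_flags, size_t n)
{
    return (dod_flags[n >> 1] >> ((n & 1) * 4)) & 0xFu;
}

static void
set_flag(uint8_t *dod_flags, size_t n, unsigned flag)
{
    dod_flags[n >> 1] |= (uint8_t)((flag & 0xFu) << ((n & 1) * 4));
}

static void
clear_flag(uint8_t *dod_flags, size_t n, unsigned flag)
{
    dod_flags[n >> 1] &= (uint8_t)~((flag & 0xFu) << ((n & 1) * 4));
}

/* With aligned, equally sized arenas the arena start is a mask away. */
static uintptr_t
arena_base(uintptr_t obj_addr)
{
    return obj_addr & ~(uintptr_t)(ARENA_SIZE - 1);
}

/* Index of the object inside its arena (objects assumed to start at base). */
static size_t
object_index(uintptr_t obj_addr)
{
    return (size_t)(obj_addr & (ARENA_SIZE - 1)) / OBJ_SIZE;
}
```

The point of the layout is that marking only writes into the small
dod_flags array, so the objects themselves stay out of the cache.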

DOD
- reintroduced the skip logic per pool: If a DOD run doesn't yield more
  than replenish_level free objects, the next DOD run that would be
  triggered by a shortage in this pool is skipped.
- tests in free_unused_pobjects are reordered, saving some cycles

PObject life cycle
- All objects are now washed in get_free_{object,PMC}, where it's much more
  likely that they will get touched thereafter anyway, and not in
  add_free_object. This also implies that we don't need zeroed memory
  for the arena memory any more.
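
In miniature, the change moves the memset from the "free" side to the
"allocate" side. The Obj and FreeList types below are invented for the
sketch; the real functions take an interpreter and a pool and do more work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy object header, for illustration only. */
typedef struct Obj {
    struct Obj *next_free;
    unsigned    flags;
    void       *data;
} Obj;

typedef struct FreeList {
    Obj *head;
} FreeList;

/* Freeing just links the object in; stale contents are left alone. */
static void
add_free_object(FreeList *fl, Obj *o)
{
    assert(fl != NULL && o != NULL);
    o->next_free = fl->head;
    fl->head = o;
}

/* Allocation washes the object right before the caller touches it. */
static Obj *
get_free_object(FreeList *fl)
{
    Obj *o;
    assert(fl != NULL);
    o = fl->head;
    if (o == NULL)
        return NULL;         /* a real pool would run DOD or add an arena */
    fl->head = o->next_free;
    memset(o, 0, sizeof *o);
    return o;
}
```

Since every object is cleared on the way out, fresh arena memory can be
handed to add_free_object without being zeroed first.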

Misc
- I reintroduced arena->name's (as const char *)
- cleaning up the last interpreter is not done anymore; a command-line
  switch for leak testing will follow
- some cleanup in interpreter.c
- GC_DEBUG is currently turned off; it should probably be reenabled after
  doing performance tests

Wanted
- memalign tests for other platforms
- if there is no memalign, you could try compiling and linking
  the malloc.c in the source tree.

Performance
Both schemes are almost equally fast. ARENA_DOD_FLAGS should be faster
with huge amounts of live objects (and non-linear access to them), i.e.
in real-life programs, and with faster processors. It takes more cycles
during a DOD run and in get_free_object but doesn't pollute the caches.

With the changes already in CVS (WRT list.c) stress.pasm now runs more
than 3 times faster than before (0.27s vs 1.0s, -O3, Athlon 800).

PS All tests are still passing ;-)
PPS Performance tests with ARENA_DOD_FLAGS on/off welcome


Have fun leo


