I took a look at what was going on and found that the GC probably needs some tuning. For the 20K file, parrot is doing 217 collections of the string pool, and the last 102 of them reclaim less than 10% of the pool. Changing compact_string_pool() so that, whenever the last collection reclaimed less than 50% of the pool, it increases the pool size by a factor of (0.5 - pct_freed_last_time) reduced this to 21 collections, far fewer of which reclaimed abysmally small amounts of memory.
This is a total quick hack; I'm fairly sure I'm doing it in the wrong place, and it's no replacement for real generational collection. However, it does improve performance by roughly a factor of 4. Could someone point me to where I could find out more about the performance-relevant aspects of the GC design? Are there plans to make it generational in the future? The PDD seems to deal mostly with safety issues. Thanks, /s