Re: Why is the fib benchmark still slow - part 1

Dan Sugalski Fri, 05 Nov 2004 11:25:22 -0800

At 11:39 AM +0100 11/5/04, Leopold Toetsch wrote:

The cache misses are basically in two places:

a) accessing a new register frame and context
b) during DOD/GC

b) is relatively easy -- I'd bet that the vast majority of the cache misses are because of the copying collector. That could be cleared up by moving to a non-copying collector. I'm perfectly fine with that.

Having said that, I do *not* want the non-copying collector to be the default, or even enabled, for any non-production release, the same way that the default build is with no C compiler optimizations. When we cut 1.0.0, or want to do a real benchmark test, it can be turned on, but until then I want it off. The copying collector right now is truly vicious on dangling pointer bugs, and makes parrot die at the drop of a hat when there's a GC/DOD problem. I rather like that.

a), the register frame/context stuff. We've batted around a number of solutions to this, and I think a variant of the "we change interpreter structure on invoke" idea will get us what we need.

If, generally speaking, invoking something gets you a new interpreter structure with most of the contents of the current structure, we can tidy some stuff up and, I think, make things a bit cleaner. As a side effect, a return continuation becomes the interpreter structure itself. If we do this we have five basic 'destination' PMCs:

1) Sub PMC

2) Closure PMC

3) Coroutine PMC

4) Continuation PMC

5) Return continuation

On invocation:

Sub PMC) Allocates a new interpreter structure from the interpreter cache. Copies the calling convention registers into the new interpreter. Copies most of the environment data from the old interpreter into the new one. Copies a few bits of info for the new sub (default namespace pointer and such) from the sub PMC into the new interpreter.

Closure PMC) Pretty much the same as the sub PMC, except that the closure PMC copies its cached lexical pad pointer into the new interpreter rather than starting fresh.

Coroutine PMC) Like the closure PMC, except that it caches the top half of the register sets and copies those in on invocation, and holds two start addresses (the default start and the current start).

Continuation PMC) Allocates a new interpreter and copies the entire cached interpreter into it with one massive memcpy. (Whether the low half of the registers gets copied in is up in the air.) If the current interpreter's not had a continuation taken on it, then it's immediately put on the free interpreter list.

Return Continuation PMC) We just make it the interpreter we use, copying the low half registers of the (now old) interpreter in. The current interpreter's immediately put on the free interpreter list.
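To make the Sub PMC and return continuation cases above concrete, here is a rough C sketch. It is not Parrot source: the Interp layout, the field names, the 32-register set, and the helpers interp_alloc, interp_release, invoke_sub and invoke_return are all hypothetical, and it assumes the calling convention registers are the low half of a single register set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NUM_REGS 32              /* assumed size of one register set */
#define LOW_HALF (NUM_REGS / 2)  /* assumed: low half holds args and returns */

/* Hypothetical, heavily simplified interpreter structure. */
typedef struct Interp {
    long int_regs[NUM_REGS];   /* a single register set, for brevity */
    int security;              /* stands in for "most of the environment data" */
    void *lex_pad;             /* lexical pad pointer */
    void *name_space;          /* per-sub info, e.g. default namespace */
    struct Interp *caller;     /* the caller doubles as the return continuation */
    struct Interp *next_free;  /* link in the free interpreter list */
} Interp;

static Interp *free_list = NULL;

/* Take an interpreter from the cache; fall back to malloc when it is empty. */
static Interp *interp_alloc(void) {
    Interp *in = free_list;
    if (in != NULL) {
        free_list = in->next_free;
    } else {
        in = malloc(sizeof *in);
        assert(in != NULL);
    }
    memset(in, 0, sizeof *in);
    return in;
}

/* Put an interpreter back on the free list for reuse. */
static void interp_release(Interp *in) {
    in->next_free = free_list;
    free_list = in;
}

/* Sub invocation: new interpreter, low-half registers and environment
 * copied from the caller, per-sub info taken from the (hypothetical) sub. */
static Interp *invoke_sub(Interp *cur, void *sub_name_space) {
    Interp *callee = interp_alloc();
    memcpy(callee->int_regs, cur->int_regs,
           LOW_HALF * sizeof cur->int_regs[0]);
    callee->security = cur->security;
    callee->lex_pad = NULL;              /* a plain sub starts fresh */
    callee->name_space = sub_name_space;
    callee->caller = cur;
    return callee;
}

/* Return: the caller becomes current again, the low-half registers carry
 * the return values back, and the callee is recycled immediately. */
static Interp *invoke_return(Interp *cur) {
    Interp *caller = cur->caller;
    memcpy(caller->int_regs, cur->int_regs,
           LOW_HALF * sizeof cur->int_regs[0]);
    interp_release(cur);
    return caller;
}
```

In this sketch a call costs one free-list pop plus a partial register copy, and a return costs one partial register copy plus a free-list push, which is the trade-off being argued for here. A closure would differ only in copying its cached pad pointer into lex_pad instead of storing NULL.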

I think this covers it all. It means that returning from a sub can immediately recycle an interpreter, and needs only a potentially small memcpy to work. Calling a sub does require an allocation off a special queue and some memory copying, but since each sub potentially has a very different environment (lexicals, namespaces, opcode files, bytecode file, security settings, and such) we're going to have to do this regardless of what we want.

We *could*, to cut down on some of this, split out the 'constant across calls' parts, like the counters, memory pools, arenas, and whatnot, into a separate structure that gets accessed indirectly. Depending on how much there was we may or may not see a win, since we're trading off a little extra access time (through a pointer) for a bit less memory copying on sub invocation. It's probably worth it, though.

-- Dan
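A rough C sketch of the 'constant across calls' split mentioned above, under the assumption that the shared parts live in one struct reached through a single pointer. SharedState, Frame, frame_init_from and count_op are hypothetical names for illustration, not Parrot's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: state that does not change from one call to the next. */
typedef struct SharedState {
    unsigned long op_counter;  /* counters */
    void *memory_pools;        /* memory pools */
    void *arenas;              /* arenas */
} SharedState;

/* Hypothetical per-call structure: only what really varies per sub. */
typedef struct Frame {
    long int_regs[32];
    void *lex_pad;
    SharedState *shared;       /* one pointer copied instead of the whole block */
} Frame;

/* Set up a callee frame: the shared part costs a single pointer copy. */
static void frame_init_from(Frame *callee, const Frame *caller) {
    memset(callee, 0, sizeof *callee);
    callee->shared = caller->shared;
}

/* Every access to shared state now pays one extra indirection. */
static void count_op(Frame *f) {
    f->shared->op_counter++;
}
```

The copy on invocation shrinks to one pointer, while every counter or pool access goes through f->shared. Whether that is a net win depends on how large the shared part is and how often it is touched.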

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
