"Phillip J. Eby" <[EMAIL PROTECTED]> writes:
> At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote:
>>Where are the attempts to speed up function/method calls? That's an
>>area where we could *really* use a breakthrough...
>
> Amen!
>
> So what happened to Armin's pre-allocated frame patch? Did that get into 2.4?
No, because it slows down recursive function calls, or functions that happen to be called at the same time in different threads. Fixing *that* would require things like code-specific frame free-lists, and that's getting a bit convoluted and might waste quite a lot of memory.
Ah. I thought it was just going to fall back to the normal case if the pre-allocated frame wasn't available (i.e., didn't have a refcount of 1).
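For concreteness, that fallback might look something like this (a rough sketch of the idea only, not Armin's actual patch; co_cachedframe is an invented field):

    /* Sketch of the "reuse a pre-allocated frame" idea being discussed, not
       the real patch.  co_cachedframe is invented for illustration; the slow
       path kicks in when the cached frame is still alive (refcount > 1),
       e.g. during recursion or a concurrent call from another thread. */
    static PyFrameObject *
    get_frame(PyCodeObject *co, PyObject *globals, PyObject *locals,
              PyThreadState *tstate)
    {
        PyFrameObject *f = co->co_cachedframe;      /* invented field */
        if (f != NULL && ((PyObject *)f)->ob_refcnt == 1) {
            Py_INCREF(f);
            /* ...reset f_lasti, f_stacktop, clear stale locals, etc... */
            return f;
        }
        /* Recursive or concurrent call: fall back to a normal allocation. */
        return PyFrame_New(tstate, co, globals, locals);
    }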
Eliminating the blockstack would be nice (esp. if it's enough to get frames small enough that they get allocated by PyMalloc) but this seemed to be tricky too (or at least Armin, Samuele and I spent a couple of hours yakking about it on IRC and didn't come up with a clear approach). Dynamically allocating the blockstack would be simpler, and might achieve a similar win. (This is all from memory, I haven't thought about specifics in a while).
I'm not very familiar with the operation of the block stack, but why does it need to be a stack? For exception handling purposes, wouldn't it suffice to know the offset of the current handler, and have an opcode to set the current handler location? And for "for" loops, couldn't an anonymous local be used to hold the loop iterator instead of using a stack variable?
Hm, actually I think I see the answer; in the case of module-level code there can be no "anonymous local variables" the way there can in functions. Hmm. I guess you'd need to also have a "reset stack to level X" opcode, then, and both it and the set-handler opcode would have to be placed at every destination of a jump that crosses block boundaries. It's not clear how big a win that is, due to the added opcodes even on non-error paths.
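For concreteness, those two hypothetical opcodes might look roughly like this in the eval loop (SET_HANDLER, RESET_STACK and the f_handler slot are invented names, nothing like this exists in ceval.c):

    case SET_HANDLER:
        /* oparg: bytecode offset of the active except/finally clause, or a
           sentinel meaning "no handler".  Replaces SETUP_EXCEPT's push onto
           the block stack. */
        f->f_handler = oparg;               /* invented field */
        continue;

    case RESET_STACK:
        /* oparg: the value-stack depth this point in the code expects.
           Emitted at every jump target that crosses a block boundary,
           replacing the b_level that gets popped from the block stack
           today. */
        while (STACK_LEVEL() > oparg)
            Py_DECREF(POP());
        continue;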
Hey, wait a minute... all the block stack data is static, isn't it? I mean, the contents of the block stack at any point in a code string could be determined statically, by examination of the bytecode, couldn't it? If that's the case, then perhaps we could design a pre-computed data structure similar to co_lnotab that would be used by the evaluator in place of the blockstack.
Of course, I may be talking through my hat here, as I have very little experience with how the blockstack works. However, if this idea makes sense, then perhaps it could actually speed up non-error paths as well (except perhaps for the 'return' statement), at the cost of a larger code structure and compiler complexity. But, if it also means that frames can be allocated faster (e.g. via pymalloc), it might be worth it, just like getting rid of SET_LINENO turned out to be a net win.
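To make that a bit more concrete, such a table might look roughly like this (the name co_blocktab and the layout are invented for illustration; nothing like this exists today):

    /* Purely hypothetical: a compile-time table mapping bytecode offsets to
       the block context that currently lives on the runtime block stack. */
    typedef struct {
        int bt_start;      /* first bytecode offset covered by this entry */
        int bt_end;        /* last bytecode offset covered by this entry */
        int bt_type;       /* SETUP_LOOP / SETUP_EXCEPT / SETUP_FINALLY */
        int bt_handler;    /* where an unwind should jump */
        int bt_level;      /* value-stack depth to restore */
    } PyBlockTableEntry;

    /* On an exception, instead of popping f_blockstack, the eval loop would
     * scan co->co_blocktab for the innermost entry whose [bt_start, bt_end]
     * range contains f_lasti, and unwind using bt_handler / bt_level.
     */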
All of it, in easy cases. ISTR that the fast path could be a little wider -- it bails when the called function has default arguments, but I think this case could be handled easily enough.
When it has *any* default arguments, or only when it doesn't have values to supply for them?
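For reference, the condition guarding the fast path in ceval.c's fast_function() is roughly the following (paraphrased from memory, so the details may be off); note that argdefs != NULL alone is enough to force the slow path, whether or not any default values are actually needed:

    /* Paraphrased from memory of 2.4's fast_function(); exact names and
       flags may differ.  A function with *any* default arguments has
       argdefs != NULL and never takes the fast path, even when the caller
       supplies every argument explicitly. */
    if (argdefs == NULL && nk == 0 && n == co->co_argcount &&
        co->co_flags == (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE)) {
        /* fast path: copy the n positional args straight into
           f_localsplus and evaluate the frame */
    }
    else {
        /* general path: PyEval_EvalCodeEx() sorts out keywords, defaults,
           *args/**kwargs and closures */
    }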
Why are frames so big?
Because there are CO_MAXBLOCKS * 12 bytes in there for the block stack. If there was no need for that, frames could perhaps be allocated via pymalloc. They only have around 100 bytes or so in them, apart from the blockstack and locals/value stack.
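For anyone following along, the relevant bits of the frame layout are roughly this (from memory of the 2.4-era headers, so take the exact numbers with a grain of salt):

    #define CO_MAXBLOCKS 20          /* static limit on nested blocks */

    typedef struct {
        int b_type;                  /* SETUP_LOOP, SETUP_EXCEPT or SETUP_FINALLY */
        int b_handler;               /* bytecode offset to jump to on unwind */
        int b_level;                 /* value-stack depth to pop back to */
    } PyTryBlock;                    /* 3 * 4 = 12 bytes on typical platforms */

    /* Inside PyFrameObject:
     *     int        f_iblock;                    -- index of next free slot
     *     PyTryBlock f_blockstack[CO_MAXBLOCKS];  -- 20 * 12 = 240 bytes
     * which is the bulk of the fixed per-frame overhead being discussed.
     */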
> Do we need a tp_callmethod that takes an argument array, length, and
> keywords, so that we can skip instancemethod allocation in the
> common case of calling a method directly?
Hmm, didn't think of that, and I don't think it's how the CALL_ATTR attempt worked. I presume it would need to take a method name too :)
Er, yeah, I thought that was obvious. :)
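Something along these lines, presumably (purely a sketch; no such slot exists in the type object today, and the signature is invented):

    /* Hypothetical tp_callmethod slot: takes the method name plus an argument
       array, so obj.meth(a, b) could be dispatched without first building a
       bound instancemethod object. */
    typedef PyObject *(*callmethodfunc)(PyObject *self,
                                        PyObject *name,    /* method name */
                                        PyObject **args,   /* positional args */
                                        int nargs,
                                        PyObject *kwds);   /* or NULL */

    /* A CALL_ATTR-style opcode could then do, roughly:
     *     if (obj->ob_type->tp_callmethod != NULL)
     *         res = obj->ob_type->tp_callmethod(obj, name, stack, n, NULL);
     *     else
     *         ... fall back to the usual LOAD_ATTR + CALL_FUNCTION path ...
     */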