At 03:56 PM 2/20/05 -0500, Phillip J. Eby wrote:
At 07:00 PM 2/20/05 +0000, Michael Hudson wrote:
Eliminating the blockstack would be nice (esp. if it's enough to get
frames small enough that they get allocated by PyMalloc) but this
seemed to be tricky too (or at least Armin, Samuele and I spent a
cuple of hours yakking about it on IRC and didn't come up with a clear
approach).  Dynamically allocating the blockstack would be simpler,
and might acheive a similar win.  (This is all from memory, I haven't
thought about specifics in a while).

I think I have an idea how to do it in a (relatively) simple fashion; see if you can find a hole in it:


* Change the PyTryBlock struct to include an additional member, 'int b_prev', that refers to the previous block in a chain

* Change the compiler's emission of SETUP_* opcodes, so that instead of a PyTryBlock being added to the blockstack at interpretation time, it's added to the end of a 'co_blktree' block array at compile time, with its 'b_prev' pointing to the current "top" of the block stack. Instead of the SETUP_* argument being the handler offset, have it be the index of the just-added blocktree entry.

* Replace f_blockstack and f_iblock with 'int f_iblktree', and change PyFrame_BlockSetup() to set this equal to the SETUP_* argument, and PyFrame_BlockPop() to use this as an index into the code's co_blktree to retrieve the needed values. PyFrame_BlockPop() would then set f_iblktree equal to the "popped" block's 'b_prev' member, thus "popping" the block from this virtual stack.

(Note, by the way, that the blocktree could actually be created as a post-processing step of the current compilation process, by a loop that scans the bytecode and tracks the current stack and blockstack levels, and then replaces the SETUP_* opcodes' arguments. This might be a simpler option than trying to change the compiler to do it along the way.)

Can anybody see any flaws in this concept? As far as I can tell it just generates all possible block stack states at compile time, but doesn't change block semantics in the least, and it scarcely touches the eval loop. It seems like it could drop the size of frames enough to let them use pymalloc instead of the OS malloc, at the cost of a 16 bytes per block increase in the size of code objects. (And of course the necessary changes to 'marshal' and 'dis' as well as the compiler and eval loop.)

(More precisely, frames whose f_nlocals + f_stacksize is 40 or less, would be 256 bytes or less, and therefore pymalloc-able. However, this should cover all but the most complex functions.)

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to