New submission from Mark Shannon <m...@hotpy.org>:
The two plausible layouts for evaluation stack frames are described here:
https://github.com/faster-cpython/ideas/issues/31#issuecomment-844263795

We opted for layout A, although it is a bit more complex to manage and slightly more expensive in terms of pointers. The reason was that it theoretically allows zero-copy Python-to-Python calls.

I now believe this was the wrong decision and that we should have chosen layout B.

B is cheaper. It needs 2 pointers, not 3, meaning that there is another register available for use in the interpreter. Also, the linkage area doesn't need the nlocalsplus field.

The benefit of zero-copy calls is much smaller than I thought:

* Calls from generator functions do not benefit.
* An additional check is needed to make sure that both frames are in the same stack chunk.
* Any jitted code will keep stack values in registers, so stores will still be needed in either case.
* The average number of arguments copied is low (typically 2 or 3).

Even in the ideal case (interpreter, no generator, same stack chunk), changing to layout B will cost 2 or 3 memory moves (independent of each other), but will save us the extra code for checking chunks and one move (moving nlocalsplus). So at best layout A only saves 1 or 2 moves. In other cases layout B is better.

One final improvement to layout B: saving the stackdepth as an offset from locals[0], not from stack[0], further speeds up frame handling.

----------
assignee: Mark.Shannon
messages: 400202
nosy: Mark.Shannon, pablogsal
priority: normal
severity: normal
status: open
title: Change layout of frames back to specials-locals-stack (from locals-specials-stack)

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44990>
_______________________________________
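
As a rough illustration of the move-count argument in the message above, the arithmetic can be written out as a small Python model. This is only a sketch: the function and parameter names are hypothetical and are not CPython APIs. Treating the non-ideal case as a loss of exactly the nlocalsplus move is an assumption of this sketch (the message only says layout B is better there), and the cost of the chunk check is ignored.

```python
# Hypothetical model of the move-count argument; none of these names
# exist in CPython.

def zero_copy_applies(caller_is_generator: bool,
                      same_stack_chunk: bool,
                      caller_is_jitted: bool) -> bool:
    """True only in the 'ideal case': interpreter, no generator, same chunk."""
    return (not caller_is_generator) and same_stack_chunk and (not caller_is_jitted)


def moves_saved_by_layout_a(nargs: int,
                            *,
                            caller_is_generator: bool = False,
                            same_stack_chunk: bool = True,
                            caller_is_jitted: bool = False,
                            nlocalsplus_moves: int = 1) -> int:
    """Net memory moves layout A avoids relative to layout B for one call.

    A negative result means layout A performs more moves than layout B.
    """
    if not zero_copy_applies(caller_is_generator, same_stack_chunk, caller_is_jitted):
        # Assumption: no argument copies are avoided, but layout A still
        # pays for storing nlocalsplus in the linkage area.
        return -nlocalsplus_moves
    # Ideal case: layout B copies `nargs` arguments, layout A copies none
    # but still stores nlocalsplus.
    return nargs - nlocalsplus_moves


if __name__ == "__main__":
    for nargs in (2, 3):  # "typically 2 or 3" arguments per the message
        print(nargs, moves_saved_by_layout_a(nargs))
```

With the typical 2 or 3 arguments, the model yields a best-case saving of 1 or 2 moves for layout A, which matches the figures quoted in the message.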