[Python-ideas] Re: PEP 671 late-bound defaults implementation

Chris Angelico Sun, 31 Oct 2021 05:15:52 -0700

On Sun, Oct 31, 2021 at 6:25 PM Steven D'Aprano <st...@pearwood.info> wrote:
>
> I have a suggestion for the implementation.
>
> I think that Chris' current approach is to compile the late-bound
> defaults in the function body. So if we have a function like this:
>
>     def func(a, b=early_expression, @c=late_expression):
>         block
>
> the function code looks like this (after compilation):
>
>     # Pseudocode
>     if c is unbound:
>         c = late_expression
>     block
>
> (except in bytecode of course).
>
> Chris, do I have that right? If it is wrong, probably everything I say
> next is irrelevant.


Yes, that's correct, along with some additional code that runs when a
function is called to make that possible.

> There is a strange asymmetry to the way the default for b and c are
> handled. For b, it is the interpreter's responsibility to load the
> default value (pre-evaluated and cached) and bind it to the parameter.
> But for c, it is the function object's responsibility.

Kinda. There's a hunk of code that initializes a stack frame, and the
interpreter makes use of both the caller and the target function in
order to map arguments to parameters etc, and then it calls the
function.

To illustrate the process (and note that I'm talking CPython here, so
things might not apply to others), I'll use this function:

def f(init, call):
    def g(a=init(), b=>call()): return a, b
    return g

It can then be called thus:

>>> f(lambda: print("Hello") or 1, lambda: print("world") or 2)
Hello
<function f.<locals>.g at 0x7f9c68943950>
>>> _()
world
(1, 2)

Chronologically, the sequence is:

1) Compilation
2) Function definition - when f() is called
3a) Function initialization - when g() is called
3b) Function invocation - actually executing the body of g()

Most of the time, 3a and 3b would be considered a single action, which
is why I'm not calling them 3 and 4.
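
For anyone who wants to watch that ordering on an interpreter without
the proposed syntax, here is a rough emulation. It assumes a
hypothetical _MISSING sentinel standing in for "no argument supplied"
and an events list in place of print; the actual implementation does
not expose a sentinel like this, so treat it as a sketch only.

```python
events = []           # records when each default expression runs
_MISSING = object()   # hypothetical stand-in for "argument not supplied"

def f(init, call):
    # Step 2: executing this def evaluates the early default, init(), once.
    def g(a=init(), b=_MISSING):
        # Step 3b: the late default is evaluated in the body, on each call.
        if b is _MISSING:
            b = call()
        return a, b
    return g

def init():
    events.append("Hello")
    return 1

def call():
    events.append("world")
    return 2

g = f(init, call)
after_definition = list(events)   # only the early default has run so far
result = g()
after_call = list(events)         # the late default ran during the call
```

The early default is recorded as soon as f() returns, while the late
one only shows up once g() is actually called.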

At compilation time (1), bytecode for both types of function default
will be produced. Both f and g are part of the same compilation unit
(I just did this at the REPL, so it was done with exec, but it could
have been a module, or anything else), and get compiled at basically
the same time. They start out as part of the same AST, and end up as a
logical, coherent execution unit.

When the def statement is executed (2), bytecode for early defaults is
run, and a new function object is created that references the shared
code object, and also has the table of default values (I'm going to
pretend that __defaults__ and __kwdefaults__ together define a single
conceptual thing - a tuple for positional, a dict for kwonly, but
they're together defining the arg defaults) and the extra info that
distinguishes earlies and lates (same again here for
__defaults_extra__ and __kwdefaults_extra__).
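
The shared code object and the per-function default tables can be seen
in current CPython with a small sketch like this one. The names make,
g1 and g2 are made up for illustration, and the __defaults_extra__
attributes are not shown since they exist only in the proposed
implementation.

```python
def make(n):
    # Each call to make() executes the def statement again (step 2).
    def g(a, b=n, *, c=n * 2):
        return a, b, c
    return g

g1 = make(1)
g2 = make(10)

# Both function objects reference the one code object compiled in step 1.
same_code = g1.__code__ is g2.__code__

# The default tables are per function object: a tuple for positional
# parameters and a dict for keyword-only ones.
pos_defaults = (g1.__defaults__, g2.__defaults__)
kw_defaults = (g1.__kwdefaults__, g2.__kwdefaults__)
```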

Function initialization (3a) involves creating a new stack frame and
figuring out what variables it has in it. This means mapping
positional args to pos-only and pos-or-kwd, and keyword args to
pos-or-kwd and kwonly; any parameters that don't yet have values are
populated from the table of defaults. Late-bound defaults are
signalled, but aren't evaluated yet.

Then the function itself begins (3b), and the first thing it does is
to go through all its late-bound defaults, applying any that are
needed.


I would like to be able to have late-bound defaults happen in 3a, but
I don't know of a good way to do so. For one thing, it would mean that
they get processed before a generator pauses - currently, the stack
frame for a generator will pause without a single piece of function
bytecode being executed.
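
The generator behaviour can be checked today with a sentinel-based
emulation. This is only a sketch: _MISSING and countdown are
hypothetical names, and the "if" test in the body stands in for the
late-default code that would be compiled into the function.

```python
log = []
_MISSING = object()   # hypothetical stand-in for "argument not supplied"

def countdown(n=_MISSING):
    # Emulated late-bound default: part of the body, so it cannot run
    # until the generator is first resumed.
    if n is _MISSING:
        log.append("default evaluated")
        n = 3
    while n:
        yield n
        n -= 1

gen = countdown()                  # frame is created, but no body code runs
log_after_creation = list(log)
first = next(gen)                  # body starts; the default is applied now
log_after_first_next = list(log)
```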

Unfortunately, "the interpreter" is responsible for literally every
part of this, so it becomes confusing to talk in those terms. So where
you say "the interpreter" below, I'm going to substitute "stack frame
initialization" and talk about step 3a. Hopefully I'm not
misrepresenting you with this transformation; if I am, please correct
me.

> I'd like to suggest a different approach which I expect will be more
> flexible, I hope won't cost too much in performance, and in my opinion
> much more closely matches the semantics of the feature.
>
> I think it should remain the interpreter's responsibility to set up all
> the parameters before entering the function, including late-bound
> defaults. That will have the big advantage that disassembling func will
> only show the code for "block", not the associated code that tests and
> binds late-bound defaults.
>
> (Just like currently, it doesn't show the code for binding early-bound
> defaults.)

Currently, the code for early-bound defaults can be found by
disassembling the surrounding function. The code to do that isn't part
of step 3a, it's part of step 2. All that happens in 3a is the mapping
from one namespace (the caller's) to the other (the new stack frame).
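
As a small check of where the early-default code lives, the sketch
below looks for a reference to a made-up helper, compute, in the
enclosing function and in the inner one. All names here are
illustrative.

```python
import dis

def compute():
    return 42

def outer():
    # The call to compute() belongs to outer's bytecode (step 2).
    def inner(x=compute()):
        return x
    return inner

inner = outer()

# Global names referenced by each code object.
in_outer = "compute" in outer.__code__.co_names
in_inner = "compute" in inner.__code__.co_names

# Text disassembly of outer only; print(outer_listing) to read it.
outer_listing = dis.Bytecode(outer).dis()
```

The reference to compute shows up in outer's code object and listing,
not in inner's.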

> This suggests that each late-bound expression should be compiled into a
> separate code object, all of which are then squirrelled away in the
> function object (just as the __code__ and __defaults__ currently are).
>
> I imagine the process will be something like:
>
> * set up a new local namespace for the function call
>
> * bind arguments to parameters in that namespace
>
> * bind early-bound defaults from the function __defaults__
>   to parameters
>
> * (NEW) evaluate the appropriate late-bound expression code
>   objects, running them in the local namespace, and binding
>   their results to the parameters;
>
> * enter the function's code block.

Interesting.
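
To make the steps above concrete, here is a rough pure-Python
emulation. It is a sketch under several assumptions, not how either
implementation works: call_with_late_defaults and late_code are
hypothetical names, the late-bound parameter is declared with no
default at all, and the expression is compiled separately with
compile(), so it can only see the parameters and the function's
globals.

```python
import inspect

def call_with_late_defaults(func, late_code, *args, **kwargs):
    """Bind arguments, then early defaults, then late-bound expressions."""
    bound = inspect.signature(func).bind_partial(*args, **kwargs)
    bound.apply_defaults()                     # early-bound defaults
    for name, code in late_code.items():
        if name not in bound.arguments:        # parameter still unbound
            namespace = dict(bound.arguments)  # parameters bound so far
            bound.arguments[name] = eval(code, func.__globals__, namespace)
    return func(*bound.args, **bound.kwargs)   # enter the function's block

def func(a, b=10, *, c):
    return a, b, c

# One code object per late-bound expression, kept outside the function.
late_code = {"c": compile("a + b", "<late default for c>", "eval")}

defaulted = call_with_late_defaults(func, late_code, 1)
explicit = call_with_late_defaults(func, late_code, 1, c=0)
```

The late expression is only evaluated when c was not supplied, and it
can refer to the already-bound a and b.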

> Benefits:
>
> - the function code block is smaller, since it no longer has to
>   include the "test, evaluate, bind" for every late-bound parameter;
>
> - this may improve code locality, which is good for performance
>   (or so I am told);

I'm not sure that cache locality would be materially affected by this,
since that's more of a C level thing, but I'm no expert on that.

> - introspection tools such as dis can disassemble the body of the
>   function independently of the late-bound parameters;
>
> - which means we can inspect the late-bound parameters independently
>   by passing their code object to dis;
>
> - for testing, we can evaluate the expression code objects using
>   eval (maybe?);

Not easily, since they need their proper execution context.

> - we may be able to include the source code to the expression in
>   the expression's own code block, e.g. in the co_name field(?);
>
> - we may be able to replace/modify the defaults' code blocks
>   independently of the main function __code__, e.g. for byte-code
>   hacking, or other function object hacking.
>
> Costs:
>
> - the function object itself may be a little larger.
>
> Thoughts?

Hmm. Interesting, very very interesting.

Let's see. The biggest consequence is that step 3a would now involve
the execution of arbitrary Python code. That may have a LOT of
consequences, particularly for generators. I think I like some of
those consequences, but am not sure if I like them all.

Though - after inspecting the source code, I found a way to execute
arbitrary code during stack frame initialization. Define a subclass of
str, override __eq__, and use an instance as a dict key which gets
unpacked into kwargs, and your __eq__ function will be called to test
whether the key matches each of the parameter names. So maybe that's
not TOO big a problem.
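
A sketch of that trick follows. NoisyKey, target and calls are made-up
names, and whether __eq__ actually gets invoked depends on CPython's
internal keyword-matching fallback, which is an implementation detail
and may differ between versions.

```python
calls = []

class NoisyKey(str):
    # Overriding __eq__ would otherwise make the class unhashable,
    # so hashing is restored to allow use as a dict key.
    __hash__ = str.__hash__

    def __eq__(self, other):
        calls.append(other)            # arbitrary Python code runs here
        return str.__eq__(self, other)

def target(alpha=None, beta=None):
    return alpha, beta

# The key is a str subclass instance, not the interned name "beta", so
# matching it to a parameter may fall back to a real equality test.
result = target(**{NoisyKey("beta"): 2})

# If the fallback comparison was used, `calls` now lists what the key
# was compared against while the frame was being set up.
```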

Problem, though: You still need to compile the initialization code at
the same time as the surrounding context, in order to bind name
references correctly (nonlocals, references to earlier parameters,
etc). That severely restricts any sort of replacement on the function
object (per your second email), since you'd have to compile any
replacements at the same time the rest of the function is compiled.
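
The name-binding issue can be seen with existing Python by comparing a
nested function against an expression compiled on its own. In this
sketch the names outer, inner, secret and replacement are all
illustrative; replacement plays the part of a default expression that
someone tries to swap in after the fact.

```python
def outer():
    secret = "from the enclosing scope"

    def inner():
        # Compiled together with outer, so secret is a closure variable.
        return secret

    # Compiled on its own: the compiler has no idea secret is a nonlocal.
    replacement = compile("secret", "<replacement default>", "eval")
    return inner, replacement

inner, replacement = outer()

inner_free = inner.__code__.co_freevars    # closure variables of inner
replacement_free = replacement.co_freevars # none: no closure was set up
replacement_names = replacement.co_names   # looked up as a plain name

try:
    eval(replacement, {})
    outcome = "evaluated"
except NameError:
    outcome = "NameError"
```

The separately compiled expression treats secret as an ordinary
global-style lookup, so it cannot reach the enclosing function's
variable.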

I'm not sure how much benefit there would be, since arbitrary code
still can't be attached to the defaults. But philosophically, it is an
interesting concept, especially since it would give generators a
logical way to define initialization code.

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/D2MHRFK4RIXLKCNPFYSCXASLBOYQCKYG/
Code of Conduct: http://python.org/psf/codeofconduct/