Hi Nick,

On 27/07/2021 2:29 pm, Nick Coghlan wrote:
(I'm not sure Mailman will get the threading right when I've received
the original email directly, so apologies in advance if this reply
creates a 2nd thread)
On Mon, 26 Jul 2021 at 23:57, Mark Shannon <m...@hotpy.org> wrote:

Hi,

First of all let me say that I agree with the aims of PEP 558 and most
of the design.
I do find the wording of the PEP itself a bit confusing in a couple of
places.

This critique is based on my understanding of the PEP.
If I am mistaken in my misunderstanding, then treat that as an implied
request for clarification of the PEP :)


Critique of PEP 558
===================

Layout of the PEP
-----------------

[This is largely to help the reader, it doesn't change the nature of the
PEP]

Could we replace the section "CPython Implementation Changes" with a
section that states what the behavior will be, not how it changes.
Having to mentally apply the suggested changes to the existing (and
convoluted) implementation makes it hard to work out what the proposed
behavior will be.

Just stating what the new behaviour is isn't sufficient to assess the
impact of the change, though - it needs to be compared to the old
(arcane) behaviour, and I don't want to assume readers already
understand what that behaviour is.

That said, the summary section is intended to describe how things will
work post-change. It just leaves out the details of the new proxy
implementation (which I admit is the most complicated part of the
updated implementation).

Could the design discussion be moved to an appendix or another document?
The key parts of it should be moved to the Rationale or Motivation.

It's effectively an Appendix already (the only things after it are the
pointer to the implementation and the acknoweldgements).

That said, I'll note that several of the questions you've asked in
this email are answered directly in the Design Discussions section
that you're suggesting removing from the PEP.

There should be a Motivation section explaining why this PEP is
necessary, for those not familiar with the weirdness of `locals()`.
Much of what is in the Rationale should perhaps be in the Motivation.

Yes, the Rationale section could be retitled Motivation - it's the
rationale for the PEP existing, not the rationale for the design
decisions (those are in the Design Discussion section).

The historical arcane behaviour at function scope is covered in
https://www.python.org/dev/peps/pep-0558/#historical-semantics-at-function-scope
but the bug references seemed more important for the Motivation
section (if the status quo *worked*, I'd never have tried to work out
how to change it, but with the status quo both arcane *and* broken, it
makes sense to try to figure out a better alternative).


Proposal
--------

[Ditto; this is largely to help the reader, it doesn't change the nature
of the PEP]

Drop the definitions of the type of scope. They are (at least they
should be) clearly defined in existing documentation.

They're not, unfortunately. As far as I am aware,
https://docs.python.org/3/reference/executionmodel.html#naming-and-binding
is as good as we've currently got, so

Why "largely" eliminate the concept of a separate tracing mode? Wasn't
the plan to eliminate it entirely?

The "largely" there just relates to the fact that even though the PEP
eliminates the side effects that tracing mode has historically had on
the behaviour of locals(), the core eval loop still has the notion of
"tracing or not" that turns off some of the execution shortcuts it can
otherwise take.

Rather than specifying what the documentation will be, could you specify
the semantics. The language here should be precise.

The reference documentation should be precise as well, since that is
what other implementations will be following.

What semantics do you feel are left unspecified?

The documentation needs to explain the behavior to the majority of
users, but the PEP should be worded so that it can be implemented
correctly from just reading the PEP.

There is no reason to remove the sugested documentation changes, but
they should be secondary to the more formal specification.

I don't know what you want to see on this front.

The current PEP seems complete to me, except in relation to the
specifics of the proxy behaviour, which it deliberately leaves
underspecified (mostly because my current implementation sucks in a
few areas, and I really don't want to enshrine those limitations in
the language spec).

All the above were mostly just suggestions, the format of the PEP is up to you.

But please, do not underspecify anything. I need to maintain this, as do the PyPy and MicroPython developers. We need this to be precise.


Resolving the issues with tracing mode behaviour
------------------------------------------------

According to this section the `f_locals` object (_PyFastLocalsProxy_Type
in C) will have *two* fields.
It should only have *one*, the pointer to the underlying frame object.

That's a good point - fast_refs doesn't reference the frame object, so
it's safe to store it on the frame object and reference it from the
individual proxies.

Switching to that approach also has the advantage that every proxy
after the first is much cheaper, since they don't need to build the
fast refs map again.

In order to maximize backwards compatibility and, more importantly,
avoid the synchronization issues that lead to obscure bugs, all the
state of the `f_locals` object should be kept on the frame.

Any changes to `f_locals` should be instantly visible to any thread both
through other `f_locals` objects (for the same frame) and the underlying
local variable (if it exists).

[snip proxy method implementation sketch] >
This gets the gist of how the implementation works, but it isn't quite
that simple for a few reasons:

1. The proxy needs to offer the same algorithmic complexity as dict
APIs, so doing O(n) linear searches of C arrays and Python tuples
inside O(1) lookup operations isn't OK (hence the fastrefs mapping to
allow for O(1) resolution of variable names to cell references and
local variable array offsets)

Linear searches are faster for small arrays, and there is nothing stopping you adding a helper map of names->indexes for large functions. Provided the mapping from name to local index is O(1), then the whole operation is O(1).

You really don't need much extra machinery to maintain O(1) behavior (which no one cares about, they just care about overall performance).


2. The proxy needs to deal with *SIX* kinds of possible key lookup, not two:

* unbound local variable (numeric index in fast refs map, NULL in fast
locals slot)
* bound local variable (numeric index in fast refs map, object pointer
in fast locals slot)
* unbound cell variable (cell reference in fast refs map, cell contains NULL)
* bound cell variable (cell reference in fast refs map, cell contains
object pointer)
* not yet converted cell variables (numeric index in fast refs map,
possibly NULL initial value in fast locals slot)
* extra keys injected via a fast locals proxy or PyEval_GetLocals()

There could be a million different cases internally, but that doesn't need to reflected in the API.


3. PyEval_GetLocals() needs to work essentially the same way it used
to (combined mapping containing the results of PyEval_FastToLocals()
along with an extra key/value pairs, any extra key/value pairs written
get includes in the result of future calls to locals(), but attempts
to write to actual local and cell variables will probably get
clobbered on the next refresh from the frame)

The combination of those 3 requirements is what lead me down the path
of using `f_locals` as a stateful cache that could be used both as the
return value for PyEval_GetLocals(), and to satisfy those mapping API
requirements that couldn't easily be met via the fast_refs mapping.

Remove the state from the proxy and (almost) all of your problems go away.


C API changes
-------------

The PEP suggests adding four new functions to the stable API.
Then in the "Changes to the public CPython C API" section, another five
functions are added for a total of nine new functions!

At the Python level, there are two ways to access a locals mapping.
1. locals()
2. frame.f_locals

We only need two C functions, one for each of the above.

No, we don't. My original C API design was along those lines, and it
was changed in early 2020 because it isn't adequate for C extension
authors.

The PEP explains this: Python code knows structurally how locals() is
going to behave, but C extension code has no idea. Therefore, C
extension code needs to be able to request the behaviour it needs, and
let the implementation decide how best to satisfy that.

Why do authors of C code have no idea what is going on, but authors of Python code do? That seems like an odd assumption. The C API has plenty of functions for querying the state of the VM, if that is what is needed.

What exactly are the requirements of C extension authors?


https://www.python.org/dev/peps/pep-0558/#proposing-several-additions-to-the-stable-c-api-abi
summarises the outcome of that previous discussion.

This discussion seems to date from when the design was quite different, is it relevant any more?



`PyEval_GetLocals()` should be equivalent to `locals()`.
It will have to return a new reference. It cannot return a borrowed
reference as it would be returning freed memory.
There is nothing to borrow the reference from, for a function scope.

Yes, which is why it can't do that: that API is already defined as
returning a borrowed reference, and trying to change that would be a
significant backwards compatibility break.

If you return a borrowed reference, you must borrow it from somewhere.
Where are you borrowing this reference from?
If you are borrowing from the frame, then the frame must have a strong reference to the proxy, which creates a cycle.


`PyFrame_GetLocals(PyFrameObject *)` should be equivalent to
`frame.f_locals`.

The PEP doesn't explicitly state why `PyLocals_GetKind()` is needed, but
I believe it is avoid unnecessary copying?
Why is creating an extra copy an issue? It takes a microsecond or so to
create the copy.

As described in the PEP, the query API is for the benefit of C
extensions that want to do something different when PyLocals_Get()
returns something other than a direct reference to the frame's local
namespace.

For example, it may try to get hold of the frame object to retrieve a
read/write proxy from it, and throw a custom error if that isn't
possible (as an implementation may support PyLocals_Get() without
supporting the frame introspection APIs).

If a function that returns a copy of the local namespace is really
needed, then why not offer that functionality directly?
E.g. `PyFrame_GetLocalsCopy()` which is roughly:

      PyObject *locals = PyEval_GetLocals();
      if (current_scope_is_function())
          return locals;
      return PyDict_Copy(locals); // This leaks, need to decref locals

The proposed API does offer that functionality directly:
PyLocals_GetCopy() and PyFrame_GetLocalsCopy(f).

Why two functions, not one?


Reducing the runtime overhead of trace hooks
--------------------------------------------

I'm confused by this part.
Since `_PyFrame_FastToLocals` and friends do not alter the logical state
of the frame object (or anything else), they should no-ops.
It is `PyFrame_GetLocals()` that creates the proxy object, surely.

_PyFrame_FastToLocals() still refreshes the frame cache that is used
as the return value from PyEval_GetLocals(), the same way it always
has.

The frame cache is implicitly kept in sync by operations on frame
proxy objects, but there are still two ways for it to get out of sync:
* code execution inside the frame (this updates the fast locals array,
not the key/value cache stored in the C level f_locals attribute)
* direct manipulation of the frame cache by callers of
PyEval_GetLocals (the frame will ignore any attempts to bind or unbind
variables that way, but it can still throw off proxy operations that
rely on the cache)

Just remove the state in the proxy and nothing will ever be out of sync.


The reason the frame cache still exists in the first place is because
there are a number of mapping APIs where the easiest way to meet the
algorithmic complexity expectations of the dict API is to use an
actual dict:

= Length checking =

The fast_refs mapping (and the frame metadata in general), includes
all variables defined for the frame, regardless of whether they're
currently bound or not.
The correct length for the fast locals proxy mapping only includes the
*currently bound* fast local and cell variables, together with any
extra keys that have been added.
The locals proxy keeps this an O(1) operation by reporting the size of
the frame cache, so it doesn't have to check which names are currently
bound every time.

It may be O(1), but it is incorrect. You state earlier that the cache might be out of date.
It is easy to implement anything in O(1) if it is allowed to be wrong ;)


= Mapping comparisons =

Mapping comparisons have an O(1) early exit if their lengths don't
match, before falling back to a full O(N) key value comparison. The
locals proxy takes advantage of that by delegating comparison
operations to the frame cache rather than reimplementing them.

= Iteration =

Rather than reimplementing forward and reverse iteration and the
keys(), values(), and items() methods, the mapping proxy just
delegates these operations to the frame cache.

Operations on the proxy other than these ones either don't rely on the
cache being up to date at all (because they can use the fast_refs
mapping instead, implicitly updating any affected cache entries along
the way), or else implicitly refresh the cache before relying on it
(e.g. copying the proxy mapping works by refreshing the frame cache,
then copying the cache).

I assume there will be ways to improve the implementation to make
explicit frame cache refreshes less necessary over time (hence the
caveat along those lines in the final paragraph of
https://www.python.org/dev/peps/pep-0558/#continuing-to-support-storing-additional-data-on-optimised-frames
), but the existence of the PyEval_GetLocals() API means that we're
highly unlikely to ever get rid of the frame cache entirely.

That's why PyEval_GetLocals() needs to return a new reference.
This is a chance to fix a fundamentally broken API, let's not waste the opportunity.

Cheers,
Mark.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EAO3VQZIMFEPAFLBH5IEAFCDLTNYFBWY/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to