Cool to see this on python-ideas. I'm really looking forward to this landing, whether as PEP 550 or PEP 521.
On Wednesday, August 16, 2017 at 3:19:29 AM UTC-4, Nathaniel Smith wrote:
>
> On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov <yseliv...@gmail.com> wrote:
> > Hi,
> >
> > Here's the PEP 550 version 2.
>
> Awesome!
>
> Some of the changes from v1 to v2 might be a bit confusing -- in particular the thing where ExecutionContext is now a stack of LocalContext objects instead of just being a mapping. So here's the big picture as I understand it:
>
> In discussions on the mailing list and off-line, we realized that the main reason people use "thread locals" is to implement fake dynamic scoping. Of course, generators/async/await mean that currently it's impossible to *really* fake dynamic scoping in Python -- that's what PEP 550 is trying to fix. So PEP 550 v1 essentially added "generator locals" as a refinement of "thread locals". But... it turns out that "generator locals" aren't enough to properly implement dynamic scoping either! So the goal in PEP 550 v2 is to provide semantics strong enough to *really* get this right.
>
> I wrote up some notes on what I mean by dynamic scoping, and why neither thread-locals nor generator-locals can fake it:
>
> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb
>
> > Specification
> > =============
> >
> > Execution Context is a mechanism of storing and accessing data specific to a logical thread of execution. We consider OS threads, generators, and chains of coroutines (such as ``asyncio.Task``) to be variants of a logical thread.
> >
> > In this specification, we will use the following terminology:
> >
> > * **Local Context**, or LC, is a key/value mapping that stores the context of a logical thread.
>
> If you're more familiar with dynamic scoping, then you can think of an LC as a single dynamic scope...
>
> > * **Execution Context**, or EC, is an OS-thread-specific dynamic stack of Local Contexts.
>
> ...and an EC as a stack of scopes. Looking up a ContextItem in an EC proceeds by checking the first LC (innermost scope), then if it doesn't find what it's looking for it checks the second LC (the next-innermost scope), etc.
>
> > ``ContextItem`` objects have the following methods and attributes:
> >
> > * ``.description``: read-only description;
> >
> > * ``.set(o)`` method: set the value to ``o`` for the context item in the execution context.
> >
> > * ``.get()`` method: return the current EC value for the context item. Context items are initialized with ``None`` when created, so this method call never fails.
>
> Two issues here, that both require some expansion of this API to reveal a *bit* more information about the EC structure.
>
> 1) For trio's cancel scope use case I described in the last thread, I actually need some way to read out all the values on the LocalContext stack. (It would also be helpful if there were some fast way to check the depth of the ExecutionContext stack -- or at least tell whether it's 1 deep or more-than-1 deep. I know that any cancel scopes that are in the bottommost LC will always be attached to the given Task, so I can set up the scope->task mapping once and re-use it indefinitely. OTOH for scopes that are stored in higher LCs, I have to check at every yield whether they're currently in effect. And I want to minimize the per-yield workload as much as possible.)
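To check my understanding of the stack semantics here (and of issue 1), this is the toy model I have in my head -- a minimal sketch where an LC is just a dict and an EC is a list of dicts; none of these names are the PEP's actual API:

    # Toy model of EC lookup: the innermost (topmost) LC wins.
    MISSING = object()

    def ec_lookup(execution_context, key):
        # Walk the stack of local contexts from innermost to outermost.
        for local_context in reversed(execution_context):
            value = local_context.get(key, MISSING)
            if value is not MISSING:
                return value
        return None  # ContextItem.get() defaults to None per the spec

    ec = [{"a": 1}, {"b": 2}, {"a": 3}]   # bottom LC first, top LC last
    assert ec_lookup(ec, "a") == 3        # the top LC shadows the bottom one
    assert ec_lookup(ec, "b") == 2
    assert ec_lookup(ec, "c") is None

Reading out *all* the values on the stack, as trio needs, would then just be the same walk without the early return.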
>
> 2) For classic decimal.localcontext context managers, the idea is still that you save/restore the value, so that you can nest multiple context managers without having to push/pop LCs all the time. But the above API is not actually sufficient to implement a proper save/restore, for a subtle reason: if you do
>
>     ci.set(ci.get())
>
> then you just (potentially) moved the value from a lower LC up to the top LC.

I agree with Nathaniel that this is an issue with the current API. I don't think it's a good idea to have set and get methods. It would be much better to reflect the underlying ExecutionContext *stack* in the API by exposing a mutating *context manager* on the Context Key object instead of set. For example:

    my_context = sys.new_context_key('my_context')
    options = my_context.get()
    options.some_mutating_method()
    with my_context.mutate(options):
        ...  # Do whatever you want with the mutated context
    # Now, the context is reverted.

Similarly, instead of

    my_context.set('spam')

you would do

    with my_context.mutate('spam'):
        ...  # Do whatever you want with the mutated context
    # Now, the context is reverted.

> Here's an example of a case where this can produce user-visible effects:
>
> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py
>
> There are probably a bunch of options for fixing this. But basically we need some API that makes it possible to temporarily set a value in the top LC, and then restore that value to what it was before (either the previous value, or 'unset' to unshadow a value in a lower LC). One simple option would be to make the idiom be something like:
>
> @contextmanager
> def local_value(new_value):
>     state = ci.get_local_state()
>     ci.set(new_value)
>     try:
>         yield
>     finally:
>         ci.set_local_state(state)
>
> where 'state' is something like a tuple (ci in EC[-1], EC[-1].get(ci)). A downside with this is that it's a bit error-prone (very easy for an unwary user to accidentally use get/set instead of get_local_state/set_local_state). But I'm sure we can come up with something.
>
> > Manual Context Management
> > -------------------------
> >
> > Execution Context is generally managed by the Python interpreter, but sometimes it is desirable for the user to take control over it. A few examples when this is needed:
> >
> > * running a computation in ``concurrent.futures.ThreadPoolExecutor`` with the current EC;
> >
> > * reimplementing generators with iterators (more on that later);
> >
> > * managing contexts in asynchronous frameworks (implement proper EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)
> >
> > For these purposes we add a set of new APIs (they will be used in later sections of this specification):
> >
> > * ``sys.new_local_context()``: create an empty ``LocalContext`` object.
> >
> > * ``sys.new_execution_context()``: create an empty ``ExecutionContext`` object.
> >
> > * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque to Python code, and there are no APIs to modify them.
> >
> > * ``sys.get_execution_context()`` function. The function returns a copy of the current EC: an ``ExecutionContext`` instance.
>
> If there are enough of these functions then it might make sense to stick them in their own module instead of adding more stuff to sys. I guess worrying about that can wait until the API details are more firm though.
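For what it's worth, here is roughly how I picture the ThreadPoolExecutor bullet working with these functions. This is only a sketch: I'm assuming a runner along the lines of sys.run_with_execution_context(ec, func, *args), which the excerpt above doesn't spell out:

    import sys
    from concurrent.futures import ThreadPoolExecutor

    def submit_with_context(executor, fn, *args):
        # Capture a copy of the caller's EC in the submitting thread...
        ec = sys.get_execution_context()
        # ...and have the worker thread run fn under that captured EC.
        # (sys.run_with_execution_context is my assumed spelling.)
        return executor.submit(sys.run_with_execution_context, ec, fn, *args)

    with ThreadPoolExecutor() as pool:
        future = submit_with_context(pool, print, "runs under the caller's EC")

The point is just that the capture happens at submit time, not when the worker thread gets around to running the function.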
>
> > * If ``coro.cr_local_context`` is an empty ``LocalContext`` object that ``coro`` was created with, the interpreter will set ``coro.cr_local_context`` to ``None``.
>
> I like all the ideas in this section, but this specific point feels a bit weird. Coroutine objects need a second hidden field somewhere to keep track of whether the object they end up with is the same one they were created with?
>
> If I set cr_local_context to something else, and then set it back to the original value, does that trigger the magic await behavior or not? What if I take the initial LocalContext off of one coroutine and attach it to another, does that trigger the magic await behavior?
>
> Maybe it would make more sense to have two sentinel values: UNINITIALIZED and INHERIT?
>
> > To enable correct Execution Context propagation into Tasks, the asynchronous framework needs to assist the interpreter:
> >
> > * When ``create_task`` is called, it should capture the current execution context with ``sys.get_execution_context()`` and save it on the Task object.
>
> I wonder if it would be useful to have an option to squash this execution context down into a single LocalContext, since we know we'll be using it for a while and once we've copied an ExecutionContext it becomes impossible to tell the difference between one that has lots of internal LocalContexts and one that doesn't. This could also be handy for trio/curio's semantics where they initialize a new task's context to be a shallow copy of the parent task: you could do
>
>     new_task_coro.cr_local_context = get_current_context().squash()
>
> and then skip having to wrap every send() call in a run_in_context.
>
> > Generators
> > ----------
> >
> > Generators in Python, while similar to Coroutines, are used in a fundamentally different way. They are producers of data, and they use ``yield`` expression to suspend/resume their execution.
> >
> > A crucial difference between ``await coro`` and ``yield value`` is that the former expression guarantees that the ``coro`` will be executed fully, while the latter is producing ``value`` and suspending the generator until it gets iterated again.
> >
> > Generators, similarly to coroutines, have a ``gi_local_context`` attribute, which is set to an empty Local Context when created.
> >
> > Contrary to coroutines though, ``yield from o`` expression in generators (that are not generator-based coroutines) is semantically equivalent to ``for v in o: yield v``, therefore the interpreter does not attempt to control their ``gi_local_context``.
>
> Hmm. I assume you're simplifying for expository purposes, but 'yield from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says: "Motivation: [...] a piece of code containing a yield cannot be factored out and put into a separate function in the same way as other code. [...] If yielding of values is the only concern, this can be performed without much difficulty using a loop such as 'for v in g: yield v'. However, if the subgenerator is to interact properly with the caller in the case of calls to send(), throw() and close(), things become considerably more difficult. As will be seen later, the necessary code is very complicated, and it is tricky to handle all the corner cases correctly."
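For anyone following along, the difference is easy to demonstrate: send() values are forwarded through 'yield from' but not through a plain loop. This runs today, no PEP required:

    def inner():
        x = yield "ready"
        yield f"inner got {x!r}"

    def outer_loop():
        for v in inner():   # send() values are NOT forwarded to inner()
            yield v

    def outer_delegate():
        yield from inner()  # send() values ARE forwarded to inner()

    g = outer_delegate()
    next(g)
    print(g.send(42))       # -> inner got 42

    g = outer_loop()
    next(g)
    print(g.send(42))       # -> inner got None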
>
> So it seems to me that the whole idea of 'yield from' is that it's supposed to handle all the tricky bits needed to guarantee that if you take some code out of a generator and refactor it into a subgenerator, then everything works the same as before. This suggests that 'yield from' should do the same magic as 'await', where by default the subgenerator shares the same LocalContext as the parent generator. (And as a bonus it makes things simpler if 'yield from' and 'await' work the same.)
>
> > Asynchronous Generators
> > -----------------------
> >
> > Asynchronous Generators (AG) interact with the Execution Context similarly to regular generators.
> >
> > They have an ``ag_local_context`` attribute, which, similarly to regular generators, can be set to ``None`` to make them use the outer Local Context. This is used by the new ``contextlib.asynccontextmanager`` decorator.
> >
> > The EC support of ``await`` expression is implemented using the same approach as in coroutines, see the `Coroutine Object Modifications`_ section.
>
> You showed how to make an iterator that acts like a generator. Is it also possible to make an async iterator that acts like an async generator? It's not immediately obvious, because you need to make sure that the local context gets restored each time you re-enter the __anext__ generator. I think it's something like:
>
> class AIter:
>     def __init__(self):
>         self._local_context = ...
>
>     # Note: intentionally not async
>     def __anext__(self):
>         coro = self._real_anext()
>         coro.cr_local_context = self._local_context
>         return coro
>
>     async def _real_anext(self):
>         ...
>
> Does that look right?
>
> > ContextItem.get() Cache
> > -----------------------
> >
> > We can add three new fields to ``PyThreadState`` and ``PyInterpreterState`` structs:
> >
> > * ``uint64_t PyThreadState->unique_id``: a globally unique thread state identifier (we can add a counter to ``PyInterpreterState`` and increment it when a new thread state is created.)
> >
> > * ``uint64_t PyInterpreterState->context_item_deallocs``: every time a ``ContextItem`` is GCed, all Execution Contexts in all threads will lose track of it. ``context_item_deallocs`` will simply count all ``ContextItem`` deallocations.
> >
> > * ``uint64_t PyThreadState->execution_context_ver``: every time a new item is set, or an existing item is updated, or the stack of execution contexts is changed in the thread, we increment this counter.
>
> I think this can be refined further (and I don't understand context_item_deallocs -- maybe it's a mistake?). AFAICT the things that invalidate a ContextItem's cache are:
>
> 1) switching threadstates
> 2) popping or pushing a non-empty LocalContext off the current threadstate's ExecutionContext
> 3) calling ContextItem.set() on *that* context item
>
> So I'd suggest tracking the thread state id, a counter of how many non-empty LocalContexts have been pushed/popped on this thread state, and a *per ContextItem* counter of how many times set() has been called.
>
> > Backwards Compatibility
> > =======================
> >
> > This proposal preserves 100% backwards compatibility.
>
> While this is mostly true in the strict sense, in practice this PEP is useless if existing thread-local users like decimal and numpy can't migrate to it without breaking backcompat. So maybe this section should discuss that?
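Agreed that this deserves discussion in the PEP. Here's my guess at what decimal's legacy functions would look like on top of it (reusing the new_context_key spelling from my example above; none of this is from the PEP). Note that setcontext really does need a set()-style unscoped write:

    import sys
    from decimal import Context

    _decimal_context = sys.new_context_key('decimal_context')

    def setcontext(ctx):
        # Legacy decimal.setcontext: an unscoped write, so it can't be
        # expressed as pure push/pop of an LC.
        _decimal_context.set(ctx)

    def getcontext():
        # Legacy decimal.getcontext: create a default context on first use.
        ctx = _decimal_context.get()
        if ctx is None:
            ctx = Context()
            _decimal_context.set(ctx)
        return ctx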
>
> (For example, one constraint on the design is that we can't provide only a pure push/pop API, even though that's what would be most convenient for context managers like decimal.localcontext or numpy.errstate, because we also need to provide some backcompat story for legacy functions like decimal.setcontext and numpy.seterr.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
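One more thought, circling back to the ContextItem.get() cache section: with Nathaniel's refinement, I picture the validity check looking roughly like this. All of the field and helper names below are invented for illustration; nothing here is the PEP's actual C-level layout:

    class ContextItem:
        def __init__(self):
            self._set_counter = 0    # bumped on every self.set()
            self._cache = None       # (ts_id, stack_ver, set_ver, value)

        def get(self):
            ts = _current_thread_state()  # hypothetical accessor
            if self._cache is not None:
                ts_id, stack_ver, set_ver, value = self._cache
                if (ts_id == ts.unique_id
                        and stack_ver == ts.nonempty_lc_push_pop_counter
                        and set_ver == self._set_counter):
                    return value     # fast path: cache is still valid
            value = _ec_lookup(ts, self)  # slow path: walk the LC stack
            self._cache = (ts.unique_id, ts.nonempty_lc_push_pop_counter,
                           self._set_counter, value)
            return value

That keeps the per-get() cost at three integer comparisons in the common case, which seems to be what the PEP is after.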