Re: [Python-ideas] PEP 550 v2

Jelle Zijlstra Tue, 15 Aug 2017 23:54:15 -0700

2017-08-16 1:55 GMT+02:00 Yury Selivanov <yselivanov...@gmail.com>:

> Hi,
>
> Here's the PEP 550 version 2.  Thanks to a very active and insightful
> discussion here on Python-ideas, we've discovered a number of
> problems with the first version of the PEP.  This version is a complete
> rewrite (only Abstract, Rationale, and Goals sections were not updated).
>
> The updated PEP is live on python.org:
> https://www.python.org/dev/peps/pep-0550/
>
> There is no reference implementation at this point, but I'm confident
> that this version of the spec will have the same extremely low
> runtime overhead as the first version.  Thanks to the new ContextItem
> design, accessing values in the context is even faster now.
>
> Thank you!
>
>
> PEP: 550
> Title: Execution Context
> Version: $Revision$
> Last-Modified: $Date$
> Author: Yury Selivanov <y...@magic.io>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 11-Aug-2017
> Python-Version: 3.7
> Post-History: 11-Aug-2017, 15-Aug-2017
>
>
> Abstract
> ========
>
> This PEP proposes a new mechanism to manage execution state--the
> logical environment in which a function, a thread, a generator,
> or a coroutine executes in.
>
> A few examples of where having a reliable state storage is required:
>
> * Context managers like decimal contexts, ``numpy.errstate``,
>   and ``warnings.catch_warnings``;
>
> * Storing request-related data such as security tokens and request
>   data in web applications, implementing i18n;
>
> * Profiling, tracing, and logging in complex and large code bases.
>
> The usual solution for storing state is to use a Thread-local Storage
> (TLS), implemented in the standard library as ``threading.local()``.
> Unfortunately, TLS does not work for the purpose of state isolation
> for generators or asynchronous code, because such code executes
> concurrently in a single thread.
>
>
> Rationale
> =========
>
> Traditionally, a Thread-local Storage (TLS) is used for storing the
> state.  However, the major flaw of using the TLS is that it works only
> for multi-threaded code.  It is not possible to reliably contain the
> state within a generator or a coroutine.  For example, consider
> the following generator::
>
>     def calculate(precision, ...):
>         with decimal.localcontext() as ctx:
>             # Set the precision for decimal calculations
>             # inside this block
>             ctx.prec = precision
>
>             yield calculate_something()
>             yield calculate_something_else()
>
> Decimal context is using a TLS to store the state, and because TLS is
> not aware of generators, the state can leak.  If a user iterates over
> the ``calculate()`` generator with different precisions one by one
> using a ``zip()`` built-in, the above code will not work correctly.
> For example::
>
>     g1 = calculate(precision=100)
>     g2 = calculate(precision=50)
>
>     items = list(zip(g1, g2))
>
>     # items[0] will be a tuple of:
>     #   first value from g1 calculated with 100 precision,
>     #   first value from g2 calculated with 50 precision.
>     #
>     # items[1] will be a tuple of:
>     #   second value from g1 calculated with 50 precision (!!!),
>     #   second value from g2 calculated with 50 precision.
>
> An even scarier example would be using decimals to represent money
> in an async/await application: decimal calculations can suddenly
> lose precision in the middle of processing a request.  Currently,
> bugs like this are extremely hard to find and fix.
>
> Another common need for web applications is to have access to the
> current request object, or security context, or, simply, the request
> URL for logging or submitting performance tracing data::
>
>     async def handle_http_request(request):
>         context.current_http_request = request
>
>         await ...
>         # Invoke your framework code, render templates,
>         # make DB queries, etc, and use the global
>         # 'current_http_request' in that code.
>
>         # This isn't currently possible to do reliably
>         # in asyncio out of the box.
>
> These examples are just a few out of many, where a reliable way to
> store context data is absolutely needed.
>
> The inability to use TLS for asynchronous code has lead to
> proliferation of ad-hoc solutions, which are limited in scope and
> do not support all required use cases.
>
> Current status quo is that any library, including the standard
> library, that uses a TLS, will likely not work as expected in
> asynchronous code or with generators (see [3]_ as an example issue.)
>
> Some languages that have coroutines or generators recommend to
> manually pass a ``context`` object to every function, see [1]_
> describing the pattern for Go.  This approach, however, has limited
> use for Python, where we have a huge ecosystem that was built to work
> with a TLS-like context.  Moreover, passing the context explicitly
> does not work at all for libraries like ``decimal`` or ``numpy``,
> which use operator overloading.
>
> .NET runtime, which has support for async/await, has a generic
> solution of this problem, called ``ExecutionContext`` (see [2]_).
> On the surface, working with it is very similar to working with a TLS,
> but the former explicitly supports asynchronous code.
>
>
> Goals
> =====
>
> The goal of this PEP is to provide a more reliable alternative to
> ``threading.local()``.  It should be explicitly designed to work with
> Python execution model, equally supporting threads, generators, and
> coroutines.
>
> An acceptable solution for Python should meet the following
> requirements:
>
> * Transparent support for code executing in threads, coroutines,
>   and generators with an easy to use API.
>
> * Negligible impact on the performance of the existing code or the
>   code that will be using the new mechanism.
>
> * Fast C API for packages like ``decimal`` and ``numpy``.
>
> Explicit is still better than implicit, hence the new APIs should only
> be used when there is no acceptable way of passing the state
> explicitly.
>
>
> Specification
> =============
>
> Execution Context is a mechanism of storing and accessing data specific
> to a logical thread of execution.  We consider OS threads,
> generators, and chains of coroutines (such as ``asyncio.Task``)
> to be variants of a logical thread.
>
> In this specification, we will use the following terminology:
>
> * **Local Context**, or LC, is a key/value mapping that stores the
>   context of a logical thread.
>
> * **Execution Context**, or EC, is an OS-thread-specific dynamic
>   stack of Local Contexts.
>
> * **Context Item**, or CI, is an object used to set and get values
>   from the Execution Context.
>
> Please note that throughout the specification we use simple
> pseudo-code to illustrate how the EC machinery works.  The actual
> algorithms and data structures that we will use to implement the PEP
> are discussed in the `Implementation Strategy`_ section.
>
>
> Context Item Object
> -------------------
>
> The ``sys.new_context_item(description)`` function creates a
> new ``ContextItem`` object.  The ``description`` parameter is a
> ``str``, explaining the nature of the context key for introspection
> and debugging purposes.
>
> ``ContextItem`` objects have the following methods and attributes:
>
> * ``.description``: read-only description;
>
> * ``.set(o)`` method: set the value to ``o`` for the context item
>   in the execution context.
>
> * ``.get()`` method: return the current EC value for the context item.
>   Context items are initialized with ``None`` when created, so
>   this method call never fails.
>
> The below is an example of how context items can be used::
>
>     my_context = sys.new_context_item(description='mylib.context')
>     my_context.set('spam')
>


Minor suggestion: Could we allow something like
`sys.set_new_context_item(description='mylib.context',
initial_value='spam')`? That would make it easier for type checkers to
infer the type of a ContextItem, and it would save a line of code in the
common case.

With this modification, the type of new_context_item would be

@overload
def new_context_item(*, description: str, initial_value: T) ->
ContextItem[T]: ...
@overload
def new_context_item(*, description: str) -> ContextItem[Any]: ...

If we only allow the second variant, type checkers would need some sort of
special casing to figure out that after .set(), .get() will return the same
type.


>     # Later, to access the value of my_context:
>     print(my_context.get())
>
>
> Thread State and Multi-threaded code
> ------------------------------------
>
> Execution Context is implemented on top of Thread-local Storage.
> For every thread there is a separate stack of Local Contexts --
> mappings of ``ContextItem`` objects to their values in the LC.
> New threads always start with an empty EC.
>
> For CPython::
>
>     PyThreadState:
>         execution_context: ExecutionContext([
>             LocalContext({ci1: val1, ci2: val2, ...}),
>             ...
>         ])
>
> The ``ContextItem.get()`` and ``.set()`` methods are defined as
> follows (in pseudo-code)::
>
>     class ContextItem:
>
>         def get(self):
>             tstate = PyThreadState_Get()
>
>             for local_context in reversed(tstate.execution_context):
>                 if self in local_context:
>                     return local_context[self]
>
>         def set(self, value):
>             tstate = PyThreadState_Get()
>
>             if not tstate.execution_context:
>                 tstate.execution_context = [LocalContext()]
>
>             tstate.execution_context[-1][self] = value
>
> With the semantics defined so far, the Execution Context can already
> be used as an alternative to ``threading.local()``::
>
>     def print_foo():
>         print(ci.get() or 'nothing')
>
>     ci = sys.new_context_item(description='test')
>     ci.set('foo')
>
>     # Will print "foo":
>     print_foo()
>
>     # Will print "nothing":
>     threading.Thread(target=print_foo).start()
>
>
> Manual Context Management
> -------------------------
>
> Execution Context is generally managed by the Python interpreter,
> but sometimes it is desirable for the user to take the control
> over it.  A few examples when this is needed:
>
> * running a computation in ``concurrent.futures.ThreadPoolExecutor``
>   with the current EC;
>
> * reimplementing generators with iterators (more on that later);
>
> * managing contexts in asynchronous frameworks (implement proper
>   EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)
>
> For these purposes we add a set of new APIs (they will be used in
> later sections of this specification):
>
> * ``sys.new_local_context()``: create an empty ``LocalContext``
>   object.
>
> * ``sys.new_execution_context()``: create an empty
>   ``ExecutionContext`` object.
>
> * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque
>   to Python code, and there are no APIs to modify them.
>
> * ``sys.get_execution_context()`` function.  The function returns a
>   copy of the current EC: an ``ExecutionContext`` instance.
>
>   The runtime complexity of the actual implementation of this function
>   can be O(1), but for the purposes of this section it is equivalent
>   to::
>
>     def get_execution_context():
>         tstate = PyThreadState_Get()
>         return copy(tstate.execution_context)
>
> * ``sys.run_with_execution_context(ec: ExecutionContext, func, *args,
>   **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution
>   context::
>
>     def run_with_execution_context(ec, func, *args, **kwargs):
>         tstate = PyThreadState_Get()
>
>         old_ec = tstate.execution_context
>
>         tstate.execution_context = ExecutionContext(
>             ec.local_contexts + [LocalContext()]
>         )
>
>         try:
>             return func(*args, **kwargs)
>         finally:
>             tstate.execution_context = old_ec
>
>   Any changes to Local Context by ``func`` will be ignored.
>   This allows to reuse one ``ExecutionContext`` object for multiple
>   invocations of different functions, without them being able to
>   affect each other's environment::
>
>       ci = sys.new_context_item('example')
>       ci.set('spam')
>
>       def func():
>           print(ci.get())
>           ci.set('ham')
>
>       ec = sys.get_execution_context()
>
>       sys.run_with_execution_context(ec, func)
>       sys.run_with_execution_context(ec, func)
>
>       # Will print:
>       #   spam
>       #   spam
>
> * ``sys.run_with_local_context(lc: LocalContext, func, *args,
>   **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution
>   context using the specified local context.
>
>   Any changes that ``func`` does to the local context will be
>   persisted in ``lc``.  This behaviour is different from the
>   ``run_with_execution_context()`` function, which always creates
>   a new throw-away local context.
>
>   In pseudo-code::
>
>     def run_with_local_context(lc, func, *args, **kwargs):
>         tstate = PyThreadState_Get()
>
>         old_ec = tstate.execution_context
>
>         tstate.execution_context = ExecutionContext(
>             old_ec.local_contexts + [lc]
>         )
>
>         try:
>             return func(*args, **kwargs)
>         finally:
>             tstate.execution_context = old_ec
>
>   Using the previous example::
>
>       ci = sys.new_context_item('example')
>       ci.set('spam')
>
>       def func():
>           print(ci.get())
>           ci.set('ham')
>
>       ec = sys.get_execution_context()
>       lc = sys.new_local_context()
>
>       sys.run_with_local_context(lc, func)
>       sys.run_with_local_context(lc, func)
>
>       # Will print:
>       #   spam
>       #   ham
>
> As an example, let's make a subclass of
> ``concurrent.futures.ThreadPoolExecutor`` that preserves the execution
> context for scheduled functions::
>
>     class Executor(concurrent.futures.ThreadPoolExecutor):
>
>         def submit(self, fn, *args, **kwargs):
>             context = sys.get_execution_context()
>
>             fn = functools.partial(
>                 sys.run_with_execution_context, context,
>                 fn, *args, **kwargs)
>
>             return super().submit(fn)
>
>
> EC Semantics for Coroutines
> ---------------------------
>
> Python :pep:`492` coroutines are used to implement cooperative
> multitasking.  For a Python end-user they are similar to threads,
> especially when it comes to sharing resources or modifying
> the global state.
>
> An event loop is needed to schedule coroutines.  Coroutines that
> are explicitly scheduled by the user are usually called Tasks.
> When a coroutine is scheduled, it can schedule other coroutines using
> an ``await`` expression.  In async/await world, awaiting a coroutine
> is equivalent to a regular function call in synchronous code.  Thus,
> Tasks are similar to threads.
>
> By drawing a parallel between regular multithreaded code and
> async/await, it becomes apparent that any modification of the
> execution context within one Task should be visible to all coroutines
> scheduled within it.  Any execution context modifications, however,
> must not be visible to other Tasks executing within the same OS
> thread.
>
>
> Coroutine Object Modifications
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> To achieve this, a small set of modifications to the coroutine object
> is needed:
>
> * New ``cr_local_context`` attribute.  This attribute is readable
>   and writable for Python code.
>
> * When a coroutine object is instantiated, its ``cr_local_context``
>   is initialized with an empty Local Context.
>
> * Coroutine's ``.send()`` and ``.throw()`` methods are modified as
>   follows (in pseudo-C)::
>
>     if coro.cr_local_context is not None:
>         tstate = PyThreadState_Get()
>
>         tstate.execution_context.push(coro.cr_local_context)
>
>         try:
>             # Perform the actual `Coroutine.send()` or
>             # `Coroutine.throw()` call.
>             return coro.send(...)
>         finally:
>             coro.cr_local_context = tstate.execution_context.pop()
>     else:
>         # Perform the actual `Coroutine.send()` or
>         # `Coroutine.throw()` call.
>         return coro.send(...)
>
> * When Python interpreter sees an ``await`` instruction, it inspects
>   the ``cr_local_context`` attribute of the coroutine that is about
>   to be awaited.  For ``await coro``:
>
>   * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
>     that ``coro`` was created with, the interpreter will set
>     ``coro.cr_local_context`` to ``None``.
>
>   * If ``coro.cr_local_context`` was modified by Python code, the
>     interpreter will leave it as is.
>
>   This makes any changes to execution context made by nested coroutine
>   calls within a Task to be visible throughout the Task::
>
>       ci = sys.new_context_item('example')
>
>       async def nested():
>           ci.set('nested')
>
>       asynd def main():
>           ci.set('main')
>           print('before:', ci.get())
>           await nested()
>           print('after:', ci.get())
>
>       # Will print:
>       #   before: main
>       #   after: nested
>
>   Essentially, coroutines work with Execution Context items similarly
>   to threads, and ``await`` expression acts like a function call.
>
>   This mechanism also works for ``yield from`` in generators decorated
>   with ``@types.coroutine`` or ``@asyncio.coroutine``, which are
>   called "generator-based coroutines" according to :pep:`492`,
>   and should be fully compatible with native async/await coroutines.
>
>
> Tasks
> ^^^^^
>
> In asynchronous frameworks like asyncio, coroutines are run by
> an event loop, and need to be explicitly scheduled (in asyncio
> coroutines are run by ``asyncio.Task``.)
>
> With the currently defined semantics, the interpreter makes
> coroutines linked by an ``await`` expression share the same
> Local Context.
>
> The interpreter, however, is not aware of the Task concept, and
> cannot help with ensuring that new Tasks started in coroutines,
> use the correct EC::
>
>     current_request = sys.new_context_item(description='request')
>
>     async def child():
>         print('current request:', repr(current_request.get()))
>
>     async def handle_request(request):
>         current_request.set(request)
>         event_loop.create_task(child)
>
>     run(top_coro())
>
>     # Will print:
>     #   current_request: None
>
> To enable correct Execution Context propagation into Tasks, the
> asynchronous framework needs to assist the interpreter:
>
> * When ``create_task`` is called, it should capture the current
>   execution context with ``sys.get_execution_context()`` and save it
>   on the Task object.
>
> * When the Task object runs its coroutine object, it should execute
>   ``.send()`` and ``.throw()`` methods within the captured
>   execution context, using the ``sys.run_with_execution_context()``
>   function.
>
> With help from the asynchronous framework, the above snippet will
> run correctly, and the ``child()`` coroutine will be able to access
> the current request object through the ``current_request``
> Context Item.
>
>
> Event Loop Callbacks
> ^^^^^^^^^^^^^^^^^^^^
>
> Similarly to Tasks, functions like asyncio's ``loop.call_soon()``
> should capture the current execution context with
> ``sys.get_execution_context()`` and execute callbacks
> within it with ``sys.run_with_execution_context()``.
>
> This way the following code will work::
>
>     current_request = sys.new_context_item(description='request')
>
>     def log():
>         request = current_request.get()
>         print(request)
>
>     async def request_handler(request):
>         current_request.set(request)
>         get_event_loop.call_soon(log)
>
>
> Generators
> ----------
>
> Generators in Python, while similar to Coroutines, are used in a
> fundamentally different way.  They are producers of data, and
> they use ``yield`` expression to suspend/resume their execution.
>
> A crucial difference between ``await coro`` and ``yield value`` is
> that the former expression guarantees that the ``coro`` will be
> executed fully, while the latter is producing ``value`` and
> suspending the generator until it gets iterated again.
>
> Generators, similarly to coroutines, have a ``gi_local_context``
> attribute, which is set to an empty Local Context when created.
>
> Contrary to coroutines though, ``yield from o`` expression in
> generators (that are not generator-based coroutines) is semantically
> equivalent to ``for v in o: yield v``, therefore the interpreter does
> not attempt to control their ``gi_local_context``.
>
>
> EC Semantics for Generators
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Every generator object has its own Local Context that stores
> only its own local modifications of the context.  When a generator
> is being iterated, its local context will be put in the EC stack
> of the current thread.  This means that the generator will be able
> to see access items from the surrounding context::
>
>     local = sys.new_context_item("local")
>     global = sys.new_context_item("global")
>
>     def generator():
>         local.set('inside gen:')
>         while True:
>             print(local.get(), global.get())
>             yield
>
>     g = gen()
>
>     local.set('hello')
>     global.set('spam')
>     next(g)
>
>     local.set('world')
>     global.set('ham')
>     next(g)
>
>     # Will print:
>     #   inside gen: spam
>     #   inside gen: ham
>
> Any changes to the EC in nested generators are invisible to the outer
> generator::
>
>     local = sys.new_context_item("local")
>
>     def inner_gen():
>         local.set('spam')
>         yield
>
>     def outer_gen():
>         local.set('ham')
>         yield from gen()
>         print(local.get())
>
>     list(outer_gen())
>
>     # Will print:
>     #   ham
>
>
> Running generators without LC
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Similarly to coroutines, generators with ``gi_local_context``
> set to ``None`` simply use the outer Local Context.
>
> The ``@contextlib.contextmanager`` decorator uses this mechanism to
> allow its generator to affect the EC::
>
>     item = sys.new_context_item('test')
>
>     @contextmanager
>     def context(x):
>         old = item.get()
>         item.set('x')
>         try:
>             yield
>         finally:
>             item.set(old)
>
>     with context('spam'):
>
>         with context('ham'):
>             print(1, item.get())
>
>         print(2, item.get())
>
>     # Will print:
>     #   1 ham
>     #   2 spam
>
>
> Implementing Generators with Iterators
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> The Execution Context API allows to fully replicate EC behaviour
> imposed on generators with a regular Python iterator class::
>
>     class Gen:
>
>         def __init__(self):
>             self.local_context = sys.new_local_context()
>
>         def __iter__(self):
>             return self
>
>         def __next__(self):
>             return sys.run_with_local_context(
>                 self.local_context, self._next_impl)
>
>         def _next_impl(self):
>             # Actual __next__ implementation.
>             ...
>
>
> Asynchronous Generators
> -----------------------
>
> Asynchronous Generators (AG) interact with the Execution Context
> similarly to regular generators.
>
> They have an ``ag_local_context`` attribute, which, similarly to
> regular generators, can be set to ``None`` to make them use the outer
> Local Context.  This is used by the new
> ``contextlib.asynccontextmanager`` decorator.
>
> The EC support of ``await`` expression is implemented using the same
> approach as in coroutines, see the `Coroutine Object Modifications`_
> section.
>
>
> Greenlets
> ---------
>
> Greenlet is an alternative implementation of cooperative
> scheduling for Python.  Although greenlet package is not part of
> CPython, popular frameworks like gevent rely on it, and it is
> important that greenlet can be modified to support execution
> contexts.
>
> In a nutshell, greenlet design is very similar to design of
> generators.  The main difference is that for generators, the stack
> is managed by the Python interpreter.  Greenlet works outside of the
> Python interpreter, and manually saves some ``PyThreadState``
> fields and pushes/pops the C-stack.  Thus the ``greenlet`` package
> can be easily updated to use the new low-level `C API`_ to enable
> full support of EC.
>
>
> New APIs
> ========
>
> Python
> ------
>
> Python APIs were designed to completely hide the internal
> implementation details, but at the same time provide enough control
> over EC and LC to re-implement all of Python built-in objects
> in pure Python.
>
> 1. ``sys.new_context_item(description='...')``: create a
>    ``ContextItem`` object used to access/set values in EC.
>
> 2. ``ContextItem``:
>
>    * ``.description``: read-only attribute.
>    * ``.get()``: return the current value for the item.
>    * ``.set(o)``: set the current value in the EC for the item.
>
> 3. ``sys.get_execution_context()``: return the current
>    ``ExecutionContext``.
>
> 4. ``sys.new_execution_context()``: create a new empty
>    ``ExecutionContext``.
>
> 5. ``sys.new_local_context()``: create a new empty ``LocalContext``.
>
> 6. ``sys.run_with_execution_context(ec: ExecutionContext,
>    func, *args, **kwargs)``.
>
> 7. ``sys.run_with_local_context(lc:LocalContext,
>    func, *args, **kwargs)``.
>
>
> C API
> -----
>
> 1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a
>    ``PyContextItem`` object.
>
> 2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the
>    current value for the context item.
>
> 3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set
>    the current value for the context item.
>
> 4. ``PyLocalContext * PyLocalContext_New()``: create a new empty
>    ``PyLocalContext``.
>
> 5. ``PyLocalContext * PyExecutionContext_New()``: create a new empty
>    ``PyExecutionContext``.
>
> 6. ``PyExecutionContext * PyExecutionContext_Get()``: get the
>    EC for the active thread state.
>
> 7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
>    passed EC object as the current for the active thread state.
>
> 8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *,
>    PyLocalContext *)``: allows to implement
>    ``sys.run_with_local_context`` Python API.
>
>
> Implementation Strategy
> =======================
>
> LocalContext is a Weak Key Mapping
> ----------------------------------
>
> Using a weak key mapping for ``LocalContext`` implementation
> enables the following properties with regards to garbage
> collection:
>
> * ``ContextItem`` objects are strongly-referenced only from the
>   application code, not from any of the Execution Context
>   machinery or values they point to.  This means that there
>   are no reference cycles that could extend their lifespan
>   longer than necessary, or prevent their garbage collection.
>
> * Values put in the Execution Context are guaranteed to be kept
>   alive while there is a ``ContextItem`` key referencing them in
>   the thread.
>
> * If a ``ContextItem`` is garbage collected, all of its values will
>   be removed from all contexts, allowing them to be GCed if needed.
>
> * If a thread has ended its execution, its thread state will be
>   cleaned up along with its ``ExecutionContext``, cleaning
>   up all values bound to all Context Items in the thread.
>
>
> ContextItem.get() Cache
> -----------------------
>
> We can add three new fields to ``PyThreadState`` and
> ``PyInterpreterState`` structs:
>
> * ``uint64_t PyThreadState->unique_id``: a globally unique
>   thread state identifier (we can add a counter to
>   ``PyInterpreterState`` and increment it when a new thread state is
>   created.)
>
> * ``uint64_t PyInterpreterState->context_item_deallocs``: every time
>   a ``ContextItem`` is GCed, all Execution Contexts in all threads
>   will lose track of it.  ``context_item_deallocs`` will simply
>   count all ``ContextItem`` deallocations.
>
> * ``uint64_t PyThreadState->execution_context_ver``: every time
>   a new item is set, or an existing item is updated, or the stack
>   of execution contexts is changed in the thread, we increment this
>   counter.
>
> The above two fields allow implementing a fast cache path in
> ``ContextItem.get()``, in pseudo-code::
>
>     class ContextItem:
>
>         def get(self):
>             tstate = PyThreadState_Get()
>
>             if (self.last_tstate_id == tstate.unique_id and
>                 self.last_ver == tstate.execution_context_ver
>                 self.last_deallocs ==
>                     tstate.iterp.context_item_deallocs):
>                 return self.last_value
>
>             value = None
>             for mapping in reversed(tstate.execution_context):
>                 if self in mapping:
>                     value = mapping[self]
>                     break
>
>             self.last_value = value
>             self.last_tstate_id = tstate.unique_id
>             self.last_ver = tstate.execution_context_ver
>             self.last_deallocs = tstate.interp.context_item_deallocs
>
>             return value
>
> This is similar to the trick that decimal C implementation uses
> for caching the current decimal context, and will have the same
> performance characteristics, but available to all
> Execution Context users.
>
>
> Approach #1: Use a dict for LocalContext
> ----------------------------------------
>
> The straightforward way of implementing the proposed EC
> mechanisms is to create a ``WeakKeyDict`` on top of Python
> ``dict`` type.
>
> To implement the ``ExecutionContext`` type we can use Python
> ``list`` (or a custom stack implementation with some
> pre-allocation optimizations).
>
> This approach will have the following runtime complexity:
>
> * O(M) for ``ContextItem.get()``, where ``M`` is the number of
>   Local Contexts in the stack.
>
>   It is important to note that ``ContextItem.get()`` will implement
>   a cache making the operation O(1) for packages like ``decimal``
>   and ``numpy``.
>
> * O(1) for ``ContextItem.set()``.
>
> * O(N) for ``sys.get_execution_context()``, where ``N`` is the
>   total number of items in the current **execution** context.
>
>
> Approach #2: Use HAMT for LocalContext
> --------------------------------------
>
> Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
> to implement high performance immutable collections [5]_, [6]_.
>
> Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
> performance for both ``set()``, ``get()``, and ``merge()`` operations,
> which is essentially O(1) for relatively small mappings
> (read about HAMT performance in CPython in the
> `Appendix: HAMT Performance`_ section.)
>
> In this approach we use the same design of the ``ExecutionContext``
> as in Approach #1, but we will use HAMT backed weak key Local Context
> implementation.  With that we will have the following runtime
> complexity:
>
> * O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``,
>   where ``M`` is the number of Local Contexts in the stack,
>   and ``N`` is the number of items in the EC.  The operation will
>   essentially be O(M), because execution contexts are normally not
>   expected to have more than a few dozen of items.
>
>   (``ContextItem.get()`` will have the same caching mechanism as in
>   Approach #1.)
>
> * O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the
>   number of items in the current **local** context.  This will
>   essentially be an O(1) operation most of the time.
>
> * O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where
>   ``N`` is the total number of items in the current **execution**
>   context.
>
> Essentially, using HAMT for Local Contexts instead of Python dicts,
> allows to bring down the complexity of ``sys.get_execution_context()``
> from O(N) to O(log\ :sub:`32`\ N) because of the more efficient
> merge algorithm.
>
>
> Approach #3: Use HAMT and Immutable Linked List
> -----------------------------------------------
>
> We can make an alternative ``ExecutionContext`` design by using
> a linked list.  Each ``LocalContext`` in the ``ExecutionContext``
> object will be wrapped in a linked-list node.
>
> ``LocalContext`` objects will use an HAMT backed weak key
> implementation described in the Approach #2.
>
> Every modification to the current ``LocalContext`` will produce a
> new version of it, which will be wrapped in a **new linked list
> node**.  Essentially this means, that ``ExecutionContext`` is an
> immutable forest of ``LocalContext`` objects, and can be safely
> copied by reference in ``sys.get_execution_context()`` (eliminating
> the expensive "merge" operation.)
>
> With this approach, ``sys.get_execution_context()`` will be an
> **O(1) operation**.
>
>
> Summary
> -------
>
> We believe that approach #3 enables an efficient and complete
> Execution Context implementation, with excellent runtime performance.
>
> `ContextItem.get() Cache`_ enables fast retrieval of context items
> for performance critical libraries like decimal and numpy.
>
> Fast ``sys.get_execution_context()`` enables efficient management
> of execution contexts in asynchronous libraries like asyncio.
>
>
> Design Considerations
> =====================
>
> Can we fix ``PyThreadState_GetDict()``?
> ---------------------------------------
>
> ``PyThreadState_GetDict`` is a TLS, and some of its existing users
> might depend on it being just a TLS.  Changing its behaviour to follow
> the Execution Context semantics would break backwards compatibility.
>
>
> PEP 521
> -------
>
> :pep:`521` proposes an alternative solution to the problem:
> enhance Context Manager Protocol with two new methods: ``__suspend__``
> and ``__resume__``.  To make it compatible with async/await,
> the Asynchronous Context Manager Protocol will also need to be
> extended with ``__asuspend__`` and ``__aresume__``.
>
> This allows to implement context managers like decimal context and
> ``numpy.errstate`` for generators and coroutines.
>
> The following code::
>
>     class Context:
>
>         def __enter__(self):
>             self.old_x = get_execution_context_item('x')
>             set_execution_context_item('x', 'something')
>
>         def __exit__(self, *err):
>             set_execution_context_item('x', self.old_x)
>
> would become this::
>
>     local = threading.local()
>
>     class Context:
>
>         def __enter__(self):
>             self.old_x = getattr(local, 'x', None)
>             local.x = 'something'
>
>         def __suspend__(self):
>             local.x = self.old_x
>
>         def __resume__(self):
>             local.x = 'something'
>
>         def __exit__(self, *err):
>             local.x = self.old_x
>
> Besides complicating the protocol, the implementation will likely
> negatively impact performance of coroutines, generators, and any code
> that uses context managers, and will notably complicate the
> interpreter implementation.
>
> :pep:`521` also does not provide any mechanism to propagate state
> in a local context, like storing a request object in an HTTP request
> handler to have better logging.  Nor does it solve the leaking state
> problem for greenlet/gevent.
>
>
> Can Execution Context be implemented outside of CPython?
> --------------------------------------------------------
>
> Because async/await code needs an event loop to run it, an EC-like
> solution can be implemented in a limited way for coroutines.
>
> Generators, on the other hand, do not have an event loop or
> trampoline, making it impossible to intercept their ``yield`` points
> outside of the Python interpreter.
>
>
> Backwards Compatibility
> =======================
>
> This proposal preserves 100% backwards compatibility.
>
>
> Appendix: HAMT Performance
> ==========================
>
> To assess if HAMT can be used for Execution Context, we implemented
> it in CPython [7]_.
>
> .. figure:: pep-0550-hamt_vs_dict.png
>    :align: center
>    :width: 100%
>
>    Figure 1.  Benchmark code can be found here: [9]_.
>
> Figure 1 shows that HAMT indeed displays O(1) performance for all
> benchmarked dictionary sizes.  For dictionaries with less than 100
> items, HAMT is a bit slower than Python dict/shallow copy.
>
> .. figure:: pep-0550-lookup_hamt.png
>    :align: center
>    :width: 100%
>
>    Figure 2.  Benchmark code can be found here: [10]_.
>
> Figure 2 shows comparison of lookup costs between Python dict
> and an HAMT immutable mapping.  HAMT lookup time is 30-40% worse
> than Python dict lookups on average, which is a very good result,
> considering how well Python dicts are optimized.
>
> Note, that according to [8]_, HAMT design can be further improved.
>
>
> Acknowledgments
> ===============
>
> I thank Elvis Pranskevichus and Victor Petrovykh for countless
> discussions around the topic and PEP proof reading and edits.
>
> Thanks to Nathaniel Smith for proposing the ``ContextItem`` design
> [17]_ [18]_, for pushing the PEP towards a more complete design, and
> coming up with the idea of having a stack of contexts in the thread
> state.
>
> Thanks to Nick Coghlan for numerous suggestions and ideas on the
> mailing list, and for coming up with a case that cause the complete
> rewrite of the initial PEP version [19]_.
>
>
> References
> ==========
>
> .. [1] https://blog.golang.org/context
>
> .. [2] https://msdn.microsoft.com/en-us/library/system.threading.
> executioncontext.aspx
>
> .. [3] https://github.com/numpy/numpy/issues/9444
>
> .. [4] http://bugs.python.org/issue31179
>
> .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>
> .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-
> persistenthashmap-part-ii.html
>
> .. [7] https://github.com/1st1/cpython/tree/hamt
>
> .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
>
> .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
>
> .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
>
> .. [11] https://github.com/1st1/cpython/tree/pep550
>
> .. [12] https://www.python.org/dev/peps/pep-0492/#async-await
>
> .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/
> bench/echoserver.py
>
> .. [14] https://github.com/MagicStack/pgbench
>
> .. [15] https://github.com/python/performance
>
> .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
>
> .. [17] https://mail.python.org/pipermail/python-ideas/2017-
> August/046752.html
>
> .. [18] https://mail.python.org/pipermail/python-ideas/2017-
> August/046772.html
>
> .. [19] https://mail.python.org/pipermail/python-ideas/2017-
> August/046780.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] PEP 550 v2

Reply via email to