[Python-ideas] Re: Permanent code objects (less memory, quicker load, less Unix Copy On Write)

Antoine Pitrou Thu, 18 Jun 2020 11:52:23 -0700


Hello,


I think you forgot the all-important parts:

1) How does it work technically?
2) What performance gain on which benchmark?

Regards

Antoine.


On Thu, 18 Jun 2020 10:36:11 +0100
Jonathan Fine <jfine2...@gmail.com> wrote:
> Hi All
> 
> Summary: Shared objects in Unix are a major influence. This proposal can be
> seen as a first step towards packaging pure Python modules as Unix shared
> objects.
> 
> First, there's a high level overview. Then some technical stuff in
> Appendices.
> 
> An object is transient if it can be garbage collected. An object is
> permanent if it will never be garbage collected. Every interpreted Python
> function has a code object (that contains instructions for the
> interpreter). Many of these code objects persist to the end of the program,
> and are used for little else than providing interpreter instructions.
> 
> We show that extending Python, to provide and take advantage of permanent
> code objects, will bring some benefits. The cost is expected to be quite
> small.
> 
> When a Python function is called, the interpreter increases the refcount of
> its code object. At the end of the function's execution, the interpreter
> decreases the refcount. (An example below shows this.)
> 
> If Python were extended to take advantage of permanent code objects, then
> for example popular code objects could be loaded into memory in this way.
> This can reduce memory usage (by sharing immutable resources) and reduce
> startup time.
> 
> In addition, a Unix forked process would have less need to do copy-on-write
> (see below). This is related to packaging pure Python modules as Unix
> shared objects.
> 
> The core of implementing this change would be to provide if ... else ...
> branching, around the interpreter source code that changes the refcount of
> a code object. The interpreter itself will of course want direct access to
> the permanent code object. There is no harm in that.
> 
> The cost is that unprivileged access to fn.__code__ will be slower, due to
> an additional indirection. However, as such commands are rarely executed in
> ordinary programs, the cost is expected to be small.
> 
> It might be helpful, after checking the analysis and before coding, to do
> some simple timing tests and calculations to estimate the performance
> benefits and costs of making such a change. These would of course depend on
> the use case.
> 
> I hope this helps.
> 
> Jonathan
> 
> APPENDICES
> ===========
> 
> SOME IMPLEMENTATION DETAILS AND COMMENTS
> Because fn.__code__ must not return a permanent object, some sort of opaque
> proxy would be required. Because Python programs rarely inspect
> fn.__code__, in practice the cost of this additional indirection is likely
> to be small.
> 
> As things are, the time spent changing the refcount of fn.__code__ is
> probably insignificant. The benefit is that permanent code objects are made
> immutable, and so can be stored safely in read-only memory (that can be
> shared across all processes and users). Code objects are special, in that
> they are only rarely looked at directly. Their main purpose is to be used
> by the interpreter.
> 
> Python allows the user to replace fn.__code__ by a different code object.
> This is a rarely done dirty trick. The transient / permanent nature of
> fn.__code__ could be stored as a hidden field on the fn object. This would
> reduce the cost of the if ... else ... branching, as it amounts to caching
> the transient / permanent nature of fn.__code__.
> 
> FORK AND COPY ON WRITE
> On Unix, the fork system call causes a process to make a child of itself.
> The parent and child share memory. To avoid confusion and errors, when
> either asks the system to write to shared memory, the system ensures that
> both parent and child have their own copy (of the page of memory that is
> being written to). This is an expensive operation.
> See: https://en.wikipedia.org/wiki/Copy-on-write
> 
> INTERPRETER SESSION
> 
>     >>> from sys import getrefcount as grc  
> 
>     # Identical functions with different code objects.
>     >>> def f1(obj): return grc(obj)
>     >>> def f2(obj): return grc(obj)
>     >>> f1.__code__ is f2.__code__  
>     False
> 
>     # Initial values.
>     >>> grc(f1.__code__), grc(f2.__code__)  
>     (2, 2)
> 
>     # Calling f1 increases the refcount of f1.__code__.
>     >>> f1(f1), f1(f2), f2(f1), f2(f2)  
>     (6, 4, 4, 6)
> 
>     # If fn is a generator function, then x = fn() will increase the
>     # refcount of fn.__code__.
>     >>> def f1(): yield True
>     >>> grc(f1.__code__)  
>     2
> 
>     # Let's create and store 10 generators.
>     >>> iterables = [f1() for i in range(10)]
>     >>> grc(f1.__code__)  
>     22
> 
>     # Let's get one item from each.
>     >>> [next(i) for i in iterables]  
>     [True, True, True, True, True, True, True, True, True, True]
>     >>> grc(f1.__code__)  
>     22
> 
>     # Let's exhaust all the iterables. This reduces the refcount.
>     >>> [next(i, False) for i in iterables]  
>     [False, False, False, False, False, False, False, False, False, False]
>     >>> grc(f1.__code__)  
>     12
> 
>     # Nearly done. Now let go of the iterables.
>     >>> del iterables
>     >>> grc(f1.__code__)  
>     2
> 


_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VG3RQAGDP4J3ZNCFTWIQPUDTGBFNLSKP/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Permanent code objects (less memory, quicker load, less Unix Copy On Write)

Reply via email to