Dino Viehland <dinoviehl...@gmail.com> added the comment:

The 20MB of savings is actually the amount of byte code that exists in the IG 
code base.  I was just measuring the web site code, and not the other various 
Python code in the process (e.g. no std lib code, no 3rd party libraries, 
etc...).  The IG code base is pretty monolithic and starting up the site 
requires about half of the code to get imported.  So I think the 20MB per 
process is a pretty realistic number.

I've also created a C extension and the object implementing the buffer protocol 
looks like:

typedef struct {
    PyObject_HEAD
    const char* data;    /* points into the memory mapped file */
    size_t size;         /* length of the bytecode in bytes */
    Py_ssize_t hash;     /* cached hash of the bytecode */
    CIceBreaker *breaker;
    size_t exports;      /* number of outstanding buffer exports */
    PyObject* code_obj;  /* borrowed reference, the code object keeps us alive */
} CIceBreakerCode;

All of the modules are currently getting compiled into a single memory-mapped 
file, and then one of these objects, implementing the buffer protocol, gets 
created for each function.  The per-function overhead is small enough that a 
bytecode string of just 16 opcodes already breaks even, so it is significantly 
lighter weight than using a memoryview object.
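
That arrangement can be sketched from the Python side (a toy sketch, not the 
actual IceBreaker import system; the "bytecode" and offsets below are made up):

```python
import mmap
import tempfile

# Toy stand-ins: concatenated bytecode for two functions plus the
# (offset, length) pairs a compiler might record for them.
blob = b"\x64\x00\x53\x00" + b"\x74\x00\xa4\x01"
offsets = [(0, 4), (4, 4)]

with tempfile.TemporaryFile() as f:
    f.write(blob)
    f.flush()
    with mmap.mmap(f.fileno(), len(blob), access=mmap.ACCESS_READ) as m:
        with memoryview(m) as view:
            # Each function's code is a zero-copy slice of the shared
            # mapping; the pages themselves are never duplicated.
            first = bytes(view[0:4])
            second = bytes(view[4:8])

print(first.hex(), second.hex())
```

The C object plays the role of the memoryview slice here, but without the 
memoryview's per-object cost.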

It's certainly true that the bytecode isn't the #1 source of memory here (the 
code objects themselves are pretty big), but it ends up representing 25% of the 
serialized data.  I would expect that once you add in ref counts and typing 
information the savings aren't quite as good, but reducing the overhead of code 
by 20% is still a pretty nice win.
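
The 16-opcode break-even figure can be sanity-checked with rough sizes (the 
sizes below are my assumptions about a 64-bit CPython build, not measurements 
from the IceBreaker code):

```python
import sys

OPCODE_SIZE = 2  # CPython wordcode: each instruction is 2 bytes
# Assumed wrapper size: PyObject_HEAD (16 bytes) plus six pointer-sized
# fields (data, size, hash, breaker, exports, code_obj) on a 64-bit build.
WRAPPER_SIZE = 16 + 6 * 8

def inline_size(n_opcodes):
    # Cost of storing the same bytecode as an ordinary bytes object.
    return sys.getsizeof(bytes(n_opcodes * OPCODE_SIZE))

break_even = next(n for n in range(1, 64) if inline_size(n) >= WRAPPER_SIZE)
print(break_even)  # lands near 16 on common 64-bit builds
```

Past that point the wrapper plus the shared mapping is cheaper per process 
than a private bytes object.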

I can't make any promises about open sourcing the import system, but I can 
certainly look into that as well.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36839>
_______________________________________