Larry Hastings <la...@hastings.org> added the comment:

Since nobody's said so in so many words (so far in this thread anyway): the 
prototype from Jeethu Rao in 2018 was a different technology than what Eric is 
doing.  The "Programs/_freeze_importlib.c" Eric's playing with essentially 
inlines a .pyc file as C static data.  The Jeethu Rao approach is more 
advanced: instead of serializing the objects, it stores the objects from the 
.pyc file as pre-initialized C static objects.  So it saves the un-marshalling 
step, and therefore should be faster.  To import the module you still need to 
execute the module body code object though--that seems unavoidable.
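
To make the two steps concrete, here is a minimal sketch in pure Python. It is not the prototype's code; the module names and the SOURCE string are made up for illustration. One loader starts from marshalled bytes (roughly what a frozen .pyc payload is), the other starts from an already-built code object (standing in for pre-initialized static objects). Both still have to exec the module body.

```python
import marshal
import types

# Hypothetical one-line module body, used only for this illustration.
SOURCE = "ANSWER = 6 * 7\n"


def load_from_marshalled(name, data):
    """Frozen-.pyc style: un-marshal the bytes, then run the module body."""
    code = marshal.loads(data)           # the step static objects would skip
    module = types.ModuleType(name)
    exec(code, module.__dict__)          # executing the module body
    return module


def load_from_code(name, code):
    """Static-object style: the code object already exists, just run it."""
    module = types.ModuleType(name)
    exec(code, module.__dict__)          # still required in either approach
    return module


code = compile(SOURCE, "<demo>", "exec")
frozen_bytes = marshal.dumps(code)

mod_a = load_from_marshalled("demo_a", frozen_bytes)
mod_b = load_from_code("demo_b", code)
```

The only difference between the two loaders is the marshal.loads() call, which is the cost the static-object approach is meant to remove.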

The python-dev thread covers nearly everything I remember about this.  The one 
thing I guess I never mentioned is that building and working with the prototype 
was frightful; it had both Python code and C code, and it was fragile and hard 
to get working.  My hunch at the time was that it shouldn't be so fragile; it 
should be possible to write the converter in Python: read in .pyc file, 
generate .c file.  It might have to make assumptions about the internal 
structure of the CPython objects it instantiates as C static data, but since 
we'd ship the tool with CPython this should be only a minor maintenance issue.
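
As a rough sketch of the "read in .pyc file, generate .c file" shape, the following Python emits a C byte array from a .pyc image. Note the assumptions: it does the simpler thing (inlining the marshalled payload as static data) rather than emitting pre-initialized static objects, it assumes the 16-byte .pyc header used by CPython 3.7 and later, and the function names and the symbol name are hypothetical.

```python
import importlib.util
import marshal

# Assumption: CPython 3.7+ layout (4-byte magic, 4-byte flags, 8 more bytes).
PYC_HEADER_SIZE = 16


def pyc_to_c_source(pyc_bytes, symbol):
    """Return C source defining `symbol` as the marshalled payload bytes."""
    if pyc_bytes[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("not a .pyc for this interpreter")
    payload = pyc_bytes[PYC_HEADER_SIZE:]
    marshal.loads(payload)  # sanity check: payload must un-marshal cleanly
    lines = [f"static const unsigned char {symbol}[] = {{"]
    for i in range(0, len(payload), 12):
        chunk = payload[i:i + 12]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"static const unsigned int {symbol}_size = {len(payload)};")
    return "\n".join(lines) + "\n"


def make_demo_pyc(source):
    """Build an in-memory, timestamp-style .pyc image for demonstration."""
    code = compile(source, "<demo>", "exec")
    header = importlib.util.MAGIC_NUMBER + bytes(4)   # magic + zero flags
    header += bytes(4) + len(source).to_bytes(4, "little")  # mtime 0, size
    return header + marshal.dumps(code)


c_source = pyc_to_c_source(make_demo_pyc("x = 1\n"), "demo_frozen")
```

A generator for real static objects would need to know the C struct layout of each object type, which is the maintenance burden mentioned above.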

In experimenting with the prototype, I observed that simply calling stat() to 
ensure the frozen .py file hadn't changed on disk lost us about half the 
performance win from this approach.  I'm not much of a systems programmer, but 
I wonder if there are (system-proprietary?) library calls one could make to get 
the stat info for all files in a single directory all at once that might be 
faster overall.  (Of course, caching this information at startup might make for 
a crappy experience for people who edit Lib/*.py files while the interpreter is 
running.)
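
One portable starting point, offered only as a sketch and not as something the prototype did, is os.scandir(), which walks a directory in one pass and exposes per-entry stat data. On Windows the directory listing generally already carries that data, so DirEntry.stat() usually needs no extra system call; on Unix it still costs one stat per entry, so the win there is not guaranteed. The function names below are hypothetical.

```python
import os


def snapshot_mtimes(directory):
    """Map each .py file name in `directory` to its mtime in nanoseconds."""
    snapshot = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".py"):
                snapshot[entry.name] = entry.stat().st_mtime_ns
    return snapshot


def changed_files(old, new):
    """Return sorted names that were added, removed, or modified."""
    names = old.keys() | new.keys()
    return sorted(name for name in names if old.get(name) != new.get(name))
```

Taking one snapshot at startup and comparing later snapshots against it has exactly the staleness trade-off described above: edits made between snapshots go unnoticed.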

One more observation about the prototype: it doesn't know how to deal with any 
mutable types.  marshal.c can deal with list, dict, and set.  Does this matter?
ISTM the tree of objects under a code object will never have a reference to 
one of these mutable objects, so it's probably already fine.
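
That hunch can be spot-checked with a short sketch like the one below, which walks the constants reachable from a code object and collects any list, dict, set, or bytearray it finds. The helper name and the SAMPLE source are made up for illustration, and only co_consts is walked (the other code object fields hold strings, bytes, and tuples of strings).

```python
import types

MUTABLE_TYPES = (list, dict, set, bytearray)

# Hypothetical sample that uses list, dict, and set literals in its source.
SAMPLE = """
xs = [1, 2, 3]
d = {'a': 1}
def f(v):
    return v in {1, 2, 3}
"""


def mutable_constants(code):
    """Return every mutable object reachable through co_consts of `code`."""
    found = []
    stack = list(code.co_consts)
    while stack:
        obj = stack.pop()
        if isinstance(obj, MUTABLE_TYPES):
            found.append(obj)
        elif isinstance(obj, types.CodeType):
            stack.extend(obj.co_consts)      # nested functions, classes, etc.
        elif isinstance(obj, (tuple, frozenset)):
            stack.extend(obj)                # immutable containers of consts
    return found


sample_mutables = mutable_constants(compile(SAMPLE, "<sample>", "exec"))
```

Even though the sample source contains mutable literals, the compiler builds them at run time from immutable constants, so the walk should come back empty.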

Not sure what else I can tell you.  It gave us a measurable improvement in 
startup time, but it seemed fragile, and it was annoying to work with/on, so 
after hacking on it for a week (at the 2018 core dev sprint in Redmond WA) I 
put it aside and moved on to other projects.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45020>
_______________________________________