On Wed, Apr 11, 2018 at 03:38:08AM +1000, Chris Angelico wrote:
> On Wed, Apr 11, 2018 at 2:14 AM, Serhiy Storchaka <storch...@gmail.com> wrote:
> > Currently pyc files contain data that is useful mostly for development
> > and is not needed in most normal cases in a stable program. There is
> > even an option that allows excluding part of this information from pyc
> > files. This is expected to save memory, startup time, and disk space
> > (or time spent loading from the network). I propose to move this data
> > out of pyc files into a separate file or files. pyc files would then
> > contain only references to those external files. If the corresponding
> > external file is absent, or a specific option suppresses them, the
> > references are replaced with None or NULL at import time; otherwise
> > they are loaded from the external files.
> >
> > 1. Docstrings. They are needed mainly for development.
> >
> > 2. Line numbers (lnotab). They are helpful for formatting tracebacks,
> > for tracing, and for debugging with the debugger. Sources are helpful
> > in such cases too. If the program doesn't contain errors ;-) and is
> > shipped without sources, they could be removed.
> >
> > 3. Annotations. They are used mainly by third-party tools that
> > statically analyze sources. They are rarely used at runtime.
> >
> > Docstrings will be read from the corresponding docstring file unless
> > -OO is supplied. This would also allow localizing docstrings: depending
> > on the locale or other settings, a different docstring file could be
> > used.
> >
> > For suppressing line numbers and annotations, new options can be added.
>
> A deployed Python distribution generally has .pyc files for all of the
> standard library. I don't think people want to lose the ability to call
> help(), and unless I'm misunderstanding, that requires docstrings. So
> this will mean twice as many files and twice as many file-open calls to
> import from the standard library. What will be the impact on startup
> time?
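As a concrete illustration of the lookup the proposal describes (sidecar file present: load from it; sidecar absent or suppressed: fall back to None), here is a minimal sketch. The class name, sidecar filename, and the one-docstring-per-line file format are all invented for illustration; a real implementation would live inside the import machinery and use a binary format.

```python
import os

class LazyDocstrings:
    """Sketch: look up docstrings from a hypothetical sidecar file on
    first access, falling back to None when the file is absent (as the
    proposal suggests for -OO or a stripped distribution)."""

    def __init__(self, sidecar_path):
        self._path = sidecar_path
        self._docs = None  # nothing loaded until first access

    def get(self, name):
        if self._docs is None:
            if os.path.exists(self._path):
                # Hypothetical format: one "name<TAB>text" entry per line.
                self._docs = {}
                with open(self._path, encoding="utf-8") as f:
                    for line in f:
                        key, _, text = line.rstrip("\n").partition("\t")
                        self._docs[key] = text
            else:
                # Behave as if -OO had stripped the docstrings.
                self._docs = {}
        return self._docs.get(name)
```

Note that the sidecar file is not opened at import time at all; the extra file-open Chris asks about only happens on the first help()-style access.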
I shouldn't think that the number of files on disk is very important, now
that they're hidden away in the __pycache__ directory where they can be
ignored by humans. Even venerable old FAT32 has a limit of 65,534 files in
a single folder, and 268,435,437 on the entire volume. So unless the std
lib expands to 16000+ modules, the number of files in the __pycache__
directory ought to be well below that limit. I think even MicroPython
ought to be okay with that. (But it would be nice to find out for sure:
does it support file systems with *really* tiny limits?)

The entire __pycache__ directory is supposed to be a black box except
under unusual circumstances, so it doesn't matter (at least not to me)
whether we have:

    __pycache__/spam.cpython-38.pyc

alone, or (say):

    __pycache__/spam.cpython-38.pyc
    __pycache__/spam.cpython-38-doc.pyc
    __pycache__/spam.cpython-38-lno.pyc
    __pycache__/spam.cpython-38-ann.pyc

And if the external references are loaded lazily, on need, rather than
eagerly, this could save startup time, which I think is the intention. The
docstrings would still be available, just not loaded until the first time
you try to use them.

However, Python supports byte-code-only distribution, using .pyc files
external to __pycache__. In that case it would be annoying and
inconvenient to distribute four top-level files, so I think the use of
external references has to be optional: there has to be either a way to
compile to a single .pyc file containing all four parts, or an external
tool that can take the existing four files and merge them.

-- 
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
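The merge tool suggested at the end of Steve's reply could be sketched roughly as below. The container layout here (a count followed by length-prefixed blobs) is invented purely for illustration and is not the real .pyc format; the point is only that packing the four parts into one file, and unpacking them again, is mechanically simple.

```python
import struct

def merge_pyc_parts(parts):
    """Sketch of the merge step: pack several byte-string 'parts'
    (e.g. code, docstrings, lnotab, annotations) into one container.
    Layout: a little-endian count, then each blob length-prefixed."""
    out = [struct.pack("<I", len(parts))]
    for blob in parts:
        out.append(struct.pack("<I", len(blob)))
        out.append(blob)
    return b"".join(out)

def split_pyc_parts(data):
    """Inverse of merge_pyc_parts: recover the individual blobs."""
    (count,) = struct.unpack_from("<I", data)
    offset = 4
    parts = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", data, offset)
        offset += 4
        parts.append(data[offset:offset + size])
        offset += size
    return parts
```

A part may be empty (e.g. annotations suppressed), in which case it round-trips as an empty blob rather than disappearing.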