[issue22789] Compress the marshalled data in PYC files

2021-10-25 Thread Barry A. Warsaw


Change by Barry A. Warsaw:


--
nosy: +barry

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22789] Compress the marshalled data in PYC files

2021-10-23 Thread Filipe Laíns

Change by Filipe Laíns:


--
nosy: +FFY00




[issue22789] Compress the marshalled data in PYC files

2021-10-21 Thread Guido van Rossum


Guido van Rossum added the comment:

The space savings are nice, but I doubt that it will matter for startup time -- 
startup is most relevant in situations where it's *hot* (e.g. a shell script 
that repeatedly calls out to utilities written in Python).

--
nosy: +gvanrossum




[issue22789] Compress the marshalled data in PYC files

2020-03-18 Thread Brett Cannon


Change by Brett Cannon:


--
nosy:  -brett.cannon




[issue22789] Compress the marshalled data in PYC files

2014-11-08 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Compressing pyc files one by one wouldn't save much space because disk space is 
allocated in blocks (up to 32 KiB on FAT32). If a pyc file is smaller than the 
block size, we gain nothing. A ZIP file has the advantage of packing files more 
compactly. In addition, it can have lower access time due to less 
fragmentation. Unfortunately it doesn't support LZ4 compression, but we can 
store LZ4-compressed files in a ZIP file without additional compression.
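
The block-allocation point can be checked with a little arithmetic. The sketch below is only an illustration: the 4 KiB block size is an assumption (it varies by filesystem, and some filesystems store very small files inline), the helper name `allocated_size` is made up for this example, and the byte counts are the rgrep and texi2html figures from the original submission in this thread.

```python
def allocated_size(file_size, block_size=4096):
    """Bytes reserved on disk when space is handed out in whole blocks.

    block_size=4096 is an assumed value, not something stated in the thread.
    """
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


# (uncompressed, compressed) sizes in bytes, taken from the original submission.
for before, after in [(1660, 1063), (74866, 23611)]:
    print(before, after, allocated_size(before), allocated_size(after))
```

Under that assumption the small file occupies one block either way, while the large one drops from 19 blocks to 6.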

An uncompressed TAR file has the same advantages but needs a longer 
initialization time (for building the index).
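
A minimal sketch of the "store pre-compressed members in a ZIP" idea, using only the standard library. Note the assumptions: zlib stands in for LZ4 (which is not in the stdlib), and the helper names and the member name `pkg/module.pyc.z` are hypothetical.

```python
import io
import zipfile
import zlib


def write_precompressed(archive, name, payload):
    """Compress the payload ourselves, then store it without a second pass."""
    packed = zlib.compress(payload)  # stand-in for an LZ4 codec (assumption)
    archive.writestr(name, packed, compress_type=zipfile.ZIP_STORED)


def read_precompressed(archive, name):
    """Read a stored member and undo our own compression."""
    return zlib.decompress(archive.read(name))


payload = b"example marshalled bytes " * 100
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    write_precompressed(archive, "pkg/module.pyc.z", payload)

with zipfile.ZipFile(buffer) as archive:
    stored_type = archive.getinfo("pkg/module.pyc.z").compress_type
    restored = read_precompressed(archive, "pkg/module.pyc.z")
```

The ZIP container then only provides the index and the packing; the codec is whatever the reader and writer agree on.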

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22789
___



[issue22789] Compress the marshalled data in PYC files

2014-11-08 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 08.11.2014 10:28, Serhiy Storchaka wrote:
> Compressing pyc files one by one wouldn't save much space because disk space 
> is allocated in blocks (up to 32 KiB on FAT32). If a pyc file is smaller than 
> the block size, we gain nothing. A ZIP file has the advantage of packing files 
> more compactly. In addition, it can have lower access time due to less 
> fragmentation. Unfortunately it doesn't support LZ4 compression, but we can 
> store LZ4-compressed files in a ZIP file without additional compression.
> 
> An uncompressed TAR file has the same advantages but needs a longer 
> initialization time (for building the index).

The aim is to reduce file load time, not really to save disk space.
By having less data to read from the disk, it may be possible
to achieve a small startup speedup.

However, you're right in that using a single archive with many PYC files
would be more efficient, since it lowers the number of stat() calls.
The trick to store LZ4 compressed data in a ZIP file would enable this.

BTW: We could add optional LZ4 compression to the marshal format to
make all this work transparently and without having to change the
import mechanism itself:

We'd just need to add a new flag or type code indicating that the rest
of the stream is LZ4 compressed. The PYC writer could then enable this
flag or type code by default (or perhaps via some env var or
command line flag) and everything would then just work with both
LZ4-compressed byte code and non-compressed byte code.
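
One way to picture the flag idea is a thin wrapper around marshal, shown here as a sketch rather than a proposal for the actual format. The one-byte flag values, the `dumps`/`loads` wrapper names, and the use of zlib in place of LZ4 are all assumptions made for this example; the real change would live inside marshal itself.

```python
import marshal
import zlib

RAW = b"\x00"         # hypothetical flag: plain marshal data follows
COMPRESSED = b"\x01"  # hypothetical flag: compressed marshal data follows


def dumps(obj, compress=True):
    """Marshal obj and prefix it with a flag byte saying how to read it back."""
    data = marshal.dumps(obj)
    if compress:
        return COMPRESSED + zlib.compress(data)  # zlib stands in for LZ4
    return RAW + data


def loads(blob):
    """Accept both compressed and non-compressed streams transparently."""
    flag, body = blob[:1], blob[1:]
    if flag == COMPRESSED:
        body = zlib.decompress(body)
    elif flag != RAW:
        raise ValueError("unknown flag byte: %r" % flag)
    return marshal.loads(body)


code = compile("x = 1 + 2", "<example>", "exec")
namespace = {}
exec(loads(dumps(code)), namespace)
```

Because the reader dispatches on the flag, old uncompressed data and new compressed data can coexist, which is the property described above.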

--




[issue22789] Compress the marshalled data in PYC files

2014-11-07 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> there is really no reason why they should take more space on disk
> than necessary, so it's a sure win in any case.

That is a nice summary.

> FWIW, LZ4HC compression sounds like an obvious choice for
> write-once-read-many data like .pyc files to me.

+1

--




[issue22789] Compress the marshalled data in PYC files

2014-11-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

FWIW, I personally doubt this would actually reduce startup time. Disk I/O cost 
is in the first access, not in the transfer size (unless we're talking hundreds 
of megabytes). But in any case, someone interested has to do measurements :-)

--




[issue22789] Compress the marshalled data in PYC files

2014-11-06 Thread Stefan Behnel

Stefan Behnel added the comment:

FWIW, LZ4HC compression sounds like an obvious choice for write-once-read-many 
data like .pyc files to me. Blosc shows that you can achieve a pretty major 
performance improvement just by stuffing more data into less space (although it 
does it for RAM and CPU cache, not disk). And even if it ends up not being 
substantially faster for the specific case of .pyc files, there is really no 
reason why they should take more space on disk than necessary, so it's a sure 
win in any case.

--
nosy: +scoder




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

This is similar to the idea of loading the stdlib from a zip file (but less 
intrusive and more debugging-friendly). The time savings will depend on whether 
the filesystem cache is cold or hot. In the latter case, my intuition is that 
decompression will slow things down a bit :-)

Quick decompression benchmark on a popular stdlib module, and a fast CPU:

$ ./python -m timeit -s "import zlib; data = zlib.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read())" "zlib.decompress(data)"
1 loops, best of 3: 180 usec per loop

--
nosy: +tim.peters




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 04.11.2014 10:41, Antoine Pitrou wrote:
 
> Antoine Pitrou added the comment:
> 
> This is similar to the idea of loading the stdlib from a zip file (but less 
> intrusive and more debugging-friendly). The time savings will depend on 
> whether the filesystem cache is cold or hot. In the latter case, my intuition 
> is that decompression will slow things down a bit :-)
> 
> Quick decompression benchmark on a popular stdlib module, and a fast CPU:
> 
> $ ./python -m timeit -s "import zlib; data = zlib.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read())" "zlib.decompress(data)"
> 1 loops, best of 3: 180 usec per loop

zlib is rather slow when it comes to decompression. Something like
snappy or lz4 could work out, though:

https://code.google.com/p/snappy/
https://code.google.com/p/lz4/

Those were designed to be fast on decompression.

--
nosy: +lemburg




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Ok, comparison between zlib/snappy/lz4:

$ python3.4 -m timeit -s "import zlib; data = zlib.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read()); print(len(data))" "zlib.decompress(data)"
1 loops, best of 3: 181 usec per loop

$ python3.4 -m timeit -s "import snappy; data = snappy.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read()); print(len(data))" "snappy.decompress(data)"
1 loops, best of 3: 35 usec per loop

$ python3.4 -m timeit -s "import lz4; data = lz4.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read()); print(len(data))" "lz4.decompress(data)"
1 loops, best of 3: 21.3 usec per loop

Compressed sizes for threading.cpython-35.pyc (the file used above):
- zlib: 14009 bytes
- snappy: 20573 bytes
- lz4: 21038 bytes
- uncompressed: 38973 bytes

Packages used:
https://pypi.python.org/pypi/lz4/0.7.0
https://pypi.python.org/pypi/python-snappy/0.5
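
For anyone wanting to repeat this kind of comparison as a script rather than as shell one-liners, here is a rough sketch. It only exercises zlib, since that is what the standard library ships; the `measure` helper is a made-up name, and the payload is the marshalled code object of the installed `threading` module rather than the exact .pyc file used above, so the numbers will not match the ones in this message.

```python
import importlib.util
import marshal
import timeit
import zlib


def measure(name, compress, decompress, payload, number=200):
    """Return compressed size and best-of-3 decompression time for one codec."""
    packed = compress(payload)
    if decompress(packed) != payload:
        raise ValueError("codec %s does not round-trip" % name)
    best = min(timeit.repeat(lambda: decompress(packed), number=number, repeat=3))
    return {"codec": name, "size": len(packed), "usec": best / number * 1e6}


# Marshalled code of a stdlib module, similar in spirit to a .pyc body.
spec = importlib.util.find_spec("threading")
payload = marshal.dumps(spec.loader.get_code("threading"))

result = measure("zlib", zlib.compress, zlib.decompress, payload)
print("uncompressed:", len(payload), "bytes")
print(result)
```

Other codecs could be measured by passing their compress/decompress callables to the same helper, assuming the corresponding packages are installed.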

--




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

lz4 also has a high compression mode which improves the compression ratio (-> 
17091 bytes compressed), for a similar decompression speed.

--




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Georg Brandl

Georg Brandl added the comment:

Both lz4 and snappy are BSD-licensed, but snappy is written in C++.

--
nosy: +georg.brandl




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Brett Cannon

Brett Cannon added the comment:

Just FYI, this can easily be added to importlib since it works through 
marshal's API to unmarshal the module's data. There are also two startup 
benchmarks in the benchmark suite to help measure possible performance 
gains/losses, which should also ferret out whether cache warmth plays a 
significant role in the performance impact.
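
To make the "works through marshal's API" point concrete, the sketch below reads a .pyc by hand: skip the header, then unmarshal the rest. It is an illustration, not importlib's actual code path. The 16-byte header is an assumption that holds for Python 3.7 and later (the 3.5 files discussed in this thread had a shorter header), and the file names are made up.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

HEADER_SIZE = 16  # assumption: Python 3.7+ header layout

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "demo.py")
    with open(source, "w") as f:
        f.write("answer = 6 * 7\n")
    # Compile to an explicit path so we know where the .pyc ends up.
    pyc_path = py_compile.compile(source, cfile=os.path.join(tmp, "demo.pyc"))
    with open(pyc_path, "rb") as f:
        blob = f.read()

magic_ok = blob[:4] == importlib.util.MAGIC_NUMBER
# A decompression step would slot in here, between the header and marshal.
code = marshal.loads(blob[HEADER_SIZE:])
namespace = {}
exec(code, namespace)
```

Everything after the header is a single marshal stream, which is why a change confined to that step would not need to touch the rest of the import machinery.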

--




[issue22789] Compress the marshalled data in PYC files

2014-11-04 Thread Christian Heimes

Changes by Christian Heimes li...@cheimes.de:


--
nosy: +christian.heimes




[issue22789] Compress the marshalled data in PYC files

2014-11-03 Thread Raymond Hettinger

New submission from Raymond Hettinger:

Save space and reduce I/O time (reading and writing) by compressing the 
marshaled code in .pyc files.

In my code tree for Python 3, there was a nice space savings, from 19M to 7M. 
Here's some of the output from my test:

    8792 ->    4629 ./Tools/scripts/__pycache__/reindent.cpython-35.pyc
    1660 ->    1063 ./Tools/scripts/__pycache__/rgrep.cpython-35.pyc
    1995 ->    1129 ./Tools/scripts/__pycache__/run_tests.cpython-35.pyc
    1439 ->     973 ./Tools/scripts/__pycache__/serve.cpython-35.pyc
     727 ->     498 ./Tools/scripts/__pycache__/suff.cpython-35.pyc
    3240 ->    1808 ./Tools/scripts/__pycache__/svneol.cpython-35.pyc
   74866 ->   23611 ./Tools/scripts/__pycache__/texi2html.cpython-35.pyc
    5562 ->    2870 ./Tools/scripts/__pycache__/treesync.cpython-35.pyc
    1492 ->     970 ./Tools/scripts/__pycache__/untabify.cpython-35.pyc
    1414 ->     891 ./Tools/scripts/__pycache__/which.cpython-35.pyc
19627963 -> 6976410 Total

I haven't measured it yet, but I believe this will improve Python's start-up 
time (because fewer bytes get transferred from disk).
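
A report like the one above could be produced along the following lines. This is a sketch written for illustration and is not the attached compress_pyc.py, whose contents are not shown here. It assumes zlib at its default level, a 16-byte .pyc header (Python 3.7 and later) that is left uncompressed, and made-up helper names; the demo at the bottom runs on a fake file in a temporary directory.

```python
import os
import tempfile
import zlib

HEADER_SIZE = 16  # assumption: Python 3.7+ header, kept uncompressed


def compressed_sizes(root):
    """Yield (original_size, compressed_size, path) for each .pyc under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".pyc"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as f:
                blob = f.read()
            header, body = blob[:HEADER_SIZE], blob[HEADER_SIZE:]
            yield len(blob), len(header) + len(zlib.compress(body)), path


def print_report(root):
    """Print one line per file plus a total, and return the two totals."""
    total_before = total_after = 0
    for before, after, path in compressed_sizes(root):
        print(f"{before:8d} -> {after:7d} {path}")
        total_before += before
        total_after += after
    print(f"{total_before:8d} -> {total_after:7d} Total")
    return total_before, total_after


# Demo on synthetic data; point print_report at a real source tree instead.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "fake.cpython-35.pyc"), "wb") as f:
        f.write(b"H" * HEADER_SIZE + b"marshalled " * 500)
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("ignored")
    totals = print_report(tmp)
```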

--
files: compress_pyc.py
messages: 230576
nosy: rhettinger
priority: normal
severity: normal
stage: needs patch
status: open
title: Compress the marshalled data in PYC files
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file37125/compress_pyc.py




[issue22789] Compress the marshalled data in PYC files

2014-11-03 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Looking into this further, I suspect that the cleanest way to implement this 
would be to add a marshal version 4 that compresses and decompresses using zlib.

--
components: +Interpreter Core
nosy: +brett.cannon, pitrou




[issue22789] Compress the marshalled data in PYC files

2014-11-03 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Looking into this further, I suspect that the cleanest way to implement this 
would be to add zlib compression and decompression to marshal.c 
(bumping the version number to 5).

--




[issue22789] Compress the marshalled data in PYC files

2014-11-03 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
Removed message: http://bugs.python.org/msg230580
