Paul Ganssle <[email protected]> added the comment:
I think for now skipping the tests when `lzma` is missing is the easiest fix,
though another option would be to drop the compression on the input test data
so that the tests don't depend on `lzma` at all.
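Here is a minimal sketch of the first option, assuming a plain `unittest` layout. `ZoneInfoDataTest` and `test_lzma_round_trip` are hypothetical names, not the real `test_zoneinfo` classes. On recent CPython versions, the suite's own helper `test.support.import_helper.import_module("lzma")` does the same job by raising `unittest.SkipTest`.

```python
import unittest

try:
    import lzma
except ImportError:  # CPython can be built without the _lzma extension module
    lzma = None


@unittest.skipIf(lzma is None, "requires the lzma module")
class ZoneInfoDataTest(unittest.TestCase):  # hypothetical test class
    def test_lzma_round_trip(self):
        # Stand-in for decompressing the real compressed test data file.
        payload = b'{"zone": "America/New_York"}'
        self.assertEqual(lzma.decompress(lzma.compress(payload)), payload)
```

When `lzma` is missing, the whole class is reported as skipped instead of erroring at import time.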
Looking at the data files, we get roughly 50% compression with either lzma or
gzip, but the uncompressed file is only about 31 kB to start with:
    $ du -b tests/data/*
    31054   tests/data/zoneinfo_data.json
    15127   tests/data/zoneinfo_data.json.gz
    12895   tests/data/zoneinfo_data.json.lz
We're also currently using the "fat" binaries that `zic` produces (which
include hard-coded transitions all the way until 2038). The new default for
`zic` is to produce "slim" binaries, and the script that updates the test data
does nothing to explicitly request fat binaries. If we were to switch over to
"slim" binaries, the result would look more like this:
    $ du -b tests/data/*
    8297    tests/data/zoneinfo_data_slim.json.gz
    7750    tests/data/zoneinfo_data_slim.json.lz
    15551   tests/data/zoneinfo_data_unc_slim.json
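The following is a quick check of the ratios, using only the byte counts from the two `du -b` listings above. The dictionary layout and the function name are illustrative.

```python
# Byte counts copied from the two `du -b` listings above.
FAT = {"raw": 31054, "gzip": 15127, "lzma": 12895}
SLIM = {"raw": 15551, "gzip": 8297, "lzma": 7750}


def compressed_fraction(sizes):
    """Compressed size as a fraction of the uncompressed JSON size."""
    return {codec: sizes[codec] / sizes["raw"] for codec in ("gzip", "lzma")}


for name, sizes in (("fat", FAT), ("slim", SLIM)):
    fractions = compressed_fraction(sizes)
    print(name, {codec: round(f, 2) for codec, f in fractions.items()})

# Uncompressed slim data relative to uncompressed fat data.
print("slim/fat raw:", round(SLIM["raw"] / FAT["raw"], 2))
```

The fractions come out to about 0.49 (gzip) and 0.42 (lzma) for the fat data, and 0.53 and 0.50 for the slim data. The uncompressed slim JSON is about 0.50 the size of the fat one.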
So we're still looking at ~2:1 compression for both gzip and lzma, but the
uncompressed file is about half the size it was to start with. The biggest
downside is that in "slim" binaries, once a rule repeats indefinitely, `zic`
stops producing explicit transitions for it and falls back to a simple
repeating rule (the POSIX TZ string at the end of the file). That means the
current set of tests would take a different code path.
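This toy model (not the `zoneinfo` implementation) shows why the code path changes. A "fat" file answers every query up to 2037 from its list of explicit transitions. A "slim" file stops listing transitions once the rule repeats and answers later dates from the rule itself. The zone, the US-style DST rule, the cutoff years and all names here are illustrative assumptions. The model also uses naive local times and ignores ambiguous wall times.

```python
import bisect
from datetime import datetime, timedelta

# Toy zone with US-style DST; offsets are relative to UTC.
STD, DST = timedelta(hours=-5), timedelta(hours=-4)


def nth_sunday(year, month, n):
    """Midnight on the n-th Sunday (n >= 1) of the given month."""
    first = datetime(year, month, 1)
    days_to_sunday = (6 - first.weekday()) % 7  # Monday == 0, Sunday == 6
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def rule_offset(dt):
    """Repeating rule: DST from 02:00 on the 2nd Sunday of March
    until 02:00 on the 1st Sunday of November."""
    start = nth_sunday(dt.year, 3, 2) + timedelta(hours=2)
    end = nth_sunday(dt.year, 11, 1) + timedelta(hours=2)
    return DST if start <= dt < end else STD


def explicit_transitions(first_year, last_year):
    """(transition time, offset after it) pairs, precomputed from the rule."""
    transitions = []
    for year in range(first_year, last_year + 1):
        transitions.append((nth_sunday(year, 3, 2) + timedelta(hours=2), DST))
        transitions.append((nth_sunday(year, 11, 1) + timedelta(hours=2), STD))
    return transitions


def offset_at(dt, transitions):
    """Use the explicit transitions while they last, then fall back to the rule."""
    if transitions and dt < transitions[-1][0]:
        times = [t for t, _ in transitions]
        i = bisect.bisect_right(times, dt)
        # Before the first listed transition, assume standard time.
        return STD if i == 0 else transitions[i - 1][1]
    return rule_offset(dt)


FAT_TRANSITIONS = explicit_transitions(2007, 2037)   # listed "until 2038"
SLIM_TRANSITIONS = explicit_transitions(2007, 2007)  # stop once the rule repeats
```

`offset_at(datetime(2020, 7, 1), FAT_TRANSITIONS)` and `offset_at(datetime(2020, 7, 1), SLIM_TRANSITIONS)` both give UTC-4. The first is answered by the `bisect` branch and the second by `rule_offset`. A test written against fat data can therefore pass without ever exercising the rule branch.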
I think we can go with the following course of action (3 or 4 different PRs):
1. Start by skipping the tests when `lzma` is missing.
2. Update the test suite so that it tests more or less the same things when
   the binaries are compiled with `-b slim`.
3. Change `Lib/test/test_zoneinfo/data/update_test_data.py` so that it pulls
   the raw data from the `tzdata` package on PyPI (which is compiled with
   `-b slim`) instead of from the user's machine.
4. Change `update_test_data.py` to stop using `lzma`, and update the tests so
   that they can process the new format of the JSON files.
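This sketch covers steps 3 and 4 together, under stated assumptions. It reads TZif files from the installed `tzdata` package via `importlib.resources` (the package stores them under `tzdata/zoneinfo/`). It then stores each file base85-encoded in a plain JSON object, with no `lzma` step. The JSON layout and the function names are hypothetical, and the real `update_test_data.py` format may differ.

```python
import base64
import importlib.resources
import json


def load_tzif_from_tzdata(key):
    """Raw TZif bytes for a zone key such as "America/New_York", read from the
    installed `tzdata` package; returns None if `tzdata` is not installed."""
    try:
        path = importlib.resources.files("tzdata")
    except ModuleNotFoundError:
        return None
    path = path / "zoneinfo"
    for part in key.split("/"):
        path = path / part
    return path.read_bytes()


def encode_test_data(tzif_by_key):
    """Hypothetical uncompressed format: {zone key: base85 text of the TZif file}."""
    encoded = {key: base64.b85encode(raw).decode("ascii")
               for key, raw in tzif_by_key.items()}
    return json.dumps(encoded, indent=2, sort_keys=True)


def decode_test_data(text):
    """Inverse of encode_test_data; needs only json and base64, not lzma."""
    return {key: base64.b85decode(value) for key, value in json.loads(text).items()}
```

With `tzdata` installed, `encode_test_data({k: load_tzif_from_tzdata(k) for k in ["America/New_York", "Europe/London"]})` would produce the text of such a data file. Reading it back works on any build, with or without `lzma`.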
If we ever decide that we really want the compression again, I suspect that
`gzip` is available more often than `lzma` on systems that don't build the
whole standard library, so it might be mildly preferable to switch to `gzip`.
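If compression does come back, swapping `gzip` in for `lzma` is a small change. In this minimal sketch the helper names are hypothetical. Passing `mtime=0` (accepted by `gzip.compress` since Python 3.8) keeps the timestamp out of the gzip header, so regenerating unchanged data with the same Python version should produce identical bytes. The `gzip` module itself imports `zlib`, so it is only usable when `zlib` was built.

```python
import gzip


def compress_test_data(raw: bytes) -> bytes:
    # mtime=0 writes a fixed timestamp into the gzip header, so the output
    # depends only on the input bytes (for a given Python/zlib version).
    return gzip.compress(raw, compresslevel=9, mtime=0)


def decompress_test_data(blob: bytes) -> bytes:
    return gzip.decompress(blob)
```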
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41371>
_______________________________________