Paul Ganssle <p.gans...@gmail.com> added the comment:

I think for now skipping the tests when lzma is missing is the easiest thing, 
though another option would be to drop the compression on the input test data 
so that the tests don't depend on lzma.
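As a rough sketch of the first option (the class name here is hypothetical, and the real test suite may guard the import differently), the skip could look something like this:

```python
import unittest

try:
    import lzma
except ImportError:  # e.g. Python built without the _lzma extension
    lzma = None


# Hypothetical test class; it stands in for any test that needs lzma.
@unittest.skipIf(lzma is None, "lzma is required to load the test data")
class CompressedDataTest(unittest.TestCase):
    def test_round_trip(self):
        payload = b'{"key": "value"}'
        self.assertEqual(lzma.decompress(lzma.compress(payload)), payload)
```

With this guard the tests are reported as skipped, rather than as errors, on builds that lack `lzma`.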

Taking a look at the data files, it looks like we get around 50% compression 
using either lzma or gzip, but the uncompressed file is only about 31 kB to 
start with:

    $ du -b tests/data/*
    31054   tests/data/zoneinfo_data.json
    15127   tests/data/zoneinfo_data.json.gz
    12895   tests/data/zoneinfo_data.json.lz
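For anyone who wants to reproduce this kind of comparison from Python rather than with `du`, here is a minimal sketch. The sample payload is made up and only stands in for the real JSON file, so the numbers it prints are not the ones above:

```python
import gzip
import json
import lzma


def compressed_sizes(raw: bytes) -> dict:
    """Return the size in bytes of `raw` uncompressed, gzipped, and lzma-compressed."""
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw)),
        "lzma": len(lzma.compress(raw)),
    }


# Made-up, repetitive stand-in for the real JSON test data.
sample = json.dumps({f"Zone/{i}": ["AAAA" * 16] * 8 for i in range(50)}).encode()

for name, size in compressed_sizes(sample).items():
    print(f"{name:>5}: {size:6d} bytes ({size / len(sample):.0%} of raw)")
```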

We're also currently using the "fat" binaries that `zic` produces (which 
include hard-coded transitions all the way until 2038). The new default for 
`zic` is to produce "slim" binaries, and the script to update test data does 
nothing to explicitly request fat binaries. If we were to switch over to "slim" 
binaries, the result would be more like this:

    $ du -b tests/data/*
    8297    tests/data/zoneinfo_data_slim.json.gz
    7750    tests/data/zoneinfo_data_slim.json.lz
    15551   tests/data/zoneinfo_data_unc_slim.json

So we're still looking at ~2:1 compression for both gzip and lzma, but the 
overall file size is 50% of what it was to start with. The biggest downside is 
that once a rule repeats indefinitely, `zic` stops writing explicit transitions 
into a "slim" binary and falls back to a simple repeating rule, so the current 
set of tests would take a different code path.
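To make the "repeating rule" concrete: in version 2+ TZif files it is stored as a POSIX TZ string in a footer at the very end of the file, between two newlines. The helper below is only an illustrative sketch (the function name is made up, and it inspects just the header and the tail of the data rather than parsing the file):

```python
def tzif_footer(data: bytes) -> str:
    """Return the POSIX TZ string from the footer of a version 2+ TZif file.

    Returns "" when the data is not TZif, is version 1 (which has no
    footer), or does not end with a newline-delimited footer.
    """
    if data[:4] != b"TZif" or data[4:5] == b"\x00":
        return ""
    if not data.endswith(b"\n"):
        return ""
    # The footer is: newline, TZ string, newline.
    start = data.rfind(b"\n", 0, len(data) - 1)
    if start == -1:
        return ""
    return data[start + 1 : -1].decode("ascii")


# Synthetic bytes shaped like a TZif file, not a real zone file.
fake = b"TZif2" + b"\x00" * 39 + b"\nEST5EDT,M3.2.0,M11.1.0\n"
print(tzif_footer(fake))
```

A reader of a "slim" file has to evaluate that string for dates past the last explicit transition, which is the code path the current tests don't exercise.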

I think we can go with the following course of action (3 or 4 different PRs):

1. Start by skipping the tests when `lzma` is missing.
2. Update the test suite so that it is testing more or less the same thing when 
the binaries are compiled with `-b slim`.
3. Change `Lib/test/test_zoneinfo/data/update_test_data.py` so that it pulls 
the raw data from the `tzdata` module on PyPI (which is compiled with `-b 
slim`) instead of the user's machine.
4. Change `update_test_data.py` to stop using `lzma` and change the tests so 
that they are able to process the new format of the JSON files.
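For step 4, one possible shape for an uncompressed file is sketched below. The layout (a top-level `"data"` mapping of zone keys to base85-encoded bytes) and the function names are assumptions for illustration, not a description of what `update_test_data.py` actually writes:

```python
import base64
import json


def dump_zones(zones: dict) -> str:
    """Serialize {key: raw TZif bytes} as plain JSON, base85-encoding the bytes."""
    encoded = {
        key: base64.b85encode(raw).decode("ascii") for key, raw in zones.items()
    }
    return json.dumps({"data": encoded}, indent=2, sort_keys=True)


def load_zones(text: str) -> dict:
    """Inverse of dump_zones: return {key: raw TZif bytes}."""
    return {
        key: base64.b85decode(value)
        for key, value in json.loads(text)["data"].items()
    }


# Placeholder bytes, not real zone data.
zones = {"Example/Zone": b"TZif2" + bytes(range(40))}
assert load_zones(dump_zones(zones)) == zones
```

Only `json` and `base64` are needed to read such a file, so the tests would no longer depend on any compression module.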

If we ever decide that we really want the compression again, I assume that 
`gzip` is found more commonly than `lzma` among systems that don't build the 
whole standard library, so it might be mildly preferable to switch to `gzip`.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41371>
_______________________________________