Re: Python 3.7's Deterministic pycs
Hi,

I saw many pyc-related changes last week, so I had a look. I don't understand these issues well, so here are my notes to try to understand the context ;-) I'm not requesting any change; I'm fine with the latest choices made in Fedora.

--

There are three different issues:

(1) Performance regression when importing .py files
(2) Getting reproducible .rpm binaries
(3) .pyc files are not fully reproducible

In reverse order:

(3) should be fixed in Python; I reported the bug upstream:
https://bugs.python.org/issue37596

(2) is a work in progress: Fedora builds are not reproducible yet.

(1) is the main question here.

--

First of all, the rationale for the .pyc file change in Python 3.7 is described in PEP 552:
https://www.python.org/dev/peps/pep-0552/#rationale

"Reproducibility is important for security"

But I'm not sure how important it is if it's only done halfway. Fedora doesn't seem to support reproducible builds yet, even if recent changes show that it's moving in that direction.

Old documents about Fedora:
https://fedoraproject.org/wiki/Reproducible_Builds
https://github.com/kholia/ReproducibleBuilds

Debian is making good progress:
https://wiki.debian.org/ReproducibleBuilds

openSUSE is also working on that:
https://en.opensuse.org/openSUSE:Reproducible_Builds

--

On Python 3.7 and newer, if SOURCE_DATE_EPOCH is set, py_compile writes hash-based pyc files: the pyc file contains a hash of the source instead of a timestamp.

"The default is PycInvalidationMode.CHECKED_HASH if the SOURCE_DATE_EPOCH environment variable is set, otherwise the default is PycInvalidationMode.TIMESTAMP." says the py_compile doc.

When importing a .py file in this mode, the content of the .py file is hashed and compared to the hash stored in the .pyc file.

For timestamp-based .pyc files, nothing is hashed: the source mtime and size recorded in the .pyc header are compared to those of the .py file, and the .pyc is regenerated only if they differ.

Note: Python 3.7 also has a --check-hash-based-pycs command line option, but it looks to be for specific use cases. (See also the _imp.check_hash_based_pycs value.)

--

redhat-rpm-config was modified 18 days ago in Fedora to set SOURCE_DATE_EPOCH to the timestamp of the topmost changelog entry:
https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/57

openSUSE had issues with reproducible Python builds and .pyc files:
https://bugzilla.opensuse.org/show_bug.cgi?id=1133809

For this reason, %clamp_mtime_to_source_date_epoch is still off by default (in Fedora and openSUSE).

--

SOURCE_DATE_EPOCH was disabled in the Python 3.7 package ("%global source_date_epoch_from_changelog 0"), because test_cmd_line_script, test_multiprocessing_main_handling and test_runpy fail if SOURCE_DATE_EPOCH is set. These tests have been fixed in Python 3.8.

--

glib2 sets the PYTHONHASHSEED=0 environment variable to work around one of the remaining bugs for reproducible .pyc files: frozensets are not written in a deterministic order in .pyc files:
https://bugzilla.redhat.com/show_bug.cgi?id=1686078

This issue should be fixed in Python; I reported the bug upstream:
https://bugs.python.org/issue37596

--

If SOURCE_DATE_EPOCH is set when building Python, its pyc files use the hash instead of a timestamp. Some people consider that a performance regression. So Python was modified to force the use of timestamps when Python itself is built, that is, when the RPM_BUILD_ROOT env var is set:
https://github.com/fedora-python/cpython/pull/3/files

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
___
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/python-devel@lists.fedoraproject.org
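The hash comparison described in the message above can be sketched roughly as follows. This is only an illustration, assuming the 16-byte pyc header layout from PEP 552 (magic number, flags word, then an 8-byte source hash for hash-based files) and `importlib.util.source_hash` from Python 3.7 or newer; the helper name and the throwaway `demo.py` file are made up for the example.

```python
import importlib.util
import os
import py_compile
import tempfile


def stored_and_current_hash(source_text):
    """Compile source_text as a CHECKED_HASH pyc; return (stored, current) hashes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")   # hypothetical file names
        pyc = os.path.join(tmp, "demo.pyc")
        with open(src, "w", encoding="utf-8") as f:
            f.write(source_text)
        py_compile.compile(
            src,
            cfile=pyc,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
        )
        with open(pyc, "rb") as f:
            header = f.read(16)
        with open(src, "rb") as f:
            # Same hash function the import system applies to the source bytes.
            current = importlib.util.source_hash(f.read())
    # Assumption from PEP 552: bytes 8..16 of a hash-based pyc hold the hash.
    return header[8:16], current


stored, current = stored_and_current_hash("x = 1\n")
print(stored == current)
```

As long as the source is unchanged since compilation, the two values should match; editing the .py file changes the computed hash, which is what makes a CHECKED_HASH pyc stale.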
Re: Python 3.7's Deterministic pycs
On 01. 02. 18 16:25, Neal Gompa wrote:
> On Thu, Feb 1, 2018 at 10:21 AM, Nick Coghlan wrote:
>> On 1 February 2018 at 23:54, Petr Viktorin wrote:
>>> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
>>> reproducible builds, to make a better argument for this?
>>
>> I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
>> environment, so Fedora's likely to get the new CHECKED_HASH behaviour
>> by default:
>> https://docs.python.org/dev/library/py_compile.html#py_compile.compile
>>
>> Given that SELinux typically won't allow user applications to rewrite
>> the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
>> at build time instead - with that setting, Python will ignore source
>> file changes entirely, and trust that RPM will keep the source and pyc
>> files consistent.
>
> We have not set this to be on in Fedora. It's still switched off by
> default. To the best of my knowledge, the only distribution doing it
> so far is openSUSE.

This is now set in Fedora:
https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/57

Now all Python pyc files (except python3 itself) are in CHECKED_HASH mode. We need to figure this out.

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok
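One way to check which mode the installed pyc files actually use is to read the flags word from each header. The sketch below assumes pyc files written by Python 3.7 or newer and the flag bits described in PEP 552; `pyc_mode` and `count_modes` are hypothetical helper names, not part of any Fedora tooling.

```python
import os
import struct
from collections import Counter

# Assumed meaning of the second 32-bit header word, per PEP 552:
# bit 0 = hash-based pyc, bit 1 = check the source hash on import.
MODES = {0b00: "TIMESTAMP", 0b01: "UNCHECKED_HASH", 0b11: "CHECKED_HASH"}


def pyc_mode(path):
    """Return the invalidation mode name recorded in a pyc header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        return "INVALID"
    (flags,) = struct.unpack("<I", header[4:8])
    return MODES.get(flags, "UNKNOWN")


def count_modes(root):
    """Walk root and count .pyc files per invalidation mode."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                counts[pyc_mode(os.path.join(dirpath, name))] += 1
    return counts
```

Calling `count_modes()` on a site-packages directory (the path depends on the system) would give a quick overview of how many files are in each mode.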
Re: Python 3.7's Deterministic pycs
On 02/01/2018 04:21 PM, Nick Coghlan wrote:
> On 1 February 2018 at 23:54, Petr Viktorin wrote:
>> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
>> reproducible builds, to make a better argument for this?
>
> I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
> environment, so Fedora's likely to get the new CHECKED_HASH behaviour
> by default:
> https://docs.python.org/dev/library/py_compile.html#py_compile.compile

Wait. These docs say "invalidation_mode will be forced to PycInvalidationMode.CHECKED_HASH", which sounds quite scary. Is it possible to use UNCHECKED_HASH with SOURCE_DATE_EPOCH?
(I don't think we use SOURCE_DATE_EPOCH now, but we might in the future.)

> Given that SELinux typically won't allow user applications to rewrite
> the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
> at build time instead - with that setting, Python will ignore source
> file changes entirely, and trust that RPM will keep the source and pyc
> files consistent.

And it lets us... avoid a stat call per import?
I still fail to see the advantage :(

--
Petr Viktorin
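On the question of combining UNCHECKED_HASH with SOURCE_DATE_EPOCH: the py_compile documentation for later releases notes that, from Python 3.7.2 on, the environment variable only determines the default and no longer overrides an explicit invalidation_mode argument. The following sketch is a way to check that on a given interpreter. It assumes Python 3.7.2 or newer and the PEP 552 flag bits; the helper name and the epoch value are arbitrary.

```python
import os
import py_compile
import struct
import tempfile


def flags_with_epoch(invalidation_mode=None):
    """Compile a throwaway file with SOURCE_DATE_EPOCH set; return the pyc flags word."""
    saved = os.environ.get("SOURCE_DATE_EPOCH")
    os.environ["SOURCE_DATE_EPOCH"] = "1517443200"  # arbitrary example value
    try:
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "demo.py")   # hypothetical file names
            pyc = os.path.join(tmp, "demo.pyc")
            with open(src, "w", encoding="utf-8") as f:
                f.write("x = 1\n")
            # Only pass invalidation_mode when the caller asked for one,
            # so that the default selection is exercised otherwise.
            kwargs = {}
            if invalidation_mode is not None:
                kwargs["invalidation_mode"] = invalidation_mode
            py_compile.compile(src, cfile=pyc, doraise=True, **kwargs)
            with open(pyc, "rb") as f:
                return struct.unpack("<I", f.read(8)[4:8])[0]
    finally:
        # Restore the environment exactly as it was.
        if saved is None:
            del os.environ["SOURCE_DATE_EPOCH"]
        else:
            os.environ["SOURCE_DATE_EPOCH"] = saved


default_flags = flags_with_epoch()
explicit_flags = flags_with_epoch(py_compile.PycInvalidationMode.UNCHECKED_HASH)
print(bin(default_flags), bin(explicit_flags))
```

Under those assumptions, the default run should report both flag bits set (CHECKED_HASH), while the explicit run should report only the hash-based bit (UNCHECKED_HASH).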
Re: Python 3.7's Deterministic pycs
On Thu, Feb 1, 2018 at 10:21 AM, Nick Coghlan wrote:
> On 1 February 2018 at 23:54, Petr Viktorin wrote:
>> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
>> reproducible builds, to make a better argument for this?
>
> I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
> environment, so Fedora's likely to get the new CHECKED_HASH behaviour
> by default:
> https://docs.python.org/dev/library/py_compile.html#py_compile.compile
>
> Given that SELinux typically won't allow user applications to rewrite
> the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
> at build time instead - with that setting, Python will ignore source
> file changes entirely, and trust that RPM will keep the source and pyc
> files consistent.
>

We have not set this to be on in Fedora. It's still switched off by
default. To the best of my knowledge, the only distribution doing it
so far is openSUSE.

--
真実はいつも一つ!/ Always, there's only one truth!
Re: Python 3.7's Deterministic pycs
On 1 February 2018 at 23:54, Petr Viktorin wrote:
> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
> reproducible builds, to make a better argument for this?

I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
environment, so Fedora's likely to get the new CHECKED_HASH behaviour
by default:
https://docs.python.org/dev/library/py_compile.html#py_compile.compile

Given that SELinux typically won't allow user applications to rewrite
the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
at build time instead - with that setting, Python will ignore source
file changes entirely, and trust that RPM will keep the source and pyc
files consistent.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
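The "ignore source file changes entirely" behaviour of UNCHECKED_HASH can be illustrated with a small experiment. This is a sketch only, assuming Python 3.7 or newer started without `--check-hash-based-pycs always`; the helper `load_after_edit` and the module name `demo_mod` are invented for the example.

```python
import importlib.util
import os
import py_compile
import tempfile


def load_after_edit(mode):
    """Compile a module with the given mode, edit its source, then load it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo_mod.py")  # hypothetical module
        with open(src, "w", encoding="utf-8") as f:
            f.write("VALUE = 'old'\n")
        # Writes the pyc into __pycache__ next to the source file.
        py_compile.compile(src, invalidation_mode=mode, doraise=True)
        # Change the source after the bytecode has been written.
        with open(src, "w", encoding="utf-8") as f:
            f.write("VALUE = 'new'\n")
        spec = importlib.util.spec_from_file_location("demo_mod", src)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.VALUE


print(load_after_edit(py_compile.PycInvalidationMode.UNCHECKED_HASH))
print(load_after_edit(py_compile.PycInvalidationMode.CHECKED_HASH))
```

With UNCHECKED_HASH the stale bytecode should be used as-is, so the old value is loaded; with CHECKED_HASH the hash mismatch should be detected and the edited source recompiled. This is why the unchecked mode only makes sense when something else, such as RPM, keeps the source and pyc files consistent.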