Re: Python 3.7's Deterministic pycs

2019-07-15 Thread Victor Stinner

Hi,



I saw many changes related to .pyc files last week, so I had a look. I
don't understand these issues well, so here are my notes to try to
understand the context ;-) I'm not requesting any change; I'm fine with
the latest choices made in Fedora.


--

There are different issues:

(1) Performance regression on importing .py files
(2) Getting reproducible .rpm binaries
(3) .pyc files are not fully reproducible

In the reverse order:

(3) should be fixed in Python, I reported the bug upstream:
https://bugs.python.org/issue37596

(2) is a work-in-progress, Fedora builds are not reproducible yet.

(1) is the main question here.

--

First of all, the rationale for the .pyc file change in Python 3.7 is 
described in the PEP 552:


  https://www.python.org/dev/peps/pep-0552/#rationale
  "Reproducibility is important for security"

But I'm not sure how important it is if it's only done half-way. Fedora
doesn't seem to support reproducible builds yet, even if recent changes
show that it's moving in that direction. Old documents about Fedora:


  https://fedoraproject.org/wiki/Reproducible_Builds
  https://github.com/kholia/ReproducibleBuilds

Debian is making good progress:

  https://wiki.debian.org/ReproducibleBuilds

OpenSUSE is also working on that:

   https://en.opensuse.org/openSUSE:Reproducible_Builds

--

On Python 3.7 and newer, if SOURCE_DATE_EPOCH is set, py_compile writes
hash-based .pyc files: the .pyc file doesn't contain a timestamp, but a
hash of the source instead.


"The default is PycInvalidationMode.CHECKED_HASH if the 
SOURCE_DATE_EPOCH environment variable is set, otherwise the default is 
PycInvalidationMode.TIMESTAMP." says py_compile doc.
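
To see that default in practice, here is a minimal sketch that compiles a
throwaway file with and without SOURCE_DATE_EPOCH and reads the flags word
from the .pyc header. The helper names (pyc_flags, describe_flags,
default_mode_with_env) are made up for this illustration; the header layout
(4-byte magic number, then a 4-byte little-endian flags field) is the one
described in PEP 552.

```python
import os
import py_compile
import struct
import tempfile


def pyc_flags(pyc_path):
    """Return the flags word stored in bytes 4-7 of a .pyc header (PEP 552)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    (flags,) = struct.unpack("<I", header[4:8])
    return flags


def describe_flags(flags):
    """Map the PEP 552 flag bits to a PycInvalidationMode name."""
    if not flags & 0b01:      # bit 0 clear: timestamp-based pyc
        return "TIMESTAMP"
    if flags & 0b10:          # bit 1 set: source is re-hashed on import
        return "CHECKED_HASH"
    return "UNCHECKED_HASH"


def default_mode_with_env(source_date_epoch):
    """Compile a tiny module and report which mode py_compile picked.

    source_date_epoch is the string to export, or None to leave it unset.
    """
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    try:
        if source_date_epoch is not None:
            os.environ["SOURCE_DATE_EPOCH"] = source_date_epoch
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "demo.py")
            with open(src, "w") as f:
                f.write("x = 1\n")
            # No invalidation_mode argument: py_compile chooses the default.
            pyc = py_compile.compile(src, cfile=src + "c", doraise=True)
            return describe_flags(pyc_flags(pyc))
    finally:
        # Restore the caller's environment.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved
```

Calling default_mode_with_env(None) and default_mode_with_env("1563148800")
shows the two defaults side by side.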


When importing a .py file, the content of the .py file is hashed and 
compared to the hash stored in the .pyc file.


For timestamp-based .pyc files, nothing is hashed: the mtime (and size) of
the .py file recorded in the .pyc header is compared with the current
mtime and size of the .py file, and the .pyc is only regenerated if they
differ.
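
The two checks can be sketched by hand as follows. This is a simplified
illustration, not the importlib code: it ignores the magic number and treats
every hash-based .pyc as checked. pyc_matches_source and demo are made-up
names; importlib.util.source_hash() is the documented helper that computes
the hash stored in hash-based .pyc files, and the field offsets follow the
PEP 552 header layout.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def pyc_matches_source(source_path, pyc_path):
    """Simplified version of the freshness check done at import time."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0b01:
        # Hash-based pyc: bytes 8-15 hold a hash of the source content,
        # so the whole .py file has to be read and hashed.
        with open(source_path, "rb") as f:
            return header[8:16] == importlib.util.source_hash(f.read())
    # Timestamp-based pyc: bytes 8-15 hold the source mtime and size,
    # so a single stat() call on the .py file is enough.
    mtime, size = struct.unpack("<II", header[8:16])
    st = os.stat(source_path)
    return (mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and size == (st.st_size & 0xFFFFFFFF))


def demo(mode):
    """Compile a file with the given mode, then edit it.

    Returns (matches_before_edit, matches_after_edit).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        pyc = py_compile.compile(src, cfile=src + "c", doraise=True,
                                 invalidation_mode=mode)
        before = pyc_matches_source(src, pyc)
        with open(src, "a") as f:
            f.write("y = 2\n")   # changes both the hash and the size
        after = pyc_matches_source(src, pyc)
        return before, after
```

The difference in cost is visible in the sketch: the hash branch reads the
full source file, the timestamp branch only stats it.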


Note: Python 3.7 also has a --check-hash-based-pycs command line option,
but it seems to be meant for specific use cases. (See also the
_imp.check_hash_based_pycs value.)
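
A small sketch to look at that value: it starts a child interpreter with or
without the option and prints _imp.check_hash_based_pycs. Note that _imp is
a private module, so this is only meant for poking around; the helper name
check_hash_based_pycs_value is made up.

```python
import subprocess
import sys


def check_hash_based_pycs_value(option=None):
    """Start a child interpreter and return its _imp.check_hash_based_pycs.

    option is one of "default", "always", "never", or None to omit the flag.
    """
    cmd = [sys.executable]
    if option is not None:
        cmd += ["--check-hash-based-pycs", option]
    cmd += ["-c", "import _imp; print(_imp.check_hash_based_pycs)"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```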


--

redhat-rpm-config was modified 18 days ago in Fedora to set 
SOURCE_DATE_EPOCH to the timestamp of the topmost changelog entry:

https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/57

OpenSUSE had issues with reproducible Python build and .pyc files:
https://bugzilla.opensuse.org/show_bug.cgi?id=1133809

For this reason, %clamp_mtime_to_source_date_epoch is still off by 
default (in Fedora and OpenSUSE).
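
As a rough illustration of what clamping means here (a sketch of the idea
only, not rpm's implementation; clamp_mtime is a made-up helper): files
newer than SOURCE_DATE_EPOCH get their mtime lowered to it. One plausible
way this can interact badly with timestamp-based .pyc files is that they
record the source mtime at compile time, so changing the .py mtime
afterwards makes the .pyc look stale.

```python
import os
import tempfile


def clamp_mtime(path, source_date_epoch):
    """Cap the mtime of path at source_date_epoch.

    Returns True if the mtime was lowered, False if it was already old enough.
    """
    st = os.stat(path)
    if st.st_mtime <= source_date_epoch:
        return False
    # Keep the access time, only lower the modification time.
    os.utime(path, (st.st_atime, source_date_epoch))
    return True
```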


--

SOURCE_DATE_EPOCH was disabled in the Python 3.7 package
("%global source_date_epoch_from_changelog 0"), because
test_cmd_line_script, test_multiprocessing_main_handling and test_runpy
fail if SOURCE_DATE_EPOCH is set. These tests have been fixed in
Python 3.8.


--

glib2 sets the PYTHONHASHSEED=0 environment variable to work around one of
the remaining bugs for reproducible .pyc files: frozensets are not written
in a deterministic order in .pyc files:

https://bugzilla.redhat.com/show_bug.cgi?id=1686078

This issue should be fixed in Python, I reported the bug upstream:
https://bugs.python.org/issue37596
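
A way to observe this is to marshal the same frozenset in child interpreters
started with different PYTHONHASHSEED values, since .pyc files store code
objects with marshal. This is only a sketch: marshal_frozenset_hex is a
made-up helper, and whether two different seeds actually produce different
bytes depends on the interpreter (on versions where the upstream issue is
fixed they should be identical). A fixed seed always gives the same bytes,
which is what the glib2 workaround relies on.

```python
import os
import subprocess
import sys

# Code run in the child: dump a frozenset of strings with marshal.
SNIPPET = (
    "import marshal, sys; "
    "data = marshal.dumps(frozenset({'alpha', 'beta', 'gamma', 'delta'})); "
    "sys.stdout.write(data.hex())"
)


def marshal_frozenset_hex(seed):
    """Return the hex dump of the marshalled frozenset for a given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run([sys.executable, "-c", SNIPPET], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Comparing marshal_frozenset_hex(1) with marshal_frozenset_hex(2) shows
whether a given interpreter is affected.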

--

If SOURCE_DATE_EPOCH is set when building Python, .pyc files use the hash
instead of the timestamp. Some people consider this a performance
regression. So Python was modified to force the use of timestamps when
Python is built, that is, when the RPM_BUILD_ROOT env var is set.


https://github.com/fedora-python/cpython/pull/3/files
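
The resulting behaviour can be summarised with a small sketch. This is a
hypothetical illustration of the selection logic, not the code from the
patch linked above; pick_invalidation_mode is a made-up name.

```python
import os
from py_compile import PycInvalidationMode


def pick_invalidation_mode(environ=None):
    """Hypothetical sketch of which pyc mode gets used by default."""
    if environ is None:
        environ = os.environ
    if environ.get("RPM_BUILD_ROOT"):
        # RPM build: force timestamp-based pycs, whatever SOURCE_DATE_EPOCH is.
        return PycInvalidationMode.TIMESTAMP
    if environ.get("SOURCE_DATE_EPOCH"):
        # Upstream default when SOURCE_DATE_EPOCH is set.
        return PycInvalidationMode.CHECKED_HASH
    return PycInvalidationMode.TIMESTAMP
```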

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
___
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/python-devel@lists.fedoraproject.org


Re: Python 3.7's Deterministic pycs

2019-07-10 Thread Miro Hrončok

On 01. 02. 18 16:25, Neal Gompa wrote:
> On Thu, Feb 1, 2018 at 10:21 AM, Nick Coghlan wrote:
>> On 1 February 2018 at 23:54, Petr Viktorin wrote:
>>> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
>>> reproducible builds, to make a better argument for this?
>>
>> I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
>> environment, so Fedora's likely to get the new CHECKED_HASH behaviour
>> by default:
>> https://docs.python.org/dev/library/py_compile.html#py_compile.compile
>>
>> Given that SELinux typically won't allow user applications to rewrite
>> the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
>> at build time instead - with that setting, Python will ignore source
>> file changes entirely, and trust that RPM will keep the source and pyc
>> files consistent.
>
> We have not set this to be on in Fedora. It's still switched off by
> default. To the best of my knowledge, the only distribution doing it
> so far is openSUSE.


This is now set in Fedora:

https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/57

Now all Python pyc files (except python3 itself) are in CHECKED_HASH mode.
We need to figure this out.

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok


Re: Python 3.7's Deterministic pycs

2018-02-01 Thread Petr Viktorin

On 02/01/2018 04:21 PM, Nick Coghlan wrote:
> On 1 February 2018 at 23:54, Petr Viktorin wrote:
>> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
>> reproducible builds, to make a better argument for this?
>
> I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
> environment, so Fedora's likely to get the new CHECKED_HASH behaviour
> by default:
> https://docs.python.org/dev/library/py_compile.html#py_compile.compile


Wait. These docs say "invalidation_mode will be forced to 
PycInvalidationMode.CHECKED_HASH", which sounds quite scary. Is it 
possible to use UNCHECKED_HASH with SOURCE_DATE_EPOCH?


(I don't think we use SOURCE_DATE_EPOCH now, but we might in the future.)
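
One way to check this on a given interpreter is to pass the mode explicitly
while SOURCE_DATE_EPOCH is set and look at the flags in the resulting .pyc
header. The py_compile documentation for 3.7.2 and later says that
SOURCE_DATE_EPOCH only determines the default and no longer overrides an
explicit invalidation_mode, so there the sketch below should report
UNCHECKED_HASH. explicit_mode_under_epoch is a made-up helper, and the flag
bits are the ones from PEP 552.

```python
import os
import py_compile
import struct
import tempfile


def explicit_mode_under_epoch(mode, source_date_epoch="1517443200"):
    """Compile with an explicit mode while SOURCE_DATE_EPOCH is set.

    Returns the name of the mode actually recorded in the pyc header.
    """
    saved = os.environ.get("SOURCE_DATE_EPOCH")
    os.environ["SOURCE_DATE_EPOCH"] = source_date_epoch
    try:
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "demo.py")
            with open(src, "w") as f:
                f.write("x = 1\n")
            pyc = py_compile.compile(src, cfile=src + "c", doraise=True,
                                     invalidation_mode=mode)
            with open(pyc, "rb") as f:
                (flags,) = struct.unpack("<I", f.read(8)[4:8])
    finally:
        # Restore the caller's environment.
        if saved is None:
            del os.environ["SOURCE_DATE_EPOCH"]
        else:
            os.environ["SOURCE_DATE_EPOCH"] = saved
    if not flags & 0b01:
        return "TIMESTAMP"
    return "CHECKED_HASH" if flags & 0b10 else "UNCHECKED_HASH"
```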


> Given that SELinux typically won't allow user applications to rewrite
> the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
> at build time instead - with that setting, Python will ignore source
> file changes entirely, and trust that RPM will keep the source and pyc
> files consistent.


And it lets us... avoid a stat call per import? I still fail to see the 
advantage :(




--
Petr Viktorin


Re: Python 3.7's Deterministic pycs

2018-02-01 Thread Neal Gompa
On Thu, Feb 1, 2018 at 10:21 AM, Nick Coghlan wrote:
> On 1 February 2018 at 23:54, Petr Viktorin wrote:
>> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
>> reproducible builds, to make a better argument for this?
>
> I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
> environment, so Fedora's likely to get the new CHECKED_HASH behaviour
> by default: 
> https://docs.python.org/dev/library/py_compile.html#py_compile.compile
>
> Given that SELinux typically won't allow user applications to rewrite
> the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
> at build time instead - with that setting, Python will ignore source
> file changes entirely, and trust that RPM will keep the source and pyc
> files consistent.
>

We have not set this to be on in Fedora. It's still switched off by
default. To the best of my knowledge, the only distribution doing it
so far is openSUSE.

-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: Python 3.7's Deterministic pycs

2018-02-01 Thread Nick Coghlan
On 1 February 2018 at 23:54, Petr Viktorin wrote:
> Honestly, I'm not sure we want to use this in Fedora. Is anyone here into
> reproducible builds, to make a better argument for this?

I believe rpmbuild (et al) all set SOURCE_DATE_EPOCH in the
environment, so Fedora's likely to get the new CHECKED_HASH behaviour
by default: 
https://docs.python.org/dev/library/py_compile.html#py_compile.compile

Given that SELinux typically won't allow user applications to rewrite
the bytecode anyway, we may want to specify the use of UNCHECKED_HASH
at build time instead - with that setting, Python will ignore source
file changes entirely, and trust that RPM will keep the source and pyc
files consistent.
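
A minimal sketch of that behaviour, assuming the interpreter runs with the
default --check-hash-based-pycs setting: compile a module, change its source
afterwards, then import it. value_seen_after_edit and the module name
demo_unchecked are made up for the illustration.

```python
import importlib.util
import os
import py_compile
import tempfile


def value_seen_after_edit(mode):
    """Compile VALUE = 1, rewrite the source to VALUE = 2, then import it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo_unchecked.py")
        with open(src, "w") as f:
            f.write("VALUE = 1\n")
        # cfile is omitted so the pyc lands in __pycache__, where import looks.
        py_compile.compile(src, doraise=True, invalidation_mode=mode)
        # Change the source behind the back of the cached bytecode.
        with open(src, "w") as f:
            f.write("VALUE = 2\n")
        spec = importlib.util.spec_from_file_location("demo_unchecked", src)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.VALUE
```

With PycInvalidationMode.UNCHECKED_HASH the import keeps using the stale
bytecode and sees the old value; with CHECKED_HASH the hash mismatch is
detected and the edited source is used.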

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia