Re: reproducible builds and python

2014-09-18 Thread Toshio Kuratomi
On Mon, Aug 11, 2014 at 08:01:23PM +0200, bmorbach wrote:
> Hi everyone!
> 
> I've been doing some work towards reproducible builds in Fedora (mostly
> with various upstreams so far) and one of the elephants in the room is
> obviously Python's .pyc and .pyo files.
> 
> As those contain the mtime of the original .py file, they might be
> different for each rebuild of an srpm.
> For many rpms this isn't a problem, because the files are not modified
> and thus retain their timestamp from the archive. Quite a few rpms do
> modify .py files, though, and because of that, every build has a
> different result.
> 
> I would like to propose to set the mtime of all .py files to a fixed
> (for this specific srpm) time. This could be done
> in /usr/lib/rpm/brp-python-bytecompile before doing the actual
> byte-compilation. This would result in the same .py{c,o} files being
> created for each rebuild.
> 
> The timestamp could be e.g. the mtime of the oldest file in the
> buildroot (which would assume that not _all_ of the files are modified).
> But if you are interested in the idea, I'd certainly be open to
> suggestions.
> 
> To address the obvious question:
> Why not special-case those files when comparing rpms?
> 
> It will certainly be impossible to achieve this for all packages in
> Fedora, so for some files this might indeed be needed, but I think we
> should avoid this where possible. The idea of reproducible builds
> becomes meaningless if the amount of differences that you just ignore
> gets too big.
> 
> 
> What do you think of this proposal?
> 
I'm not in love with this as it's throwing out information.  OTOH, the
information that's being thrown out may not be all that useful.  It would
help if we knew what the goal was here (besides simply being 100%
reproducible for the sake of being reproducible) as we could then think
about what we stand to gain for losing the real timestamp information.

One idea I just had -- only modify the timestamp of files whose mtime is
more recent than when the tarballs were unpacked.  This would leave files
that were simply untarred with the real upstream timestamps and any files
which would have had their timestamps modified anyway will get the new
timestamp.
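
A minimal sketch of that idea in Python, under assumptions: unpack_time and
fixed_time are hypothetical parameters (epoch seconds) that the build tooling
would have to supply, and the function name is made up for illustration. It
is not taken from brp-python-bytecompile.

```python
import os


def clamp_new_mtimes(root, unpack_time, fixed_time):
    """Set mtime to fixed_time for .py files modified after unpack_time.

    Files whose mtime is at or before unpack_time keep their upstream
    timestamp. Returns the list of paths that were changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_mtime > unpack_time:
                # Keep the access time, replace only the modification time.
                os.utime(path, (st.st_atime, fixed_time))
                changed.append(path)
    return changed
```

This would have to run before byte-compilation so that the .pyc/.pyo files
pick up the adjusted mtime.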

One actual (though small) problem I see with lying about timestamps is that
you're going to need to select a fixed time in the past for this timestamp.
Looking at a directory of files and seeing that one file is from
about the time the package was built while the others are all older can be
an indication that the distro has changed the file from what is upstream
which can then lead to comparing against vanilla upstream to determine if
we're applying a patch that causes a bug.  If we have to apply a timestamp
in the past, that indication goes away.  (Small help because I only remember
doing this once in all my years of looking at files from packages :-)

-Toshio


___
python-devel mailing list
python-devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/python-devel

Re: reproducible builds and python

2014-08-21 Thread Bohuslav Kabrda
- Original Message -
> Hi everyone!
> 
> I've been doing some work towards reproducible builds in Fedora (mostly
> with various upstreams so far) and one of the elephants in the room is
> obviously Python's .pyc and .pyo files.
> 
> As those contain the mtime of the original .py file, they might be
> different for each rebuild of an srpm.
> For many rpms this isn't a problem, because the files are not modified
> and thus retain their timestamp from the archive. Quite a few rpms do
> modify .py files, though, and because of that, every build has a
> different result.
> 
> I would like to propose to set the mtime of all .py files to a fixed
> (for this specific srpm) time. This could be done
> in /usr/lib/rpm/brp-python-bytecompile before doing the actual
> byte-compilation. This would result in the same .py{c,o} files being
> created for each rebuild.
> 
> The timestamp could be e.g. the mtime of the oldest file in the
> buildroot (which would assume that not _all_ of the files are modified).
> But if you are interested in the idea, I'd certainly be open to
> suggestions.

Generally, I like this idea, but I have some concerns:
- So the bytecompile script would "touch" all *.py files? It seems a bit hacky,
not to mention that in some specfiles (notably python3 itself) we actually have
to do bytecompilation by hand for certain reasons.
- Obviously another question is what happens when _all_ files are modified. I 
can pretty much guarantee you that at any given time there will be at least one 
package in Fedora that will have all files modified (e.g. python-six has just 
one py file, so if we patch/touch it in some way, the problem is here). I'd 
like to see a proposal that handles this situation in a sane way.
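
A rough sketch of one way the "all files modified" case could be detected and
handled, purely as an assumption and not something proposed in this thread:
build_start and fallback are hypothetical inputs (epoch seconds), where
fallback would be some timestamp that does not depend on the buildroot.

```python
import os


def pick_fixed_mtime(root, build_start, fallback):
    """Return the oldest .py mtime under root, or fallback.

    fallback is used when there are no .py files, or when even the oldest
    one was modified at or after build_start (the "all files modified" case).
    """
    oldest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                mtime = int(os.stat(os.path.join(dirpath, name)).st_mtime)
                if oldest is None or mtime < oldest:
                    oldest = mtime
    if oldest is None or oldest >= build_start:
        return fallback
    return oldest
```

With a single patched file, as in the python-six example, the oldest mtime
is the build-time one, so the function returns fallback instead.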

Having {read about,experimented with} reproducible builds before, I can see the 
advantage that Fedora would get from this. Perhaps you could sum up the actual 
benefits of reproducible builds here so that even those who have never heard of 
this can see why this is worthwhile?

Just thinking aloud here, but this should also be beneficial for RPMs generated 
with "setup.py bdist_rpm", right? As in "two RPMs generated by bdist_rpm from 
the same git/hg revision on the same architecture would have the same hash" - 
or am I wrong here?

Thanks,
Slavek

> To address the obvious question:
> Why not special-case those files when comparing rpms?
> 
> It will certainly be impossible to achieve this for all packages in
> Fedora, so for some files this might indeed be needed, but I think we
> should avoid this where possible. The idea of reproducible builds
> becomes meaningless if the amount of differences that you just ignore
> gets too big.
> 
> 
> What do you think of this proposal?
> 
> Greetings,
> Benedikt
> 

-- 
Regards,
Slavek Kabrda

Re: reproducible builds and python

2014-08-11 Thread Nick Coghlan
On 08/12/2014 04:01 AM, bmorbach wrote:
> Hi everyone!
> 
> I've been doing some work towards reproducible builds in Fedora (mostly
> with various upstreams so far) and one of the elephants in the room is
> obviously Python's .pyc and .pyo files.
> 
> As those contain the mtime of the original .py file, they might be
> different for each rebuild of an srpm.
> For many rpms this isn't a problem, because the files are not modified
> and thus retain their timestamp from the archive. Quite a few rpms do
> modify .py files, though, and because of that, every build has a
> different result.
> 
> I would like to propose to set the mtime of all .py files to a fixed
> (for this specific srpm) time. This could be done
> in /usr/lib/rpm/brp-python-bytecompile before doing the actual
> byte-compilation. This would result in the same .py{c,o} files being
> created for each rebuild.

This sounds like a reasonable approach to me, especially if the mtime
for the SRPM is derived from dist-git.
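
For illustration, a small sketch of why the source mtime matters: a
timestamp-based .pyc stores it in its header. The offsets below assume the
CPython 3.7+ header layout (magic, flags, mtime, size, 4 bytes each); older
interpreters store the mtime directly after the magic number. The function
names and the fixed_time parameter are hypothetical.

```python
import os
import py_compile
import struct


def embedded_source_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Bytes 0-3: magic, 4-7: flags, 8-11: source mtime, 12-15: source size.
    (mtime,) = struct.unpack("<I", header[8:12])
    return mtime


def compile_with_fixed_mtime(py_path, pyc_path, fixed_time):
    """Set the source mtime to fixed_time, then byte-compile the file."""
    os.utime(py_path, (fixed_time, fixed_time))
    py_compile.compile(
        py_path,
        cfile=pyc_path,
        doraise=True,
        # Force timestamp-based invalidation regardless of the environment.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    return embedded_source_mtime(pyc_path)
```

Compiling the same source twice with the same fixed_time yields the same
header, which removes this particular source of differences between rebuilds.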

Regards,
Nick.

-- 
Nick Coghlan
Red Hat Hosted & Shared Services
Software Engineering & Development, Brisbane

HSS Provisioning Architect