Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-05 Thread Panu Matilainen
In case folks didn't notice the PR from @mlschroe:
https://github.com/rpm-software-management/rpm/pull/2944

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8676851
You are receiving this because you are subscribed to this thread.

Message ID: 
___
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-01 Thread ニール・ゴンパ
I don't think it's a good idea to offer. I am not convinced these knobs are a 
good idea for RPM to expose for any reason, especially reproducibility.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8643933


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-01 Thread ニール・ゴンパ
I am aware of some tools that use `RPMTAG_BUILDTIME` to sort packages in 
various situations, especially if they have the same NVRA (i.e. rebuilds). It is 
also useful for diagnostic purposes when trying to figure out the cause of a 
breakage.

I would rather not falsify this tag.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8643922


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-01 Thread Zbigniew Jędrzejewski-Szmek
Yes, I think both are worthwhile. But they must be opt-in. 

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8643884


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-01 Thread Michael Schroeder
I think this all has drifted away from the initial proposal. The goal was to be 
able to improve reproducibility of a given rpm by:
- adding a way to specify the buildtime
- adding an option to clamp the file mtimes to the buildtime

Disregarding the implementation details, do you all think this is worthwhile to 
have?
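
As a rough illustration of the two points above, here is a minimal Python sketch of what "clamping file mtimes to the buildtime" could mean. It assumes that clamping only lowers timestamps that are newer than the chosen buildtime and leaves older ones alone. The function names and the example buildtime are hypothetical, and this is not how rpm itself implements it.

```python
import os


def clamp_mtime(mtime: float, buildtime: int) -> float:
    """Return mtime unchanged unless it is newer than buildtime."""
    return min(mtime, buildtime)


def clamp_tree(root: str, buildtime: int) -> int:
    """Clamp the mtime of every regular file under root.

    Returns the number of files whose mtime was lowered.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_mtime > buildtime:
                # Keep the access time, lower only the modification time.
                os.utime(path, (st.st_atime, buildtime))
                changed += 1
    return changed


# Arbitrary example value standing in for a user-specified buildtime.
EXAMPLE_BUILDTIME = 1_700_000_000
```

With a fixed buildtime, two rebuilds of the same sources at different wall-clock times would end up with the same file timestamps for any file generated during the build.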

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8643827


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-01 Thread Bernhard M. Wiedemann
I did not mean to alter signing time - but keep it as it is (it is dropped by 
delsign anyway), while changing "Build Date" instead to something that does not 
vary in (changeless) rebuilds.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8641508


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-01 Thread Zbigniew Jędrzejewski-Szmek
I think the signature must give the real date of when the signature was 
actually made. Setting a fake date would be very very icky, undermining the 
trust in the signing process and the holders of the signing key used in such a 
manner. At the more technical level, keys have a creation time, e.g. for Fedora 
the keys are created a few months in advance of the release 
(RPM-GPG-KEY-fedora-rawhide-x86_64 has Public key creation time - Tue Jan 24 
22:22:52 CET 2023). This means that those keys cannot be used to create valid 
signatures for older packages, but at various points there certainly are 
packages that haven't been touched and have a SOURCE_DATE_EPOCH older than the 
key creation date. Also, at least in Fedora, packages are re-signed with a newer 
signature for a new release. (E.g. a .f39 or .f40 package, when downloaded from 
the F41/rawhide repository, is not rebuilt, but is re-signed with the F41 key.)  
So we *need* a signing time that is separate from BUILDTIME.
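
The key-creation argument can be sketched as a simple time comparison. This is only an illustration of the constraint described above, not an OpenPGP implementation: the function name is hypothetical, and the two example timestamps other than the quoted key creation time are made up.

```python
from datetime import datetime, timezone


def signature_time_is_plausible(key_created: datetime, signed_at: datetime) -> bool:
    """A signature should not claim to predate the key that made it."""
    return signed_at >= key_created


# Key creation time quoted above: Tue Jan 24 22:22:52 CET 2023 (CET is UTC+1).
KEY_CREATED = datetime(2023, 1, 24, 21, 22, 52, tzinfo=timezone.utc)

# Hypothetical values, for illustration only.
old_source_date_epoch = datetime(2022, 6, 1, tzinfo=timezone.utc)
real_signing_time = datetime(2024, 3, 1, tzinfo=timezone.utc)

# Reusing an old SOURCE_DATE_EPOCH as the signing time would predate the key.
print(signature_time_is_plausible(KEY_CREATED, old_source_date_epoch))  # False
# The real signing time is fine.
print(signature_time_is_plausible(KEY_CREATED, real_signing_time))      # True
```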


-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8641433


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-03-01 Thread Bernhard M. Wiedemann
When I normalize BUILDTIME with `%use_source_date_epoch_as_buildtime`, the 
signature still gives the real date. Is there value in keeping both? E.g. for 
[this package](https://build.opensuse.org/package/show/home:bmwiedemann:reproducible/strip-nondeterminism), 
`rpm -qi` shows:
```
Signature   : RSA/SHA256, 2024-02-26T12:00:49 UTC, Key ID 8adc26dbb49c2121
Source RPM  : strip-nondeterminism-1.13.1-33.9.src.rpm
Build Date  : 2023-07-28T16:19:49 UTC
```
Not overriding BUILDHOST is fine as it still allows easy verification.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8640953


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-29 Thread ニール・ゴンパ
I've been bitten enough times personally that I would rather not have BUILDHOST 
and BUILDTIME set to fake values.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8630113


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-29 Thread Zbigniew Jędrzejewski-Szmek
I don't think that a custom "rpmhash" tool is the problem. We have to "trust" 
the tools anyway… A tool that deletes signatures is as much an opaque binary as 
the tool that calculates some hash.

I think it would be a reasonable compromise to say that the hypothetical "rpmhash" 
tool must give a result that is identical to delsign+sha256sum. The problem is 
to agree on what exactly is stripped and/or skipped in the hash.

FWIW, I've been going through Fedora rebuilds over the last few days, and there 
is clear value in having BUILDHOST set to a non-fake value. For example in 
https://bugzilla.redhat.com/show_bug.cgi?id=2266767#c4, it was very helpful 
in diagnosing an arch-specific issue in a noarch package.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8630015


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-29 Thread Bernhard M. Wiedemann
I'm always thinking about rebuild+compare as one operation.
In the Debian and Arch Linux space there were also discussions about centralized 
collections of multiple rebuilder-results. Those are signed data containing 
"$rebuildername built $package $version and got output $hash".
That would work poorly with fuzzy-matching. It could work with a custom rpmhash 
tool, but how do you prove that it indeed covers all relevant bits? I don't 
like that and would rather see us reach bit-reproducible rpms (after delsign) 
that work with generic `sha256sum`.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8629486


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread Zbigniew Jędrzejewski-Szmek
If we could drop OPTFLAGS, that'd be great.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8623707


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread Zbigniew Jędrzejewski-Szmek
"Implementation detail". The important part is to get the payload and 
significant metadata to be identical. Once we have that, we can do 
optimizations to handle comparisons efficiently. One option is to strip fields 
and hash that. Another option, for example, would be to define a hash method 
where some fields are masked (simply skipped when hashing). In fact, I think 
that this second option is more efficient, because you only need to read the 
original archive once and don't even need to write a dummy rpm.
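
A minimal sketch of the second option (masking fields while hashing) might look like the following. It assumes, purely for illustration, that a package can be viewed as a mapping from field names to byte values. The set of masked fields is a hypothetical choice, since which fields to skip is exactly the open question, and the function is not an existing rpm tool or API.

```python
import hashlib
from typing import Mapping

# Hypothetical choice of fields to skip; agreeing on this set is the hard part.
MASKED_FIELDS = frozenset({"BUILDTIME", "BUILDHOST"})


def masked_digest(fields: Mapping[str, bytes],
                  masked: frozenset = MASKED_FIELDS) -> str:
    """Hash all fields in a fixed order, skipping the masked ones.

    Each name and value is length-prefixed so that different field
    layouts cannot produce the same byte stream.
    """
    h = hashlib.sha256()
    for name in sorted(fields):
        if name in masked:
            continue
        for part in (name.encode("utf-8"), fields[name]):
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
    return h.hexdigest()
```

Two inputs that differ only in a masked field produce the same digest in a single pass, without writing a stripped copy of the package first.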

> needing build outputs in addition to build inputs is still needing more

We're only talking about using build outputs for the comparison. We don't need 
them for the rebuild itself.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8623344


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread Bernhard M. Wiedemann
Yes, but needing build outputs in addition to build inputs is still needing 
more.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8619915


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread ニール・ゴンパ
You already need all the inputs to correctly reproduce packages in openSUSE. 
The build system doesn't capture this, but it's still required.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8618519


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread Bernhard M. Wiedemann
keszybz wrote:
> any party can recreate copies of the artifacts that are identical except for 
> the signatures and parts of metadata

I don't think it is a good idea to exclude metadata. One benefit that you can 
only get with bit-identical reproducibility is that you can list the one and 
only correct hash value of the build result. (that also works with signed rpms 
+ delsign).
However, with weaker variants, you always need another full rpm to compare to. 
I.e. for our 16k packages, instead of publishing a list of 16k hashes you then 
need to keep the full archive (100GB) to allow people to reproduce any package.
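
To make the "list of hashes" idea concrete, here is a small Python sketch of checking a rebuild against a published list. It assumes the bytes being hashed are the rpm after signature removal, and the package name and contents are hypothetical stand-ins. It illustrates the workflow only and is not an existing tool.

```python
import hashlib
from typing import Mapping


def sha256_hex(data: bytes) -> str:
    """Plain SHA-256, the same digest a generic sha256sum would print."""
    return hashlib.sha256(data).hexdigest()


def rebuild_matches(name: str, rebuilt: bytes,
                    published: Mapping[str, str]) -> bool:
    """True if the rebuilt package hashes to the published value."""
    expected = published.get(name)
    return expected is not None and sha256_hex(rebuilt) == expected


# Hypothetical data standing in for a signature-stripped rpm.
original = b"stand-in bytes for a signature-stripped rpm"
published_hashes = {"example-1.0-1.noarch.rpm": sha256_hex(original)}

print(rebuild_matches("example-1.0-1.noarch.rpm", original, published_hashes))
print(rebuild_matches("example-1.0-1.noarch.rpm", original + b"!", published_hashes))
```

Only the hash list needs to be published in this scheme. With a fuzzy comparison, the verifier would instead need the full original package to compare against.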

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8618492


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread Panu Matilainen
Oh BTW, just a quick side-remark on this:
> OPTFLAGS and PLATFORM are often different because a "random" noarch package 
> is selected

OPTFLAGS shouldn't even be defined on noarch builds, much less included in the 
header. The former is hard to fix for various hysterical reasons, but the 
latter should be easy.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8618013


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread ニール・ゴンパ
It's also important to keep in mind the context of Debian style 
reproducibility: their package format is an ar archive with tarballs inside. 
That makes things very different for them than us.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8617949


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread Panu Matilainen
> I saw "reproducability" mentioned a few times. I assume it's not a typo, but 
> I have no idea how it's supposed to be different from "reproducibility".

Eh. All my life I've been talking about reproducers, and reproducable bugs. And 
now builds. :flushed: 
That misspelling is going to be hard to unlearn, but thanks for setting me 
straight there.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8617916


Re: [Rpm-maint] [rpm-software-management/rpm] Reproducible builds improvements (Discussion #2934)

2024-02-28 Thread Zbigniew Jędrzejewski-Szmek
> Wait, what? If those differ then the packages do differ, so its not actually 
> bit-per-bit identical. Which is what _I've_ assumed reproducability to mean. 
> This just goes to point out how completely different expectations people 
> have. No wonder having a meaningful discussion about reproducable packages 
> always seems so hard 

I wrote a long piece about this 
[here](https://discussion.fedoraproject.org/t/report-from-the-reproducible-builds-hackfest-during-flock-2023/87469).

> Over the last years I just used `rpm --delsign` to compare with my 
> replication builds and was able to get bit-identical results

Whether we skip some fields when doing a comparison, or take an rpm and strip 
those fields, and then do the comparison, is just an implementation detail. In 
practice, users get rpms that are signed. Thus, the format that the users are 
interested in checking is by definition the signed rpm.

(The other end is interesting too. We generally talk about reproducibility in 
the sense of starting from srpms. This view originates in the Debian world 
where the source deb is the only common denominator. Packagers do not have to 
use git, they do not even have to use a vcs, and people do 
non-version-controlled binNMUs. Thus, when talking about the whole distro, 
starting from source debs is the only option. When working with rpms, at the 
technical level, getting the part from srpm until the binary rpm reproducible 
is challenging, so it makes sense for us to work on this part in the beginning. 
But what we actually want in the end is reproducibility of the **full 
pipeline**, i.e. starting from dist-git. I assume that adding the additional 
step where we generate the srpm from dist-git will be easy. And in dist-git, we 
want to have the upstream pristine tarballs, including a signature. In the end, 
ideally the user would be able to verify that the signed upstream tarball + a 
specific commit with our spec file leads to the rpms that they download from 
the mirror, reproducibly.)

> Having a written definition of what "reproducability" means would help 
> driving towards that goal. 

I saw "reproducability" mentioned a few times. I assume it's not a typo, but I 
have no idea how it's supposed to be different from "reproducibility".

Please see the link above for my definition of "reproducibility".

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2934#discussioncomment-8617435