Bug#1068483: Bug#882511: dpkg-buildpackage: should allow caller to force inclusion of source in buildinfo

2024-04-11 Thread Guillem Jover
Hi!

On Wed, 2024-04-10 at 15:22:45 -0700, Vagrant Cascadian wrote:
> On 2024-04-09, Guillem Jover wrote:
> > I've now finished the change I had in that branch, which implements
> > support so that dpkg-buildpackage can be passed a .dsc or a source-dir,
> > and in the former will first extract it, and for both then it will
> > change directory to the source tree. If it got passed a .dsc then it
> > will instruct dpkg-genbuildinfo to include a ref to it.
> >
> > Which I think accomplishes the requested behavior in a safe way? I've
> > attached what I've got, which I'm planning on merging for 1.22.7. I'll
> > probably split that into two commits though before merging.
> 
> Had a chance to take this for a test run, and it appears to work, though
> with a few surprises...

Ah, thanks for the testing, that was very helpful! :)

>   dpkg-buildpackage -- hello_2.10-3.dsc
> 
> Ends up regenerating the .dsc, as --build=any,all,source by default
> ... which may end up with a different .dsc checksum in the .buildinfo
> than the .dsc that was passed on the command line. Which makes some sense,
> but maybe would be better to error out? I would not expect to regenerate
> the .dsc if you're passing dpkg-buildpackage a .dsc!

Hmm, right I think I had documented that locally in the manual page,
but I can see how this can be surprising. I've for now switched the
code to not regenerate the .dsc when that is being passed, but the
problem is that I think the three options are potentially correct:

  * regen: If you built the source on a stable/unstable system, then
    you'd want to regenerate it on the target one (for unstable or a
    backport or stable update), otherwise we might get compatibility
    issues or missed updates. It is also what is being requested when
    calling dpkg-buildpackage (as in "please build source and
    binaries" :).
  * no-regen: If we rebuild then we might end up with inconsistent
    sources if these are tracked in different places, and if you pass
    it the sources then it seems logical to expect them not to be
    regenerated.
  * error: This is the safe option of "both options are correct, let's
    do none :D", of deferring the interface behavior.

Even though I changed it to no-regen for now, I'm thinking, though,
that the regen behavior is the more correct one.

>   dpkg-buildpackage --build=any,all -- /path/to/hello_2.10-3.dsc
> 
> Fails to find the .dsc file, as it appears to extract the sources to
> hello-2.10 and then expects to find ../hello_2.10-3.dsc

Ah, right, this is expected to be a filename not a pathname. (Placing
the source elsewhere is not currently feasible, see #657401; I mean I
guess dpkg-buildpackage could copy the source but…).

I've now added a check, although I'll be reworking it a bit before
merging, because it will emit confusing output if you specify
«./filename.dsc» as not being in the current directory. :)
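As a rough illustration of the "filename, not pathname" rule, a caller could pre-check the argument along these lines. This is only a sketch, not the check in the branch: the function name is made up, and it deliberately rejects anything containing a slash, so «./filename.dsc» is refused as well.

```shell
# Hypothetical caller-side helper, not part of dpkg.
# Succeeds only if the argument is a bare filename ending in .dsc.
is_bare_dsc_filename() {
    case "$1" in
        */*)   return 1 ;;  # any slash makes it a pathname, including ./foo.dsc
        *.dsc) return 0 ;;  # bare filename with the expected extension
        *)     return 1 ;;  # anything else is not a .dsc filename
    esac
}
```

A wrapper would run it from the directory holding the .dsc, for example `is_bare_dsc_filename "$dsc" && dpkg-buildpackage --build=any,all -- "$dsc"`.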

> All that said ... this seemed to work for me:
> 
>   dpkg-buildpackage --build=any,all -- hello_2.10-3.dsc
> 
> So yay, progress! Thanks!

Great, thanks!

> All of the above cases do not clean up the hello-2.10 extracted from the
> .dsc file, so re-running any of the above requires manually cleaning that
> up, running from a clean directory, or experiencing various failure modes
> with the existing hello-2.10 directory.

I've also added an explicit check, and dpkg-buildpackage now will error
out if the directory already exists. I don't think removing a
pre-existing directory would be safe (at least w/o an explicit option
to do so). But perhaps, as you hinted, removing the source tree (for a
successful build) after finishing would indeed be an option, hmm.
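The same "refuse rather than remove" idea can be sketched on the caller side. This is not the code in the branch; both function names are made up, and the directory name is derived on the assumption that the tree is extracted to <source>-<upstream-version>, as in the hello-2.10 case above.

```shell
# Hypothetical helpers, not part of dpkg.

# Print the directory a .dsc is assumed to extract into,
# e.g. hello_2.10-3.dsc -> hello-2.10.
dsc_extract_dir() {
    base=${1##*/}        # drop any directory part
    base=${base%.dsc}    # hello_2.10-3
    name=${base%%_*}     # hello
    version=${base#*_}   # 2.10-3
    case "$version" in
        *-*) upstream=${version%-*} ;;  # strip the Debian revision (after the last hyphen)
        *)   upstream=$version ;;       # native package: no revision to strip
    esac
    printf '%s-%s\n' "$name" "$upstream"
}

# Fail (without touching anything) if the target path already exists.
refuse_if_exists() {
    if [ -e "$1" ]; then
        echo "error: $1 already exists, refusing to extract" >&2
        return 1
    fi
}
```

For example, `refuse_if_exists "$(dsc_extract_dir hello_2.10-3.dsc)"` fails when a stale hello-2.10 is left over from a previous run.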

> So a few little glitches, but overall this seems close to something we
> have really wanted for reproducible builds! And just for good measure,
> thanks!

I force-pushed the reworked code into:

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/log/?h=pu/dpkg-buildpackage-dsc

Thanks,
Guillem



Bug#1068483: Bug#882511: dpkg-buildpackage: should allow caller to force inclusion of source in buildinfo

2024-04-10 Thread Vagrant Cascadian
On 2024-04-09, Guillem Jover wrote:
> I've now finished the change I had in that branch, which implements
> support so that dpkg-buildpackage can be passed a .dsc or a source-dir,
> and in the former will first extract it, and for both then it will
> change directory to the source tree. If it got passed a .dsc then it
> will instruct dpkg-genbuildinfo to include a ref to it.
>
> Which I think accomplishes the requested behavior in a safe way? I've
> attached what I've got, which I'm planning on merging for 1.22.7. I'll
> probably split that into two commits though before merging.

Had a chance to take this for a test run, and it appears to work, though
with a few surprises...

  dpkg-buildpackage -- hello_2.10-3.dsc

Ends up regenerating the .dsc, as --build=any,all,source by default
... which may end up with a different .dsc checksum in the .buildinfo
than the .dsc that was passed on the command line. Which makes some sense,
but maybe would be better to error out? I would not expect to regenerate
the .dsc if you're passing dpkg-buildpackage a .dsc!
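One way to spot such a mismatch after a build is to compare the SHA-256 of the .dsc that was passed in with the entry recorded in the .buildinfo. The following is a minimal sketch: the function names are made up, and it assumes the usual Checksums-Sha256 layout of one "hash size filename" entry per continuation line, plus sha256sum from coreutils.

```shell
# Hypothetical helpers, not part of dpkg.

# Print the SHA-256 recorded for a filename in a .buildinfo file.
# $1 = .buildinfo file, $2 = filename to look up
buildinfo_sha256_for() {
    awk -v want="$2" '
        /^Checksums-Sha256:/   { in_field = 1; next }  # start of the field
        /^[^ ]/                { in_field = 0 }        # any other field ends it
        in_field && $3 == want { print $1; exit }      # entry: hash size filename
    ' "$1"
}

# Succeed only if the .dsc on disk matches what the .buildinfo recorded.
# $1 = .dsc file, $2 = .buildinfo file
dsc_matches_buildinfo() {
    recorded=$(buildinfo_sha256_for "$2" "${1##*/}")
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    [ -n "$recorded" ] && [ "$recorded" = "$actual" ]
}
```

The check should be made against a copy of the original .dsc kept aside before the build, for example `dsc_matches_buildinfo orig/hello_2.10-3.dsc hello_2.10-3_amd64.buildinfo` (paths are illustrative). It fails both when the checksums differ and when the .buildinfo has no entry for the .dsc at all.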


  dpkg-buildpackage --build=any,all -- /path/to/hello_2.10-3.dsc

Fails to find the .dsc file, as it appears to extract the sources to
hello-2.10 and then expects to find ../hello_2.10-3.dsc


All that said ... this seemed to work for me:

  dpkg-buildpackage --build=any,all -- hello_2.10-3.dsc

So yay, progress! Thanks!


All of the above cases do not clean up the hello-2.10 extracted from the
.dsc file, so re-running any of the above requires manually cleaning that
up, running from a clean directory, or experiencing various failure modes
with the existing hello-2.10 directory.


So a few little glitches, but overall this seems close to something we
have really wanted for reproducible builds! And just for good measure,
thanks!


live well,
  vagrant




Bug#1068483: Bug#882511: dpkg-buildpackage: should allow caller to force inclusion of source in buildinfo

2024-04-08 Thread Guillem Jover
Control: forcemerge 882511 1068483

Hi!

After replying to Adrian's report, I recalled there being a previous one
that was similar, and then recalled that I had an even older branch that
implemented a potential solution for this. See below.

On Thu, 2017-11-23 at 16:23:29 +0100, Ximin Luo wrote:
> Package: dpkg-dev
> Version: 1.19.0.4
> Severity: wishlist
> Tags: patch

> dpkg-buildpackage currently does not automatically list the source .dsc nor
> its hash in the call to dpkg-genbuildinfo when doing a binary-only build. This
> is understandable because in a binary-only build, dpkg-buildpackage does not
> have any concept of a source package and therefore does not know (and cannot
> verify) if the working tree was actually generated from any .dsc or not.
> 
> However, the caller knows this information, and it is useful for reproducible
> builds to track exactly which (i.e. hash-wise) source code generates which
> binary packages. So it should be possible for the caller to tell
> dpkg-buildpackage, "yes please do include the .dsc hash in the buildinfo, I am
> telling you it is correct, you can assume this safely".
> 
> Tools like sbuild/pbuilder could then do this, as well as users or rebuilders.
> 
> The attached patch implements this in the simplest way possible. It allows the
> caller to run something like:
> 
>   $ dpkg-buildpackage --no-sign -b --buildinfo-option=--build=full
> 
> The resulting $pkg_$ver_$arch.buildinfo then contains the .dsc and its hash.
> 
> However this requires the caller to know which option to pass, which would 
> either be
> 
>   --buildinfo-option=--build=full
>   --buildinfo-option=--build=any,source
>   --buildinfo-option=--build=all,source
> 
> depending on whether the original build request (to dpkg-buildpackage) was a 
> -b, -B, or -A.
> 
> For this reason, it may be better (more usable) to add a
> --force-source-in-buildinfo flag (or similar name) and when this is
> switched on, do this instead:
> 
> -push @buildinfo_opts, "--build=$build_types" if build_has_none(BUILD_DEFAULT);
> +push @buildinfo_opts, "--build=$build_types,source" if build_has_none(BUILD_DEFAULT);
> 
> Let me know if you like this idea and I'll be happy to implement that instead
> of the attached patch.

The problem with this solution is that it is prone to accidental use,
as it is very easy for a user to unknowingly have recreated the sources
from a locally extracted tree (be that modified or not).
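To make the quoted proposal concrete, the per-mode mapping it describes could be sketched as a small lookup in a calling tool. The function name is made up, the options are the ones listed in the report, and they only have the described effect with the proposed patch applied. It also inherits the accidental-use problem described above, since nothing verifies that the tree really came from the .dsc.

```shell
# Hypothetical wrapper-side lookup, not part of dpkg.
# Maps the binary-only build flag to the option listed in the quoted report.
buildinfo_option_for() {
    case "$1" in
        -b) echo "--buildinfo-option=--build=full" ;;
        -B) echo "--buildinfo-option=--build=any,source" ;;
        -A) echo "--buildinfo-option=--build=all,source" ;;
        *)  return 1 ;;  # not a binary-only build flag
    esac
}
```

A caller would then run something like `dpkg-buildpackage --no-sign -B "$(buildinfo_option_for -B)"`.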

On Sat, 2024-04-06 at 02:57:40 +0200, Guillem Jover wrote:
> On Sat, 2024-04-06 at 02:56:02 +0300, Adrian Bunk wrote:
> > Package: dpkg-dev
> > Version: 1.22.6
> > Severity: normal
> > X-Debbugs-Cc: reproducible-bui...@lists.alioth.debian.org
> 
> > A thought I already wrote in a recent debian-devel discussion:
> > 
> > In theory source package filenames should be eternally and globally
> > unique, but in practice there are corner cases where this assumption
> > might break, for example:
> > - *stable-security does not currently have a copy of the sources
> >   in the main archive, one always has to upload the source archive
> >   there and this might accidentally be a different orig.tar
> > - dak does not keep an eternal history of everything it ever knew,
> >   e.g. RM and later re-NEW of a source version might have a different
> >   source .orig.tar or even different sources for a Debian revision
> > - Debian and Ubuntu might have different orig.tar for the same version,
> >   if Ubuntu updated a package before Debian did, or with packages
> >   where development is completely independent in Debian and Ubuntu
> >   (e.g. OpenStack, KDE)
> > 
> > The reason for different files might be as trivial as "git archive"
> > not always producing the same output when running in different
> > environments, e.g. the autogenerated tarball for a git tag on Github
> > might have different checksums depending on whether it is downloaded
> > today or next year despite identical contents due to slightly
> > different gzip compression.
> > 
> > Should buildinfo files contain the hashes of the source package,
> > to clearly define what sources have been used?
> 
> Ideally? Yes, and I think we considered that at the time when we
> introduced the .buildinfo files. Although a ref to the .dsc does get
> included if the build is also creating the source package.
> 
> The problem is that when dpkg-buildpackage is not building the source
> package, there is no guarantee the source package is going to be
> present, or that if it is present it matches what is currently being
> built from the working directory.

I've now finished the change I had in that branch, which implements
support so that dpkg-buildpackage can be passed a .dsc or a source-dir,
and in the former will first extract it, and for both then it will
change directory to the source tree. If it got passed a .dsc then it
will instruct dpkg-genbuildinfo to include a ref to it.

Which I think accomplishes the requested behavior in a safe way? I've
attached