Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-02-02 Thread Jérémy Bobbio
Hi!

We were discussions the restrictions on buildinfo identifiers:

fweb_1.62-12+b2_brahms-20120530T114812Z.buildinfo
^^^
  this part

The proposal was “the string should consist only of alphanumeric
characters and hyphens”. Guillem made the following comment while
reviewing the patches for dpkg:

Guillem Jover:
> Can we just simply use the package name rules instead? It also avoids
> potential problems with case and similar. (There's a
> pkg_name_is_illegal function in Dpkg::Package already.)

After reaching out to Ansgar with a patch for dak to implement the
above, he replied:

Ansgar Burchardt:
> The allowed sets for package names and the suffix of buildinfo
> filenames won't be the same even with that change.  However currently
> the suffix of buildinfo filenames matches what is allowed for .changes
> files.
> 
> I'm not sure why allowing capital letters used in the suffix of
> buildinfo files should be an issue? After all we allow capital letters
> for both version numbers (part of the filename) and in names of changes
> files.
> 
> (In the other direction not everything allowed as a package name can be
> used as the suffix of .changes and .buildinfo files either.)

Guillem, any further comments? Do you have any strong opposition to the
initial proposal?

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   


signature.asc
Description: Digital signature
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-30 Thread Jérémy Bobbio
Guillem Jover:
> Lunar:
> > I think the proposed patch is missing a field to record some environment
> > variables that can affect the build process. Right now, I'm thinking of
> > DEB_BUILD_OPTIONS and DEB_flag_{SET,STRIP,APPEND,PREPEND} (from
> > dpkg-buildflags). Maybe others? Build profiles?
> >
> > Initially I was against recording such information, but now that we
> > understand the purpose of .buildinfo files better and not mandate that
> > they be reproducible themselves, it doesn't matter if one contains
> > `DEB_BUILD_OPTIONS=parallel=4` and the other
> > `DEB_BUILD_OPTIONS=parallel=16`. It is the responsibility of users
> > trying to recreate a given package to filter this out.
> 
> Hmm right, makes sense, but I see this also to be a bit problematic.
> There are many variables that do affect the build which we'd need to
> record. Including all of the them seems like another privacy
> concerning issue. Whitelisting, we might end up missing some but it's
> privacy-safe; blacklisting we might end-up including sensitive ones,
> but not miss any build-related ones, which is privacy-unsafe.
> 
> Some things that come to mind that do affect the build in a significant
> way: CC, LD_LIBRARY_PATH, DEB*, DPKG_*, PATH, MAKEFLAGS.
> 
> The traditional build flags (i.e. CFLAGS, LDFLAGS, etc) might also affect
> the build depending on the rules file.

I was thinking of a whitelist approach and to start with use cases we
can already think of, adding more variables to record if we identify
them as missing in the future.

How about naming the field “Environment-Variables”?

I will also add another vendor hook for the list of variables.

> Build profiles are already recorded in the binary packages, but having
> that in the .buildinfo file seems right as it makes it easier to
> reproduce the build environment.

Should it be a new field or would recording the DEB_BUILD_PROFILES
variable be enough?

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   


signature.asc
Description: Digital signature
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-29 Thread Jérémy Bobbio
Guillem Jover:
> > One of the main change is that `.buildinfo` should now be named with an
> > arbitrary identifier. By default this defaults to $HOSTNAME-$TIMESTAMP
> > but can be set to an arbitrary value by the `--buildinfo-identifier`
> > command line flag.
> 
> Hmmm, leaking the hostname seems slightly privacy-concerning? If the
> information therein is not relevant I'd rather use something like an
> UUID (although that would require increasing the pseudo-build-essential
> set), or just hashing the hostname-timestamp with something like md5
> or sha1 or similar.

My hunch is that having a timestamp visible in the file name might help
recognizing files quickly after a bunch of builds, especially to
identify the last one. So I'd rather keep it.

If privacy is the goal, hashing just the hostname would not be help
much, as any precomputed table would work.

How about $TIMESTAMP-$EIGHT_FIRST_CHARS_OF_BUILDINFO_MD5?

(I'm picking md5 because it's already used in dpkg-gensymbols.)

> Can we just simply use the package name rules instead? It also avoids
> potential problems with case and similar. (There's a
> pkg_name_is_illegal function in Dpkg::Package already.)

Sounds reasonable. I've updated the wiki page and prepared a patch for
dak.

> > +} else {
> > +warning(_g('no .dsc file, skipping .buildinfo generation'));
> > +}
> >  }
>
> ISTR mentioning this before, but I don't see why generating the
> buildinfo file is tied to existing a source package at all? The source
> should be included if we are including a source in the upload, that's
> it.

The whole puprose of the reproducible builds effort is to provide a
verifiable path from sources to binaries. Signed .buildinfo files are
certifications that the listed binary packages have been built using the
described source and environment.

Only listing the source in .buildinfo when a source is included in the
upload would prevent us to have multiple builders certify the same
binaries. That would cut us from providing multiple certifications and
would undermine the purpose of reproducible builds.

So I could remove the limitation, but the resulting .buildinfo file
would not be very useful for reproducible builds.

> > +$fields->{'Source'} = $spackage;
> > +if ($changelog->{'Binary-Only'}) {
> > +$fields->{'Source'} .= ' (' . $sourceversion . ')';
> > +$fields->{'Changes'} = $changelog->{'Changes'} . "\n\n"
> > + . ' -- ' . $changelog->{'Maintainer'}
> > + . '  ' . $changelog->{'Date'};
> > +}
> 
> Hmmm, it bothers me slightly that the Changes field here diverges in
> form from the one in the .changes file.

I can understand. It's been designed that way because it's actually only
there for binNMUs where the source is the same as the original and we
need a way to reconstruct the right changelog file.

Because sbuild might actually change its strings in the future, it felt
like plain copy/pasting was the safest. So recreating the changelog in
case of binNMU is about outputing the value of the Changes field in the
.buildinfo, a blank line, and the changelog from the original source.

> I think I'd prefer to have the Date as its own field, maybe always
> included. And also the Maintainer field. Any reason to not include
> them all the time or on their own?

I think they would be confusing. If we would to include the “Maintainer”
I guess we should call it “Changed-By” like in .changes. “Date”
as such would be a confusing name because I would tend to think of it
as the date of the build, and not as the date of the latest changelog
of a binNMU… So maybe “Changed-On” or “Change-Date”.

But this feels just more complicated than just the current
implementation, even if the format differs slightly. Maybe we can rename
that field instead to “Extra-Changelog-Entry” or something else so it's
clear they have different format?

> > +my $environment = Dpkg::Deps::AND->new();
> > +foreach my $pkg (sort keys %env_pkgs) {
> > +foreach my $installed_pkg (@{$facts->{pkg}->{$pkg}}) {
> > +my $version = $installed_pkg->{version};
> > +my $architecture = $installed_pkg->{architecture};
> > +my $pkg_name = $pkg;
> 
> > +if (defined $architecture &&
> > +$architecture ne 'all' && $architecture ne $build_arch) {
> > +$pkg_name = "$pkg_name:$architecture";
> > +}
> […]
> Also this will include all Multi-Arch instances for a given package
> regardless of them being used or not, I don't think that's desirable.

Can we know for sure which one have been used?


I'm already working on other changes you suggested.

Thanks,
-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   


signature.asc
Description: Digital signature
___
Reproducible-builds mailing list
Repr

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-28 Thread Holger Levsen
Hi,

On Freitag, 29. Januar 2016, Guillem Jover wrote:
> > (I'd be in favor of naming the first accepted buildinfo
> > file of the archive just "" so that it's predictable…
> I'm not sure how we'd use a sequential number in a distributed manner
> starting with 0s though? :9

we can't :) but dak could do it. (in the Debian use case.)


cheers,
Holger


signature.asc
Description: This is a digitally signed message part.
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-28 Thread Guillem Jover
Hi!

zOn Thu, 2016-01-28 at 19:01:59 +0100, Holger Levsen wrote:
> On Donnerstag, 28. Januar 2016, Guillem Jover wrote:
> > Hmmm, leaking the hostname seems slightly privacy-concerning? If the
> > information therein is not relevant I'd rather use something like an
> > UUID (although that would require increasing the pseudo-build-essential
> > set), or just hashing the hostname-timestamp with something like md5
> > or sha1 or similar.
> 
> yeah, "something / anything" is fine. dak / the archive software can rename 
> it 
> anyway, as it likes.

Ah ok, perfect then.

> (I'd be in favor of naming the first accepted buildinfo 
> file of the archive just "" so that it's predictable…

I'm not sure how we'd use a sequential number in a distributed manner
starting with 0s though? :9

> > I've some pending changes I'll be committing to master or a separate
> > branch, that I'd like to be tested on the reproducible setup (ideally
> > against the already generated and pre-existing reproducible binaries),
> > if that's possible, I'll ask about that when those land, I just need
> > to finish up fewm more unit tests.
> 
> That's possible, though not (yet nor in near future) against pre-existing 
> binaries. (We lack the code for that.)
> 
> What we can do easily, is build and upload dpkg to our repo and use it to 
> build the whole Debian archive on amd64, which roughly takes 8 days for both 
> sid+stretch, and thus roughly 4 days for one suite, if we disable building 
> the 
> other. (Which we can definitly do, especially if we don't disable building of 
> new versions in that other suite…)

Oh ok, that's a bit unfortunate, it would be nice to be able to catch
any possible regressions in the generated binaries, and it would also
show how to use reproducible builds to see that nothing has actually
changed, even after implementation changes in the toolchain.

On Thu, 2016-01-28 at 19:02:53 +0100, Holger Levsen wrote:
> +many thanks for your thorough review! :-)

No problem!

Regards,
Guillem

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-28 Thread Jérémy Bobbio
Guillem Jover:
> I've some pending changes I'll be committing to master or a separate
> branch, that I'd like to be tested on the reproducible setup (ideally
> against the already generated and pre-existing reproducible binaries),
> if that's possible, I'll ask about that when those land, I just need
> to finish up fewm more unit tests.
> 
> Here's a quick review: […]

Thanks for the review! Just so I can organize my work, are your pending
changes related to the .buildinfo?

I can spend the next days improving the patch with your remarks, but I'd
rather not duplicate your work. :)

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   


signature.asc
Description: Digital signature
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-28 Thread Holger Levsen
+many thanks for your thorough review! :-)


signature.asc
Description: This is a digitally signed message part.
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-28 Thread Holger Levsen
Hi Guillem,

just quickly commenting on two sub topics…

On Donnerstag, 28. Januar 2016, Guillem Jover wrote:
> > One of the main change is that `.buildinfo` should now be named with an
> > arbitrary identifier. By default this defaults to $HOSTNAME-$TIMESTAMP
> > but can be set to an arbitrary value by the `--buildinfo-identifier`
> > command line flag.
> Hmmm, leaking the hostname seems slightly privacy-concerning? If the
> information therein is not relevant I'd rather use something like an
> UUID (although that would require increasing the pseudo-build-essential
> set), or just hashing the hostname-timestamp with something like md5
> or sha1 or similar.

yeah, "something / anything" is fine. dak / the archive software can rename it 
anyway, as it likes. (I'd be in favor of naming the first accepted buildinfo 
file of the archive just "" so that it's predictable…
 
> I've some pending changes I'll be committing to master or a separate
> branch, that I'd like to be tested on the reproducible setup (ideally
> against the already generated and pre-existing reproducible binaries),
> if that's possible, I'll ask about that when those land, I just need
> to finish up fewm more unit tests.

That's possible, though not (yet nor in near future) against pre-existing 
binaries. (We lack the code for that.)

What we can do easily, is build and upload dpkg to our repo and use it to 
build the whole Debian archive on amd64, which roughly takes 8 days for both 
sid+stretch, and thus roughly 4 days for one suite, if we disable building the 
other. (Which we can definitly do, especially if we don't disable building of 
new versions in that other suite…)


cheers,
Holger


signature.asc
Description: This is a digitally signed message part.
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-28 Thread Guillem Jover
Hi!

On Wed, 2016-01-27 at 08:58:47 +0100, Jérémy Bobbio wrote:
> Jérémy Bobbio:
> > The attached patch will enable dpkg-buildpackage to create .buildinfo
> > files as specified on the Debian wiki [1]. They have two main purposes:
> > 
> >  * recording information about the system environment used during a
> >particular build—versions of the build dependencies installed, system
> >architecture, etc. for easier forensics/debugging;
> >  * describe how to recreate (partially or in full) the original
> >environment when trying to reproduce a particular build.
> 
> I think the proposed patch is missing a field to record some environment
> variables that can affect the build process. Right now, I'm thinking of
> DEB_BUILD_OPTIONS and DEB_flag_{SET,STRIP,APPEND,PREPEND} (from
> dpkg-buildflags). Maybe others? Build profiles?
>
> Initially I was against recording such information, but now that we
> understand the purpose of .buildinfo files better and not mandate that
> they be reproducible themselves, it doesn't matter if one contains
> `DEB_BUILD_OPTIONS=parallel=4` and the other
> `DEB_BUILD_OPTIONS=parallel=16`. It is the responsibility of users
> trying to recreate a given package to filter this out.

Hmm right, makes sense, but I see this also to be a bit problematic.
There are many variables that do affect the build which we'd need to
record. Including all of the them seems like another privacy
concerning issue. Whitelisting, we might end up missing some but it's
privacy-safe; blacklisting we might end-up including sensitive ones,
but not miss any build-related ones, which is privacy-unsafe.

Some things that come to mind that do affect the build in a significant
way: CC, LD_LIBRARY_PATH, DEB*, DPKG_*, PATH, MAKEFLAGS.

The traditional build flags (i.e. CFLAGS, LDFLAGS, etc) might also affect
the build depending on the rules file.

Build profiles are already recorded in the binary packages, but having
that in the .buildinfo file seems right as it makes it easier to
reproduce the build environment. Ideally, parallel=N should not have
any visible effect but I guess it currently might. Most of the other
DEB_BUILD_OPTIONS do have a visible effect on the artifacts generated.

Thanks,
Guillem

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-28 Thread Guillem Jover
Hi!

On Tue, 2016-01-05 at 14:32:51 +0100, Jérémy Bobbio wrote:
> Control: retitle -1 dpkg-dev: please add support for .buildinfo files
> Control: tag -1 + patch

> The attached patch will enable dpkg-buildpackage to create .buildinfo
> files as specified on the Debian wiki [1]. They have two main purposes:
> 
>  * recording information about the system environment used during a
>particular build—versions of the build dependencies installed, system
>architecture, etc. for easier forensics/debugging;
>  * describe how to recreate (partially or in full) the original
>environment when trying to reproduce a particular build.
> 
> Since Guillem's preliminary review in February 2015 [2], the
> specification has slightly elvolved to be a bit more relaxed and the
> code have been improved.

Cool, thanks!

> One of the main change is that `.buildinfo` should now be named with an
> arbitrary identifier. By default this defaults to $HOSTNAME-$TIMESTAMP
> but can be set to an arbitrary value by the `--buildinfo-identifier`
> command line flag.

Hmmm, leaking the hostname seems slightly privacy-concerning? If the
information therein is not relevant I'd rather use something like an
UUID (although that would require increasing the pseudo-build-essential
set), or just hashing the hostname-timestamp with something like md5
or sha1 or similar.

> To address privacy concerns, the Build-Path field is now only included
> when either the build path starts by `/build/` or
> `--always-include-path` has been specified on the command line of
> `dpkg-genbuildinfo`.

Thanks!

> .buildinfo files are now accepted (although discarded) by the Debian
> archive [3]. This change should thus not affect Debian developpers in
> their daily work.

Ah perfect, thanks for pushing for this. I'm planning on including
most of the patches that look fine for 1.18.5 or .6 ideally.

I've some pending changes I'll be committing to master or a separate
branch, that I'd like to be tested on the reproducible setup (ideally
against the already generated and pre-existing reproducible binaries),
if that's possible, I'll ask about that when those land, I just need
to finish up fewm more unit tests.

Here's a quick review:

> From 7b6d953f834b1e8800d3f8af4570d57d86e5c592 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?J=C3=A9r=C3=A9my=20Bobbio?= 
> Date: Fri, 6 Nov 2015 12:17:39 +
> Subject: [PATCH] Add support for .buildinfo files

> diff --git a/man/dpkg-buildpackage.1 b/man/dpkg-buildpackage.1
> index 13770ba..54cee7b 100644
> --- a/man/dpkg-buildpackage.1
> +++ b/man/dpkg-buildpackage.1
> @@ -317,6 +322,12 @@ The source package version (without the epoch).
>  The upstream version.
>  .RE
>  .TP
> +.BI \-\-buildinfo-identifier= identifier
> +By default, \fBdpkg\-buildpackage\fP put the system hostname and the

“puts”, but perhaps better “uses foo and bar to create the quux
filename”.

> +current time in the name of the \fB.buildinfo\fP file. An arbitrary
> +identifier can be specified as a replacement (since dpkg 1.18.5).

I'd probably describe first what the option actually does, so that the
version when it got intruduced makes sense to what it refers to. And
then how does that diverges from the default.

And please, place new sentences on a new line. (I should probably
update the "coding-style" about that.)

> It must
> +contain only alphanumeric characters and hyphens.
> +.TP
>  .BI \-p sign-command
>  When \fBdpkg\-buildpackage\fP needs to execute GPG to sign a source
>  control (\fB.dsc\fP) file or a \fB.changes\fP file it will run

> diff --git a/man/dpkg-genbuildinfo.1 b/man/dpkg-genbuildinfo.1
> new file mode 100644
> index 000..77f2a76
> --- /dev/null
> +++ b/man/dpkg-genbuildinfo.1
> @@ -0,0 +1,98 @@

> +.BI \-\-always\-include\-path
> +By default, the \fBBuild-Path\fR field will only be written if the current
> +directory starts with \fB/build/\fR. Specify this option to always write
> +a \fBBuild-Path\fR field when generating the \fB.buildinfo\fR.

Missing escaped dash in field names.

> diff --git a/scripts/Dpkg/Control/Types.pm b/scripts/Dpkg/Control/Types.pm
> index 09e12d1..ad6e11b 100644
> --- a/scripts/Dpkg/Control/Types.pm
> +++ b/scripts/Dpkg/Control/Types.pm
> @@ -51,16 +52,17 @@ between Dpkg::Control and Dpkg::Control::Fields.
>  
>  use constant {
>  CTRL_UNKNOWN => 0,
> -CTRL_INFO_SRC => 1,  # First control block in debian/control
> -CTRL_INFO_PKG => 2,  # Subsequent control blocks in debian/control
> -CTRL_INDEX_SRC => 4, # Entry in repository's Packages files
> -CTRL_INDEX_PKG => 8, # Entry in repository's Sources files
> -CTRL_PKG_SRC => 16,  # .dsc file of source package
> -CTRL_PKG_DEB => 32,  # DEBIAN/control in binary packages
> -CTRL_FILE_CHANGES => 64, # .changes file
> -CTRL_FILE_VENDOR => 128, # File in $Dpkg::CONFDIR/origins
> -CTRL_FILE_STATUS => 256, # $Dpkg::ADMINDIR/status
> -CTRL_CHANGELOG => 512,   # Output of dpkg-parsechangelog
> +   

Re: [Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

2016-01-26 Thread Jérémy Bobbio
Jérémy Bobbio:
> The attached patch will enable dpkg-buildpackage to create .buildinfo
> files as specified on the Debian wiki [1]. They have two main purposes:
> 
>  * recording information about the system environment used during a
>particular build—versions of the build dependencies installed, system
>architecture, etc. for easier forensics/debugging;
>  * describe how to recreate (partially or in full) the original
>environment when trying to reproduce a particular build.

I think the proposed patch is missing a field to record some environment
variables that can affect the build process. Right now, I'm thinking of
DEB_BUILD_OPTIONS and DEB_flag_{SET,STRIP,APPEND,PREPEND} (from
dpkg-buildflags). Maybe others? Build profiles?

Initially I was against recording such information, but now that we
understand the purpose of .buildinfo files better and not mandate that
they be reproducible themselves, it doesn't matter if one contains
`DEB_BUILD_OPTIONS=parallel=4` and the other
`DEB_BUILD_OPTIONS=parallel=16`. It is the responsibility of users
trying to recreate a given package to filter this out.

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   


signature.asc
Description: Digital signature
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds