Bug#738591: lintian: Add checker for timestamped gzip files

2014-02-10 Thread Tomasz Buchert
Package: lintian
Version: 2.5.21
Severity: wishlist

Dear Maintainer,

There is an ongoing project to build reproducible debs
(see https://wiki.debian.org/ReproducibleBuilds). One of the tasks
is to update lintian to emit a tag on gzips that contain timestamps.
I've written a simple checker that does exactly that and emits
"package-contains-timestamped-gzip". The patch is attached.

Please note that I'm not a Perl programmer and this would be my first
lintian contribution.
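For context on what such a checker looks for: RFC 1952 places a 4-byte MTIME field at offset 4 of every gzip member header, stored least significant byte first, and an MTIME of 0 means no time stamp is recorded. The following is a minimal sketch of that test, not the attached patch; the helper name `gzip_header_mtime` is hypothetical, and the headers are synthetic ones built with `pack` rather than real files.

```perl
use strict;
use warnings;

# Hypothetical helper (not lintian's API): return the MTIME field of a gzip
# member header, or undef if the data does not start with a gzip header.
# RFC 1952 header layout (10 bytes):
#   ID1 ID2 CM FLG | MTIME (4 bytes, least significant byte first) | XFL OS
sub gzip_header_mtime {
    my ($header) = @_;
    return if length($header) < 10;    # too short to be a gzip header
    my ($id1, $id2, undef, undef, $mtime) = unpack('C C C C V', $header);
    return unless $id1 == 0x1f && $id2 == 0x8b;    # wrong magic bytes
    return $mtime;                                  # 0 means "no time stamp"
}

# Two synthetic headers: one carrying a build time, one with MTIME zeroed.
my $stamped   = pack('C C C C V C C', 0x1f, 0x8b, 8, 0, 1392072817, 0, 3);
my $unstamped = pack('C C C C V C C', 0x1f, 0x8b, 8, 0, 0,          0, 3);

for my $header ($stamped, $unstamped) {
    my $mtime   = gzip_header_mtime($header);
    my $verdict = $mtime ? "timestamped (MTIME=$mtime)" : 'no timestamp';
    print "$verdict\n";
}
```

Run as is, it prints "timestamped (MTIME=1392072817)" and then "no timestamp". Only the zero/non-zero distinction matters for the tag; the actual value is never needed.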

Cheers,
Tomasz



-- System Information:
Debian Release: jessie/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable'), (200, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.12-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=fr_FR.utf8, LC_CTYPE=fr_FR.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages lintian depends on:
ii  binutils                        2.24-3
ii  bzip2                           1.0.6-5
ii  diffstat                        1.58-1
ii  file                            1:5.14-2
ii  gettext                         0.18.3.2-1
ii  hardening-includes              2.5
ii  intltool-debian                 0.35.0+20060710.1
ii  libapt-pkg-perl                 0.1.29+b1
ii  libarchive-zip-perl             1.30-7
ii  libclass-accessor-perl          0.34-1
ii  libclone-perl                   0.36-1
ii  libdpkg-perl                    1.17.6
ii  libemail-valid-perl             1.192-1
ii  libfile-basedir-perl            0.03-1
ii  libipc-run-perl                 0.92-1
ii  liblist-moreutils-perl          0.33-1+b2
ii  libparse-debianchangelog-perl   1.2.0-1
ii  libtext-levenshtein-perl        0.06~01-2
ii  libtimedate-perl                2.3000-1
ii  liburi-perl                     1.60-1
ii  man-db                          2.6.6-1
ii  patchutils                      0.3.2-3
ii  perl [libdigest-sha-perl]       5.18.2-2
ii  t1utils                         1.37-2

Versions of packages lintian recommends:
pn  libperlio-gzip-perl
ii  perl-modules [libautodie-perl]  5.18.2-2

Versions of packages lintian suggests:
pn  binutils-multiarch
ii  dpkg-dev                        1.17.6
ii  libhtml-parser-perl             3.71-1+b1
ii  libtext-template-perl           1.46-1
ii  libyaml-perl                    0.84-1
ii  xz-utils                        5.1.1alpha+20120614-2

-- no debconf information
From f389948be4631df98cbf1a140857a541b76ffe77 Mon Sep 17 00:00:00 2001
From: Tomasz Buchert 
Date: Mon, 10 Feb 2014 23:53:37 +0100
Subject: [PATCH] added reproducibility checker

---
 checks/reproducibility.desc                        |  13 ++
 checks/reproducibility.pm                          |  51 +
 t/tests/reproducibility/debian/debian/control.in   |  17 +++
 .../debian/debian/unreproducible-pkg.install       |   1 +
 t/tests/reproducibility/debian/file                |   1 +
 .../reproducibility/debian/file-with-timestamp.gz  | Bin 0 -> 39 bytes
 .../debian/file-without-timestamp.gz               | Bin 0 -> 34 bytes
 t/tests/reproducibility/debian/prepare             |   4 ++
 t/tests/reproducibility/desc                       |   6 +++
 t/tests/reproducibility/tags                       |   1 +
 10 files changed, 94 insertions(+)
 create mode 100644 checks/reproducibility.desc
 create mode 100644 checks/reproducibility.pm
 create mode 100644 t/tests/reproducibility/debian/debian/control.in
 create mode 100644 t/tests/reproducibility/debian/debian/unreproducible-pkg.install
 create mode 100644 t/tests/reproducibility/debian/file
 create mode 100644 t/tests/reproducibility/debian/file-with-timestamp.gz
 create mode 100644 t/tests/reproducibility/debian/file-without-timestamp.gz
 create mode 100755 t/tests/reproducibility/debian/prepare
 create mode 100644 t/tests/reproducibility/desc
 create mode 100644 t/tests/reproducibility/tags

diff --git a/checks/reproducibility.desc b/checks/reproducibility.desc
new file mode 100644
index 000..26f390a
--- /dev/null
+++ b/checks/reproducibility.desc
@@ -0,0 +1,13 @@
+Check-Script: reproducibility
+Author: Tomasz Buchert 
+Abbrev: repro
+Type: binary, udeb
+Needs-Info: index
+Info: This script checks packages for unreproducible elements.
+
+Tag: package-contains-timestamped-gzip
+Severity: normal
+Certainty: certain
+Info: The package contains a gzip'ed file that
+ has timestamps. Such files make the produced
+ packages unreproducible.
diff --git a/checks/reproducibility.pm b/checks/reproducibility.pm
new file mode 100644
index 000..59c13d9
--- /dev/null
+++ b/checks/reproducibility.pm
@@ -0,0 +1,51 @@
+# reproducibility -- lintian check script -*- perl -*-
+#
+# Copyright (C) 2014 Tomasz Buchert
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHO

2014-02-10 Thread Jakub Wilk

* Tomasz Buchert , 2014-02-11, 00:56:
> There is an ongoing project to build reproducible debs (see
> https://wiki.debian.org/ReproducibleBuilds). One of the tasks is to update
> lintian to emit a tag on gzips that contain timestamps. I've written a
> simple checker that does exactly that and emits
> "package-contains-timestamped-gzip". The patch is attached.


We check for a very similar thing already in files.pm (the 
gzip-file-is-not-multi-arch-same-safe tag). Perhaps it would be better 
to reuse that code.


Niels, do you remember why we read timestamp with sysread() and unpack() 
instead of using file_info? I have a vague recollection that we did it 
on purpose, but can't remember the details.



> +Severity: normal


I think it should be at most "wishlist", perhaps even "pedantic".

--
Jakub Wilk


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



2014-02-10 Thread Niels Thykier
On 2014-02-11 01:20, Jakub Wilk wrote:
> * Tomasz Buchert , 2014-02-11, 00:56:
>> There is an ongoing project to build reproducible debs (see
>> https://wiki.debian.org/ReproducibleBuilds). One of the tasks is to update
>> lintian to emit a tag on gzips that contain timestamps. I've written a
>> simple checker that does exactly that and emits
>> "package-contains-timestamped-gzip". The patch is attached.
> 
> We check for a very similar thing already in files.pm (the
> gzip-file-is-not-multi-arch-same-safe tag). Perhaps it would be better
> to reuse that code.
> 
> Niels, do you remember why we read timestamp with sysread() and unpack()
> instead of using file_info? I have a vague recollection that we did it
> on purpose, but can't remember the details.
> 

Yes, file(1) cannot reliably detect gzip files[1] and I guess I figured
it was easier to do it with sysread than to have file-info-helper replace
even more of file(1)'s job.
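Checking the gzip magic bytes directly, as described here, keeps the verdict independent of how file(1) happens to describe the data. A minimal sketch of that idea, assuming a hypothetical helper `looks_like_gzip` (it is not lintian's `file_info`): it reads the first two bytes with `sysread` and compares them with the RFC 1952 magic number 0x1f 0x8b.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not lintian's file_info): decide whether a file is
# gzip data by reading its first two bytes, the gzip magic 0x1f 0x8b,
# instead of relying on the description printed by file(1).
sub looks_like_gzip {
    my ($path) = @_;
    open(my $fd, '<:raw', $path) or die "open $path: $!";
    my $n = sysread($fd, my $magic, 2);
    die "reading $path: $!" unless defined $n;
    close($fd);
    return $n == 2 && $magic eq "\x1f\x8b";
}

# Usage: one file starting with a (synthetic) gzip header, one text file.
my ($gz_fh, $gz_path) = tempfile(UNLINK => 1);
binmode($gz_fh);
print {$gz_fh} pack('C C C C V C C', 0x1f, 0x8b, 8, 0, 0, 0, 3);
close($gz_fh);

my ($txt_fh, $txt_path) = tempfile(UNLINK => 1);
print {$txt_fh} "Hello\n";
close($txt_fh);

for my $path ($gz_path, $txt_path) {
    printf "%s: %s\n", $path, looks_like_gzip($path) ? 'gzip' : 'not gzip';
}
```

A magic-byte match only says the file starts like gzip; a stricter check could also validate the CM byte, but that goes beyond what the thread discusses.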

>> +Severity: normal
> 
> I think it should be at most "wishlist", perhaps even "pedantic".
> 


~Niels

[1] Apparently based on the timestamp it "randomly" decides to misreport
it as a different file type. At least that is my conclusion based on the
test failures we used to have.





2014-02-11 Thread Tomasz Buchert
On 11/02/14 06:54, Niels Thykier wrote:
> On 2014-02-11 01:20, Jakub Wilk wrote:
> > * Tomasz Buchert , 2014-02-11, 00:56:
> >> There is an ongoing project to build reproducible debs (see
> >> https://wiki.debian.org/ReproducibleBuilds). One of the tasks is to update
> >> lintian to emit a tag on gzips that contain timestamps. I've written a
> >> simple checker that does exactly that and emits
> >> "package-contains-timestamped-gzip". The patch is attached.
> > 
> > We check for a very similar thing already in files.pm (the
> > gzip-file-is-not-multi-arch-same-safe tag). Perhaps it would be better
> > to reuse that code.

Ok, no problem. Should I put the test inside files.pm then? I wanted
to dedicate a whole check to prospective reproducibility checks.

> > 
> > Niels, do you remember why we read timestamp with sysread() and unpack()
> > instead of using file_info? I have a vague recollection that we did it
> > on purpose, but can't remember the details.
> > 
> 
> Yes, file(1) cannot reliably detect gzip files[1] and I guess I figured
> it was easier to do it with sysread than to have file-info-helper replace
> even more of file(1)'s job.

That's funny! Not a problem, my prototype was written that way, but
then I found file_info and decided to do it the easy way.

> 
> >> +Severity: normal
> > 
> > I think it should be at most "wishlist", perhaps even "pedantic".
> > 

Let's make it "pedantic", but hopefully one day
it will be "normal".

> 
> 
> ~Niels
> 
> [1] Apparently based on the timestamp it "randomly" decides to misreport
> it as a different file type. At least that is my conclusion based on the
> test failures we used to have.
> 
> 

Cheers,
Tomasz





2014-02-11 Thread Jérémy Bobbio
Tomasz Buchert:
> > >> +Severity: normal
> > > 
> > > I think it should be at most "wishlist", perhaps even "pedantic".
> > > 
> 
> Let's make it "pedantic", but hopefully one day
> it will be "normal".

Could we go for “wishlist” instead?

I know that switching to reproducible builds sounds like a major
shift in Debian's current practices, but we already have way more
packages reproducible than one might expect.

The following wiki page describes the last large-scale experiment that
was done:
67% of the 6887 packages that were tested were reproducible. 103 of
them failed due to one or more timestamps in gzip files.

I think “wishlist” is more appropriate because we are trying to get the
archive reproducible and are asking interested maintainers for help.
I don't think this falls under a “particular Debian packaging style”
as worded in the man page about `--pedantic`.

In any case, my dear Lintian maintainers, I trust you to sort things
out appropriately. :)

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   


2014-02-11 Thread Tomasz Buchert
On 11/02/14 10:09, Jérémy Bobbio wrote:
> Tomasz Buchert:
> > > >> +Severity: normal
> > > > 
> > > > I think it should be at most "wishlist", perhaps even "pedantic".
> > > > 
> > 
> > Let's make it "pedantic", but hopefully one day
> > it will be "normal".
> 
> Could we go for “wishlist” instead?

Hi,
I reworked the patch so that it reuses the machinery
in files.pm. I also made it "wishlist" this time. I attach the patch.

Currently, it will emit "package-contains-timestamped-gzip"
on any file that ends in ".gz", is a gzip file and contains
a timestamp. This means that, for now, the tag
"gzip-file-is-not-multi-arch-same-safe" will imply
"package-contains-timestamped-gzip". Moreover, the new tag breaks
multiple tests (files-gzip, manpages-general, etc.) because they
use timestamped gzips. I will fix them, but first I'd like to know
whether the implementation of the tag is OK.

Cheers,
Tomasz

> 
> I know that switching to reproducible builds sounds like a major
> shift in Debian's current practices, but we already have way more
> packages reproducible than one might expect.
>
> The following wiki page describes the last large-scale experiment that
> was done:
> 67% of the 6887 packages that were tested were reproducible. 103 of
> them failed due to one or more timestamps in gzip files.
>
> I think “wishlist” is more appropriate because we are trying to get the
> archive reproducible and are asking interested maintainers for help.
> I don't think this falls under a “particular Debian packaging style”
> as worded in the man page about `--pedantic`.
>
> In any case, my dear Lintian maintainers, I trust you to sort things
> out appropriately. :)
> 
> -- 
> Lunar.''`. 
> lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
> `. `'` 
>   `-   


From 4dcc45c75df792820c356beca0fa84b067cf0268 Mon Sep 17 00:00:00 2001
From: Tomasz Buchert 
Date: Tue, 11 Feb 2014 10:11:20 +0100
Subject: [PATCH] new tag: package-contains-timestamped-gzip (+ test)

---
 checks/files.desc                                      |   8 
 checks/files.pm                                        |  14 +-
 t/tests/reproducibility/debian/debian/control.in       |  17 +
 .../debian/debian/unreproducible-pkg.install           |   1 +
 t/tests/reproducibility/debian/file                    |   1 +
 t/tests/reproducibility/debian/file-with-timestamp.gz  | Bin 0 -> 39 bytes
 .../reproducibility/debian/file-without-timestamp.gz   | Bin 0 -> 34 bytes
 t/tests/reproducibility/debian/prepare                 |   4 
 t/tests/reproducibility/desc                           |   6 ++
 t/tests/reproducibility/tags                           |   1 +
 10 files changed, 47 insertions(+), 5 deletions(-)
 create mode 100644 t/tests/reproducibility/debian/debian/control.in
 create mode 100644 t/tests/reproducibility/debian/debian/unreproducible-pkg.install
 create mode 100644 t/tests/reproducibility/debian/file
 create mode 100644 t/tests/reproducibility/debian/file-with-timestamp.gz
 create mode 100644 t/tests/reproducibility/debian/file-without-timestamp.gz
 create mode 100755 t/tests/reproducibility/debian/prepare
 create mode 100644 t/tests/reproducibility/desc
 create mode 100644 t/tests/reproducibility/tags

diff --git a/checks/files.desc b/checks/files.desc
index 760f86a..e8237f0 100644
--- a/checks/files.desc
+++ b/checks/files.desc
@@ -1448,3 +1448,11 @@ Info: The given file is in PATH but consists of non-ASCII characters.
  .
  Note that Lintian may be unable to display the filename accurately.
  Unprintable characters may have been replaced.
+
+Tag: package-contains-timestamped-gzip
+Severity: wishlist
+Certainty: certain
+Info: The package contains a gzip'ed file that has timestamps.
+ Such files make the produces packages unreproducible.
+ .
+ Pass "-n" flag to gzip to avoid it.
diff --git a/checks/files.pm b/checks/files.pm
index 5c5a60d..21a0f0c 100644
--- a/checks/files.pm
+++ b/checks/files.pm
@@ -1400,23 +1400,27 @@ sub run {
 my $finfo = $info->file_info($file) || '';
 if ($finfo !~ m/gzip compressed/) {
 tag 'gz-file-not-gzip', $file;
-} elsif ($isma_same && $file !~ m/\Q$arch\E/o) {
+} else {
 my $path = $info->unpacked($file);
 my $buff;
+my $mtime;
 open(my $fd, '<', $path);
 # We need to read at least 8 bytes
 if (sysread($fd, $buff, 1024) >= 8) {
 # Extract the flags and the mtime.
 #  NN NN  NN NN, NN NN NN NN  - bytes read
 #  __ __  __ __,$mtime- variables
-my (undef, $mtime) = unpack('NN', $buff);
-   

2014-02-11 Thread Niels Thykier
On 2014-02-11 10:53, Tomasz Buchert wrote:
> On 11/02/14 10:09, Jérémy Bobbio wrote:
>> Tomasz Buchert:
>>>>>> +Severity: normal
>>>>>
>>>>> I think it should be at most "wishlist", perhaps even "pedantic".
>>>>>
>>>
>>> Let's make it "pedantic", but hopefully one day
>>> it will be "normal".
>>
>> Could we go for “wishlist” instead?
> Hi,
> I reworked the patch so that it reuses the machinery
> in files.pm. I also made it "wishlist" this time. I attach the patch.
> 


Hi,

Thanks for working on it and producing a patch for it as well. :)

> Currently, it will emit "package-contains-timestamped-gzip"
> on any file that ends in ".gz", is a gzip file and contains
> a timestamp. This means that, for now, the tag
> "gzip-file-is-not-multi-arch-same-safe" will imply
> "package-contains-timestamped-gzip".

Ok - not sure if anyone has any feeling for or against that. I am a
/little/ concerned with it creating "too much output" (for new users),
but other than that I don't care too much.


> Moreover, the new tag breaks
> multiple tests (files-gzip, manpages-general, etc.) because they
> use timestamped gzips. I will fix them, but first I'd like to know
> whether the implementation of the tag is OK.
> 
> Cheers,
> Tomasz
> 

Noted, I have written my comments below:

> [...]
> 
> 0001-new-tag-package-contains-timestamped-gzip-test.patch
> 
> 
> From 4dcc45c75df792820c356beca0fa84b067cf0268 Mon Sep 17 00:00:00 2001
> From: Tomasz Buchert 
> Date: Tue, 11 Feb 2014 10:11:20 +0100
> Subject: [PATCH] new tag: package-contains-timestamped-gzip (+ test)
> 
> ---
>  checks/files.desc  |   8 
>  checks/files.pm|  14 +-
>  t/tests/reproducibility/debian/debian/control.in   |  17 +

The test should be renamed to "files-reproducibility" (or something else
starting with "files-").  Otherwise, it won't be run with "debian/rules
runtests onlyrun=files" as it should.

>  .../debian/debian/unreproducible-pkg.install   |   1 +
>  t/tests/reproducibility/debian/file|   1 +
>  t/tests/reproducibility/debian/file-with-timestamp.gz  | Bin 0 -> 39 bytes
>  .../reproducibility/debian/file-without-timestamp.gz   | Bin 0 -> 34 bytes

The gz files seem like a mistake?

>  t/tests/reproducibility/debian/prepare |   4 
>  t/tests/reproducibility/desc   |   6 ++
>  t/tests/reproducibility/tags   |   1 +
>  10 files changed, 47 insertions(+), 5 deletions(-)
>  create mode 100644 t/tests/reproducibility/debian/debian/control.in
>  create mode 100644 t/tests/reproducibility/debian/debian/unreproducible-pkg.install
>  create mode 100644 t/tests/reproducibility/debian/file
>  create mode 100644 t/tests/reproducibility/debian/file-with-timestamp.gz
>  create mode 100644 t/tests/reproducibility/debian/file-without-timestamp.gz
>  create mode 100755 t/tests/reproducibility/debian/prepare
>  create mode 100644 t/tests/reproducibility/desc
>  create mode 100644 t/tests/reproducibility/tags
> 
> diff --git a/checks/files.desc b/checks/files.desc
> index 760f86a..e8237f0 100644
> --- a/checks/files.desc
> +++ b/checks/files.desc
> @@ -1448,3 +1448,11 @@ Info: The given file is in PATH but consists of non-ASCII characters.
>   .
>   Note that Lintian may be unable to display the filename accurately.
>   Unprintable characters may have been replaced.
> +
> +Tag: package-contains-timestamped-gzip
> +Severity: wishlist
> +Certainty: certain
> +Info: The package contains a gzip'ed file that has timestamps.
> + Such files make the produces packages unreproducible.

You probably want to define "unreproducible" here a bit more.  (Also,
the phrase "files make the produces packages" could use a grammar check.)

Feel free to add a "Ref: 

> + .
> + Pass "-n" flag to gzip to avoid it.

[nitpick] Maybe consider something like:

"""
Please consider passing the "-n" flag to gzip to avoid this.
"""

Being polite sometimes goes a long way.

> diff --git a/checks/files.pm b/checks/files.pm
> index 5c5a60d..21a0f0c 100644
> --- a/checks/files.pm
> +++ b/checks/files.pm
> @@ -1400,23 +1400,27 @@ sub run {
>  [...]
>  close($fd);
> +if ($mtime != 0) {
> +if ($isma_same && $file !~ m/\Q$arch\E/o) {
> +tag 'gzip-file-is-not-multi-arch-same-safe', $file;
> +}
> +tag 'package-contains-timestamped-gzip', $file;
> +}
>  }
>  }
>  

Looks good (and it also seems trivial to disable the "implication"
issue if need be).
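A side note on the files.pm code: it extracts MTIME with `unpack('NN', $buff)`, which reads bytes 4 to 7 as a big-endian 32-bit number, whereas RFC 1952 stores MTIME least significant byte first (Perl's `V` template). Because the check only asks whether MTIME is non-zero, the byte order cannot change the verdict; it would only matter if the value were ever printed or compared with a date. The sketch below, with a hypothetical helper `mtime_both_ways` and synthetic headers, compares the two templates.

```perl
use strict;
use warnings;

# Build a gzip-style header whose MTIME field holds $stamp (stored
# little-endian, as RFC 1952 specifies), then read MTIME back two ways.
sub mtime_both_ways {
    my ($stamp) = @_;
    my $buff = pack('C C C C V C C', 0x1f, 0x8b, 8, 0, $stamp, 0, 3);
    my (undef, $as_n) = unpack('NN', $buff);    # as in the patch: big-endian
    my (undef, $as_v) = unpack('NV', $buff);    # RFC 1952 byte order
    return ($as_n, $as_v);
}

for my $stamp (0, 1392072817) {
    my ($as_n, $as_v) = mtime_both_ways($stamp);
    printf "stored %u: 'NN' reads %u, 'NV' reads %u, timestamped: %s\n",
        $stamp, $as_n, $as_v, ($as_n != 0 ? 'yes' : 'no');
}
```

For the non-zero stamp the templates disagree (1901656402 with `N` versus 1392072817 with `V`), but both are non-zero, so `$mtime != 0` fires either way.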

> diff --git a/t/tests/reproducibility/debian/debian/control.in b/t/tests/reproducibility/debian/debian/control.in
> new file mode 100644
> index 000..a7e8050
> --- /dev/null
> +++ b/t/tests/reprodu

2014-02-12 Thread Tomasz Buchert

Hi Niels,

On 11/02/14 19:11, Niels Thykier wrote:
> On 2014-02-11 10:53, Tomasz Buchert wrote:
> [...]
> 
> Hi,
> 
> Thanks for working on it and producing a patch for it as well. :)
> 
> > Currently, it will emit "package-contains-timestamped-gzip"
> > on any file that ends in ".gz", is a gzip file and contains
> > a timestamp. This means that, for now, the tag
> > "gzip-file-is-not-multi-arch-same-safe" will imply
> > "package-contains-timestamped-gzip".
> 
> Ok - not sure if anyone has any feeling for or against that. I am a
> /little/ concerned with it creating "too much output" (for new users),
> but other than that I don't care too much.
> 

The reason I did it is that I wanted to keep "backwards compatibility".
Another solution is to drop "gzip-file-is-not-multi-arch-same-safe"
altogether, of course.

> 
> [...]
> 
> 
> Otherwise, it looks good at first glance (without having tested it).
> 
> ~Niels
> 

Thanks for the review! I attach a new patch that (hopefully)
addresses your issues.

Tomasz
From 5f9f1e9fea7435f3eacbc95b00ebe835c8f1eca9 Mon Sep 17 00:00:00 2001
From: Tomasz Buchert 
Date: Tue, 11 Feb 2014 10:11:20 +0100
Subject: [PATCH] new tag: package-contains-timestamped-gzip (+ test)

---
 checks/files.desc                             | 10 ++
 checks/files.pm                               | 14 +-
 t/tests/files-reproducibility/debian/Makefile |  9 +
 t/tests/files-reproducibility/desc            |  6 ++
 t/tests/files-reproducibility/tags            |  1 +
 5 files changed, 35 insertions(+), 5 deletions(-)
 create mode 100644 t/tests/files-reproducibility/debian/Makefile
 create mode 100644 t/tests/files-reproducibility/desc
 create mode 100644 t/tests/files-reproducibility/tags

diff --git a/checks/files.desc b/checks/files.desc
index 760f86a..f0b9444 100644
--- a/checks/files.desc
+++ b/checks/files.desc
@@ -1448,3 +1448,13 @@ Info: The given file is in PATH but consists of non-ASCII characters.
  .
  Note that Lintian may be unable to display the filename accurately.
  Unprintable characters may have been replaced.
+
+Tag: package-contains-timestamped-gzip
+Severity: wishlist
+Certainty: certain
+Info: The package contains a gzip'ed file that has timestamps.
+ Such files make the packages unreproducible, because their
+ contents depend on the time when the package was built.
+ .
+ Please consider passing the "-n" flag to gzip to avoid this.
+Ref: https://wiki.debian.org/ReproducibleBuilds
diff --git a/checks/files.pm b/checks/files.pm
index 5c5a60d..21a0f0c 100644
--- a/checks/files.pm
+++ b/checks/files.pm
@@ -1400,23 +1400,27 @@ sub run {
 my $finfo = $info->file_info($file) || '';
 if ($finfo !~ m/gzip compressed/) {
 tag 'gz-file-not-gzip', $file;
-} elsif ($isma_same && $file !~ m/\Q$arch\E/o) {
+} else {
 my $path = $info->unpacked($file);
 my $buff;
+my $mtime;
 open(my $fd, '<', $path);
 # We need to read at least 8 bytes
 if (sysread($fd, $buff, 1024) >= 8) {
 # Extract the flags and the mtime.
 #  NN NN  NN NN, NN NN NN NN  - bytes read
 #  __ __  __ __,$mtime- variables
-my (undef, $mtime) = unpack('NN', $buff);
-if ($mtime){
-tag 'gzip-file-is-not-multi-arch-same-safe',$file;
-}
+(undef, $mtime) = unpack('NN', $buff);
 } else {
 fail "reading $file: $!";
 }
 close($fd);
+if ($mtime != 0) {
+if ($isma_same && $file !~ m/\Q$arch\E/o) {
+tag 'gzip-file-is-not-multi-arch-same-safe', $file;
+}
+tag 'package-contains-timestamped-gzip', $file;
+}
 }
 }
 
diff --git a/t/tests/files-reproducibility/debian/Makefile b/t/tests/files-reproducibility/debian/Makefile
new file mode 100644
index 000..c5f6bc7
--- /dev/null
+++ b/t/tests/files-reproducibility/debian/Makefile
@@ -0,0 +1,9 @@
+ROOT=$(DESTDIR)/usr/share/files-reproducibility
+
+default:
+	:
+
+install:
+	mkdir -p $(ROOT)
+	echo "Hello" | gzip - -c > $(ROOT)/gzip-with-timestamp.gz
+	echo "Hello" | gzip - -nc > $(ROOT)/gzip-without-timestamp.gz
diff --git a/t/tests/files-reproducibility/desc b/t/tests/files-reproducibility/desc
new file mode 100644
index 000..8cbbae9
--- /dev/null
+++ b/t/tests/files-reproducibility/desc
@@ -0,0 +1,6 @@
+Testname: files-reproducibility
+Sequence: 6000
+Version: 1.0
+Description: Test if package is reproducible
+Test-For:
+ package-contains-timestamped-gzip
diff --git a/t/tests/files-reproducibility/tags b/t/te

2014-02-12 Thread Jérémy Bobbio
Tomasz Buchert:
> The reason I did it is that I wanted to keep "backwards compatibility".
> Another solution is to drop "gzip-file-is-not-multi-arch-same-safe"
> altogether, of course.

The latter has severity “important”. “Multi-Arch: Same” packages for
different architectures cannot be installed together if they do not contain
identical data files. That's a serious problem. Reproducibility issues
do not affect users in the same way.

It could make sense to emit either “gzip-file-is-not-multi-arch-same-safe” or
“package-contains-timestamped-gzip” instead of both, as they
are fixed by the same change.
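The "either one or the other" suggestion amounts to a small decision rule. Below is a sketch of it under stated assumptions: the helper name `tag_for_gzip` and the sample paths are hypothetical, the Multi-Arch tag is assumed to win because it is the more severe problem, and files whose path contains the architecture name are exempted from it, mirroring the `$file !~ m/\Q$arch\E/` test in the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: pick at most one tag for a .gz file.
#   $mtime       - MTIME read from the gzip header (0 = no timestamp)
#   $is_ma_same  - true if the package is "Multi-Arch: same"
#   $file, $arch - path inside the package and the package architecture
sub tag_for_gzip {
    my ($mtime, $is_ma_same, $file, $arch) = @_;
    return unless $mtime;    # no timestamp: nothing to report
    # The Multi-Arch problem is the more severe one, so it takes precedence;
    # files whose path contains the architecture name are exempt from it.
    if ($is_ma_same && $file !~ m/\Q$arch\E/) {
        return 'gzip-file-is-not-multi-arch-same-safe';
    }
    return 'package-contains-timestamped-gzip';
}

my @cases = (
    [0,          1, 'usr/share/doc/foo/changelog.gz', 'amd64'],
    [1392072817, 1, 'usr/share/doc/foo/changelog.gz', 'amd64'],
    [1392072817, 1, 'usr/lib/foo/amd64/data.gz',      'amd64'],
    [1392072817, 0, 'usr/share/doc/foo/changelog.gz', 'amd64'],
);
for my $case (@cases) {
    my $tag = tag_for_gzip(@$case);
    printf "%-35s %s\n", $case->[2], defined $tag ? $tag : '(no tag)';
}
```

Keeping the choice in one function makes it easy to revert to emitting both tags later, should the maintainers prefer that.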

-- 
Jérémy Bobbio.''`.
jeremy.bob...@irq7.fr   : :   : lu...@debian.org
`. `'`  lu...@torproject.org
  `-


2014-02-12 Thread Tomasz Buchert
On 12/02/14 11:03, Jérémy Bobbio wrote:
> Tomasz Buchert:
> > The reason I did it is that I wanted to keep "backwards compatibility".
> > Another solution is to drop "gzip-file-is-not-multi-arch-same-safe"
> > altogether, of course.
> 
> The latter has severity “important”. “Multi-Arch: Same” packages for
> different architectures cannot be installed together if they do not contain
> identical data files. That's a serious problem. Reproducibility issues
> do not affect users in the same way.
>
> It could make sense to emit either “gzip-file-is-not-multi-arch-same-safe” or
> “package-contains-timestamped-gzip” instead of both, as they
> are fixed by the same change.
> 
> -- 
> Jérémy Bobbio.''`.
> jeremy.bob...@irq7.fr   : :   : lu...@debian.org
> `. `'`  lu...@torproject.org
>   `-

I wasn't aware of this. I updated the patch.

Tomasz
From b21b1dd328d6efb3b8e9d63e3c56e3ce3e0b2d8e Mon Sep 17 00:00:00 2001
From: Tomasz Buchert 
Date: Tue, 11 Feb 2014 10:11:20 +0100
Subject: [PATCH] new tag: package-contains-timestamped-gzip (+ test)

---
 checks/files.desc                             | 10 ++
 checks/files.pm                               | 15 ++-
 t/tests/files-reproducibility/debian/Makefile |  9 +
 t/tests/files-reproducibility/desc            |  6 ++
 t/tests/files-reproducibility/tags            |  1 +
 5 files changed, 36 insertions(+), 5 deletions(-)
 create mode 100644 t/tests/files-reproducibility/debian/Makefile
 create mode 100644 t/tests/files-reproducibility/desc
 create mode 100644 t/tests/files-reproducibility/tags

diff --git a/checks/files.desc b/checks/files.desc
index 760f86a..f0b9444 100644
--- a/checks/files.desc
+++ b/checks/files.desc
@@ -1448,3 +1448,13 @@ Info: The given file is in PATH but consists of non-ASCII characters.
  .
  Note that Lintian may be unable to display the filename accurately.
  Unprintable characters may have been replaced.
+
+Tag: package-contains-timestamped-gzip
+Severity: wishlist
+Certainty: certain
+Info: The package contains a gzip'ed file that has timestamps.
+ Such files make the packages unreproducible, because their
+ contents depend on the time when the package was built.
+ .
+ Please consider passing the "-n" flag to gzip to avoid this.
+Ref: https://wiki.debian.org/ReproducibleBuilds
diff --git a/checks/files.pm b/checks/files.pm
index 5c5a60d..858d9f4 100644
--- a/checks/files.pm
+++ b/checks/files.pm
@@ -1400,23 +1400,28 @@ sub run {
 my $finfo = $info->file_info($file) || '';
 if ($finfo !~ m/gzip compressed/) {
 tag 'gz-file-not-gzip', $file;
-} elsif ($isma_same && $file !~ m/\Q$arch\E/o) {
+} else {
 my $path = $info->unpacked($file);
 my $buff;
+my $mtime;
 open(my $fd, '<', $path);
 # We need to read at least 8 bytes
 if (sysread($fd, $buff, 1024) >= 8) {
 # Extract the flags and the mtime.
 #  NN NN  NN NN, NN NN NN NN  - bytes read
 #  __ __  __ __,$mtime- variables
-my (undef, $mtime) = unpack('NN', $buff);
-if ($mtime){
-tag 'gzip-file-is-not-multi-arch-same-safe',$file;
-}
+(undef, $mtime) = unpack('NN', $buff);
 } else {
 fail "reading $file: $!";
 }
 close($fd);
+if ($mtime != 0) {
+if ($isma_same && $file !~ m/\Q$arch\E/o) {
+tag 'gzip-file-is-not-multi-arch-same-safe', $file;
+} else {
+tag 'package-contains-timestamped-gzip', $file;
+}
+}
 }
 }
 
diff --git a/t/tests/files-reproducibility/debian/Makefile b/t/tests/files-reproducibility/debian/Makefile
new file mode 100644
index 000..c5f6bc7
--- /dev/null
+++ b/t/tests/files-reproducibility/debian/Makefile
@@ -0,0 +1,9 @@
+ROOT=$(DESTDIR)/usr/share/files-reproducibility
+
+default:
+	:
+
+install:
+	mkdir -p $(ROOT)
+	echo "Hello" | gzip - -c > $(ROOT)/gzip-with-timestamp.gz
+	echo "Hello" | gzip - -nc > $(ROOT)/gzip-without-timestamp.gz
diff --git a/t/tests/files-reproducibility/desc b/t/tests/files-reproducibility/desc
new file mode 100644
index 000..8cbbae9
--- /dev/null
+++ b/t/tests/files-reproducibility/desc
@@ -0,0 +1,6 @@
+Testname: files-reproducibility
+Sequence: 6000
+Version: 1.0
+Description: Test if package is reproducible
+Test-For:
+ package-contains-timestamped-gzip
diff --git a/t/tests/files-reproducibility/tags b/t/tests/