Source: gnupg2
Severity: minor
X-Debbugs-Cc: Daniel Kahn Gillmor <d...@fifthhorseman.net>

The gnupg2 package is built from source, starting from the upstream
release tarball.  Upstream also uses git for revision control, and we
track upstream git as well as the release tarballs.  Upstream uses
OpenPGP to sign both git tags and release tarballs.

We trim many prebuilt files from the tarball, so the contents of our
Debian packaging repositories are pretty close to upstream's git
repositories.  But we don't trim all of the prebuilt files.

Inspired by the recent xz mess, where malicious files were slipped into
a tarball, I'd like to minimize the amount of source used in GnuPG that
is not tracked in upstream git.  I think we should use debian/clean (and
gbp import-orig's filtering, see #1071200) to trim out all of the
generated files before the build, so that what we build from is as
close as possible to traceable upstream git commits.
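A minimal, untested sketch of such a debian/clean, listing only the generated files discussed below that we expect the build to regenerate (ChangeLog and the po/ files are left out, since it isn't yet clear how to handle them).  dh_clean deletes every path listed in debian/clean as part of the clean step:

```
VERSION
doc/defsincdate
common/audit-events.h
common/status-codes.h
regexp/_unicode_mapping.c
```

Listing the same paths in gbp import-orig's filter (#1071200) would keep them out of the packaging repository in the first place.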

I did a quick scan of what we're shipping in revision control (and
hence what's in the filtered tarball) that the upstream git tag doesn't
account for.  Here's what I found:

$ git diff --stat gnupg-2.2.43..upstream/2.2.43 | grep -e '\+' -e 'Bin 0 ->'
 ChangeLog                                          | 34710 ++++++++++++++++++-
 VERSION                                            |     1 +
 common/audit-events.h                              |   116 +
 common/status-codes.h                              |   248 +
 doc/defsincdate                                    |     1 +
 doc/gnupg-card-architecture.pdf                    |   Bin 0 -> 19221 bytes
 doc/gnupg-card-architecture.png                    |   Bin 0 -> 8843 bytes
 doc/gnupg-module-overview.pdf                      |   408 +
 doc/gnupg-module-overview.png                      |   Bin 0 -> 124560 bytes
 po/ca.po                                           |  2295 +-
 po/cs.po                                           |  2303 +-
 po/da.po                                           |  2299 +-
 po/de.po                                           |  2310 +-
 po/el.po                                           |  2295 +-
 po/e...@boldquot.po                                  | 10967 ++++++
 po/e...@quot.po                                      | 10951 ++++++
 po/eo.po                                           |  2295 +-
 po/es.po                                           |  2307 +-
 po/et.po                                           |  2299 +-
 po/fi.po                                           |  2295 +-
 po/fr.po                                           |  2299 +-
 po/gl.po                                           |  2303 +-
 po/gnupg2.pot                                      | 10636 ++++++
 po/hu.po                                           |  2295 +-
 po/id.po                                           |  2295 +-
 po/it.po                                           |  2295 +-
 po/ja.po                                           |  2295 +-
 po/nb.po                                           |  2295 +-
 po/pl.po                                           |  2295 +-
 po/pt.po                                           |  2295 +-
 po/ro.po                                           |  2307 +-
 po/ru.po                                           |  2303 +-
 po/sk.po                                           |  2303 +-
 po/sv.po                                           |  2299 +-
 po/tr.po                                           |  2295 +-
 po/uk.po                                           |  2299 +-
 po/zh_CN.po                                        |  2295 +-
 po/zh_TW.po                                        |  2291 +-
 regexp/_unicode_mapping.c                          |   284 +
 242 files changed, 127919 insertions(+), 42329 deletions(-)
$

The doc/*.{pdf,png} files are already fixed as of 2.2.43-3, and will be
filtered out whenever we move to the next upstream release.

Here's my attempt at analyzing what remains:

ChangeLog: upstream generates this automatically from its git history,
and we ship it (half a meg after compression!) in every package we
produce.  That seems like a lot, and we ought to be able to drop it from
nearly everywhere.  What if we shipped it only in gnupg2-doc, and left
the other packages with a simple text file?  Or what if we stopped
shipping it altogether?  Would anyone mind?  The details are
developer-level, and the file will still be in the source tarballs for
anyone who wants to read it.

VERSION: this contains only the upstream version number.  Can we
generate it ourselves from debian/changelog?
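One possible approach, sketched under the assumption that VERSION should hold nothing but the bare upstream version string: take the version from debian/changelog (dpkg-parsechangelog -SVersion prints it) and strip the epoch and the Debian revision.  The helper name upstream_version is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reduce a Debian version string to its upstream part
# by dropping an optional epoch ("N:") and the Debian revision (everything
# after the last "-").
upstream_version() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/-[^-]*$//'
}

# Assumed usage from the top of the unpacked source tree:
#   upstream_version "$(dpkg-parsechangelog -SVersion)" > VERSION
```

Inside debian/rules, including /usr/share/dpkg/pkg-info.mk provides the same value as DEB_VERSION_UPSTREAM, which would avoid the helper entirely.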

doc/defsincdate: this file is generated upstream, and may introduce
non-reproducibility (see
debian/patches/debian-packaging/avoid-regenerating-defsincdate-use-shipped-file.patch
for more discussion).  If we strip that file and drop that patch (or
tune it so that it only works with $SOURCE_DATE_EPOCH), we should be
able to avoid the non-reproducibility.  The catch is that generated
documentation files would then carry the timestamp of the
debian/changelog entry rather than that of the upstream tarball, which
might make (for example) a diffoscope comparison of shipped files
between point releases unnecessarily noisy.
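A sketch of the "only works with $SOURCE_DATE_EPOCH" variant.  It assumes, without having checked upstream's doc/Makefile.am, that defsincdate holds a single Unix timestamp, and the function name defsincdate_from_epoch is hypothetical.  It refuses to produce anything when SOURCE_DATE_EPOCH is missing or malformed, so the build can never silently fall back to the clock or to file mtimes:

```shell
#!/bin/sh
# Hypothetical replacement for the shipped doc/defsincdate.
# ASSUMPTION: the file holds one Unix timestamp (seconds since the epoch).
# dpkg-buildpackage sets SOURCE_DATE_EPOCH from the newest debian/changelog
# entry when it is not already defined, so the output is reproducible.
defsincdate_from_epoch() {
    case "${SOURCE_DATE_EPOCH:-}" in
        '' | *[!0-9]*)
            echo "SOURCE_DATE_EPOCH is unset or not a plain integer" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$SOURCE_DATE_EPOCH"
}

# Assumed usage before the documentation is built:
#   defsincdate_from_epoch > doc/defsincdate
```

For a quick sanity check of which day that is, GNU date can convert it: date -u -d "@$SOURCE_DATE_EPOCH" +%F.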

common/{audit-events,status-codes}.h: these appear to be stripped and
rebuilt in maintainer mode.  At least one of our builds currently runs
in maintainer mode, so we ought to be able to strip them and make sure
they get rebuilt, but I haven't tested this.
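Since this is untested, a cheap guard would be to check after the build that the stripped files actually came back.  A minimal sketch; the helper check_regenerated is hypothetical, and the paths assume an in-tree build (with a separate build directory, the generated files would land there instead):

```shell
#!/bin/sh
# Hypothetical post-build guard: each file named on the command line must
# exist and be non-empty, i.e. it was regenerated after debian/clean
# removed it.  Reports every missing file, then fails if there were any.
check_regenerated() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "not regenerated: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Assumed usage after dh_auto_build:
#   check_regenerated common/audit-events.h common/status-codes.h
```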

regexp/_unicode_mapping.c: this is another maintainer-mode file,
generated from UnicodeData.txt.  It looks like it contains a mapping
between uppercase and lowercase code points.  Debian ships a more
up-to-date UnicodeData.txt in the unicode-data package, which includes
some code points (like GLAGOLITIC CAPITAL LETTER CAUDATE CHRIVI and
GLAGOLITIC SMALL LETTER CAUDATE CHRIVI) that form a case pair but are
not represented in this file.  Maybe the right (and more up-to-date)
solution is to build-depend on unicode-data, strip both this file and
UnicodeData.txt in debian/clean, and patch the build to generate this
file from /usr/share/unicode/UnicodeData.txt instead.
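To see which case pairs a given UnicodeData.txt provides (for instance, to compare the copy in the tarball against /usr/share/unicode/UnicodeData.txt), a short awk sketch is enough.  It is not upstream's generator; it relies only on the documented UnicodeData.txt layout, in which the 14th semicolon-separated field is the simple lowercase mapping.  The name case_pairs is made up, and U+2C2F is the code point of GLAGOLITIC CAPITAL LETTER CAUDATE CHRIVI:

```shell
#!/bin/sh
# Print "CODEPOINT LOWERCASE" pairs from a UnicodeData.txt-format file
# (default: the copy shipped by Debian's unicode-data package).
# Field 1 is the code point; field 14 is Simple_Lowercase_Mapping.
case_pairs() {
    awk -F';' '$14 != "" { print $1, $14 }' \
        "${1:-/usr/share/unicode/UnicodeData.txt}"
}

# Assumed usage: does this file know GLAGOLITIC CAPITAL LETTER CAUDATE CHRIVI?
#   case_pairs | grep '^2C2F '
```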

I'm not sure what to do about the po/??.po files.  They all appear to
be modified by upstream during "make dist" (when the tarball is
created), which adds source file and line number annotations, and then
our build process re-annotates them.  It would be nicer to work with
the unannotated files, because patches against them would be simpler to
port from version to version.
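A rough way to compare the files without the location annotations is to drop the "#:" reference comments, which is where gettext records source file and line numbers.  The sed sketch below does only that (the helper name strip_po_locations is hypothetical); gettext's own msgcat --no-location is the more robust option, since it actually parses the PO format:

```shell
#!/bin/sh
# Hypothetical helper: strip "#:" source-reference comments from a PO file
# so translations can be diffed or patched independently of line numbers.
# Flags ("#,"), translator comments and msgid/msgstr lines are untouched.
strip_po_locations() {
    sed -e '/^#:/d' "$1"
}

# Assumed usage:
#   strip_po_locations po/de.po > de.noloc.po
```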

I also don't fully understand the l10n mechanism used here: if
po/e...@boldquot.po, po/e...@quot.po, and po/gnupg2.pot are generated during
"make dist", we ought to be able to generate them ourselves directly,
but I haven't tested this.

I'm happy to hear any suggestions about the right way to bring GnuPG in
Debian more in line with upstream's revision control, so as to reduce
the amount of divergence that can slip in through a tarball.

If we could somehow prune to a state where we are building from (a
subset of) the intersection of the upstream git tag and the released
tarball, that would give us something concrete to automatically check on
each version upgrade.
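A sketch of such a check, assuming we keep a hand-reviewed allowlist of files that may differ between the upstream tag and the tarball (the file name debian/upstream-tarball-allowlist and the helper name are hypothetical).  git diff --no-renames --name-only --diff-filter=AM lists the files that the tarball adds or modifies relative to the tag; files that exist only in the tag are ignored, matching the "subset of the intersection" idea.  Anything not on the allowlist gets reported:

```shell
#!/bin/sh
# Read file names (one per line) on stdin and print those not listed,
# verbatim, in the allowlist file given as $1.
# Returns 0 if everything is allowlisted, 1 if something unexpected was
# printed, 2 if grep itself failed (e.g. the allowlist is unreadable).
unexpected_tarball_files() {
    rc=0
    grep -vxF -f "$1" || rc=$?   # safe under "set -e"
    case $rc in
        0) return 1 ;;  # grep printed at least one non-allowlisted name
        1) return 0 ;;  # no output: every name is on the allowlist
        *) return 2 ;;
    esac
}

# Assumed usage on each upstream upgrade:
#   git diff --no-renames --name-only --diff-filter=AM \
#       gnupg-2.2.43 upstream/2.2.43 |
#     unexpected_tarball_files debian/upstream-tarball-allowlist
```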

       --dkg

-- System Information:
Debian Release: trixie/sid
  APT prefers testing-debug
  APT policy: (500, 'testing-debug'), (500, 'testing'), (500, 'stable'), (500, 'oldstable'), (200, 'unstable-debug'), (200, 'unstable'), (1, 'experimental-debug'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 6.7.12-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

-- no debconf information

