On Sun, Jan 30, 2022 at 05:35:37PM +0100, Johannes Schauer Marin Rodrigues wrote:
> Quoting Colin Watson (2022-01-30 15:03:30)
> > I'm a bit confused, because this seems to work at the wrong layer.
> > Debian packages are supposed to preserve timestamps from the source
> > package wherever possible, and failing that it would be possible to
> > ensure that the timestamp of generated manual pages in binary packages
> > is set to SOURCE_DATE_EPOCH.  Flattening timestamps to an epoch at mandb
> > time seems like the wrong place for this at first inspection, and I'd
> > like some clearer rationale for why you ended up with this approach.
> > 
> > I would suggest instead ensuring that mtimes of manual pages are
> > reproducible, after which mandb should produce reproducible databases
> > (and if it doesn't I'd consider that a bug).
> > 
> > Deliberately setting database timestamps that don't match the filesystem
> > will confuse mandb into doing unnecessary work in later runs, so I don't
> > like this approach.
> 
> My reasoning was that tools that care about a reproducible index.db will
> "flatten" the mtimes to SOURCE_DATE_EPOCH in the tarball or image they
> produce, so setting the timestamp in index.db to SOURCE_DATE_EPOCH for those
> timestamps larger than SOURCE_DATE_EPOCH seemed like the approach that would
> result in a consistent overall state.

It might not be entirely impossible, but I'd really prefer not to break
the link between filesystem timestamps and database timestamps if we can
avoid it.  I know your implementation is more or less like "tar
--mtime=DATE --clamp-mtime", but the difference is that tar doesn't also
compare filesystem timestamps against the archive later.
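
To make the concern concrete, here is a toy model in Python. It is not
mandb's actual logic: the assumption is only that a later run treats a page
as stale when the mtime recorded in the database differs from the mtime on
disk. The names stale_pages and clamp are hypothetical.

```python
def clamp(mtimes, epoch):
    """Cap every recorded mtime at `epoch` (tar --clamp-mtime semantics)."""
    return {page: min(mtime, epoch) for page, mtime in mtimes.items()}


def stale_pages(db_mtimes, fs_mtimes):
    """Pages whose recorded mtime no longer matches the filesystem.

    Assumption: a mismatch is what makes a later run redo work.
    """
    return sorted(
        page for page, mtime in fs_mtimes.items()
        if db_mtimes.get(page) != mtime
    )


# Hypothetical data: one page newer than the epoch, one older.
fs = {"sh.1": 2000, "ls.1": 500}
print(stale_pages(clamp(fs, 1000), fs))  # database clamped, filesystem not
print(stale_pages(fs, fs))               # database matches filesystem
```

Under that assumption, clamping only the database leaves every page newer
than the epoch permanently mismatched, while tar never re-reads the archive
to compare.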

> But if that's the wrong approach, let's think of the alternative: making sure
> that the mtimes of manual pages are reproducible. If I use gdbm_dump on the
> index.db of two different chroots, then it looks like the following manual
> pages have differing timestamps:
> 
> bash-builtins, which, dash, mawk, pager, awk, sh, more, nawk, builtins
> 
> Most of those seem to be symlinks into /etc/alternatives and those symlinks
> get created by maintainer scripts using update-alternatives. Are you
> suggesting that update-alternatives should gain support for setting the mtime
> of the files it creates to SOURCE_DATE_EPOCH?

I think that would at least be worth considering.  It doesn't seem any
less obvious a thing to do for reproducible installs than hacking mandb
would be, and it would deal with the problem closer to its source: for
instance, it would get you closer to being able to produce
bitwise-identical reproducible images by e.g. tarring up the filesystem,
which would preserve filesystem mtimes in the image.  (Though I guess
--clamp-mtime deals with that, but maybe not all image archiving tools
have something like that?)
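
For an archiving tool without such an option, the clamping can be done while
the archive is written. A minimal sketch using Python's standard tarfile
module; the function names and the "." archive root are illustrative choices,
and this only handles mtimes, not the other metadata (owners, ordering) that
a bitwise-identical image would also need to pin down.

```python
import tarfile


def clamp_mtime(epoch):
    """Return a tarfile filter that caps each member's mtime at `epoch`."""
    def _filter(info):
        if info.mtime > epoch:
            info.mtime = epoch
        return info
    return _filter


def make_image(root, out_path, epoch):
    """Archive `root` into `out_path`, clamping member mtimes to `epoch`.

    Files on disk are not touched; only the recorded mtimes change.
    """
    with tarfile.open(out_path, "w") as tar:
        tar.add(root, arcname=".", filter=clamp_mtime(epoch))
```

Older files keep their original mtime in the archive, which matches the
"clamp" rather than "flatten" behaviour discussed above.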

Another approach might be to modify filesystem timestamps after
postinsts have finished running but before mandb runs to clamp
timestamps to SOURCE_DATE_EPOCH; a bit like your proposed patch, but
actually modifying the filesystem timestamps as well.  I'm not sure
where that could go, though.  It can't be in mandb because the postinst
deliberately doesn't run mandb as root; and of course mandb is itself
run from a postinst.  Maybe some kind of dpkg hook, or maybe it would be
simplest to just run a post-processing step that clamps all the
filesystem timestamps and then runs the equivalent of "sudo -u man mandb
-cq"?  (This might be more palatable with man-db 2.10.0, where this will
take more like 10 seconds rather than several minutes; see #1003089.)
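
A rough sketch of what such a post-processing step could look like, in
Python. Everything here is an assumption rather than an existing tool: the
function names are made up, it has to run as root after all postinsts have
finished, and it relies on os.utime supporting follow_symlinks=False (true on
Linux) so that the alternatives symlinks themselves are clamped rather than
their targets.

```python
import os
import subprocess


def clamp_tree_mtimes(root, epoch):
    """Clamp the mtime of everything below `root` to at most `epoch`.

    Symlinks are clamped themselves, not followed.  Returns the list of
    paths that were changed.  `root` itself is left alone.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_mtime > epoch:
                # Keep atime, replace mtime; use ns to avoid float rounding.
                os.utime(path, ns=(st.st_atime_ns, epoch * 10**9),
                         follow_symlinks=False)
                changed.append(path)
    return changed


def rebuild_mandb():
    """Recreate the database as the man user, as in the command above."""
    subprocess.run(["sudo", "-u", "man", "mandb", "-cq"], check=True)
```

A caller would read SOURCE_DATE_EPOCH from the environment, call
clamp_tree_mtimes on the manual page hierarchy (for example /usr/share/man),
and then call rebuild_mandb, so that database and filesystem agree again.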

> I'm puzzled by bash-builtins though because that one is not a symlink. So I
> don't understand why the timestamp differs there.

This puzzled me for a while too, but it's because
/usr/share/man/man7/builtins.7.gz is a symlink created by
update-alternatives and references bash-builtins in its NAME, which
provoked https://bugs.debian.org/691643.  I've now fixed that upstream:

  https://gitlab.com/cjwatson/man-db/-/commit/37ab864354c1d0ac09e27d2346a1221bf4628509

This may cause your comparisons to show more differences, but it should
mean that they're more reliably the *same* differences.  Previously, the
behaviour depended on directory iteration order (actually usually the
location of the first physical extent of each file on disk, since mandb
sorts by that for improved performance on rotational disk drives).

-- 
Colin Watson (he/him)                              [cjwat...@debian.org]
