Bug#1004355: man-db modifies cache file contents but resets mtime timestamp

2022-01-29 Thread Colin Watson
Control: tag -1 fixed-upstream

On Tue, Jan 25, 2022 at 05:27:35PM +, Ian Jackson wrote:
> My backup system, like many, relies on file mtimes to know when to
> back files up.  *Un*like most other systems, it does a cross-check: it
> checks that the file *contents* (via checksum) are the same on the
> backed-up host and as recorded in the backup.
> 
> This seems to me to be a correct and cautious approach.  On at least
> one occasion it has saved me from a serious problem by giving me early
> warning of a storage failure, by flagging up corruption in
> luckily-unimportant files.
> 
> But it means that if a file is modified, but the mtime is reset, the
> backups fail.
> 
> Empirically, this seems to happen with /var/cache/man/*/index.db.
> 
> Please could man-db not do this.  Specifically, if it modifies the
> file, I would like it to either not reset the timestamp, or at
> least not set the timestamp to the same value it had before.

I've fixed this upstream; the fix will be in man-db 2.10.0, which I
expect to release in a week or so.

For purposes of cherry-picking to stable releases, I'm afraid the
changes involved were quite extensive, since they involved rearranging
all of man-db's database-handling code in order to make it
straightforward for mandb to open each database at most once in any
given run.  It might have been possible to fix the immediate bug with a
smaller patch, but I felt that attempting to do so would make the
relevant logic even more complex and thus more bug-prone, whereas this
approach ultimately leaves the logic simpler.  In particular, there's no
longer any manual adjustment of database mtimes, except when making
temporary copies of databases (roughly equivalent to "cp -a").  I doubt
it will be possible to construct a version of this short enough that it
could realistically be reviewed by the stable release managers, and I'd
much rather let it settle in bookworm anyway.

If need be, I expect that I can help with putting together cherry-picks
for local use.  However, I think I'd recommend simply building a
backport of man-db 2.10.0 instead (and I might consider pushing one to
bullseye-backports and perhaps even buster-backports-sloppy, once this
reaches testing).  There's much less chance of me missing something in
the cherry-picking process that way, and it would mean that you'd get
the significant mandb performance improvements too.

Here's the series I committed to fix this, for the record:

  https://gitlab.com/cjwatson/man-db/-/commit/53dad5d746a376385ff50e4f2d3a948d1567f8e1
  https://gitlab.com/cjwatson/man-db/-/commit/b1e9df213e8f1a009e6117673861806b03cec950
  https://gitlab.com/cjwatson/man-db/-/commit/5cb38c5d360ba2fdbd2178843057c85c4efab576
  https://gitlab.com/cjwatson/man-db/-/commit/73ab48605e6757240917f77e658f009549ee2f57
  https://gitlab.com/cjwatson/man-db/-/commit/bf6d84c6d828db3d94d8218b2266b771ab6e5fa3
  https://gitlab.com/cjwatson/man-db/-/commit/106287fe531ee04ae3f6d0a793084b01659afa16

-- 
Colin Watson (he/him)  [cjwat...@debian.org]




2022-01-25 Thread Ian Jackson
Package: man-db
Version: 2.8.5-2

(This report was written before you drew my attention to

  https://bugs.launchpad.net/ubuntu/+source/man-db/+bug/1411633

which is a report of the same issue.  I'm filing it anyway so that we
have a record of it in the Debian BTS, and so that I have a record
myself of where this bug is and what it is.  Thanks for tolerating this.)

My backup system, like many, relies on file mtimes to know when to
back files up.  *Un*like most other systems, it does a cross-check: it
checks that the file *contents* (via checksum) are the same on the
backed-up host and as recorded in the backup.

This seems to me to be a correct and cautious approach.  On at least
one occasion it has saved me from a serious problem by giving me early
warning of a storage failure, by flagging up corruption in
luckily-unimportant files.

But it means that if a file is modified, but the mtime is reset, the
backups fail.
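To make the failure mode concrete, here is a minimal Python sketch of an mtime-plus-checksum cross-check.  The function names, the file contents and the three-way classification are hypothetical illustrations of the approach described above, not the actual backup software; the `os.utime` call merely stands in for whatever restores the old mtime after the file is rewritten.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(path: str) -> dict:
    """Record what a hypothetical backup run stores for one file."""
    return {"mtime_ns": os.stat(path).st_mtime_ns, "sha256": sha256_of(path)}


def check(path: str, recorded: dict) -> str:
    """Classify a file against its recorded backup entry (hypothetical logic).

    'changed'   - mtime differs, so the file is simply backed up again
    'mismatch'  - mtime matches but contents differ: the cross-check fails
    'unchanged' - mtime and contents both match
    """
    if os.stat(path).st_mtime_ns != recorded["mtime_ns"]:
        return "changed"
    if sha256_of(path) != recorded["sha256"]:
        return "mismatch"
    return "unchanged"


def demo() -> tuple:
    """Return the check result after a reset-mtime write and a normal write."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "index.db")  # file name is illustrative only
        with open(path, "wb") as f:
            f.write(b"old database contents")
        recorded = snapshot(path)
        atime_ns = os.stat(path).st_atime_ns

        # Case 1: rewrite the file, then put the recorded mtime back, as the
        # report says happens to /var/cache/man/*/index.db.
        with open(path, "wb") as f:
            f.write(b"new database contents")
        os.utime(path, ns=(atime_ns, recorded["mtime_ns"]))
        reset_result = check(path, recorded)

        # Case 2: same new contents, but the mtime moves forward (set
        # explicitly so the result does not depend on timestamp granularity).
        os.utime(path, ns=(atime_ns, recorded["mtime_ns"] + 1_000_000_000))
        bumped_result = check(path, recorded)

    return reset_result, bumped_result


if __name__ == "__main__":
    print(demo())
```

Under these assumptions, demo() returns ("mismatch", "changed"): restoring the old mtime after rewriting the file is what turns an ordinary update into a cross-check failure, whereas a file whose mtime moves forward is just backed up again.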

Empirically, this seems to happen with /var/cache/man/*/index.db.

Please could man-db not do this.  Specifically, if it modifies the
file, I would like it to either not reset the timestamp, or at
least not set the timestamp to the same value it had before.
Alternatively, perhaps using a deterministic algorithm would work?
(I think https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760895
may be relevant.)

Thanks,
Ian.

-- 
Ian Jackson    These opinions are my own.

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.