Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
On Sun, Oct 02, 2022 at 04:00:58PM +0100, Colin Watson wrote:
> Control: tag -1 fixed-upstream
> Success!
> https://gitlab.com/cjwatson/man-db/-/compare/5d2594d0a0...866c3571d3

awesome!

On Sun, Oct 02, 2022 at 05:56:19PM +0100, Colin Watson wrote:
> I thought I'd set SOURCE_DATE_EPOCH, but I'd failed to pass it through
> sudo. After fixing that, I indeed get cmp-identical tarballs.

very nice! much cheers!

--
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Plastic bottles: made to last forever, designed to throw away.
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
On Sun, Oct 02, 2022 at 05:50:07PM +0200, Johannes Schauer Marin Rodrigues wrote:
> Quoting Colin Watson (2022-10-02 17:00:58)
> > As well as more localized testing, I built a .deb with this and used
> > josch's instructions from the start of this bug to build mmdebstrap
> > tarballs via disorderfs, using
> > "--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount
> > --include=./man-db_2.10.3~20221002-1_amd64.deb" to inject the new .deb.
> > The two resulting tarballs had somewhat differing file lists (timestamps
> > etc.), but all the actual files in the tarballs were bitwise-identical.
>
> Did you maybe forget the "export SOURCE_DATE_EPOCH=XXX" step? Just replace XXX
> with the output of `date +%s` but make sure that both mmdebstrap invocations
> see the same value for SOURCE_DATE_EPOCH and then there should be zero
> differences and a "cmp" should be sufficient to make sure that it works.

I thought I'd set SOURCE_DATE_EPOCH, but I'd failed to pass it through
sudo. After fixing that, I indeed get cmp-identical tarballs.

--
Colin Watson (he/him) [cjwat...@debian.org]
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Quoting Colin Watson (2022-10-02 17:00:58)
> Success!
>
> https://gitlab.com/cjwatson/man-db/-/compare/5d2594d0a0...866c3571d3

Thank you!! :D

> As well as more localized testing, I built a .deb with this and used
> josch's instructions from the start of this bug to build mmdebstrap
> tarballs via disorderfs, using
> "--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount
> --include=./man-db_2.10.3~20221002-1_amd64.deb" to inject the new .deb.
> The two resulting tarballs had somewhat differing file lists (timestamps
> etc.), but all the actual files in the tarballs were bitwise-identical.

Did you maybe forget the "export SOURCE_DATE_EPOCH=XXX" step? Just replace XXX
with the output of `date +%s` but make sure that both mmdebstrap invocations
see the same value for SOURCE_DATE_EPOCH and then there should be zero
differences and a "cmp" should be sufficient to make sure that it works.

Thanks!

cheers, josch
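To illustrate why both invocations need to see the same SOURCE_DATE_EPOCH before a plain "cmp" can succeed, here is a minimal sketch. It is a toy archive builder, not what mmdebstrap does internally; the function name `build_tarball`, the sample file, and the epoch values are assumptions made up for the example. Member timestamps are taken from SOURCE_DATE_EPOCH, so two builds come out byte-identical only when that value matches.

```python
import io
import os
import tarfile


def build_tarball(files, environ=os.environ):
    """Build an uncompressed tar in memory; mtimes come from SOURCE_DATE_EPOCH."""
    epoch = int(environ["SOURCE_DATE_EPOCH"])
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = epoch  # the only input that varies between builds here
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical sample content and arbitrary epoch values.
files = {"usr/share/man/man1/example.1": b".TH EXAMPLE 1\n"}
first = build_tarball(files, {"SOURCE_DATE_EPOCH": "1664726400"})
second = build_tarball(files, {"SOURCE_DATE_EPOCH": "1664726400"})
other = build_tarball(files, {"SOURCE_DATE_EPOCH": "1664726401"})
print(first == second)  # both builds saw the same epoch
print(first == other)   # the builds saw different epochs
```

If the variable is dropped somewhere along the way (for example by a privilege-switching wrapper that resets the environment), the builder has to fall back to something else such as the current time, and the byte-for-byte comparison fails even though the file contents are the same.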
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Control: tag -1 fixed-upstream

Success!

  https://gitlab.com/cjwatson/man-db/-/compare/5d2594d0a0...866c3571d3

As well as more localized testing, I built a .deb with this and used
josch's instructions from the start of this bug to build mmdebstrap
tarballs via disorderfs, using
"--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount
--include=./man-db_2.10.3~20221002-1_amd64.deb" to inject the new .deb.
The two resulting tarballs had somewhat differing file lists (timestamps
etc.), but all the actual files in the tarballs were bitwise-identical.

Feel free to do any other testing you think might be useful. There's a
bootstrapped source tarball attached as an artifact to the
"build-distcheck" CI job in GitLab that you can easily use to build a
snapshot .deb if you need one.

--
Colin Watson (he/him) [cjwat...@debian.org]
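The kind of check described here (metadata such as timestamps differs, while the files themselves are bitwise-identical) can be sketched roughly as follows. This is only an illustration and not the procedure actually used in this bug; the helper names `content_digests` and `same_contents` are hypothetical. It compares regular-file contents only and deliberately ignores timestamps, ownership, and modes.

```python
import hashlib
import tarfile


def content_digests(tar_path):
    """Map each regular file's member name to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, devices, ...
            with tar.extractfile(member) as fh:
                digests[member.name] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def same_contents(tar_a, tar_b):
    """True if both tarballs hold the same regular files with identical bytes."""
    return content_digests(tar_a) == content_digests(tar_b)
```

A call such as `same_contents("first.tar", "second.tar")` (file names assumed) can return True even when a byte-for-byte comparison of the two tarballs fails, which matches the situation reported above.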
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Hi Colin,

On Sun, Sep 25, 2022 at 11:18:19PM +0100, Colin Watson wrote:
> This weekend's work has been:
> https://gitlab.com/cjwatson/man-db/-/compare/bb0f7086ba...5d2594d0a0

wow, impressive! (and thank you for taking care of man-db for so many
years now! :)

[...]
> I'll need a bit more concentrated hacking time here, but I'll continue
> to work on these; this has been a great opportunity to clean up some
> truly unpleasant bits of code. Once I have the accessdb diff down to
> zero, we'll see whether there's any further instability in the on-disk
> GDBM representation, and also whether there are any other issues that
> don't show up in the set of pages I have installed.

sounds great! also thank you for keeping us updated here, i'm looking
forward to hear more good news eventually! :)

--
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

I'm looking forward to Corona being a beer again and Donald a duck.
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
This weekend's work has been:

  https://gitlab.com/cjwatson/man-db/-/compare/bb0f7086ba...5d2594d0a0

A lot of this was code rearrangement that I needed to do before I could
make progress on the real issues, but if you look at the NEWS.md diff
you'll see a number of changes that relate to this bug.

With all of that, there are 33 lines of diff of accessdb output
remaining on my system against the result of josch's patch, which come
down to two issues:

 * unstable choice of whatis target for pages with many entries in NAME,
   some but not all of which are represented as symlinks in the
   filesystem to a file name that is not itself in NAME (there are some
   examples of this in libbsd-dev and libmd-dev)

 * some difficulty deciding exactly what to do with cross-section links
   in some cases (inetd.conf(5) → inetd(8))

I'll need a bit more concentrated hacking time here, but I'll continue
to work on these; this has been a great opportunity to clean up some
truly unpleasant bits of code. Once I have the accessdb diff down to
zero, we'll see whether there's any further instability in the on-disk
GDBM representation, and also whether there are any other issues that
don't show up in the set of pages I have installed.

--
Colin Watson (he/him) [cjwat...@debian.org]
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Hi Colin,

On Thu, Sep 22, 2022 at 08:53:07PM +0100, Colin Watson wrote:
> Yeah, this has taken me a bit longer than expected, but I have in fact
> been making some progress. josch's patch has been very useful in that
> it provides an easy way to see differences between unsorted and sorted
> traversal, and I've taken my goal as being to drive those differences to
> zero. The only bit I've committed so far has been:
>
> https://gitlab.com/cjwatson/man-db/-/commit/bb0f7086ba4ce4503761737bf612088c03b6c495

cool, thanks for the update and all your man-db work!

> I'll update this bug as I make further progress.

great, thanks again!

--
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Imagine god created trillions of galaxies but freaks out because some
dude kisses another.
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Control: tag -1 - patch

On Thu, Sep 22, 2022 at 03:48:30PM +, Holger Levsen wrote:
> Colin, what's the status of this bug? You said you were working on improving
> josch' patch in May 2022...?! :)

Yeah, this has taken me a bit longer than expected, but I have in fact
been making some progress. josch's patch has been very useful in that
it provides an easy way to see differences between unsorted and sorted
traversal, and I've taken my goal as being to drive those differences to
zero. The only bit I've committed so far has been:

  https://gitlab.com/cjwatson/man-db/-/commit/bb0f7086ba4ce4503761737bf612088c03b6c495

I also have a few hundred lines of somewhat untidy patch that I'll
commit in a few pieces as soon as I'm certain of it; this is all
essentially about stabilizing the decisions about which database entries
win compared to which other entries, so that the end result doesn't
change depending on the scan order. With that, I'm down to on the order
of 150 lines of diff of accessdb output against the result of josch's
patch, and I think there are only about one or two problems left.

A lot of the remaining difficulties are due to somewhat impenetrable old
code which appeared to be trying to micro-optimize memory usage in a way
that I don't think makes sense nowadays, so I may take a bit of a
digression into reorganizing some of this.

I'll update this bug as I make further progress.

> Also, the bug is currently tagged 'patch', I guess it's appropriate to remove
> that tag?

Done.

--
Colin Watson (he/him) [cjwat...@debian.org]
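The idea of making the winning database entry independent of scan order can be sketched as below. This is not man-db's actual code or its real comparison rules; the `Entry` fields, the sample paths, and the `priority` key are all hypothetical. The point is only that choosing the winner by a total order over the entries themselves, rather than "first one seen wins", gives the same result for any directory read order.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entry:
    name: str     # page name the entry is filed under
    section: str  # e.g. "1" or "3"
    path: str     # file the entry was found in


def priority(entry):
    """Total order over entries (hypothetical rule): section, then path."""
    return (entry.section, entry.path)


def build_index(scanned):
    """Keep one entry per name; the winner does not depend on scan order."""
    index = {}
    for entry in scanned:
        current = index.get(entry.name)
        if current is None or priority(entry) < priority(current):
            index[entry.name] = entry
    return index


# Hypothetical entries competing for the same name.
entries = [
    Entry("foo", "3", "man3/foo.3bsd.gz"),
    Entry("foo", "3", "man3/foo.3.gz"),
    Entry("foo", "1", "man1/foo.1.gz"),
]
# Try every possible scan order and collect the distinct resulting indexes.
results = {frozenset(build_index(p).items()) for p in permutations(entries)}
print(len(results))  # number of distinct outcomes across all scan orders
```

A rule like "keep whichever entry was scanned first" would instead yield a different index for different permutations, which is the kind of difference that comparing sorted against unsorted traversal exposes.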
Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
hi!

Colin, what's the status of this bug? You said you were working on improving
josch' patch in May 2022...?! :)

Also, the bug is currently tagged 'patch', I guess it's appropriate to remove
that tag?

josch: btw you said you submitted other patches missing freeing of memory,
have you updated those other patches?

--
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

We live in a world where teenagers get more and more desperate trying to
convince adults to behave like grown ups.