On Mon, Jan 17, 2022 at 04:10:02AM +0000, Colin Watson wrote:
> Significant progress!  See the end of this email.

Thanks for dealing with this. I'm deleting most of your text, but be sure
that I've read it :-) And I understand most of it (although I of course
disagree at some points, I think those are beside the discussion).

> We definitely do need to sort out encoding conversion, though.  Although
> UTF-8 has been recommended for many years, policy still allows "the
> usual legacy encoding" and we've never got round to mandating UTF-8:
>
>   $ w3m -dump https://lintian.debian.org/tags/national-encoding | grep --count usr/share/man
>   502

I tried looking at this, but TBH I don't think the man-db path _ever_
inserts a conversion. The parameter to the lexer path is simply always NULL
in all calls, except from a test. Am I missing something?

> I've been experimenting with a few things, and I think the most elegant
> way to do this is in fact to add another layer of abstraction!  This is
> a much cheaper one, though.  If instead of returning a pipeline we have
> decompress_open return a new tagged union type, we can clone the simple
> pipeline_{read,peek}* functions on top of that, and convert everything
> over to use those rather than talking to libpipeline directly; the
> functions that specifically need pipeline support (mainly in man(1), but
> also things like the cat page case in lexgrog.l) can ask to drop down to
> a lower level.

This sounds fine to me. It's not something I would pick if I maintained the
code myself, but I'm obviously not doing that, and I don't see anything in
it that would preclude optimization. :-)

> Would you care to have a look at this?
>
>   https://gitlab.com/cjwatson/man-db/-/merge_requests/2

Thanks! I'll have a go at a review, but it might need to wait until the end
of the week. I'll try to get it done earlier, though. How would you like any
review comments? Email or somehow in GitLab?
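
For concreteness, the tagged-union idea quoted above might look roughly like
the sketch below. This is only an illustration, not the code from the merge
request: every name in it is hypothetical, and a plain FILE * stands in for
the libpipeline handle so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; this is not man-db's or libpipeline's API. */
enum source_tag { SOURCE_BUFFER, SOURCE_STREAM };

struct source {
    enum source_tag tag;
    union {
        struct { const char *data; size_t len; size_t pos; } buf;
        FILE *stream;   /* stand-in for a real pipeline handle */
    } u;
};

/* Wrap an in-memory buffer; no child process or pipe is involved. */
static struct source source_from_buffer(const char *data, size_t len)
{
    struct source s;
    s.tag = SOURCE_BUFFER;
    s.u.buf.data = data;
    s.u.buf.len = len;
    s.u.buf.pos = 0;
    return s;
}

/* Wrap an already-open stream (the "heavyweight" variant). */
static struct source source_from_stream(FILE *stream)
{
    struct source s;
    s.tag = SOURCE_STREAM;
    s.u.stream = stream;
    return s;
}

/* Copy up to n bytes into out and advance; returns the number of bytes copied. */
static size_t source_read(struct source *s, char *out, size_t n)
{
    assert(s != NULL && out != NULL);
    switch (s->tag) {
    case SOURCE_BUFFER: {
        size_t avail = s->u.buf.len - s->u.buf.pos;
        if (n > avail)
            n = avail;
        memcpy(out, s->u.buf.data + s->u.buf.pos, n);
        s->u.buf.pos += n;
        return n;
    }
    case SOURCE_STREAM:
        return fread(out, 1, n, s->u.stream);
    }
    return 0;
}

/* Escape hatch for callers that need the lower-level handle;
   returns NULL when the source is not stream-backed. */
static FILE *source_stream(struct source *s)
{
    assert(s != NULL);
    return s->tag == SOURCE_STREAM ? s->u.stream : NULL;
}
```

Most callers would only ever touch source_read(); the few that really need
the underlying handle would go through source_stream() and cope with NULL.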

> There's probably still room for improvement, but unlikely to be much
> more than a factor of two or so at this point, and I think this should
> get us comfortably back to the point where it's no longer annoying
> people during upgrades.

Someone else suggested an idea I thought of throwing around: In addition to
optimizing the code, perhaps the postinst trigger should simply launch the
man-db timer in the background? At least for systemd users, this should be
pretty straightforward, and give the control back to the installation
process. (Given that dpkg is very much single-threaded, we're not generally
throughput-bound, so using up a core shouldn't be a big problem.)

/* Steinar */
-- 
Homepage: https://www.sesse.net/