On Fri, Jan 21, 2022 at 09:48:06PM +0000, Colin Watson wrote:
> So the current behaviour isn't a bug as such, but there's definitely
> room for optimization here: when operating in-process, and in the common
> case where the target encoding is UTF-8, the UTF-8 to UTF-8 trial
> decoding path could be changed to just do a read-only "is this UTF-8"
> test rather than effectively copying everything to a new buffer via
> iconv. I don't know how much faster that would be, though it seems
> likely to be an improvement.
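For illustration, here is a minimal sketch of the read-only "is this UTF-8" test described above, assuming the page text is already in a memory buffer. is_valid_utf8 is a hypothetical helper, not an existing man-db function, and it is a plain byte-at-a-time validator following the RFC 3629 rules (rejecting overlong forms, surrogates and code points above U+10FFFF), not the SIMD technique from the post linked below. It only reads the buffer and allocates nothing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of man-db): returns true iff
 * buf[0..len) is well-formed UTF-8 per RFC 3629. The buffer is
 * only read; nothing is copied or allocated. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;                             /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */

        if (c < 0x80) {                       /* ASCII: one byte */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF) {
            n = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2;
            if (c == 0xE0)
                lo = 0xA0;                    /* reject overlong 3-byte forms */
            else if (c == 0xED)
                hi = 0x9F;                    /* reject UTF-16 surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3;
            if (c == 0xF0)
                lo = 0x90;                    /* reject overlong 4-byte forms */
            else if (c == 0xF4)
                hi = 0x8F;                    /* reject code points > U+10FFFF */
        } else {
            return false;                     /* 0x80-0xC1, 0xF5-0xFF never lead */
        }

        if (len - i <= n)
            return false;                     /* sequence truncated at end */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= n; k++)
            if ((buf[i + k] & 0xC0) != 0x80)  /* must be 10xxxxxx */
                return false;
        i += n + 1;
    }
    return true;
}
```

Under this assumption, a caller that gets true could use the original buffer as-is and fall back to the existing iconv path only when it gets false.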
Technically, UTF-8 validation can be done at a few gigabytes per second
per core:

  https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/

but that is probably overkill. :-)

> I'll see if I can make time for this, though I think a reasonable
> priority for me is to finish working on your existing MR comments first
> and get this ready to land.

Sure, I agree this is a good prioritization. I saw that a new patch set
landed; do you want me to look at it again yet? (Fundamentally, though,
almost everything I have left is style nits; if the patch went into
man-db as-is, I would still be happy about it.)

/* Steinar */
-- 
Homepage: https://www.sesse.net/