On Fri, Jan 21, 2022 at 09:48:06PM +0000, Colin Watson wrote:
> So the current behaviour isn't a bug as such, but there's definitely
> room for optimization here: when operating in-process, and in the common
> case where the target encoding is UTF-8, the UTF-8 to UTF-8 trial
> decoding path could be changed to just do a read-only "is this UTF-8"
> test rather than effectively copying everything to a new buffer via
> iconv.  I don't know how much faster that would be, though it seems
> likely to be an improvement.

Technically, UTF-8 validation can be done at a few gigabytes per second
per core:

  https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/

but that is probably overkill. :-)
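
For comparison, a plain scalar read-only check needs no output buffer at
all. What follows is only a minimal sketch, not code from man-db: the name
is_valid_utf8 and its signature are made up for illustration, and it makes
no attempt at the SIMD tricks from the article above. It rejects overlong
forms, surrogates, and anything above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of man-db): returns true if buf[0..len)
 * is well-formed UTF-8.  Purely read-only; nothing is copied. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		size_t need;        /* number of continuation bytes expected */
		unsigned long min;  /* smallest code point legal at this length */
		unsigned long cp;   /* code point being assembled */

		if (c < 0x80) {     /* ASCII fast path */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			need = 1; min = 0x80; cp = c & 0x1F;
		} else if ((c & 0xF0) == 0xE0) {
			need = 2; min = 0x800; cp = c & 0x0F;
		} else if ((c & 0xF8) == 0xF0) {
			need = 3; min = 0x10000; cp = c & 0x07;
		} else {
			return false;   /* stray continuation or invalid lead byte */
		}

		if (len - i - 1 < need)
			return false;   /* sequence truncated at end of buffer */

		for (size_t k = 1; k <= need; k++) {
			unsigned char cc = buf[i + k];
			if ((cc & 0xC0) != 0x80)
				return false;   /* not a continuation byte */
			cp = (cp << 6) | (cc & 0x3F);
		}

		/* Reject overlong encodings, surrogates, and out-of-range values. */
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;

		i += need + 1;
	}
	return true;
}
```

Something of this shape could stand in for the UTF-8 to UTF-8 trial
conversion, with iconv still used for every other encoding pair; whether it
is measurably faster in practice would need benchmarking.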

> I'll see if I can make time for this, though I think a reasonable
> priority for me is to finish working on your existing MR comments first
> and get this ready to land.

Sure, I agree this is a good prioritization. I saw that a new patch set
landed, but I'm not sure whether you want me to look at it again yet.
(Fundamentally, though, almost everything I have is style nits; if the patch
went into man-db as-is, I would still be happy about it.)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
