On Fri, Jan 21, 2022 at 11:38:56PM +0100, Steinar H. Gunderson wrote:
> On Fri, Jan 21, 2022 at 09:48:06PM +0000, Colin Watson wrote:
> > So the current behaviour isn't a bug as such, but there's definitely
> > room for optimization here: when operating in-process, and in the common
> > case where the target encoding is UTF-8, the UTF-8 to UTF-8 trial
> > decoding path could be changed to just do a read-only "is this UTF-8"
> > test rather than effectively copying everything to a new buffer via
> > iconv.  I don't know how much faster that would be, though it seems
> > likely to be an improvement.
> 
> Technically, UTF-8 validation can be done at a few gigabytes per second
> per core:
> 
>   https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/
> 
> but that is probably overkill. :-)

Quite :-)
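
For illustration, a plain scalar version of the read-only "is this UTF-8"
test discussed above might look roughly like the following.  This is only
a sketch under stated assumptions, not man-db code: is_valid_utf8 is a
hypothetical name, and a byte-at-a-time loop like this will be nowhere
near the SIMD speeds in the linked post.  The point is just that it reads
the buffer without copying it anywhere.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of man-db): return true if buf[0..len)
 * is well-formed UTF-8.  The buffer is only read; nothing is copied. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after c */
        unsigned long min;  /* smallest code point valid at this length */
        unsigned long cp;   /* code point being assembled */

        if (c < 0x80) {     /* ASCII: single byte, always valid */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; min = 0x80; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; min = 0x800; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; min = 0x10000; cp = c & 0x07;
        } else {
            return false;   /* stray continuation or invalid lead byte */
        }

        if (len - i <= need)
            return false;   /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;   /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += need + 1;
    }
    return true;
}
```

A caller would presumably try this first and fall back to the existing
iconv-based trial decoding only when it returns false.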

> > I'll see if I can make time for this, though I think a reasonable
> > priority for me is to finish working on your existing MR comments first
> > and get this ready to land.
> 
> Sure, I agree this is a good prioritization. I saw a new patch set landed,
> but I'm not sure if you wanted me to look at it again yet? (Fundamentally,
> though, almost everything I have is style nits; if the patch went into man-db
> as-is, I would still be happy about it.)

Not yet - that was just a trivial rebase.  I'd found and fixed a few
unrelated things I'd broken on main, and wanted those fixes in this
tree to simplify my own testing.  I have a larger pile of rearrangements
in progress, but I'll post replies on the MR when they're ready.

-- 
Colin Watson (he/him)                              [cjwat...@debian.org]
